taeric 9 days ago

I find myself laughing at "Many developers are understandably concerned about the performance of text-wrap: pretty." I just can't bring myself to believe there is a meaningfully sized group of developers that have considered the performance of text-wrapping.

  • queuebert 9 days ago

    Text wrapping is actually a difficult optimization problem. That's part of the reason LaTeX has such good text wrapping -- it can spend serious CPU cycles on the problem because it doesn't do it in real time.

    • taeric 9 days ago

      You aren't wrong; but I stand by my claim. For one, plenty of things are actually difficult optimization problems that people don't give any passing thought to.

      But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero. Most websites are simply not typesetting the volumes of text that would be needed for this to be a concern.

      Happy to be shown I'm flat wrong on that. What sites are you envisioning this will take a lot of time for?

      • pcwalton 9 days ago

        > But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero.

        I've measured this, and no, it's not. What you're missing is the complexities of typesetting Unicode and OpenType, where GSUB/GPOS tables, bidi, ruby text, etc. combine to make typesetting quite complex and expensive. HarfBuzz is 290,000 lines of code for a reason. Typesetting Latin-only text in Times New Roman is quick, sure, but that doesn't cut it nowadays.

        • taeric 9 days ago

          Apologies, the additional cycles to do justified text is effectively zero compared to the rest of the stack for most sites. Clearly, it is work, so not actually zero. And, yes, proper text handling is huge.

          I would wager you can find scenarios where it is a large number. My question is if there are sites people use?

          Seriously taken, these would all be reasons not to do many of the things Unicode does. And yet here we are.

          That all said, if you have measurements, please share. Happy to be proven wrong.

          • rhdunn 8 days ago

            Some complexities:

            - handling shy hyphens/hyphenation when splitting long words -- working out where to hyphenate so it is readable, then how that affects the available space takes time to compute to ensure that justified text doesn't result in large blocks of whitespace esp. for long words;

            - handling text effects like making the first letter large, or the first word(s) larger, as is done in various books;

            - reflow due to any changes resulting from text rendering/formatting (e.g. if applying kerning or hyphenation results in/does not result in text wrapping);

            - impact of things like kerning and digraphs (e.g. ffi) on text width -- including determining if a word can be split in the middle of one of these or not;

            - combining characters, emoji (with zero-width non-joiners), flag Unicode character pairs (including determining valid pairs to determine if/where to split on), etc.;

            - mixed direction text (left to right and right to left) handling and positioning;

            - the mentioned Ruby text (e.g. https://www.w3.org/International/articles/ruby/styling.en.ht...) -- dealing with both the main text wrapping and the above/below text wrapping, both of which could happen;

            - for Chinese/Japanese/Korean ensuring that characters within a word don't have extra space, as those languages don't use spacing to delimit words;

            - other things affecting line height such as sub/super script text (mathematical equations, chemical symbols, references, etc.).

            • taeric 8 days ago

              This isn't really disagreeing with me? I can even add to it: for a fun example, knowing where to hyphenate a word can depend on its use in a sentence. (Strictly, I'm probably wording that poorly?)

              Like, I am largely aware that it is a hard problem. So is just rendering text, at large. The added difficulty for justifying text is still not something I expect to impact the vast majority of sites. If you are willing to break your content into multiple pages, I hazard it isn't a significant chunk of time for most content.

              Are there edge cases? Absolutely! Most of these are isolated in impact, though. And I do think a focus on whole content optimization is clouding a lot of people's view here. You are not doing yourself any favor by optimizing a full book every time a chapter changes.

              There is also the idea that you have to find the absolute best answer for justifying text. Why? One that is low enough on a penalty score is just fine. Akin to the difficulty in finding all tilings of a board, versus just finding a single tiling for a board. Or a single solution to the N-Queens question, versus all solutions. If you just want a single solution, you don't need the raw compute necessary to get them all.
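
The "penalty score" framing above is essentially how the classic Knuth–Plass-style formulation works. Here is a minimal, illustrative dynamic-programming sketch (my own simplification: squared leftover space as the penalty, no penalty on the last line, and it assumes no single word exceeds the line width):

```python
# Sketch of dynamic-programming line breaking: best[i] holds the minimal
# total penalty for laying out words[i:], and split[i] the break achieving
# it. Penalty = (leftover space)^2 per line; the last line is free.
def wrap(words, width):
    n = len(words)
    best = [0.0] * (n + 1)
    split = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        line_len = -1
        for j in range(i, n):
            line_len += len(words[j]) + 1  # word plus one separating space
            if line_len > width:
                break
            slack = width - line_len
            cost = 0.0 if j == n - 1 else float(slack * slack)
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                split[i] = j + 1
    # read the chosen breakpoints back into lines
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:split[i]]))
        i = split[i]
    return lines

print(wrap("aaa bb cc ddddd".split(), 6))  # -> ['aaa', 'bb cc', 'ddddd']
```

Note the greedy breaker would produce `["aaa bb", "cc", "ddddd"]` here, leaving a stranded short line; and, per the point above, an implementation could accept any layout once its penalty falls under a threshold rather than insisting on the optimum.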

      • binaryturtle 9 days ago

        With the current state of Websites and how much resources they waste any text wrapping is probably not an issue at all. :)

        I can hardly open any website w/o some anti-bot check burning my CPU to the ground for half a minute or something (if it doesn't manage to entirely crash my Firefox in the process, like Cloudflare). I'd rather wait 0.2s for text wrapping than that, that's for sure. :)

      • cobertos 9 days ago

        Any page with dynamic text. If the calculation takes a moderate amount of time, that will accumulate if the page layout reflows a lot.

        • taeric 9 days ago

          Only if the entire text has to be optimized as a whole? Which, most dynamic text sites do not have to do this. Most dynamic sites will be a series of individual "card" like things that could be justified internally, but are not justified with regard to anything else on the page.

      • contact9879 9 days ago

        Quick example would be https://standardebooks.org/ebooks/mark-twain/the-innocents-a...

        Try zooming in and out with text-wrap: pretty vs text-wrap: wrap

        • taeric 9 days ago

          I... uh, wouldn't consider a text dump of a full novel getting completely typeset as a good example to consider when talking about sites?

          • contact9879 9 days ago

            sure, but html isn't only used in a browser context. I have a severely under-powered ereader that I use to read epubs (html). It already takes ten seconds to paginate on first open and font size changes. I can't imagine how long it would take to non-naively wrap lines

            • taeric 9 days ago

              I don't know why you'd expect an ereader to do a full text optimization of a book, though? Pick the starting line and layout to the next screen's breaking point. If needed, consider if you left an orphan for the next page and try to adjust that into the current screen.

              Are there ereaders that have to typeset the entire text of what you are reading? What is the advantage of making the task harder?
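
The screen-at-a-time approach described above can be sketched in a few lines. This is a toy illustration under my own assumptions (pre-wrapped lines, a fixed line count per screen), not how any particular ereader actually works:

```python
# Greedy screen-at-a-time pagination: fill each screen with the next run
# of already-wrapped lines, pulling one line off the current screen if the
# next screen would otherwise hold a single orphan line.
def paginate(lines, lines_per_screen):
    pages, i, n = [], 0, len(lines)
    while i < n:
        end = min(i + lines_per_screen, n)
        # avoid leaving exactly one orphan line for the next screen
        if n - end == 1 and end - i > 1:
            end -= 1
        pages.append(lines[i:end])
        i = end
    return pages
```

Nothing here requires laying out the whole book up front; the cost per font-size change is proportional to the screens you actually render.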

              • tjader 9 days ago

                KOReader typesets the whole book at once. It is needed in order to get page counts right, for example.

                • taeric 9 days ago

                  Even if that is the case, it has to redo this how often? If the page counts are to be stable, I'm assuming so are page numbers? At which point we are back to this not being something I would expect to slow things down appreciably on the vast majority of actual uses.

                  Still, I should acknowledge you provided an example. It surprises me.

                  • tjader 8 days ago

                    It needs to rerender everything whenever you change any setting that affects typesetting. This used to be quite annoying when trying out fonts or find the best value for some setting, but recently they implemented a better way so that it first renders the current fragment (usually a chapter), releases the UI so that you can read, and renders the rest in the background. Page counts and some other stuff are broken in this meantime.

                    • taeric 8 days ago

                      That better way makes a ton of sense, and is what I would expect to be default. Getting page numbers is a flex that just doesn't feel necessary. As I said, I would expect even faster renders if it did present content first. Would edge case into unstable page counts, but I struggle to care on that? Make it an optional setting, and be done with it. Especially as I prefer resizing to keep the current start of page I'm looking at. Something obviously not guaranteed in a resize.

                  • throwanem 8 days ago

                    > it has to redo this how often?

                    As often as the font size changes.

                    • taeric 8 days ago

                      So, never for most reads? :)

                      Even the few times I do change text size on my e-readers are largely mistakes. Having gestures to resize is frustrating, in the extreme.

                      • throwanem 7 days ago

                        Eh, I don't really have a dog in the fight. When I'm out and about I just read on my phone, which my oral surgeon says is too small for my eyes; I haven't asked my ophthalmologist for advice on my dental implants, but I have been reading as much off screens as off paper since the late 1980s, so any kind of sense I might once have had for the aesthetics of typography must surely have been strangled in the crib.

                        When I'm home, I read books.

      • NoMoreNicksLeft 9 days ago

        Won't this end up in Apple iBooks or whatever it's called now? Most novels can be a megabyte or more of text, pretty much all of it needing to be wrapped.

        • CharlesW 9 days ago

          It seems more likely that Apple would've adapted this from the proven technology that they currently use for Apple Books and everything else, TextKit (which first appeared in OpenStep). https://developer.apple.com/videos/play/wwdc2021/10061/

          • addaon 8 days ago

            > Apple Books and everything else

            Can't speak to Apple Books, but at least Pages.app (and iWork in general) use a separate text engine from TextKit, focused on higher fidelity at the cost of performance -- optical kerning, etc. (Terminal.app also does not use TextKit.)

          • alwillis 8 days ago

            Doubtful.

            OpenStep used Display PostScript and was written in Objective-C; WebKit is written in C++.

            Rendering text on the web is a different animal altogether.

            • NoMoreNicksLeft 8 days ago

              I was under the impression that when we got new css in Safari, in the next software cycle those same features ended up in Books. It wouldn't make sense to give it a different rendering engine... but then I've never been able to find much in the way of which epub readers used which rendering engines anywhere.

        • taeric 9 days ago

          I mean, not wrong. But optimizing over a megabyte's worth of text is almost certainly not going to take a lot of time. Especially as there will be chapter stops. Such that we are down to, what, 100k of text per chapter to lay out?

          Again, I won't claim it is absolutely free. It is almost certainly negligible in terms of processing power involved with any of the things we are talking about.

    • porphyra 9 days ago

      But with modern hardware, running the dynamic programming solution to this optimization problem takes a trivial amount of cycles* compared to rendering your typical React webapp.

      * for most webpages. Of course you can come up with giant ebooks or other lengthy content for which this will be more challenging.

      • ta988 8 days ago

        even on 5-page documents LaTeX can spend a surprising amount of time

        • taeric 8 days ago

          What five page documents? I've seen it zip through far larger texts on the regular.

          And is most of the effort from LaTeX in paragraph layout? Most slowness there is in math typesetting, I would think.

        • lttlrck 8 days ago

          Is LaTeX text wrapping known to be well optimized?

          • taeric 8 days ago

            The main algorithm most any folks know by name for doing this was created for it, so sorta? I don't know that it is necessarily better than any closed source options. I was under the impression it is the baseline good, though.

          • __david__ 8 days ago

            Its author was supposedly pretty good at algorithms. I think he may have even written a book or four about them. So I suspect it’s decently optimized.

    • throw0101d 8 days ago

      > That's part of the reason LaTeX has such good text wrapping -- it can spend serious CPU cycles on the problem because it doesn't do it in real time.

      Is that the reason the Microsoft Word team tells themselves as well?

      We have multi-core, multi-gigahertz CPUs these days: there aren't cycles to spare to do this?

      • queuebert 8 days ago

        You would think, but Word has some serious efficiency problems that I can't explain. For one, it is an order of magnitude slower at simply performing word counts than tools like wc or awk. Besides that, the problem does not parallelize well, due to the long-range dependency of line breaks.

        Zooming in a bit, Word also does not kern fonts as well as LaTeX, so it might be missing some logic there that would trickle down into more beautiful word spacing and text flow.

      • toomim 8 days ago

        It's an O(n^2) to O(n!) problem, not O(n), so it doesn't scale linearly with CPU cores.

        • taeric 8 days ago

          Sorta? For one, you don't always have to do full site optimization for this problem. As such, the "N" you are working over is going to be limited to the number of breaks within a section you are looking at. And you can probably divide work across the sections quite liberally on a page.

          Yes, there can be edge cases where optimizing one section causes another section to be resized. My gut is that that is the exception, not the norm. More, for most of the resizes that will lead folks to do, it will largely result in a "card" being moved up/down in such a way that the contents of that card do not need to be re-optimized.

          Yes, you could make this even harder by optimizing what the width of a section should be and flowing another section around it. But how many sites actually do something like that?
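
To illustrate the "divide work across sections" point: if each card's text is wrapped independently, the work parallelizes trivially even though breaks within a single paragraph are sequential. A sketch, using Python's stdlib greedy wrapper as a stand-in for a real layout engine:

```python
# Wrap independent page sections ("cards") concurrently. Each section's
# line breaks depend only on its own text, so sections can be laid out
# in parallel; textwrap.wrap stands in for the real per-section layout.
from concurrent.futures import ThreadPoolExecutor
import textwrap

def wrap_sections(sections, width):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: textwrap.wrap(s, width), sections))
```
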

      • jcelerier 8 days ago

        to be honest as a LaTeX user on a very beefy CPU I regularly have 30s+ of build times for larger documents. I doubt Word users would want that. A simple 3 page letter without any graphics is a couple seconds already.

        • taeric 8 days ago

          I'd hazard most of that 30s+ build time is not in the line wrapping algorithm. Most slow documents are either very heavy in math typesetting, or loaded with many cross references needing multiple passes through TeX to reconcile.

          To be clear, just because I would hazard that, does not mean I'm right. Would love to see some benchmarks.

          • xhkkffbf 8 days ago

            Mine are slow because of large images. If I set "draft" mode, it speeds up dramatically. I don't know why latex needs to read the entire image file but it seems to do that.

        • throw0101c 8 days ago

          > to be honest as a LaTeX user on a very beefy CPU I regularly have 30s+ of build times for larger documents.

          A lot of the time folks are sitting and thinking and things are idle. Perhaps Word could 'reflow' text in the background (at least the parts that are off screen)? Also, maybe the saved .docx could have hints so that on loading things don't have to be recalculated?

          • taeric 8 days ago

            Oh dear lord, the last thing I ever wanted Word to do was to try and reflow things. I can't be the only person that tried to move an image only to have it somehow disappear into the ether while Word tried to flow around it. :D

    • int_19h 9 days ago

      Lest we forget, TeX is almost 50 years old now, so what constitutes "serious CPU cycles" has to be understood in the context of hardware available at the time.

      • setopt 8 days ago

        TeX is still slow to compile documents on my current device (MacBook M1), especially when compared to e.g. Typst. I can only imagine how slow it would have been on a 40yo computer.

        • xigoi 8 days ago

          How much of that time is spent on text wrapping?

    • jkmcf 9 days ago

      That's true, but I think the OP is commenting on the state of FE development :)

      • taeric 9 days ago

        Largely, yes. I also challenge if it would be measurable for the size of most sites.

        Typesetting all of wikipedia? Probably measurable. Typesetting a single article of wikipedia? Probably not. And I'd wager most sites would be even easier than wikipedia.

    • frereubu 8 days ago

      You've phrased your comment as if it's a counterpoint to OP, but it's not - both can be true (and from personal experience OP is absolutely right).

  • _moof 9 days ago

    Same. I read that and think, oh NOW you all are worried about performance?

  • zigzag312 8 days ago

    I once did a naive text wrapping implementation for a game and with a longer text it caused performance to drop way below 60 FPS.

    This was on a 4.5 GHz quad-core CPU. Single-threaded performance of today's top CPUs is only 2-3x faster, but many gamers now have 144Hz+ displays.

  • pcwalton 9 days ago

    Remember the days of Zuck saying "going with HTML5 instead of native was our biggest mistake"? Though hardware improvements have done a lot to reduce the perceptible performance gap between native and the Web, browser developers haven't forgotten those days, and layout is often high in the profile.

  • jcelerier 8 days ago

    I have to consider the performance of rendering text literally all the time, even without wrapping. This is one of the most gluttonous operations when rendering a UI if you want, say, 60 fps on a Raspberry Pi Zero.

  • dominicrose 8 days ago

    Basically everything that comes built into a browser has to perform well in most use cases and on most devices. We don't want an extra 5% quality at the cost of degraded performance.

  • 0cf8612b2e1e 9 days ago

    Open any random site without an ad blocker and it is clear that nobody cares about well optimized sites.

    • Telemakhos 9 days ago

      Very likely the site is well optimized. It's optimized for search engines, which is why we found the site in the first place, which is in turn the reason I said "very likely" in the first sentence: we come upon web sites not truly randomly but because someone optimized them for search ranking. It also appears from your "without an ad blocker" that the site may be optimized for ad revenue, monetizing our visit as much as possible. There's probably optimization of tracking you in order to support those ads, too.

      What you're complaining about is that the site is not optimized for your reading enjoyment. The site is probably quite well optimized, but your reading enjoyment was not one of the optimizer's priorities. I think we agree about how annoying that is and how prevalent, so the news that new typographical features are coming seems to me like good news for those of us who would appreciate more sites that prioritize us the readers over other possible optimization strategies.

      • taeric 8 days ago

        I want to believe you. I just can't bring myself to agree, anymore. Most sites are flat out not optimized, at all. Worse, many of them have instrumentation bolted on to interface with several different analytics tools.

        And to be clear, most sites flat out don't need to be optimized. Laying out the content of a single site's page is not something that needs a ton of effort put into it. At least, not a ton in comparison to the power of most machines, nowadays.

        This is why, if I open up GMail in the inspector tab, I see upwards of 500+ requests in less than 10 seconds. All to load my inbox, which is almost certainly less than the 5 megs that has been transferred. And I'd assume GMail has to be one of the more optimized sites out there.

        Now, to your point, I do think a lot of the discussion around web technologies is akin to low level assembly discussions. The markup and script layout of most sites is optimized for development of the site and the creation of the content far more than it is for display. That we have moved to "webpack" tricks to optimize rendering speaks to that.

    • lcnPylGDnU4H9OF 9 days ago

      The developer in such a case is only allowed to care as much as the PM.

  • alwillis 8 days ago

    > I just can't bring myself to believe there is a meaningfully sized group of developers that have considered the performance of text-wrapping.

    You're kidding, right? There are a ton of non-trivial edge cases that have to be considered: break points, hyphenation, other Latin-based languages, etc.

    From a Google engineer's paper describing the challenges: https://docs.google.com/document/d/1jJFD8nAUuiUX6ArFZQqQo8yT...

        Performance Considerations

        While the `text-wrap: pretty` property is an opt-in to accept slower
        line breaking, it shouldn’t be too slow, or web developers can’t use
        them due to their performance restrictions.

        The pinpoint result when it is enabled for all web_tests is in this CL.

        Complexity

        The score-based algorithm has different characteristics from the
        bisection algorithm. The bisection algorithm is O(n * log w) where n is
        the number of lines and w is the sum of spaces at the right end. The
        score-based algorithm is O(n! + n) where n is the number of break
        opportunities, so it will be slower if there are many break
        opportunities, such as when hyphenation is enabled.

        Also, computing break opportunities are not cheap; it was one of
        LayoutNG's optimizations to minimize the number of computing break
        opportunities. The score-based algorithm will lose the benefit.

        Last 4 Lines

        Because computing all break opportunities is expensive, and computing
        the score is O(n!) for the number of break opportunities, the number of
        break opportunities is critical for the performance. To minimize the
        performance impact, the implementation caches 4 lines ahead of the
        layout.

        Before laying out a line, compute line breaking of 4 lines ahead of the
        layout. If it finds the end of the block or a forced break, compute the
        score and optimize line breaks. Otherwise layout the first line from
        the greedy line breaking results, and repeat this for the next line.
        The line breaking results are cached, and used if the optimizer decided
        not to apply, to minimize the performance impact.

        Currently, it applies to the last 4 lines of each paragraph, where
        “paragraph” is content in an inline formatting context separated by
        forced breaks.

        The Length of the Last Line

        Because the benefit of the score-based line breaking is most visible
        when the last line of the paragraph is short, a performance
        optimization is to kick the optimizer in only when the last line is
        shorter than a ratio of the available width. Currently, it applies only
        when the last line is equal to or less than ⅓ of the available width.

        Checking if the last line has only a single word

        Checking if the last line has only a single word (i.e. no break
        opportunities) requires running the break iterator, but only once.

    • taeric 8 days ago

      You mistake my comment to mean it isn't hard. It is absolutely a difficult problem with serious edge cases. There are people that have studied it quite heavily.

      They are still not a sizeable number in comparison to the number of devs that have enabled different text wrap options. Most of which do not give much thought to a setting that did not appreciably slow things down at all.
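
As a rough illustration of the strategy the quoted doc describes: wrap greedily first, and only invoke the expensive score-based re-breaker when the greedy result ends in a short last line. Everything here (the function names, the 4-line tail, the ⅓ threshold) is a simplified sketch of that description, not Chromium's actual code:

```python
# Greedy wrap first; only run the (expensive) optimizer on the paragraph's
# tail, and only when the last line is short enough to look bad.
def greedy_wrap(words, width):
    lines, cur = [], ""
    for w in words:
        if cur and len(cur) + 1 + len(w) > width:
            lines.append(cur)
            cur = w
        else:
            cur = w if not cur else cur + " " + w
    if cur:
        lines.append(cur)
    return lines

def wrap_pretty(words, width, optimize_breaks):
    lines = greedy_wrap(words, width)
    # Fast path: the greedy result already has a reasonable last line.
    if len(lines) < 2 or len(lines[-1]) > width / 3:
        return lines
    # Slow path: re-break only the last few lines of the paragraph.
    tail_words = " ".join(lines[-4:]).split()
    return lines[:-4] + optimize_breaks(tail_words, width)
```

`optimize_breaks` is a stand-in for any score-based re-breaker; the point is that most paragraphs never reach it.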

  • mdhb 8 days ago

    I think this is the kind of nonsense the Safari team tells themselves so that they can continue to ship absolutely fucking nonsense features like this while ignoring anything that encroaches on the idea of the web becoming a meaningful competitor to their walled garden where they are able to rip off every single person involved every time a transaction occurs.

tiltowait 9 days ago

I’m pretty excited for this to be added to ereaders, which notoriously (among people who care about this kind of thing) have terrible layout engines.

  • velcrovan 9 days ago

    Better ways of laying out digital text have existed since before ereaders existed. Even this one CSS directive has already been supported by Chrome for two years. What's missing is Amazon & co. giving a shit about it. That needle shows no signs of moving.

    • MBCook 9 days ago

      > Even this one CSS directive has already been supported by Chrome for two years

      The article says what chrome does is only support the “no super short lines” bit.

      So while you won’t end up with one word on its own line at the end of a paragraph, it’s not trying to prevent rivers of text or keep a relatively clean ragged right or anything else.

      That’s allowed by spec, but it’s not quite the same thing.

    • Cthulhu_ 9 days ago

      I was about to ask about that: how are / were traditional paper books laid out to prevent this? Surely not by hand. Proprietary software maybe?

      • Sharlin 9 days ago

        Well, before desktop publishing, by phototypesetting [1], before that by hot-metal typesetting [2], and before that, by hand. Nowadays, with software like Adobe InDesign, or if you happen to be a CS/math/physics nerd, with LaTeX, which has a famously high-quality layout engine that utilizes the Knuth–Plass line-breaking algorithm [3]. Indeed, it's fairly well known that Donald Knuth created TeX because he wasn't happy with the phototypeset proofs he received in 1977 for the second edition of The Art of Computer Programming, finding them inferior compared to the hot-metal typeset first edition.

        [1] https://en.wikipedia.org/wiki/Phototypesetting

        [2] https://en.wikipedia.org/wiki/Hot_metal_typesetting

        [3] https://en.wikipedia.org/wiki/Knuth%E2%80%93Plass_line-break...

      • aardvark179 9 days ago

        Books were produced before computers, and with very good typesetting. One difference between websites and books is that there is a feedback loop with books, where somebody is looking at the layout and either adjusting the spacing subtly or even editing the text to avoid problems. Sometimes this is just to ensure that left-justified text isn't too ragged on the right edge, sometimes it's to avoid a river of space running through a paragraph, and sometimes it's editing or typesetting to avoid orphans.

        But text on a page is set for a set layout, and that’s where the web really differs.

      • Finnucane 9 days ago

        In ye olde dayes, indeed by hand. That's why there was often extra space after punctuation. In more mechanized times, the operator has to watch for it. Proofreaders are trained to watch for loose lines, rivers, widows, hyphenation errors, and other spacing problems. Those things will be marked as errors in proof. Even with modern DTP tools, typesetters still have to make a lot of manual corrections. Of course, for print, you're setting for a fixed format. You can do a lot of fine-tuning that a browser can't do on the fly.

        • Telemakhos 8 days ago

          The idea of extra space after punctuation (especially periods) being a result of printing technology is a myth. Extra space is present in handwritten documents: go look at the US Declaration of Independence or Constitution for well-known examples. People only started shortening the space between sentences to match the space between words very recently.

          • Finnucane 8 days ago

            By 'very recently' you mean 'since the early 20th century'?

      • omnimus 8 days ago

        Nowadays basically any professionally produced book is made in InDesign, and text wrapping is semi-automated: it's automated, but checked for issues and fixed manually. InDesign has two text-wrapping algorithms: the paragraph composer, which tries to balance whole paragraphs, and the line composer, which only checks line by line.

        Surprisingly, in the high end the less automated line composer is used a lot more. It requires more work, but human decisions lead to the best results if done properly.

  • TiredOfLife 9 days ago

    Currently on Android i use Moon+ reader that has hyphenation + hanging punctuation. Before that (2008-2013) I used eInk reader that came with CoolReader (its layout engine "crengine" is the base for KOReader) that also had good hyphenation and hanging punctuation and nice footnotes.

    So in my experience ereaders have had great layout engines.

  • Finnucane 9 days ago

    Given the way ebook software is developed, it'll be years before this makes it to a device near you.

    • archagon 8 days ago

      I dunno, I imagine Apple Books would be eager to implement this as soon as possible.

  • MBCook 9 days ago

    Note that it’s up to the browser to do whatever it thinks is best. They didn’t lay down specific rules.

    So unless the e-reader uses an engine that already has good rules there will be no real change unless the manufacturer does what it should have already.

  • taeric 9 days ago

    Is this where folks will get it for ereaders? Naively, I hadn't realized ereaders were glorified webkit displays.

    • mintplant 9 days ago
      • taeric 9 days ago

        I knew it was related, but I had assumed it did not rely on CSS. Again, I noted it was a naive view.

        Granted, I probably view CSS with far more disdain than I should.

        • NetOpWibby 9 days ago

          CSS is your friend

          • taeric 9 days ago

            The cascading aspect is mostly a big L. As is the dream of user style sheets. All the more so from the obfuscated style names so many tools saddle us with.

            Don't get me wrong, there are smart people working on css. That we decided, as an industry, to treat layout as a Rube Goldberg machination of interacting rules on a growing number of elements is not something you can easily overcome.

  • spookie 9 days ago

    Sometimes I like to think Knuth may have intrusive violent thoughts about how shit programmers make text look.

  • numbers 9 days ago

    yeah, I agree, sometimes reading an ebook feels very off b/c of the lines just looking way too justified

Sloppy 9 days ago

Far too little effort and attention has been devoted to creating beautiful text online. The web set text back centuries. In some ways it was never this bad, except for the monospaced typewriters. This is welcome indeed.

  • accrual 9 days ago

    This made me think of one person who cares about it: Matthew Butterick appears to have put quite a bit of work into bringing typeset-like text to his Practical Typography website.

    https://practicaltypography.com/

    • ashton314 8 days ago

      MB over-engineered that book. Example: look at the paragraph that starts “But I don’t have visual skills” on this page: https://practicaltypography.com/why-does-typography-matter.h...

      Notice how the open quotation marks hang into the left margin. There’s been some recent work with CSS to make this automatic, but that’s newer than this book and support is spotty. MB made it happen with a (iirc) custom filter inside the Pollen setup he made for this book. Wild. And beautiful.

crazygringo 9 days ago

This is fantastic. I'm not surprised they focus on short last lines and on rag, since it's easy to imagine defining metrics for them to then minimize.

But they say:

> We are not yet making adjustments to prevent rivers, although we’d love to in the future.

And indeed, I couldn't even begin to guess how to define a metric for rivers, which can occur at different angles, with different amounts of variation, interrupted to various degrees... I'm curious if there's a clever metric anybody has invented that actually works? Or does it basically require some level of neural-network pattern recognition that is way too expensive to calculate 1,000 variations of for each paragraph?

  • ameliaquining 9 days ago

    There's a TeX package that, among many other features, detects rivers: https://mirrors.ibiblio.org/pub/mirrors/CTAN/macros/latex/co...

    The intent here is that the document author is informed that their text contains rivers, and responds by tweaking the wording until they land on something that doesn't have them.

    Of course, for a browser engine this is a complete nonstarter; a useful feature for dealing with rivers would require not just detecting them but automatically removing them, without changing the text content. I'm not aware of any existing software that does this, but I've found one proposed list of exploratory directions that could make a decent starting point for anyone who wanted to build this: https://tex.stackexchange.com/a/736578

  • taeric 9 days ago

    I think the main difficulty is that it is a paragraph level optimization and not a line one. Right? Otherwise, it seems like you can probably get pretty far by defining a metric that looks at connected whitespace sections between lines? With higher penalty for connected space between words that has been stretched. (That is, if you have space between some words expanded to make them pretty at the edge, those will be more visible as rivers if they are stacked?)

    And, yes, there are some concerns that are handled at the line level that could lead to a paragraph getting reworked. Ending with a single word is an easy example. That is still something you can evaluate at the line level easily.

    • crazygringo 9 days ago

      I think the difficulties are, how close do spaces need to be to be considered connected? Rivers aren't only perfectly vertical. And to what degree do they need to maintain the same angle across consecutive lines? How much can they wiggle? And a river is still visible across 10 lines even if one line in the middle doesn't have the space, so it needs to be able to handle breaks in contiguity.

      There's no problem with paragraph-level optimizations inherently. Reducing raggedness is paragraph-level and that's comparatively easy. The problem is the metric in the first place.

      • taeric 9 days ago

        I wouldn't try and consider spaces individually, I don't think? Rather, I'd consider the amount of space being considered. We aren't talking about fixed width typesetting, after all. To that end, you will have more space after punctuation and such. Rather than try to enumerate the different options, though, you almost certainly have some model of how much "space" is in a section. Try different model weights for how much to penalize different amounts of connected space and see how well different models optimize.

        Or, maybe not? I'll note that the vast majority of "rivers" I've seen in texts coincide with punctuation quite heavily. Even the example in this article has 5/8 lines using a comma to show the river. With the other lines having what seems to be obvious stretched space between words to use more of the line? Maybe enumerating the different reasons for space would be enough?

        Granted, this also calls out how dependent you almost certainly are on the font being used?

fngjdflmdflg 9 days ago

>The purpose of pretty, as designed by the CSS Working Group, is for each browser to do what it can to improve how text wraps. [...] There is no mandate for every browser to make the same choices. In fact, a browser team might decide in 2025 to handle some aspects of improving these qualities, and then change what their implementation does in the future. [...] Because of the way Chrome’s implementation of pretty has been taught, a lot of web developers expect this value is only supposed to prevent short last lines. But that was never the intention.

Why did they even design it like this in the first place? This seems counter to much of what browsers have been doing recently, like making select customizable, the browser interop and Baseline projects, web platform tests, etc. I would rather move away from these types of features in favor of more explicit ones. I understand that this isn't a serious issue and is unlikely to cause bugs compared to other interop issues, which are true deviations from the spec. It just seems counterintuitive.

  • giraffe_lady 9 days ago

    They point to the reason in the intro but don't make it explicit: it's because this is at the intersection of computing with a much older tradition, typesetting.

    There's no "correct" way to typeset a document, there wouldn't even be a consensus among typesetters on what the implementation specifics look like. Rather than turn the spec committee into a decades-long ecumenical council of typographers they just left the specifics up to each individual "shop" as it always has been. Except now instead of printers it's the browser vendors needing to make the final call.

    • fngjdflmdflg 9 days ago

      >There's no "correct" way to typeset a document

      They can add multiple typesetting properties and allow the developer to decide which one to use. Besides, letting each browser decide what the "best" line break looks like doesn't solve the problem of there not being a definitive answer to that question. Even here, I don't think the Chrome developers have a vastly different opinion on what a good line break looks like. It's possible they didn't like the performance implications of WebKit's version or had some other tangential reason, although the blog says performance is not an issue.

    • mrandish 8 days ago

      > Rather than turn the spec committee into a decades-long ecumenical council of typographers...

      Having worked with passionate (aka opinionated) typographers, that phrasing earned a well-deserved chuckle. Leaving implementation choices up to each browser was certainly the only way to get it into CSS. Hopefully the various implementations will evolve over time and coalesce into a fairly consistent baseline.

  • moralestapia 9 days ago

    While this is valuable work, leaving it implementation-dependent is a terrible mistake.

    The whole point of CSS is/was to standardize presentation across browsers.

    • alwillis 8 days ago

      > The whole point of CSS is/was to standardize presentation across browsers.

      CSS was created to standardize how to deal with presentation, but that doesn't mean every website should look exactly the same on every device or in every browser. The era of attempting to do that is over.

      text-wrap: pretty is a great example of progressive enhancement [1]: it's a great way to add some polish to a website but if the user's device doesn't support it, they can still access all of the content on the site and be none the wiser.

      If you read the CSS specifications, browser makers are in some cases allowed to use platform-specific heuristics to determine whether or not to execute certain features. Downloading web fonts works like this: browsers fall back to system fonts if a webfont doesn't download within 3 seconds.

      It makes sense that text-wrap: pretty should be one of those. If your smartphone is low on power and the signal isn't that great, you can forgo expertly wrapped text and elegant hyphenation in order to view the webpage as quickly as possible.

      [1]: https://en.wikipedia.org/wiki/Progressive_enhancement
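      Concretely, the progressive-enhancement part is a single declaration; engines that don't support the value simply ignore it and keep their default greedy wrapping (the selector below is only an example):

```css
/* Browsers that don't recognize `pretty` drop this declaration and
   fall back to their default wrapping; no @supports guard is needed. */
article p {
  text-wrap: pretty;
}
```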

      • fngjdflmdflg 8 days ago

        >CSS was created to standardize how to deal with presentation, but that doesn't mean every website should look exactly the same on every device or in every browser. The era of attempting to do that is over.

        For every device I agree, but that was never the goal of CSS. It is meant to respond to the device's constraints such as screen dimensions and device type (desktop, mobile, print) using e.g. media queries. In every browser, though, I do think they should try to accomplish the same thing. Even if the exact algorithms used are different, the intended result should be agreed upon.

      • moralestapia 8 days ago

        >but that doesn't mean every website should look exactly the same on every device or in every browser

        That was the point. Maybe Gen Z changed its meaning now, but that was the main premise. There was even the Acid3 test and similar stuff.

    • fngjdflmdflg 8 days ago

      It seems that the original spec had a more explicit intention:

      >The pretty value is intended for body text, where the last line is expected to be a bit shorter than the average line.[0]

      which seems to mainly be about avoiding short last lines. That is from a note. The actual value "specifies the UA should bias for better layout over speed, and is expected to consider multiple lines, when making break decisions," which is more broad. But the intent is clearly specified in the note. This is also how chrome described the feature as mentioned in the article. But it does say that the effect would change in the future:

      >The feature does a little more than just ensure paragraphs don't end with a single word, it also adjusts hyphenation if consecutive hyphenated lines appear at the end of a paragraph or adjusts previous lines to make room. It will also appropriately adjust for text justification. text-wrap: pretty is for generally better line wrapping and text breaking, currently focused on orphans. In the future, text-wrap: pretty may offer more improvements.[1]

      The design doc linked in [1] says this about it:

      >The `text-wrap: pretty` is the property to minimize typographic orphans without such side effects.

      >There are other possible advantages for paragraph-level line breaking, such as minimizing rivers. The csswg/#672 describes such other possible advantages. But the initial implementation focuses on typographic orphans, as it’s the most visible benefit, and to minimize the performance impacts.

      >Because paragraph-level algorithms are slow, there are multiple variants to mitigate the performance impacts.[2]

      The new draft[3] changed it to the current definition. What's also interesting from that new draft is this new note:

      >The necessary computations may be expensive, especially when applied to large amounts of text. Authors are encouraged to assess the impact on performance when using text-wrap-style: pretty, and possibly use it selectively where it matters most.

      which seems to go against what was written in the webkit blog. If developers start using this value everywhere expecting that it will be fast then that effectively stops future implementations from using a slower but better algorithm (assuming one exists).

      [0] https://www.w3.org/TR/css-text-4/#propdef-text-wrap-style

      [1] https://developer.chrome.com/blog/css-text-wrap-pretty

      [2] https://docs.google.com/document/d/1jJFD8nAUuiUX6ArFZQqQo8yT...

      [3] https://drafts.csswg.org/css-text-4/#text-wrap-style

mac3n 8 days ago

> The demo has content in English

strange English.

> It's far text

> this text has short a lot of words all in a row

not relevant to the subject, unless you want to consider improving line breaks by rearranging words

robszumski 9 days ago

Really excited for text-wrap: balance. This will prevent a ton of breakpointing or manual line breaks for web headers.

  • qingcharles 5 days ago

    balance has been working for a couple of years on everything except Safari (about a year on that I think). I've been using it on my headlines for that long.

IshKebab 9 days ago

Ironically the letter height for the monospace text on this website is all over the place for me. I'm using Chrome on Windows so you'd think it would work fine... Seems to be an issue with SF Mono.

janalsncm 9 days ago

I’d love to learn more about the pretty algorithm and the optimizations that have been tried out so far.

It seems like a pretty straightforward machine learning regression problem: given a sequence of word widths, find a line length which, when applied to the sequence, satisfies the constraint of being “pretty”.

Using a model would allow computing the answer in constant time.

  • deredede 8 days ago

    Machine learning is not magic -- no algorithm, machine learning or otherwise, will be able to treat an arbitrary-length sequence in constant time.

    The actual problem is also more complex than fixed word widths due to hyphenation and justification - from what I recall, Knuth's paper (IIRC there's two and the second one is the one to read) on TeX's layout gives a good overview of the problem and an algorithm that's still state of the art. I think the Typst people might have a blog post about their modern take on it, but I'm not sure.
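    For a flavor of the paragraph-level idea, here's a toy sketch only: fixed-width characters, no stretch/shrink, no hyphenation, and a simple cubed-slack badness in place of TeX's demerits:

```python
def wrap(words, width):
    """Break `words` into lines of at most `width` characters, minimizing
    the sum of cubed trailing slack over all lines except the last."""
    n = len(words)
    INF = float("inf")
    # best[i] = (cost, next break) for optimally typesetting words[i:]
    best = [(INF, None)] * n + [(0, None)]
    for i in range(n - 1, -1, -1):
        line_len = -1  # length of words[i:j] joined by single spaces
        for j in range(i + 1, n + 1):
            line_len += len(words[j - 1]) + 1
            if line_len > width:
                break
            slack = 0 if j == n else (width - line_len) ** 3
            cost = slack + best[j][0]
            if cost < best[i][0]:
                best[i] = (cost, j)
    lines, i = [], 0
    while i < n:
        j = best[i][1]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines
```

    The point is that the cost of an early break depends on everything that follows it, which is exactly why a last-few-lines shortcut and a whole-paragraph pass can make different choices.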

  • accrual 9 days ago

    Could it be linear time? I thought larger inputs take longer for models to process. I suppose it depends on the length of each line and the number of lines.

  • ezfe 8 days ago

    It can't be linear time. At best it would be log(n) but that would require storing all the possible inputs in a lookup table.

OisinMoran 9 days ago

This is excellent, thank you! I hadn't heard of "balance" before either, so definitely going to experiment with that now too. Anything that can improve typography on the web, even a little bit is a big win in my opinion. I'm also stealing that 1lh tip they link to!

If you like this and are interested in something closer to TeX, why not the TeX algorithm itself!? There's this lovely package that works wonderfully on the web https://github.com/robertknight/tex-linebreak?tab=readme-ov-... And if you want to play around with a live example to check the performance, I use it on my site's about page: https://lynkmi.com/about

Been on a big LaTeX buzz recently and even added support for it in comments there too!

rambambram 9 days ago

After a quick glance this looks pretty useful for me. I'm the kind of guy who is willing to change words or sentences (up to a point) to make the overall text look prettier.

aktau 8 days ago

Does anyone know how this contrasts with `text-align: justify` (mentioned in https://news.ycombinator.com/item?id=43258709#43260606)?

In Chrome (can't test Safari), `text-wrap: pretty` has a much milder effect.

Should one use both together in the main text of your average blog? I checked, they do appear to make individual changes.

  • cantSpellSober 8 days ago

    `text-align: justify` solves a different problem, it justifies your text; both the left and right edges of each line are aligned with the margins.

    > Should one use both together in the main text of your average blog

    Optimize for legibility; the properties are compatible.
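    If you do want both, a minimal sketch (the selector is illustrative; hyphenation is optional, but justified text usually reads better with it):

```css
article p {
  text-align: justify; /* align both edges with the margins */
  hyphens: auto;       /* needs a lang attribute on the document to work well */
  text-wrap: pretty;   /* let the engine pick better break points */
}
```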

donbrae 8 days ago

This is great. I’ve already been using `text-wrap: balance` on headlines. Before, I was concatenating certain words with `&nbsp;` to try to avoid bad wrapping at certain viewport widths. (Doing so is still a useful trick in edge cases.) `text-wrap: pretty` should fix similar ugliness in body text.
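For reference, the headline part is just this (browsers may cap `balance` to a handful of lines, so it stays cheap on headings):

```css
h1, h2, h3 {
  text-wrap: balance; /* equalize line lengths instead of hand-placed &nbsp;s */
}
```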

DadBase 9 days ago

I’ve been doing this manually for years using non-breaking spaces and zero-width joins. Glad WebKit is finally catching up.

intelliot 9 days ago

Is there any concrete data showing whether this has any effect on the readability or comprehensibility of text content?

  • tobr 9 days ago

    Good question, but keep in mind that it's claiming `text-wrap: pretty`, not `text-wrap: comprehensible`. That should be enough!

    • eviks 8 days ago

      It's claiming both:

      > we want you to be able to use this CSS to make your text easier to read and softer on the eyes, to provide your users with better readability and accessibility.

pronik 8 days ago

> Support for text-wrap: pretty just shipped in Safari Technology Preview, bringing an unprecedented level of polish to typography on the web.

According to caniuse.com, Chrome has had support for this since September 2023. Maybe I'm dumb, but what's so "unprecedented" about this?

  • lloeki 8 days ago

    In TFA it is explained how Chromium only considers the last four lines and aims to solve only one aspect (the lone word), and none of rivers, bad rag, hyphenation, etc...

    Comparatively:

    > WebKit is not the first browser engine to implement, but we are the first browser to use it to evaluate and adjust the entire paragraph. And we are the first browser to use it to improve rag.

  • marcellus23 8 days ago

    You're not dumb but you didn't read the article.

  • dspillett 8 days ago

    This is how the Apple world works. Things go from unnecessary to revolutionary at the point Apple implements them, no matter how long other implementations might have already existed.

    • moogleii 8 days ago

      Dig deeper. Use a shovel instead of a spade.

      • dspillett 8 days ago

        Apple aren't the only ones to do this [1], of course, but it is definitely true.

        This enough digging for you?

        Go on. Vote down without offering a counter again. Take my fake points from me. You know it'll make you feel big. :)

        ----

        [1] In true Apple style, people did it before, they polished the act a bit, and took it as theirs!

david2ndaccount 9 days ago

This does actually look a lot better. Using the inspector to toggle it on and off, “pretty” does look a lot prettier.

executesorder66 9 days ago

I'm confused, this was created by Webkit, but is currently only available on Chromium based browsers according to : https://caniuse.com/?search=text-wrap%20pretty

How did that happen?

  • ameliaquining 9 days ago

    text-wrap: pretty tells the browser to wrap the text so as to make it look pretty. But the CSS standard doesn't specify what exactly that means; it's up to each individual browser to decide what algorithm yields the prettiest results.

    Chromium is the only browser engine whose stable channel currently supports text-wrap: pretty. In this post, WebKit is announcing not only that they've implemented it (though not yet in a stable channel), but that they've done so using an algorithm that's better than Chromium's. Their algorithm adjusts for various things that Chromium's currently does not.

  • velcrovan 9 days ago

    It wasn't created by WebKit. WebKit is announcing (at last) support for it.

    • alwillis 8 days ago

      The WebKit implementation is the only one that can handle many pages of text with no noticeable performance hit, while Chrome and Firefox are limited to only dealing with the last 4 or 6 lines of a paragraph.

    • executesorder66 9 days ago

      Now it makes much more sense.

      I realize I misread "One solution is text-wrap: pretty" as "Our solution is text-wrap: pretty", combined with the fact that this was on the WebKit blog.

      Thanks.

wruza 9 days ago

As we reached such thin matters, can we have now "ui: modular" and "data-changes: async autoupdate"? Cause using a tight userland framework for components and [a]syncing data to ui starts getting a little old.

prmph 8 days ago

Nice, but where is automatic ellipsis for multi-line text? Do the people spending time on these efforts really talk to devs to determine what they find most critically missing and/or annoying about HTML?

cantSpellSober 8 days ago

Ironically the paragraph about orphans ("avoid leaving a single word by itself") puts "large" on a line by itself (at the largest viewport).

Applying `text-wrap:pretty` solves that!

pradn 9 days ago

I see a good number of these articles, each with their own typographic features. Is there a "gold standard" set of recommendations for making the most beautiful type, on the web?

  • presbyterian 9 days ago

    I know a lot of people will point you to Practical Typography. I reference it a lot myself, and even though I don't agree with everything there, it's a really, really great reference and gets you thinking about the goals of your typography.

    https://practicaltypography.com/

eviks 8 days ago

Would be nice if the first "bad" example were actually shown corrected in the same blog post that describes how the errors get fixed

xnx 9 days ago

Does different line wrapping result in difference in comprehension or reading speeds?

Eddy_Viscosity2 9 days ago

Anybody know what font the article uses? I like it.

  • jasonjmcghee 9 days ago

    For next time, just right-click the text you're trying to identify the font for, hit "Inspect" and click "Computed" on the right in the styles sidebar. And you'll see `font-family`.

    You'll often see multiple listed, like `-apple-system, "SF Pro Text", Helvetica, sans-serif` in this case. It tries to use them from left to right.

    • ygra 8 days ago

      Browsers typically also show, below the styles, which fonts were actually used (there can be multiple, e.g. when substitution is required to render certain characters).

  • sim04ful 9 days ago

    It uses: SF Display, SF Mono, and SF Text, designed by Apple.

    In case you come across other website fonts you like, you can use https://fontofweb.com to get their names.

    Disclaimer: I'm the creator

  • jsheard 9 days ago

    The sans is Apple's own SF Pro, and the code is SF Mono.