ynfnehf 37 minutes ago

First place I read about this idea (specifically newlines, not in general trusting trust) was day 42 in https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...

"For example, my compiler interprets "\n" (a sequence of backslash and character "n") in a string literal as "\n" (a newline character in this case). If you think about this, you would find this a little bit weird, because it does not have information as to the actual ASCII character code for "\n". The information about the character code is not present in the source code but passed on from a compiler compiling the compiler. Newline characters of my compiler can be traced back to GCC which compiled mine."
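A minimal Python sketch of the situation the quote describes, assuming a hand-written lexer with a small escape table (`decode_escapes` is a hypothetical name, not code from the linked compiler). The table maps the character `n` to the host language's own `"\n"`, so the number 10 never appears in this source. It is inherited from whatever runs the code.

```python
def decode_escapes(literal: str) -> str:
    """Turn the characters between the quotes of a string literal into the
    string they denote, handling a few backslash escapes."""
    # The table never says "10": the value of "\n" on the right-hand side
    # comes from whatever compiled or interprets *this* code (here, CPython).
    escapes = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            if nxt not in escapes:
                raise ValueError(f"unknown escape: \\{nxt}")
            out.append(escapes[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The source text holds a backslash followed by "n" (two characters)...
source = r"hello\nworld"
decoded = decode_escapes(source)
print(len(source), len(decoded))  # 12 11
print(ord(decoded[5]))            # 10, inherited from the host
```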

nasso_dev 2 hours ago

> This post was inspired by another post about exactly the same thing. I couldn't find it when I looked for it, so I wrote this. All credit to the original author for noticing how interesting this rabbit hole is.

I think the author may be thinking of Ken Thompson's Turing Award lecture "Reflections on Trusting Trust".

  • Karellen 2 hours ago

    That presentation does point out, though, that the technique is more generally used in quines. Given the amount of research, papers, and commentary on quines, the author may have read something along those lines.

    https://en.wikipedia.org/wiki/Quine_(computing)
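For reference, here is the classic two-line Python quine, wrapped in a small harness (the `QUINE` and `run` names are illustrative) that executes it and checks that the output matches the source. It also shows the newline trick in miniature: `%r` turns the real newline inside `s` back into the two characters `\n`, and that knowledge lives in the interpreter's `repr`, not in the quine.

```python
import contextlib
import io

# The quine itself: two lines of Python that print their own source text.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run(source: str) -> str:
    """Execute `source` in a fresh namespace and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return buf.getvalue()


print(run(QUINE) == QUINE)  # True: the output is the program's own text
```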

happytoexplain 21 minutes ago

This is over my head. Why did we need to take a trip to discover why \n is encoded as a byte with the value 10? Isn't that expected? The author and HN comments don't say, so I feel stupid.

  • kibwen 16 minutes ago

    The point is to ask "who" encoded that byte as the value 10. If you're writing a parser and you parse the escape sequence `\n` as a newline, then where did the value 10 come from? If you instead map it to the integer literal `10`, then where does the actual binary value 1010 come from?

    The ultimate point of this exercise is to alter your perception of what a compiler is (in the same way as the famous Reflections On Trusting Trust presentation).

    Which is to say: your compiler is not something that outputs your program; your compiler is also input to your program. And as a program itself, your compiler's compiler was an input to your compiler, which makes it transitively an input to your program, and the same is true of your compiler's compiler's compiler, and your compiler's compiler's compiler's compiler, and your compiler's compiler's compiler's compiler's compiler, and...
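A sketch of the second half of that question, assuming a hand-rolled decimal parser (`parse_decimal` is hypothetical). Even turning the text `10` into the number ten relies on the host: the digit values come from `ord`, and the multiplier is itself a literal `10` written in the host language.

```python
def parse_decimal(digits: str) -> int:
    """Parse a non-negative decimal integer literal by hand."""
    value = 0
    for ch in digits:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a decimal digit: {ch!r}")
        # ord(ch) - ord("0") leans on the host's character codes (ASCII digits
        # are contiguous, 48..57), and "* 10" leans on the host's own literal
        # and integer arithmetic: the meaning of "10" is borrowed again.
        value = value * 10 + (ord(ch) - ord("0"))
    return value


n = parse_decimal("10")
print(n, format(n, "b"), bytes([n]) == b"\n")  # 10 1010 True
```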

tzot an hour ago

I always thought, maybe because of C, that \0??? is an octal escape; so in my mind \012 is \x0a or 0x0a, and \010 is 0x08.

So I find this quite confusing; maybe OCaml does not have octal escapes but decimal ones, and \09 is the Tab character. I haven't checked.
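A quick way to see both readings side by side. Python follows C here: a backslash followed by up to three digits is octal, so `\012` is a line feed and `\010` is a backspace. OCaml's manual instead documents `\ddd` as exactly three decimal digits (with a separate `\o` form for octal), so in OCaml `"\010"` is the newline and tab is `"\009"`. The `escape_value` helper below is hypothetical; it just reads the same digits in either base.

```python
def escape_value(digits: str, base: int) -> int:
    """Read the digits of a backslash escape in the given base."""
    return int(digits, base)


# Python string literals use C's octal rule.
print("\012" == "\x0a" == "\n")  # True: octal 12 is decimal 10, a line feed
print("\010" == "\x08" == "\b")  # True: octal 10 is decimal 8, a backspace

for digits in ("010", "012", "009"):
    try:
        octal = escape_value(digits, 8)
    except ValueError:
        octal = None  # "9" is not an octal digit
    decimal = escape_value(digits, 10)
    print(f"\\{digits}: octal {octal}, decimal {decimal}")
```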

atoav an hour ago

One rule of programming I figured out pretty quickly: if there are two ways of doing something and it's a 50/50 chance which one is correct, chances are you will get it wrong the first time.

  • chgs an hour ago

    The USB rule.

    First time is the wrong way up

    Second time is also the wrong way up

    Third time works

    • jancsika an hour ago

      It's like the Two Generals' Problem embedded in a single connector.

      You never really know it's right until you take it out and test the friction against the other orientation.

    • fader an hour ago

      It's because of the quantum properties of USB connectors. They have spin 1/2.

      • SAI_Peregrinus an hour ago

        I thought it was because USB connectors occupy 4 spatial dimensions.

        • PaulDavisThe1st 42 minutes ago

          That's good, because otherwise we'd never be able to find them when we need them.

    • dtgriscom 41 minutes ago

      I boosted my USB plugged-in-successfully-on-first-try rate when I imagined the offset block in the cable's male USB connector as being heavy, so it should sit below the centerline when plugged into a laptop's female USB connector. (Only works when the connector is horizontal, but better than nothing.)

    • dailykoder 22 minutes ago

      It's actually super easy and, at least for me, was always intuitive. Most USB cables have their logo or something else engraved on the "top", the side with the air gap. And since the ports are mostly arranged the same way, there is rarely any problem. Maybe I am just too dumb to understand jokes, but it always confused me :(

dist-epoch 2 hours ago

I remember a similar article about some C compiler, and it turned out the only place the value 10 (0x0A) appeared was in the compiler binary, because the source code had something like "\\n" -> "\n"

kijin 2 hours ago

The incorrect capitalization made me think that, perhaps, there's a scarcely known escape sequence \N that is different from \n. Maybe it matches any character that isn't a newline? Nope, just small caps in the original article.

  • cpach 2 hours ago

    If you view the page source, it’s actually \n, but it’s not displayed as such because of this CSS rule:

      .title {
        font-variant: small-caps;
      }
    • sedatk an hour ago

      So, the HN title is wrong.

      • isatty an hour ago

        The original title is.

        • deathanatos 23 minutes ago

          In addition to what others have said about small caps being a stylistic rendering, if you copy & paste the original title, you'll get

            Whence '\n'?
        • niederman an hour ago

          No, the original title is correct, small caps are just an alternate way of setting lowercase letters.

  • deathanatos 20 minutes ago

    Python has a \N escape sequence. It inserts a Unicode character by name. For example,

      '\N{PILE OF POO}'
    
    is the Unicode string containing a single USV (Unicode scalar value): the pile of poop emoji.

    Much more self-documenting than doing it with a hex sequence with \u or \U.
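A short Python check of three equivalent spellings of that character. It also shows that `\N{...}` accepts Unicode name aliases (supported since Python 3.3), which means even the character this thread is about can be written by name.

```python
import unicodedata

poo = "\N{PILE OF POO}"

# Same character written by name, by 8-digit \U escape, and by code point.
print(poo == "\U0001F4A9" == chr(0x1F4A9))  # True
print(len(poo), hex(ord(poo)))              # 1 0x1f4a9
print(unicodedata.name(poo))                # PILE OF POO

# \N{...} also accepts formal name aliases, so the newline has a name too.
print("\N{LINE FEED}" == "\n")              # True
```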

  • paulddraper 2 hours ago

    There is actually.

    Many systems use \N in CSVs or similar as NULL, to distinguish from an empty string.

    I figured this is what the article was about?
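A minimal sketch of reading such a file in Python, assuming tab-separated text where the two-character field `\N` means NULL (the convention used, for example, by PostgreSQL's `COPY` text format). `read_rows` is a hypothetical helper, and it ignores the other backslash escapes that real dump formats also use.

```python
from __future__ import annotations

import csv
import io

NULL_MARKER = r"\N"  # two characters: a backslash and a capital N


def read_rows(text: str, delimiter: str = "\t") -> list[list[str | None]]:
    """Read delimited text, mapping the \\N marker to None and keeping '' as ''."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[None if field == NULL_MARKER else field for field in row]
            for row in reader]


data = "1\talice\t\n2\t\\N\tbob\n"
print(read_rows(data))  # [['1', 'alice', ''], ['2', None, 'bob']]
```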

archmaster 2 hours ago

if only this went into where the ocaml escape came from :)

  • diath 2 hours ago
    • fiddlerwoaroof an hour ago

      But this doesn’t really explain anything: ‘\010’ isn’t really any more primitive than ‘\x0a’: they’re just different representations of the same bit sequence

      • fluoridation 32 minutes ago

        But it is more primitive than '\n', and can be rendered into binary without any further arbitrary conversion steps (arbitrary in that there's nothing in '\n' that says it should mean 10). It's just "transform the number after the backslash into the byte with that value".
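A sketch of exactly that rule, assuming OCaml-style decimal escapes (a backslash followed by three digits) and ASCII input; `decode_decimal_escapes` is a hypothetical helper and does not handle escaped backslashes. There is no lookup table: the digits alone determine the byte, whereas `\n` only means 10 because some table says so.

```python
import re

# OCaml-style decimal escape: a backslash followed by exactly three digits.
DECIMAL_ESCAPE = re.compile(r"\\([0-9]{3})")


def decode_decimal_escapes(literal: str) -> bytes:
    """Replace every \\ddd in `literal` with the byte whose value is ddd."""

    def to_byte(match):
        value = int(match.group(1))  # the digits are the value; no table
        if value > 255:
            raise ValueError(f"escape out of range: \\{match.group(1)}")
        return chr(value)

    # latin-1 maps code points 0..255 one-to-one onto bytes 0..255.
    return DECIMAL_ESCAPE.sub(to_byte, literal).encode("latin-1")


print(decode_decimal_escapes(r"a\010b"))  # b'a\nb'
print(decode_decimal_escapes(r"\009"))    # b'\t'
```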

gjvc an hour ago

this is a nothingburger of an article

coolio1232 an hour ago

I thought this was going to be about '\N' but there's only '\n' here.

  • dang an hour ago

    It's in the html doc title but the article doesn't deliver.