Counterexample: I've been able to complete more side projects in the last month leveraging LLMs than I ever have in my life. One of them I believe has potential as a viable product; another involved a complicated Rust `no_std` and linker setup for compiling Rust code onto bare-metal RISC-V from scratch.
I think the key to being successful here is to realize that you're still at the wheel as an engineer. The LLM is there to rapidly synthesize the universe of information.
You still need to 1) have solid fundamentals in order to have an intuition against that synthesis, and 2) be experienced enough to translate that synthesis into actionable outcomes.
If you're lacking in either, you're at the mercy of the same copypasta whims that have always existed.
I’ve found LLMs to basically be a more fluent but also more lossy way of interfacing with stack overflow and tutorials.
If a topic is well represented in those places, then you will get your answer quicker and it can be to some extent shaped to your use case.
If the topic is not well represented there, then you will get circular nonsense.
You can say “obviously, that’s the training data”, and that’s true, and I do find it obvious personally, but the reaction to LLMs as some kind of second coming does not align with this reality.
Is it possible that you're both using LLMs the same way you'd use SO and that's the reason you see such similarities? The reason I ask is that it doesn't match my experience. It feels more like I'm able to Matrix-upload docs into my brain like Trinity learning to fly a helicopter.
I am using it like stack overflow in the sense that I’m solving a problem and I’m using it to answer questions when I’m in an unfamiliar or non-obvious place in the problem space.
If I have a question about a first order language or framework feature or pattern, it works great. If I have a question about a second order problem, like an interaction between language or framework features, or a logical inconsistency in feature behavior, then it usually has no idea what’s going on, unless it turns out to be a really common problem such as something that would come up when working through a tutorial.
For code completion, I’ve just turned it off. It saves time on boilerplate typing for sure, but the actual content pieces are so consistently wrong that on balance I find it distracting.
Maybe I have a weird programming style that doesn’t mesh well with the broader code training corpus, not sure. Or maybe a lot of people spend more time in the part of problem-space that intersects with tutorial-space? I am not very junior these days.
That being said I definitely do use LLMs to engage with tutorial type content. For that it is useful. And outside of software it is quite a bit better for interfacing with Wikipedia type content. Except for the part where it lies to your face. But it will get better! Extrapolating never hurt anyone.
Same. I used it to bootstrap writing a React Native app from pretty low familiarity with React.
It's pretty good at writing screens in broad strokes. You will have to fill in some details.
The exact details of correctly threading data through, or prop drilling vs. alternatives, or the rules around wrapping screens to use them in React Navigation? It's terrible at them.
> I’m able to Matrix-upload docs into my brain like Trinity learning to fly a helicopter.
So you're using it like Wikipedia? I find when learning something new (and non coding related) YouTube is infinitely better than an LLM. But then I prefer visual demonstration to tutorial or verbal explanation.
> It feels more like I’m able to Matrix-upload docs into my brain like Trinity learning to fly a helicopter
Did you puzzle over this sentence specifically? Imagine you don't know jack about flying helicopters; then Tank uploads the Helicopter Pilot Program (TM) directly to your brain, and it feels like magic.
Conversely, if you know a lot about helicopters, just not enough to fly a B-212, and the program includes instructions like "Press (Y) and Left Stick to hover", you'd know it's conflating real-world piloting with videogames.
It's the same with LLMs: you need to know a lot about the field to ask the right questions and to recognize/correct slop or confabulation; otherwise they seem much more powerful and smart than they really are.
I find that LLMs are almost comically bad at projects that have a hardware component, like Raspberry Pi, Pico, or Arduino.
I think it's because the libraries you use are often niche or exist in a few similar versions; the LLM really commonly hallucinated solutions and would continually insist that library X did have that capability. In hardware projects you often hit a point where you can't do something or need to modify a library, but the LLM tries to be "helpful" and makes up a solution instead.
Based on my own modestly successful forays into that world, I have to imagine one problem the LLMs have in that space is terrible training data. A good three quarters of the search results in that space will be straight-up out of date and not work on your system. Then you've got all the tiny variations between dozens of chipsets, all the confidently wrong people on the internet telling you to do nonsensical things, and entire ecosystems that basically poofed out of existence three years ago but are still full of all kinds of juicy search terms and Google juice... if I can hardly paw through this stuff with years of experience and the physical hardware right in front of me to verify claims with, I don't know how an LLM is supposed to paw through all that and produce a valid answer to any question in that space, without its own hardware to fiddle with directly.
I've lost count of the projects where my notes were the difference between hours of relearning The Way and instant success. Google doesn't work because some niche issue is blocking the path.
ESP32, Arduino, Home Assistant
And various media server things.
They're also pretty bad at TypeScript generics. They're quite good at explaining concepts (like mapped types), but when push comes to shove they generate all sorts of stuff that looks convincing but doesn't pass the type checker.
And then you’ll paste in the error, and they’ll just say “ok I see the problem” and output the exact same broken code lol.
I'm guessing the problem is lack of training data. Most TS codebases are just JS with a few types and Zod schemas. All of the neat generic stuff happens in libraries or a few utilities.
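To make "mapped types" concrete, here's roughly the flavour of code I mean (my own illustrative example, not something an LLM produced):

    // A mapped type: every property of T becomes nullable, plus a companion
    // "fieldErrors" object keyed by the same property names.
    type FormState<T> = {
      [K in keyof T]: T[K] | null;
    } & {
      fieldErrors: { [K in keyof T]?: string };
    };

    interface SignupForm {
      email: string;
      age: number;
    }

    // FormState<SignupForm> expands to:
    // { email: string | null; age: number | null;
    //   fieldErrors: { email?: string; age?: string } }
    const state: FormState<SignupForm> = {
      email: null,
      age: 42,
      fieldErrors: { email: "required" },
    };

Concepts like this they explain fine; it's once you nest a few of them with generics and conditional types that the generated code stops type checking.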
Actually, it's because many of the people writing tutorials and sharing answers about that stuff don't know what the hell they're doing or grasp the fundamentals of how those systems work, so most of the source material the LLMs are trained on is absolute garbage.
Public Arduino, RPi, Pico communities are basically peak cargo cult, with the blind leading the blind through things they don't understand. The noise is vastly louder than the signal.
There's basically a giant chasm between experienced or professional embedded developers, who mostly have no need to ever touch those things or visit their forums, and the confused hobbyists on those forums randomly slapping together code until something sorta works while trying to share their discoveries.
Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
If you're genuinely interested in embedded development and IoT stuff, and are willing to put in the time to learn, put those platforms away and challenge yourself to at least learn how to work directly with production-track SoCs from Nordic or ESP or whatever. And buy some books or take some courses instead of relying on forums or LLMs. You'll find yourself rewarded for the effort.
> Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
It won't, because the RPis are all undocumented, closed-source toys.
It would be an interesting experiment to see which chips an LLM is better at helping out with: an RPi with its hallucinatory ecosystem, or something like the BeagleY-AI, which has thousands of pages of actual TI documentation for its chips.
It would be really nice if LLMs could cover for this and work around the fact that RPis keep getting used because they were dumped below cost to bootstrap a network effect.
> Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
I'm not sure they will. There's a kind of evaporative cooling effect where once you get to a certain level of understanding you switch around your tools enough that there's not much point interacting with the community anymore.
I was just today trying to fix some errors in an old Linux kernel version 3.x.x .dts file for some old hardware, so that I could get a modern kernel to use it. ChatGPT seemed very helpful at first - and I was super impressed. I thought it was giving me great insight into why the old files were now producing errors … except the changes it proposed never actually fixed anything.
Eventually I read some actual documentation and realised it was just spouting very plausible sounding nonsense - and confident at it!
The same thing happened a year or so ago when I tried to get a much older ChatGPT to help me with USB protocol problems in some microcontroller code. It just hallucinated APIs and protocol features that didn't actually exist. I really expected more by now, but I now suspect it'll just never be good at niche tasks (and these two things are not particularly niche compared to some).
For the best of both worlds, make the LLM first 'read' the documentation, and then ask for help. It makes a huge difference in the quality and relevance of the answers you get.
Claude is like having my own college professor. I've learned more in the past month with Claude than I learned in the past year. I can ask questions repeatedly and get clarification as fine-grained as I need it. Granted, Claude has limits, but it's a game-changer.
> I think the key to being successful here is to realize that you're still at the wheel as an engineer. The llm is there to rapidly synthesize the universe of information.
Bingo. OP is like someone who complains about the tools when they should be working on their talent. I have a LOT of hobbies (circuits, woodworking, surfing, playing live music, cycling, photography) and there will always be people who buy the best gear and complain that the gear sucks. (NOTE: I'm not implying Claude is "the best gear", but it's a big big help.)
I think the only problem with LLMs is that synthesis of new knowledge is severely limited. They are great at explaining things others have explained, but suck hard at inventing new things. At least that's my experience with Claude: it's terrible as a "greenfield" dev.
I don't use Claude, so maybe there's a huge gap in reliability between it and ChatGPT 4o. But with that disclaimer out of the way, I'm always fairly confused when people report experiences like these—IME, LLMs fall over miserably at even very simple pure math questions. Grammatical breakdowns of sentences (for a major language like Japanese) are also very hit-or-miss. I could see an LLM taking the place of, like, an undergrad TA, but even then only for very well-trod material in its training data.
(Or maybe I've just had better experiences with professors, making my standard for this comparison abnormally high :-P )
EDIT: Also, I figure this sort of thing must be highly dependent on which field you're trying to learn. But that decreases the utility of LLMs a lot for me, because it means I have to have enough existing experience in whatever I'm trying to learn about so that I can first probe whether I'm in safe territory or not.
"Major" in the context of Japanese is rough; I can see a significant drop in quality when interacting with the same model in, say, Spanish vs. English.
For as rich a culture as the Japanese have, there are only about 1XX million speakers, and the size of the text corpus really matters here. The couple billion English speakers are also highly motivated to choose English over anything else, because the lingua franca has home-field advantage.
To use LLMs effectively you have to work with knowledge of their weaknesses. Math is a good example: you'll get better results from Wolfram Alpha even for the simple things, which is expected.
Broad reasoning and explanations tend to be better than overly specific topics; the more common a language, the better the response.
If a topic has a billion tutorials online, an LLM has a really high chance of figuring it out on the first try.
Be smart with the context you provide: the more you actively constrain an LLM, the more likely it is to work with you.
I have friends who just use it to feed in class notes, generate questions, and probe it for blind spots until they're satisfied. The improvements in their grades make it seem like a good approach, but they know that just feeding responses to the LLM isn't trustworthy, so they do that and then also check by themselves; the extra time is valuable by itself, if just to improve familiarity with the subject.
> LLMs fall over miserably at even very simple pure math questions
They are language models, not calculators or logic languages like Prolog or proof languages like Coq. If you go in with that understanding, it makes a lot more sense as to their capabilities. I would understand the parent poster to mean that they are able to ask and rapidly synthesize information from what the LLM tells them, as a first start rather than necessarily being 100% correct on everything.
I think a lot of the people who object to AI probably see the gross amounts of energy it is using, or the trillions of dollars going to fewer than half a dozen men (mostly American, mostly white).
But, once you've had AI help you solve some gnarly problems, it is hard not to be amazed.
And this is coming from a gal who thinks the idea of self-driving cars is the biggest waste of resources ever.
(EDIT: Upon rereading this, it feels unintentionally blunt. I'm not trying to argue, and I apologize if my tone is somewhat unfriendly—that's purely a reflection of the fact that I'm a bad writer!)
Sorry, maybe I should've been clearer in my response—I specifically disagree with the "college professor" comparison. That is to say, in the areas I've tried using them for, LLMs can't even help me solve simple problems, let alone gnarly ones. Which is why hearing about experiences like yours leaves me genuinely confused.
I do get your point about people disagreeing with modern AI for "political" reasons, but I think it's inaccurate to lump everyone into that bucket. I, for one, am not trying to make any broader political statements or anything—I just genuinely can't see how LLMs are as practically useful as other people claim, outside of specific use cases.
Simple: one thing I'm learning about is RFCs for TCP/IP. I can literally go test it. It's like saying, "How do you know it is right when it says 2+2=4"? Some knowledge when taught is self-correcting. Other things I'm studying, like, say, tensor calculus, I can immediately use and know I learned it correctly.
TCP/IP is a great example though of something you can get seemingly correct and then be subject to all kinds of failure modes in edge cases you didn’t handle correctly (fragmentation, silly windows, options changing header sizes, etc).
> My college professor has certifications and has passed tests that weren't in their training data.
Granted, they are not (can't be) as rigorous as the tests your professor took, but new models are run through test suites before being released, too.
That being said, I saw my college professors making up things, too (mind you, they were all graduated from very good schools). One example I remember was our argument with a professor who argued that there is a theoretical limit for the coefficient of friction, and it is 1. That can potentially be categorised as a hallucination as it was completely made up and didn't make sense. Maybe it was in his training data (i.e. his own professors).
I agree with the "I don't know" part, though. This is something that LLMs are notoriously bad at.
It is immediately wrong in Step 1. A newborn is not a 2:1 ratio of height:width. Certainly not 25cm width (what does that even mean? Shoulder to shoulder?).
This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.
Indeed you are showing that something not in the training data leads to failure.
Babies also aren't rectangles... you could lay a row shoulder to shoulder, then do another row upside down from the first, and their heads would fit between the heads of the first row, saving space.
Edit: it also doesn't account for the fact the moon is more or less a sphere, and not a flat plane.
Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...
Edit: couldn't resist, and dammit!!
Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.
But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?
Did you actually test the math done? Usually LLMs are terrible at math because, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes when LLMs leverage other apps like calculators to get their results; I am not sure if Claude does that already or if it's still in development.
You can also test your professor's answers. I don't just walk around going "Oh, Claude was right", I'm literally using what I just learned and am generating correct results. I'm not learning facts like dates, or subject things, I'm learning laws, equations, theories, proofs, etc. (Like how to apply Euler's totient or his extended theories on factorization... there's only one "right answer").
Also, your method for assessing your professor's accuracy is inherently flawed. That little piece of paper on their wall doesn't correlate with how accurate they are; it doesn't mean nothing, but it isn't foolproof. Hate to break it to you, but even heroes are fallible.
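To make the "only one right answer" point concrete with a worked example of my own (not the commenter's):

    \varphi(10) = 10\left(1 - \tfrac{1}{2}\right)\left(1 - \tfrac{1}{5}\right) = 4,
    \qquad 3^{\varphi(10)} = 3^{4} = 81 \equiv 1 \pmod{10}

Either the congruence holds when you compute it or it doesn't; a hallucinated explanation gets caught the moment you try to apply it.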
I find LLMs to be decent unblockers. I only turn to them from time to time though, unless I'm in a playful mode and poking at various ideas. As a coder I also ask for snippets when I'm lazy. I tried growing a slightly larger solution a few times and it failed in dumb ways. It was clear it doesn't really comprehend the way we do; it's not aware it's moving in circles, and so on. All these things will probably see a lot of incremental improvements, and as a tool it will definitely stay, but fundamentally LLMs can't really think, at least not the way we do, and expecting that is also foolish.
> which involved a complicated Rust `no_std` and linker setup for compiling Rust code onto bare-metal RISC-V from scratch.
That's complicated, but I wouldn't say the resulting software is complex. You gave an LLM a repetitive, translation-based job, and you got good results back. I can also believe that an LLM could write up a dopey SaaS in half the time it would take a human to do the same.
But having the right parameters only takes you so far. Once you click generate, you are trusting that the model has some familiarity with your problem and can guide you without needing assistance. Most people I've seen rely entirely on linting and runtime errors to debug AI code, not the "solid fundamentals" that can fact-check a problem they needed ChatGPT to solve in the first place. And the "experience" required to iterate and deploy AI-generated code basically boils down to your copy-and-paste skills. I like my UNIX knowledge, but it's not a big enough gate to keep out ChatGPT Andy and his cohort of enthusiastic morons.
We're going to see thousands of AI-assisted success stories come out of this. But we already had those "pennies on the dollar" success stories from hiring underpaid workers out of India and Pakistan. AI will not solve the unsolved problems of our industry and in many ways it will exacerbate the preexisting issues.
If the summary goal of your existence is to be the most delirious waste of resources that humanity has yet known, sure. It's the hammer and nail of spoiled burnouts everywhere that need a credible ruse to help them out of the bottle.
Some of us are capable of wanting for things better than a coin-operated REST API. The kind of imagination used to put people on the moon, that now helps today's business leaders imagine more profitable ways to sell anime pornography on iPhone. (Don't worry, AI will disrupt that industry too.)
Now, you are paying a Taiwanese or American company to produce GPUs for you. This allows you to use open-source models like DeepSeek R1, significantly reducing your reliance on Indian tech labor
Yep. I think a default state of skepticism is an absolute necessity when working with these tools.
I love LLMs. I agree with OP about them expanding my hobby capacity as well. But I am constantly saying (in effect) "you sure…?" and tend to have a pretty good bs meter.
I’m still working to get my partner to that stage. They’re a little too happy to accept an answer without pushback or skepticism.
I think being ‘eager to accept an answer’ is the default mode of most people anyway. These tools are likely enabling faster disinformation consumption for the unaware.
Yes, you essentially have an impossibly well-read junior engineer you can task with quick research questions like, "I'm trying to do x using lib y, can you figure that out for me." This is incredibly productive because the answer typically contains all the pieces you need, just not always assembled right.
Getting the LLM to pull out the well-known names of concepts is, for me, the skill you can't get anywhere else. You can describe a way to complete a task, ask what it's called, and you'll be heading down arxiv links right away. Like, yes, the algorithm for finding the needle string closest in edit distance and length in a haystack is called Needleman–Wunsch; of course, Claude, everyone knows that.
Junior devs can get plenty of value out of them too, if they have discipline in how they use them - as a learning tool, not as a replacement for thinking about projects.
Senior devs can get SO much more power from these things, because they can lean on many years of experience to help them evaluate if the tool is producing useful results - and to help them prompt it in the most effective way possible.
A junior engineer might not have the conceptual knowledge or vocabulary to say things like "write tests for this using pytest, include a fixture that starts the development server once before running all of the tests against it".
IMO experience provides better immunity to common hangups. Generated code tends to be directionally pretty good, but with lots of minor/esoteric failures. The experience to spot those fast and tidy them up makes all the difference. Copilot helps me move 10x faster with tedious Arduino stuff, but I can easily see that if I didn't have decent intuition around debugging and troubleshooting, there'd be almost zero traction, since it'd be hard to clear that last 10% hurdle needed to even run the thing.
I wouldn't assume that at all. Most of the senior devs I talk to on a regular basis think commercial* LLMs are ridiculous and the AI hype is nonsensical.
* I put commercial there as a qualifier because there's some thought that in the future, very specifically-trained smaller models (open source) on particular technologies and corpuses (opt-in) might yield useful results without many of the ethical minefields we are currently dealing with.
It depends. I think it's less about how senior they are and more about how good they are at writing requirements, and at knowing which directives should be explicitly stated and which can be safely inferred.
Basically if they are good at utilizing junior developers and interns or apprentices they probably will do well with an LLM assistant.
Ya. I think people that have better technical vocabulary and an understanding of what should be possible with code do better. That’s usually a senior engineer, but not always
It's the LLM paradox: seniors get more senior with them while juniors get more junior, creating a bimodal distribution in the future, simply because juniors will start depending on them too much to learn how to code properly, while seniors (some of whom may also exhibit the previous trait) will by and large be able to rapidly synthesize information from LLMs with their own understanding.
I had a couple of the most capable senior developers reach out to me to tell me how Github Copilot accelerated their productivity, which surprised me initially. So I think there's something to it.
Exactly this. OP, credit where credit is due, appears to be someone who "hacks things together", copy-pasting solutions blindly from the internet, with little intuition gained along the way.
I agree with his point about asking AI to "fix" problems, though. It's really nice that you don't have to fully understand something to use it, but that becomes a problem if you lean on it too much.
IME, engineers who find LLMs useful have misunderstood their reason for existence outside of being salaried...
What is your main project? Do you LLM that?
I wager you're not a Rust expert and should maybe reconsider using Rust in your main project.
FWIW, asking an LLM whether you should use Rust is like asking it about the meaning of life. Important questions that need answers, but not right away! (A week or two, tops.)
If you need an LLM to synthesize the universe of information... that is not the universe you want to live or play in.
Indeed, LLMs are useful as an intern; they are at the "cocky grad" stage of their careers. If you don't understand the problem, can't steer the solution, and, worse, have only a limited understanding of the code they produce, you are unlikely to be productive.
On the other hand if you understand what needs to be done, and how to direct the work the productivity boost can be massive.
Claude 3.5 Sonnet and o1 are awesome at code generation, even with relatively complex tasks, and they have long enough context and attention windows that the code they produce, even on relatively large projects, can be consistent.
I also found a useful method of using LLMs to "summarize" code in an instructive manner which can be used for future prompts. For example, summarizing a large base class that may be reused in multiple other classes can be more effective than having to overload a large part of your context window with a bunch of code.
In my experience LLMs will help you with things that have been solved thousands of times before and are just a matter of finding some easily researched solution.
The very moment you try to go off the beaten path and do something unconventional, or stuff that most people won't have written a lot about, it gets more tricky. Just consider how many people will know how to configure some middleware in a Node.js project... vs. most things related to hardware or low-level work. Or even working with complex legacy codebases that have bits of code with obscure ways of interacting and more levels of abstraction than can reasonably be put in context.
Then again, if an LLM gets confused, then a person might as well. So, personally I try to write code that'd be understandable by juniors and LLMs alike.
In my experience, an LLM decided to not know the type alignment rules in C and confidently trotted out the wrong answer. It left a horrible taste in my mouth for the one time I decided to look at using an LLM for anything, and it keeps leaving me wondering if I'd end up spending more time bashing the LLM into working than just working out the answer myself and learning the underlying reasoning.
It was so wrong that I wonder what version of the C standard it was even hallucinating.
> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.
> Surprisingly, 99% of the code in this PR is written by DeepSeek-R1. The only thing I do is to develop tests and write prompts (with some trials and errors)
A single PR doesn't really "prove" anything. Optimization passes on well-tested narrowly scoped code are something that LLMs are already pretty good at.
Nah, in my experience, if there is the slightest error in the first sentence of the chain of thought, it tends to get worse and worse. I've had prompts that would generate a reasonable response in llama, but turn out utter garbage in Deepthink.
But how is this any different from real humans? They are not always right either. Sure, humans can understand things better, but are we really going to act like LLMs can't get better in the next year? And what about the next 6 months? I bet there are unknown startups like Deepseek that can push the frontier further.
The ways in which humans err are very different. You have a sense of your own knowledge on a topic and if you start to stray from what you know you're aware of it. Sure, you can lie about it but you have inherent confidence levels in what you're doing.
Sure, LLMs can improve but they're ultimately still bound by the constraints of the type of data they're trained on and don't actually build world models through a combination of high bandwidth exploratory training (like humans) and repeated causal inference.
at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM. (not even implying bad faith, but if you're constantly re-reading, selecting and re-combining snippets written by LLMs, it's not really "written" by LLMs in the same way that's implied).
We kinda went through this with images when Photoshop and similar tools appeared. I remember a lot of people asking questions in the late 90s/early 00s in particular about if an image were “real” or not and the distinctions between smart photography and digital compositions. Nowadays we just assume everyone is using such tools as a baseline and genuinely clever photography is now celebrated as an exception. Perhaps ditto with CGI and prop/set making in movies. Unless a director crows about how genuine the effects are, we assume CGI.
> at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM.
That's an interesting thought. I think there are ways to automate this, and some IDEs / tools track this already. I've seen posts by both Google and Amz providing percentages of "accepted" completions in their codebases, and that's probably something they track across codebases automatically.
But yeah I agree that "written by" doesn't necessarily imply "autonomously", and for the moment it's likely heavily curated by a human. And that's still ok, IMO.
I use CoPilot pretty much as a smarter autocomplete that can sometimes guess what I'm planning to type next. I find it's not so good at answering prompts, but if I type:
r = (rgba >> 24) & 0xff;
...and then pause, it's pretty good at guessing:
g = (rgba >> 16) & 0xff;
b = (rgba >> 8) & 0xff;
a = rgba & 0xff;
... for the next few lines. I don't really ask it to do more heavy lifting than that sort of thing. Certainly nothing like "Write this full app for me with these requirements [...]"
LLMs are surprisingly good at Haskell (and I'm not).
I hope for a renaissance of somewhat more rigorous programming languages: you can typecheck the LLM suggestions to see if they're any good. Also, you can feed the type errors back to the LLM.
I've only started using LLMs for code recently, and I already tend to mentally translate what I want to something that I imagine is 'more commonly done and well represented in the training data'.
But especially the ability to just see some of the stuff it produces, and now to see its thought process, is incredibly useful to me already. I do have autism and possibly ADD though.
I have fought the "lowest cognitive load" code-style fight forever at my current gig, and I just keep losing to the "watch this!" fancytowne code that mids love to merge in. They DO outnumber me, so... fair deuce I suppose.
There is value in code being readable by Juniors and LLMs -- hell, this Senior doesn't want to spend time figuring out your decorators, needless abstractions and syntax masturbation. I just want to fix a bug and get on with my day.
While I think this comment got flagged (probably for the way it was worded), you aren't wrong! A good way I've heard a similar thought expressed is that code should be not only easy to maintain, but also throw away and replace, which more or less urges you towards writing the easiest things you can get away with (given your particular performance requirements and other constraints).
> AI isn’t a co-pilot; it’s a junior dev faking competence. Trust it at your own risk.
This is a good take that tracks with my (heavy) usage of LLMs for coding. Leveraging productive-but-often-misguided junior devs is a skill every dev should actively cultivate!
> Leveraging productive-but-often-misguided junior devs is a skill every dev should actively cultivate!
Feels like this is only worthwhile because the junior dev learns from the experience; an investment that yields benefits all around, in the broad sense. Nobody wants a junior around that refuses to learn in perpetuity, serving only as a drag on productivity and eventually your sanity.
That's somewhere that the AI-as-junior-dev analogy breaks down a little.
There's still incredible accumulated value here, but it's at the other end. The more times you successfully use an LLM to produce working code, the more you learn about how to use them - what they're good at, what they're bad at, how to effectively prompt them.
I don't think GP was talking about themselves being the junior using LLMs; at least, my interpretation was that devs should learn how to leverage misguided juniors, and LLMs are more or less on the level of a misguided junior.
Which I completely agree with. I use LLMs for the cases where I do know what I'm trying to do, I just can't remember some exact detail that would otherwise require reading documentation. It's much quicker to leverage an LLM than to go on a wild goose chase for the piece of information I know exists.
It's also a pretty good tool for scaffolding the boring stuff: asking an LLM to "generate test code for X asserting A, B, and C" and editing it into a proper test frees up mental space for more important stuff.
I wouldn't trust an LLM to generate any kind of business-logic-heavy code; instead I use it as a quite smart template/scaffold generator.
I know the details, I've been through the wading and thrashing around the docs and the books; I just can't recall the right incantation at that moment, and an LLM is more efficient than searching the web.
I still have the skills to search the web if the magic piano disappears.
I don't know why you are trying to come up with a situation that doesn't exist; what's your point, exactly, against this quite narrow use-case?
It is quite remarkable that we are already at the stage where saying "this AI is about as competent as an inexperienced college graduate" constitutes criticism. It is entirely proper for people to be engaging sceptically with LLMs at their current level of capability, but I think we should also keep in mind the astonishingly rapid growth rate in their performance. LLMs were a toy two years ago, they're now a useful if flawed colleague, but what can we expect two years from now?
I mean, 2 years ago they were at about the same place; there's been very little practical gain since GPT-4, in my opinion. No matter the model, the fundamental failure cases have remained the same.
I disagree, context size alone has exploded from 8k to 200k now and that makes a huge difference. LLMs have also progressed significantly in many other metrics, code quality, understanding, etc. The recent reasoning models have upped the ante further, especially when combined with models that are good at editing code.
Minor correction: GPT-4 was announced on March 14, 2023, less than two years ago. I don’t remember how much LLMs had been discussed as coding assistants before then, but it was Greg Brockman’s demonstration of using it to write code that first brought that capability to my attention:
I used Claude to help me build a side project in 4 hours that I would never have built otherwise. Essentially, it's a morphing wavetable oscillator in React (https://waves.tashian.com).
Six months ago, I tried building this app with ChatGPT and got nowhere fast.
Building it with Claude required gluing together a few things that I didn't know much about: JavaScript audio processing, drawing on a JavaScript canvas, and an algorithm for bilinear interpolation.
I don't write JavaScript often. But I know how to program and I understand what I'm looking at. The project came together easily and the creative momentum of it felt great to me. The most amazing moment was when I reported a bug—I told Claude that the audio was stuttering whenever I moved the controls—and it figured out that we needed to use an AudioWorklet thread instead of trying to play the audio directly from the React component. I had never even heard of AudioWorklet. Claude refactored my code to use the AudioWorklet, and the stutter disappeared.
I wouldn't have built this without Claude, because I didn't need it to exist that badly. Claude reduced the creative inertia just enough for me to get it done.
I have used LLMs as a tool, and I start to "give up" working with them after a few tries. They excel at simple tasks, boilerplate, or scripts, but for larger programs you really have to know exactly what you want to do.
I do see the LLMs ingesting more and more documentation and content, and they are improving at giving me right answers. Almost two years ago I don't believe they had every Python package indexed; now they appear to have at least the documentation or source code for them.
The trouble is the only reliable use-case LLMs actually seem good at is "augmented search engine". Any attempts at coding with them just end up feeling like trying to code via a worse interface.
So it's handy to get a quick list of "all packages which do X", but it's worse than useless to have it speculate as to which one to use or why, because of the hallucination problem.
Yes, it does work as an augmented search engine, but it also outputs working code; you just have to prompt it better. If the code is not that complex, like a simple endpoint, you just have to understand exactly what you want.
I've had a similar experience: shipping new features at incredible speed, then wasting a ton of time going down the wrong track trying to debug something because the LLM gave me a confidently wrong solution.
I think what the parent's post describes has happened to everybody, and if it hasn't, it will.
The edge between being actually more productive or just “pretend productive” using large language models is something that we all haven’t completely figured out yet.
often it's something you casually overlook, some minor implementation detail that you didn't give much thought to that ends up being a huge mess later on, IME
Seems like LLMs would be well suited for test-driven development. A human writes tests and the LLM can generate code passing all tests, ending with a solution that meets the human's expectations.
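A minimal sketch of that workflow (my own illustration, not from the article): the human pins the behaviour down in tests first, then asks the LLM to write or rewrite the implementation until they pass. The `slugify` function here is a made-up example; a hand-written implementation is included just so the snippet stands on its own.

    import { test } from "node:test";
    import assert from "node:assert/strict";

    // The part you would hand to the LLM to (re)implement.
    function slugify(input: string): string {
      return input
        .toLowerCase()
        .normalize("NFD")                 // split accented letters from their accents
        .replace(/[\u0300-\u036f]/g, "")  // drop the combining accents
        .replace(/[^a-z0-9]+/g, "-")      // collapse non-alphanumerics into hyphens
        .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
    }

    // The human-owned part: the spec the generated code has to satisfy.
    test("lowercases and hyphenates", () => {
      assert.equal(slugify("Hello World"), "hello-world");
    });

    test("strips characters that aren't URL-safe", () => {
      assert.equal(slugify("C'est l'été!"), "c-est-l-ete");
    });

    test("collapses repeated separators", () => {
      assert.equal(slugify("a  --  b"), "a-b");
    });

The catch, as the reply below notes, is that "makes the tests pass" says nothing about whether the generated code is well-factored or maintainable.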
This is more or less how I use LLMs right now. They’re fantastic at the plumbing, so that I can focus on the important part - the business and domain logic.
I disagree because you're only considering the "get code to make the test pass". Refactoring, refining, and simplifying is critical and I've yet to see this applied well. (I've also yet to see the former applied usably well either despite "write tests generate code" being an early direction.)
One strategy I've been experimenting with is maintaining a 'spec' document, outlining all features and relevant technical notes about a project. I include the spec with all relevant source files in my prompt before asking the LLM to implement a new change or feature. This way it doesn't have to do as much guessing as to what my code is doing, and I can avoid relying on long-running conversations to maintain context. Instead, for each big change I include an up-to-date spec and all of the relevant source files. I update the spec to reflect the current state of the project as changes are made to give the LLM context about my project (this also doubles as documentation).
I use an NPM script to automate concatenating the spec + source files + prompt, which I then copy/paste to o1. So far this has been working somewhat reliably for the early stages of a project but has diminishing returns.
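For anyone curious, the concatenation step can be as simple as something like this (a rough sketch of my own; the file names spec.md and prompt.md, the output file, and the usage line are assumptions, not the parent's actual setup):

    // build-context.ts: glue spec + source files + prompt into one pasteable blob.
    // Usage: npx tsx build-context.ts src/a.ts src/b.ts
    import { readFileSync, writeFileSync } from "node:fs";

    const sourceFiles = process.argv.slice(2);         // source files to include
    const spec = readFileSync("spec.md", "utf8");      // current project spec
    const prompt = readFileSync("prompt.md", "utf8");  // the change being requested

    const sources = sourceFiles
      .map((path) => `--- ${path} ---\n${readFileSync(path, "utf8")}`)
      .join("\n\n");

    // Spec first, then code, then the ask, so the model reads context before the task.
    writeFileSync("context.txt", [spec, sources, prompt].join("\n\n"));
    console.log(`Wrote context.txt with ${sourceFiles.length} source files`);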
You're describing functionality that's built into Aider. You might want to try it out.
Aider also has a copy/paste mode to use web UI interfaces/subscriptions instead of APIs.
I definitely use and update my CONVENTIONS.md files and started adding a second specification file for new projects. This + architect + "can your suggestion be improved, or is there a better way?" has gotten me pretty far.
I ask this question without a hint of tone or sarcasm. You said: "*it’s a junior dev faking competence. Trust it at your own risk.*" My question is simply: "wouldn't you expect to personally be able to tell that a human junior dev was faking competence?" Why should it be different with the LLM?
Obviously, it depends on context. When talking to someone live you can pick up on subtle hints such as tone of voice, or where they look, or how they gesticulate, or a myriad other signals which give you a hint to their knowledge gaps. If you're communicating via text, the signals change. Furthermore, as you interact with people more often you understand them better and refine your understanding of them. LLMs always forget and “reset” and are in flux. They aren’t as consistent. Plus, they don’t grow with you and pick up on your signals and wants.
It’s incredibly worrying that it needs to be explained again and again that LLMs are different from people, do not behave like people, and should not be compared to people or interacted like people, because they are not people.
Interestingly your description of social cues you expect to pick up on are the exact sort of social cues I struggle with. If someone says something, generally speaking I expect it to be true unless there is an issue with it that suggests otherwise.
I suppose the wide range of negative and positive experiences people seem to have working with LLMs is related to the wide range of expectations people have for their interactions in general.
Not instantly. You’d give the human junior dev the benefit of the doubt at first. But when it becomes clear that the junior dev is faking competence all the time (that might take longer than the four days in TFA — yes I know it’s not exactly comparable, just saying) and won’t stop with that and start being honest instead, you’d eventually let them go, because that’s no way to work with someone.
I've been able to do far more complex things with ESP32s and RPis in an evening, without knowing the first thing about Python or C++.
I can also tell when it’s stuck in some kind of context swamp and won’t be any more help, because it will just keep making the same stupid mistakes over and over and generally forgetting past instructions.
At that point I take the last working code and paste it into a new chat.
Perhaps being a PM for several years has helped, I’ve had great success speeding up my programming workflows by prompting Claude with very specific, well defined tasks.
Like many others are saying, you need to be in the driver's seat and in control. The LLM is not going to fully complete your objectives for you, but it will speed you up when provided with enough context, especially on mundane boilerplate tasks.
I think the key to LLMs being useful is knowing how to prompt with enough context to get a useful output, and knowing what context is not important so the output doesn’t lead you in the wrong direction.
Funny enough, I posted an article I wrote here yesterday with the same sort of thesis. Different technologies (mine was Docker) but same idea of LLM leading me astray and causing a lot of frustration
Your plan was to use USB, but to me it looks like you're pretty much just using serial via USB. That's completely fine of course! One cheap way to tackle your problem is to use a version of printf with locking, which is likely available in many microcontroller SDKs (it's also slow). (Or you could add your own mutex.)
USB-CDC is cooler than that, you can make the Pico identify as more than just one device. E.g. https://github.com/Noltari/pico-uart-bridge identifies as two devices (so you get /dev/ttyACM0 and /dev/ttyACM1). So you could have logs on one and image transfers on another. I don't think you're limited to just two, but I haven't looked into it too far.
You can of course also use other USB protocols. For example you could have the Pico present itself as a mass-storage device or a USB camera, etc. You're just limited by the relatively slow speed of USB1.1. (Though the Pico doesn't exactly have a lot of memory so even USB1.1 will saturate all your RAM in less than 1 second)
Really enjoyed reading your article. Haven’t laughed as much reading a tech article in quite some time. You should consider doing some YouTube videos as your communications style is very humble and entertaining.
Made me wanna join you in your garage and help out with the project :)
FWIW, I fed in the same problematic prompt to all the current ChatGPT models and even the legacy/mini models enumerated a bunch of pitfalls and considerations. I wonder why/how it managed to tell the author everything was perfect? A weird one-off occurrence?
Counterpoint: I'm on day 26 of an afternoon project I never would have attempted on my own, and I'm going to release it as a major feature.
Cursor & Claude got the boilerplate set up, which was half the mental barrier. Then they acted as thought partners as I tried out various implementations. In the end, I came up with the algorithm to make the thing performant, and now I'm hand-coding all the shader code—but they helped me think through what needed to be done.
My take is: LLMs are best at helping you code at the edge of your capabilities, where you still have enough knowledge to know when they're going wrong. But they'll help you push that edge forward.
I spend a good portion of my time asking people to fix their LLM code now at work. It has made code reviews tiring. And it has increased pairing time significantly, making it a less fun activity.
When workmanship doesn't matter, then the ship is already sinking.
It has been my experience that one code clown can poison a project with dozens of reasonably talented engineers active, i.e. clowns often go through the project smearing bad kludges over acceptable standards to make their commit frequency appear to mean something.
This is why most developers secretly dream of being plumbers. Good luck, =3
Today, I needed to write a proxy[0] that wraps an object and log all method calls recursively.
I asked Claude to write the initial version. It came up with a complicated class-based solution. I spent more than 30 minutes trying to get a good abstraction to come out. I was copy-pasting TypeScript errors and applying the fixes it suggested without thinking much.
In the end, I gave up and wrote what I wanted myself in 5 minutes.
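For context, the hand-written version is roughly this shape (a sketch from me, not the commenter's actual code): a Proxy whose get trap logs function calls and recursively wraps object-valued properties and return values so nested calls get logged too.

    function logCalls<T extends object>(target: T, path = "root"): T {
      return new Proxy(target, {
        get(obj, prop, receiver) {
          const value = Reflect.get(obj, prop, receiver);
          const name = `${path}.${String(prop)}`;
          if (typeof value === "function") {
            return (...args: unknown[]) => {
              console.log(`${name}(${args.map((a) => JSON.stringify(a)).join(", ")})`);
              const result = (value as Function).apply(obj, args);
              // Wrap object results so chained calls are logged too.
              return typeof result === "object" && result !== null
                ? logCalls(result as object, name)
                : result;
            };
          }
          // Wrap nested objects so deeper method calls are also logged.
          return typeof value === "object" && value !== null
            ? logCalls(value as object, name)
            : value;
        },
      });
    }

    // Example: logs "root.users.find(1)" then "root.users.find.touch()".
    const api = logCalls({
      users: { find: (id: number) => ({ id, touch: () => "ok" }) },
    });
    api.users.find(1).touch();

Nothing fancy; the get trap is the whole trick, which is why a hand-rolled version can be quicker than steering an LLM toward it.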
Yes. I wrote out how to do it technically. Claude was able to come up with a solution that worked on the second attempt. The problem was it didn't work nicely with TypeScript; the approach overcomplicated anything that depended on this class.
A bit on a tangent, but has there been any discussion of how junior devs in the future are ever going to get past that stage and become senior dev calibre if companies can replace the junior devs with AIs? Or is the thinking we'll be fine until all the current senior devs die off and by then AI will be able to replace them too so we won't need anyone?
It's definitely as good as a junior dev at a lot of tasks, but you always have to be in the driver's seat. I don't ask junior devs to write functions one at a time. I give them a task, they ping me if they need something, but otherwise I hope I don't hear from them again for a while.
I don't see AI replacing that. AI is a tool with the instant Q&A intelligence of a junior dev but it's not actually doing the job of a junior dev. That's a subtle distinction.
Training will adapt to the widespread use of AI coding assistants if they are that universally useful, and people will come into the market as junior AI wranglers, with skillsets stronger than current junior devs in some areas but weaker in others; current seniors will grumble about holes in their knowledge, but that's been the case with the generational changes in software development as the common problems people face at different levels have shifted over time. The details are new, but the process isn't.
Not if the goal is to replace the junior devs with AIs -- people won't be "coming into the market" because they won't be needed.
Companies are not saving money by paying for AI tools if they continue to hire the same number of people. The only way it makes financial sense, and for the enormous amounts of money being invested into AI to reap profits, is if companies are able to reduce the cost of labor. First, they only need 75% of the junior devs they have now, then 50%, then 25%.
> Not if the goal is to replace the junior devs with AIs -- people won't be "coming into the market" because they won't be needed.
It won't happen all at once, and as tasks done by current juniors are incrementally taken over by AI, the expected entry skillset will evolve in line with those changes. There will always be junior people in the field, but their expected knowledgebase and tasks will evolve, and even if 100% of the work currently done by juniors is eventually AI-ified, there will still be juniors, they just will be doing completely different things, and going through a completely different learning process to get there.
> Companies are not saving money by paying for AI tools if they continue to hire the same number of people.
Companies which have a fixed lump of tech work (in practice, none, actually) will save money because output per worker will increase and they will hire fewer total workers, but they will still have people who are newer and more experienced within that set.
More realistic companies that either make money with tech work or that apply internal effort to tech as long as it has net positive utility may actually end up spending more on tech, because each dollar spent gives more results. This still saves money (or makes more money), but the savings (where it is about savings, and not revenue) will be in the areas tech is applied to, not tech itself.
You are running the show, and the LLM can act in many other roles to help you. On obscure or confusing topics, the LLM will likely be as bad at finding a solution as any employee. Give it a plan. Follow up and make sure it's on track.
What I learned, after many experiences similar to OP's, is that you can't outsource expertise to an LLM. Don't ask it for advice; ask it to teach you so you can make decisions like this on your own. Preferably, ground questions with excerpts from human-made documents. It seems to make fewer mistakes when explaining things, and those mistakes are more noticeable when they do happen.
One of the areas where I've struggled to get effective use out of LLMs is UI/UX. That isn't my primary area of expertise (I'm backend), so it definitely could be operator error here, but I use tools like v0.dev and just can't quite get it to do what I need it to do. Anybody have any tools, workflows, or suggestions for this?
> "From her perspective, I order doordash and turn into a degen who is unfit to father. From my perspective, I get to enjoy my favorite place and just tinker or play games or do whatever. These are the days I get to play mad scientist and feel most like myself."
Most demeaning and depressingly toxic thing I've read today...
I've come to a similar conclusion: for now at least, it's best applied at a fairly granular level. "Make me a red brick wall there" rather than "hey architect, make me a house".
I do think OP tried a bit too much new stuff in one go, though. USB plus Zig is quite a bit more ambitious than the traditional hello world in a new language.
I have never found a use for LLMs for programming because I can find the (correct) answer much easier with a search. Perhaps the search engines just suck so hard these days people resort to LLMs. I use Kagi and GitHub to search and the results are much better.
I've found them to be quite a time saver, within limits. The blog post seemed scattered and disorganized to me, and the author admits having no experience with using LLMs to this end, so perhaps the problem lies behind their eyes.
I'm developing an intuition to how and what to ask in order for the LLM's answer to be helpful. Once you start spinning your wheels, clear context, copy what you need, and start over.
> My wife and I have a deal where 1 day a month, She takes the kiddo and I am absolved of all responsibilities. I get a full day to lock in and build projects.
There's not much actual LLM-generated text in this post to go by, but it seems like each of the tokens generated by the LLM would be reasonable to have high probability. It sounds like the developer here thought that the sequence of tokens then carried meaning, where instead any possible meaning came from the reading. I wonder if this developer would be as irritated by the inaccuracies if they had cast sticks onto the ground to manage their stock portfolio and found the prophecy's "meaning" to be plausible but inaccurate.
The OP misunderstands (perhaps deliberately or for humorous effect) what a co-pilot is. This is telling:
"I learned that I need to stay firmly in the driver’s seat when tackling new tech."
Er, that's pretty much what a pilot is supposed to do! You can't (as yet) just give an AI free rein over your codebase and expect to come back later that day to discover a fully finished implementation. Maybe unless your prompt was "Make a snake game in Python". A pilot would be supervising their co-pilot at all times.
Comparing AIs to junior devs is getting tiresome. AIs like Claude and newer versions of ChatGPT have incredible knowledge bases. Yes, they do slip up, especially with esoteric matters where there are few authoritative (or several conflicting) sources, but the breadth of knowledge in and of itself is very valuable. As an anecdote, neither Claude nor ChatGPT were able to accurately answer a question I had about file operation flags yesterday, but when I said to ChatGPT that its answer wasn't correct, it apologised and said the Raymond Chen article it had sourced wasn't super clear about the particular combination I'd asked about. That's like having your own research assistant, not a headstrong overconfident junior dev. Yes, they make mistakes, but at least now they'll admit to them. This is a long way from a year or two ago.
In conclusion: don't use an AI as one of your primary sources of information for technology you're new to, especially if you're not double-checking its answers like a good pilot.
This is like watching a carpenter blame their hammer because they didn’t measure twice. AI is a tool, it's like a power tool for a tradesperson: it'll amplify your skills, but if you let it steer the whole project? You’ll end up with a pile of bent nails.
LLMs are jittery apprentices. They'll hallucinate measurements, over-sand perfectly good code, or spin you in circles for hours. I’ve been there back in the GPT-4 days especially, nothing stings like realising you wasted a day debugging AI’s creative solution to a problem you could've solved in 20 minutes.
When you treat AI like a toolbelt, not a replacement for your own brain? Magic. It's killer at grunt work: explaining regex, scaffolding boilerplate, or untangling JWT auth spaghetti. You still gotta hold the blueprint. AI ain't some magic wand: it's a nail gun. Point it wrong, and you'll spend four days prying out mistakes.
Sucks it cost you time, but hey, now you know to never let the tool work you. It's hopefully a lesson OP learns once and doesn't let it sour their experience with AI, because when utilised properly, you can really get things done, even if it's just the tedious/boring stuff or things you'd spend time Google bashing, reading docs or finding on StackOverflow.
> AI is great for generating ideas or drafting code, but it doesn’t understand. It’s like giving a junior developer a chainsaw instead of a scalpel—it might finish the job, but you’ll spend twice as long cleaning up the mess.
For anything remotely complex, this is dead on. I use various models daily to help with coding, and more often than not, I have to just DIY it or start brand new chats (because the original context got overwhelmed and started hallucinating).
This is why it's incredibly frustrating to see VCs and AI founders straight-up gaslighting people about what this stuff can (or will) do. They're trying to push this as a "work killer," but really, it's going to be some version of the opposite: a mess creator that necessitates human intervention.
Where we're at is amazing, but we've got a loooong way to go before we're on hovercrafts sipping sodas WALL-E style.
I am frankly tired of seeing this kind of post on HN. I feel like the population of programmers is bifurcating into those who are committed to mastering these tools, learning to work around their limitations and working to leverage their strengths… and those who are committed to complaining about how they aren’t already perfect Culture Ship Minds.
We get it. They’re not superintelligent at everything yet. They couldn’t infer what you must’ve really meant in your heart from your initial unskillful prompt. They couldn’t foresee every possible bug and edge case from the first moment of conceptualizing the design, a flaw which I’m sure you don’t have.
The thing that pushes me over the line into ranting territory is that computer programmers, of all people, should know that computers do what you tell them to.
> computer programmers, of all people, should know that computers do what you tell them to.
Right. The problem isn't that the tool isn't perfect, it's that you get a lot of excitable people with incentives pretending that it is or will soon be perfect (while simultaneously scaring non-technical people into thinking they'll be replaced with a chat bot soon).
There are certainly luddite types who are outright rejecting these tools, but if you have hands-on, daily experience, you can see the forest for the trees. You quickly realize that all of the "omg this thing is sentient" or "we can't let what we've got into the world, it's too dangerous" fodder like the Google panic memo are just covert marketing.
>The thing that pushes me over the line into ranting territory is that computer programmers, of all people, should know that computers do what you tell them to.
Are you claiming LLMs function like computer program instructions? Because they clearly don't operate like that at all.
They are closer to being deterministic machines that comply exactly with your instructions, for better or worse, than they are to magical pixies that guess what you must’ve actually meant. The implicit expectation demonstrated by many in the “loudly disappointed in LLMs” contingent seems to be that LLMs should just know what you meant, and then blame them for not correctly guessing it and delivering it.
I think LLMs have uncovered what we have always known in this industry: that people are, by default, bad at communicating their intent clearly and unambiguously.
If you express your intent to an LLM with sufficient clarity and disambiguation, it will rarely screw up. Often, we don’t have time to do this, and instead we aim for the sweet spot of sufficient but not exhaustive clarity. This can be fine if you are experienced with that particular LLM and you have a good feel for where its sweet spot actually is. If you miss that target, though, the LLM will not correctly infer your intended subtext. This is one of the things that requires experience. In fact, even the “same” LLM will change in its behavior and capabilities as it undergoes fine tuning. Sometimes it will even get worse at certain things.
All of this is to say, of course, you’re right that it’s not a compiler. But I think people fail in their application of LLMs for much the same reason that novice coders fail to get compilers to guess what they intended.
> They are closer to being deterministic machines that comply exactly with your instructions, for better or worse, than they are to magical pixies that guess what you must’ve actually meant.
If those are your only two reference points, yes they're closer to the former.
But the biggest problem is how much "pixie that does something you neither wanted nor asked for" gets mixed in. And I think a lot of the complaints you're saying are about lack of mind reading are actually about that problem instead.
Thanks, I'm not thinking clearly at all. I had it in my head that programming instructions are not the only way that a computer might do as it's told. For example, if we delete a folder in a UI by accident we shouldn't be surprised the folder is gone. But it doesn't really quite fit the parent analogy. Sorry about that.
Nobody is complaining that LLMs aren't perfect Culture minds. People disagree with the premise that they are useful tools given their current capabilities. Your portrayal of those with whom you disagree is such a strawman that it might as well be set to a soy-vs-wojak meme.
They clearly are useful tools given their current capabilities. It just depends on what you’re using them for. You don’t use a screwdriver to drive nails, and you don’t go to HardwareNews to complain when your screwdriver isn’t working as a hammer.
I’m currently using them to port a client-side API SDK into multiple languages. This would be a pain in the ass time consuming task but is a breeze with LLMs because the exact behavior I want is clearly defined and relatively deterministic, and it’s also straightforward to test that I’m getting what I intend. The LLM thus gets done in 3 days what would take me 3 weeks (or more) to do by hand.
If the complaint is that it can't do X, where X is something that would clearly require full AGI and likely true superintelligence — in this case expecting instantaneous, correct code that solves novel problems on the first try - then I have to insist that people are actually expecting Claude to be a Culture Ship Mind, implicitly. They just don't realize that what they're asking for is hard, which is itself a psychologically interesting fact, I suppose.
> I am frankly tired of seeing this kind of post on HN.
You've been here since 2016 and this is the kind of posting that finally gets to you? How in the world have you avoided all the shitposts in the last decade? What is your secret?
There are about 3-5 (5 being a strict upper bound in my experience over the last 17 years) takes on any given subject on HN that are regurgitated without thinking over, and over, and over again. I think an AI can be enlisted to basically catalogue the HN responses and package them so we don't have to discuss the same shit not for the 100th time, but for the 1000th time.
the "AI lies" takeaway is way off for those actually using these tools. Calling it a "junior dev faking competence" is catchy, but misses the point. We're not expecting a co-pilot, it's a tool, a super-powered intern that needs direction. The spaghetti code mess wasn't AI "lying", it was a lack of control and proper prompting.
Experienced folks aren't surprised by this. LLMs are fast for boilerplate, research, and exploring ideas, but they're not autonomous coders. The key is you staying in charge: detailed prompts, critical code review, iterative refinement. Going back to web interfaces and manual pasting because editor integration felt "too easy" is a massive overcorrection. It's like ditching cars for walking after one fender bender.
Ultimately, this wasn't an AI failure, it was an inexperienced user expecting too much, too fast. The "lessons learned" are valid, but not AI-specific. For those who use LLMs effectively, they're force multipliers, not replacements. Don't blame the tool for user error. Learn to drive it properly.
“Experienced folks” in this case means folks who’ve used LLMs enough to somewhat understand how to “feed them” in ways that make the tools generate productive output.
Learning to properly prompt an LLM to get a net gain in value is a skill in and of itself.
That matches my experience too. I wonder how fast they'll improve and if LLMs will hit a wall, as some AI experts think.
I'm sorry, what?
> It feels more like I’m able to Matrix-upload docs into my brain like Trinity learning to fly a helicopter
Did you puzzle over this sentence specifically? Imagine you don't know jack about flying helicopters, and then Tank uploads the Helicopter Pilot Program (TM) directly to your brain; it would feel like magic.
Conversely, if you know a lot about helicopters, just not enough to fly a B-212, and the program includes instructions like "Press (Y) and Left Stick to hover", you'd know it's confabulating real world piloting with videogames.
That's the same with LLMs, you need to know a lot of the field to ask the right questions, and recognize/correct slop or confabulation, otherwise they seem much more powerful and smart than they really are.
lol thanks, did an LLM generate this reply?
I find that LLMs are almost comically bad at projects that have a hardware component, like Raspberry Pi, Pico, or Arduino.
I think that's because the libraries you use are often niche or have a few similar versions; the LLM really commonly hallucinated solutions and would continually suggest that library X did have that capability. In hardware projects you often hit a point where you can't do something or you need to modify a library, but the LLM tries to be "helpful" and makes up a solution instead.
Based on my own modestly successful forays into that world, I have to imagine one problem the LLMs have in that space is terrible training data. A good three quarters of the search results in that space will be straight-up out-of-date and not work on your system. Then you've got all the tiny variations between dozens of chipsets, and all the confidently wrong people on the internet telling you to do nonsensical things, entire ecosystems that basically poofed out of existence three years ago but are still full of all kinds of juicy search terms and Google juice... if I can hardly paw through this stuff with years of experience and the physical hardware right in front of me to verify claims with, I don't know how an LLM is supposed to paw through all that and produce a valid answer to any question in that space, without its own hardware to fiddle with directly.
You’re making my eye twitch.
The number of projects I've done where my notes are the difference between hours of relearning The Way and instant success. Google doesn't work, because some niche issue is blocking the path.
ESP32, Arduino, Home Assistant and various media server things.
They're also pretty bad at TypeScript generics. They're quite good at explaining concepts (like mapped types), but when push comes to shove they generate all sorts of stuff that looks convincing, but doesn't pass the type checker.
And then you’ll paste in the error, and they’ll just say “ok I see the problem” and output the exact same broken code lol.
I’m guessing the problem is lack of training data. Most TS codebases are mostly just JS with a few types and zod schemas. All of the neat generic stuff happens in libraries or a few utilities
Actually, it's because many of the people writing tutorials and sharing answers about that stuff don't know what the hell they're doing or grasp the fundamentals of how those systems work, and so most of the source material the LLMs are trained on is absolute garbage.
Public Arduino, RPi, Pico communities are basically peak cargo cult, with the blind leading the blind through things they don't understand. The noise is vastly louder than the signal.
There's basically a giant chasm between experienced or professional embedded developers, who mostly have no need to ever touch those things or visit their forums, and the confused hobbyists on those forums randomly slapping together code until something sorta works while trying to share their discoveries.
Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
If you're genuinely interested in embedded development and IoT stuff, and are willing to put in the time to learn, put those platforms away and challenge yourself to at least learn how to directly work with production-track SoCs from Nordic or ESP or whatever. And buy some books or take some courses instead of relying on forums or LLMs. You'll find yourself rewarded for the effort.
> Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
It won't, because the RPis are all undocumented, closed-source toys.
It would be an interesting experiment to see which chips an LLM is better at helping out with: RPi's with its hallucinatory ecosystem or something like the BeagleY-AI which has thousands of pages of actual TI documentation for its chips.
It would be really nice if the LLMs could cover for this and work around the fact that RPis keep getting used because they were dumped below cost to bootstrap a network effect.
>Presumably, those communities and their internal knowledge will mature eventually, but it's taking a long long time and it's still an absolute mess.
I'm not sure they will. There's a kind of evaporative cooling effect where once you get to a certain level of understanding you switch around your tools enough that there's not much point interacting with the community anymore.
Any book or course recommendations for someone who has a senior SWE background but has never touched hardware/embedded systems?
I was just today trying to fix some errors in an old Linux kernel version 3.x.x .dts file for some old hardware, so that I could get a modern kernel to use it. ChatGPT seemed very helpful at first - and I was super impressed. I thought it was giving me great insight into why the old files were now producing errors … except the changes it proposed never actually fixed anything.
Eventually I read some actual documentation and realised it was just spouting very plausible-sounding nonsense - and confidently at that!
The same thing happened a year or so ago when I tried to get a much older ChatGPT to help me with USB protocol problems in some microcontroller code. It just hallucinated APIs and protocol features that didn't actually exist. I really expected more by now - but I now suspect it'll just never be good at niche tasks (and these two things are not particularly niche compared to some).
> Eventually I read some actual documentation...
For the best of both worlds, make the LLM first 'read' the documentation, and then ask for help. It makes a huge difference in the quality and relevance of the answers you get.
And hope the docs aren't too large. LLMs tend to confabulate more with longer contexts.
Claude is like having my own college professor. I've learned more in the past month with Claude than I learned in the past year. I can ask questions repeatedly and get clarification as fine-grained as I need it. Granted, Claude has limits, but it's a game-changer.
> I think the key to being successful here is to realize that you're still at the wheel as an engineer. The llm is there to rapidly synthesize the universe of information.
Bingo. OP is like someone who is complaining about the tools, when they should be working on their talent. I have a LOT of hobbies (circuits, woodworking, surfing, playing live music, cycling, photography) and there will always be people who buy the best gear and complain that the gear sucks. (NOTE: I'm not implying Claude is "the best gear", but it's a big big help.)
I think the only problem with LLMs is that synthesis of new knowledge is severely limited. They are great at explaining things others have explained, but suck hard at inventing new things. At least that's my experience with Claude: it's terrible as a "greenfield" dev.
> Claude is like having my own college professor.
I don't use Claude, so maybe there's a huge gap in reliability between it and ChatGPT 4o. But with that disclaimer out of the way, I'm always fairly confused when people report experiences like these—IME, LLMs fall over miserably at even very simple pure math questions. Grammatical breakdowns of sentences (for a major language like Japanese) are also very hit-or-miss. I could see an LLM taking the place of, like, an undergrad TA, but even then only for very well-trod material in its training data.
(Or maybe I've just had better experiences with professors, making my standard for this comparison abnormally high :-P )
EDIT: Also, I figure this sort of thing must be highly dependent on which field you're trying to learn. But that decreases the utility of LLMs a lot for me, because it means I have to have enough existing experience in whatever I'm trying to learn about so that I can first probe whether I'm in safe territory or not.
"Major" in the context of Japanese is a stretch; I can see a significant drop in quality when interacting with the same model in, say, Spanish vs English.
For as rich a culture as the Japanese have, there are only about 1XX million speakers, and the size of the text corpus really matters here. The couple billion English speakers are also highly motivated to choose English over anything else, because the lingua franca has home-field advantage.
To use LLMs effectively you have to work with knowledge of their weaknesses. Math is a good example: you'll get better results from Wolfram Alpha even for the simple things, which is expected.
Broad reasoning and explanations tend to be better than overly specific topics, and the more common a language, the better the response. If a topic has a billion tutorials online, an LLM has a really high chance of figuring it out on the first try.
Be smart with the context you provide: the more you actively constrain an LLM, the more likely it is to work with you. I have friends who just feed it class notes to generate questions and probe it for blind spots until they're satisfied. The improvements in their grades make it seem like a good approach, but they know that just feeding responses to the LLM isn't trustworthy, so they also check things themselves; the extra time is valuable by itself, if just to improve familiarity with the subject.
> LLMs fall over miserably at even very simple pure math questions
They are language models, not calculators or logic languages like Prolog or proof languages like Coq. If you go in with that understanding, it makes a lot more sense as to their capabilities. I would understand the parent poster to mean that they are able to ask and rapidly synthesize information from what the LLM tells them, as a first start rather than necessarily being 100% correct on everything.
Of course that's fair advice in itself, but the parent specifically equated them to a "college professor."
Maybe that should be "college art professor" then? :)
I think a lot of the people who object to AI probably see the gross amounts of energy it is using, or the trillions of dollars going to fewer than half a dozen men (mostly American, mostly white).
But, once you've had AI help you solve some gnarly problems, it is hard not to be amazed.
And this is coming from a gal who thinks the idea of self-driving cars is the biggest waste of resources ever.
(EDIT: Upon rereading this, it feels unintentionally blunt. I'm not trying to argue, and I apologize if my tone is somewhat unfriendly—that's purely a reflection of the fact that I'm a bad writer!)
Sorry, maybe I should've been clearer in my response—I specifically disagree with the "college professor" comparison. That is to say, in the areas I've tried using them for, LLMs can't even help me solve simple problems, let alone gnarly ones. Which is why hearing about experiences like yours leaves me genuinely confused.
I do get your point about people disagreeing with modern AI for "political" reasons, but I think it's inaccurate to lump everyone into that bucket. I, for one, am not trying to make any broader political statements or anything—I just genuinely can't see how LLMs are as practically useful as other people claim, outside of specific use cases.
Like I made very clear, it is great at some things and terrible at others.
YMMV. /shrugs/
How do you know it's accurate?
Simple: one thing I'm learning about is RFCs for TCP/IP. I can literally go test it. It's like saying, "How do you know it is right when it says 2+2=4"? Some knowledge when taught is self-correcting. Other things I'm studying, like, say, tensor calculus, I can immediately use and know I learned it correctly.
TCP/IP is a great example though of something you can get seemingly correct and then be subject to all kinds of failure modes in edge cases you didn’t handle correctly (fragmentation, silly windows, options changing header sizes, etc).
Thanks for telling me how wrong I am! I bet you're fun at parties.
This is not a party; this is a technology forum, and we prize being right.
[flagged]
They are reasonably accurate, and no tutor is perfect. How do you know your college professor is accurate?
My college professor has certifications and has passed tests that weren't in their training data.
My college professor was also willing to say "I don't know, ask me next class"
> My college professor was also willing to say "I don't know, ask me next class"
This is a key differentiator that I see in humans over LLMs: knowing one's limits.
> My college professor has certifications and has passed tests that weren't in their training data.
Granted, they are not (can't be) as rigorous as the tests your professor took, but new models are run through test suites before being released, too.
That being said, I saw my college professors making up things, too (mind you, they were all graduated from very good schools). One example I remember was our argument with a professor who argued that there is a theoretical limit for the coefficient of friction, and it is 1. That can potentially be categorised as a hallucination as it was completely made up and didn't make sense. Maybe it was in his training data (i.e. his own professors).
I agree with the "I don't know" part, though. This is something that LLMs are notoriously bad at.
What do you consider 'not in its training data'?
I just asked Claude a question I am pretty sure was not in its training data.
* https://i.imgur.com/XjvImeT.jpeg
It is immediately wrong in Step 1. A newborn is not a 2:1 ratio of height:width. Certainly not 25cm width (what does that even mean? Shoulder to shoulder?).
This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.
Indeed you are showing that something not in the training data leads to failure.
Babies also aren't rectangles.. you could lay a row shoulder to shoulder, then do another row upside down from the first and their heads would fit between the heads of the first row, saving space.
Edit: it also doesn't account for the fact the moon is more or less a sphere, and not a flat plane.
That's almost in the training data:
https://www.quora.com/How-many-Humans-can-we-fit-on-the-Moon
I guess coming up with a truly original question is tougher than it seems. Any ideas?
Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...
Edit: couldn't resist, and dammit!!
Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.
But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?
An African horse or a European horse?
Did you actually check the math it did? Usually LLMs are terrible at math since, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes when LLMs leverage other apps like calculators to get their results; I am not sure if Claude does that already or it's still in development.
Claude has access to an analysis tool that runs JavaScript, which it can use for calculations.
You can also test your professor's answers. I don't just walk around going "Oh, Claude was right", I'm literally using what I just learned and am generating correct results. I'm not learning facts like dates, or subject things, I'm learning laws, equations, theories, proofs, etc. (Like how to apply Euler's totient or his extended theories on factorization... there's only one "right answer").
Also, your method for attesting to your professor's accuracy is inherently flawed. That little piece of paper on their wall doesn't correlate with how accurate they are; it isn't worth nothing, but it isn't foolproof. Hate to break it to you, but even heroes are fallible.
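To make the "self-correcting" point concrete, here's a minimal sketch (my own illustration, with an arbitrary coprime pair): a claim like Euler's theorem can be checked directly in a few lines, so a wrong explanation is easy to catch.

```python
# Euler's theorem says a**phi(n) % n == 1 whenever gcd(a, n) == 1.
# If a model (or a professor) explains it wrong, this check exposes it.
from math import gcd

def phi(n: int) -> int:
    """Euler's totient: count of 1..n coprime to n (naive but correct)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

n, a = 15, 4                      # arbitrary coprime pair, purely illustrative
assert gcd(a, n) == 1
assert pow(a, phi(n), n) == 1     # Euler's theorem holds; a bogus claim wouldn't
```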
I would add though, that they can be very good at combining known concepts. Which can create a non-trivial set of "new knowledge".
Creating new knowledge from current knowledge is called "synthesis" (ancient term, nothing modern). I'm hoping you're right, it would be amazing.
[flagged]
I find LLMs to be decent unblockers. I only turn to them from time to time though, unless I'm in a playful mode and trying to poke at various ideas. As a coder I also ask for snippets when I'm lazy. I tried growing a slightly larger solution a few times and it failed in dumb ways. It was clear it doesn't really comprehend the way we do; it's not aware it's moving in circles, and so on. All these things will probably see a lot of incremental improvements, and as a tool LLMs will definitely stay, but fundamentally they can't really think, at least the way we do, and expecting that is also foolish.
> which involved complicated rust `no_std` and linker setup for compiling rust code onto bare metal RISCV from scratch.
That's complicated, but I wouldn't say the resulting software is complex. You gave an LLM a repetitive, translation-based job, and you got good results back. I can also believe that an LLM could write up a dopey SAAS in half the time it would take a human to do the same.
But having the right parameters only takes you so far. Once you click generate, you are trusting that the model has some familiarity with your problem and can guide you without needing assistance. Most people I've seen rely entirely on linting and runtime errors to debug AI code, not "solid fundamentals" that can fact-check a problem they needed ChatGPT to solve in the first place. And the "experience" required to iterate and deploy AI-generated code basically boils down to your copy-and-paste skills. I like my UNIX knowledge, but it's not a big enough gate to keep out ChatGPT Andy and his cohort of enthusiastic morons.
We're going to see thousands of AI-assisted success stories come out of this. But we already had those "pennies on the dollar" success stories from hiring underpaid workers out of India and Pakistan. AI will not solve the unsolved problems of our industry and in many ways it will exacerbate the preexisting issues.
A tool that can "write up a dopey SAAS in half the time it would take a human to do" is a pretty incredible thing to add to your toolbox!
If the summary goal of your existence is to be the most delirious waste of resources that humanity has yet known, sure. It's the hammer and nail of spoiled burnouts everywhere that need a credible ruse to help them out of the bottle.
Some of us are capable of wanting for things better than a coin-operated REST API. The kind of imagination used to put people on the moon, that now helps today's business leaders imagine more profitable ways to sell anime pornography on iPhone. (Don't worry, AI will disrupt that industry too.)
I'm sorry, but generated REST API boilerplate has probably fed more people than putting people on the moon.
Pity about the SQL injection vulnerabilities though?
What it will do is free up a lot of brainpower to think about those hard problems and empower people to try out their ideas.
I used to think the exact same thing would happen when we paid Pakistani and Indian labor to do America's busywork.
That was about 15 years ago, I no longer have the same enthusiasm you do.
Now, you are paying a Taiwanese or American company to produce GPUs for you. This allows you to use open-source models like DeepSeek R1, significantly reducing your reliance on Indian tech labor
I believe they are saying we did not learn to think more deeply then, so why should we expect to learn how now.
Is it reasonable to assume that more senior devs benefit more from LLMs?
I believe so. In my experience, you need to have that gut intuition (or experience) to say, “No way. That’s totally wrong.”
Since AI will capitulate and give you whatever you want.
You also have to learn how to ask without suggesting because it will take whatever you give it and agree.
Yep. I think a default state of skepticism is an absolute necessity when working with these tools.
I love LLMs. I agree with OP about them expanding my hobby capacity as well. But I am constantly saying (in effect) “you sure…?” and tend to have a pretty good bs meter.
I’m still working to get my partner to that stage. They’re a little too happy to accept an answer without pushback or skepticism.
I think being ‘eager to accept an answer’ is the default mode of most people anyway. These tools are likely enabling faster disinformation consumption for the unaware.
Yes, you essentially have an impossibly well-read junior engineer you can task with quick research questions like, "I'm trying to do x using lib y, can you figure that out for me?" This is incredibly productive because the answer typically contains all the pieces you need, though not always assembled right.
Getting the LLM to pull out well-known names of concepts is, for me, the skill you can't get anywhere else. You can describe a way to complete a task, ask what it's called, and you'll be heading down arxiv links right away. Like, yes, the algorithm to find the needle string closest in edit distance and length within a haystack is called Needleman–Wunsch; of course, Claude, everyone knows that.
Once it gives me the names for the concepts I'm struggling with, I often end up finding the Stack Overflow answer or documentation it's copy-pasting.
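For anyone who hasn't run into the name before, here is a minimal sketch of Needleman–Wunsch scoring as plain dynamic programming (the scoring parameters are arbitrary, not from any particular library):

```python
# Needleman-Wunsch global alignment score between two strings,
# built as a plain dynamic-programming table.
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        score[i][0] = i * gap           # aligning a prefix of `a` against nothing
    for j in range(cols):
        score[0][j] = j * gap           # and vice versa
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

# Sliding this over substrings of a haystack gives the closest-matching needle.
print(needleman_wunsch("kitten", "sitting"))
```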
I think so.
Junior devs can get plenty of value out of them too, if they have discipline in how they use them - as a learning tool, not as a replacement for thinking about projects.
Senior devs can get SO much more power from these things, because they can lean on many years of experience to help them evaluate if the tool is producing useful results - and to help them prompt it in the most effective way possible.
A junior engineer might not have the conceptual knowledge or vocabulary to say things like "write tests for this using pytest, include a fixture that starts the development server once before running all of the tests against it".
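For concreteness, a minimal sketch of the kind of test setup that sentence describes (the server module, port, and endpoint are hypothetical placeholders, not from any real project):

```python
# Session-scoped pytest fixture: start a development server once, run all
# tests against it, then shut it down. "app.main" and the port are made up.
import subprocess
import time

import pytest
import requests


@pytest.fixture(scope="session")
def dev_server():
    proc = subprocess.Popen(["python", "-m", "app.main", "--port", "8000"])
    time.sleep(1)  # crude readiness wait; a real suite would poll the port
    yield "http://localhost:8000"
    proc.terminate()
    proc.wait()


def test_healthcheck(dev_server):
    assert requests.get(f"{dev_server}/health").status_code == 200
```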
IMO experience provides better immunity to common hangups. Generated code tends to be directionally pretty good, but with lots of minor/esoteric failures. The experience to spot those fast and tidy them up makes all the difference. Copilot helps me move 10x faster with tedious Arduino stuff, but I can easily see how, if I didn't have decent intuition around debugging and troubleshooting, there'd be almost zero traction, since it'd be hard to clear that last 10% hurdle needed to even run the thing.
I wouldn't assume that at all. Most of the senior devs I talk to on a regular basis think commercial* LLMs are ridiculous and the AI hype is nonsensical.
* I put commercial there as a qualifier because there's some thought that in the future, very specifically-trained smaller models (open source) on particular technologies and corpuses (opt-in) might yield useful results without many of the ethical minefields we are currently dealing with.
It depends. I think it's less about how senior they are and more about how good they are at writing requirements, and knowing what directives should be explicitly stated and what can be safely inferred.
Basically if they are good at utilizing junior developers and interns or apprentices they probably will do well with an LLM assistant.
Ya. I think people that have better technical vocabulary and an understanding of what should be possible with code do better. That’s usually a senior engineer, but not always
It's the LLM paradox: seniors get more senior with them while juniors get more junior, creating a bimodal distribution in the future, simply because juniors will start depending on them too much to learn how to code properly, while seniors (some of whom may also exhibit the previous trait) will by and large be able to rapidly synthesize information from LLMs with their own understanding.
I had a couple of the most capable senior developers reach out to me to tell me how Github Copilot accelerated their productivity, which surprised me initially. So I think there's something to it.
Exactly this. OP, credit where credit is due, appears to be someone who “hacks things together” copy pasting solutions blindly from the internet - with little intuition gained along the way.
I agree with his point about asking AI to “fix” problems though. It’s really nice that you don’t have to fully understand something to use it, but that becomes a problem if you lean on it too much
Another data point. Plugin ideation to publication in 2 minutes, from a single prompt, albeit a multi-step shell prompt. 3 packages published in 24 hrs https://x.com/xundecidability/status/1884077427871342955
IME, engineers who find LLMs useful have misunderstood their reasons for existence outside of being salaried...
What is your main project? Do you LLM that?
I wager you're not a Rust expert and should maybe reconsider using Rust in your main project.
FWIW, asking an LLM whether you should use Rust ~ asking it about the meaning of life. Important questions that need answers, but not right away! (A week or 2 tops.)
If you need to synthesize the universe of information with an LLM... that is not the universe you want to live or play in.
Indeed, LLMs are useful as an intern; they are at the "cocky grad" stage of their careers. If you don't understand the problem, can't steer the solution, and, worse, have only a limited understanding of the code they produce, you are unlikely to be productive.
On the other hand if you understand what needs to be done, and how to direct the work the productivity boost can be massive.
Claude 3.5 Sonnet and o1 are awesome at code generation even with relatively complex tasks, and they have long enough context and attention windows that the code they produce, even on relatively large projects, can be consistent.
I also found a useful method of using LLMs to "summarize" code in an instructive manner which can be used for future prompts. For example, summarizing a large base class that may be reused in multiple other classes can be more effective than having to overload a large part of your context window with a bunch of code.
I've had both experiences strangely enough.
> The llm is there to rapidly synthesize the universe of information
That's a nice way of putting it.
This
[flagged]
You are very smart and cool.
"You are very smart and cool."
No I am really not. You aren't either.
In my experience LLMs will help you with things that have been solved thousands of times before and are just a matter of finding some easily researched solution.
The very moment you try to go off the beaten path and do something unconventional, or stuff that most people won't have written a lot about, it gets more tricky. Just consider how many people will know how to configure some middleware in a Node.js project... vs most things related to hardware or low-level work. Or even working with complex legacy codebases that have bits of code with obscure ways of interacting and more levels of abstraction than can reasonably be put in context.
Then again, if an LLM gets confused, then a person might as well. So, personally I try to write code that'd be understandable by juniors and LLMs alike.
In my experience, an LLM decided to not know type alignment rules in C and confidently trotted out the wrong answer. It left a horrible taste in my mouth for the one time I decided to look at using an LLM for anything, and it keeps leaving me wondering if I'd end up spending more time bashing the LLM into working than just working out the answer myself and learning the underlying reasoning why.
It was so wrong that I wonder what version of the C standard it was even hallucinating.
> vs most things related to hardware or low level work.
counter point:
https://github.com/ggerganov/llama.cpp/pull/11453
> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.
> Surprisingly, 99% of the code in this PR is written by DeekSeek-R1. The only thing I do is to develop tests and write prompts (with some trials and errors)
A single PR doesn't really "prove" anything. Optimization passes on well-tested narrowly scoped code are something that LLMs are already pretty good at.
I think DeepThink is something different though.
It is able to figure out some things that I know do not have much training data at all.
It is looking at the manual and figuring things out. "That doesn't make sense. Wait, that can't be right. I must have the formula wrong."
I've just seen that in the chain of thought.
Nah, in my experience, if there is the slightest error in the first sentence of the chain of thought, it tends to get worse and worse. I've had prompts that would generate a reasonable response in llama, but turn out utter garbage in Deepthink.
But how is this any different from real humans? They are not always right either. Sure, humans can understand things better, but are we really going to act like LLMs can't get better in the next year? And what about the next 6 months? I bet there are unknown startups like Deepseek that can push the frontier further.
The ways in which humans err are very different. You have a sense of your own knowledge on a topic and if you start to stray from what you know you're aware of it. Sure, you can lie about it but you have inherent confidence levels in what you're doing.
Sure, LLMs can improve but they're ultimately still bound by the constraints of the type of data they're trained on and don't actually build world models through a combination of high bandwidth exploratory training (like humans) and repeated causal inference.
at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM. (not even implying bad faith, but if you're constantly re-reading, selecting and re-combining snippets written by LLMs, it's not really "written" by LLMs in the same way that's implied).
We kinda went through this with images when Photoshop and similar tools appeared. I remember a lot of people asking questions in the late 90s/early 00s in particular about if an image were “real” or not and the distinctions between smart photography and digital compositions. Nowadays we just assume everyone is using such tools as a baseline and genuinely clever photography is now celebrated as an exception. Perhaps ditto with CGI and prop/set making in movies. Unless a director crows about how genuine the effects are, we assume CGI.
Yeah, I never know exactly what this means. The PR says for one variant it got it in one shot, and for the other they said it took re-prompting 4 to 8 more times.
> at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM.
That's an interesting thought. I think there are ways to automate this, and some IDEs / tools track this already. I've seen posts by both Google and Amz providing percentages of "accepted" completions in their codebases, and that's probably something they track across codebases automatically.
Also on topic, here's aider's "self written code" statistics: https://aider.chat/HISTORY.html
But yeah I agree that "written by" doesn't necessarily imply "autonomously", and for the moment it's likely heavily curated by a human. And that's still ok, IMO.
I use Copilot pretty much as a smarter autocomplete that can sometimes guess what I'm planning to type next. I find it's not so good at answering prompts, but if I type:
...and then pause, it's pretty good at guessing: ... for the next few lines. I don't really ask it to do more heavy lifting than that sort of thing. Certainly nothing like "Write this full app for me with these requirements [...]"
LLMs are surprisingly good at Haskell (and I'm not).
I hope for a renaissance of somewhat more rigorous programming languages: you can typecheck the LLM suggestions to see if they're any good. Also you can feed the type errors back to the LLM.
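A minimal sketch of that check-and-retry loop, using mypy as a stand-in type checker; `ask_llm` is a hypothetical placeholder for whatever model call you actually use:

```python
# Type-check an LLM suggestion and feed any errors back for another attempt.
import subprocess
import tempfile
from pathlib import Path


def ask_llm(prompt: str) -> str:
    """Hypothetical: call whatever model/tool you actually use."""
    raise NotImplementedError


def typecheck(code: str) -> str:
    """Return "" if the code passes mypy --strict, else the error output."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "suggestion.py"
        path.write_text(code)
        result = subprocess.run(["mypy", "--strict", str(path)],
                                capture_output=True, text=True)
        return "" if result.returncode == 0 else result.stdout


def refine(task: str, attempts: int = 3) -> str:
    code = ask_llm(task)
    for _ in range(attempts):
        errors = typecheck(code)
        if not errors:
            break
        code = ask_llm(f"{task}\n\nYour last attempt failed type checking:\n{errors}")
    return code
```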
I've only started using LLMs for code recently, and I already tend to mentally translate what I want to something that I imagine is 'more commonly done and well represented in the training data'.
But especially the ability to just see some of the stuff it produces, and now to see its thought process, is incredibly useful to me already. I do have autism and possibly ADD though.
Brilliant.
I have fought the "lowest cognitive load" code-style fight forever at my current gig, and I just keep losing to the "watch this!" fancytowne code that mids love to merge in. They DO outnumber me, so... fair deuce I suppose.
There is value in code being readable by Juniors and LLMs -- hell, this Senior doesn't want to spend time figuring out your decorators, needless abstractions and syntax masturbation. I just want to fix a bug and get on with my day.
While I think this comment got flagged (probably for the way it was worded), you aren't wrong! A good way I've heard a similar thought expressed is that code should be not only easy to maintain, but also throw away and replace, which more or less urges you towards writing the easiest things you can get away with (given your particular performance requirements and other constraints).
That rhymes with my experience of trying to generate placeholder art using AI.
Since it's just a placeholder I often ask for a funny twist, but the result is rarely anything like what I asked for.
> AI isn’t a co-pilot; it’s a junior dev faking competence. Trust it at your own risk.
This is a good take that tracks with my (heavy) usage of LLMs for coding. Leveraging productive-but-often-misguided junior devs is a skill every dev should actively cultivate!
> Leveraging productive-but-often-misguided junior devs is a skill every dev should actively cultivate!
Feels like this is only worthwhile because the junior dev learns from the experience; an investment that yields benefits all around, in the broad sense. Nobody wants a junior around that refuses to learn in perpetuity, serving only as a drag on productivity and eventually your sanity.
That's somewhere that the AI-as-junior-dev analogy breaks down a little.
There's still incredible accumulated value here, but it's at the other end. The more times you successfully use an LLM to produce working code, the more you learn about how to use them - what they're good at, what they're bad at, how to effectively prompt them.
It's worthwhile because you cannot do everything and often it is better to have someone far worse than you work on a problem than to just ignore it.
What you're doing is sacrificing learning for speed.
Which is fine, if it's a conscious choice for yourself.
I don't think GP was talking about themselves being a junior using LLMs; at least, my interpretation was that devs should learn how to leverage misguided juniors, and LLMs are more or less on the level of a misguided junior.
Which I completely agree with. I use LLMs for the cases where I do know what I'm trying to do, I just can't remember some exact detail that would require reading documentation. It's much quicker to leverage an LLM than to go on a wild goose chase for the piece of information I know exists.
Also it's a pretty good tool to scaffold the boring stuff: asking an LLM to "generate test code for X asserting A, B, and C" and editing it into a proper test frees up mental space for more important stuff.
I wouldn't trust an LLM to generate any kind of business-logic-heavy code; instead I use it as a quite smart template/scaffold generator.
And the end result is you won't learn the details, so you will become more and more dependent on your magic piano.
I know the details; I've done the wading and thrashing through the docs and the books. I just can't recall the right incantation at that moment, and an LLM is more efficient than searching the web.
I still have the skills to search the web if the magic piano disappears.
Don't know why you're trying to come up with a situation that doesn't exist. What's your point, exactly, against this quite narrow use case?
Thanks for explaining my intent, you nailed it.
It is quite remarkable that we are already at the stage where saying "this AI is about as competent as an inexperienced college graduate" constitutes criticism. It is entirely proper for people to be engaging sceptically with LLMs at their current level of capability, but I think we should also keep in mind the astonishingly rapid growth rate in their performance. LLMs were a toy two years ago, they're now a useful if flawed colleague, but what can we expect two years from now?
I mean, 2 years ago they were at about the same place; there's been very little practical gain since GPT-4 in my opinion. No matter the model, the fundamental failure cases have remained the same.
I disagree; context size alone has exploded from 8k to 200k now, and that makes a huge difference. LLMs have also progressed significantly on many other metrics: code quality, understanding, etc. The recent reasoning models have upped the ante further, especially when combined with models that are good at editing code.
Reasoning models like o1 or QwQ absolutely destroy 4o in coding, let alone GPT-4 circa 2022.
Minor correction: GPT-4 was announced on March 14, 2023, less than two years ago. I don’t remember how much LLMs had been discussed as coding assistants before then, but it was Greg Brockman’s demonstration of using it to write code that first brought that capability to my attention:
https://www.youtube.com/live/outcGtbnMuQ?si=oTMA02ns_BJDRS4c...
Advances since then have indeed been remarkable.
I used Claude to help me build a side project in 4 hours that I would never have built otherwise. Essentially, it's a morphing wavetable oscillator in React (https://waves.tashian.com).
Six months ago, I tried building this app with ChatGPT and got nowhere fast.
Building it with Claude required gluing together a few things that I didn't know much about: JavaScript audio processing, drawing on a JavaScript canvas, an algorithm for bilinear interpolation.
I don't write JavaScript often. But I know how to program and I understand what I'm looking at. The project came together easily and the creative momentum of it felt great to me. The most amazing moment was when I reported a bug—I told Claude that the audio was stuttering whenever I moved the controls—and it figured out that we needed to use an AudioWorklet thread instead of trying to play the audio directly from the React component. I had never even heard of AudioWorklet. Claude refactored my code to use the AudioWorklet, and the stutter disappeared.
I wouldn't have built this without Claude, because I didn't need it to exist that badly. Claude reduced the creative inertia just enough for me to get it done.
What was your workflow for doing that? Just going back and forth in a chat, or a more integrated experience in a dedicated editor?
Just copy/paste from the chat window. I kept running into token limits. I came away from it wanting a much better workflow.
That's the next step for me in learning AI... playing with different integrated editor tools.
I have used LLMs as a tool and I start to "give up" working with them after a few tries. They excel at simple tasks, boilerplate, or scripts, but for larger programs you really have to know exactly what you want to do.
I do see the LLMs ingesting more and more documentation and content, and they are improving at giving me right answers. Almost two years ago I don't believe they had every Python package indexed; now they appear to have at least the documentation or source code indexed.
The trouble is the only reliable use-case LLMs actually seem good at is "augmented search engine". Any attempts at coding with them just end up feeling like trying to code via a worse interface.
So it's handy to get a quick list of "all packages which do X", but it's worse than useless to have it speculate as to which one to use or why, because of the hallucination problem.
Yes, it does work as an augmented search engine, but it also outputs working code; you just have to prompt it better. If the code is not that complex, like a simple endpoint, you just have to understand exactly what you want.
I've had a similar experience: shipping new features at incredible speed, then wasting a ton of time going down the wrong track trying to debug something because the LLM gave me a confidently wrong solution.
Well that's kind of on you for not noticing that it was the wrong solution, isn't it?
Sometimes the solution is 99% correct but the other 1% is so subtly wrong that it both doesn't work and is a debugging hell.
Welcome to programming.
I think what the parent post describes has happened to everybody, and if it hasn't, it will.
The line between being actually more productive and just being "pretend productive" using large language models is something that we all haven't completely figured out yet.
often it's something you casually overlook, some minor implementation detail that you didn't give much thought to that ends up being a huge mess later on, IME
Ya, but you kind of get painted into a corner sometimes. And the sunk cost fallacy kicks in.
It's on you however for not understanding the greater point?
Seems like LLMs would be well suited for test-driven development. A human writes tests and the LLM can generate code passing all tests, ending with a solution that meets the human's expectations.
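For concreteness, a minimal sketch of that split: the human writes the test, and the LLM is only asked for an implementation that makes it pass (the `mylib.slugify` module here is hypothetical):

```python
# Human-written spec-as-test; the LLM is asked to supply slugify() to make it pass.
import pytest

from mylib import slugify  # hypothetical module the LLM is asked to write


@pytest.mark.parametrize("raw, expected", [
    ("Hello, World!", "hello-world"),
    ("  spaces   everywhere ", "spaces-everywhere"),
    ("already-a-slug", "already-a-slug"),
])
def test_slugify(raw, expected):
    assert slugify(raw) == expected
```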
This is more or less how I use LLMs right now. They’re fantastic at the plumbing, so that I can focus on the important part - the business and domain logic.
I disagree, because you're only considering the "get code to make the test pass" part. Refactoring, refining, and simplifying are critical, and I've yet to see this applied well. (I've also yet to see the former applied usably well either, despite "write tests, generate code" being an early direction.)
One strategy I've been experimenting with is maintaining a 'spec' document, outlining all features and relevant technical notes about a project. I include the spec with all relevant source files in my prompt before asking the LLM to implement a new change or feature. This way it doesn't have to do as much guessing as to what my code is doing, and I can avoid relying on long-running conversations to maintain context. Instead, for each big change I include an up-to-date spec and all of the relevant source files. I update the spec to reflect the current state of the project as changes are made to give the LLM context about my project (this also doubles as documentation).
I use an NPM script to automate concatenating the spec + source files + prompt, which I then copy/paste to o1. So far this has been working somewhat reliably for the early stages of a project but has diminishing returns.
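For illustration, the concatenation step could look roughly like this; this is a Python sketch of the idea rather than the actual NPM script, and the paths and filenames are placeholders:

```python
# Concatenate spec + source files + task prompt into one blob to paste into o1.
from pathlib import Path

SPEC = Path("SPEC.md")
SOURCES = [Path("src/app.py"), Path("src/models.py")]  # whatever is relevant


def build_prompt(task: str) -> str:
    parts = [f"# Project spec\n{SPEC.read_text()}"]
    for src in SOURCES:
        parts.append(f"# File: {src}\n{src.read_text()}")
    parts.append(f"# Task\n{task}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Add pagination to the /items endpoint."))
```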
You're describing functionality that's built into Aider. You might want to try it out.
Aider also has a copy/paste mode to use web UI interfaces/subscriptions instead of APIs.
I definitely use and update my CONVENTIONS.md files and started adding a second specification file for new projects. This + architect + "can your suggestion be improved, or is there a better way?" has gotten me pretty far.
Didn't know about Aider - going to give that a try, thanks!
I ask this question without a hint of tone or sarcasm. You said: "*it’s a junior dev faking competence. Trust it at your own risk.*" My question is simply: "wouldn't you expect to personally be able to tell that a human junior dev was faking competence?" Why should it be different with the LLM?
Obviously, it depends on context. When talking to someone live you can pick up on subtle hints such as tone of voice, or where they look, or how they gesticulate, or a myriad other signals which give you a hint to their knowledge gaps. If you're communicating via text, the signals change. Furthermore, as you interact with people more often you understand them better and refine your understanding of them. LLMs always forget and “reset” and are in flux. They aren’t as consistent. Plus, they don’t grow with you and pick up on your signals and wants.
It’s incredibly worrying that it needs to be explained again and again that LLMs are different from people, do not behave like people, and should not be compared to people or interacted like people, because they are not people.
Interestingly your description of social cues you expect to pick up on are the exact sort of social cues I struggle with. If someone says something, generally speaking I expect it to be true unless there is an issue with it that suggests otherwise.
I suppose the wide range of negative and positive experiences people seem to have working with LLMs is related to the wide range of expectations people have for their interactions in general.
Not instantly. You’d give the human junior dev the benefit of the doubt at first. But when it becomes clear that the junior dev is faking competence all the time (that might take longer than the four days in TFA — yes I know it’s not exactly comparable, just saying) and won’t stop with that and start being honest instead, you’d eventually let them go, because that’s no way to work with someone.
I’ve been able to do far more complex things with ESP32s and RPis in an evening without knowing the first thing about Python or C++.
I can also tell when it’s stuck in some kind of context swamp and won’t be any more help, because it will just keep making the same stupid mistakes over and over and generally forgetting past instructions.
At that point I take the last working code and paste it into a new chat.
Most LLMs default to being sycophantic yes-men, but a custom prompt can help mitigate that.
I have a custom prompt that instructs gpt4o to get aggressive about attacking anything I say (and, importantly, anything it says).
Here's my result for the same question:
https://chatgpt.com/share/67984aa9-1608-8012-be93-a77728ab8e...
As opposed to not trusting an LLM, and ending up on day 4 of an afternoon project? :P
I've been doing that since way before LLMs were a thing.
Perhaps being a PM for several years has helped; I’ve had great success speeding up my programming workflows by prompting Claude with very specific, well-defined tasks.
Like many others are saying, you need to be in the driver's seat and in control. The LLM is not going to fully complete your objectives for you, but it will speed you up when provided with enough context, especially on mundane boilerplate tasks.
I think the key to LLMs being useful is knowing how to prompt with enough context to get a useful output, and knowing what context is not important so the output doesn’t lead you in the wrong direction.
Funny enough, I posted an article I wrote here yesterday with the same sort of thesis. Different technology (mine was Docker), but the same idea of an LLM leading me astray and causing a lot of frustration.
Your plan was to use USB, but to me it looks like you're pretty much just using serial via USB. That's completely fine of course! One cheap way to tackle your problem is to use a version of printf with locking, which is likely available in many microcontroller SDKs (it's also slow). (Or you could add your own mutex.)
USB-CDC is cooler than that, you can make the Pico identify as more than just one device. E.g. https://github.com/Noltari/pico-uart-bridge identifies as two devices (so you get /dev/ttyACM0 and /dev/ttyACM1). So you could have logs on one and image transfers on another. I don't think you're limited to just two, but I haven't looked into it too far.
You can of course also use other USB protocols. For example, you could have the Pico present itself as a mass-storage device or a USB camera, etc. You're just limited by the relatively slow speed of USB1.1. (Though the Pico doesn't exactly have a lot of memory, so even USB1.1 will saturate all your RAM in less than a second.)
Really enjoyed reading your article. Haven’t laughed as much reading a tech article in quite some time. You should consider doing some YouTube videos, as your communication style is very humble and entertaining.
Made me wanna join in your garage and help out with the project :)
FWIW, I fed in the same problematic prompt to all the current ChatGPT models and even the legacy/mini models enumerated a bunch of pitfalls and considerations. I wonder why/how it managed to tell the author everything was perfect? A weird one-off occurrence?
Counterpoint: I'm on day 26 of an afternoon project I never would have attempted on my own, and I'm going to release it as a major feature.
Cursor & Claude got the boilerplate set up, which was half the mental barrier. Then they acted as thought partners as I tried out various implementations. In the end, I came up with the algorithm to make the thing performant, and now I'm hand-coding all the shader code—but they helped me think through what needed to be done.
My take is: LLMs are best at helping you code at the edge of your capabilities, where you still have enough knowledge to know when they're going wrong. But they'll help you push that edge forward.
I spend a good portion of my time asking people to fix their LLM code now at work. It has made code reviews tiring. And it has increased pairing time significantly, making it a less fun activity.
When workmanship doesn't matter, then the ship is already sinking.
It has been my experience that one code clown can poison a project with dozens of reasonably talented engineers active, i.e. clowns often go through the project smearing bad kludges over acceptable standards to make their commit frequency appear to mean something.
This is why most developers secretly dream of being plumbers. Good luck, =3
Today, I needed to write a proxy[0] that wraps an object and logs all method calls recursively.
I asked Claude to write the initial version. It came up with a complicated class-based solution. I spent more than 30 minutes trying to get a good abstraction out of it, copy-pasting TypeScript errors and applying the fixes it suggested without thinking much.
In the end, I gave up and wrote what I wanted myself in 5 minutes.
[0] https://github.com/cloudycotton/browser-operator/blob/main/s...
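For what it's worth, the simple hand-rolled version of this kind of recursive method-logging proxy tends to fit in a dozen lines. A rough sketch (my own illustration, not the code from the linked repo; names are hypothetical):

    // Wraps an object so every method call (including on nested objects)
    // is logged with its path and arguments before being forwarded.
    function withLogging<T extends object>(target: T, path = "root"): T {
      return new Proxy(target, {
        get(obj, prop, receiver) {
          const value = Reflect.get(obj, prop, receiver);
          if (typeof value === "function") {
            return (...args: unknown[]) => {
              console.log(`${path}.${String(prop)}`, args);
              return value.apply(obj, args);
            };
          }
          if (value !== null && typeof value === "object") {
            // Recurse so nested objects get the same treatment.
            return withLogging(value, `${path}.${String(prop)}`);
          }
          return value;
        },
      });
    }

Usage is just `const wrapped = withLogging(realObject);` and every call shows up in the console.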
Would you have written it in five minutes had you not just spent 30 minutes ruling out wrong solutions?
Yes. I had already written out how to do it technically. Claude was able to come up with a solution that worked on the second attempt; the problem was that it didn’t play nicely with TypeScript, and the approach overcomplicated everything that depended on this class.
Nemo, are you using a self-hosted install or .com?
If .com, email me (it's in my profile) and I can see if there is a reason your account is getting so heavily captcha'd.
> junior dev faking competence
A bit of a tangent, but has there been any discussion of how junior devs in the future are ever going to get past that stage and become senior-dev calibre if companies can replace the junior devs with AIs? Or is the thinking that we'll be fine until all the current senior devs die off, and by then AI will be able to replace them too, so we won't need anyone?
1. CS/Eng degree 2. ??? 3. Senior dev!
It's definitely as good as a junior dev at a lot of tasks, but you always have to be in the driver's seat. I don't ask junior devs to write functions one at a time. I give them a task, they ping me if they need something, but otherwise I hope I don't hear from them again for a while.
I don't see AI replacing that. AI is a tool with the instant Q&A intelligence of a junior dev but it's not actually doing the job of a junior dev. That's a subtle distinction.
Training will adapt to the widespread use of AI coding assistants if they are that universally useful, and people will come into the market as junior AI wranglers, with skillsets stronger than current junior devs in some areas but weaker in others; current seniors will grumble about holes in their knowledge, but that's been the case with the generational changes in software development as the common problems people face at different levels have shifted over time. The details are new, but the process isn't.
Not if the goal is to replace the junior devs with AIs -- people won't be "coming into the market" because they won't be needed.
Companies are not saving money by paying for AI tools if they continue to hire the same number of people. The only way it makes financial sense, and for the enormous amounts of money being invested into AI to reap profits, is if companies are able to reduce the cost of labor. First, they only need 75% of the junior devs they have now, then 50%, then 25%.
> Not if the goal is to replace the junior devs with AIs -- people won't be "coming into the market" because they won't be needed.
It won't happen all at once, and as tasks done by current juniors are incrementally taken over by AI, the expected entry skillset will evolve in line with those changes. There will always be junior people in the field, but their expected knowledgebase and tasks will evolve, and even if 100% of the work currently done by juniors is eventually AI-ified, there will still be juniors, they just will be doing completely different things, and going through a completely different learning process to get there.
> Companies are not saving money by paying for AI tools if they continue to hire the same number of people.
Companies which have a fixed lump of tech work (in practice, none, actually) will save money because they will hire fewer total workers as output per worker increases, but they will still have people who are newer and more experienced within that set.
More realistic companies that either make money with tech work or that apply internal effort to tech as long as it has net positive utility may actually end up spending more on tech, because each dollar spent gives more results. This still saves money (or makes more money), but the savings (where it is about savings, and not revenue) will be in the areas tech is applied to, not tech itself.
You are running the show, and the LLM can act in many other roles to help you. On obscure or confusing topics, the LLM will likely be as bad at solving them as any employee. Give it a plan. Follow up and make sure it’s on track.
After many experiences similar to OP's, what I learned is that you can’t outsource expertise to an LLM. Don’t ask it for advice; ask it to teach you, so you can make decisions like this on your own. Preferably ground questions with excerpts from human-made documents. It seems to make fewer mistakes when explaining things, and those mistakes are more noticeable when they do happen.
One of the areas where I've struggled to get effective use out of the LLMs is UI/UX. That isn't my primary area of expertise (backend), so it definitely could be operator error here, but I use tools like v0.dev and just can't quite get it to do what I need it to do. Anybody have any tools, workflows, or suggestions for this?
> "From her perspective, I order doordash and turn into a degen who is unfit to father. From my perspective, I get to enjoy my favorite place and just tinker or play games or do whatever. These are the days I get to play mad scientist and feel most like myself."
Most demeaning and depressingly toxic thing I've read today...
That’s a fun writeup.
I’ve come to a similar conclusion: for now at least, it’s best applied at a fairly granular level. “Make me a red brick wall there” rather than “hey architect, make me a house.”
I do think OP tried a bit too much new stuff in one go, though. USB plus Zig is quite a bit more ambitious than the traditional hello world in a new language.
I have never found a use for LLMs in programming because I can find the (correct) answer much more easily with a search. Perhaps the search engines just suck so hard these days that people resort to LLMs. I use Kagi and GitHub to search, and the results are much better.
LLM == WGCM == Wild Goose Chase Model
I've found them to be quite a time saver, within limits. The blog post seemed scattered and disorganized to me, and the author admits having no experience with using LLMs to this end, so perhaps the problem lies behind their eyes.
I'm developing an intuition for how and what to ask so that the LLM's answer is helpful. Once you start spinning your wheels, clear the context, copy what you need, and start over.
OP could have written the firmware in zig, too!
https://github.com/ZigEmbeddedGroup/microzig
> My wife and I have a deal where 1 day a month, She takes the kiddo and I am absolved of all responsibilities. I get a full day to lock in and build projects.
I love it!
> As they say, I was trying to get a few birds stoned at once.
Imma gonna have to work this into a convo some day. Just to see the “wait, what??” expressions on people’s faces.
I used very similar hardware to accomplish a very similar project (Notifications on a round screen) and the LLM was great for everything except UX.
A few mitigation ideas
There's not much actual LLM-generated text in this post to go by, but it seems like each of the tokens generated by the LLM would be reasonable to have high probability. It sounds like the developer here thought that the sequence of tokens then carried meaning, where instead any possible meaning came from the reading. I wonder if this developer would be as irritated by the inaccuracies if they had cast sticks onto the ground to manage their stock portfolio and found the prophecy's "meaning" to be plausible but inaccurate.
The OP misunderstands (perhaps deliberately or for humorous effect) what a co-pilot is. This is telling:
"I learned that I need to stay firmly in the driver’s seat when tackling new tech."
Er, that's pretty much what a pilot is supposed to do! You can't (as yet) just give an AI free rein over your codebase and expect to come back later that day to discover a fully finished implementation, unless maybe your prompt was "Make a snake game in Python". A pilot would be supervising their co-pilot at all times.
Comparing AIs to junior devs is getting tiresome. AIs like Claude and newer versions of ChatGPT have incredible knowledge bases. Yes, they do slip up, especially with esoteric matters where there are few authoritative (or several conflicting) sources, but the breadth of knowledge in and of itself is very valuable. As an anecdote, neither Claude nor ChatGPT were able to accurately answer a question I had about file operation flags yesterday, but when I said to ChatGPT that its answer wasn't correct, it apologised and said the Raymond Chen article it had sourced wasn't super clear about the particular combination I'd asked about. That's like having your own research assistant, not a headstrong overconfident junior dev. Yes, they make mistakes, but at least now they'll admit to them. This is a long way from a year or two ago.
In conclusion: don't use an AI as one of your primary sources of information for technology you're new to, especially if you're not double-checking its answers like a good pilot.
Context matters a lot, copy-pasting snippets to a webpage is _way_ less effective than Cursor/Windsurf.
For what it’s worth, my afternoon projects tend to take over four days even if no LLM is involved.
Laughed aloud at this one. Yup, same here.
This is like watching a carpenter blame their hammer because they didn’t measure twice. AI is a tool, it's like a power tool for a tradesperson: it'll amplify your skills, but if you let it steer the whole project? You’ll end up with a pile of bent nails.
LLMs are jittery apprentices. They'll hallucinate measurements, over-sand perfectly good code, or spin you in circles for hours. I’ve been there, back in the GPT-4 days especially; nothing stings like realising you wasted a day debugging the AI’s creative solution to a problem you could've solved in 20 minutes.
When you treat AI like a toolbelt, not a replacement for your own brain? Magic. It’s killer at grunt work like explaining regex, scaffolding boilerplate, or untangling JWT auth spaghetti. You still gotta hold the blueprint. AI ain't some magic wand: it’s a nail gun. Point it wrong, and you’ll spend four days prying out mistakes.
Sucks it cost you time, but hey, now you know never to let the tool work you. Hopefully it's a lesson OP learns once and doesn't let it sour their experience with AI, because when utilised properly you can really get things done, even if it's just the tedious/boring stuff or things you'd otherwise spend time Google-bashing, reading docs, or hunting down on StackOverflow.
> AI is great for generating ideas or drafting code, but it doesn’t understand. It’s like giving a junior developer a chainsaw instead of a scalpel—it might finish the job, but you’ll spend twice as long cleaning up the mess.
For anything remotely complex, this is dead on. I use various models daily to help with coding, and more often than not, I have to just DIY it or start brand new chats (because the original context got overwhelmed and started hallucinating).
This is why it's incredibly frustrating to see VCs and AI founders straight-up gaslighting people about what this stuff can (or will) do. They're trying to push this as a "work killer," but really, it's going to be some version of the opposite: a mess creator that necessitates human intervention.
Where we're at is amazing, but we've got a loooong way to go before we can be on hover crafts sipping sodas Wall-E style.
I mean....
No design. Hardware & software. 2 different platforms. A new language. Zig. Unrealistic time expectations.
A senior SWE would've still tanked this, just in different ways.
Personally, I'd still consider it a valuable experiment, because the lessons learned are really valuable ones. Enjoy round 2 :)
The junior dev faking competence is useful but needs a lot of supervision (unlike with a real junior dev, we don't know if this one will get better).
I am frankly tired of seeing this kind of post on HN. I feel like the population of programmers is bifurcating into those who are committed to mastering these tools, learning to work around their limitations and working to leverage their strengths… and those who are committed to complaining about how they aren’t already perfect Culture Ship Minds.
We get it. They’re not superintelligent at everything yet. They couldn’t infer what you must’ve really meant in your heart from your initial unskillful prompt. They couldn’t foresee every possible bug and edge case from the first moment of conceptualizing the design, a flaw which I’m sure you don’t have.
The thing that pushes me over the line into ranting territory is that computer programmers, of all people, should know that computers do what you tell them to.
> computer programmers, of all people, should know that computers do what you tell them to.
Right. The problem isn't that the tool isn't perfect, it's that you get a lot of excitable people with incentives pretending that it is or will soon be perfect (while simultaneously scaring non-technical people into thinking they'll be replaced with a chat bot soon).
There are certainly luddite types who are outright rejecting these tools, but if you have hands-on, daily experience, you can see the forest for the trees. You quickly realize that all of the "omg this thing is sentient" or "we can't let what we've got into the world, it's too dangerous" fodder like the Google panic memo are just covert marketing.
>The thing that pushes me over the line into ranting territory is that computer programmers, of all people, should know that computers do what you tell them to.
Are you claiming LLMs function like computer program instructions? They clearly don't operate like that at all.
They are closer to being deterministic machines that comply exactly with your instructions, for better or worse, than they are to magical pixies that guess what you must’ve actually meant. The implicit expectation demonstrated by many in the “loudly disappointed in LLMs” contingent seems to be that LLMs should just know what you meant; they then blame the models for not correctly guessing it and delivering it.
I think LLMs have uncovered what we have always known in this industry: that people are, by default, bad at communicating their intent clearly and unambiguously.
If you express your intent to an LLM with sufficient clarity and disambiguation, it will rarely screw up. Often, we don’t have time to do this, and instead we aim for the sweet spot of sufficient but not exhaustive clarity. This can be fine if you are experienced with that particular LLM and you have a good feel for where its sweet spot actually is. If you miss that target, though, the LLM will not correctly infer your intended subtext. This is one of the things that requires experience. In fact, even the “same” LLM will change in its behavior and capabilities as it undergoes fine tuning. Sometimes it will even get worse at certain things.
All of this is to say, of course, you’re right that it’s not a compiler. But I think people fail in their application of LLMs for much the same reason that novice coders fail to get compilers to guess what they intended.
> They are closer to being deterministic machines that comply exactly with your instructions, for better or worse, than they are to magical pixies that guess what you must’ve actually meant.
If those are your only two reference points, yes they're closer to the former.
But the biggest problem is how much "pixie that does something you neither wanted nor asked for" gets mixed in. And I think a lot of the complaints you're saying are about lack of mind reading are actually about that problem instead.
What part of the comment makes you think they are claiming that?
The part I quoted? I don't really see how to interpret it any other way.
Thanks, I’m not thinking clearly at all. I had it in my head that programming instructions are not the only way that a computer might do as it’s told. For example, if we delete a folder in a UI by accident, we shouldn’t be surprised the folder is gone. But it doesn’t really quite fit the parent analogy. Sorry about that.
Nobody is complaining that LLMs aren't perfect Culture minds. People disagree with the premise that they are useful tools given their current capabilities. Your portrayal of those with whom you disagree is such a strawman that it might as well be set to a soy-vs-wojak meme.
They clearly are useful tools given their current capabilities. It just depends on what you’re using them for. You don’t use a screwdriver to drive nails, and you don’t go to HardwareNews to complain when your screwdriver isn’t working as a hammer.
I’m currently using them to port a client-side API SDK into multiple languages. This would be a pain in the ass time consuming task but is a breeze with LLMs because the exact behavior I want is clearly defined and relatively deterministic, and it’s also straightforward to test that I’m getting what I intend. The LLM thus gets done in 3 days what would take me 3 weeks (or more) to do by hand.
If the complaint is that it can’t do X, where X is something that would clearly require full AGI and likely true superintelligence (in this case, expecting instantaneous, correct code that solves novel problems on the first try), then I have to insist that people are implicitly expecting Claude to be a Culture Ship Mind. They just don’t realize that what they’re asking for is hard, which is itself a psychologically interesting fact, I suppose.
"People disagree with the premise that they are useful tools given their current capabilities."
I will argue the opposite of that forever. They're very evidently useful, if you take the time to learn how to apply them.
Use uBlock Origin to block posts with the keywords you don't want in the title, like AI or LLM.
> I am frankly tired of seeing this kind of post on HN.
You've been here since 2016 and this is the kind of posting that finally gets to you? How in the world have you avoided all the shitposts in the last decade? What is your secret?
It’s the fact that it’s the same thing over and over. The 100th case of “Well I had a bad experience using LLM for coding!” is not interesting.
There are about 3-5 (5 being a strict upper bound in my experience over the last 17 years) takes on any given subject on HN that are regurgitated without thinking over, and over, and over again. I think an AI can be enlisted to basically catalogue the HN responses and package them so we don't have to discuss the same shit not for the 100th time, but for the 1000th time.
I doubt one experience, positive or negative, counts for them all.
Learning how to build or create with a new kind of word processor is a skill unto itself.
the "AI lies" takeaway is way off for those actually using these tools. Calling it a "junior dev faking competence" is catchy, but misses the point. We're not expecting a co-pilot, it's a tool, a super-powered intern that needs direction. The spaghetti code mess wasn't AI "lying", it was a lack of control and proper prompting.
Experienced folks aren't surprised by this. LLMs are fast for boilerplate, research, and exploring ideas, but they're not autonomous coders. The key is you staying in charge: detailed prompts, critical code review, iterative refinement. Going back to web interfaces and manual pasting because editor integration felt "too easy" is a massive overcorrection. It's like ditching cars for walking after one fender bender.
Ultimately, this wasn't an AI failure, it was an inexperienced user expecting too much, too fast. The "lessons learned" are valid, but not AI-specific. For those who use LLMs effectively, they're force multipliers, not replacements. Don't blame the tool for user error. Learn to drive it properly.
> We're not expecting a co-pilot
Microsoft’s offering is literally called “copilot”. That is exactly what they’re marketing it as.
“Experienced folks” in this case means folks who’ve used LLMs enough to somewhat understand how to “feed them” in ways that make the tools generate productive output.
Learning to properly prompt an LLM to get a net gain in value is a skill in and of itself.
> TLDR - AI isn’t a co-pilot; it’s a junior dev faking competence. Trust it at your own risk.
A junior dev faking competence while plagiarizing like crazy.
The plagiarizing part is why the junior dev from hell might not get fired: laundering open source copyrights can have beancounter alignment.