I've been using Claude Code for a couple of days

458 points by tosh 4 months ago

develoopest 4 months ago

I must be the dumbest "prompt engineer" ever, each time I ask an AI to fix or even worse, create something from scratch it rarely returns the right answer and when asked for modification it will struggle even more.

All the incredible performance and success stories always come from these Twitter posts, I do find value in asking simple but tedious task like a small refactor or generate commands, but this "AI takes the wheel level" does not feel real.

abxyz 4 months ago

I think it's probably the difference between "code" and "programming". An LLM can produce code and if you're willing to surrender to the LLMs version of whatever it is you ask for, then you can have a great and productive time. If you're opinionated about programming, LLMs fall short. Most people (software engineers, developers, whatever) are not "programmers" they're "coders" which is why they have a positive impression of LLMs: they produce code, LLMs produce code... so LLMs can do a lot of their work for them.
Coders used to be more productive by using libraries (e.g: don't write your own function for finding the intersection of arrays, use intersection from Lodash) whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").
If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criteria is "good enough" not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.
[1] https://en.wikipedia.org/wiki/Npm_left-pad_incident
- miki123211 4 months ago
  
  Oh, this captures my experience perfectly.
  I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.
  I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.
  To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.
  I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and frequently make mistakes that we need to point out in CR. It's not a tool which will do exactly what we would.
  - biker142541 4 months ago
    
    My experience so far on Claude 3.7 has been over-engineered solutions that are brittle. Sometimes they work, but usually not precisely the way I prompted it to, and often attempts to modify them require more refactoring due to the unnecessary complexity.
    This has been the case so far in both js for web (svelte, react) and python automation.
    I feel like 3.5 generally came up "short" more often than 3.7, but in practical usage it meant I could more easily modify and build on top of. 3.7 has led to a lot of deconstructing, reprompting, starting over.
- jmull 4 months ago
  
  All I really care about is the end result and, so far, LLMs are nice for code completion, but basically useless for anything else.
  They write as much code as you want, and it often sorta works, but it’s a bug filled mess. It’s painstaking work to fix everything, on part with writing it yourself. Now, you can just leave it as-is, but what’s the use of releasing software that crappy?
  I suppose it’s a revolution for that in-house crapware company IT groups create and foist on everyone who works there. But the software isn’t better, it just takes a day rather than 6 months (or 2 years or 5 years) to create. Come to think of it, it may not be useful for that either… I think the end-purpose is probably some kind of brag for the IT manger/exec, and once people realize how little effort is involved it won’t serve that purpose.
  - barbazoo 4 months ago
    
    I love the subtle mistakes that get introduced in strings for example that then take me all the time I saved to fix.
    
    dabinat 4 months ago
    
    Do you have an example of this?
    
    the_lonely_time 4 months ago
    
    Can’t remember account login so created a new account to respond.
    I recently used Claude with something along the lines of “Ruby on rails 8, Hotwire, stimulus, turbo, show me how to do client side validations that don’t require a page refresh”
    I am new to prompt engineering so feel free to critique. Anyway, it generated a stimulus controller called validations_controller.js and then proceeded to print out all of the remaining connected files but in all of them it referred to the string “validation” not “validations”. The solution it provided worked great and did exactly what I wanted to (though I expected a turbo frame based solution not a stimulus solution, but whatever it did what I asked it to do) with the exception of having to change all of the places where it put the string “validation” where it needed to put “validations” to match the name it used in the provided stimulus controller.
    
    sagarpatil 4 months ago
    
    Say you hire a developer and ask him to directly debug an issue by simply skimming through the codebase, do you think he can complete this task say in 5-10 minutes? No, right? In claude code(CC), do the following: 1. /init which acts as a project guide. 2. Ask it to summarize the project as save it as summary.md 3. The prompt needs to be clear and detailed. Here’s an example: https://imgur.com/a/RJyp3f9
    
    Chaosvex 4 months ago
    
    I remember reading the origin article for that prompt example and laughing at how long it likely took to write that essay when typing "hikes near San Francisco" into your favoured search engine will do the same thing, minus the hallucinations.
    
    rcpt 4 months ago
    
    You can ask AI to help with your prompt
    
    dingnuts 4 months ago
    
    are you in the habit of saving bad LLM output to later reference in future Internet disputes?
    
    KingMob 4 months ago
    
    ??? A zillion LLM tools maintain history for you automatically. As long as you remember what the chat was about, it's only a search away.
  - fallinditch 4 months ago
    
    Have you tried using Cursor rules? [1]
    Creating a standard library stdlib with many (potentially thousands) of rules, and then iteratively adding to and amending the rules as you go, is one of the best practices for successful AI coding.
    [1] https://docs.cursor.com/context/rules-for-ai
    
    jmull 4 months ago
    
    > …many (potentially thousands) of rules, and then iteratively adding to and amending the rules as you go…
    Is this an especially better (easier, more efficient) route to a working, quality app/system than conventional programming?
    I’m skeptical if the answer to the way to achieve 10x results is 10x more effort.
    
    fallinditch 4 months ago
    
    It's such a fast moving space, perhaps the need for 'rules' is just a temporary thing, but right now the rules will help you to achieve more predictable results and higher quality code.
    You could easily end up with a lot of rules if you are working with a reasonably large codebase.
    And as you work on your code, every time you have to deal with an issue of the code generation you ask Cursor to create a new rule so that next time it does it correctly.
    In terms of AI programming vs conventional programming, the writing's on the wall: AI assistance is only getting better and now is a good time to jump on the train. Knowing how to program and configure your AI assistants and tools is now a key software engineering skill.
    
    habinero 4 months ago
    
    Or it's just a bubble and after diminishing returns they'll go in the bin with all the blockchain startups lol
    
    ZeroTalent 4 months ago
    
    10x more effort once, 10x faster programming forever. Also once you got examples of the rules files, LLM can write most of them for next projects.
    
    crucialfelix 4 months ago
    
    I think specifying rules could be very useful in the same way as types, documentation, coding styles, advanced linting, semgrep etc.
    We could use it for LLM driven coding style linting. Generate PRs for refactoring. Business logic bug detector
    Also, you can just tell copilot to write rules for you.
    
    UltraSane 4 months ago
    
    at that point aren't you just replacing regular programming with creating the thousands of rules? I suppose the rules are reusable so it might be a form of meta-programming or advanced codegen
    
    nyarlathotep_ 4 months ago
    
    People endlessly creating "rules" for their proompts is the new version of constantly tweaking Vim or Emacs configurations
    
    UltraSane 4 months ago
    
    Born too late to explore the world.
    Born to early to explore the galaxy.
    Born at the right time to write endless proompts for LLMs.
- bikamonki 4 months ago
  
  Right on point. The same principle applies when deciding whether to use a framework or not. Coders often marvel at the speed with which they can build something using a framework they don’t fully understand. However, a true programmer seeks to know and comprehend what’s happening under the hood.
  - InvertedRhodium 4 months ago
    
    I'll preface this with the fact I agree that there is a difference between using a framework and being curious enough to dig into the details I think you're veering into No True Scotsman territory here.
    IMO, the vast majority of programmers wouldn't meet the definition you've put forward here. I don't know many that dig into the operating system, platform, or hardware all that much, though I work in streaming media so that might just be an industry bias.
    
    vbezhenar 4 months ago
    
    I'm the one who feels uncomfortable when something is magical. I'm trying to dig whenever possible. I don't always have success. I know very little about Linux kernel internals, for example.
    That said, it's rarely pays out. Quite the opposite: I often spending lots of time digging unnecessarily. I'm not complaining, I am who I am and I don't want to change. Digging into internals makes me happier and satisfied. And sometimes it is useful. So Linux kernel internals are on the roadmap and hopefully I'll dig into it some day.
    I agree that absolute majority of people I met are not of that kind. And probably they should not. Learning business side of the software is what makes someone really useful. I hate business side, I only love computer side.
    
    baruch 4 months ago
    
    I find that my understanding of different layers is usually not needed but is rarely important toy ability to solve thorny issues or figure out ways to do things or debug issues that others are not even trying.
    I'm working in areas of systems programming where such knowledge and willingness to go deeper into the stack is helpful, even if rarely so.
    I can't say I understand the kernel, I only scratch the surface of most of it and only dug deeper where I really needed to and only as deep as time allowed me and the need required though.
    
    jimbokun 4 months ago
    
    That just means they are not programmers, but coders as defined here.
- icedchai 4 months ago
  
  This aligns with my experience. I've seen LLMs produce "code" that the person requesting is unable to understand or debug. It usually almost works. It's possible the person writing the prompt didn't actually understand the problem, so they got a half baked solution as a result. Either way, they need to go to a human with more experience to figure it out.
  - hsuduebc2 4 months ago
    
    Tbh If I do not understand generated code perfectly, meaning it is using slightly something I do not know I usually spend approximately same time understanding generated code as writing it myself.
- beezlewax 4 months ago
  
  I'm waiting for artisan programming to become a thing.
  - discordance 4 months ago
    
    by 100% organic, free range and fair trade programmers
    
    djmips 4 months ago
    
    Replace programmers with 'intelligence' to contrast with artificial
  - pydry 4 months ago
    
    Artisanal code has been a thing for a long while.
    If we're the luddite artisans, LLMs seem to represent the knitting frames which replaced their higher quality work with vastly cheaper, far crappier merchandise. There is a historical rhyme here.
    
    ReptileMan 4 months ago
    
    You didn't had to spend time debugging a peace of cloth, and the cloth defects are obvious
    
    pydry 4 months ago
    
    There's a lot of code out there written for people who are far more concerned with cost and speed than quality - analogous to the "fast fashion" consumer segment.
    Ive worked on all sorts of code bases filled to the brim with bugs which end users just worked around or ignored or didnt even encounter. Pre-product market fit startups, boring crappy CRUD for routine admin, etc.
    It was horrible shit for end users and developers (me) but demand is very high. I expect demand from this segment will increase as LLMs drive the cost of supply to nearly zero.
    I wouldnt be surprised if high end software devs (e.g. >1 million hit/day webapps where quality is critical) barely do anything different while the demand for devs at the low end of the market craters.
    
    majormajor 4 months ago
    
    > I wouldnt be surprised if high end software devs (e.g. >1 million hit/day webapps where quality is critical) barely do anything different while the demand for devs at the low end of the market craters.
    Over the last few decades the prevalance of more and more frameworks to handle more and more of what used to be boilerplate that everyone had to do for GUI apps has not significantly impacted the bottom end of the job market.
    It's only, if anything, caused even higher demand for "just one more feature".
    When's the last time you worked on a product in that space where the users or product managers didn't have any pending feature requests? Didn't add new ideas to the backlog faster than the team could implement?
    Someone's gonna have to know just enough to make sure the LLM is producing shit that passes the (low) bar of adequateness. But that job will probably be even further away from the CS-knowing, performance-critical, correctness-matters roles than it is today. It's already a big gap, I just don't see the shit-shoveler job going away. The corporate world loves its shit shoveling.
  - LeftHandPlane 4 months ago
    
    Artisanal firmware is the future (or the past? or both?): https://www.youtube.com/watch?v=vBXsRC64hw4
  - jdmoreira 4 months ago
    
    From before people even knew what llms were: https://handmade.network
  - dr_dshiv 4 months ago
    
    Like, writing binaries directly? Is assembly code too much of an abstraction?
    
    swat535 4 months ago
    
    People stress about good system design because of maintainability. No one cares about binary code because that's just the end result. What matters is the code that generates it, as that’s what needs to be maintained.
    We have not yet reached a point where LLM generated code can also be maintained by LLMs and the tooling is not there. Once that happens, your argument will hold more weight. But for now, it doesn’t. Injecting unreadable, likely bug-ridden code into your application increases technical debt tenfold.
- someothherguyy 4 months ago
  
  > If you think about code as a means to an end, and focus on the end
  The problem with this is that you will never be able to modify the code in a meaningful way after it crosses a threshold, so either you'll have a prompt only modification ability, or you will just have to rewrite things from scratch.
  I wrote my first application ever (equivalent to a education CMS today) in the very early 2000s with barely any notion of programming fundamentals. It was probably a couple hundred thousand lines of code by the time I abandoned it.
  I wrote most of it in HTML, JS, ASP and SQL. I was in high school. I didn't know what common data structures were. I once asked a professor when I got into late high school "why arrays are necessary in loops".
  We called this cookbook coding back in the day.
  I was pretty much laughed at when I finally showed people my code, even though it was a completely functional application. I would say an LLM probably can do better, but it really doesn't seem like something we should be chasing.
- oxag3n 4 months ago
  
  I tried LLMs for my postgraduate "programming" tasks to create lower level data structures and algorithms that are possible to write a detailed requirements for - they failed miserably. When I pushed in certain directions, I've got student level replies like "collision probability is so low we can just ignore it", while same LLM accurately estimated that in my dataset there will be collisions.
  And I don't believe until I see LLMs can use real debugger to figure out a root cause for a sophisticated, cascading bug.
- mrits 4 months ago
  
  This surrendering to the LLM has been going around a lot lately. I can only guess it is from people that haven't tried it very much themselves but love to repeat experiences from other people.
- bodhi_mind 4 months ago
  
  I’m a software developer by trade but also program art creation tools as a hobby. Funny thing is, at work, code is definitely a means to an end. But when I’m doing it for an art project, I think of the code as part of the art :) the process of programming and coming up with the code is ultimately a part of the holistic artistic output. The two are not separate, just as the artists paint and brushes are also a part of the final work of a painting.
- roflyear 4 months ago
  
  > LLMs version of whatever it is you ask for, then you can have a great and productive time
  Sure, but man are there bugs.
- nbardy 4 months ago
  
  This is untrue.
  You can be over specified in your prompts and say exactly what types and algorithms you want if you’re opinionated.
  I often write giant page long specs to get exactly the code I want.
  It’s only 2x as fast as coding, but thinking in English is way better than coding.
- throwaway2037 4 months ago
  > Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?")
  I'm confused. Did they laugh, then still use it as a dependency? If not, did they reinvent the wheel or copy the 16 lines into their project? Right up until the day of the "NPM left-pad incident" tiny dependencies were working just fine.
  Also, if you cannot tell the difference between code written by an LLM or a human, what is the difference? This whole post is starting to feel like people with very strong (gaterkeeper'ish) views on hi-fi stereo equipment, coffee, wine, ... and programming. Or should I say "code-as-craft" <cringe>?
  - abxyz 4 months ago
    
    My comment isn't intended to pit "programmers" against "coders" or suggest that one is better than the other. I think the distinction is useful to help people understand why LLMs can be game-changing for some, and useless for others, because our attitudes towards programming/code can be so different.
    If you go through and read the left-pad posts here on hn, you'll find people at both extremes: some fiercely defend left-pad as useful and worthwhile and think writing left-pad yourself is dumb-as-hell, and then at the other end you'll find some fiercely deride using left-pad as absurd when they could just write the code themselves. Here's a good thread to start with: https://news.ycombinator.com/item?id=11348798
    Personally, I'd rather hire a "coder" than a "programmer" and consider myself more "coder" than "programmer" :)
- jv22222 4 months ago
  
  Thank you for eloquently saying what I've been trying hard to express.
- gitgud 4 months ago
  
  Interesting, but it seems ridiculous to disambiguate “Programmer” vs “Coder”.
  They’re synonymous words and mean the same thing right?
  Person who writes logic for machines
  - madmountaingoat 4 months ago
    
    The different mindsets exist, but I agree these are bad words to differentiate them. Back when I started in software in the 80s a common expression was: there are two types of programmers, nerds and hippies. The distinction falling along similar lines - nerds needed to taste the metal, while hippies were more interested in getting stuff done.
  - dpritchett 4 months ago
    
    There may never be a perfect taxonomy of programmer archetypes.
    I imagine most of us here can agree that some elevate the craft and ritual to great effect while others avoid such high-minded conceits in favor of shipping whatever hits the expected target this week.
    I’ve been both at different points in my career. Usually it’s a response to my environment.
  - jimbokun 4 months ago
    
    Who cares? Substitute whatever labels you prefer, but the distinction between the two groups I’d certainly real.
    
    throwaway743 4 months ago
    
    Obviously you do - enough to respond.
  - fullstackchris 4 months ago
    
    What about those two vs. the concept of "software engineering"? - there, the "code" or "program" is even _less_ important, just a tool in an ecosystem where you are asking bigger questions like "is this maintainable / testable / readable / etc.?" "is this the right language / framework for my org / others to use" and so on and so on. These questions and context quite literally represent billions of context tokens in LLM world and why I continually do not worry about 99% of the fear mongering of them replacing anybody who writes code.
  - baq 4 months ago
    
    Yeah this one is dumb.
    The real distinction is programmer vs software engineer, or IOW do you want to write code or solve problems?
BeetleB 4 months ago

Some hints for people stuck like this:
Consider using Aider. It's a great tool and cheaper to use than Code.
Look at Aiders LLM leaderboard to figure out which LLMs to use.
Use its architect mode (although you can get quite fast without it - I personally haven't needed it).
Work incrementally.
I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.
Don't debug on your dev branch.
Aider's auto committing is scary but really handy.
Limit your context to 25k.
Only add files that you think are necessary.
Combining the two: Don't have large files.
Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.
Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.
https://aider.chat/
https://aider.chat/docs/leaderboards/
- majormajor 4 months ago
  
  > I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again telling it the correct fix.
  how big/complex does the codebase have to be for this to be for you to actually save time compared to just using a debugger and fixing it yourself directly? (I'm assuming here that bugs in smaller codebases are that much easier for a human to identify quickly)
  - BeetleB 4 months ago
    
    So far I've used Aider for only a few projects - almost all where it starts from scratch. And virtually always for personal use - not work. As such, the focus on quality is not as high (i.e. there's no downside to me letting it run wild).
    So I hope you can understand it when I say: Why should I waste my brain cells debugging when I can just tell it to fix its own problems?
    Say you want something done (for personal use), and don't have the time to develop it yourself. Someone volunteers to write it for you. You run it, and it fails. Would you spend much time reading code someone else wrote to find the bug, or just go back to the guy with the error?
    Yes, I have had to debug a few things myself occasionally, but I do it only when it's clear that the LLM isn't able to solve it.
    In other cases, I'm writing something in a domain I'm not fully knowledgeable (or using a library I'm only mildly familiar with). So I lack the knowledge to debug quickly. I would have to read the docs or Google. Why Google when the LLM is more effective at figuring it out? Certainly in a few cases, the solution turned out to require knowledge I did not have, and I appreciate that the LLM solved it (and I learned something as a result).
    The point with all this is: The experience not binary. It's the full spectrum. For my main codebase I'm responsible for at work, I haven't bothered using an LLM (and I have access to Copilot). I need to ensure the quality of the code, and I don't want to spend my time understanding the code the LLM wrote - to the level I would need to feel comfortable pushing to production.
- geoka9 4 months ago
  
  Thanks, that's a helpful set of hints!
  Can you provide a ballpark of what kind of $ costs we are talking here for using Aider with, say, Claude? (or any other provider that you think is better at the moment).
  Say a run-of-the-mill bug-fixing session from your experience vs the most expensive one off the top of your head?
  - BeetleB 4 months ago
    
    I've used it only a few times - mostly for projects written from scratch (not existing codebases). And so far only with Claude Sonnet.
    Twice I had a "production ready" throwaway script in under $2 (the first was under a dollar). Both involved some level of debugging. But I can't overstate how awesome it is to have a single use script be so polished (command line arguments, extensive logging, etc). If I didn't make it polished, it would probably have been $0.15.
    Another one I wrote - I probably spent $5-8 total, because I actually had it do it 3 times from scratch. The first 2 times there were things I wasn't happy with, or the code got too littered with attempts to debug (and I was not using 3 branches). When I finally figured everything out, I started again for the third time and it was relatively quick to get something working.
    Now if I did this daily, it's a tad expensive - $40-60/month. But I do this only once or twice a week - still cheaper than paying for Copilot. If I plan to use it more, I'd likely switch to DeepSeek. If you look at the LLM leaderboard (https://aider.chat/docs/leaderboards/), you'll see that R1 is not far behind Sonnet, and is a third of the cost.
  - yoyohello13 4 months ago
    
    When I was using aider with Claude 3.5 api it cost about $0.01 per action.
- tptacek 4 months ago
  
  The three-branch thing is so smart.
  - BeetleB 4 months ago
    
    It took a while for me to realize it, and frankly, it's kind of embarrassing that I didn't think of it immediately.
    It is, after all, what many of us would do in our manual SW development. But when using an LLM that seems pretty good, we just assume we don't need to follow all the usual good practices.
    
    vlovich123 4 months ago
    
    Does the LLM make commits along the way? I think I’m missing why you need all these branches vs git reset —hard once it figures out the bug?
    
    BeetleB 4 months ago
    
    Aider, by default, makes commits after each change (so that you can easily tell it to "undo"). Once a feature is done, you manually squash the commits if desired. Some people love it, some hate it.
    You can configure it not to autocommit, although I suppose the "undo" command won't work in that case.
    
    matthewmc3 4 months ago
    
    The just sounds like ⌘-Z with extra steps.
    
    BeetleB 4 months ago
    
    Aider doesn't run in your editor. Undo in editor won't undo Aider's changes.
- ddanieltan 4 months ago
  
  do you have a special prompt to instruct aider to log file changes in the repo's README? I've used aider in repos with a README.md but it has not done this update. (granted, i've never /add the readme into aider's context window before either...)
  - Ey7NFZ3P0nzAe 4 months ago
    
    Take a loot at conventions.md in aider.chat documentation
branko_d 4 months ago

I have the same experience.
Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.
- mentalgear 4 months ago
  
  Exactly same experience: since the early-access GPT-3 days, I played out various scenarios, and the most useful case has always been to use generativeAI as semantic search. It's generative features are just lacking in quality (for anything other than a toy project), and the main issues since the early GPT days remains, even though it gets better, it's still too unreliable for (mid-complex systems) serious work. Also, if you don't pay attention, it messes up other parts of the code.
- jofzar 4 months ago
  
  Yeah I have had some "magic" moments where I knew "what" I needed, had an idea of "how it would look",but no idea how to do it and ai helped me understand how I should do it instead of the hacky very stupid way I would have done it
- Yoric 4 months ago
  
  Same here. In some cases, brainstorming even kinda works – I mean, it usually gives very bad responses, but it serves as a good duck.
  Code? Nope.
smallerfish 4 months ago

I've done code interviews with hundreds of candidates recently. The difference between those who are using LLMs effectively and those who are not is stark. I honestly think engineers who think like OP are going to get left behind. Take a weekend to work on getting your head around this by building a personal project (or learning a new language).
A few things to note:
a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.
b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).
c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).
d) It's not perfect. Don't give up if you don't get perfection.
- triyambakam 4 months ago
  
  Hundreds of candidates? That's significant if not an exaggeration. What are the stark differences you have seen? Did you inquire about the candidate's use of language models?
  - smallerfish 4 months ago
    
    Yes. I do async video interviews in round 1 of my interview process in order to narrow the candidate funnel. Candidates get a question at the start of the interview, with a series of things to work through in their own IDE while sharing their screen. I review all recordings (though I will skip around, and if candidates don't get very far I won't spend a lot of time watching at 1x speed.) The question as laid out encourages them to use all of the tools they usually rely on while coding (including google, stackoverflow, LLMs, ...).
    Candidates who use LLMs generally get through 4 or 5 steps in the interview question. Candidates who don't are usually still on step 2 by the end of the interview (with rare exceptions), without their code quality being significantly better.
    (I end up in 1:1 interviews with perhaps 10-15% of candidates who take round 1).
    
    gammarator 4 months ago
    
    So you’re not _interviewing_ them, you’re having them complete expensive work-sample tests. And your evaluation metric is “completes lots of steps in a small time box.”
    
    anon22981 4 months ago
    
    Seems more like trying to find out the most proficient LMM users than anything else. I’ve never done interviews but I imagine I’d be hard pressed to skip candidates solely because they aren’t using LLMs.
    Each to their own and maybe their method works out, but it does seem whack.
    
    ZeroTalent 4 months ago
    
    The thing is, when you're doing frontend, a human programmer can't write 4,000 lines of React code in 1 hour. A properly configured LLM system can.
    This is why I wouldn't hire a person who doesn't know how to do this.
    
    christophilus 4 months ago
    
    What are you doing where 4000 lines of LLM-generated code per hour is a net positive? Sounds like a techdebt machine to me.
    
    ZeroTalent 4 months ago
    
    UIs in React are very verbose. I'm not saying this is running 24/7.
    
    alextingle 4 months ago
    
    Is the question actually difficult, though? If you ask for some standard task, then of course those who are leaning heavily on LLMs will do well, as that's exactly where they work best. That doesn't tell you anything about the performance of those candidates in situations where the LLM won't help them.
    I suppose, if you are specifically looking for coders to perform routine tasks, then you'll get what you need.
    Of course, you could argue that ~90% of a programmer's work day is performing standard tasks, and even brilliant programmers who don't use LLMs will lose so much productivity that they are not worth hiring... Counterpoint: IMO, the amount of code you bash out in a given time bears no relation to your usefulness as a programmer. In fact, producing lots of code is often a problem.
    
    smallerfish 4 months ago
    
    No, I'm not doing leetcode or algorithm questions - it's basically "build a [tiny] product to specs", in a series of steps. I'm evaluating candidates on their process, their effectiveness, their communication (I ask for narration), and their attention to detail. I do review code afterwards. And, bear in mind that this is only round 1 - once I talk with the ones who do well, I'll go deep on a number of topics to understand how well rounded they are.
    I think it's a reasonably balanced interview process. Take home tests are useless now that LLMs exist. Code interviews are very time consuming on the hiring side. I'm a firm believer that hiring without some sort of evaluation of practical competence is a very bad idea - as is often discussed on here, the fizzbuzz problem is real.
    
    majormajor 4 months ago
    
    > it's basically "build a [tiny] product to specs", in a series of steps
    That seems like exactly what the person you're replying to is saying - that sounds like basic standard product-engineering stuff, but simpler, like any of a million examples out there that an LLM has seen a million times. "Here's a problem LLMs are good at, wow, the people using the LLMs do best at it." Tautolgy.
    So it's great for finding people who can use an LLM to do tiny product things.
    In the same way takehomes had all the same limitations. More power to you if those are the people you are looking for, though.
    But it also sounds like a process that most people with better options are gonna pass on most of the time. (Also like with takehomes.)
    
    smallerfish 4 months ago
    
    > "Here's a problem LLMs are good at, wow, the people using the LLMs do best at it."
    Yes, product engineering, the thing that 90% of developers do most of their time.
    
    mrheosuper 4 months ago
    
    But what LLM haven't look at ?
    
    kuschku 4 months ago
    
    LLMs have looked at everything that exists so far. If all you're creating is yet another "Uber for Dogs", LLMs will do fine.
    But if you want to create a paradigm shift, you need to do something new. Something LLMs don't yet know about.
    
    branko_d 4 months ago
    
    > I ask for narration
    That's a mistake. There are plenty of people who are not good multitaskers and cannot effectively think AND talk at the same time, myself included.
    IMHO, for programming, monotaskers tend to do better than multitaskers.
    
    computerdork 4 months ago
    
    Haven't coded for a couple years (but have been a dev for two decades) and haven't used LLM's myself for coding (not against this), so am really just curious, wouldn't you want to know if a dev can solve and understand problems themselves?
    Because it seems like tougher real-world technical problems (where there are tons of dependencies with other systems in addition to technical and business requirements) needs for the dev to have an understanding of how things work, and if you rely on an LLM, you may not gain enough of an understanding of what's going on to solve problems like this...
    ... Although, I could see how devs that are more focused on application development and knowing the business domain is their key skill, wouldn't need to have as strong an understanding of the technical (no judgement here, have been in this role myself at times).
    
    smallerfish 4 months ago
    
    > Haven't coded for a couple years (but have been a dev for two decades) and haven't used LLM's myself for coding (not against this), so am really just curious, wouldn't you want to know if a dev can solve and understand problems themselves?
    Yes, definitely, though I lean more on the 1:1 interviews for that. I understand the resistance to this from developers, but there's a lot of repetition across the industry in product engineering, and so of course it can be significantly optimized with automation. But, you still need good human brains to make most architectural decisions, and to develop IP.
    
    computerdork 4 months ago
    
    Ah, I see, round 1 is just the initial weeder, while on top of this, you'd like devs that are using LLM's for automation. Sounds like a good balance:)
    
    namaria 4 months ago
    
    Are you concerned that eventual LLM price hikes in the near future reflecting their real cost might explode your cost or render your workforce ineffective?
  - nsonha 4 months ago
    
    if it's real that person interviewed at least one candidate per day last year. Idk what kind of engineering role in what kind of org where you even do that.
    
    roarcher 4 months ago
    
    I suspect he doesn't do much engineering, which would explain why he's impressed by candidates who can quickly churn out small rote sample projects with AI. Anyone who actually writes software for a living knows that working on a large production code base has little in common with this.
    
    simonw 4 months ago
    
    When I've had an open req for my team at a California tech company I've had days where I would interview (remotely) 2-3 candidates in a single day, several days a week for several weeks straight. It's not impossible to interview 100 people in a few months at that rate.
    
    jmull 4 months ago
    
    So... do you really think you were doing it right?
    Sorry this is a little harsh, but how do you get anywhere near 100 people before realizing the approach must be horribly flawed, and devising and implementing a better one? Surely it behooves you to not waste your employer's time, your own time, and the time of all those people you're interviewing (mostly pointlessly).
    
    simonw 4 months ago
    
    These were 25 minute phone screens for candidates that had been sourced by recruiting, a few years ago at a company that was going through hyper-growth (hiring hundreds of engineers in a year). Phone screening several dozen people for 2-4 eventual hires doesn't feel too inefficient to me.
    
    h4ny 4 months ago
    
    There are companies whose product is high-quality mock interviews. I wouldn't be surprised by that number of interviews in just a year and it can easily be more than one candidate per day.
    Edit: there are also recruitment agencies with ex-engineers that do coding interviews, too.
- jacobedawson 4 months ago
  
  I'd add to that that the best results are with clear spec sheets, which you can create using Claude (web) or another model like ChatGPT or Grok. Telling them what you want and what tech you're using helps them create a technical description with clear segments and objectives, and in my experience works wonders in getting Claude Code on the right track, where it has full access to the entire context of your code base.
- cheema33 4 months ago
  
  > The difference between those who are using LLMs effectively and those who are not is stark.
  Same here. Most candidates I interviewed said they did not use AI for development work. And it showed. These guys were not well informed on modern tooling and frameworks. Many of them seemed stuck in/comfortable with their old way of doing things and resistant to learning anything new.
  I even hired a couple of them, thinking that they could probably pick up these skills. That did not happen. I learned my lesson.
  - pm215 4 months ago
    
    Isn't that more correlation than causation, though? The kind of person who's not keeping up with the current new tech hotness isn't going to be looking at AI or modern frameworks; and conversely the kind of person who's dabbling with AI is also likely to be looking at other leading-edge tech stuff in their field. That seems to me more likely to be the cause of what you're seeing than that the act of using AI/LLMs itself resulting in candidates improving their knowledge and framework awareness.
InvertedRhodium 4 months ago

My workflow for that kind of thing goes something like this (I use Sonnet 3.7 Thinking in Cursor):
1. 1st prompt is me describing what I want to build, what I know I want and any requirements or restrictions I'm aware of. Based on these requirements, ask a series of questions to produce a complete specification document.
2. Workshop the specification back and forward until I feel it's complete enough.
3. Ask the agent to implement the specification we came up with.
4. Tell the agent to implement Cursor Rules based on the specifications to ensure consistent implementation details in future LLM sessions.
I'd say it's pretty good 80% of the time. You definitely still need to understand the problem domain and be able to validate the work that's been produced but assuming you had some architectural guidelines you should be able to follow the code easily.
The Cursor Rules step makes all the difference in my experience. I picked most of this workflow up from here: https://ghuntley.com/stdlib/
Edit: A very helpful rule is to tell Cursor to always checkout a new branch based on the latest HEAD of master/main for all of it's work.
- theshrike79 4 months ago
  
  I need to steal the specification idea.
  Cursor w/ Claude has a habit of running away on tangents instead of solving just the one problem, then I need to reject its changes and even roll back to a previous version.
  With a proper specification as guideline it might stay on track a bit better.
  - prettyblocks 4 months ago
    
    Copilot supports this somewhat natively:
    https://docs.github.com/en/copilot/customizing-copilot/addin...
    The first thing I do for a new project is ask Copilot to create a custom-instructions.md for me and then as I work on my projects, I ask it to update the instructions every now and then based on the current state of my project.
    Much less misses this way in my experience.
    
    theshrike79 4 months ago
    
    Cool!
    I just spent last night working with Cursor and Claude Code, both support different styles of custom rules.
    Then I got to work and was lamenting what my corp sponsored Copilot can't do the same - but turns out it can! :D
    EDIT: Just tried it and Copilot is completely clueless. It can explain the file, but doesn't know how to generate one. Claude generates its own CLAUDE.md on startup. Cursor can create its own rulefiles pretty well, even better if you add a rule for generating rules.
    
    prettyblocks 4 months ago
    
    I have had success with having copilot generate the file for me with no issues. I use sonnet 3.5 or 3.7. I start by writing a paragraph describing the project and it does pretty well.
slooonz 4 months ago

I decided to try seriously the Sonnet 3.7. I started with a simple prompt on claude.ai ("Do you know claude code ? Can you do a simple implementation for me ?"). After minimal tweaking from me, it gave me this : https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...
After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him ? it ?) to create its next version. It came up with a non-working version of this https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, but it started an interactive loop : I could now describe what I wanted, describe the bugs, and the tool will add the features/fix the bugs itself.
I decided to rewrite it in Typescript (by that I mean: can you rewrite yourself in typescript). And then add other tools (by that: create tools and unit tests for the tools). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework ? Done by the tool itself too.
In one day (and $20), I essentially had recreated claude-code. That I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.
- WD-42 4 months ago
  
  So you’re telling me you spent 20 dollars and an entire day for 200 lines of JavaScript and 75 lines of python and this to you constitutes a working re-creation of Claude Code?
  This is why expectations are all out of whack.
  - BeetleB 4 months ago
    
    That amount of output is comparable to what many professional engineers produce in a given day, and they are a lot more expensive.
    Keep in mind this is the commenters first attempt. And I'm surprised he paid so much.
    Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).
    There were many other bugs, but I would just point out the failures I was seeing and it would fix it itself. For particularly difficult bugs it would at times even produce a full new script just to aid with debugging. I would run it and it would spit out diagnostics which I fed back into the chat.
    The code was decent quality - better than what some of my colleagues write.
    I could probably have it be even more productive if I didn't insist on reading the code it produced.
    
    WD-42 4 months ago
    
    The lines of code isn’t the point. Op claimed they asked Claude to recreate Claude code and it was successful. This is obviously an extreme exaggeration. I think this is the crux of a lot of these posts. This code generator output a very basic utility. To some this is a revelation, but it leaves others wondering what all the fuss is about.
    It seems to me people’s perspective on code gen has largely to do with their experience level of actually writing code.
    
    BeetleB 4 months ago
    
    It's a very narrow reading of his comment. What he meant to say was it quickly created a rudimentary version of an AI code editor.
    Just as a coworker used it to develop an AI code review tool in a day. It's not fancy - no bells and whistles, but it's still impressive to do it in a day with almost no manual coding.
    
    WD-42 4 months ago
    
    > In one day (and $20), I essentially had recreated claude-code.
    Not sure it’s a narrow reading. This is my point, if it’s a basic or rudimentary version people should be explicit about that. Otherwise these posts read like hype and only lead to dissatisfaction and disappointment for others.
    
    BeetleB 4 months ago
    
    s/reading/interpretation/
    Reading something literally is by definition the narrowest interpretation.
    
    majormajor 4 months ago
    
    > Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).
    Was this using technologies you aren't familiar with? If not, the output rate seems pretty low (very human-paced, just with an extra couple bucks spent.)
    
    BeetleB 4 months ago
    
    By 100+ I mean 100-300 lines. I think most people aren't churning out 100 lines of code per hour unless it involves boilerplate.
    More importantly, the 100-300 lines was very low effort for me. That does have its downsides (skills atrophy).
    
    slooonz 4 months ago
    
    Remember that input tokens are quadratic with the length of the conversation, since you re-upload the n previous messages to get the (n+1)-nth message. When Claude completes a task in 3-4 shots, that’s cents. When he goes down in a rabbit hole, however…
    
    BeetleB 4 months ago
    
    I'm aider there's a command to "reset" so it doesn't send any prior chat. Whenever I complete a mini feature I invoke the command. It helpfully shows the size of the current contact in tokens and the cost so I keep an eye on it.
    Doesn't Code have a similar option?
    
    mwigdahl 4 months ago
    
    It does — /clear. It also has /compact to summarize previous tasks to preserve some situational awareness while reducing context bulk.
  - slooonz 4 months ago
    
    2200 lines. Half of them unit tests I would probably have been too lazy to write myself even for a "more real" project. Yes, I consider $20 cheap for that, considering:
    1. It’s a learning experience 2. Looking at the chat transcripts, many of those dollars are burned for stupid reasons (Claude often fails with the insertLines/replaceLines functions and break files due to miss-by-1 offset) that are probably fixable 3. Remember that Claude started from a really rudimentary base with few tools — the bootstrapping was especially inefficient
    Next experiment will be on an existing codebase, but that’s probably for next weekend.
- Silhouette 4 months ago
  
  Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.
  That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.
  It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.
  - NitpickLawyer 4 months ago
    
    aider has this great visualisation of "self written code" - https://aider.chat/HISTORY.html
    
    throwaway0123_5 4 months ago
    
    I suspect it would be somewhat challenging to do, but I'd love to see something like this where the contributions are bucketed into different levels of difficulty. It is often the case for me that a small percentage of the lines of code I write take a large percentage of the time I spend coding (and I assume this is true for most people).
matt_heimer 4 months ago

LLM are replacing Google for me when coding. When I want to get something implemented, let's say make a REST request in Java using a specific client library, I previously used Google to find example of using that library.
Google has gotten worse (or the internet has more garbage) so finding code an example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement and and usually something is broken in the example but it takes less time to get the LLM produced example to work than it does to find a functional example using Google.
But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.
- ptmcc 4 months ago
  
  This has been my experience of successful usage as well. It's not writing code for me, but pulling together the equivalent of a Stack Overflow example and some explaining sentences that I can follow up on. Not perfect and I don't blindly copy paste it, same as Stack Overflow ever was, but faster and more interactive. It's helpful for wayfinding, but not producing the end result.
- deergomoo 4 months ago
  
  I used the Kagi free trial when I was doing Advent of Code in a somewhat unfamiliar language (Swift) last year, as well as ChatGPT occasionally.
  The LLM was obviously much faster and the information was much higher density, but it had quite literally about a 20% rate of just making up APIs from my limited experiment. But I was very impressed with Kagi’s results and ended up signing up, now using it as my primary search engine.
  - heisenbit 4 months ago
    
    It is really a double edged sword. Some APIs I would not have found myself. In some way an AI works like my mind fuzzy associating memory fragements: There should be an option for this command to do X because similar commands have this option and it would be possible to provide this option. But in reality the library is less than perfectly engineered and the option is not there. The AI also guesses the option is there. But I do not need a guess when I ask the AI - I need reliable facts. If the cost of an error is not high I still ask the AI and if it fails it is back to RTFM but if the cost of failure is high then everything that comes out of a LLM needs checking.
  - theshrike79 4 months ago
    
    I did the Kagi trial in the fall of 2023 and tried to hobble along with the cheapest tier.
    Then I got hooked by having a search engine that actually finds the stuff I need and I've been a subscriber for bit over a year now.
    Wouldn't go back to Google lightly.
- layer8 4 months ago
  
  In order to use a library, I need to (this is my opinion) be able to reason about the library’s behavior, based on a specification of its interface contract. The LLM may help with coming up with suitable code, but verifying that the application logic is correct with respect to the library’s documented interface contract is still necessary. It’s therefore still a requirement to read and understand the library’s documentation. For example, for the case of a REST client, you need to understand how the possible failure modes of the HTTP protocol and REST API are translated by the library.
- jayd16 4 months ago
  
  I wonder how good Google could be if they had a charge per query model that these LLMs do. AI or not, dropping the ad incentive would be nice.
escapecharacter 4 months ago

I've found AI to be useful on precisely-scoped tasks I might assign to a junior programmer to take a day to do, like "convert this exact bash script to a Powershell script".
But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.
As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.
- philipswood 4 months ago
  
  > you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.
  I once accidentally asked a local DeepSeek distilled model to do the wrong thing by accidentally copy pasting the wrong variable name.
  It told me how to do it, and then asked me if I was sure.
  My local DeepSeek R1 model (deepseek-r1:7b) saw me trying to do something stuupid (I was working with the wrong variable). It told me how to do what I asked and then asked:
  > _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._
  Looking at its though process:
  > _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn<:aws:1151472526310103070>s3:::bucket_name RegionalPart path. The part after the bucket name is typically s3:// followed by the object key._ > _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._
  - escapecharacter 4 months ago
    
    That's nice!
clusterhacks 4 months ago

I am a slow adopter of new tech but plan to spend a significant amount of time in 2025 using AI tools when coding. I am net negative on AI simply replacing programmers, but I think the practice of development is undergoing a seismic shift at this point.
My recent usage is oriented towards using pseudocode descriptions that closely map to Python to produce Python functions. I am very impressed with Claude 3.7's syntactic correctness when given a chunk of pseudocode that looks "python-y" to begin with.
My one concern is that much of my recent code requirements lack novelty. So there is a somewhat reasonable chance that the tool is just spitting out code it slurped somewhere in github or elsewhere in the larger Internet. Just this week, I gave Claude a relatively "anonymous" function in pseudocode, meaning variable names were not particularly descriptive with one tiny exception. However, Claude generated a situationally appropriate comment as part of the function definition. This was . . . surprising to me if somehow the model had NOT in its training set had some very close match to my pseudocode description that included enough context to add the comment.
- doug_durham 4 months ago
  
  At this point very little code is "novel". Everyone is simply rewriting code that has already been written in a similar form. The LLM isn't slurping up and restating code verbatim. It is taking code that it has seen thousands of times and generating a customized version for your needs. It's hubris to think that anyone here is generating "novel" code.
  - clusterhacks 4 months ago
    
    I have seen the argument that very little code is novel but I find it inherently unsatisfying and lacking in nuance? I think what bugs me about is that if you squint hard enough, all programming reduces to "take some data, do something to it." That "something" is doing a lot of heavy lifting in the argument that "something" is or isn't novel.
    Heck, if we think about it from the programming language perspective, all code is "simply" using already existing language functions to cobble together a solution to some specific set of requirements. Is no program novel?
    There is probably a consideration here that maybe boils down to the idea of engineering vs artisanal craftsmanship and where a specific project falls in that spectrum . . .
csomar 4 months ago

Yeah, it's so bad now I only trust my eyes. Everyone is faking posts, tweets and benchmarks that the truth no longer exists.
I'm using Claude 3.7 now and while it improved on certain areas, it degraded on others (ie: it randomly removes/changes things more now).
- namaria 4 months ago
  
  It's clear to anyone paying attention that LLMs hit a wall a while back. RAG is just expert systems with extra steps. 'Reasoning' is just burning more tokens in hopes it somehow makes the results better. And lately we've seen that a small blanket is being pulled that way or another.
  LLMs are cool, machine learning is cooler. Still no 'AI' in sight.
julienmarie 4 months ago

I initially had the same experience. My codebase is super opinionated with a specific way to handle things. Initially it kept on wanting to do things it's way. I then changed my approach and documented the way the codebase is structured, how things should be done, all the conventions used and on every prompt I make sure to tell him to use these documents as reference. I also have a central document that keeps track of dependencies of modules and the global data model. Since I made these documents as reference developing new features has been a breathe. I created the architecture, documented it, and now it uses it.
The way I prompt it is first I write the documentation of the module I want, following the format I detailed inbthe master documents, and ask him to follow the documentation and specs.
I use cursor as well, but more as an assistant when I work on the architecture pieces.
But I would never let an AI the driver seat for building the architecture and making tech decisions.
crabl 4 months ago

What I've noticed from my extensive use over the past couple weeks has been Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing. That said, it's easy enough to work around its deficiencies by using a model with extended thinking (Grok, GPT4.5, Sonnet 3.7 in thinking mode) to write prompts for it and use Claude Code as basically a dumb code-spewing minion. My workflow has been: give Grok enough context on the problem with specific code examples, ask it to develop an implementation plan that a junior developer can follow, and paste the result into Claude Code, asking it to diligently follow the implementation plan and nothing else.
- simonw 4 months ago
  
  "Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing"
  Yup, that's our job as software engineers.
  - alienthrowaway 4 months ago
    
    > Yup, that's our job as software engineers
    The more seasoned you are as a SWE, the higher the orders you consider, and not just on the technical aspect, but the human and business sides as well.
- cglace 4 months ago
  
  In all of these posts I fail to see how this is engineering anymore. It seems like we are one step away from taking ourselves out of the picture completely.
  - bckr 4 months ago
    
    I don’t write binaries, assembly, or C. If I don’t have to write an application, I’m okay with that.
    I still have to write the requirements, design, and acceptance criteria.
    I still have to gather the requirements from stakeholders, figure out why those will or will not work, provision infra, figure out how to glue said infra together, test and observe and debug the whole thing, get feedback from stakeholders…
    I have plenty of other stuff to do.
    And if you automate 99% of the above work?
    Then the requirements are going to get 100Xed. Put all the bells and whistles in. Make it break the laws of physics. Make it never ever crash and always give incredibly detailed feedback to the end users. Make it beautiful and faster than thought itself.
    I’m not worried about taking myself out of the loop.
    
    draebek 4 months ago
    
    I have to say that I am worried that, by taking myself out of the loop for the 99%, I'm going to get worse at the 1% of things that occasionally fall into my lap because the LLM can't seem to do them. I think software engineering is a skill that is "use it or lose it", like many others.
    There's also the question of whether I will enjoy my craft if it is reduced to, say, mostly being a business analyst and requirements gatherer. Though the people paying me probably don't care very much about that question.
    
    geoka9 4 months ago
    
    Reading some of the comments in the "Layoffs don't work" right before reading comments here must have been one of the more surreal experiences for me :)
    The takes are as different as (paraphrasing): "if a person can't create something with en empty text editor, I fail them", and "if a person can't speed run through an unrealistically large set of goals because they don't use AI-assisted development, I fail them".
    I guess one should keep their skills at both honed at all times, even if neither are particularly useful at most real jobs, because you never know when you're going to be laid off and interviewing.
    
    majormajor 4 months ago
    
    It's very specialized already, though.
    How many devs could debug both a K8s network configuration issue and a bug in an Android app caused by a weird vendor's OS tweak? Not most of us.
    Some people will be better at pushing the LLM things to generate the write crap for the MVP. Some people will be better at using these tools for testing and debugging. Some people will be better at incidence response. They'll probably all be using tools with some level of AI "magic" in them, but the specialization will be somewhat recognizable to what it's been for the past decade.
    If you're on the business side you still want a team of people running that stuff until there's a step-change in the ability to trust these things and they get so good you'd be able to give over control of all your cloud/datacenter/network/whatever infrastructure and spending.
    And at THAT point... the unemployed software engineers can team up with the unemployed lawyers and doctors and blue-collar workers who were replaced by embodied-LLM-powered robots and ... go riot and ransack some billionare's houses until they decide that these new magical productivity machines should let everyone have more free time and luxury, not less.
    
    cglace 4 months ago
    
    Thanks for sharing. I hope you are right. It's hard to stay objective as things are changing so quickly.
- TylerLives 4 months ago
  
  This has been my experience as well. Breaking problems into smaller problems where you can easily verify correctness works much better than having it solve the whole problem on its own.
  - WD-42 4 months ago
    
    you just described how a good developer works.
Balgair 4 months ago

Hey, I've been hearing about this issue that programmers have on HN a lot.
But I'm in the more 'bad programmer/hacker' camp and think that LLMs are amazing and really helpful.
I know that one can post a link to the chat history. Can you do that for an example that you are comfortable sharing? I know that it may not be possible though or very time consuming.
What I'm trying to get at is: I suck at programming, I know that. And you probably suck a lot less. And if you say that LLMs are garbage, and I say they are great, I want to know where I'm getting the disconnect.
I'm sincerely, not trying to be a troll here, and I really do want to learn more.
Others are welcome to post examples and walk through them too.
Thanks for any help here.
- vlod 4 months ago
  
  >and think that LLMs are amazing and really helpful
  Respectively, are you understanding what it produces or do you think that's its amazing because it produces something, that 'maybe' works.
  Here's an e.g. I was just futzing with. I did a refactor of my code (typescript) and my test code broke (vitest) and for some reason it said 'mockResolvedValue()' is not a function. I've done this a gazillion times.
  I allowed it via 3-4 iterations to try and fix it (I was being lazy and wanted my error to go away) and the amount of crap (rewriting tests, referenced code) it was producing was beyond ridiculous. (I was using github co-pilot).
  Eventually I said "f.that for a game of soldiers" and used by brain. I forgot to uncomment a vi.mock() during the refactor.
  I DO use it to fix stupid typescript errors (the error blob it dumps on you can be a real pita to process) and appreciate it when gives me a simple solution.
  So I agree with quite a few comments here. I'm not ready to bend the knee to our AI Overloads.
  - Balgair 4 months ago
    
    Yeah, so I'm a 'hacker' (MIT definition here). I've only taken a single class in javascript with Sun based workstations about 20 years ago now (god, I'm old). I hated it.
    All my work now is in python and SQL now and though I've watched a lot of youtube videos and plunked at StackOverFlow for ~15 years, I've never had formal education in either language. Like, it takes me about as long to set up the libraries and dependancies in python as it does to write my code. My formal job titles have never had 'programmer' in them.
    My code, as such, is just to get something done. Mostly this is hardware interfacing stuff, but a little software too. I've had my code get passed up the chain and incorporated into codebases, but that's only happened a handful of times. The vast majority of the code I write is never seen by others. My code hasn't had to be maintainable as I've never really had to maintain it for more than 5 years max. I've used git on projects before, but I don't really see the need these days. The largest program I've written is ~1M lines of code. Most code is about 100 lines long now. I almost always know what 'working' means, in that I know the output that I want to see (again, mostly working in hardware). I almost never run tests of the code.
    I've had the same issues you have had with LLMs, where they get stuck and I have to try to restart the process. Usually this happens to me in about 20 back and forths. I'm mostly just pasting relevant code snippets and the errors back into the LLM for a while until things work for me. Again, I know what 'working' looks like from the start.
    Typically, I'll start the session with an LLM by telling it the problem I have, what I want the code skeleton to look like, and then what I want the output to look like. Then it will give the psuedo code, then I walk it through each portion of the psuedo code. Then I get to errors and debugging. Usually about half of this is just libraries and versions in python. Then, I get to the errors of the code itself. I can typically find what line of code is causing the error just from the terminal outputs. I'll talk with the LLM about that line of code, trying to understand it from the error. Then, on to the next error. Repeat that process until I get the working output I desire. I'm never expecting the right code out of the first LLM interaction, but I am expecting (and seeing) that the time it takes to get to working code is faster.
    The time it would usually take me to get through all this before LLMs was about 2 weeks of work (~80 hours) per project. Now it takes me about half a day (~4 hours), and it's getting faster.
    I'm not in the camp of thinking that AI is going to take my job. I am in the camp of thinking that AI is going to finally let me burn down the list of things that we really need to do around here.
    Thank you for the reply!
- mns 4 months ago
  
  I can give you an example here. We had to do some basic local VAT validation for EU countries and as the API that you can use for that has some issues for some countries (as it checks this in the national databases) we wanted to also have a basic local one. So using Claude 3.7 I wanted to get some basic VAT validation, in general the answer and solution was good, you would be impressed, but here comes the fun part. The basic solution was just some regular expressions and then it went further on its own and created specific validations for certain countries. These validations were something like credit card number validations, sums, check digits, quite nice you would say. But the thing is in a lot of these countries these numbers are basically assigned randomly and have no algorithm, so it went on to hallucinate some validations that don't exist providing a solution that looks nice, but basically it doesn't work in most cases.
  Then I went on github and found that it used some code written by someone in JS 7 years ago and just converted and extended it for my language, but that code was wrong and simply useless. We'll end up with people publishing exploits and various other security flaws in Github, these LLMs will get trained on that and people that have no clue what they are doing will push out code based on that. We're in for fun times ahead.
- alexkwood 4 months ago
  
  here is one solution i am helping out with which was very very easy to create using claude https://www.youtube.com/watch?v=R72TvoXCimg&t=2s
sovietmudkipz 4 months ago

Maybe it’s the shot up plane effect; we only see the winners but rarely see the failures. Leads us to wrong or incorrect conclusions.
Finding the right prompt to have current generation AI create the magic depicted in twitter posts may be a harder problem than most anticipate.
fullstackwife 4 months ago

"wild", "insane" keywords usually are a good filter for marketing spam.
- belter 4 months ago
  
  Influencer would be another term...
epolanski 4 months ago

I don't believe in those either, and I never see compelling YouTube videos showing that in action.
For small stuff LLMs are actually great and often a lifesaver on legacy codebases, but that's more or less where it stops.
noufalibrahim 4 months ago

I'm in the same boat. I've found it useful in micro contexts but in larger programs, it's like a "yes man" that just agrees with what I suggest and creates an implementation without considering the larger ramifications. I don't know if it's just me.
iambateman 4 months ago

I have a challenging, repetitive developer task that I need to do ~200 times. It’s for scraping a site and getting similar pieces of data.
I wrote a worksheet for Cursor and give it specific notes for how to accomplish the task in a particular case. Then let it run and it’s fairly successful.
Keep in mind…it’s never truly “hands off” for me. I still need to clean things up after it’s done. But it’s very good at figuring out how to filter the HTML down and parse out the data I need. Plus it writes good tests.
So my success story is that it takes 75% of the energy out of a task I find particularly tedious.
- WD-42 4 months ago
  
  I haven’t found llm code gen to be very good except in cases like you mention here. When you need to do large boilerplatey code with a lot of hardcoded values or parameters. The kind of thing you could probably write a code generator yourself for if you cared enough to do it. Thankfully Llms can save us from some of that.
onion2k 4 months ago

it rarely returns the right answer
One of the biggest difficulties AI will face is getting developers to unlearn the idea that there's a right answer, and that of the many thousands of possible right answers, 'the code I would have written myself' is just one (or a few if you're one of the few great devs who don't stop thinking about approaches after your first attempt.)
rhubarbtree 4 months ago

I spent a few hours trying cursor. I was impressed at first, I liked the feel of it and I tried to vibe code, whatever that means.
I tried to get it to build a very simple version of an app I’ve been working on. But the basics didn’t work, and as I got it to fix some functionality other stuff broke. It repeatedly nuked the entire web app, then rolled back again and again. It tried quick and dirty solutions that would lead to dead ends in just a few more features. No sense of elegance or foundational abstractions.
The code it produced was actually OK, and I could have fixed the bugs given enough time, but overall the results were far inferior to every programmer I’ve ever worked with.
On the design side, the app was ugly as hell and I couldn’t get it to fix that at all.
Autocomplete on a local level seems far more useful.
_steve_yegge_ 4 months ago

Gene and I would like to invite you to review our book, if you're up for it. It should be ready for early review in about 7-10 days.
It seems like you would be the perfect audience for it. We're hoping the book can teach you what you need in order to have all those success stories yourself.
- Gunnerhead 4 months ago
  
  How can I follow this book? I’m interested too.
moomin 4 months ago

I can definitely save time, but I find I need to be very precise about the exact behaviour, a skill I learned as… a regular programmer. Soper up is higher in languages I’m not familiar with, where I know what needs doing but not necessarily the standard way to do it.
Delomomonl 4 months ago

I had Claude prototype a few things and for that it's really enjoyable.
Like a single page HTML J's page which does a few things and saves it state in local storage with a json backup feature (download the json).
I also enjoy it for doing things I don't care much but makes it more polished. Like I hate my basically empty readme with two commands. It looks ugly and when I come back to stuff like this a few days/weeks later I always hate it.
Claude just generates really good readmes.
I'm trying out Claude code right now and like it so far.
Kiro 4 months ago

Funny, because I have the same feeling toward the "I never get it to work" comments. You don't need any special prompt engineering so that's definitely not it.
dilap 4 months ago

Yeah I gave Claude Code a try at about 5 different things, with miserable results on all of them (insult to injury -- each time it charged me about a buck!). I wonder if because it was C# with Unity code, maybe not so heavily represented in the training set?
I still find lots of use for LLMs authoring stuff at more like the function level. "I know I need exactly this."
Edit: I did however find it amazing for asking questions about sections of the code I did not write.
babyent 4 months ago

I’ve dug into this a few times.
Every single time they were doing something simple.
Just because someone has decades of experience or is a SME in some niche doesn’t mean they’re actually good… engineers.
yodsanklai 4 months ago

> I do find value in asking simple but tedious task like a small refactor or generate commands,
This is already a productivity boost. I'm more and more impressed about what I can get out of these tools (as you said, simple but tedious things). ChatGPT4o (provided by company) does pretty complex things for me, and I use it more and more.
Actually, I noticed that when I can't use it (e.g. internal tools/languages), I'm pretty frustrated.
- cglace 4 months ago
  
  Are you concerned that these tools will soon replace the need for engineers?
  - yodsanklai 4 months ago
    
    Yes, I used to be skeptical about the hype, but now I'm somewhat concerned. I don't think they will replace engineers but they do increase their productivity. I'm not able to quantify by how much though. In my case, maybe it increases my productivity by 5-10%, saving me a few hours of work each week. Very rough estimate.
    Does it mean that we'll need less engineers to perform the same amount of work? or we'll produce better products? In my company, there's no shortage of things to do, so I don't think we'll hire less people if suddenly engineers are a bit more productive. But who knows how it'll impact the industry as a whole.
kolbe 4 months ago

I am willing to say I am a good prompt engineer, and "AI takes the wheel" is only ever my experience when my task is a very easy one. AI is fantastic for a few elements of the coding process--building unit tests, error checking, deciphering compile errors, and autocompleting trivially repetitive sections. But I have not been able to get it "take the wheel"
nsonha 4 months ago

this space is moving really fast, I suggest before forming an definitive opinion try the best tool, such as the latest Claude model and use "agentic" mode or the equivalence on your client. For example, on Copilot this mode is brand new and only available in vscode insider. Cursor and other tools have had it for a little longer.
- collingreen 4 months ago
  
  People have been saying it writes amazing code that works for far longer than that setup has been available though. Your comment makes me think the product is still trying to catch up to these expectations people are setting.
  That being said I appreciate your suggestion and will consider giving that a shot.
egorfine 4 months ago

You have to learn and figure out how to prompt it. My experience with Claude Code is this: one time it produces an incredible result; another time it's an utter failure. There are prompt tips and tricks which have enormous influence on the end result.
- ido 4 months ago
  
  Can you give us some of these tips?
  - egorfine 4 months ago
    
    Not that I have anything concrete in my mind yet. I'm learning as we all do. But after some usage I've developed a little bit of a hunch which prompt works and which not.
    For example, I have mindlessly asked Claude Code over a large codebase "where is desktop app version stored and how is it presented on site". I have expected useless answer given how vague the questions was. Instead I have got a truly exceptional and extremely clear report that fully covers the question.
    Another example. I have asked Claude Code to come up with a script to figure out funding rate time intervals on a given exchange and Code ended up in an almost endless loop running small test scripts in node.js to figure this out and came up with a super suboptimal and complicated solution. Turns out my prompt was too verbose and detailed and I have specifically asked Claude Code to figure out time intervals, not just get them. So it did. Instead of just querying the exchange via API and printing the list on terminal (3 lines script) it actually, truly tried to figure them out in various ways.
  - borgdefenser 4 months ago
    
    You should also try the same prompt multiple times to see how this works.
    Sometimes you will get better or worse answers completely by chance.
    I think Claude is pretty good if you have it write a function and give it the inputs, output and a data example. You can also put to ask clarifying questions as needed because there is a good chance there are aspects of the prompt that are ambiguous.
    My prompts are always better if I write them in a separate text file and then paste them in too. I think I just take my time and think things out more that way instead of trying to get to the answer as fast.
jayd16 4 months ago

I agree it feels very different from my experience.
I'm curious when we'll start seeing verifiable results like live code streams with impressive results or companies dominating the competition with AI built products.
Ancalagon 4 months ago

Are you actually using claude? There's an enormous difference between claude code and copilot, with the latter being a bigger burden these days than a help.
razemio 4 months ago

Can you clarify what tools and programming language you use? If find that the issue often is wrong tooling, exotic programming languages or frameworks.
- develoopest 4 months ago
  
  I would consider frontend tasks using Typescript and React quite standard.
  - razemio 4 months ago
    
    React in my experience sucks with AI. In fact I have not yet encountered a "heavy" framework which works well. Use something light like svelte.
    Typed programming languages like Typescript, Scala, Haskell and so on will produce more errors -> you need to fix stuff manually. However it will also reduce bugs. So it is a mixed bag. For an error free experience python and JavaScript work very well.
    When it comes to tooling, if you haven't used cline, roocode or aider (not as good) yet, you haven't seen what an AI can do.
    A good starter would be starting fresh, by creating a README which describes the hole application you want to build in detail and let the AI decide the tech stack. You can most certainly build complex applications with an AI in blazing speed.
gloosx 4 months ago

From the creators of static open-source marketing benchmarks: twitter PR posts.
huvin99 4 months ago

Is this true even for Claude 3.7 Sonnet/3.7 Sonnet Thinking ?
timewizard 4 months ago

Oh. That's because he's clearly lying.
habinero 4 months ago

It's not.
A lot -- and I mean a lot -- of people who hype it up are hobby or aspirational coders.
If you drill down on what exactly they use it for, they invariably don't write code in professional settings that will be maintained and which other humans have to read.
Everyone who does goes "eh, it's good for throwaway code or one offs and it's decent at code completion".
Then there's the "AGI will doom us all" cult weirdos, but we don't talk about them.
dgellow 4 months ago

I have the same experience
darepublic 4 months ago

What model are you using
EigenLord 4 months ago

You've got to do piecemeal validation steps yourself, especially for models like Sonnet 3.7 that tend to over-generate code and bury themselves in complexity. Windsurf seems to be onto something. Running Sonnet 3.7 in thinking mode will sometimes reveal bits and pieces about the prompts they're injecting when it mentions "ephemeral messages" reminding it about what files it recently visited. That's all external scaffolding and context built around the model to keep it on track.

CGamesPlay 4 months ago

I decided to check this out after seeing the discussion here. I had previously misunderstood that it required a Claude.ai plan, but it actually just uses your API keys.

I did a comparison between Claude Code and Aider (my normal go-to): I asked it to do clone a minor feature in my existing app with some minor modifications (specifically, a new global keyboard shortcut in a Swift app).

Claude Code spent about 60 seconds and $0.73 to search the code base and make a +51 line diff. After it finished, I was quite impressed by its results; it did exactly the correct set of changes I would have done.

Now, this is a higher level of task than I would normally give to Aider (because I didn't provide any file names, and it requires changing multiple files), so I was not surprised that Aider completely missed the files it needed to modify to start (asking me to add 1 correct file and 2 incorrect files). I did a second attempt after manually adding the correct files. After doing this, it produced an equivalent diff to Claude Code. Aider did this in 1 LLM prompt, or about 15 seconds, with a total cost of $0.07, about 10% of the Claude Code cost.

Overall, it seems clear that the higher level of autonomy carries a higher cost with it. My project here was 7k SLOC; I would worry about ballooning costs on much larger projects.

bufferoverflow 4 months ago

Now, the billion dollar question is - how long would it take you to code that diff?
- CGamesPlay 4 months ago
  
  Probably about 3 minutes? That's my main usage of these types of coding tools, honestly. I already know generally what I want to happen and validating that the LLM is on the right track / reached the right solution is easy.
  I'm not saying that "Claude Code makes me 300% more effective", but I guess it did for this (simple) task.
  - generic92034 4 months ago
    
    Are you comparing only the generation time with your coding time? How are the figures if you include the necessary step of checking the generated code? And how do these times change if you are coding in a very complex environment?
    
    CGamesPlay 4 months ago
    
    To be clear, I explicitly picked this task because my judgement call was that it was going to be faster to use an AI than coding it myself. Checking the generated code was easy for me to do (whether I or the AI wrote it).
    I don't know what you mean by "how do these times change if you are coding in a very complex environment", but to restate my original post: I'm fearful that Claude's additional autonomy will allow it to waste money (and time) pursuing useless ideas.
    
    lgas 4 months ago
    
    As I can attest to, personally, it even allows the human operators to waste money (and time) pursuing all sorts of larks :)
  - off_by_inf 4 months ago
    
    And how long would it take you to debug AI generated code again and again?
    
    Maxatar 4 months ago
    
    This shouldn't be factored in unless you never debug your own code. At any rate, I don't know about OP but usually when I get an LLM to write out some code, I also have it write out the tests for it as well.
- ddacunha 4 months ago
  
  I recently made some changes to a website generated by a non-technical user using Dreamweaver. The initial state was quite poor, with inline CSS and what appeared to be haphazard copy-pasting within a WYSIWYG editor.
  Although I’m not proficient in HTML, CSS, or JavaScript, I have some understanding of what good code looks like. Through several iterations, I managed to complete the task in a single evening, which would have required me a week or two to relearn and apply the necessary skills. Not only the code is better organised, it’s half the size and the website looks better.
- fragmede 4 months ago
  
  time spent is not the only question. how much thought it takes, however impossible that may be to measure, is another one. If an LLM assisted programmer is able to solve the problem without deep focus, while responding to emails and attending meetings, vs the programmer who can't, is time really the only metric we can have here?
- ignoramous 4 months ago
  
  > how long would it take you to code that diff?
  My scrum/agile coach says, by parallelizing prompts, a single developer can babysit multiple changes in the same time slice. By having a sequence of prompts ready before hand, a single developer can pipeline those one after the other. With an IDE that helps schedule such work, a single developer can effectively hyper-thread their developmental workflow. If the developer is epoll'ing at 10x the hertz... that's another force multiplier. Of course context switches & side-channels are of concern, but a voice over my shoulder tells me that as long as memory safety is guaranteed, everything should turn up alrigd3adb33f.
- throwawaymsft 4 months ago
  
  Infinite, because the median counterfactual is never getting around to this P4 “nice to have” issue that’s languished in the backlog.
artdigital 4 months ago

Same here, I did a few small taks with Claude Code after seeing this discussion here and is too expensive for me.
A small change to create a script file (20 LoC) was 10cts, a quick edit to a README was 7ct
Yes yes engineers make more than that blah blah but the cost would quickly jump out of control for bigger tasks. I’d easy burn through $10-20 upwards a day with this, or upwards $100-$300 a month. Unless you have a Silicon Valley salary, that’s too expensive.
I use other tools like Cody (the tool the author created) or Copilot because I pay $10 a month and that’s it. Yes I get rate limited almost daily but I don’t need to worry that my tool cost is going out of control suddenly.
I hope Anthropic introduces a new plan that bundles Claude Code into it, I’d be much more comfortable using that knowing it won’t suddenly be more than my $50/mo (or whatever)
- stevage 4 months ago
  
  It's an interesting question. As a freelance consultant, theoretically a tool like this could allow me to massively scale up my income, assuming I could find enough clients.
  I'm a bit nervous where I'd end up though - with code I'd "written" but wasn't familiar with, and with who knows what kinds of limitations or subtle bugs baked in.
  - firtoz 4 months ago
    
    I currently pay around $200-300 to a combination of Cursor + Anthropic through the API. I have both a full time job and freelance work. It pays for itself. I end up reviewing more than manual coding, to ensure the quality of the results. Funnily, the work I did through this method has received more praise than my usual work.
    
    coffeecantcode 4 months ago
    
    Did you outgrow the vase 500 searches that Cursor gives you per month and connect your API key for usage based pricing?
    I’m having a hard time coming close to the 500 included in the monthly subscription and I use it like, a lot.
    Just curious how you’re hitting that 200-300 mark unless you’re talking about paying Anthropic outside of cursor. Which I just now realized is probably the case.
    
    firtoz 4 months ago
    
    I ran out of fast requests and using my own API key
  - leoedin 4 months ago
    
    > I'm a bit nervous where I'd end up though - with code I'd "written" but wasn't familiar with
    This does seem like quite a big downside. It turns every new feature into “implement this in someone else’s code base”. I imagine you’d very quickly have complete dependency on the AI. Maybe that’s an inevitability in this new world?
    
    bluefirebrand 4 months ago
    
    It sounds fine as long as you can fully trust the AI to do good work right?
    I don't think there's any current AI that is fully trustworthy this way though.
    I wouldn't even put them at 50% trustworthy
    I think we are going to see a cliff where they become 80% good, and every tiny bit of improvement past that point will be exponentially more difficult and expensive to achieve. I don't think we reach 100% reliable AI in any of our lifetimes
    
    itsoktocry 4 months ago
    
    I think we are going to reach a cliff where a type of old school developers keep saying, "it just can't write code like I can" while at the same time wondering why they can't land a job.
    Current AI is likely already beyond 50% trustworthiness, whatever that means.
    
    bluefirebrand 4 months ago
    
    > "it just can't write code like I can" while at the same time wondering why they can't land a job
    People had this same prediction about offshore development
    Those old school devs are able to find well paying work fixing broken software churned out by overseas code sweatshops
    I predict if you can read and understand code without the help of AI models you will be in even higher demand to fix the endless broken software built by AI assisted coders who cannot function without AI help
    
    Suppafly 4 months ago
    
    >It turns every new feature into “implement this in someone else’s code base”.
    It amazes me that more people aren't worried about this.
    
    optimalsolver 4 months ago
    
    >you’d very quickly have complete dependency on the AI
    That's a feature, not a bug. At least from OAI/Anthropic's point of view.
- bayarearefugee 4 months ago
  
  > Yes yes engineers make more than that blah blah but the cost would quickly jump out of control for bigger tasks.
  Also (most) engineers don't hallucinate answers. Claude still does regularly. When it does it in chat mode via a flat rate Pro plan I can laugh it off and modify the prompt to give it the context it clearly didn't understand but if its costing me very real money for the LLM to over-eagerly over-engineer an incorrect implementation of the stated feature its a lot less funny.
  - artdigital 4 months ago
    
    Exactly! Especially agentic tools like Aider and Claude that are designed to pull in more files into their context automatically, based on what the LLM thinks it should read. That can very quickly go out of control and result in huge context windows.
    Right now with Copilot or other fixed subscriptions I can also laugh it off and just create a new tab with fresh context. Or if I get rate-limited because of too much token use I can wait 1 day. But if these actions are linked to directly costing money on my card, then that's becoming a lot more scary.
  - naasking 4 months ago
    
    > Also (most) engineers don't hallucinate answers.
    They absolutely do, where do you think bugs come from? The base rate is typically just lower than current AIs.
    
    aziaziazi 4 months ago
    
    Bugs from engineers comes from a variety of reasons and most have nothing in common with an LLM hallucinating.
    For exemple I can’t remember seing a PR with an API that seems plausible but never ever existed, or an interpretation of the specs so convoluted and edgy that you couldn’t even use sarcasm as a justification for that code.
    Don’t take me wrong: some LLMs are capable of producing bugs that looks like humans ones, but the term hallucinate is something else’s and doesn’t fit with much humans bugs.
    
    naasking 4 months ago
    
    > For exemple I can’t remember seing a PR with an API that seems plausible but never ever existed
    A PR is code that has already been tested and refined, which is not comparable to the output of an LLM. The output of an LLM is comparable to the first, untested code that you wrote based off of your sometimes vague memory of how some API works. It's not at all uncommon to forget some details of how an API works, what calls it supports, the details of the parameters, etc.
    
    ljm 4 months ago
    
    But you don't submit that rough draft with the 110% conviction that it's correct, which is what an LLM will do when it hallucinates.
    It won't say "I think it should look something like this but I might be wrong," it'll say "simple! here is how you do it."
    Hence hallucination and not error. It thinks it's right.
    
    lanstin 4 months ago
    
    It’s kind of uncommon to be aware that you have only a vague recall of the API and not go check the documentation or code to refresh your memory. That self knowledge that you knew something and aren’t sure of the details is indeed the thing that these tools lack. So far.
    
    TeMPOraL 4 months ago
    
    Human programmers have continuous assistance on every keystroke - autocomplete, syntax highlighting, and ultimately, also the compilation/build step itself.
    For an LLM-equivalent experience, go open notepad.exe and make substantial changes there, and then rebuild - and let the compiler tell you what's your base rate of hallucinating function names and such.
    
    lanstin 4 months ago
    
    In the 1990s, that is closer to what making software was like. There, one had an even more heightened awareness of how confident one was in what one was typing. We would then go to the manual (physical in many cases) and look it up.
    And we never made up APIs, as there just weren't that many APIs. We would open the .h file for the API we were targeting as we typed into the other window. And the LLMs have ingested all the documentation and .h files (or the modern equivalent) so they don't have a real excuse.
    But I use the LLMs all the time for math, and they do profusely hallucinate in a way people do not. I think it's a bit disingenuous to say that LLMs don't have that failure mode that people don't really have.
- zo1 4 months ago
  
  I use Grok and it's free (even Grok3). I definitely don't hit limits unless it's a pretty heavy day and I do a lot of adjustments. Also, don't send entire codebases to it, just one-off files. What's quite amazing is how it doesn't matter that it doesn't have the source to dependent files, it figures it out and infers what each method does based on its name and context, frigging amazing if you ask me.
  And it doesn't fight me like the OpenAI tooling does that logs me out randomly every day and I have to login and spend 4 minutes copying login codes from my email or answering their stupid Captcha test. And this is on their API playground where I pay for every single call - so not like I'm trying to scrape my free chat usage as an API.
  - artdigital 4 months ago
    
    > I use Grok and it's free (even Grok3). I definitely don't hit limits unless it's a pretty heavy day and I do a lot of adjustments
    Okay maybe need to clarify: I hit those limits when I do agentic stuff, which is what Claude Code does: So let the LLM automatically pull in files into the context it thinks it needs, analyze my codebase, follow imports, add more code, etc. It can quickly balloon out of control when the LLM pulls in too many LoC and the context window gets too big.
    Then do a few back and forth actions like "let's refine this plan, instead of X pls do Y", or "hmm I think maybe we should also look into file blah.ts" and you quickly hit 500k tokens.
    If I use Cody only, which has some agentic capabilities but is much more "how can I implement Y in this file @src/file1.ts db models are in @src/models/foo.ts", then I rarely ever hit any rate limitations. That's more similar to what you describe of copying code back and forth, except it's in the editor and you can do it by writing @somefile.
  - immibis 4 months ago
    
    You send your company's trade secrets directly to Elon Musk?
    
    konha 4 months ago
    
    So? Most of your code is worthless to anyone except yourself.
lolinder 4 months ago

I think that tools like this have to operate on a subscription model like Cursor does in order to make any kind of sense for most users. The pay as you go model for agentic code tools makes you responsible for paying for:
* Whatever context the agent decides to pull in.
* However many iterations the model decides to run.
* Any result you get, regardless of how bad it is.
With pay as you go, the tool developer has no incentive to minimize any of these costs—they get paid more if it's inefficient, as long as it's not so inefficient that no one uses it. They don't even need it to be especially popular, they just need some subset of the population to decide that costs don't matter (i.e. those with Silicon Valley salaries).
With Cursor's model of slow and fast queries, they are taking responsibility for ensuring that the agents are as cost efficient as possible. The more efficient the agent the larger their cut. The fewer times that people have to ask a question a second time, the larger their cut. This can incentivize cutting corners, but that somewhat balanced out by the need to keep people renewing their subscription, and on the whole for most users it's better to have a flat subscription price and a company that's optimizing their costs than to have a pay-as-you-go model and a company that has no incentive to improve efficiency.
- foz 4 months ago
  
  I think this core business model question is happening at all levels in these companies. Each time the model goes in the wrong direction, and I stop it - or I need to go back and reset context and try again - I get charged. The thing is, this is actually a useful and productive way to work sometimes. Like when pairing with another developer, you need to be able to interrupt each other, or even try something and fail.
  I don't mind paying per-request, but I can't help but think the daily API revenue graph at Cursor is going up whenever they have released a change that trips up development and forces users to intervene or retry things. And of course that's balanced by churn if users get frustrated and stop or leave. But no product team wants to have that situation.
  In the end I think developers will want to pay a fair and predictable price for a product that does a good job most of the time. I don't personally care about switching models, I tend to gravitate towards the one that works best for me. Eventually, I think most coding models will soon be good at most things and the prices will go down. Where will that leave the tool vendors?
- cft 4 months ago
  
  Unfortunately the opposite is happening: Cursor is going to pay-per-use model:
  https://x.com/ericzakariasson/status/1898753771754434761
  https://old.reddit.com/r/cursor/comments/1j5kvun/cursor_0470...
  I am afraid that the endgame of programming will be who has the biggest budget for an LLM, further consolidating programming to megacorps and raising barrier to entry.
personjerry 4 months ago

That seems like a steal? Engineers are paid much more to do much less
- CGamesPlay 4 months ago
  
  No, I'm paid much more to do much more than what I did in this simple task. Claude didn't even test the changes (in this case, it does not have the hardware required to do that), or decide that the feature needed to be implemented in the first place. But my comparison wasn't "how do I compare to Claude Code", it was "how does Aider compare to Claude Code". My boss does not use Aider or Claude Code, and would not be happy with the results of replacing me with it (yet).
  - Sterling9x 4 months ago
    
    [flagged]
    
    CGamesPlay 4 months ago
    
    I said that the AI literally does not have the hardware required to do the testing necessary. But ignoring that, automated testing is not sufficient for shipping software. Imagine shipping a website that has full test coverage but never once opening the browser. This isn't a fundamentally impossible problem for AI, but no amount of "good prompting" is going to get you there today.
  - y1n0 4 months ago
    
    I think you missed the point. The claim is that it would have cost you more to hire a person to do what either claude code or aider did.
    
    CGamesPlay 4 months ago
    
    I think I pretty directly addressed that point. Yes, it would be more expensive to hire me to do what Claude Code / Aider did, but nobody would be satisfied with my work if I stopped where Claude Code / Aider did.
    
    johnmaguire 4 months ago
    
    They aren't necessarily saying it can replace you. They're saying that even though it's expensive, it's cheaper than your time (which can be better spent on other tasks, as you point out.)
    
    codebolt 4 months ago
    
    And with this increase in productivity, your boss needs fewer of you to do the work that would previously require a larger team.
    
    baq 4 months ago
    
    The first half is correct, but the conclusions shouldn’t be ‘we’re replicating our software engineers with Claude today’, they’re ‘our experienced engineers just 10x their productivity, we’ll never need to hire an intern’
    
    bhaney 4 months ago
    
    You mean "our experienced engineers just 10x'd their productivity, so we can fire 90% of them"
    
    risyachka 4 months ago
    
    Productivity gains decrease exponentially after a few weeks as your engineering skills become rusty very fast (yes, they do, in 100% cases)
    Thats the biggest part everyone misses. It’s all sunshine and rainbows until in a month you realize you start asking llm to think for you and at that point the code becomes shit and degrades fast.
    Like with everything else “use it or lose it”
    If you don’t code yourself- you will lose the ability to properly so it very fast, and you won’t realize it until too late
    
    MyOutfitIsVague 4 months ago
    
    If you're using the LLM poorly. Many team leads spend very little time programming, and spend a lot of time reviewing code, which is basically what working with LLMs is. If the LLM writes code that you couldn't have written yourself, you aren't qualified to approve it.
    
    snowwrestler 4 months ago
    
    Accurate only for companies who think today “I have exactly as much engineering talent as I’ll ever need.”
    Which is approximately zero companies.
    
    baq 4 months ago
    
    Nah the value unlock is too great. For now. Don’t quote me next March.
    
    cloudbonsai 4 months ago
    
    I'm pondering where this "AI-automated programming" trend is heading.
    For example: thirty years ago, FX trading was executed by a bunch of human traders. Then, computers arrived on the scene, which made all of them practically obsolete. Nowadays FX trading is executed by a collection of automated algorithms, being monitored by few quants.
    My question is: is the software development in 2025 basically like what the foreign exchange was in the 2000s?
    
    philipswood 4 months ago
    
    My take is something like the following:
    With industrialisation blacksmiths were replaced by assembly lines. I'm sure that blacksmiths are more flexible and capable in almost any important dimension, but the economics of factories just made more sense.
    I expect that when the dust settles (assuming that the dust settles), that most software will be an industrial product. The humans involved in its creation will be engineers and not craftsmen. Today we have machinists and industrial engineers - not blacksmiths.
    Quality and quality assurance processes will become more important, I also expect optimised production processes. I think a lot of the software ecosystem is a baroque set of over-engineered (or over crafted) steps and processes and this will probably be refactored.
    I expect code quality metrics to be super refined. Craftsmen don't usually produce artifacts to the tolerances that our machines do now - code will be the same. I expect automated correctness proofs, specification languages, enhanced type systems to have a renaissance.
    
    Suppafly 4 months ago
    
    This, if you hire engineers, you want them doing tasks only engineers can do, not busy work.
- beepbooptheory 4 months ago
  
  I know this is not really in the spirit of the room here, but before I ever dreamed of getting paid to code, I only learned to at all because I was cheap and/or poor cook/grad student that wanted to make little artsy musical things on the computer. I remember the first time I just downloaded pure data. No torrent, no cracks, it was just there for me and all it asked for was my patience.
  The only reason I ever got into linux at all was because I ended up with some dinky acer chromebook for school but didn't want to stop making stuff. Crouton changed my life in a small way with that.
  As I branched out and got more serious, learning web development, emacs, java, I never stopped feeling so irrationally lucky that it was all free, and always would be. Coming on here and other places to keep learning. It is to this day still the lovely forever hole I can excavate that costs only my sleep and electricity.
  This is all not gone, but if I was just starting now, I'd find hn and so and coding twitter just like I did 10 years ago, but would be immediately turned off by this pervasive sense that "the way to do things now" is seemingly inseparable from a credit card number and monthly charge, however small. I just probably would not of gotten into it. It just wouldn't feel like its for me: "oh well I don't really know how to do this anyway, I can't justify spending money on it!" $0.76 for 50 loc is definitely nuts, but even $0.10 would of turned me way off. I had the same thoughts with all the web3 stuff too...
  I know this speaks more to my money eccentricities than anything, and I know we dont really care on here about organic weirdo self teachers anymore (just productivity I guess). I am truly not even bemoaning the present situation, everyone has different priorities, and I am sure people are still having the exciting discovery of the computer like I did on their cursor ide or whatever. But I am personally just so so grateful the timeline lined up for me. I don't know if I'd have my passion for this stuff if I was born 10 years later than I was, or otherwise started learning now. But I guess we don't need the passion anymore anyway, its all been vectorized!
  - cwalv 4 months ago
    
    > But I am personally just so so grateful the timeline lined up for me.
    I know the feeling. We still have access to the engineering thought processes responsible for some of the most amazing software feats ever accomplished (thru source repo history and mailing lists), just with access to the Internet. Of course there's a wealth of info available for free on the web for basically any profession, but for software engineering in particular it's almost direct access to world class teams/projects to learn from.
    > but would be immediately turned off by this pervasive sense that "the way to do things now" is seemingly inseparable from a credit card number and monthly charge
    To be effective you still need to understand and evaluate the quality of the output. There will always be a certain amount of time/effort required to get to that point (i.e., there's still no silver bullet).
    > But I guess we don't need the passion anymore anyway, its all been vectorized!
    We're not running out of things that can be improved. With or without these tools, the better you get, the more of the passion/energy that gets directed at higher levels of abstraction, i.e. thinking more about what to solve, tradeoffs in approaches, etc. instead of the minute details of specific solutions.
  - idiotsecant 4 months ago
    
    This doesn't make much sense to me. Is there some reason a kid today can't still learn to code? In the contrary, you have LLMs available that can answer your exact personalized questions. It's like having a free tutor. It's easier than it's ever been to learn for free.
    
    GlassOwAter 4 months ago
    
    I’m approaching middle age and have always wanted to learn to code and run servers, but would get caught up somewhere on tutorials and eventually give up in frustration.
    Over the past year I have accomplished so much with the ever patient LLMs as my guide. It has been so much fun too. I imagine there are many others in my shoes and those that want to learn now have a much friendlier experience.
    
    stevage 4 months ago
    
    Yeah I'm middle aged and a competent programmer. But I hate learning new technologies these days. TypeScript was a huge hurdle to overcome. Working with ChatGPT made it so much more bearable. "Explain this to me. I need to do X, how? Why doesn't this work" etc.
    
    towelpluswater 4 months ago
    
    Yep we’re here. I think lots of us who don’t often post
    
    beepbooptheory 4 months ago
    
    This is definitely true in some ways. I was just talking around this point about spending money incrementally on aider, cursor, etc, and how it would have been a turnoff to me. But yes, all that I had back then people still have, and thats great.
    
    madmask 4 months ago
    
    Why learn if the computer can do it better than you and by the time you learn the roi on the market approaches 0? This wave of llm removed a lot of my interest in coding professionally
  - JKCalhoun 4 months ago
    
    The sounds like the best way to get into coding. (For me it was wanting to realize game ideas to entertain myself.)
    Money for a computer when I was getting into it was the credit-card part of it — there were no cheap Chromebooks then. (A student loan took care of the $1200 or so I needed for a Macintosh Plus.)
    I suspect that's always the way of it though. There will be an easier way throwing money at a thing and there will be the "programming finds a way" way.
  - baq 4 months ago
    
    It must be said, preferably in bold, that:
    > this pervasive sense that "the way to do things now" is seemingly inseparable from a credit card number and monthly charge
    …is true, but it only applies to experienced engineers who can sculpt the whole solution using these tools, not just random code. You need the whole learning effort to be able to ground the code the slop generators make. The passion absolutely helps here.
    Note this is valid today. I have concerns that I’ll have different advice in 2027…
    
    cft 4 months ago
    
    in 2027 a lot programming will simply be a question of who has the largest LLM budget.
    I wonder what it means for the open source...
    
    intended 4 months ago
    
    In 2028, the question will be who spent more money on lawsuits, and who spent more money on consultants to clean up their code base.
    Jokes aside, code tools are best used in the hands of someone who is already trained and can verify bad code, and bad patterns at a glance.
    AI code passes many tests. So does a lot of code written by us, for ourselves. When the code gets in front of users, especially the kind of genius users who learn how to fly by forgetting how to fall, then we learn many good habits.
    
    JKCalhoun 4 months ago
    
    In 2027 we'll have LLMs downloaded to our devices that are as good as Claude Code is today. (But as I have seen, as the leading edge of this stuff is always cooler than what you can run locally, we'll not be satisfied then with today's Claude Code.)
- Bjorkbat 4 months ago
  
  Tangential, but this reminds me of something someone said on Twitter that has resonated with me ever since. Startups targeting developers / building developer tooling are arguably one of the worst startups to build, because no matter how much of a steal the price is relative to the value you get, developers insist they can build their own or get by with an open-source competitor. We're as misguided on value as we are on efficiency and automation (more specifically, the old trope of a dev spending hours to automate something that takes minutes to do).
  - deepGem 4 months ago
    
    This is also why devs are not in charge of purchase decisions at tech companies. I don't mean FAANG but the usual tech shops. Someone buys a tool and you have to use it. I think the startups selling dev tools are not selling to developers at all, but to the IT folks of these big tech firms.
    Should they pull it off, it's not at all a bad startup to build. However, you need to now invest in a sales force that can sell to the Fortune 500. As a tech founder with no sales trope, this will be incredibly hard to pull off.
    I digress, but yeah selling to devs is almost always a terrible idea since we all want to build our own stuff. That spirit may also be waning with the advent of Replit agent, Claude code and other tools.
    
    Suppafly 4 months ago
    
    > I think the startups selling dev tools are not selling to developers at all, but to the IT folks of these big tech firms.
    They are often selling to IT managers against the advice of the developers and IT folks, and then they mostly don't get used because they don't actually add any value to the process.
  - cwalv 4 months ago
    
    I've noticed this tendency in myself and thought about the 'why' a lot, and I think it comes down to subconsciously factoring in the cost of lock-in, or worse, lack of access to fix/improve a tool I've come to rely on
    
    TheDong 4 months ago
    
    For me, a larger part than "cost of lock-in" is the "hacker spirit", the curiosity to understand how it works.
    Sure, I can pay google or fastmail to host a mailserver for me, but that deprives me of the joy of configuring and updating dovecot/postfix/etc, writing custom backup scripts, writing my own anti-spam tooling, etc. I want to know how all those pieces work.
    Sure, I can pay kagi to search its index of webpages for me, but that deprives me of the joy of creating and running a botnet to index webpages, storing 100s of terrabytes of scraped data, and writing my own search code.
    Targeting hackers is indeed a sucker's game.
    
    popcorncowboy 4 months ago
    
    > but that deprives me of the joy of configuring and updating dovecot/postfix/etc, writing custom backup scripts, writing my own anti-spam tooling
    I'm 90% sure you're serious, but that didn't stop me having the best belly laugh for a solid few minutes at this. Thank you.
    
    lqstuart 4 months ago
    
    I think this spirit is totally lost on most people in this field. It’s tempting to say younger generations but it’s everyone. It always amazes me when I meet someone who has spent 10+ years in this field and doesn’t even care how anything but their shitty Kafka-Flink pipelines work.
    
    sgarland 4 months ago
    
    If that; I’ve met plenty who only care that they work, not how they work.
    As someone who works in infra and dabbles in coding, this is a continual bugbear, because often I’ll find an optimization while troubleshooting “my” problem, and the dev team is disinterested in implementing it. Their endpoints are meeting their SLO, so who cares?
    
    Bjorkbat 4 months ago
    
    I've honestly thought of hacker spirit as embodying a kind of homesteader ethos in a way. There's this homesteading book I bought a long time ago when I was in college, rich with illustrations on how to do everything from raise animals and grow food to building a house, processing lumber, drilling a well, everything. The same fascination I have with homesteading and DIY culture extends into my interest in technology, and I suspect this is the same with a lot of developers as well.
  - bryanrasmussen 4 months ago
    
    >We're as misguided on value as we are on efficiency and automation (more specifically, the old trope of a dev spending hours to automate something that takes minutes to do).
    but automating something that takes minutes to do is Larry Wall's example of programmer laziness, and is a virtue.
    of course - this needs obligatory conflicting XKCD comics
    automation makes you do more work: https://xkcd.com/1319/
    is it worth the time to automate https://xkcd.com/1205/
- tropin 4 months ago
  
  Not everybody works in USA.
caseyf7 4 months ago

Which model did you use with aider?
- CGamesPlay 4 months ago
  
  My post above was with sonnet-3.5. When I used sonnet-3.7, it didn't speculate at the files at all; it simply requested that I add the appropriate ones.
maineagetter 4 months ago

[dead]

chaosprint 4 months ago

It seems the original poster hasn't extensively tried various AI coding assistants like Cursor or Windsurf.

Just a quick heads-up based on my recent experience with agent-based AI: while it's comfortable and efficient 90% of the time, the remaining 10% can lead to extremely painful debugging experiences.

In my view, the optimal scenarios for using LLM coding assistants are:

- Architectural discussions, effectively replacing traditional searches on Google.

- Clearly defined, small tasks within a single file.

The first scenario is highly strategic, the second is very tactical. Agents often fall awkwardly between these two extremes. Personally, I believe relying on an agent to manage multiple interconnected files is risky and counterproductive for development.

sitkack 4 months ago

Steve Yegge, you should know who he is AND the post also mentioned Cursor and Windsurf. His own company works on a similar product.
hashmap 4 months ago

This has been my experience as well. I find that the copy/paste workflow with a browser LLM still gets me the most bang for the buck in both those cases. The cli agents seem to be a bit manic when they get hold of the codebase and I have a harder time corralling them into not making large architectural changes without talking through them first.
For the moment, after a few sessions of giving it a chance, I find myself using "claude commit" but not asking it to do much else outside the browser. I still find o1-pro to be the most powerful development partner. It is slow though.
tomnipotent 4 months ago

The author works on Cody at Sourcegraph so I'll give him the benefit of the doubt that he's tried all the major players in the game.
- mvdtnz 4 months ago
  
  He's also author of one of the most legendary posts about programming language design of all time, Execution in the Kingdom of Nouns.
  http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...
finolex1 4 months ago

He literally says in his post "It might look antiquated but it makes Cursor, Windsurf, Augment and the rest of the lot (yeah, ours too, and Copilot, let's be honest) FEEL antiquated"
sbszllr 4 months ago

> In my view, the optimal scenarios for using LLM coding assistants are:
> - Architectural discussions, effectively replacing traditional searches on Google.
> - Clearly defined, small tasks within a single file.
I think you're on point here, and it has been my experience too. Also, not limited to coding but general use of LLMs.
bdangubic 4 months ago

duuuuude :) you should seriously consider deleting this post… if you do not know who Steve Yegge is (the original poster as you call him) you really should delete this post
- turnsout 4 months ago
  
  I appreciate that you're super attuned to this frothy space, but not everyone cares about learning all the many personalities in the ecosystem… even people who care about this stuff
  - mkozlows 4 months ago
    
    The point is that he's not an AI guy, he's a long-time software dev guy going back decades.
intrasight 4 months ago

> extremely painful debugging experiences.
I'd claim that if you're debugging the code - or even looking at it for that matter - that you're using AI tools the wrong way.
- chaosprint 4 months ago
  
  I'd be very interested to know of a way to make it work with AI that doesn't require debugging if you can illustrate.
  - intrasight 4 months ago
    
    Make what work with AI?
- vunderba 4 months ago
  
  Congratulations. You allow the AI to make some new subroutine, and you immediately commit and merge the changes to your system. You run it, and it executes without throwing any immediate errors.
  The business domain is far more nuanced and complex, and your flimsy "does it compile" test for the logic doesn't even begin to cover the necessary gamut of the input domain which you as the expert might have otherwise noticed had you performed even a cursory analysis of the LLM generated code before blindly accepting it.
  Nice to know that I'm going to be indefinitely employed fixing this kind of stuff for decades to come...
  - intrasight 4 months ago
    
    >You allow the AI to make some new subroutine
    Again, you're using AI the wrong way.
- collingreen 4 months ago
  
  This is exactly my impression of the summary of these kinds of posts and, I'm speculating here, maybe where there is such a stark difference.
  I'm guessing that the folks who read the output and want to understand it deeply and want to "approve" it like a standard pull request are having a very different perspective and workflow than those who are just embracing the vibe.
  I do not know if one leads to better outcomes than the other.
  - esafak 4 months ago
    
    Are you serious? Why not just vibe work with your human coworkers and merge to master then? Let's see what the outcome is!
    
    collingreen 4 months ago
    
    > Are you serious?
    I am serious and didn't think anything I said here was contentious. Which part are you feeling incredulity over? I'll try to clarify if I've been unclear or learn from your perspective.
    
    esafak 4 months ago
    
    You seem to be unsure if checking the code is likely to lead to better outcomes.
    
    collingreen 4 months ago
    
    I'm not passing judgement in that comment about if carefully crafting and curating code is a net win productivity wise over quickly churning out ai slop that is ostensibly viable long term.
    I _personally_ follow the careful crafting and review approach. I think a deep understanding of the systematic ideas underlying a project is critical to continuing to improve it effectively and I believe code that is easy to understand is extremely valuable for that. I'm my experience so far the ai stuff seems to create more, harder work for me in the long run and I end up with worse understanding after than if I just wrote it myself but I recognize that's just my person anedata so far and we're at the beginning of this ai coding landscape.
    I'm also open to the idea that lots of software we currently carefully craft by hand could likely be autogenerated ad hoc as needed. Moreover, good engineers are extremely expensive and most software is quite bad anyway so I expect there is a new balance to be found here.
    For my 2 cents, I'm not removing my or my company's ability to understand our own software in exchange for vendor lock in with ai (even if they weren't all owned by the current set of megalomaniacs) but I see the angles and I'm not afraid to consider the tradeoffs or have the conversation.

inciampati 4 months ago

I'm impressed by how many people who are working with Claude Code seem to have never heard of its open source inspiration, aider: https://aider.chat/

It's exactly what Yegge describes. It runs in the terminal, offering a retro vision of the most futuristic thing you can possibly be doing today. But that's been true since the command line was born.

But it's more than Claude Code, in that it's backend LLM agnostic. Although sonnet 3.7 with thinking _is_ killing the competition, you're not limited to it, and switching to another model or API provider is trivial, and something you might do many times a day even in your pursuit of code that vibes.

I've been a vim and emacs person. But now I'm neither. I never open an editor except in total desperation. I'm an aider person now. Just as overwhelmingly addicted to it as Yegge is to Claude Code. May the future tooling we use be open source and ever more powerful.

Another major benefit of aider is its deep integration with git. It's not "LLM plus coding" it's really like an interface between any LLM (or LLMs), git, and your computer. When things go wrong, I git branch, reset to a working commit, and continue exploring.

note: cross-posted from the other (flagged) thread on this

victorbjorklund 4 months ago

Aider is amazing. You can even use it in copypaste mode with web-based AI:s.

rs186 4 months ago

A single tweet with lots of analogy, with no screenshot/screen recording/code examples whatsoever. These are just words. Are we just discussing programming based on vibe?

delusional 4 months ago

It's influencer culture. It's like when people watch those "software developer" youtubers and pretend it's educational. It's reality television for computer people.
- mpalmer 4 months ago
  
  Reality television plus cooking show, exactly.
  - macNchz 4 months ago
    
    Cooking shows are a perfect analogy for this stuff. For some reason I never connected the highly-edited-mass-appeal "watch someone do skilled work" videos on YouTube with Food Network style content until just now, but you're right they're totally scratching the same basic itch. They make people feel like they're learning something just by watching, while there is really no substitute for actually just doing the thing.
    
    mpalmer 4 months ago
    
    Not to mention cooking show hosts often recommend or outright sell/endorse their tools.
    
    exikyut 4 months ago
    
    And the sad/crazy thing is that AI assisted coding/refactoring does seem to stray substantially over the line of observation vs participation to the point that the experience is largely defined by "learning by watching"
    :/
    Prompt engineering (to the extent that it results in the difference between applied LLM success vs failure) requires the human to truly grok what they're getting help on - the current top comment notes that the model applied more or less exactly the same diff they would have done, based on deep understanding of the exact codebase being modified - and also have some level of intuition about how LLMs work.
    Someone just treating the model and context like an obscure TV remote and pressing all the buttons to see what they do, will get a result, and it might seem interesting, but will it be ontologically correct? Good question!
- tylerrobinson 4 months ago
  
  > reality television for computer people
  Complete with computer people kayfabe!
frankc 4 months ago

I think the interest has more to do with who is doing the tweeting, don't you think?
- rs186 4 months ago
  
  Reminder: "appeal to authority" is a classical logical fallacy.
  For me, I don't know this person, which means that all the words are completely meaningless.
  Which is exactly the point.
  - frankc 4 months ago
    
    I don't think it's really an appeal to authority. No one is saying it must be true because this guy says its true. However, when a very respected figure in software engineer culture like Steve Yegge gives an opinion, that is more noteworthy than when random joe schmo gives his opinion. The fact that you don't know who he is means it's not interesting to you. Clearly, it's noteworthy to others.
  - JohnKemeny 4 months ago
    
    Exception: Be very careful not to confuse "deferring to an authority on the issue" with the appeal to authority fallacy. Remember, a fallacy is an error in reasoning. Dismissing the council of legitimate experts and authorities turns good skepticism into denialism. The appeal to authority is a fallacy in argumentation, but deferring to an authority is a reliable heuristic that we all use virtually every day on issues of relatively little importance. There is always a chance that any authority can be wrong, that’s why the critical thinker accepts facts provisionally. It is not at all unreasonable (or an error in reasoning) to accept information as provisionally true by credible authorities. Of course, the reasonableness is moderated by the claim being made (i.e., how extraordinary, how important) and the authority (how credible, how relevant to the claim).
    Quotation from https://www.logicallyfallacious.com/logicalfallacies/Appeal-...
    
    kittikitti 4 months ago
    
    Where are you going with this? The authority provided no evidence so noting the logical fallacy is correct here. In this scenario, people are appealing to authority because there's nothing else to count on.
    
    JohnKemeny 4 months ago
    
    The tweet is a domain expert talking about his experiences using a product.
    If (hyperbole) Picasso wrote a tweet saying certain canvases work great for certain tasks, and some random internet dude counters with A single tweet with lots of analogy, with no images/painting examples whatsoever. These are just words. Are we just discussing art based on vibe? ...
    I will read that comment in the comic book guy voice.
  - baq 4 months ago
    
    When Karpathy and Yegge speak, I listen. When they say approximately the same things, I listen carefully.
  - tymscar 4 months ago
    
    I agree with you that the tweet is basically useless but what they have done there is not an appeal to authority fallacy. They have only explained why it’s popular. Arguably your response is a genetic fallacy.
  - theonething 3 months ago
    
    > For me, I don't know this person, which means that all the words are completely meaningless.
    Because you don't know who Yegge is, the words are meaningless to you? So a body of text is meaningful only if you know who the author is? That's...lame.
mhh__ 4 months ago

> vibe
People do literally call it vibe coding.
https://en.wikipedia.org/wiki/Vibe_coding (it turns out there is a wikipedia page already although suspect it'll be gone soon)
- Bjorkbat 4 months ago
  
  I'm amused by all the flags this article has. It reinforces this belief that "vibe-coding" isn't something that evolved organically, but was forced. I wouldn't go as far as to call it "astroturfed", I believe it was a spontaneous emergence, but otherwise it feels like an effort by a disproportionately small group of people desperately trying to make vibe-coding a thing for whatever reason.
  - mhh__ 4 months ago
    
    it's definitely more organic than you think. I know smart productive people doing it as purely a 0 to 1 thing.
- rglover 4 months ago
  
  This is quite possibly one of the most disturbing things I've seen in awhile.
  Sure, for fun and one-off private apps this is fine, but the second that some buzzed-on-the-sides haircut guy thinks this is the way, the chaos will rival a Hollywood disaster movie.
  - leptons 4 months ago
    
    >"The practice defies the belief in the software industry that software engineering demands great skill."
    But actual software engineering does demand great skill. What "vibe coders" are doing isn't engineering. It's not even programming. It's like calling yourself a chef when you microwaved a frozen meal.
    
    baq 4 months ago
    
    Not even programming? That’s even better!
    Great engineers know that code is 10-20% of the whole solution to a problem and it doesn’t matter too much. Code is not an asset, it’s a liability.
    
    leptons 4 months ago
    
    >Code is not an asset, it’s a liability.
    Then AI generated code is a time bomb. Maintainability? Who needs that if you can sell the company quickly enough, I guess.
    
    baq 4 months ago
    
    Always has been! Not sure what AI changes here.
    
    rglover 4 months ago
    
    It encourages developers to not worry about understanding the code they write. At least now, there's someone around (typically) who can figure things out.
    But as more developers rely on AI and their skills atrophy, if entire products are being built with AI, eventually no one will exist who can fix the software when it breaks. Not only that, but anything that requires knowledge outside the training of the AI will be impossible (or at best, extremely difficult/costly) to implement.
    
    baq 4 months ago
    
    You’re right, my point is for most software it doesn’t matter. I can envision some software not even checking code in to source control, just prompts, in essence jumping to a higher level of abstraction, just like we don’t analyze what the v8 jit does to js when it emits machine code. (There are some who do, and no doubt there will be people who will look and/or care, just like there will be software that needs to be written in an exact way in assembly today.)
kleiba 4 months ago

What, someone cannot utter an opinion anymore?
- h4ny 4 months ago
  
  I find that question ironic.
  - kleiba 4 months ago
    
    But isn't that the point?

raylad 4 months ago

I tried it on a small Django app and was not impressed in the end.

It looks like it’s doing a lot, and at first I was very impressed, but after a while I realized that when it ran into a problem it kept on trying nonworking strategies even though it had tried them before and I had added to claude.md instructions to keep track of strategies and not reuse failing ones.

It was able to make a little progress, but not get to the end of the task, and some of its suggestions were completely insane. At one point there was a database issue and it suggested switching to an entirely different database than the one that was already used by the app, which was working and production.

$12 spent in a couple of hours later, it had created 1200 lines of partially working code and rather of a mess. I ended up throwing away all the changes and going back to using the web UI.

babyent 4 months ago

Now take your $12 and multiply it by 100k people or more trying it.
Even if you won’t use it again, that’s booked revenue for the next fundraise!
- bufferoverflow 4 months ago
  
  That's revenue, not profit. GPUs aren't cheap.
nprateem 4 months ago

I use it like a brush for new apps and a scalpel for existing ones and it generally works well. If it can't solve something after 3 attempts though I just do it.
winrid 4 months ago

LLMs seem to work a lot better with statically typed languages where the compiler can give feedback.
- jimbokun 4 months ago
  
  LLMs are what’s finally going to make Haskell popular!
  - tome 4 months ago
    
    And Haskell is going to make LLMs popular! https://groq.com/

phartenfeller 4 months ago

I tried it too and tasked it to do a bigger migration (one web framework to another). It failed pretty bad where I stopped the experiment. It still gave me a headstart where I can take parts and continue the migration manually. But the worst thing was that it did things I didn't asked for like changing the HTML structure and CSS of pages and changing hand picked HEX color codes...

More about my experience on my blog: https://hartenfeller.dev/blog/testing-claude-code

ludamn 4 months ago

Such a nice read, thanks for sharing!

ing33k 4 months ago

I tried Claude Code and gave it very clear instructions to build a web based tool I wanted to build over the weekend. It did exactly that ! sure, there were some minor modifications I had to make, but it completed over 80% of the work for me.

As for the app itself, it included a simple UI built on React with custom styling and real-time support using WSS . I provided it with my brand colors and asked it to use chadcn. It also includes a Node.js-based backend with socket.io and puppeteer. I even asked it to generate a Dockerfile and Kubernetes manifests. It almost did a perfect job—the only thing I had to fix manually was updating my Ingress to support WSS.

After studying the K8s manifests, I learned a bunch of new things as well.. spent around $6 for this session and I felt that it was worth it.

bob1029 4 months ago

I find that maintaining/developing code is not an ideal use case for LLMs and is distracting from the much more interesting ones.

Any LLM application that relies more-or-less on a single well-engineered prompt to get things done is entry level and not all that impressive in the big picture - 99% of the heavy lifting is in the foundation model and next token prediction. Many code assistants are based on something like this out of necessity of needing to support anybody's code. You can't rely on too many clever prompt chaining patterns to build optimizations for Claude Code because everyone takes different approaches to their codebase and has wildly differing expectations for how things should go down. Because the range of expectations is so vast, there is a lot of room to get disappointed.

The LLM applications that are most interesting have the model integrated directly with the product experience and rely on deep domain expertise to build sophisticated chaining of prompts, tool calling and nesting of conversations. In these applications, the user's experience and outcomes are mostly predetermined with the grey areas intended to be what the LLM is dealing with. You can measure things and actually do something about it. What was the probability of calling one tool over the other in a specific context of use? Placing these prompts and statistics alongside domain requirements will enable you to see and make a difference.

hleszek 4 months ago

I must have been a little too ambitious with my first test with Claude Code.

I asked it to refactor a medium-sized Python project to remove duplicated code by using a dependency injection mechanism. That refactor is not really straightforward as it involves multiple files and it should be possible to use different files with different dependencies.

Anyway, I explain the problem in a few lines and ask for a plan of what to do.

At first I was extremely impressed, it automatically used commands to read the files and gave me a plan of what to do. It seemed it perfectly understood the issue and even proposed some other changes which seemed like a great idea.

So I just asked him to proceed and make the changes and it started to create folders and new files, edit files, and even run some tests.

I was dumbfounded, it seemed incredible. I did not expect it to work with the first try as I had already some experience with AI making mistakes but it seemed like magic.

Then once it was done, the tests (which covered 100% of the code) were not working anymore.

No problem, I isolate a few tests failing and ask Claude Code to fix it and it does.

Now for a few times I found some failing tests and ask him to fix it, slowly trying to fix the mess until there is a test which had a small problem: it succeeded (with pytest) but froze at the end of the test.

I ask again Claude Code to fix it and it tries to add code to solve the issue, but nothing works now. Each time it adds some bullshit code and each time it fails, adding more and more code to try to fix and understand the issue.

Finally after $7,5 spent and 2000+ lines of code changed it's not working, and I don't know why as I did not make the changes.

As you know it's easier to write code than to read code so at end I decided to scrape everything and do all the changes myself little by little, checking that the tests keep succeeding as I go along. I did follow some of the recommended changes it proposed tough.

Next time I'll start with something easier.

jpc0 4 months ago

Really yoy nearly got the correct approach there.
I generally follow the same approach these days, ask it to develop a plan then execute but importantly have it excute each step in as small increments as possible and do a proper code review for each step. Ask if for changes you want it to make.
There is certainly times I need to do it myself but definitely this has improved some level of productivity for me.
It's just pretty tedious so I generally write a lot of "fun" code myself, and almost always do the POC myself then have the AI do the "boring" stuff that I know how to do but really don't want to do.
Same with docs, the modern reasoning models are very good at docs and when guided to a decent style can really produce good copy. Honestly R1/4o are the first AI I would actually concider pulling into my workflow since they make less mistakes and actually help more than they harm. They still need to be babysit though as you noticed with Claude.
UncleEntity 4 months ago

> ...do all the changes myself little by little, checking that the tests keep succeeding as I go along.
Or... you can do that with the robots instead?
I tried that with the last generation of Claude, only adding new functionality when the previously added functionality was complete, and it did a very good job. Well, Claude for writing the code and Deepseek-R1 for debugging.
Then I tried a more involved project with apparently too many moving parts for the stupid robots to keep track of and they failed miserably. Mostly Claude failed since that's where the code was being produced, can't really say if Deepseek would've fared any better because the usage limits didn't let me experiment as much.
Now that I have an idea of their limitations and had them successfully shave a couple yaks I feel pretty confident to get them working on a project which I've been wanting to do for a while.
darkerside 4 months ago

I'm curious for the follow up post from Yegge, because this post is worthless without one. Great, Claude Code seems to be churning out bug fixes. Let's see if it actually passes tests, deploys, and works as expected in production for a few days if not weeks before we celebrate.
- pchristensen 4 months ago
  
  He posts a few times a year at https://sourcegraph.com/blog
elcomet 4 months ago

I'm wondering if you can prompt it to work like this - make minimal changes, and run the tests at each step to make sure the code is still working
- espdev 4 months ago
  
  This thing can "fix" tests, not code. It just adjusts tests to incorrect code. So you need to keep an eye on the test code as well. That sounds crazy, of course. You have to constantly keep in mind that LLM doesn't understand what it is doing.
biorach 4 months ago

git commit after each change it makes. It will eventually get itself into a mess. Revert to the last good state and tell it to try a different approach. Squash your commits at the end

noisy_boy 4 months ago

The trick is not to get sucked into making it do 100% of the task and have a judgement of the sweet spot. Provide it proper details upfront along with the desired overall structure - that should settle in about 10-15 mins of back and forth. This must include tests that you have to review manually - again you will find issues and lose time again (say about 30-45mins). Cut your losses and close the lose ends of the test cide. Now run the tests and start giving it discreet tasks to fix the tests. This is easily 20-40 mins. Now take over and go through the while thing yourself because this is where you will find more issues upon in-depth checking (the LLM has done most of what it could) and this where you must understand the code you need to support.

iwasbirchyfirst 4 months ago

I went from copying and pasting with ChatGPT Canvas, to Claude Artifacts, to Cursor w Claude. I haven't explored Rules yet, but have been using the Notepads for much of this. Much of my time is spent managing git commits/reverts, and preparing to bring Claude up to speed after the next chat renewal.

AI coding is like having a partner who is both the smartest person in town, but also a functional alcoholic, who's really, really good at hiding it. LLMs act like they are working in a dark warehouse with a flashlight, and Altzheimers.

They can get an amazing amount of functional code done in a few minutes, and then spend hours trying to fix one detail. My standard prompt begins with, "Don't guess, debug!" They have limited resources and will bs you if they can.

For longer projects, since every prompt is almost starting from scratch (they do have a limited buffer, which will make it easy to become complacent), if you get into repeated debugging sessions, it will start creating new functions instead making exisiting functions work, and code bloat is tremendous. Perhaps Rules work, but I've given up trying to get it to code in my style. I'm trying to have AI do all the coding so I can just be the "idea" guy ("vibe" coding), so I'm learning to let go and let it code in ways the I would hate to maintain. It's working from code examples that don't use my style, so I'm not going to keep fighting it on style (with some execptions like variable naming conventions).

internet_points 4 months ago

> AI coding is like having a partner who is both the smartest person in town, but also a functional alcoholic, who's really, really good at hiding it.
I'm stealing this :-) (And from now on I'll be imagining coding alongside Columbo. Claude Columbo.)

benzible 4 months ago

This is very far from my experience. It's been impressive on a few things but as of now I'm on day 3 of trying to get it to fix an issue in an open source library I maintain that I haven't had time to deal with. I'm in an endless loop where it keeps having the same epiphany about the root cause, then it implements a "fix", then tries to verify via a test script, then when it runs into difficulty it keeps adding workarounds to the test script. I can't get it focused on the fact that it's acting on behalf of the library author and that hacking the test script has no value.

I am not disposed to AI skepticism. I'd love it if a tool existed that worked as this guy claims. Claude Code is the best tool of its type that I've worked with but I'd put it at "on balance a time-saver in a lot of cases, way more trouble than it's worth in many others".

mtlynch 4 months ago

This is particularly interesting, as Steve Yegge works on (and I think leads) Sourcegraph Cody[0], which is a competitor to Claude Code.

Cody does use Claude Sonnet, so they do have some aligned interests, but it's still surprising to see Yegge speak so glowingly about another product that does what his product is supposed to do.

[0] https://sourcegraph.com/cody

mechanicum 4 months ago

I mean, doing that is pretty much what made him (semi-)famous in the first place (https://gist.github.com/chitchcock/1281611).
- mtlynch 4 months ago
  
  Yeah, but it's pretty different complaining from the position of a rank and file engineer at what was then like a 50k-person org as opposed to praising a competitor's product when you're at a small company, and you're the public face of your product.
- istjohn 4 months ago
  
  Thanks, I never read that one. Yegge's writing is just delicious. He could write a guide to watching paint dry and I would savor every word.
esafak 4 months ago

Cody lets you pick your model.
manojlds 4 months ago

Rising tide lifts all the boats and all that.
Claude Code didn't feel that different to me, and maybe they have something that is better and when they do release it they can say hey look, we pushed hard and have something that's better than even Claude Code.

ddawson 4 months ago

I do not know how to code, nothing beyond the simplest things but I am obsessed with AI development. Several month’s ago I decided to see if Claude could do the work for me.

My test was to create a tower defense game that would run in a web browser. It took me about eight prompts, each time refining it, including agreeing to the suggesting that Claude recommended and seeing Claude agree with me on the bugs that I pointed out.

It was mind blowing. It’s not pretty at all but is recognizable as a TD game. I thanked Claude and said that was enough and Claude signed off as well saying, well if you’re going to continue to refine it, do these four things. I was really stunned.

101008 4 months ago

Honesty question, and leaving aside implications about what's possible and all of that, what was particular positive about the experience?
You didn't do anything, just asked a different entity to do it for you. And nothing noble or original, just a copy of existing games. I see no difference between this and getting a $500 coupon to use at Fiverrr and ask a freelance engineer to do the same while you chat with them.
- atonse 4 months ago
  
  Is there anything inherently noble about programming if not to solve a real world problem?
  If they were doing it for an exam where their skills were being evaluated, that’s one thing. But if they were doing it as a means to an end, does it matter if they found a more efficient way to do it?
  - monkaiju 4 months ago
    
    Idk about "noble" specifically, but fulfilling and meaningful yes. "Solving" a real world problem isnt the only reason I code. It might be why im paid for it, but I enjoy coding and get fulfillment from it.
    I recently interviewed a very pro-ai dev who's last position listed 'architect' and was stunned at how limited their knowledge seemingly was. I didnt have a coding question prepared because i assumed we could have a higher level 'architecture'-oriented chat but they had seemingly nothing to contribute... Thankfully another interviewer had a simple coding challenge prepared, which filtered the candidate out.
    Long story short, i wonder if the folks who.genuinely enjoy coding are going to be the only remaining skilled technologists after the rest make themselves obsolete by overly depending on these tools, regardless of how good they supposedly are.
  - 101008 4 months ago
    
    But copying an existing game is not solving a real world problem. You may do it to see if "I can make it", during a learning process, or as a challenge. But when you ask something/someone else to do it for you, what's the purpose?
    
    throwaway743 4 months ago
    
    Why are you trying to shit on the guy and put him down? What's wrong with just having fun and testing it's capabilities? So what if there's a million TD games out there? They wanted to see if it's possible and they enjoyed the experience. Or maybe someone wants to make a personalized version for someone they love, or a million other things.
    And what's it matter if someone else made it? Do you make your own bread or milk your own cows? Do you build your own cars?
    Like who gives a shit, when the most important thing for 95% of people is, "Can i use this? Can I operate it? What can this do for me? How does this make life easier/better/fun?".
    It's also interesting as it shows the reality we're heading towards with hyper-personalized media. Love it or hate it, it's coming.
    
    101008 4 months ago
    
    I asked him because I couldn't see the fun of it. And your examples about milk and car are exactly what I am saying. I don't create them...
    About your last example, it's like asking a AI to write Harry Potter but where the character has a different name and be blown away for what I did in a weekend!
    
    ddawson 4 months ago
    
    Exactly. It's also exciting to see things at the frontier, just because they're new. I wasn't planning to release the game, just get something working for myself. By the way, this was July 2024 shortly after Anthropic introduced Claude Artifacts.
- ddawson 4 months ago
  
  I'm not asking for an award. lol. I'm not sure exactly what you're after here, with asking what was particularly positive.
  It's a personal attempt to see how much I can do with an automaton. I could pay someone to do my taxes or file them myself (I'm in the US). There is much more room for frustration but also lots of benefits to the latter.
  In particular, with Claude Artifacts, I had a chance to see an amazing innovation. Have you ever wanted to see something new just because it's new? It changes you, which of course is one of the purposes of exploring novelties. By the way, this was my experience in July 2024.
kypro 4 months ago

I was talking to some colleagues about this recently, and I think the reason non-coders and amateur coders seem to be so much more impressed by the current state of AI code gen is that they don't fully understand what actually goes into the average software project.
Setting up a simple game on your local machine really isn't that hard. For example, you can probably take an open-source project and with some minor alterations have something working pretty quickly.
But most software development doesn't work like this. You are typically given very explicit requirements and when you're working for a corporate with real customers you have high quality standards that need to be met. Often this is going to require a lot of bespoke code and high-level solutionising which you're not going to get out of some open source project (at least not without significant changes).
Similarly, productionising products requires a lot of work and consideration which spinning something up locally doesn't. You need to think about security, data persistence, hosting, deployment, development environments, documentation, etc, etc, etc...
I think this partly explains why people have such widely different opinions on these tools at the moment. I acknowledge they write pretty good code, but for me they're almost useless in 90% of the things I do and think about as a software engineer.
fergie 4 months ago

I suspect, that just like real developers, that Claude is best at "greenfield" projects, but not so good at making changes to existing code generated by other developers or AIs.

bn-l 4 months ago

It really feels like I’m in an alternate reality with these posts. Is this paid shilling? I’m honestly wondering now considering how different my experience is EVERY DAY with llms for code.

rcpt 4 months ago

LLMs absolutely destroy interview questions and programming contests. They can spit out well designed classes and nicely packaged functions instantly.
But in my experience they haven't been great whenever asked to something more high level
namaria 4 months ago

> You just open your heart and your wallet, and Claude Code takes the wheel.
Yeah it sounds a lot like marketing to me

omgwalt 4 months ago

I've been using Claude for about 3 months now. What I've learned is that you have to learn how Claude "thinks" about your project, meaning what kinds of mistakes he makes. They're pretty consistent. As you uncover each new kind, compile a list of things to remind him of each time. For my own project, some of my reminder items to give him each chat (sometimes multiple times in the chat) include: "maintain a single source of truth", "avoid duplication", "stick to our established architecture instead of building new architecture". I always make sure he has an up-to-date file tree to look at. I remind him to ask me if there's a particular file he needs to see for reference instead of making up something new. I also say to make the changes specific to just what needs to be changed, rather than building in stuff and "anticipating" problems that aren't even here yet. Stuff like that. By the way, Project Knowledge is a very useful feature. Just don't expect Claude to ever look at it after your first statement/question. That's when he looks at what's there.

wholinator2 4 months ago

Tangential but referring to the AI as anything other than "it" is still extremely uncomfortable to me. It feels like the first step towards an inevitable "AI girl/boyfriend" trap.

trescenzi 4 months ago

I’m sorry what is happening with this paragraph:

> As long as the bank authorizations keep coming through, it will push on bug fixes until they're deployed in production, and then start scanning through the user logs to see how well it's doing.

I enjoy using these tools. They help me in my work. But the continual hype makes the discussion around them impossible to be genuine.

So I ask, genuinely, did I miss the configuration section where you can have it scan your logs for new errors and have it argue with you on PRs? Is he trying to say something else? Or is it just anthropomorphizing hype?

deanputney 4 months ago

I cannot tell if the original tweet is sarcasm or not. Sections like this make me think yes? It's got to be at least tongue-in-cheek.
- breckenedge 4 months ago
  
  My take is it's a mix of both sarcasm and not sarcasm, even in the same sentence. It's a post truth future with a ton of upvotes.
frankc 4 months ago

I haven't got to trying claude code yet, but absolutely with cursor and windsurf you can have the agent be reading the output of what it writes and runs and it can fix things it sees. You can also have it review code. It also help some times to have it review in a fresh chat with less context. I really think a lot of people on HN are not really pushing on everything that is available. Its magic for me but I spend a lot of effort and skill manifesting the magic. I'm not doubting other people's experience really, but wondering if they are giving up too fast because they actually don't want it work well for ego reasons.
- techpineapple 4 months ago
  
  I’m going to keep at it, because I was trained as an SRE, not a developer and have lots of ideas for side projects that have thus far taken a long time to get going, but I’ve been struggling, it sort of quickly gets into these infinite loop situations where it can’t seem it can’t seem to fix a feature and goes back and forth between multiple non working states. CSS layouts but even basic stuff like having the right web socket routes.
  We’ll see, maybe my whole approach is wrong, I’m going to try with a simpler project, my first approach was relatively complex.
  - ezekiel68 4 months ago
    
    I can't explain why, but I do get pretty good results with closing a prompt session completely and then initiating a fresh session later on. I have actually seen quite different code from the very same prompt across two sessions.
    However, the extra time between sessions does give me the chance to consider where the AI might have gone wrong in the first session and how I could have phrased the initial prompts more effectively.
    As others have stated throughout the threads here, I definitely recommend giving it smaller nibbles of problems. When I take this route and get working modules, I will start a fresh prompt session to upload the working code module files and ask it to integrate them into a simple solution with static inputs in a 'main' function. (Because providing coherent inputs from the router function inputs of a web service are simple enough, once these basics are covered)
    Basically - I do everything possible in order for the AI to not get distracted or go down a rabbit hole, chasing concerns that humans are very capable of taking care of (without paying tokens for). I will write most of the tests myself and then perhaps afterwards ask it to evaluate the test coverage and provide more of them. This way the new tests are in the format and platform of my choosing.
- trescenzi 4 months ago
  
  Oh ok this makes sense. Because of the ordering of the sentences I read it as “it pushes the code to production and then monitors it in production”.
  I have found that prompting something like “do X and write tests to confirm it works” works well for what you’re describing. Or even you write the tests then it’ll iterate to make sure they pass.
bakies 4 months ago

Yes it will, I wrote a quick script for local deployment (and then had claude improve it) and then quickly write documentation (and have claude improve it) on how to deploy & gather logs. It will do those things and follow the logs while I'm clicking around in the app. When starting a new session it will read the docs and know how to deploy and check logs. If something failed in the docker build that script output is read by claude since it ran it.
haven't tried PR stuff yet though

credit_guy 4 months ago

I'm using Copilot for writing documentation jupyter notebooks. I do lots of matplotlib plots. Setting up these plots takes lots of repetitive lines of code. Like plt.legend(). With Copilot these lines just show up, and you press tab and move on. Sometimes it is freaky how it guesses what I want to do. For this type of work, Copilot increases my productivity by a factor of 5 easily.

There are other types of work where Copilot is useless. But it's up to me to take the good parts, and ignore the bad parts.

bglazer 4 months ago

Yeah copilot is very good for matplotlib. Clunky interface with lots of repetitive code, but also tons of examples on the internet means that I almost never write matplotlib code by hand anymore.

jgalt212 4 months ago

That post from Yegge reads like it was written by some foaming at the mouth VC. I am getting decent use from Claude, but mostly as a stack overflow replacement. I will never let it read our company's code base. Then everyone will know how to do what we do. After all, that's why these things are so good at React and not so good at Solid (there's just so much public react code). Also, see the recent "AI IS STIFLING TECH ADOPTION" post

https://vale.rocks/posts/ai-is-stifling-tech-adoption

https://news.ycombinator.com/item?id=43047792

pmarreck 4 months ago

If you tell these AI assistants to use TDD and to only add 1 feature at a time, perhaps unsurprisingly, they code better! (Just like humans!) But then you have to keep reminding them or they'll forget. Just like humans...

SamCritch 4 months ago

I asked it to summarise my repo. It did a pretty good job. Then I asked it to see if it could summarise a specific function, but it said my $5 was up. Now I need to find whoever's in charge of our company account and spend half an hour busking in front of their desk to get some more credit.

Until then I'm sticking to a combination of Amazon Q Developer (with the @workspace tag in Intellij), ChatGPT and Gemini online.

BeetleB 4 months ago

For everyone complaining about the cost: Try Aider instead. You can easily limit the context and keep costs down.

A coworker tried both and found Code to be 4x the cost because it doesn't easily let you limit the context.

tysonworks 4 months ago

That's my experience as well. Claude Code/Cline — these tools are just a way to burn money. With Aider, my spending is minimal and I get exactly what I need.

gaia 4 months ago

Agreed it is a step above the rest, but I find that it still needs some oversight you'd hope it doesn't. Simple stuff like rewriting stuff differently which yields the same result (to the naked eye, no need to even test it), to having to call it out when it misses things (update other scripts with the same problem, reflect the changes in the README or requirements.txt) and to ask it to try harder to solve an issue in a better way.

Sometimes it takes the easy way out. If you look up your billing, you will see sometimes it uses the 3.5 instead of the 3.7 API, maybe this has something to do with it. It apologies ("you are correct", "yes that was hasty") but I'd rather have it try harder every time, not only when called out (and foot the bill that goes with that ofc).

But overall the experience is great. I hope to use it for a few months at least until something else comes along. At the current pace, I switch subscriptions/API/tool at least once a month.

999900000999 4 months ago

It’s ok.

It’s expensive and is correct for the easy 90%. It messes up the hard 10%.

I guess with it rapidly improving, I should wait 3 months. What’s frustrating is when you spend 20$ in credits on it writing non functional code.

That said, every programmer needs to at least demo these tools

owenthejumper 4 months ago

Feels like you shouldn't be using $20 of credits to produce non functional code.
I use Aider chat with Claude and my sessions are much smaller. You need to ask it much smaller tasks and work incrementally.
- 999900000999 4 months ago
  
  If you already have a moderately complex code base it starts making mistakes.
  - MyOutfitIsVague 4 months ago
    
    Depends a lot on what context you feed it. If you have decent internal documentation and can feed it the right files, it tends to do quite well for me, often needing corrections or style fixes. To use it most effectively, you have to know the vast majority of the code that you want it to write ahead of time. It saves time typing, looking up methods, wrangling APIs appropriately, and sometimes it surprises me by working around an edge case or writing an appropriate abstraction that I hadn't considered, but most of the time, it's mostly saving time on typing and tedious gluing, even with the corrections I make and hallucinations I have to correct.
    Maybe it will be at some point, but it is not yet a reasonable substitute for having to think, plan, or understand the code you are working with and what needs to be done. It's also still more expensive than it needs to be to use on my free time (I'm happy to burn company money on it at work, though).
    
    999900000999 4 months ago
    
    I think the next step is to get it to run some automated testing against the code it produces and then fix issues accordingly.
    If I was a better programmer I'd be working on that solution right now.
    
    becquerel 4 months ago
    
    Aider has built-in functionality for this. You can pass it a command (dotnet test or whatever) and it will autorun it after AI edits. If tests fail it can paste the output into the context for you.
    
    biorach 4 months ago
    
    Claude code does this already

macrolime 4 months ago

Can Claude Code also be a devops agent or is it only for coding?

I currently use Cursor as a devops agent, I use the remote ssh extension to ssh into a VM, then Cursor will set up everything, I make snapshots on way in case it fucks up. It's been really great to quickly be able to setup and try out different infrastructures and backends in no time at all. It works well enough that I now do all my development using remote dev with ssh or remote containers on the a server. Having a virtualized dev environment is a great addition to just having git for the code.

dcre 4 months ago

I think he’s wrong about it looking antiquated. It is one of the most beautifully done bits of TUI design I’ve seen.

vander_elst 4 months ago

Are there any videos showing these very advanced use cases? I'd be interested in learning how to achieve this level of proficiency. At the moment I still feel I'm better off without ai

MarkMarine 4 months ago

Claude code doesn’t need magic prompts. It’s not perfect but holy moly is it good, when it’s working it just one shots things for you. It’s just EXPENSIVE. I’ve spent 30$ in tokens on it this week.
Cursor’s “chat with your codebase” is a funny joke compared to Claude Code. Ask it questions, have it figure things out.
I had it analyze the openAPI schema and the backend that is serving the schema for the API I’m writing and write end to end tests for the API. Then I did my normal meetings and it was done with huge chunks of it, it had run the code locally and tested against the actual endpoints to understand if the end to end tests were working or it had found a bug. Then it fixed the bugs it found in the backend codebase. My prompt: “write me end to end tests against my openAPI schema”
That was it. 30$ in tokens later, pressing enter a bunch of times to approve its use of curl, sed, etc…
- vander_elst 4 months ago
  
  Thanks, but to me this feels too high level I'd really need to see a video of such things to better understand what's going on, how much time it took end to end, what was the quality. Yes I could spend 3 days and 300 bucks playing around with the tool, but I'd prefer to learn things offline first.
  - MarkMarine 4 months ago
    
    That is all right by me, we need the part of the adoption curve that isn’t at the leading edge.
- baal80spam 4 months ago
  
  > Claude code doesn’t need magic prompts. It’s not perfect but holy moly is it good, when it’s working it just one shots things for you. It’s just EXPENSIVE. I’ve spent 30$ in tokens on it this week.
  As long as it costs less than a developer, it will be used instead of the said developer.
- BeetleB 4 months ago
  
  Use Aider to keep the costs low. You can explicitly tell it what files to use.
- ido 4 months ago
  
  A junior developer cost about €300 per workday (fully loaded & location dependent) and often achieves far less than that per day.
- roflyear 4 months ago
  
  What are some examples of the bugs it fixed?
  - MarkMarine 4 months ago
    
    Accepting an epoch timestamp rather than 3339 like the rest of the codebase is the most recent example. The API standardized on one timestamp and I had an endpoint expecting epoch, it was smart enough to infer the rest of the APIs were 3339 and go change the epoch timestamp accepting backend route including writing a database migration and fixing tests. I haven’t pushed the fix yet
    So think about the reasoning there. It read the API docs that said all the timestamps were 3339, it ran a bunch of working 3339 timestamps and was getting them back from most of the APIs, so it decided this was a standard, read the code on the backend and understood this endpoint was different, and decided on the right fix and implemented it. Didn’t need to ask me for help. That is pretty impressive if you’ve been using copilot.
    
    an_taeus 4 months ago
    
    I hope this doesn’t come off as combative because that’s not my intention, but given that your code needed to be brought into alignment with the wider code base, couldn’t this have been achieved in a couple minutes with a find and replace that was scoped to just your endpoint methods and routes? In all the discussions about coding with AI I’m just continually struck by the idea that if someone has enough knowledge of their code base then most of the changes I see described seem trivial.
    On the other hand, I also get the sense that for a very large organization with siloed teams that sure, the org moved to bring all the endpoint timestamps to a standard 3339 and either you never got the memo or your endpoint being misaligned would get caught at code review time and changed then but the agent was proactive here and went ahead and saved some time. Like, truly, I get that the marginal gains there add up on the balance sheet.
    But what about when it is important that your APIs timestamps stay unaligned as epoch? Do you now need to set an override to meta-program that field for the AI as an additional token in the context window every time it runs?
    Part of me is impressed for sure about what’s being achieved, but then another part is like “is the boilerplate of copying in the database field names and then shift+alting to write 100 lines of { get; set; } at the same time really so terrible for everyone?” I’m being hyperbolic for effect but I think my point is clear.
    There’s also a third part of me that feels like I’m watching the creation of credit-default swaps of tech debt. If no one has time to do the changes that seem like a lot of work to a large organization with inertia because no one has the breadth of mastery needed to implement them as efficiently as possible, how the heck is anyone going to audit and fix the behavior when subtle mistakes eventually cascade?
    Is this just the coding version of Amazon outsourcing their shipping to third party contractors that definitely led to people dying but shielded Amazon from liability? Like, “oops, sorry we don’t know why the AI changed our code and your bank account got emptied, we’ll look into it, but it’s not our fault, just the vagaries of fate!”
    Again, I do understand the possible business savings in the short term in terms of administrative overhead, but I really shudder at the possible downstream effects of agentic code at scale and I’m trying to understand the real savings for a user that outweighs the downside risk of blowing up your production system and then being in a position of no one suited to fixing it who is responsible to do so.
    
    MarkMarine 4 months ago
    
    The promise of AI before all of this was a robot would do the dishes and fold the laundry, so we could do the enjoyable things. This is a robot that can do the dishes, finally, so I can do the hard work.

cybertheory 3 months ago

If anyone is having trouble with claude code output, my team and I are releasing an MCP server that provides all the latest and greatest technical knowledge in one place for AI to access. We are at 5k waitlists already, go ahead and sign up for updates! https://jetski.ai

relaxing 4 months ago

Does anyone know a good video of someone demonstrating their workflow with these coding assistants?

emporas 4 months ago

I have written a whole project [1] for making amateur videoclips [2] of A.I. generated music, using GPT and other LLMs. 10.000 to 12.000 lines of code was written exclusively by AIs.
I didn't know at the start what should be done, and the code ended up having lots of duplication, but i refactored it a lot and now it is in the order of 4.000 lines.
I could make some screencasts about the development process, but it is very simple. I ask it to write some code, and I always provide some relevant context. When it is a method of a struct, i give it the struct, and/or a similar method, and i describe what i want the method to do. Also sometimes I give it the type signature of the method/function which has to write. When it has to return an error, i provide the error Enum.
In other words, by providing the context, i never use it zero shot, rather always aim for few shot answers. When i want another function to use, i just provide the type signature, and almost never the whole function.
One more detail, I give as minimal of a context as possible. I use as a context window, 100 tokens, 200 or maximum 300 tokens. 100 tokens per query, then delete the context for the next task, and provide new context. Never use more than few hundred tokens per query. Even 300 tokens is pushing it too far.
That's about it! Never use LLMs zero shot, always few shot, and never use more than 100 tokens per query.
[1] https://github.com/pramatias/fxp_videoclipper/tree/main [2] https://www.youtube.com/watch?v=RmmoMPu091Y
pchristensen 4 months ago

This is the OP walking another developer through the process - https://m.youtube.com/watch?v=jpzv-_YQf6k
He had more like it - search for “yegge chat oriented programming”

ptsd_dalmatian 4 months ago

for those who don't want to go to twitter: https://xcancel.com/Steve_Yegge/status/1898674257808515242

turnsout 4 months ago

Yeah, the author needs to get off X/Twitter. At this point posting to X is like driving around in a Cybertruck with a bumper sticker reading "I Approve of the Current Situation"

catigula 4 months ago

I've burnt about $100 in Claude code so far.

It's very cool but highly limited in many ways.

It feels like this same statement has applied to LLMs since their popularity explosion.

dabinat 4 months ago

I used GitHub Copilot and Udemy to teach myself Rust. Copilot was especially helpful at resolving obtuse compiler error messages.

But as I’ve improved at Rust I have noticed I am using Copilot a lot less. For me now it has mainly become a tool for code completion. It’s not helping me solve problems now, it’s really just about saving time. I have estimated (unscientifically) that it probably improves my productivity 2-4x.

thyrsus 4 months ago

Do these AIs know how to do test driven development? Can you tell them the code generated must pass these test? Can AIs assist in developing tests?

esafak 4 months ago

Yes, absolutely.

jwr 4 months ago

After reading this, I tried Claude Code (in a docker container, as one does, you wouldn't want to use npm without protection after all).

I gave it a huge Clojure codebase and told it to implement proration in my subscription system. Which currently doesn't have proration of any kind. Very poorly specified task, and yet what I got was really quite good — not working, ready-to-run code, but certainly a great starting point.

I was impressed by how it was able to follow my coding conventions, interface with my Stripe library to (correctly) create and confirm payment intents, and most importantly, by the `calculate-proration-amount` function which was correct in spite of how difficult it was (there are custom plans, plan overrides, extra users, etc).

It is the first time I felt an AI coding tool is genuinely useful. I still think you need to set expectations correctly: this doesn't "just do stuff", it is not a human programmer and will not produce flawless ready-to-run code in most cases. But it's a very good tool.

jtwaleson 4 months ago

I haven't tried Claude Code yet, so forgive my ignorance, but does it integrate with linters as well as Cursor does? I've seem excellent results on my Rust & Typescript codebase where it started making a change in a wrong direction, but quickly fixed it when it got linting errors. It seems that a standalone CLI tool like Claude Code would struggle with this.

turnsout 4 months ago

I've tried out Claude Code a few times, and it seems to work fine, but not noticeably better than Cursor. And replacing my use of Cursor would add up to significantly more than $20/mo in Claude Code.

So why wouldn't I continue burning Cursor's VC money? LOL

orange_puff 4 months ago

https://open.substack.com/pub/orangepuff/p/first-impressions... I used Claude code to get started on a pdf reader I wanted to build. This pdf reader has a built in LLM chat and when you ask a question about the pdf you’re reading, the page text will be automatically prepended to the question.

Nothing fancy or special. It was built with streamlit in about 150 lines and a single file. But I was impressed that Claude code 1 shot it

dhumph 4 months ago

I find cline to be incredible about 80% of the time for creating new concept websites or python scripts. I use it with Open router and choose Claude exclusively. I’d CC is the next step we are headed in a crazy and scary direction.

skerit 4 months ago

I've successfully used it to fix a few issues here and there, but it also manages to make some pretty stupid mistakes. A few times it even started rewriting tests in a way so that the wrong outcome would be seen as a pass.

epolanski 4 months ago

Kinda reminds me of how when it finds issues with typescript it hacks the types rather than refine the values or business logic.

eddyg 4 months ago

See also: A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated

https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-y...

jnsaff2 4 months ago

I gave Claude Code a java codebase of an open source database and it burned through $3 to tell me exactly why and how it's restore and database loading is 10-20 times slower than it should be.

mritchie712 4 months ago

it's fun for things you're ok with throwing away.

For example, I wanted a dbt[0] like tool, but written in rust, specifically focused on duckdb. Claude Code knocked it out[1] it without much guidance.

Also added support for all duckdb output options (e.g. write to a partitioned parquet instead of a table).

0 - SQL transformation tool (https://github.com/dbt-labs/dbt-core)

1 - https://github.com/definite-app/crabwalk

kthxb 4 months ago

As a junior, this scares me. I don't think I'll be out of a job soon, but certainly the job market will change drastically and the times where SWEs are paid like doctors and lawyers will end?

jtwaleson 4 months ago

Make sure you learn a lot. Ask the LLMs to explain anything you don't deeply understand. With all of these coding assistants, there will be many juniors that get a lot done, but don't really understand what they are doing, and their worth will drop quickly.
So far LLMs are great at producing code, but not at architecture or maintenance.
- timeon 4 months ago
  
  Learning just by asking is not enough. One needs to exercise to build those muscles.
  I'm just restating the obvious because LLMs can do exercise for you, but there is not much to be gained if one follows this path.
- sureglymop 4 months ago
  
  Who's to say that their worth will really drop. We already live in a world where security and performance are largely forgotten aspects when it comes to software.
  Unfortunately I fear we will just enter an era of general slop where people get away with creating without really understanding. It sucks for anyone who is actually passionate and curious and really investing the time.
throw234234234 4 months ago

Across most of the world SWE's aren't paid like this anyway. It really is only the US that has this level of pay for the SWE staff unless you are into a different domain specialty (e.g. trading/finance) in which case you aren't just paid for coding knowledge.
Coding is the real market for LLM's - it is the "killer app". I'm starting to think that a possibility is that for most jobs it ends up being a "what was all the hype about" but for SWE's it will be carnage globally w.r.t jobs. They aren't generally intelligent (lots of hallucinations in other domains), but with RL can be trained on the mountains of open source software out there to displace software.
Software developers automating their own jobs away. The promise of "free software" with open source was realised - just not as people envisioned it.
No other profession would do this or at least not as fast. As a honest answer, as much as I want to be wrong, I don't recommend new comers join the industry unless they really want to do it and are happy taking the risk. The uncertainty is high right now, and the anxiety these tools are giving quite a number of people I talk to is very high. Capitalism doesn't reward value or how much is built - it allocates resources and rewards people who produce into scarce markets. AI makes the product of code significantly less scarce.

bv_dev 4 months ago

I have been using the Claude Code for about 48 hours now and nearly done with one full MVP. It did both my FastAPI side, building all the models guiding me through the Postgres part as well as a well polished React frontend. Burnt about $25 in 2 days [I have a lot of Anthropic credits] and created about 10k lines of good quality code all the way to a deployment. It is scary how much it can do to the point I'm not sure how long I will be needed.

tipsytoad 4 months ago

I usually am a huge fan of “copilot” tools (I use cursor, etc) and Claude has always been my go to.

But Sonnet 3.7 actually seems dangerous to me, it seems it’s been RL’d _way_ too hard into producing code that won’t crash — to the point where it will go completely against the instructions to sneak in workarounds (e.g. returning random data when a function fails!). Claude Code just makes this even worse by giving very little oversight when it makes these “errors”

curiouser3 4 months ago

this is a huge issue for me as well. It just kind of obfuscates errors and masks the original intent, rather than diagnosing and fixing the issue. 3.5 seemed to be more clear about what it's doing and when things broke at least it didn't seem to be trying to hide anything.

mikeocool 4 months ago

Counter point: I just spent $1 and 25 minutes to try and have Claude Code figure out why a relatively simple test was failing. It repeatedly told me incorrect things about the basic functioning of the code, and ended with "Try adding debug logging in your code to see what's happening in that specific check, or look at the exact error message from the failing test."

In other words: "do your job, developer!"

adamgroom 4 months ago

I find AI code completion can be annoying and misleading at times. I do find that I spend less time typing though, it’s great at guessing the simple stuff.

theusus 4 months ago

Hilarious claim without any proof. Sure I believe you

jonwinstanley 4 months ago

Do any of the current AI systems allow you to use voice?

I’d love to sometimes chat to an agent and dictate what I want to happen.

For one project there’s a lot of boilerplate and I imagine AI could be really fast for tasks like: “create a new controller called x”, “make a new migration to add a new field the to users table called x” etc

pramodbiligiri 4 months ago

VS Code Insiders has an Agent mode now that can also take voice input. You should be able to see the microphone icon to the left of "Agent" in the demo video in their blog post: https://code.visualstudio.com/blogs/2025/02/24/introducing-c...
mattew 4 months ago

I saw something mentioned of voice control in aider.chat
- jonwinstanley 4 months ago
  
  Found!
  https://aider.chat/docs/usage/voice.html
  Thanks!

andrewstuart 4 months ago

I had to give up my attempt to use Claude code when it didn’t let me specify my API key and password, instead requiring me to sign into a user account first which then forced creation of an API account ?

Something like that. Anyhow Claude needs to allow me to put in my own API keys.

tifik 4 months ago

The second paragraph is clearly sarcastic, but the rest seems genuine, so Im a bit confused.

throwaway314155 4 months ago

The second paragraph isn't sarcastic. At least not w.r.t. to Claude Code. That bit there about North Korean hackers is mild sarcasm, but has no bearing on the remainder of the post.
jofzar 4 months ago

I was confused untill I watched a video of it in use, nope it wasn't that sarcastic.
https://youtu.be/W13MloZg03Y
- cpldcpu 4 months ago
  
  That guy leads in with stating that he is missing "autocomplete" in claude code. Cleary a misunderstanding of the scope.
  - jofzar 4 months ago
    
    I mostly posted the video because I thought it was a good demonstration of the prompts to approve/cost that the tweet was talking about.
    He talks about it later of using autocomplete vs complete writing, I don't think he misunderstood the scope just talking about the difference of ways ai can be used for coding assistant.

rw2 4 months ago

I tried this vs Cline/Aider/Lovable.

For full stack, I would say Lovable is the best, for complexity, I think Cline is the best.

Claude Code with the several request I gave it just produced code that didn't run. I think it has a long way to go.

rishikeshs 4 months ago

+1 Cline is the best. I've tried it with openrouter!

winrid 4 months ago

So far I like using LLMs to create nasty scripts to do large migrations. It can churn out regexes that I would never write, but it's more predictable than just having the LLM make a bunch of changes directly.

telotortium 4 months ago

https://threadreaderapp.com/thread/1898674257808515242.html

sixQuarks 4 months ago

All I see here is coping by developers. You all are so blind to what’s coming.

Let’s not forget, if you relied on hacker news to give you accurate takes on things, you would’ve never bought bitcoin at one cent.

someothherguyy 4 months ago

What is coming? AGI that will destroy any notion of property rights or freedom that you ever held? Or something better?
- sixQuarks 4 months ago
  
  If past human behavior is any indication, way worse.

CSMastermind 4 months ago

I really wish they'd improve the Windows experience. Even with WSL2 set up I had to completely remove node and npm from my system and reinstall them only in Ubuntu to make it work.

emalafeew 4 months ago

Somebody here said programming is about designing and reusing libraries rather than coding from scratch like LLMs. But that choice has always been a tradeoff between abstraction and bloat vs performance and debuggability. Writing 50 lines of intentional code is often preferable to adapting a 50000 line library. The trick to using LLMs is to not ask for too much. Claude 3.5 could reliably code tasks that would have taken 5-10 min by hand. Claude 3.7 is noticeably better and can handle 10-30 min tasks without error. If you push those limits you will lose instead of save time.

thatsallfolkss 4 months ago

I think that was the shortest Yegge rant I have ever read. I was expecting 2000 words or more. Is the endless rant dead as an art form?

egorfine 4 months ago

Sidenote: thank you for using twitter dot com instead of x. This is a little detail and I'm sure I'm not the only one appreciating it.

Philpax 4 months ago

For me, Twitter as we knew it has been long dead, and X is the shambling, corrupt corpse that's taken its place.
I no longer mind referring to it as X, because that clearly outlines that it's a different website with a much more rancid vibe.
- egorfine 4 months ago
  
  That's another take. Never thought of it that way.
hleszek 4 months ago

We should use xcancel instead.
- tom_ 4 months ago
  
  There is also nitter.poast.org. There sites are possibly better than twitter.com or x.com, as more of the thread is made visible to non-users.
  - Chaosvex 4 months ago
    
    Giving away the secrets! ;)
    If the few services that are left become well-known, they'll cease to function just like the majority. Such is scraping Twitter now

ant6n 4 months ago

I don't get to code much these days, so I mostly use chatGPT to get occasional helps with scripts and whatnot. I've tried to get help from ChatGPT on a simple javascript/website project that is basically a css/js/html file, but I feel like I don't know how to give the chatbot the code except just pasting it in the prompt. Is Claude better for that, like does it have some IDE or something to play with? Or do people generally use some third party tool for integrations.

MyOutfitIsVague 4 months ago

To do it effectively, you need an API key and a tool like Aider: https://aider.chat/

dailykoder 4 months ago

AI Policy for Application *

While we encourage people to use AI systems during their role to help them work faster and more effectively, please do not use AI assistants during the application process. We want to understand your personal interest in Anthropic without mediation through an AI system, and we also want to evaluate your non-AI-assisted communication skills. Please indicate 'Yes' if you have read and agree.

chilldsgn 4 months ago

It makes working with a complex Angular enterprise application less painful. I hate Angular with a passion.

ElijahLynn 4 months ago

Can someone post a non-twitter, archive link? I have it blocked on my computer and phone.

fiatjaf 4 months ago

It's sad that I can't read the contents of that URL without an account.

RamblingCTO 4 months ago

Has anybody compared claude code with clint/roo code yet?

motorest 4 months ago

Does anyone know how Mistral fares against Claude Code?

jamil7 4 months ago

How does it compare to Aider with the same model?

CGamesPlay 4 months ago

I just posted as a top-level, but the biggest difference I see is that Claude Code does better at identifying which files to change than Aider does, and it's a lot more expensive.
https://news.ycombinator.com/item?id=43315371
- jamil7 4 months ago
  
  Thanks, super helpful. I tried it out and had similar observations, in my case, I also work on a Swift codebase but with a few internal dependencies that are in separate repos, Aider can handle this by pointing it to files in other repos with the read-only command. I wasn't able to get that working with Claude code. I also found the "magic" dependency resolution part of Claude code to be a little wasteful/poor, at least in my codebase in which it chewed through time and tokens searching for the correct files.

adamtaylor_13 4 months ago

Is it just me or is Anthropic years ahead of the competition? I’ve used several other AI tools and not a single one feels remotely close to Claude.

And notice how few benchmarks will include Sonnet 3.5/7 in their new model announcements, because they’re awful compared to Sonnet.

What’s going on here?

calrain 4 months ago

Please stop using X

If the bar lets Nazi's in, it's a Nazi bar.

martypitt 4 months ago

Further follow-up from Steve (OP), where he says it just gets better:

> Claude Code keeps doing stuff. It keeps solving massive problems, one after another. I throw larger and larger things at it, and it is unfazed. Chomp. Chomp. Chomp.

https://x.com/Steve_Yegge/status/1898993080931611112

KeplerBoy 4 months ago

The comment "@grok can you summarize" kills me. This post is like 200 words and takes a minute to read. Is this the direction we (or some of us) are headed?

ramblerman 4 months ago

lol - I thought the same, but the charitable take after looking at the user's profile is that he is a journalist, and not tech savvy.
So I understand his request as more along the lines of, can you explain this in a way that I understand it. For which summarization is the wrong phrasing.
ie.. It seems trivial on hacker news, but that post would be pure giberish for most of our parents.
- Shebanator 4 months ago
  
  That's steve yegge (https://en.wikipedia.org/wiki/Steve_Yegge) you are talking about. Kids these days, sheesh, they think lisp is a speech impediment.... /s
  - riwsky 4 months ago
    
    They’re not—they’re talking about this guy who replied to yegge: https://x.com/Beirutspring/status/1898689770563260535
joshmlewis 4 months ago

It's become very commonplace for there to be a dozen replies on viral posts where users all @grok to ask if the post is real or for more context. It's almost more work to compose a reply asking Grok about it than it is to just click the Grok button and it give you more context that way. I don't get it.
hhh 4 months ago

I see this a lot, I think most of these people would have just scrolled on otherwise. I don’t get it.

curtisszmania 4 months ago

[dead]

Thing11Uniq 4 months ago

[dead]

Sterling9x 4 months ago

[dead]

darepublic 4 months ago

TLDR; Claude code is the bomb yo. Anthropoic are the only ones who know wtf they are doing!

Please like and follow the smiling non threatening avatar of the author

bflesch 4 months ago

[flagged]

vlod 4 months ago

That's quite a wide brush you're painting with.
I think it would be useful if you provide a reason for 'why'.
Quite a few people know how to use mutes, lists etc to get the best from it.
If all you click on is rage-bait articles, hate-driven politics or people promoting their Only Fans pages with not subtle imagery, then the Algorithm will keep feeding you more of it.
bentobean 4 months ago

First it was Facebook, now it’s Twitter / X. What you’re essentially saying is - “I can’t take a platform seriously where many / most of the people disagree with me.”
- bflesch 4 months ago
  
  [flagged]
  - sejje 4 months ago
    
    You think Steve Yegge (author of the post) is a moron?
  - dboreham 4 months ago
    
    I've noticed that the morons came first. Facebook will surface them given a sufficiently large comment thread. There's no escaping them.
spiderfarmer 4 months ago

I have the same thought. I also want all politicians to stop using it. It might have been useful in the past but I’m now convinced it’s largely useless. A large part of the population actively shuns the website and the remainder is mostly people who love rage bait.

juzipar 4 months ago

Remember, LLMs have seen all the code in the Internet. And there is also a lot of bad code out there...