develoopest 11 hours ago

I must be the dumbest "prompt engineer" ever. Each time I ask an AI to fix something, or even worse, create something from scratch, it rarely returns the right answer, and when asked for a modification it struggles even more.

All the incredible performance and success stories always come from these Twitter posts. I do find value in asking for simple but tedious tasks like a small refactor or generating commands, but this "AI takes the wheel" level does not feel real.

  • abxyz 10 hours ago

    I think it's probably the difference between "code" and "programming". An LLM can produce code, and if you're willing to surrender to the LLM's version of whatever it is you ask for, then you can have a great and productive time. If you're opinionated about programming, LLMs fall short. Most people (software engineers, developers, whatever) are not "programmers", they're "coders", which is why they have a positive impression of LLMs: they produce code, LLMs produce code... so LLMs can do a lot of their work for them.

    Coders used to be more productive by using libraries (e.g. don't write your own function for finding the intersection of arrays, use intersection from Lodash) whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").

    If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criterion is "good enough", not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.

    [1] https://en.wikipedia.org/wiki/Npm_left-pad_incident

    • miki123211 5 hours ago

      Oh, this captures my experience perfectly.

      I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.

      I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.

      To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.

      I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and that will frequently make mistakes we need to point out in CR. It's not a tool which will do exactly what we would.

    • jmull 10 hours ago

      All I really care about is the end result and, so far, LLMs are nice for code completion, but basically useless for anything else.

      They write as much code as you want, and it often sorta works, but it's a bug-filled mess. It's painstaking work to fix everything, on par with writing it yourself. Now, you can just leave it as-is, but what's the use of releasing software that crappy?

      I suppose it's a revolution for that in-house crapware company IT groups create and foist on everyone who works there. But the software isn't better, it just takes a day rather than 6 months (or 2 years or 5 years) to create. Come to think of it, it may not be useful for that either… I think the end-purpose is probably some kind of brag for the IT manager/exec, and once people realize how little effort is involved it won't serve that purpose.

      • barbazoo 5 hours ago

        I love the subtle mistakes that get introduced in strings for example that then take me all the time I saved to fix.

        • dabinat 4 hours ago

          Do you have an example of this?

          • the_lonely_time 2 hours ago

            Can’t remember account login so created a new account to respond.

            I recently used Claude with something along the lines of “Ruby on rails 8, Hotwire, stimulus, turbo, show me how to do client side validations that don’t require a page refresh”

            I am new to prompt engineering so feel free to critique. Anyway, it generated a stimulus controller called validations_controller.js and then proceeded to print out all of the remaining connected files, but in all of them it referred to the string "validation", not "validations". The solution it provided worked great and did exactly what I wanted (though I expected a turbo-frame-based solution, not a stimulus solution, but whatever, it did what I asked it to do), with the exception of having to change all of the places where it put the string "validation" where it needed to put "validations" to match the name it used in the provided stimulus controller.

          • dingnuts 3 hours ago

            are you in the habit of saving bad LLM output to later reference in future Internet disputes?

      • fallinditch 4 hours ago

        Have you tried using Cursor rules? [1]

        Creating a standard library ("stdlib") with many (potentially thousands) of rules, and then iteratively adding to and amending the rules as you go, is one of the best practices for successful AI coding.

        [1] https://docs.cursor.com/context/rules-for-ai

        • jmull 3 hours ago

          > …many (potentially thousands) of rules, and then iteratively adding to and amending the rules as you go…

          Is this an especially better (easier, more efficient) route to a working, quality app/system than conventional programming?

          I'm skeptical if the way to achieve 10x results is 10x more effort.

          • fallinditch 2 hours ago

            It's such a fast moving space, perhaps the need for 'rules' is just a temporary thing, but right now the rules will help you to achieve more predictable results and higher quality code.

            You could easily end up with a lot of rules if you are working with a reasonably large codebase.

            And as you work on your code, every time you have to deal with an issue with the code generation, you ask Cursor to create a new rule so that next time it does it correctly.

            In terms of AI programming vs conventional programming, the writing's on the wall: AI assistance is only getting better and now is a good time to jump on the train. Knowing how to program and configure your AI assistants and tools is now a key software engineering skill.

        • UltraSane 4 hours ago

          at that point aren't you just replacing regular programming with creating the thousands of rules? I suppose the rules are reusable so it might be a form of meta-programming or advanced codegen

    • icedchai 5 hours ago

      This aligns with my experience. I've seen LLMs produce "code" that the person requesting is unable to understand or debug. It usually almost works. It's possible the person writing the prompt didn't actually understand the problem, so they got a half baked solution as a result. Either way, they need to go to a human with more experience to figure it out.

    • beezlewax 10 hours ago

      I'm waiting for artisan programming to become a thing.

      • discordance 8 hours ago

        by 100% organic, free range and fair trade programmers

        • djmips an hour ago

          Replace programmers with 'intelligence' to contrast with artificial

      • dr_dshiv 8 hours ago

        Like, writing binaries directly? Is assembly code too much of an abstraction?

        • swat535 6 hours ago

          People stress about good system design because of maintainability. No one cares about binary code because that's just the end result. What matters is the code that generates it, as that’s what needs to be maintained.

          We have not yet reached a point where LLM generated code can also be maintained by LLMs and the tooling is not there. Once that happens, your argument will hold more weight. But for now, it doesn’t. Injecting unreadable, likely bug-ridden code into your application increases technical debt tenfold.

      • pydry 10 hours ago

        Artisanal code has been a thing for a long while.

        If we're the luddite artisans, LLMs seem to represent the knitting frames which replaced their higher quality work with vastly cheaper, far crappier merchandise. There is a historical rhyme here.

        • ReptileMan 10 hours ago

          You didn't have to spend time debugging a piece of cloth, and cloth defects are obvious.

          • pydry 10 hours ago

            There's a lot of code out there written for people who are far more concerned with cost and speed than quality - analogous to the "fast fashion" consumer segment.

            I've worked on all sorts of code bases filled to the brim with bugs which end users just worked around or ignored or didn't even encounter. Pre-product-market-fit startups, boring crappy CRUD for routine admin, etc.

            It was horrible shit for end users and developers (me) but demand is very high. I expect demand from this segment will increase as LLMs drive the cost of supply to nearly zero.

            I wouldn't be surprised if high-end software devs (e.g. >1 million hits/day webapps where quality is critical) barely do anything different while the demand for devs at the low end of the market craters.

    • roflyear 5 hours ago

      > LLMs version of whatever it is you ask for, then you can have a great and productive time

      Sure, but man are there bugs.

  • BeetleB 4 hours ago

    Some hints for people stuck like this:

    Consider using Aider. It's a great tool and cheaper to use than Claude Code.

    Look at Aider's LLM leaderboard to figure out which LLMs to use.

    Use its architect mode (although you can get quite fast without it - I personally haven't needed it).

    Work incrementally.

    I use at least 3 branches. My main one, a dev one and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is that it can produce a lot of code to fix a bug. It will write some code to fix it. That won't work. It will try again and write even more code. Repeat until fixed. But in the end I only needed a small subset of the new code. So you then revert all the changes and have it fix it again, this time telling it the correct fix.

    Don't debug on your dev branch.

    Aider's auto committing is scary but really handy.

    Limit your context to 25k.

    Only add files that you think are necessary.

    Combining the two: Don't have large files.

    Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.

    Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.

    https://aider.chat/

    https://aider.chat/docs/leaderboards/

    • tptacek 4 hours ago

      The three-branch thing is so smart.

      • BeetleB 4 hours ago

        It took a while for me to realize it, and frankly, it's kind of embarrassing that I didn't think of it immediately.

        It is, after all, what many of us would do in our manual SW development. But when using an LLM that seems pretty good, we just assume we don't need to follow all the usual good practices.

        • vlovich123 3 hours ago

          Does the LLM make commits along the way? I think I'm missing why you need all these branches vs git reset --hard once it figures out the bug?

          • BeetleB 3 hours ago

            Aider, by default, makes commits after each change (so that you can easily tell it to "undo"). Once a feature is done, you manually squash the commits if desired. Some people love it, some hate it.

            You can configure it not to autocommit, although I suppose the "undo" command won't work in that case.

            • matthewmc3 3 hours ago

              That just sounds like ⌘-Z with extra steps.

              • BeetleB 2 hours ago

                Aider doesn't run in your editor. Undo in editor won't undo Aider's changes.

  • branko_d 10 hours ago

    I have the same experience.

    Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.

    • mentalgear 10 hours ago

      Exactly the same experience: since the early-access GPT-3 days, I have played out various scenarios, and the most useful case has always been to use generative AI as semantic search. Its generative features are just lacking in quality (for anything other than a toy project), and the main issue since the early GPT days remains: even though it gets better, it's still too unreliable for serious work on mid-complex systems. Also, if you don't pay attention, it messes up other parts of the code.

    • jofzar 10 hours ago

      Yeah, I have had some "magic" moments where I knew "what" I needed, had an idea of "how it would look", but no idea how to do it, and AI helped me understand how I should do it instead of the hacky, very stupid way I would have done it.

    • Yoric 10 hours ago

      Same here. In some cases, brainstorming even kinda works – I mean, it usually gives very bad responses, but it serves as a good duck.

      Code? Nope.

  • matt_heimer 7 hours ago

    LLMs are replacing Google for me when coding. When I want to get something implemented, let's say making a REST request in Java using a specific client library, I previously used Google to find examples of using that library.

    Google has gotten worse (or the internet has more garbage) so finding a code example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement, and usually something is broken in the example, but it takes less time to get the LLM-produced example to work than it does to find a functional example using Google.

    But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.

    • ptmcc 5 hours ago

      This has been my experience of successful usage as well. It's not writing code for me, but pulling together the equivalent of a Stack Overflow example and some explaining sentences that I can follow up on. Not perfect and I don't blindly copy paste it, same as Stack Overflow ever was, but faster and more interactive. It's helpful for wayfinding, but not producing the end result.

    • layer8 4 hours ago

      In order to use a library, I need to (this is my opinion) be able to reason about the library’s behavior, based on a specification of its interface contract. The LLM may help with coming up with suitable code, but verifying that the application logic is correct with respect to the library’s documented interface contract is still necessary. It’s therefore still a requirement to read and understand the library’s documentation. For example, for the case of a REST client, you need to understand how the possible failure modes of the HTTP protocol and REST API are translated by the library.
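
      For illustration, here is a rough sketch of that distinction (Python with the requests library; the endpoint and response fields are made up): transport failures surface as exceptions, while protocol- and API-level failures come back as responses that the application logic still has to interpret against the documented contract.

          import requests

          def fetch_order(order_id):
              # Transport-level failures (DNS, refused connection, timeout)
              # are raised as exceptions by the library.
              try:
                  resp = requests.get(
                      f"https://api.example.com/orders/{order_id}",  # hypothetical endpoint
                      timeout=5,
                  )
              except requests.exceptions.RequestException as exc:
                  raise RuntimeError(f"transport failure: {exc}") from exc

              # Protocol-level failures (4xx/5xx) come back as normal responses;
              # the caller decides what each status means for the application.
              if resp.status_code == 404:
                  return None  # "not found" may be an expected outcome, not an error
              resp.raise_for_status()

              # The service may also encode failures in the body of a 200 response,
              # depending on its documented contract (hypothetical "error" field).
              data = resp.json()
              if data.get("error"):
                  raise RuntimeError(f"API error: {data['error']}")
              return data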

  • slooonz 9 hours ago

    I decided to try Sonnet 3.7 seriously. I started with a simple prompt on claude.ai ("Do you know claude code? Can you do a simple implementation for me?"). After minimal tweaking from me, it gave me this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...

    After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him? it?) to create its next version. It came up with a non-working version of this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, but that started an interactive loop: I could now describe what I wanted, describe the bugs, and the tool would add the features/fix the bugs itself.

    I decided to rewrite it in Typescript (by that I mean: "can you rewrite yourself in typescript"). And then add other tools (by that: "create tools and unit tests for the tools"). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework? Done by the tool itself too.

    In one day (and $20), I had essentially recreated claude-code, one that I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.

    • WD-42 6 hours ago

      So you’re telling me you spent 20 dollars and an entire day for 200 lines of JavaScript and 75 lines of python and this to you constitutes a working re-creation of Claude Code?

      This is why expectations are all out of whack.

      • slooonz 26 minutes ago

        2200 lines. Half of them unit tests I would probably have been too lazy to write myself even for a "more real" project. Yes, I consider $20 cheap for that, considering:

        1. It's a learning experience.

        2. Looking at the chat transcripts, many of those dollars are burned for stupid reasons (Claude often fails with the insertLines/replaceLines functions and breaks files due to off-by-one offsets) that are probably fixable.

        3. Remember that Claude started from a really rudimentary base with few tools; the bootstrapping was especially inefficient.

        Next experiment will be on an existing codebase, but that’s probably for next weekend.

      • BeetleB 4 hours ago

        That amount of output is comparable to what many professional engineers produce in a given day, and they are a lot more expensive.

        Keep in mind this is the commenter's first attempt. And I'm surprised he paid so much.

        Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).

        There were many other bugs, but I would just point out the failures I was seeing and it would fix them itself. For particularly difficult bugs it would at times even produce a full new script just to aid with debugging. I would run it and it would spit out diagnostics, which I fed back into the chat.

        The code was decent quality - better than what some of my colleagues write.

        I could probably have it be even more productive if I didn't insist on reading the code it produced.

        • slooonz 22 minutes ago

          Remember that input tokens grow quadratically with the length of the conversation, since you re-upload the n previous messages to get the (n+1)-th message. When Claude completes a task in 3-4 shots, that's cents. When it goes down a rabbit hole, however…
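
          As a back-of-envelope sketch of that growth (Python; the per-message token count is an invented constant, just to show the shape):

              # Each new request re-sends all previous messages, so the input cost of
              # turn n is roughly n * m tokens, and the total over n turns is about
              # m * n * (n + 1) / 2, i.e. quadratic in the number of turns.
              def total_input_tokens(turns, tokens_per_message=500):
                  return sum(i * tokens_per_message for i in range(1, turns + 1))

              print(total_input_tokens(4))   # 5,000 tokens: a task done in a few shots is cheap
              print(total_input_tokens(40))  # 410,000 tokens: a long rabbit hole gets expensive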

        • WD-42 3 hours ago

          The lines of code aren't the point. The OP claimed they asked Claude to recreate Claude Code and that it was successful. This is obviously an extreme exaggeration. I think this is the crux of a lot of these posts. This code generator output a very basic utility. To some this is a revelation, but it leaves others wondering what all the fuss is about.

          It seems to me people’s perspective on code gen has largely to do with their experience level of actually writing code.

          • BeetleB 3 hours ago

            It's a very narrow reading of his comment. What he meant to say was it quickly created a rudimentary version of an AI code editor.

            Just as a coworker used it to develop an AI code review tool in a day. It's not fancy - no bells and whistles, but it's still impressive to do it in a day with almost no manual coding.

            • WD-42 2 hours ago

              > In one day (and $20), I essentially had recreated claude-code.

              Not sure it’s a narrow reading. This is my point, if it’s a basic or rudimentary version people should be explicit about that. Otherwise these posts read like hype and only lead to dissatisfaction and disappointment for others.

              • BeetleB 2 hours ago

                s/reading/interpretation/

                Reading something literally is by definition the narrowest interpretation.

    • Silhouette 6 hours ago

      Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.

      That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.

      It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.

      • NitpickLawyer 5 hours ago

        aider has this great visualisation of "self written code" - https://aider.chat/HISTORY.html

        • throwaway0123_5 5 hours ago

          I suspect it would be somewhat challenging to do, but I'd love to see something like this where the contributions are bucketed into different levels of difficulty. It is often the case for me that a small percentage of the lines of code I write take a large percentage of the time I spend coding (and I assume this is true for most people).

  • escapecharacter 4 hours ago

    I've found AI to be useful on precisely-scoped tasks I might assign to a junior programmer to take a day to do, like "convert this exact bash script to a Powershell script".

    But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.

    As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

    • philipswood 3 hours ago

      > you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.

      I once accidentally asked my local DeepSeek R1 distilled model (deepseek-r1:7b) to do the wrong thing by copy-pasting the wrong variable name. It saw me trying to do something stupid (I was working with the wrong variable), told me how to do what I asked, and then asked:

      > _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._

      Looking at its thought process:

      > _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn:aws:s3:::bucket_name RegionalPart path. The part after the bucket name is typically s3:// followed by the object key._

      > _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._

  • smallerfish 10 hours ago

    I've done code interviews with hundreds of candidates recently. The difference between those who are using LLMs effectively and those who are not is stark. I honestly think engineers who think like OP are going to get left behind. Take a weekend to work on getting your head around this by building a personal project (or learning a new language).

    A few things to note:

    a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.

    b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).

    c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).

    d) It's not perfect. Don't give up if you don't get perfection.

    • triyambakam 7 hours ago

      Hundreds of candidates? That's significant if not an exaggeration. What are the stark differences you have seen? Did you inquire about the candidate's use of language models?

      • smallerfish 6 hours ago

        Yes. I do async video interviews in round 1 of my interview process in order to narrow the candidate funnel. Candidates get a question at the start of the interview, with a series of things to work through in their own IDE while sharing their screen. I review all recordings (though I will skip around, and if candidates don't get very far I won't spend a lot of time watching at 1x speed.) The question as laid out encourages them to use all of the tools they usually rely on while coding (including google, stackoverflow, LLMs, ...).

        Candidates who use LLMs generally get through 4 or 5 steps in the interview question. Candidates who don't are usually still on step 2 by the end of the interview (with rare exceptions), without their code quality being significantly better.

        (I end up in 1:1 interviews with perhaps 10-15% of candidates who take round 1).

        • alextingle 4 hours ago

          Is the question actually difficult, though? If you ask for some standard task, then of course those who are leaning heavily on LLMs will do well, as that's exactly where they work best. That doesn't tell you anything about the performance of those candidates in situations where the LLM won't help them.

          I suppose, if you are specifically looking for coders to perform routine tasks, then you'll get what you need.

          Of course, you could argue that ~90% of a programmer's work day is performing standard tasks, and even brilliant programmers who don't use LLMs will lose so much productivity that they are not worth hiring... Counterpoint: IMO, the amount of code you bash out in a given time bears no relation to your usefulness as a programmer. In fact, producing lots of code is often a problem.

          • smallerfish 3 hours ago

            No, I'm not doing leetcode or algorithm questions - it's basically "build a [tiny] product to specs", in a series of steps. I'm evaluating candidates on their process, their effectiveness, their communication (I ask for narration), and their attention to detail. I do review code afterwards. And, bear in mind that this is only round 1 - once I talk with the ones who do well, I'll go deep on a number of topics to understand how well rounded they are.

            I think it's a reasonably balanced interview process. Take home tests are useless now that LLMs exist. Code interviews are very time consuming on the hiring side. I'm a firm believer that hiring without some sort of evaluation of practical competence is a very bad idea - as is often discussed on here, the fizzbuzz problem is real.

        • gammarator 2 hours ago

          So you’re not _interviewing_ them, you’re having them complete expensive work-sample tests. And your evaluation metric is “completes lots of steps in a small time box.”

          • anon22981 2 hours ago

            Seems more like trying to find the most proficient LLM users than anything else. I've never done interviews but I imagine I'd be hard pressed to skip candidates solely because they aren't using LLMs.

            Each to their own and maybe their method works out, but it does seem whack.

      • nsonha 6 hours ago

        If it's real, that person interviewed at least one candidate per day last year. Idk what kind of engineering role in what kind of org even has you doing that.

        • simonw 6 hours ago

          When I've had an open req for my team at a California tech company I've had days where I would interview (remotely) 2-3 candidates in a single day, several days a week for several weeks straight. It's not impossible to interview 100 people in a few months at that rate.

        • h4ny 6 hours ago

          There are companies whose product is high-quality mock interviews. I wouldn't be surprised by that number of interviews in just a year and it can easily be more than one candidate per day.

          Edit: there are also recruitment agencies with ex-engineers that do coding interviews, too.

    • jacobedawson 10 hours ago

      I'd add to that that the best results are with clear spec sheets, which you can create using Claude (web) or another model like ChatGPT or Grok. Telling them what you want and what tech you're using helps them create a technical description with clear segments and objectives, and in my experience works wonders in getting Claude Code on the right track, where it has full access to the entire context of your code base.

  • crabl 8 hours ago

    What I've noticed from my extensive use over the past couple weeks is that Claude Code really sucks at thinking things through enough to understand the second- and third-order consequences of the code that it's writing. That said, it's easy enough to work around its deficiencies by using a model with extended thinking (Grok, GPT-4.5, Sonnet 3.7 in thinking mode) to write prompts for it and use Claude Code as basically a dumb code-spewing minion. My workflow has been: give Grok enough context on the problem with specific code examples, ask it to develop an implementation plan that a junior developer can follow, and paste the result into Claude Code, asking it to diligently follow the implementation plan and nothing else.

    • simonw 6 hours ago

      "Claude Code really sucks at thinking things through enough to understand the second and third order consequences of the code that it's writing"

      Yup, that's our job as software engineers.

    • TylerLives 6 hours ago

      This has been my experience as well. Breaking problems into smaller problems where you can easily verify correctness works much better than having it solve the whole problem on its own.

      • WD-42 6 hours ago

        you just described how a good developer works.

    • cglace 7 hours ago

      In all of these posts I fail to see how this is engineering anymore. It seems like we are one step away from taking ourselves out of the picture completely.

      • bckr 4 hours ago

        I don’t write binaries, assembly, or C. If I don’t have to write an application, I’m okay with that.

        I still have to write the requirements, design, and acceptance criteria.

        I still have to gather the requirements from stakeholders, figure out why those will or will not work, provision infra, figure out how to glue said infra together, test and observe and debug the whole thing, get feedback from stakeholders…

        I have plenty of other stuff to do.

        And if you automate 99% of the above work?

        Then the requirements are going to get 100Xed. Put all the bells and whistles in. Make it break the laws of physics. Make it never ever crash and always give incredibly detailed feedback to the end users. Make it beautiful and faster than thought itself.

        I’m not worried about taking myself out of the loop.

        • cglace 2 hours ago

          Thanks for sharing. I hope you are right. It's hard to stay objective as things are changing so quickly.

  • csomar 9 hours ago

    Yeah, it's so bad now I only trust my own eyes. Everyone is faking posts, tweets and benchmarks to the point that the truth no longer exists.

    I'm using Claude 3.7 now, and while it improved in certain areas, it degraded in others (i.e. it randomly removes/changes things more now).

  • clusterhacks 5 hours ago

    I am a slow adopter of new tech but plan to spend a significant amount of time in 2025 using AI tools when coding. I am net negative on AI simply replacing programmers, but I think the practice of development is undergoing a seismic shift at this point.

    My recent usage is oriented towards using pseudocode descriptions that closely map to Python to produce Python functions. I am very impressed with Claude 3.7's syntactic correctness when given a chunk of pseudocode that looks "python-y" to begin with.
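
    As a rough illustration of what I mean (both the pseudocode and the resulting function are invented for this example):

        # Pseudocode handed to the model:
        #   for each record in records:
        #       skip if record.amount <= 0
        #       group by record.category, summing amount
        #   return groups sorted by total, descending

        from collections import defaultdict

        def summarize_by_category(records):
            """Sum positive amounts per category, largest totals first."""
            totals = defaultdict(float)
            for record in records:
                if record["amount"] <= 0:
                    continue
                totals[record["category"]] += record["amount"]
            return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)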

    My one concern is that much of my recent code requirements lack novelty. So there is a somewhat reasonable chance that the tool is just spitting out code it slurped somewhere on GitHub or elsewhere on the larger Internet. Just this week, I gave Claude a relatively "anonymous" function in pseudocode, meaning variable names were not particularly descriptive, with one tiny exception. However, Claude generated a situationally appropriate comment as part of the function definition. This was... surprising to me, unless the model somehow had in its training set a very close match to my pseudocode description that included enough context to add the comment.

    • doug_durham 3 hours ago

      At this point very little code is "novel". Everyone is simply rewriting code that has already been written in a similar form. The LLM isn't slurping up and restating code verbatim. It is taking code that it has seen thousands of times and generating a customized version for your needs. It's hubris to think that anyone here is generating "novel" code.

      • clusterhacks 2 hours ago

        I have seen the argument that very little code is novel, but I find it inherently unsatisfying and lacking in nuance. I think what bugs me about it is that if you squint hard enough, all programming reduces to "take some data, do something to it." That "something" is doing a lot of heavy lifting in the argument that "something" is or isn't novel.

        Heck, if we think about it from the programming language perspective, all code is "simply" using already existing language functions to cobble together a solution to some specific set of requirements. Is no program novel?

        There is probably a consideration here that maybe boils down to the idea of engineering vs artisanal craftsmanship and where a specific project falls in that spectrum . . .

  • noufalibrahim 4 hours ago

    I'm in the same boat. I've found it useful in micro contexts but in larger programs, it's like a "yes man" that just agrees with what I suggest and creates an implementation without considering the larger ramifications. I don't know if it's just me.

  • julienmarie 9 hours ago

    I initially had the same experience. My codebase is super opinionated, with a specific way to handle things. Initially it kept wanting to do things its way. I then changed my approach and documented the way the codebase is structured, how things should be done, and all the conventions used, and on every prompt I make sure to tell it to use these documents as reference. I also have a central document that keeps track of dependencies of modules and the global data model. Since I made these documents the reference, developing new features has been a breeze. I created the architecture, documented it, and now it uses it.

    The way I prompt it: first I write the documentation of the module I want, following the format I detailed in the master documents, and ask it to follow the documentation and specs.

    I use Cursor as well, but more as an assistant when I work on the architecture pieces.

    But I would never give an AI the driver's seat for building the architecture and making tech decisions.

  • Balgair 6 hours ago

    Hey, I've been hearing about this issue that programmers have on HN a lot.

    But I'm in the more 'bad programmer/hacker' camp and think that LLMs are amazing and really helpful.

    I know that one can post a link to the chat history. Can you do that for an example that you are comfortable sharing? I know that it may not be possible though or very time consuming.

    What I'm trying to get at is: I suck at programming, I know that. And you probably suck a lot less. And if you say that LLMs are garbage, and I say they are great, I want to know where I'm getting the disconnect.

    I'm sincerely not trying to be a troll here, and I really do want to learn more.

    Others are welcome to post examples and walk through them too.

    Thanks for any help here.

    • vlod 5 hours ago

      >and think that LLMs are amazing and really helpful

      Respectfully, are you understanding what it produces, or do you think it's amazing because it produces something that 'maybe' works?

      Here's an example I was just futzing with. I did a refactor of my code (TypeScript) and my test code (Vitest) broke, and for some reason it said 'mockResolvedValue()' is not a function. I've done this a gazillion times.

      I allowed it via 3-4 iterations to try and fix it (I was being lazy and wanted my error to go away) and the amount of crap (rewriting tests, referenced code) it was producing was beyond ridiculous. (I was using github co-pilot).

      Eventually I said "f. that for a game of soldiers" and used my brain. I had forgotten to uncomment a vi.mock() during the refactor.

      I DO use it to fix stupid TypeScript errors (the error blob it dumps on you can be a real pita to process) and appreciate it when it gives me a simple solution.

      So I agree with quite a few comments here. I'm not ready to bend the knee to our AI Overlords.

  • epolanski 10 hours ago

    I don't believe in those either, and I never see compelling YouTube videos showing that in action.

    For small stuff LLMs are actually great and often a lifesaver on legacy codebases, but that's more or less where it stops.

  • fullstackwife 10 hours ago

    "wild", "insane" keywords usually are a good filter for marketing spam.

    • belter 10 hours ago

      Influencer would be another term...

  • EigenLord 4 hours ago

    You've got to do piecemeal validation steps yourself, especially for models like Sonnet 3.7 that tend to over-generate code and bury themselves in complexity. Windsurf seems to be onto something. Running Sonnet 3.7 in thinking mode will sometimes reveal bits and pieces about the prompts they're injecting when it mentions "ephemeral messages" reminding it about what files it recently visited. That's all external scaffolding and context built around the model to keep it on track.

  • iambateman 6 hours ago

    I have a challenging, repetitive developer task that I need to do ~200 times. It’s for scraping a site and getting similar pieces of data.

    I wrote a worksheet for Cursor and give it specific notes for how to accomplish the task in a particular case. Then I let it run, and it's fairly successful.

    Keep in mind…it’s never truly “hands off” for me. I still need to clean things up after it’s done. But it’s very good at figuring out how to filter the HTML down and parse out the data I need. Plus it writes good tests.
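
    The per-site code it produces looks roughly like this (a sketch using BeautifulSoup; the selectors and field names are invented, since every site needs its own):

        from bs4 import BeautifulSoup

        def parse_listing(html: str) -> list[dict]:
            """Filter the page down to the repeated item blocks and pull out the fields."""
            soup = BeautifulSoup(html, "html.parser")
            items = []
            for card in soup.select("div.result-card"):  # hypothetical container selector
                title = card.select_one("h3")
                price = card.select_one(".price")
                items.append({
                    "title": title.get_text(strip=True) if title else None,
                    "price": price.get_text(strip=True) if price else None,
                })
            return items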

    So my success story is that it takes 75% of the energy out of a task I find particularly tedious.

    • WD-42 6 hours ago

      I haven't found LLM code gen to be very good except in cases like you mention here: when you need to generate large, boilerplate-y code with a lot of hardcoded values or parameters. The kind of thing you could probably write a code generator yourself for if you cared enough to do it. Thankfully LLMs can save us from some of that.

  • Delomomonl 6 hours ago

    I had Claude prototype a few things and for that it's really enjoyable.

    Like a single-page HTML/JS app which does a few things and saves its state in local storage, with a JSON backup feature (download the JSON).

    I also enjoy it for things I don't care much about but that make a project more polished. For example, I hate my basically empty README with two commands. It looks ugly, and when I come back to stuff like this a few days/weeks later I always hate it.

    Claude just generates really good readmes.

    I'm trying out Claude code right now and like it so far.

  • Kiro 7 hours ago

    Funny, because I have the same feeling toward the "I never get it to work" comments. You don't need any special prompt engineering so that's definitely not it.

  • babyent 3 hours ago

    I’ve dug into this a few times.

    Every single time they were doing something simple.

    Just because someone has decades of experience or is a SME in some niche doesn’t mean they’re actually good… engineers.

  • Ancalagon 3 hours ago

    Are you actually using claude? There's an enormous difference between claude code and copilot, with the latter being a bigger burden these days than a help.

  • moomin 10 hours ago

    I can definitely save time, but I find I need to be very precise about the exact behaviour, a skill I learned as… a regular programmer. The speedup is higher in languages I'm not familiar with, where I know what needs doing but not necessarily the standard way to do it.

  • sovietmudkipz 10 hours ago

    Maybe it's the shot-up-plane effect (survivorship bias): we only see the winners but rarely the failures, which leads us to wrong or incorrect conclusions.

    Finding the right prompt to have current generation AI create the magic depicted in twitter posts may be a harder problem than most anticipate.

  • yodsanklai 9 hours ago

    > I do find value in asking simple but tedious task like a small refactor or generate commands,

    This is already a productivity boost. I'm more and more impressed about what I can get out of these tools (as you said, simple but tedious things). ChatGPT4o (provided by company) does pretty complex things for me, and I use it more and more.

    Actually, I noticed that when I can't use it (e.g. internal tools/languages), I'm pretty frustrated.

    • cglace 9 hours ago

      Are you concerned that these tools will soon replace the need for engineers?

      • yodsanklai 2 hours ago

        Yes, I used to be skeptical about the hype, but now I'm somewhat concerned. I don't think they will replace engineers but they do increase their productivity. I'm not able to quantify by how much though. In my case, maybe it increases my productivity by 5-10%, saving me a few hours of work each week. Very rough estimate.

        Does it mean that we'll need fewer engineers to perform the same amount of work? Or that we'll produce better products? In my company, there's no shortage of things to do, so I don't think we'll hire fewer people if suddenly engineers are a bit more productive. But who knows how it'll impact the industry as a whole.

  • egorfine 10 hours ago

    You have to learn and figure out how to prompt it. My experience with Claude Code is this: one time it produces an incredible result; another time it's an utter failure. There are prompt tips and tricks which have enormous influence on the end result.

    • ido 10 hours ago

      Can you give us some of these tips?

      • egorfine 10 hours ago

        Not that I have anything concrete in my mind yet. I'm learning as we all do. But after some usage I've developed a little bit of a hunch which prompt works and which not.

        For example, I mindlessly asked Claude Code, over a large codebase, "where is the desktop app version stored and how is it presented on the site". I expected a useless answer given how vague the question was. Instead I got a truly exceptional and extremely clear report that fully covered the question.

        Another example. I asked Claude Code to come up with a script to figure out funding rate time intervals on a given exchange, and it ended up in an almost endless loop running small test scripts in Node.js to figure this out, and came up with a super suboptimal and complicated solution. Turns out my prompt was too verbose and detailed: I had specifically asked Claude Code to figure out the time intervals, not just get them. So it did. Instead of just querying the exchange via the API and printing the list in the terminal (a 3-line script), it actually, truly tried to figure them out in various ways.

      • borgdefenser 7 hours ago

        You should also try the same prompt multiple times to see how this works.

        Sometimes you will get better or worse answers completely by chance.

        I think Claude is pretty good if you have it write a function and give it the inputs, output and a data example. You can also tell it to ask clarifying questions as needed, because there is a good chance there are aspects of the prompt that are ambiguous.

        My prompts are always better if I write them in a separate text file and then paste them in. I think I just take my time and think things out more that way, instead of trying to get to the answer as fast as possible.

  • razemio 10 hours ago

    Can you clarify what tools and programming language you use? I find that the issue is often wrong tooling, exotic programming languages or frameworks.

    • develoopest 10 hours ago

      I would consider frontend tasks using Typescript and React quite standard.

      • razemio 9 hours ago

        React in my experience sucks with AI. In fact I have not yet encountered a "heavy" framework which works well. Use something light like svelte.

        Typed programming languages like Typescript, Scala, Haskell and so on will produce more errors -> you need to fix stuff manually. However it will also reduce bugs. So it is a mixed bag. For an error free experience python and JavaScript work very well.

        When it comes to tooling, if you haven't used cline, roocode or aider (not as good) yet, you haven't seen what an AI can do.

        A good starter would be starting fresh: create a README which describes the whole application you want to build in detail and let the AI decide the tech stack. You can most certainly build complex applications with an AI at blazing speed.

  • kolbe 6 hours ago

    I am willing to say I am a good prompt engineer, and "AI takes the wheel" is only ever my experience when my task is a very easy one. AI is fantastic for a few elements of the coding process: building unit tests, error checking, deciphering compile errors, and autocompleting trivially repetitive sections. But I have not been able to get it to "take the wheel".

  • nsonha 6 hours ago

    This space is moving really fast. I suggest that before forming a definitive opinion you try the best tool, such as the latest Claude model, and use "agentic" mode or the equivalent in your client. For example, on Copilot this mode is brand new and only available in VS Code Insiders. Cursor and other tools have had it for a little longer.

    • collingreen 4 hours ago

      People have been saying it writes amazing code that works for far longer than that setup has been available though. Your comment makes me think the product is still trying to catch up to these expectations people are setting.

      That being said I appreciate your suggestion and will consider giving that a shot.

chaosprint 9 hours ago

It seems the original poster hasn't extensively tried various AI coding assistants like Cursor or Windsurf.

Just a quick heads-up based on my recent experience with agent-based AI: while it's comfortable and efficient 90% of the time, the remaining 10% can lead to extremely painful debugging experiences.

In my view, the optimal scenarios for using LLM coding assistants are:

- Architectural discussions, effectively replacing traditional searches on Google.

- Clearly defined, small tasks within a single file.

The first scenario is highly strategic, the second is very tactical. Agents often fall awkwardly between these two extremes. Personally, I believe relying on an agent to manage multiple interconnected files is risky and counterproductive for development.

  • hashmap 6 hours ago

    This has been my experience as well. I find that the copy/paste workflow with a browser LLM still gets me the most bang for the buck in both those cases. The cli agents seem to be a bit manic when they get hold of the codebase and I have a harder time corralling them into not making large architectural changes without talking through them first.

    For the moment, after a few sessions of giving it a chance, I find myself using "claude commit" but not asking it to do much else outside the browser. I still find o1-pro to be the most powerful development partner. It is slow though.

  • sbszllr 6 hours ago

    > In my view, the optimal scenarios for using LLM coding assistants are:

    > - Architectural discussions, effectively replacing traditional searches on Google.

    > - Clearly defined, small tasks within a single file.

    I think you're on point here, and it has been my experience too. Also, not limited to coding but general use of LLMs.

  • intrasight 5 hours ago

    > extremely painful debugging experiences.

    I'd claim that if you're debugging the code - or even looking at it for that matter - that you're using AI tools the wrong way.

    • chaosprint 5 hours ago

      I'd be very interested to know of a way to make it work with AI that doesn't require debugging if you can illustrate.

    • collingreen 4 hours ago

      This is exactly my impression of the summary of these kinds of posts and, I'm speculating here, maybe why there is such a stark difference.

      I'm guessing that the folks who read the output and want to understand it deeply and want to "approve" it like a standard pull request are having a very different perspective and workflow than those who are just embracing the vibe.

      I do not know if one leads to better outcomes than the other.

      • esafak 3 hours ago

        Are you serious? Why not just vibe work with your human coworkers and merge to master then? Let's see what the outcome is!

        • collingreen 2 hours ago

          > Are you serious?

          I am serious and didn't think anything I said here was contentious. Which part are you feeling incredulity over? I'll try to clarify if I've been unclear or learn from your perspective.

          • esafak 2 hours ago

            You seem to be unsure if checking the code is likely to lead to better outcomes.

  • tomnipotent 7 hours ago

    The author works on Cody at Sourcegraph so I'll give him the benefit of the doubt that he's tried all the major players in the game.

  • finolex1 3 hours ago

    He literally says in his post "It might look antiquated but it makes Cursor, Windsurf, Augment and the rest of the lot (yeah, ours too, and Copilot, let's be honest) FEEL antiquated"

rs186 10 hours ago

A single tweet with lots of analogy, with no screenshot/screen recording/code examples whatsoever. These are just words. Are we just discussing programming based on vibe?

  • frankc 19 minutes ago

    I think the interest has more to do with who is doing the tweeting, don't you think?

  • delusional 10 hours ago

    It's influencer culture. It's like when people watch those "software developer" youtubers and pretend it's educational. It's reality television for computer people.

    • mpalmer 8 hours ago

      Reality television plus cooking show, exactly.

      • macNchz 6 hours ago

        Cooking shows are a perfect analogy for this stuff. For some reason I never connected the highly-edited-mass-appeal "watch someone do skilled work" videos on YouTube with Food Network style content until just now, but you're right they're totally scratching the same basic itch. They make people feel like they're learning something just by watching, while there is really no substitute for actually just doing the thing.

        • mpalmer 5 hours ago

          Not to mention cooking show hosts often recommend or outright sell/endorse their tools.

    • tylerrobinson 9 hours ago

      > reality television for computer people

      Complete with computer people kayfabe!

  • kleiba 6 hours ago

    What, someone cannot utter an opinion anymore?

    • h4ny 6 hours ago

      I find that question ironic.

      • kleiba an hour ago

        But isn't that the point?

bob1029 5 hours ago

I find that maintaining/developing code is not an ideal use case for LLMs and is distracting from the much more interesting ones.

Any LLM application that relies more-or-less on a single well-engineered prompt to get things done is entry level and not all that impressive in the big picture - 99% of the heavy lifting is in the foundation model and next token prediction. Many code assistants are based on something like this out of necessity of needing to support anybody's code. You can't rely on too many clever prompt chaining patterns to build optimizations for Claude Code because everyone takes different approaches to their codebase and has wildly differing expectations for how things should go down. Because the range of expectations is so vast, there is a lot of room to get disappointed.

The LLM applications that are most interesting have the model integrated directly with the product experience and rely on deep domain expertise to build sophisticated chaining of prompts, tool calling and nesting of conversations. In these applications, the user's experience and outcomes are mostly predetermined with the grey areas intended to be what the LLM is dealing with. You can measure things and actually do something about it. What was the probability of calling one tool over the other in a specific context of use? Placing these prompts and statistics alongside domain requirements will enable you to see and make a difference.
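
A minimal sketch of the shape I mean (plain Python; call_model stands in for whatever hosted model API is used, and the tools and domain are invented):

    import json, logging
    from collections import Counter

    # Stubbed domain tools; in a real product these call into your own systems.
    TOOLS = {
        "lookup_account": lambda args: {"balance": 42.0},
        "open_ticket":    lambda args: {"ticket_id": "T-1"},
    }
    tool_counts = Counter()  # which tool was chosen, per context of use

    def call_model(prompt: str) -> str:
        """Placeholder for the real model call; here it always picks one tool."""
        return json.dumps({"tool": "lookup_account", "args": {"customer_id": "c-123"}})

    def handle_request(user_message: str) -> str:
        # Step 1: a narrow prompt whose only job is tool selection.
        choice = json.loads(call_model(
            f"Pick one tool from {list(TOOLS)} for this request and answer as JSON "
            f"{{'tool': ..., 'args': ...}}: {user_message}"
        ))
        tool_counts[choice["tool"]] += 1      # the statistic you can measure and act on
        logging.info("tool choice: %s", choice)

        # Step 2: deterministic execution; the outcome is predetermined by the
        # application, and the model only filled in the grey area.
        result = TOOLS[choice["tool"]](choice["args"])

        # Step 3: a nested prompt turns the structured result into user-facing copy.
        return call_model(f"Summarize this for the user: {json.dumps(result)}")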

phartenfeller 10 hours ago

I tried it too and tasked it with a bigger migration (one web framework to another). It failed pretty badly, to the point where I stopped the experiment. It still gave me a head start from which I can take parts and continue the migration manually. But the worst thing was that it did things I didn't ask for, like changing the HTML structure and CSS of pages and changing hand-picked hex color codes...

More about my experience on my blog: https://hartenfeller.dev/blog/testing-claude-code

raylad 6 hours ago

I tried it on a small Django app and was not impressed in the end.

It looks like it's doing a lot, and at first I was very impressed, but after a while I realized that when it ran into a problem it kept on trying non-working strategies, even though it had tried them before and I had added instructions to claude.md to keep track of strategies and not reuse failing ones.

It was able to make a little progress, but not get to the end of the task, and some of its suggestions were completely insane. At one point there was a database issue and it suggested switching to an entirely different database than the one already used by the app, which was working and in production.

$12 and a couple of hours later, it had created 1200 lines of partially working code and rather a mess. I ended up throwing away all the changes and going back to using the web UI.

  • babyent 3 hours ago

    Now take your $12 and multiply it by 100k people or more trying it.

    Even if you won’t use it again, that’s booked revenue for the next fundraise!

hleszek 10 hours ago

I must have been a little too ambitious with my first test with Claude Code.

I asked it to refactor a medium-sized Python project to remove duplicated code by using a dependency injection mechanism. That refactor is not really straightforward as it involves multiple files and it should be possible to use different files with different dependencies.

Anyway, I explain the problem in a few lines and ask for a plan of what to do.

At first I was extremely impressed, it automatically used commands to read the files and gave me a plan of what to do. It seemed it perfectly understood the issue and even proposed some other changes which seemed like a great idea.

So I just asked it to proceed and make the changes, and it started to create folders and new files, edit files, and even run some tests.

I was dumbfounded, it seemed incredible. I did not expect it to work on the first try, as I already had some experience with AI making mistakes, but it seemed like magic.

Then once it was done, the tests (which covered 100% of the code) were not working anymore.

No problem, I isolate a few failing tests and ask Claude Code to fix them, and it does.

A few more times I find failing tests and ask it to fix them, slowly trying to clean up the mess, until there is a test with a small problem: it succeeds (with pytest) but freezes at the end of the test.

I ask Claude Code again to fix it and it tries to add code to solve the issue, but nothing works now. Each time it adds some bullshit code, and each time it fails, adding more and more code to try to fix and understand the issue.

Finally, after $7.50 spent and 2000+ lines of code changed, it's not working, and I don't know why, since I did not make the changes myself.

As you know, it's easier to write code than to read code, so in the end I decided to scrap everything and make all the changes myself little by little, checking that the tests keep succeeding as I go along. I did follow some of the recommended changes it proposed though.

Next time I'll start with something easier.

  • jpc0 4 hours ago

    Really, you nearly had the correct approach there.

    I generally follow the same approach these days: ask it to develop a plan, then execute, but importantly have it execute each step in as small increments as possible and do a proper code review of each step. Ask it for the changes you want it to make.

    There are certainly times I need to do it myself, but this has definitely improved my productivity to some degree.

    It's just pretty tedious, so I generally write a lot of the "fun" code myself, and almost always do the POC myself, then have the AI do the "boring" stuff that I know how to do but really don't want to do.

    Same with docs: the modern reasoning models are very good at docs and, when guided to a decent style, can really produce good copy. Honestly, R1/4o are the first AIs I would actually consider pulling into my workflow, since they make fewer mistakes and help more than they harm. They still need to be babysat though, as you noticed with Claude.

  • elcomet 5 hours ago

    I'm wondering if you can prompt it to work like this - make minimal changes, and run the tests at each step to make sure the code is still working

  • darkerside 9 hours ago

    I'm curious for the follow up post from Yegge, because this post is worthless without one. Great, Claude Code seems to be churning out bug fixes. Let's see if it actually passes tests, deploys, and works as expected in production for a few days if not weeks before we celebrate.

  • UncleEntity 5 hours ago

    > ...do all the changes myself little by little, checking that the tests keep succeeding as I go along.

    Or... you can do that with the robots instead?

    I tried that with the last generation of Claude, only adding new functionality when the previously added functionality was complete, and it did a very good job. Well, Claude for writing the code and Deepseek-R1 for debugging.

    Then I tried a more involved project with apparently too many moving parts for the stupid robots to keep track of and they failed miserably. Mostly Claude failed since that's where the code was being produced, can't really say if Deepseek would've fared any better because the usage limits didn't let me experiment as much.

    Now that I have an idea of their limitations and had them successfully shave a couple yaks I feel pretty confident to get them working on a project which I've been wanting to do for a while.

noisy_boy 10 hours ago

The trick is not to get sucked into making it do 100% of the task, and to have a judgement of the sweet spot. Provide it proper details upfront along with the desired overall structure - that should settle in about 10-15 mins of back and forth. This must include tests, which you have to review manually - again you will find issues and lose time (say about 30-45 mins). Cut your losses and close the loose ends of the test code. Now run the tests and start giving it discrete tasks to fix them. This is easily 20-40 mins. Now take over and go through the whole thing yourself, because this is where you will find more issues upon in-depth checking (the LLM has done most of what it could), and this is where you must understand the code you need to support.

credit_guy 5 hours ago

I'm using Copilot for writing documentation jupyter notebooks. I do lots of matplotlib plots. Setting up these plots takes lots of repetitive lines of code. Like plt.legend(). With Copilot these lines just show up, and you press tab and move on. Sometimes it is freaky how it guesses what I want to do. For this type of work, Copilot increases my productivity by a factor of 5 easily.
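
To give a sense of the boilerplate I mean, it's lines like these that Copilot fills in almost entirely on its own (a made-up minimal plot, not from my notebooks):

    import matplotlib.pyplot as plt
    import numpy as np

    # Typical repetitive setup: labels, title, grid, legend, layout.
    x = np.linspace(0, 10, 200)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.plot(x, np.cos(x), label="cos(x)")
    ax.set_xlabel("x")
    ax.set_ylabel("value")
    ax.set_title("Example plot")
    ax.grid(True, alpha=0.3)
    ax.legend()
    plt.tight_layout()
    plt.show()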

There are other types of work where Copilot is useless. But it's up to me to take the good parts, and ignore the bad parts.

  • bglazer 5 hours ago

    Yeah copilot is very good for matplotlib. Clunky interface with lots of repetitive code, but also tons of examples on the internet means that I almost never write matplotlib code by hand anymore.

dabinat 4 hours ago

I used GitHub Copilot and Udemy to teach myself Rust. Copilot was especially helpful at resolving obtuse compiler error messages.

But as I’ve improved at Rust I have noticed I am using Copilot a lot less. For me now it has mainly become a tool for code completion. It’s not helping me solve problems now, it’s really just about saving time. I have estimated (unscientifically) that it probably improves my productivity 2-4x.

macrolime 2 hours ago

Can Claude Code also be a devops agent or is it only for coding?

I currently use Cursor as a devops agent: I use the remote ssh extension to ssh into a VM, then Cursor will set up everything, and I make snapshots along the way in case it fucks up. It's been really great to quickly be able to set up and try out different infrastructures and backends in no time at all. It works well enough that I now do all my development using remote dev with ssh or remote containers on a server. Having a virtualized dev environment is a great addition to just having git for the code.

mtlynch 10 hours ago

This is particularly interesting, as Steve Yegge works on (and I think leads) Sourcegraph Cody[0], which is a competitor to Claude Code.

Cody does use Claude Sonnet, so they do have some aligned interests, but it's still surprising to see Yegge speak so glowingly about another product that does what his product is supposed to do.

[0] https://sourcegraph.com/cody

  • esafak 3 hours ago

    Cody lets you pick your model.

  • manojlds 10 hours ago

    Rising tide lifts all the boats and all that.

    Claude Code didn't feel that different to me. Maybe they have something that is better, and when they do release it they can say: hey look, we pushed hard and have something that's better than even Claude Code.

  • mechanicum 6 hours ago

    I mean, doing that is pretty much what made him (semi-)famous in the first place (https://gist.github.com/chitchcock/1281611).

    • mtlynch 6 hours ago

      Yeah, but it's pretty different complaining from the position of a rank and file engineer at what was then like a 50k-person org as opposed to praising a competitor's product when you're at a small company, and you're the public face of your product.

    • istjohn 5 hours ago

      Thanks, I never read that one. Yegge's writing is just delicious. He could write a guide to watching paint dry and I would savor every word.

gaia 11 hours ago

Agreed it is a step above the rest, but I find that it still needs oversight you'd hope it wouldn't: simple stuff like rewriting things in a different way that yields the same result (to the naked eye, no need to even test it), having to call it out when it misses things (update other scripts with the same problem, reflect the changes in the README or requirements.txt), and asking it to try harder to solve an issue in a better way.

Sometimes it takes the easy way out. If you look at your billing, you will see it sometimes uses the 3.5 API instead of 3.7; maybe this has something to do with it. It apologizes ("you are correct", "yes that was hasty") but I'd rather have it try harder every time, not only when called out (and foot the bill that goes with that, ofc).

But overall the experience is great. I hope to use it for a few months at least until something else comes along. At the current pace, I switch subscriptions/API/tool at least once a month.

BeetleB 4 hours ago

For everyone complaining about the cost: Try Aider instead. You can easily limit the context and keep costs down.

A coworker tried both and found Code to be 4x the cost because it doesn't easily let you limit the context.

  • tysonworks 3 hours ago

    That's my experience as well. Claude Code/Cline — these tools are just a way to burn money. With Aider, my spending is minimal and I get exactly what I need.

trescenzi 7 hours ago

I’m sorry what is happening with this paragraph:

> As long as the bank authorizations keep coming through, it will push on bug fixes until they're deployed in production, and then start scanning through the user logs to see how well it's doing.

I enjoy using these tools. They help me in my work. But the continual hype makes the discussion around them impossible to be genuine.

So I ask, genuinely, did I miss the configuration section where you can have it scan your logs for new errors and have it argue with you on PRs? Is he trying to say something else? Or is it just anthropomorphizing hype?

  • frankc 7 hours ago

    I haven't gotten around to trying Claude Code yet, but with Cursor and Windsurf you can absolutely have the agent read the output of what it writes and runs, and fix things it sees. You can also have it review code. It also helps sometimes to have it review in a fresh chat with less context. I really think a lot of people on HN are not really pushing on everything that is available. It's magic for me, but I spend a lot of effort and skill manifesting the magic. I'm not doubting other people's experience really, but wondering if they are giving up too fast because they actually don't want it to work well, for ego reasons.

    • techpineapple 7 hours ago

      I'm going to keep at it, because I was trained as an SRE, not a developer, and have lots of ideas for side projects that have thus far taken a long time to get going. But I've been struggling: it sort of quickly gets into these infinite-loop situations where it can't seem to fix a feature and goes back and forth between multiple non-working states. CSS layouts, but even basic stuff like having the right WebSocket routes.

      We’ll see, maybe my whole approach is wrong, I’m going to try with a simpler project, my first approach was relatively complex.

      • ezekiel68 4 hours ago

        I can't explain why, but I do get pretty good results with closing a prompt session completely and then initiating a fresh session later on. I have actually seen quite different code from the very same prompt across two sessions.

        However, the extra time between sessions does give me the chance to consider where the AI might have gone wrong in the first session and how I could have phrased the initial prompts more effectively.

        As others have stated throughout the threads here, I definitely recommend giving it smaller nibbles of problems. When I take this route and get working modules, I will start a fresh prompt session to upload the working code module files and ask it to integrate them into a simple solution with static inputs in a 'main' function. (Because providing coherent inputs from the router functions of a web service is simple enough, once these basics are covered.)

        Basically - I do everything possible in order for the AI to not get distracted or go down a rabbit hole, chasing concerns that humans are very capable of taking care of (without paying tokens for). I will write most of the tests myself and then perhaps afterwards ask it to evaluate the test coverage and provide more of them. This way the new tests are in the format and platform of my choosing.

    • trescenzi 7 hours ago

      Oh ok this makes sense. Because of the ordering of the sentences I read it as “it pushes the code to production and then monitors it in production”.

      I have found that prompting something like “do X and write tests to confirm it works” works well for what you’re describing. Or even you write the tests then it’ll iterate to make sure they pass.

  • deanputney 5 hours ago

    I cannot tell if the original tweet is sarcasm or not. Sections like this make me think yes? It's got to be at least tongue-in-cheek.

jonwinstanley 3 hours ago

Do any of the current AI systems allow you to use voice?

I’d love to sometimes chat to an agent and dictate what I want to happen.

For one project there's a lot of boilerplate and I imagine AI could be really fast for tasks like: "create a new controller called x", "make a new migration to add a new field to the users table called x", etc.

theusus 4 hours ago

Hilarious claim without any proof. Sure I believe you

999900000999 10 hours ago

It’s ok.

It’s expensive and is correct for the easy 90%. It messes up the hard 10%.

I guess with it rapidly improving, I should wait 3 months. What's frustrating is when you spend $20 in credits on it writing non-functional code.

That said, every programmer needs to at least demo these tools

  • owenthejumper 10 hours ago

    Feels like you shouldn't be using $20 of credits to produce non functional code.

    I use Aider chat with Claude and my sessions are much smaller. You need to ask it much smaller tasks and work incrementally.

    • 999900000999 10 hours ago

      If you already have a moderately complex code base it starts making mistakes.

      • MyOutfitIsVague 6 hours ago

        Depends a lot on what context you feed it. If you have decent internal documentation and can feed it the right files, it tends to do quite well for me, often needing corrections or style fixes. To use it most effectively, you have to know the vast majority of the code that you want it to write ahead of time. It saves time typing, looking up methods, wrangling APIs appropriately, and sometimes it surprises me by working around an edge case or writing an appropriate abstraction that I hadn't considered, but most of the time, it's mostly saving time on typing and tedious gluing, even with the corrections I make and hallucinations I have to correct.

        Maybe it will be at some point, but it is not yet a reasonable substitute for having to think, plan, or understand the code you are working with and what needs to be done. It's also still more expensive than it needs to be to use on my free time (I'm happy to burn company money on it at work, though).

SamCritch 8 hours ago

I asked it to summarise my repo. It did a pretty good job. Then I asked it to see if it could summarise a specific function, but it said my $5 was up. Now I need to find whoever's in charge of our company account and spend half an hour busking in front of their desk to get some more credit.

Until then I'm sticking to a combination of Amazon Q Developer (with the @workspace tag in Intellij), ChatGPT and Gemini online.

vander_elst 5 hours ago

Are there any videos showing these very advanced use cases? I'd be interested in learning how to achieve this level of proficiency. At the moment I still feel I'm better off without AI.

  • MarkMarine 5 hours ago

    Claude Code doesn't need magic prompts. It's not perfect but holy moly is it good; when it's working it just one-shots things for you. It's just EXPENSIVE. I've spent $30 in tokens on it this week.

    Cursor’s “chat with your codebase” is a funny joke compared to Claude Code. Ask it questions, have it figure things out.

    I had it analyze the openAPI schema and the backend that is serving the schema for the API I'm writing, and write end-to-end tests for the API. Then I did my normal meetings, and it was done with huge chunks of it; it had run the code locally and tested against the actual endpoints to see whether the end-to-end tests were working or it had found a bug. Then it fixed the bugs it found in the backend codebase. My prompt: "write me end to end tests against my openAPI schema"

    That was it. $30 in tokens later, pressing enter a bunch of times to approve its use of curl, sed, etc…
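
    The tests it produced were roughly in this shape (a made-up sketch - the endpoint, port, and fields here are invented, not its actual output):

        import requests

        BASE_URL = "http://localhost:8000"  # hypothetical local dev server

        def test_create_and_fetch_widget():
            # Hit the real running backend, then check the responses against
            # what the OpenAPI schema promises for this resource.
            created = requests.post(f"{BASE_URL}/widgets", json={"name": "demo"})
            assert created.status_code == 201
            body = created.json()
            assert "id" in body and body["name"] == "demo"

            fetched = requests.get(f"{BASE_URL}/widgets/{body['id']}")
            assert fetched.status_code == 200
            assert fetched.json() == body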

    • baal80spam 33 minutes ago

      > Claude Code doesn't need magic prompts. It's not perfect but holy moly is it good; when it's working it just one-shots things for you. It's just EXPENSIVE. I've spent $30 in tokens on it this week.

      As long as it costs less than a developer, it will be used instead of said developer.

    • vander_elst 4 hours ago

      Thanks, but to me this feels too high-level. I'd really need to see a video of such things to better understand what's going on, how much time it took end to end, and what the quality was. Yes, I could spend 3 days and 300 bucks playing around with the tool, but I'd prefer to learn offline first.

      • MarkMarine 3 hours ago

        That is all right by me, we need the part of the adoption curve that isn’t at the leading edge.

    • BeetleB 4 hours ago

      Use Aider to keep the costs low. You can explicitly tell it what files to use.

    • roflyear 5 hours ago

      What are some examples of the bugs it fixed?

      • MarkMarine 4 hours ago

        Accepting an epoch timestamp rather than RFC 3339 like the rest of the codebase is the most recent example. The API standardized on one timestamp format and I had an endpoint expecting epoch; it was smart enough to infer that the rest of the APIs were 3339, go change the backend route that accepted the epoch timestamp, write a database migration, and fix the tests. I haven't pushed the fix yet.

        So think about the reasoning there. It read the API docs that said all the timestamps were 3339, it ran a bunch of working 3339 timestamps and was getting them back from most of the APIs, so it decided this was the standard; it read the code on the backend, understood this endpoint was different, decided on the right fix, and implemented it. It didn't need to ask me for help. That is pretty impressive if you've been using Copilot.
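
        For anyone who hasn't hit this before, the mismatch it spotted is just the difference between two representations of the same instant (a trivial illustration, not from the codebase):

            from datetime import datetime, timezone

            epoch = 1709220000  # seconds since 1970-01-01, what the odd endpoint accepted
            rfc3339 = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
            print(epoch, "->", rfc3339)  # 1709220000 -> 2024-02-29T15:20:00+00:00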

bn-l 8 hours ago

It really feels like I’m in an alternate reality with these posts. Is this paid shilling? I’m honestly wondering now considering how different my experience is EVERY DAY with llms for code.

dcre 6 hours ago

I think he’s wrong about it looking antiquated. It is one of the most beautifully done bits of TUI design I’ve seen.

thyrsus 4 hours ago

Do these AIs know how to do test-driven development? Can you tell them the generated code must pass these tests? Can AIs assist in developing tests?

  • esafak 3 hours ago

    Yes, absolutely.

mritchie712 9 hours ago

it's fun for things you're ok with throwing away.

For example, I wanted a dbt[0]-like tool, but written in Rust and specifically focused on duckdb. Claude Code knocked it out[1] without much guidance.

Also added support for all duckdb output options (e.g. write to a partitioned parquet instead of a table).

0 - SQL transformation tool (https://github.com/dbt-labs/dbt-core)

1 - https://github.com/definite-app/crabwalk

skerit 11 hours ago

I've successfully used it to fix a few issues here and there, but it also manages to make some pretty stupid mistakes. A few times it even started rewriting tests in a way so that the wrong outcome would be seen as a pass.

  • epolanski 10 hours ago

    Kinda reminds me of how, when it finds issues with TypeScript, it hacks the types rather than refining the values or business logic.

jtwaleson 9 hours ago

I haven't tried Claude Code yet, so forgive my ignorance, but does it integrate with linters as well as Cursor does? I've seen excellent results on my Rust & TypeScript codebase, where it started making a change in the wrong direction but quickly fixed it when it got linting errors. It seems that a standalone CLI tool like Claude Code would struggle with this.

adamgroom 10 hours ago

I find AI code completion can be annoying and misleading at times. I do find that I spend less time typing though, it’s great at guessing the simple stuff.

egorfine 10 hours ago

Sidenote: thank you for using twitter dot com instead of x. This is a little detail and I'm sure I'm not the only one appreciating it.

  • Philpax 9 hours ago

    For me, Twitter as we knew it has been long dead, and X is the shambling, corrupt corpse that's taken its place.

    I no longer mind referring to it as X, because that clearly outlines that it's a different website with a much more rancid vibe.

    • egorfine 8 hours ago

      That's another take. Never thought of it that way.

  • hleszek 8 hours ago

    We should use xcancel instead.

    • tom_ 6 hours ago

      There is also nitter.poast.org. These sites are possibly better than twitter.com or x.com, as more of the thread is made visible to non-users.

tifik 11 hours ago

The second paragraph is clearly sarcastic, but the rest seems genuine, so I'm a bit confused.

  • jofzar 10 hours ago

    I was confused until I watched a video of it in use; nope, it wasn't that sarcastic.

    https://youtu.be/W13MloZg03Y

    • cpldcpu 10 hours ago

      That guy leads in with stating that he is missing "autocomplete" in Claude Code. Clearly a misunderstanding of the scope.

      • jofzar 10 hours ago

        I mostly posted the video because I thought it was a good demonstration of the prompts to approve/cost that the tweet was talking about.

        He talks later about using autocomplete vs. complete writing. I don't think he misunderstood the scope; he was just talking about the different ways AI can be used as a coding assistant.

  • throwaway314155 10 hours ago

    The second paragraph isn't sarcastic, at least not w.r.t. Claude Code. The bit about North Korean hackers is mild sarcasm, but it has no bearing on the remainder of the post.

dhumph 10 hours ago

I find Cline to be incredible about 80% of the time for creating new concept websites or Python scripts. I use it with OpenRouter and choose Claude exclusively. If CC is the next step, we are headed in a crazy and scary direction.

kthxb 10 hours ago

As a junior, this scares me. I don't think I'll be out of a job soon, but certainly the job market will change drastically and the times where SWEs are paid like doctors and lawyers will end?

  • jtwaleson 9 hours ago

    Make sure you learn a lot. Ask the LLMs to explain anything you don't deeply understand. With all of these coding assistants, there will be many juniors that get a lot done, but don't really understand what they are doing, and their worth will drop quickly.

    So far LLMs are great at producing code, but not at architecture or maintenance.

    • timeon 9 hours ago

      Learning just by asking is not enough. One needs to exercise to build those muscles.

      I'm just restating the obvious because LLMs can do the exercise for you, but there is not much to be gained if one follows that path.

emalafeew 8 hours ago

Somebody here said programming is about designing and reusing libraries rather than coding from scratch like LLMs. But that choice has always been a tradeoff between abstraction and bloat vs performance and debuggability. Writing 50 lines of intentional code is often preferable to adapting a 50000 line library. The trick to using LLMs is to not ask for too much. Claude 3.5 could reliably code tasks that would have taken 5-10 min by hand. Claude 3.7 is noticeably better and can handle 10-30 min tasks without error. If you push those limits you will lose instead of save time.

motorest 10 hours ago

Does anyone know how Mistral fares against Claude Code?

jamil7 10 hours ago

How does it compare to Aider with the same model?

relaxing 7 hours ago

Does anyone know a good video of someone demonstrating their workflow with these coding assistants?

jgalt212 4 hours ago

That post from Yegge reads like it was written by some foaming-at-the-mouth VC. I am getting decent use from Claude, but mostly as a Stack Overflow replacement. I will never let it read our company's code base; then everyone would know how to do what we do. After all, that's why these things are so good at React and not so good at Solid (there's just so much public React code). Also, see the recent "AI IS STIFLING TECH ADOPTION" post:

https://vale.rocks/posts/ai-is-stifling-tech-adoption

https://news.ycombinator.com/item?id=43047792

andrewstuart 8 hours ago

I had to give up my attempt to use Claude Code when it didn't let me specify my API key and password, instead requiring me to sign into a user account first, which then forced creation of an API account?

Something like that. Anyhow Claude needs to allow me to put in my own API keys.

ant6n 10 hours ago

I don't get to code much these days, so I mostly use ChatGPT to get occasional help with scripts and whatnot. I've tried to get help from ChatGPT on a simple JavaScript/website project that is basically a css/js/html file, but I feel like I don't know how to give the chatbot the code except by pasting it into the prompt. Is Claude better for that? Does it have some IDE or something to play with, or do people generally use some third-party tool for integrations?

bflesch 10 hours ago

I can't take people seriously who are still using twitter, it weakens their arguments.

  • vlod 5 hours ago

    That's quite a wide brush you're painting with.

    I think it would be useful if you provide a reason for 'why'.

    Quite a few people know how to use mutes, lists etc to get the best from it.

    If all you click on is rage-bait articles, hate-driven politics, or people promoting their OnlyFans pages with not-so-subtle imagery, then the Algorithm will keep feeding you more of it.

  • bentobean 8 hours ago

    First it was Facebook, now it’s Twitter / X. What you’re essentially saying is - “I can’t take a platform seriously where many / most of the people disagree with me.”

    • bflesch 8 hours ago

      [flagged]

      • sejje 6 hours ago

        You think Steve Yegge (author of the post) is a moron?

      • dboreham 7 hours ago

        I've noticed that the morons came first. Facebook will surface them given a sufficiently large comment thread. There's no escaping them.

  • spiderfarmer 10 hours ago

    I have the same thought. I also want all politicians to stop using it. It might have been useful in the past but I’m now convinced it’s largely useless. A large part of the population actively shuns the website and the remainder is mostly people who love rage bait.

KeplerBoy 10 hours ago

The comment "@grok can you summarize" kills me. This post is like 200 words and takes a minute to read. Is this the direction we (or some of us) are headed?

  • ramblerman 10 hours ago

    lol - I thought the same, but the charitable take after looking at the user's profile is that he is a journalist, and not tech savvy.

    So I understand his request as more along the lines of, can you explain this in a way that I understand it. For which summarization is the wrong phrasing.

    i.e., it seems trivial on Hacker News, but that post would be pure gibberish for most of our parents.

  • joshmlewis 4 hours ago

    It's become very commonplace for there to be a dozen replies on viral posts where users all @grok to ask if the post is real or for more context. It's almost more work to compose a reply asking Grok about it than it is to just click the Grok button and have it give you more context that way. I don't get it.

  • hhh 9 hours ago

    I see this a lot, I think most of these people would have just scrolled on otherwise. I don’t get it.