AgentMatrixAI 2 days ago

Prompt quality and knowing your domain is critical. One issue I had early on was experimenting with LLMs to generate a frontend application in a brand-new framework I was unfamiliar with (Svelte at the time), which led to situations where I would cruise along and then get stuck in a loop. The other issue came from the increasing context size, which led to more unpredictable behaviors (i would ask it to change the color of a button and it would completely change the entire page).

of almost all the tools i've used to date for frontend work, none really replaces using cursor and being able to dive deep. however, cline does seem to have gotten significantly better.

the day when you can come back to a fully working web app of moderate complexity after cleaning the gutter is still some way off, but that's the dream

  • bandrami 2 days ago

    How much of this is a cage we've built for ourselves? When I made my first website in 1998, even if we had had LLMs, asking one how to change the color of a button would have been ludicrous, because I'd just change an HTML attribute (did CSS exist back then? don't remember) either in a static file or in a perl script. That's it. And the modern "throw a hundred and seven frameworks at it" stack doesn't really do anything that a simple CGI CRUD app couldn't do.

    • DecoySalamander 2 days ago

      What has changed over the years is our expectations and requirements. Sure, you can skip "all the noise" and slap an attribute on your button, but then you'll have to track down another 20 buttons in your codebase that need a color update. You can be smart with your approaches, but at a certain project size and complexity, frameworks are unavoidable. The only choice you really have is to learn an established one or handcraft something new for everything you make.

      • skydhash 2 days ago

        That’s the problem template languages solve. That button could be a partial, and you include it where you want a button. And with grep, you can track it down.
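        (To make the partial idea concrete, here's a minimal sketch using Jinja2; the template names and fields are made up, not from any real project.)

```python
# Minimal sketch of "the button as a partial": define it once,
# include it wherever needed, and grep for the partial's name to
# find every usage site. Template names/fields are illustrative.
from jinja2 import Environment, DictLoader

templates = {
    "_button.html": '<button class="btn-{{ color }}">{{ label }}</button>',
    "page.html": '<h1>{{ title }}</h1>\n{% include "_button.html" %}',
}
env = Environment(loader=DictLoader(templates))

# Editing _button.html updates every page that includes it.
html = env.get_template("page.html").render(
    title="Settings", color="blue", label="Save"
)
print(html)
```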

        • qup 2 days ago

          Yes, and then you solve a dozen problems like this, bundle it up, and you've created a framework.

          • skydhash a day ago

            The problem with frameworks is feature/need fit. Sometimes you need something simple, and using a framework results in a big percentage of dead code. Or you extend it enough that there's no longer any benefit.

        • bandrami a day ago

          One of Allaire’s selling points for CFML was that their IDE would track all the matching partials in a project if you edited one

  • energy123 2 days ago

    Same experience with context size and prompt quality.

    Some other things I picked up:

    - If you formulate a good prompt with small (but sufficient) context and it still makes mistakes after one attempt to feed the error message back to it, it's probably not going to be able to ever get it, no matter how many iterations you do with it. It will get stuck in a rut forever. Better not to argue with it.

    - o1-2024-12-17 is genuinely a big step change.

    • Mtinie 2 days ago

      When I catch an agent falling into a “debug loop” I ask it to summarize its understanding of the error. Next, I ask it to shift its context to a higher level in the code and look for patterns across my files which would impact {summary of error}.

      It isn’t foolproof but it has a much better success rate for me than letting it spin.

  • ilrwbwrkhv 2 days ago

    So basically you use only tab completion fill in the middle?

  • m3kw9 2 days ago

    By that time you can have your personal humanoid robot clean your gutter. You just do nothing but prompt, what a life that’s gonna be.

    • ImHereToVote 2 days ago

      Can't prompting be automated?

      • baq 2 days ago

        Welcome to the agentic age, where prompting can be just barely automated, but tokens are so cheap it doesn’t matter. Fresh off the presses as of last week

ItsBob 2 days ago

Anecdotal but here's how I described using the likes of Copilot to my sceptical colleagues (they were late to the party!):

It's like having a senior software dev over your shoulder. He knows pretty much everything about coding but he often comes to work drunk...

And that was the best analogy I could come up with: I think it's sped up my work enormously because I limit what it does, rather than let it loose... if that makes sense.

As an example, I was working on a C# project with a repository layer with a bunch of methods that just retrieved one or two values from the database, e.g. GetUsernameFromUserGuid, GetFirstnameFromUserGuid and so on. They each had a SQL query in them (I don't use ORMs...).

They weren't hard to write but there were quite a few of them.

Copilot learned after the first couple what I was doing, so I only needed to type "GetEmail" and it finished it off (GetEmailAddressFromUserGuid) and did the rest, including the SQL query, in the style I used, etc.
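Roughly this shape of code, sketched here in Python/sqlite3 rather than C# (the table and column names are made up), is exactly the kind of near-duplicate pattern completion models pick up on:

```python
# Sketch of the repetitive single-value accessors described above.
# Table/column names are illustrative, not from the original project.
import sqlite3

def get_username_from_user_guid(conn, guid):
    row = conn.execute(
        "SELECT username FROM users WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None

# After a couple of these, the assistant can complete the next one
# (query, naming style and all) from just the start of the name:
def get_email_address_from_user_guid(conn, guid):
    row = conn.execute(
        "SELECT email_address FROM users WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None
```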

To me, that's where it shines!

Once you figure out where it works best and its limits, it's brilliant imo.

  • skydhash 2 days ago

    To me, that’s either where a real editor shines (vim or emacs macros) or where you need a metaprogramming-capable language.

    • dingnuts 2 days ago

      as a heavy user of vim motions and macros I'm thinking this is one reason I've not found AI code generation terribly useful.

      Yes, it's good at boilerplate. But I've spent a long time getting good at vim macros and I'm also very good at generating boilerplate with a tiny number of keystrokes, quickly and without leaving my editor

      ...

      or I could type a paragraph to an LLM and copy paste and then edit the parts it gets wrong? also I have to pay per token to do that?

      no..

      • IanCal 2 days ago

        > or I could type a paragraph to an LLM and copy paste and then edit the parts it gets wrong?

        I think there's something you're missing in their description. They're not asking a model to do anything, it's automatically running and then suggests what should come next.

        Also in editors like cursor I can ask for a change directly in the editor and be presented with an inline diff to accept/reject.

        • skydhash a day ago

          I don’t know if you’re a vim user, but what makes people like vim is that once you master it, it’s not about typing and deleting characters. It’s about text manipulation, but live, instead of typing an awk or sed script. It’s like driving a car: you don’t think about each step, like watching the speedometer, verifying the exact pressure on the gas and brakes, and the angle of the steering wheel. You just have a global awareness and drive where you want to go.

          It’s the same with Vim. I want something done and it is. I can barely recall how I did it, because it does not matter. Something like duplicate the current function, change the name of the parameter and update the query as well as some symbols, can be done quickly as soon as the plan to do so appears. And it’s mostly in automatic mode.

          • IanCal 14 hours ago

            Yes, and this isn't the same.

            It's a tradition to torture car analogies, so let me do so: this is more like when you start to say goodbye to people at a party, your car identifies the pattern and warms up, then opens the door as you walk to it and drives you home. If you sit and take the wheel to drive towards a hotel you booked nearby, it spots that and starts doing that for you.

            > Something like duplicate the current function, change the name of the parameter and update the query as well as some symbols, can be done quickly as soon as the plan to do so appears. And it’s mostly in automatic mode.

            And with these things I might move the cursor to where I want to put the new function, and then it's just immediately suggested for me. One key press and it's done. Then it suggests the other two based on the type definition somewhere else.

            Obviously this is the happy path.

  • jpt4 2 days ago

    If that analogy is remotely accurate, it strikes me as a strictly negative value proposition for the tool.

    • SkyBelow 2 days ago

      Depends upon the user. Are you fast (and confident) at evaluating the output and discarding the bad suggestions? This is why I think using AI hurts some developers and helps others, and the usefulness is best for those who already have a good deal of experience.

      • ItsBob a day ago

        I'm a senior developer, so I spot the bullshit, like when it makes up table names in the SQL. Things like that.

        It's definitely not fire-and-forget: more like extreme autocomplete (at least that's how I use it)

        • SkyBelow 13 hours ago

          I don't ever use it for fire and forget, but I've been wondering how well that might work in small side projects where hidden bugs aren't a big concern. Like using a fire and forget to spin up a small javascript game. But never in production code that I might get a 2am Saturday incident call on.

potatoicecoffee 2 days ago

I like looking at stack overflow for coding examples and seeing a couple of nerds getting angry at each other about the best way to do stuff, or some interesting other ways of doing stuff. this is also why i come to HN! so it's weird when people want to drop that small joy from their workday

  • jack_pp 2 days ago

    It's hard to keep that mindset after a while. People get tired of the same old pattern that's just ever so slightly different from last time. When something becomes repetitive you get bored of it. Or, if the complexity of the solution you're working on comes from a lot of moving parts, it isn't a problem of figuring out the best programming truth so much as just being able to focus on the issue and keep all those moving parts in your head. That's tiring, so you try to offload as much as you can onto another brain.

  • joseda-hg 2 days ago

    It is lovely when doing exploratory research, less so when in a hurry and you don't want to take the extra mental effort to figure out which of the nerds is right or if one of them is talking out of left field

    • frogsRnice 2 days ago

      As opposed to wondering if the llm is hallucinating?

      You have to expend a mental effort to think about your solutions anyway; I guess it’s pick your poison really.

      • zwnow 2 days ago

        That's the issue: people just copy and paste code from LLMs thinking "yeah, looks fine to me". It might be a skill issue, but personally it takes me a while to understand the code it's giving me, and even more to figure out how to actually implement it with all the edge cases that might happen.

        • anileated 2 days ago

          Before: I’m a lazy developer so I find the best libraries and abstract logic to write the least code and do least maintenance work.

          Now: I’m a lazy developer, so I get a glorified autocomplete to write 10x code than what I have the willpower to. Of course, I won’t read all of it.

      • joseda-hg 2 days ago

        Is it important if it's occasionally hallucinating?

        It's not like you should blindly throw the code in, you should run it and verify it

        The more common the work you're doing the less likely it is to hallucinate, plus you can ask it to stick to whatever arbitrary coding standards you want so it's more readable to you, a rewrite to remove a wrong library takes an extra couple seconds per method/function

        Also, it's not like Stack Overflow or other non-generated resources don't occasionally hallucinate; it's not weird for the second- or third-voted answers on SO to be followed by the comment "This doesn't work because XYZ"

        • skydhash 2 days ago

          That’s why you take a quick glance at the answer, then read the comments. Then do a deeper analysis. Takes something like 10 seconds, as it seems every real answer I find that’s good is usually just one or two paragraphs.

          • frogsRnice 2 days ago

            Yeah I agree- I think the time spent verifying should vary based on the complexity and sensitivity of what you are looking at, but you never really get away from it.

            I think my issue with LLMs is moreso aimed at people who wouldn’t have ever done the bare minimum verification anyway.

  • singularity2001 2 days ago

    i've never been as disgusted by a website as by stack overflow so getting rid of that drastically improves joy

  • hackernewds 2 days ago

    hard disagree. the amount of snobbiness and rude closing of threads as power trips was unbearable. I'm glad LLMs learned off them and we are where we are now

    • bilekas 2 days ago

      I've often heard this and I don't buy it. Granted, some mods could give users more time to rewrite their questions, but that's all. Users write bad questions or don't look up answers that already solve their problem. It's a skill issue. Regardless, trusting an LLM that's confidently incorrect a bit too often is not really a trade up.

      If you consider that most of the questions may not actually be "new questions" and have answers already, sometimes, if it's important enough, it's worth actually putting the effort in to understand the problem and solve it yourself. The overdependence that people are developing on LLMs is a little concerning.

mzhaase 2 days ago

There is an enormous difference for me between inline suggestions and using chat.

If you use chat it's work, and then you have to debug and understand something you didn't write yourself.

Using inline suggestions is the closest I have come to plugging my brain directly into the computer. I type 5 characters and in 80% of cases the suggestion is character by character exactly what I would have written. It speeds me up enormously.

lopuhin 2 days ago

I find it strange that the author is really happy with the quality of the string comparison here https://pgaleone.eu/ai/coding/2025/01/26/using-ai-for-coding... While it would kind of work, it's a very weird piece of code from an ML standpoint. For example, it trains a TF-IDF vectorizer on just the two strings being compared, which at best won't change anything (unless the same word is repeated within one product) and is a super weird thing to do, since for better quality you'd probably want to train it on some corpus, or not bother at all. It also compares the two strings as bags of words, which again is not the end of the world, but maybe not what the author wants here; and if it is what they want, then it's not the easiest way of doing it. So it takes some things that can be useful when comparing texts (TF-IDF and cosine similarity) but applies them in a weird way that doesn't let them show their strengths.
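For what it's worth, the setup is easy to reproduce with a toy comparison (the product titles below are illustrative, and scikit-learn is assumed; this is a sketch of the pattern being criticized, not the author's actual code):

```python
# TF-IDF fitted on only the two strings being compared, versus a plain
# bag-of-words count. Both reduce the titles to unordered bags of words;
# the two-document "corpus" gives the IDF almost nothing to learn from.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "apple iphone 13 pro 128gb"
b = "apple iphone 13 pro max 256gb"

tfidf = TfidfVectorizer().fit_transform([a, b])    # IDF learned from 2 docs
counts = CountVectorizer().fit_transform([a, b])   # plain bag of words

sim_tfidf = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
sim_counts = cosine_similarity(counts[0], counts[1])[0, 0]
print(sim_tfidf, sim_counts)
```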

numba888 2 days ago

Looks outdated by now. Try R1.

Seriously, I've been using LLMs for coding for a while and can say early experience was disappointing, but they get better and better fast. The latest o1 looks a lot better than 4o. It's reasonable to expect with proper human supervision and interface they will be able to handle big files and projects in a year or two. Interesting times...

  • bandrami 2 days ago

    They also said that a year or two ago

    • baq 2 days ago

      Impossible of yesterday became mundane today and obsolete tomorrow. Pace of progress is borderline debilitating, when you aren’t deafened by the woosh of goalposts flying past you.

      I can confidently say the way I code today is completely different from the way I was coding in late 2022 and it changed a couple times in between then and now, too.

      • bandrami 2 days ago

        "Isn't Lean just the same Six Sigma stuff you were selling us a year ago?"

        "No no no no no. This has a totally different name"

        • baq 2 days ago

          Yeah except throwing shit together for an internal react ui is now a pleasant couple hours instead of painful couple days.

          • bandrami 14 hours ago

            Right but throwing shit together for an internal perl cgi wireframe used to be like 15 minutes.

            We've regressed is all I'm saying

    • numba888 2 days ago

      Today it's within reach, I think. While the effective context window is small for big files, it should be possible to use RAG to get the relevant pieces into a smaller window. That will be mostly data structures and function declarations. Only what's needed. Humans work the same way on non-trivial projects. Supervision should include strategic planning and subdivision into smaller tasks. Not sure, just my feeling, but we are close to having capable models with big enough context windows. Maybe they are already good enough and just need proper orchestration.

      Hmm... I can try... OAI has something called 'projects', but local handling with API calls is probably the right way of doing it. Easier to switch providers, run and debug in place. With current prices it should be like < $10/month.

ivoras 2 days ago

I did a similar thing but with backend-heavy code, and I agree with this assessment:

> In particular, I asked ChatGPT to write a function by knowing precisely how I would have implemented it. This is crucial since without knowing the expected result and what every line does, I might end up with a wrong implementation.

In my eyes, it makes the whole idea of AI coding moot. If I need to explain every step in detail, and it does not "understand" what it's doing (I can virtually see the statistical trial-and-error behind its actions), then what's the point? I might as well write it all myself and be a bit more sure the code ends up how I like it.

link: https://www.linkedin.com/feed/update/urn:li:activity:7289241...

  • doug_durham 2 days ago

    Because as op pointed out it's faster. It has ready access to the correct usage of different libraries.

    • WD-42 2 days ago

      So does the language features of basically any modern editor.

      • rfw300 2 days ago

        Not really - there's a difference between having the docstring of a function available for you to read, and a model which has learned from thousands of examples how to use a particular API and integrate it into a larger set of instructions. The latter is vastly faster and takes much less human work than the former.

        • girvo 2 days ago

          Except when it consistently gets said particular API wrong. I was using it to do basic graphql-yoga setup with R1 and then Claude Sonnet 3.5 and they both output incorrect usage, and got stuck in a loop trying to fix it.

          If it can’t do something that basic and that common using a language and toolset with that much training data, then I’m pessimistic personally.

          I’m yet to see Copilot be useful for any of my juniors when we pair, it gets in the way far more than it helps and it is ruining their deeper understanding, it seems.

          I’ll continue trying to use these tools, but I swear you’re overselling their abilities even still.

          • simonw 2 days ago

            The way to fix that is to find an example of correct usage of that API and paste that example in at the start of the prompt.

            This technique can reliably make any good LLM fluent in an API that it's never seen in its training data.

            • throwup238 2 days ago

              At this point with Cursor you can have it index the online docs by giving it a base URL and have it automatically RAG the relevant content into the chat (using the @ symbol to reference the docs). Both Windsurf and Cursor also support reading from URLs (iirc Aider does too).

              I’ve had better luck with manually including the page but including the indexed docs is usually enough to fix API mistakes.

            • WD-42 2 days ago

              Begs the question again: if you need to go out of your way to find an example of correct usage of the api to paste into the prompt, why are you even bothering?

              I find copilot useful when I already know what I want and start typing it out, at a certain point the scope of the problem is narrowed sufficiently for the LLM to fill the rest in. Of course this is more in line of “glorified autocomplete” than “replacing junior devs” that a keep hearing claims of.

              • simonw 2 days ago

                "if you need to go out of your way to find an example of correct usage of the api to paste into the prompt, why are you even bothering?"

                Because it's faster.

                Here's an example: https://tools.simonwillison.net/ocr

                That's an entirely client-side web page you can use to open a PDF which then converts every page to an image (using PDF.js), then runs each image through the Tesseract.js OCR program and lets you copy out the resulting text.

                I built the first version of that in about 5 minutes while paying attention to a talk at a conference, by pasting in examples of PDF.js and Tesseract.js usage. Here's that transcript: https://gist.github.com/simonw/6a9f077bf8db616e44893a24ae1d3...

                I wrote more about that process here, including the prompts I used: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

                That's why I'm bothering: I can produce useful software in just a few minutes, while only paying partial attention to what the LLM is doing for me.

                • WD-42 2 days ago

                  That's a nice little self contained example. I have yet to see this approach work for the day job: a larger codebase with complex inter-dependencies, where the solution isn't so easily worded (make the text box pink) and where the resulting code is reviewed and tested by one's peers.

                  We actually had to make a rule at work that if you use an LLM to create a PR and can't explain the changes without using more LLMs, you can't submit the PR. I've seen it almost work: code that looks right but does a bunch of unnecessary stuff, and then it required a real person (me) to clean it up, and it ended up taking just as much time as if it had been written correctly the first time.

                • bavell 2 days ago

                  It's faster if all you're concerned with can fit in a static html file but what about for more complex projects?

                  I've struggled with getting any productivity benefits beyond single-file contexts. I've started playing with aider in an attempt to handle more complex workflows and multi-file editing but keep running into snags and end up spinning my wheels fighting my tools instead of making forward progress...

              • baq 2 days ago

                Because it still takes 5 mins for it to output the minimum viable change whereas it’d take me an hour

            • girvo 2 days ago

              Yeah thats the trick I've been using too, but by that point I get a better result by implementing it myself... of course, I've had two decades of practice and I don't have to communicate what I want lossily to myself, so it's an unfair comparison, but perhaps I've just not found the right use-case yet. I'm sure it exists, I've just not had much luck over the past couple years yet (including just this past weekend).

          • therealpygon 2 days ago

            That is far more likely to happen when it is relying on compressed knowledge of documentation and usage for an API it would have seen (comparatively) only a few times in training. That is where the various types of memory, tool calling and supplementary materials being fed in can make them significantly more situationally useful.

            The LLMs you mention are first and foremost a “general knowledge” machine rather than a domain expert. In my opinion, Junior developers are the least likely to benefit from their use because they have neither the foundational understanding to know when the approach is wrong, nor the practical experience to correct any mistakes. An LLM can replace a junior dev because we expect the mistakes and potentially poor quality, but you don’t really want a junior developer doing code reviews for another junior developer before pushing code.

            • onemoresoop 2 days ago

              The expectation for junior devs will probably change as well and they’d do a lot more shadowing while learning the product. Experience is gained in time.

    • sneedle 2 days ago

      So do I, via google.com

  • simonw 2 days ago

    LLMs are way, way faster at typing code than I am. I often dictate to them exactly what I need and it saves me a bunch of time.

    • bcrosby95 2 days ago

      So you would say typing speed matters?

      • onemoresoop 2 days ago

        Sometimes it does. If typing slows you down enough to take you out of the zone then it probably does.

        • rubslopes 2 days ago

          and, for tasks that take the LLM a minute, you can grab some coffee while it works and come back just to review. It's a great feeling.

  • alkonaut 2 days ago

    There are tons of use cases. E.g. if you know an algorithm (take any pseudocode description of a moderately complex algorithm on Wikipedia, for example) and you know the programming language, you may still be looking at an hour or two of typing just to get that pseudocode down into code using your own language, variable names, libraries.

    But this is the kind of thing a LLM excels at. It gives you 200 lines of impl right away, you have a good understanding of both what it should look like and how it should work.

    Slow and error-prone to type, but quick and easy to verify once done: that's the key use case for me.

  • guelo 2 days ago

    It’s a better, faster, personalized Stack Overflow. Just like SO you might be led down the wrong path by an answer, but if you’re a programmer and you say you don’t get value out of Stack Overflow I don’t believe you.

outside1234 2 days ago

This is a pretty basic write up. Is there anything out there that does a survey of all of the things people have found to be successful in an engineering system for a project?

(eg. Github Copilot for PR reviews, etc.)

  • jansan 2 days ago

    The only thing that I have been using AI for was to translate a localization file. And it was pretty good at that.

cyanydeez 2 days ago

Anyone got a line on local-first AI tooling? Happy to pay $100. Don't want a subscription.

  • 8thcross 2 days ago

    Cline or Roo Cline allows for local LLMs to be used. OpenRouter.ai has made both Google Gemini 2.0 Flash and Thinking free!

  • cube2222 2 days ago

    continue.dev, aider, Zed’s AI assistant are all free and will work with a local Ollama installation.

    You will of course need an insanely beefy and expensive machine to run any useful models at reasonable speeds, which would likely cover API usage costs for many, many years (an entire lifetime, likely).

    • baq 2 days ago

      Unless you want to fine tune your custom model and/or have strict on prem requirements… shit gets expensive fast, but it’s probably worth it for some

  • magic_hamster 2 days ago

    Cline also supposedly supports Ollama but it doesn't work that well with most models. There are some models dedicated to cline.

  • Havoc 2 days ago

    That’s a google away. It’ll be noticeably inferior to the hosted frontier models though

  • _boffin_ 2 days ago

    CodeGPT for idea platform

koinedad 2 days ago

Was this blog also written by LLM?

cube2222 2 days ago

Yeah this seems to be similar to my experiences.

I use Zed’s AI assistant with Sonnet, and will generally give it 10-20k tokens of sample code from elsewhere in the codebase, shared libraries, database schema, etc., and more or less have a very specific expectation of exactly the code I want to get. More often than not, it will succeed and I’ll get it faster than typing it myself.

However, it’s also pretty good at poking holes in your design, coming up with edge cases, etc. Sure, most of its observations will likely be moot somehow, but if it lists 10 points, then even if only 2 are valid and I didn’t think of, it’s already valuable to me.

I’ve also used Cline a bit, it’s nice too, though most of the time a single run of Claude works just fine, and I like Zed’s AI Assistant UX (I actually don’t use it for coding other than that).

  • dgfitz 2 days ago

    > More often than not, it will succeed and I’ll get it faster than typing it myself.

    Like, all told? The whole bit where you need to find code, paste it in, find more code, paste it in, prompt a good question and most likely iterate on it, for an answer you say you had already expected, is faster than typing it out?

    I don’t understand.

    • dwaltrip 2 days ago

      A lot of the new tools are embedded in your editor so you don’t have to copy paste in either direction.

      My experience is that it can still be tricky to get high quality results when letting the AI actually edit the code for you. A few of my attempts went rather poorly. I’m hoping tweaks to how I use the tools improve this. Or I’ll just wait until better versions are released :)

      I use Claude in the web interface quite often though. It’s very helpful for certain queries. And I can usually abort quickly when it gets lost or starts hallucinating.

      • dgfitz 2 days ago
        • dwaltrip a day ago

          Yeah that jives.

          We are trying to learn how to use these tools effectively. They are a bit alien. And the landscape / ecosystem is a confusing mix of crazy hype and real progress that is rapidly changing month by month.

          You definitely can't give them a complex task and then walk away. Maybe in the next year or so that will more be possible. We shall see.

    • jack_pp 2 days ago

      It is way faster, especially when you're learning a new framework and don't know all the patterns by heart even if you have a good idea how they work. If you have 2000+ hours using a specific framework then it's probably not much use except for boilerplate code but for me when I have less than 300-400 hours with each individual component of my stack... it's a gamechanger. I can probably output code at 80-90% the speed of a developer that has 10x more experience than me.

    • Kiro 2 days ago

      You never need to paste anything. You just press "Apply".

      Edit: saw you said "paste it in". Same thing there. You either just include the whole file or select the code and press "Include". You can also let the editor handle the inclusion itself based on your prompt. It will then try to find the relevant files to include.

    • cube2222 2 days ago

      Like I said, I’m using Zed’s AI integration, which means I can easily reference other files, and have it edit code inline. I also reuse the bulk of the prompt across usages. There’s no finding and no pasting, in general.

8thcross 2 days ago

This was my finding as well - I now use Roo Cline tho....

SunlitCat a day ago

Well, my experience with using an LLM (ChatGPT) for coding in a nutshell:

(asking ChatGPT after getting a very cut-together-looking, example-ish example):

Me: You simply read various examples from the D3D12 documentation and mixed them together without really understanding them? Admit it! :D

ChatGPT: Haha, I admit it, that was a bit of a ‘best of the DirectX 12 documentation’! But hey, I tried to build you a solid base that covers both window handling and the basics of DirectX 12.