LLMs are already super useful.
It does all my coding and scripting for me @home
It does most of the coding and scripting at the workplace
It creates 'fairly good' checklists for work (not perfect, but it turns a 4-hour effort into 25 mins; the "Pro" is still needed to make this or that checklist usable, which I still call a win: you need both the tech AND the human)
If/when you train an 'in-house' LLM it can deliver some easy wins (at mega-big companies with 100k staff, people can get quick answers on "which policy covers XYZ", "which department can I talk to about ABC", etc.)
We won't have "AGI"/Skynet anytime soon, and when one does exist, the company (let's use OpenAI for example) will split in two. Half will give LLMs to the masses at $100 per month; the "Skynet" will go to the DOD and we will never hear about it again, except in the Joe Rogan podcast as a rumor.
It is a great 'idea generator' (search engine and results aggregator): give me a list of 10 things I can do _that_ weekend in _city_I_will_be_traveling_to so if/when I go to (e.g. London): here are the cool concerts, theatrical performances, parks, blah blah blah
> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age. Software agents break down once the codebase becomes complex enough, game-playing agents get stuck in loops out of which they break out only by accident, etc.
This has been my observation. I got into GitHub Copilot as early as it launched, back when GPT-3 was the model. By that time (late 2021) Copilot could already write tests for my Rust functions, and simple documentation. This was revolutionary. We haven't had another similar moment since.
The GitHub Copilot vim plugin is always on. As you keep typing, it keeps suggesting the rest of the context in faded text. Because it is always on, I can kind of read into the AI's "mind". The more I coded, the more I realized it's just search with structured results. The results got better with 3.5/4, but after that only slightly, and sometimes not even that (i.e. 4o or o1).
I don't care what anyone says; just yesterday I made a comment that truth has essentially died: https://news.ycombinator.com/item?id=43308513 If you have a revolutionary intelligence product, why is it not working for me?
The last line has been my experience as well. I only trust what I've verified firsthand now because the Internet is just so rife with people trying to influence your thoughts in a way that benefits them, over a good faith sharing of the truth.
I just recently heard this quote from a clip of Jeff Bezos: "When the data and the anecdotes disagree, the anecdotes are usually right.", and I was like... wow. That quote is the zeitgeist.
If it's so revolutionary, it should be immediately obvious to me. I knew Uber, Netflix, Spotify were revolutionary the first time I used them. With LLMs for coding, it's like I'm groping in the dark trying to find what others are seeing, and it's just not there.
> I knew Uber, Netflix, Spotify were revolutionary the first time I used them.
Maybe re-tune your revolution sensor. None of those are revolutionary companies. Profitable and well executed, sure, but those turn up all the time.
Uber's entire business model was running over the legal system so quickly that taxi licenses didn't have time to catch up. Other than that it was a pretty obvious idea. It is a taxi service. The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
Netflix was anticipated online by YouTube, and is probably inferior to it, except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs. And torrenting had been a thing for a long time already, showing how to do online distribution of video content.
They were revolutionary as product genres, not necessarily as individual companies. Ordering a cab without making a phone call was revolutionary. Netflix, at least with its initial promise of having all the world's movies and TV, was revolutionary, though it didn't live up to that. Spotify was revolutionary because of how cheap and easy it made access to all the music; this was the era when people were paying 99c per song on iTunes.
I've tried some AI code completion tools and none of them hit me that way. My first reaction was "nobody is actually going to use this stuff" and that opinion hasn't really changed.
And if you think those 3 companies weren't revolutionary then AI code completion is even less than that.
> Ordering a cab without making a phone call was revolutionary.
With the power of AI, soon you'll be able to say "Hey Siri, get me an Uber to the airport". As easy as making a phone call.
And end up at an airport in an entirely different city.
> None of those are revolutionary companies.
Not only were Uber/Grab (and delivery apps generally) revolutionary, they are still revolutionary. I could live without LLMs and my life would be only slightly impacted when coding. If delivery apps were not available, my life would be severely degraded. The other day I was sick. I got medicine and dinner with Grab, delivered to the condo lobby, which is as far as I could get. That is revolutionary.
While I don't disagree with that observation, it falls into the "well, duh!" category for me. The models are built with no mechanism for long-term memory and thus suck at tasks that require long-term memory. There is nothing surprising here. There was never any expectation that LLMs would magically develop long-term memory, as that's impossible given the architecture. They predict the next word, and once the old text moves out of the context window, it's gone. The models neither learn as they work nor can they remember the past.
It's not even like humans are all that different here. Strip a human of their tools (pen&paper, keyboard, monitor, etc.) and have them try solving problems with nothing but the power of their brain and they'll struggle a hell of a lot too, since our memory ain't exactly perfect either. We don't have perfect recall, we look things up when we need to, a large part of our "memory" is out there in the world around us, not in our head.
The open question is how to move forward. But calling AI progress a dead end before we have even started exploring long-term memory, tool use and on-the-fly learning is a tad premature. It's like calling it quits on the development of the car before you put the wheels on.
You’re not using the best tools.
Claude Code, Cline, Cursor… all of them with Claude 3.7.
Nope. I try the latest models as they come and I have a self-made custom setup (as in a custom lua plugin) in Neovim. What I am not, is selling AI or AI-driven solutions.
Similar experience. I try so hard to make AI useful, and there are some decent spots here and there. Overall, though, I see the fundamental problem being that people need information. Language isn't strictly information, and the LLMs are very good at language, but they aren't great at information. I think anything more than the novelty of "talking" to the AI is very overhyped.
There is some usefulness to be had for sure, but I don't know if the usefulness is there with the non-subsidized models.
Perhaps we could help if you shared some real examples of what walls you’re hitting. But it sounds like you’ve already made up your mind.
Do you mean that you have successfully managed to get the same experience in cursor but in neovim? I have been looking for something like that to move back to my neovim setup instead of using cursor. Any hints would be greatly appreciated!
Yeah, I'd buy it. I've been using Claude pretty intensively as a coding assistant for the last couple months, and the limitations are obvious. When the path of least resistance happens to be a good solution, Claude excels. When the best solution is off the beaten track, Claude struggles. When all the good solutions lie off the beaten track, Claude falls flat on its face.
Talking with Claude about design feels like talking with that one coworker who's familiar with every trendy library and framework. Claude knows the general sentiment around each library and has gone through the quickstart, but when you start asking detailed technical questions, Claude just nods along. I wouldn't bet money on it, but my gut feeling is that LLMs aren't going to be a straight or even curved shot to AGI. We're going to see plenty more development in LLMs, but it'll just be that: better LLMs that remain LLMs. There will be areas where progress is fast and we'll be able to get very high intelligence in certain situations, but there will also be many areas where progress is slow, and the slow areas will cripple the ability of LLMs to reach AGI. I think there's something fundamentally missing, and finding what that "something" is is going to take us decades.
Yes, but on the other hand, I don't understand why people think you can train something on pattern matching and it magically becomes intelligent.
We don't know what exactly makes us humans as intelligent as we are. And while I don't think that LLMs will be generally intelligent without some other advancements, I don't get the confident statements that "clearly pattern matching can't lead to intelligence" when we don't really know what leads to intelligence to begin with.
I am not so sure about that. Using Claude yesterday, it gave me a correct function that returned an array, but the algorithm it used did not produce the items sorted in one pass, so it ran a separate sort at the end. The fascinating thing is that it realized that, commented on it, and went on to return a single-pass function.
That seems like a pretty human thought process, and it suggests that fundamental improvements might not depend as much on the quality of the LLM itself as on the cognitive structure it is embedded in.
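For concreteness, here is a minimal sketch of the two shapes of solution being described. It is my own illustration (the actual function Claude produced isn't shown), with an arbitrary "keep the even numbers, sorted" task standing in for whatever the real code did:

```python
import bisect

def evens_two_pass(xs):
    # First attempt's shape: collect matching items, then run a separate sort at the end.
    out = [x for x in xs if x % 2 == 0]
    out.sort()
    return out

def evens_one_pass(xs):
    # Revised shape: keep the result sorted as items are inserted, so no trailing sort is needed.
    out = []
    for x in xs:
        if x % 2 == 0:
            bisect.insort(out, x)
    return out

assert evens_two_pass([5, 2, 8, 3, 4]) == evens_one_pass([5, 2, 8, 3, 4]) == [2, 4, 8]
```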
I've been writing code that implements tournament algorithms for games. You'd think an LLM would excel at this because it can explain the algorithms to me. I've been using Cline on lots of other tasks with varying success. But it just totally failed with this one: it kept writing edge cases instead of a generic implementation. It couldn't write coherent enough tests across a whole tournament.
So I wrote tests thinking it could implement the code from the tests, and it couldn't do that either. At one point it went so far with the edge cases that it just imported the test runner into the code so it could check the test name to output the expected result. It's like working with a VW engineer.
Edit: I ended up writing the code and it wasn't that hard, I don't know why it struggled with this one task so badly. I wasted far more time trying to make the LLM work than just doing it myself.
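The commenter's actual code isn't shown, but as a rough sketch of the kind of "generic implementation" being asked for (one pairing rule that works for any player count, rather than hardcoded edge cases), a standard circle-method round-robin scheduler might look like this:

```python
def round_robin_schedule(players):
    """Circle-method round robin: every player meets every other player exactly once,
    with no per-size special cases; odd player counts get a bye via a dummy slot."""
    ps = list(players)
    if len(ps) % 2 == 1:
        ps.append(None)  # dummy opponent; being paired against None means a bye
    n = len(ps)
    rounds = []
    for _ in range(n - 1):
        pairs = [(ps[i], ps[n - 1 - i]) for i in range(n // 2)
                 if ps[i] is not None and ps[n - 1 - i] is not None]
        rounds.append(pairs)
        ps = [ps[0]] + [ps[-1]] + ps[1:-1]  # keep the first player fixed, rotate the rest
    return rounds

# 4 players -> 3 rounds, each with 2 matches
assert len(round_robin_schedule(["a", "b", "c", "d"])) == 3
```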
A tip: ask Claude to put a critical hat on. I find the output afterwards to be improved.
Do you have an example?
> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
We're already seeing this with tech doing RIFs and not backfilling domestically for developer roles (the whole "we're not hiring devs in 202X" schtick), though the not-so-quiet secret is that a lot of those roles just got sent overseas to save on labor costs. The word from my developer friends is that they are sick and tired of having to force an (often junior or outsourced) colleague to explain their PR or code, only to be told "it works" and for management to overrule their concerns; this is embedding AI slopcode into products, which I'm sure won't have any lasting consequences.
My bet is that software devs who've been keeping up with their skills will have another year or two of tough times, then back into a cushy Aeron chair with a sparkling new laptop to do what they do best: write readable, functional, maintainable code, albeit in more targeted ways since - and I hate to be that dinosaur - LLMs produce passable code, provided a competent human is there to smooth out its rougher edges and rewrite it to suit the codebase and style guidelines (if any).
One could argue that's not strictly "AI labor", just cheap (but real) labor using shortcuts because they're not paid enough to give a damn.
Oh, no, you’re 100% right. One of these days I will pen my essay on the realities of outsourced labor.
Spoiler alert: they are giving just barely enough to not get prematurely fired, because they know if you’re cheap enough to outsource in the first place, you’ll give the contract to whoever is cheapest at renewal anyway.
I think the author provides an interesting perspective on the AI hype; however, I think he is really downplaying the effectiveness of what you can do with the current models we have.
If you've been using LLMs effectively to build agents or AI-driven workflows, you understand the true power of what these models can do. So in some ways the author is being a little selective, with some confirmation bias.
I promise you that if you do your due diligence in exploring the horizon of what LLMs can do, you will understand what I'm saying. If y'all want a more detailed post, I can get into the AI systems I have been building. Don't sleep on AI.
The thing I can't wrap my head around is that I work on extremely complex AI agents every day and I know how far they are from actually replacing anyone. But then I step away from my work and I'm constantly bombarded with “agents will replace us”.
I wasted a few days trying to incorporate aider and other tools into my workflow. I had a simple screen I was working on for configuring an AI Agent. I gave screenshots of the expected output. Gave a detailed description of how it should work. Hours later I was trying to tweak the code it came up with. I scrapped everything and did it all myself in an hour.
I just don't know what to believe.
There are some fields, though, where they can replace humans in significant capacity. Software development is probably one of the least likely for anything more than entry level, but a LOT of engineering faces a very, very real existential threat. Think about designing buildings. You basically just need to know a lot of rules/tables and how things interact to know what's possible and the best practices. A purpose-built AI could develop many systems and back-test them to complete the design. A lot of this is already handled or aided by software, but a main role of the engineer is to interface with the non-technical persons or other engineers. This is something where an agent could truly interface with the non-engineer to figure out what they want, then develop it and interact with the design software quite autonomously.
I think there is a lot of focus on AI agents in software development, though, because that's just an early-adopter market, just like how it's always been possible to find a lot of information on web development on the web!
Most engineering fields are de jure professional, which means they can and probably will enforce limitations on the use of GenAI or its successor tech before giving up that kind of job security. Same goes for the legal profession.
Software development does not have that kind of protection.
> just
In my experience this word means you don't know whatever you're speaking about. "Just" almost always hides a ton of unknown unknowns. After being burned enough times, nowadays when I'm about to use it I try to stop and start asking more questions.
>a main role of the engineer is to interface with the non-technical persons or other engineers
The main role of the engineer is being responsible for the building not collapsing.
I keep coming back to this point. Lots of jobs are fundamentally about taking responsibility. Even if AI were to replace most of the work involved, only a human can meaningfully take responsibility for the outcome.
ChatGPT will probably take more responsibility than Boeing for their airplane software.
You’re biased because if you’re here, you’re likely an A-tier player used to working with other A-tier players.
But the vast majority of the world is not A players. They’re B and C players
I don’t think the people evaluating AI tools have ever worked in wholly mediocre organizations - or even know how many mediocre organizations exist
I promise the amount of time, experiments and novel approaches you've tested are .0001% of what others have running in stealth projects. I've spent an average of 10 hours per day constantly since 2022 working on LLMs, and I know that even what I've built pales in comparison to other labs. (And I'm well beyond agents at this point.) Agentic AI is what's popular in the mainstream, but it's going to be trounced by at least 2 new paradigms this year.
Say more.
So what is your prediction?
Author also made a highly upvoted and controversial comment about o3 in the same vein that's worth reading: https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3?comment...
Of course LessWrong, being heavily AI-doomer, may be slightly biased against near-term AGI just from motivated reasoning.
Gotta love this part of the post no one has yet addressed:
> At some unknown point – probably in 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that
I would expect similar doom predictions in the era of nuclear weapon invention, but we've survived so far. Why do people assume AGI will be orders of magnitude more dangerous than what we already have?
Nuclear weapons are not self-improving or self-replicating.
More ability to kill everyone. That's harder to do with nukes.
That said, the actual forecast odds on Metaculus are pretty similar for nuclear and AI catastrophes: https://possibleworldstree.com/
Most people are just ignorant and dumb; don't listen to it.
> LLMs are not good in some domains and bad in others. Rather, they are incredibly good at some specific tasks and bad at other tasks. Even if both tasks are in the same domain, even if tasks A and B are very similar, even if any human that can do A will be able to do B.
I think this is true of AI/ML systems in general. We tend to anthropomorphise their capability curves to match the cumulative nature of human capabilities, where oftentimes the capability curve of the machine is discontinuous and has surprising gaps.
This poetic statement by the author sums it up for me:
”People are extending LLMs a hand, hoping to pull them up to our level. But there's nothing reaching back.”
When you (attempt to) save a person from drowning, there is a ridiculously high chance of them drowning you.
Haha.
Shame on you for making me laugh. That was very inappropriate.
LLMs make it very easy to cheat, both academically and professionally. What this looks like in the workplace is a junior engineer not understanding their task or how to do it, but stuffing everything into the LLM until lint passes. This breaks the trust model: there are many requirements that are a little hard to verify that an LLM might miss, and the junior engineer can now represent to you that they "did what you asked" without really certifying the work output. I believe that this kind of professional cheating is just as widespread as academic cheating, which is an epidemic.
What we really need is people who can certify that a task was done correctly, who can use LLMs as an aid. LLMs simply cannot be responsible for complex requirements. There is no way to hold them accountable.
I see no reason to believe the extraordinary progress we've seen recently will stop or even slow down. Personally, I've benefited so much from AI that it feels almost alien to hear people downplaying it. Given the excitement in the field and the sheer number of talented individuals actively pushing it forward, I'm quite optimistic that progress will continue, if not accelerate.
I hear you. I feel constantly bewildered by comments like "LLMs haven't really changed since GPT-3.5." I mean, really? It went from an exciting novelty to a core pillar of my daily work; it's allowed me and my entire (granted, quite senior) org to be incredibly more productive and creative with our solutions.
And then I stumble across a comment where some LLM hallucinated a library, which means clearly AI is useless.
The impression I get from using all cutting edge AI tools:
1. Sonnet 3.7 is a mid-level web developer at least
2. DeepResearch is about as good an analyst as an MBA from a school ranked 50+ nationally. Not lower than that. EY, not McKinsey
3. Grok 3/GPT-4.5 are good enough as $0.05/word article writers
It's not replacing the A-players, but it's good enough to replace B players and definitely better than C and D players.
A midlevel web developer should do a whole lot more than just respond to chat messages and do exactly what they are told to do and no more.
When I use LLMs, that is what they do: spawn commands, edit files, run tests, evaluate outputs, and iterate on solutions under my guidance.
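As a minimal sketch of that loop shape (the commenter doesn't name their actual tooling, so `llm` and `apply_edits` are hypothetical callables here and pytest is assumed as the test runner):

```python
import subprocess

def run_tests():
    # Run the project's test suite and capture output for the model to read.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task, llm, apply_edits, max_iters=5):
    """Ask the model for edits, apply them, run the tests, and feed failures
    back to the model until the tests pass or the iteration budget runs out."""
    history = [f"Task: {task}"]
    for _ in range(max_iters):
        edits = llm("\n".join(history))  # model proposes file edits
        apply_edits(edits)               # write them into the working tree
        ok, output = run_tests()
        if ok:
            return True                  # done; the human reviews the diff
        history.append(f"Tests failed:\n{output}\nPlease fix the code.")
    return False                         # hand back to the human for guidance
```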
I'd expect a mid-level developer to show more understanding and better reasoning. So far it looks like a junior dev who has read a lot of books and is good at copy-pasting from Stack Overflow.
(Based on my everyday experience with Sonnet and Cursor)
> It seems to me that "vibe checks" for how smart a model feels are easily gameable by making it have a better personality.
I don't buy that at all; most of my use cases don't involve the model's personality. If anything, I usually instruct it to skip any commentary and give only the expected result. I'm sure most people using AI models seriously would agree.
> My guess is that it's most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e. g. OpenAI's corporate drones.
I would actually guess it's mostly because it was good at code, which doesn't involve much personality.
Let's imagine that we all had a trillion dollars. Then we would all sit around and go "well dang, we have everything, what should we do?". I think you'll find that just about everyone would agree, "we oughta see how far that LLM thing can go". We could be in nuclear fallout shelters for decades, and I think you'll still see us trying to push the LLM thing underground, through duress. We dream of this, so the bear case is wrong in spirit. There's no bear case when the spirit of the thing is that strong.
Wdym all of us? I certainly would find much better usages for the money.
What about reforming democracy? Use the corrupt system to buy the votes, then abolish all laws allowing the kinds of donations that make vote-buying possible.
I'll litigate the hell out of all the oligarchs now that they can't outpay justice.
This would pay off more than a moon shot. I would give a bit of money for the moon shot, why not, but not all of it.
"So, after Rome's all yours you just give it back to the people? Tell me why."
This seems to be ignoring the major force driving AI right now - hardware improvements. We've barely seen a new hardware generation since ChatGPT was released to the market, we'd certainly expect it to plateau fairly quickly on fixed hardware. My personal experience of AI models is going to be a series of step changes every time the VRAM on my graphics card doubles. Big companies are probably going to see something similar each time a new more powerful product hits the data centre. The algorithms here aren't all that impressive compared to the creeping FLOPS/$ metric.
Bear cases always welcome. This wouldn't be the first time in computing history that progress just falls off the exponential curve suddenly. Although I would bet money on there being a few years left before AGI is achieved.
> Although I would bet money on there being a few years left before AGI is achieved.
Yeah? I'll take you up on that offer. $100AUD AGI won't happen this decade.
Hardware improvements don't strike me as the horse to bet on.
LLM progression seems to be linear and the compute needed exponential. And I don't see exponential hardware improvements coming, besides some new technology (which we should not bet on arriving anytime soon).
>At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
(IMO) Apart from programmer assistance (which is already happening), AI agents will find the most use in secretarial, ghostwriting and customer support roles, which generally have a large labor surplus and won't immediately "crash and burn" companies even if there are failures. Perhaps if it's a new startup or a small, unstable business on shaky grounds this could become a "last straw" kind of a factor, but for traditional corporations with good leeway I don't think just a few mistakes about AI deployment can do too much harm. The potential benefits, on the other hand, far outmatch the risk taken.
I see engineering (not software, but the other technical areas) as facing the biggest threat: high-paid, knowledge-based fields that don't rely on interpersonal communication. Secretarial and customer support less so; they aren't terribly high-paid, and anything that relies on interacting with people is going to meet a lot of pushback. US-based call centers are already a big selling point for a lot of companies, chat bots have been around for years in customer support, people hate them, and there's a long way to go to change that perception.
The typical AI economic discussion always focuses on job loss, but that's only half the story. We won't just have corporations firing everyone while AI does all the work - who would buy their products then?
The disruption goes both ways. When AI slashes production costs by 10-100x, what's the value proposition of traditional capital? If you don't need to organize large teams or manage complex operations, the advantage of "being a capitalist" diminishes rapidly.
I'm betting on the rise of independents and small teams. The idea that your local doctor or carpenter needs VC funding or an IPO was always ridiculous. Large corps primarily exist to organize labor and reduce transaction costs.
The interesting question: when both executives and frontline workers have access to the same AI tools, who wins? The manager with an MBA or the person with practical skills and domain expertise? My money's on the latter.
Idk where you live, but in my world "being a capitalist" requires you to own capital. And you know what, AI makes it even better to own capital. Now you have these fancy machines doing stuff for you and you don't even need any annoying workers.
By "capitalist," I'm referring to investors whose primary contribution is capital, not making a political statement about capitalism itself.
Capital is crucial when tools and infrastructure are expensive. Consider publishing: pre-internet, starting a newspaper required massive investment in printing presses, materials, staff, and distribution networks. The web reduced these costs dramatically, allowing established media to cut expenses and focus on content creation. However, this also opened the door for bloggers and digital news startups to compete effectively without the traditional capital requirements. Many legacy media companies are losing this battle.
Unless AI systems remain prohibitively expensive (which seems unlikely given current trends), large corporations will face a similar disruption. When the tools of production become accessible to individuals and small teams, the traditional advantage of having deep pockets diminishes significantly.
Regarding "AGI", is there any evidence of true synthetic a priori knowledge from an LLM?
Produce true synthetic a priori knowledge of your own, and I'll show you an automated LLM workflow that can arrive at the same outcome without hints.
Hmm, I didn’t read the article, but from the gist of other comments, we seem to have bought into Sama’s “agents so good, you don’t need developers/engineers/support/secretaries/whatever anymore”. The issue is, it is almost the same as claiming “pocket calculators are so good, we don’t need accountants anymore”, or even “computers are so good, we don’t need accountants anymore”. AI claims to be that motor-car moment when the horse cart got replaced. But the horse cart got replaced with a taxi (and taxi drivers also have unions protecting them!). With AI, all these “to be replaced” people are like accountants: more productive. Just as many new devs are productive with higher-level languages compared to assembly. Despite cars replacing the horse carts of the long past, we still fail to have self-driving cars, and someone still needs to learn to drive that massive hunk of metal, just as whoever plans to deploy LLMs to lay off devs must learn to drive those LLMs and know what they are doing.
I believe it is high time we come out of this madness and reveal the lies of the marketers and grifters of AI for what they are. If AI can replace anyone, it should begin with doctors: they work with rote knowledge and provide service based on explicit (though ambiguous) inputs, the same as an LLM needs. But I still have doctors, and I wait for hours on end in the waiting room to get prescribed a cough hard candy, only to come back later because it was actually covid and my doctor had a brain fart.
> It blows Google out of the water at being Google
That is enough for me.
I sincerely wonder how long that will be true. Google was amazing in 1999: it didn't have more than small, easily ignorable ads, and it wasn't really tracking you the way it does today. It was just an all-around better experience than Google delivers today.
I'm not sure that it's a technology difference that makes LLMs a better experience than search today; it's that the VCs are still willing to subsidize user experience, and won't start looking for a return on their investment for a few more years. Give OpenAI 10 years to pull all the levers to pay back the VC investment and what will it be like?
They will sell "training data slots". So that when I'm looking for a butter cookie recipe, ChatGPT says I'll have to use 100g of "Brand (TM) Butter" instead of just "Butter".
Ask it how to deploy an app to the cloud and it will insist you need to deploy it to Azure.
These ads would be easily visible though. You can probably sell far more malicious things.
LLMs seem less hyped than blockchains were back in the day.
Agreed, and unlike blockchain, people actually use this product.
Some people use blockchain to buy drugs...
My predictions on the matter:
Yeah agree 100%. LLMs are overrated. I describe them as the “Jack of all, master of none” of AI. LLMs are that jackass guy we all know who has to chime in to every topic like he knows everything, but in reality he’s a fraud with low self-esteem.
I’ve known a guy since college who now has a PhD in something niche, supposedly pulls a $200k/yr salary. One of our first conversations (in college, circa 2014) was how he had this clever and easy way to mint money- by selling Minecraft servers installed on Raspberry Pis. Some of you will recognize how asinine this idea was and is. For everyone else- back then, Minecraft only ran on x86 CPUs (and I doubt a Pi would make a good Minecraft server today, even if it were economical). He had no idea what he was talking about, he was just spewing shit like he was God’s gift. Actually, the problem wasn’t that he had no idea- it was that he knew a tiny bit- enough to sound smart to an idiot (remind you of anyone?).
That’s an LLM. A jackass with access to Google.
I’ve had great success with SLMs (small language models), and what’s more I don’t need a rack of NVIDIA L40 GPUs to train and use them.
I think all these articles beg the question: what are the author's credentials to claim these things?
Be careful about consuming information from chatters, not doers. There is only knowledge from doing, not from pondering.
I'm generally more skeptical when reading takes and predictions from people working at AI companies, who have a financial interest in making sure the hype train continues.
To make an analogy - most people who will tell you not to invest in cryptocurrency are not blockchain engineers. But does that make their opinion invalid?
Of course I trust people working on L2 chains to tell me how to scale Bitcoin, and people working on cryptography to walk me through the ETH PoS algorithms.
You cannot lead to truth by learning from people who don't know. People who know can be biased, sure, so the best way to learn is to learn the knowledge, not the "hot-takes" or "predictions".
The crypto people have no coherent story about why crypto is fundamentally earth-shaking beyond gambling or regulatory avoidance, whereas the story for AI, if you believe it, is a second industrial revolution and labor automation, and to at least some small extent that is undeniable.
> Be careful about consuming information from chatters, not doers
The doers produce a new javascript framework every week, claiming it finally solves all the pains of previous frameworks, whereas the chatters pinpoint all the deficiencies and pain points.
One group has an immensely better track record than the other.
I would listen to people who used the previous frameworks about the deficiencies and pain points, not to people who just casually browse the documentation and offer high-flying ideas about why these have deficiencies and pain points.
One group has an immensely more convincing power to me.
LW isn't a place that cares about credentialism.
He has tons of links for the objective statements. You either accept the interpretation or you don't.