I would also question whether you _really_ need to check for updates every 5 minutes. Once per startup is already enough, and if you're concerned about users who leave it on for days, it could easily be daily or even less often.
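A minimal Python sketch of that policy, assuming the app persists the time of its last check somewhere: always check once on startup, then at most once a day. The function name and the one-day constant are illustrative, not taken from Screen Studio.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy: check once at startup, then at most once a day
# for users who leave the app running for days.
MIN_CHECK_INTERVAL = timedelta(days=1)

def should_check_for_update(last_check: Optional[datetime], now: datetime,
                            min_interval: timedelta = MIN_CHECK_INTERVAL) -> bool:
    """Return True if an update check is due.

    `last_check` is None right after startup, so the first call always checks.
    """
    if last_check is None:
        return True
    return now - last_check >= min_interval

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(should_check_for_update(None, now))                        # True: startup check
print(should_check_for_update(now - timedelta(minutes=5), now))  # False
print(should_check_for_update(now - timedelta(days=1), now))     # True
```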
A 5 minute update check interval is usage-reporting in disguise. Way fewer people would turn off a setting labeled “check for updates” than one labeled “report usage statistics”.
Don’t give them ideas!!
Do they say that they don't do any usage reporting?
From their FAQ at the bottom of the front page:
Screen Studio can collect basic usage data to help us improve the app, but you can opt out of it during the first launch. You can also opt out at any time in the app settings.
Eh, this one is probably ignorance over malice. It's super common to see people who need to make an arbitrary interval choice go with 300 out of habit.
To be as user friendly as possible, always ask whether the user wants automatic background updates. If you can’t update without the user noticing, please implement manual updates as two mechanisms:
1) Emergency update for remote exploit fixes only
2) Regular updates
The emergency update can show a popup, but only once. It should explain the security risk, but allow the user to decline, as you should never interrupt work in progress. After a decline, leave a small, always-visible warning banner in the app until the update is approved.
The regular update should never pop up; it should only show a very mild update reminder that is NOT always visible, but sits behind a menu that is frequently used. Do not show notification badges; they frustrate people with an "inbox zero" condition.
This is the most user friendly way of suggesting manual updates.
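The two mechanisms above can be sketched as a small state machine (Python, for illustration; `UpdateKind`, `UpdateUI`, and the returned strings are hypothetical names, and actually rendering the popup, banner, or menu item is left to the app):

```python
from dataclasses import dataclass
from enum import Enum, auto

class UpdateKind(Enum):
    EMERGENCY = auto()  # remote exploit fixes only
    REGULAR = auto()

@dataclass
class UpdateUI:
    """How to surface one pending update, per the two-mechanism policy."""
    kind: UpdateKind
    popup_shown: bool = False
    approved: bool = False

    def next_prompt(self) -> str:
        """Return what the UI should show the next time it surfaces this update."""
        if self.approved:
            return "none"
        if self.kind is UpdateKind.REGULAR:
            # Never a popup or badge: just a quiet entry in a frequently used menu.
            return "menu_item"
        if not self.popup_shown:
            # Emergency: explain the security risk in a popup, exactly once.
            self.popup_shown = True
            return "popup"
        # The user declined (or dismissed) the popup: keep a small banner until approved.
        return "banner"

security_fix = UpdateUI(UpdateKind.EMERGENCY)
print(security_fix.next_prompt())  # popup
print(security_fix.next_prompt())  # banner
security_fix.approved = True
print(security_fix.next_prompt())  # none
```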
You have to understand: if a user has 30 pieces of software, they have to update something every day of the month. That is not a good overall user experience.
> You have to understand: if a user has 30 pieces of software, they have to update something every day of the month. That is not a good overall user experience.
That's not a user issue, though; it's a "packaging and distribution of updates" issue, which coincidentally has been solved on other OSes using a package manager.
In the previous 2023 discussion, the founder says that the update interval was changed to 3 hours. lol. See https://news.ycombinator.com/item?id=35873727
If the update interval had been a day or more, they probably wouldn't have noticed the bug within a month, the way they did with the 5-minute interval.
I'd also question whether the updater needs to download the update before the user says they want it. Why not check a simple endpoint for whether a newer version is available and, if so, tell the user that an update can be downloaded, then download it? This would also let the user delay the update if they are on a metered connection.
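A minimal sketch of that check-then-ask flow, in Python for illustration. The manifest format `{"version": "..."}`, the function names, and the returned action strings are all hypothetical; how the app detects a metered connection is left to the caller.

```python
import json

def parse_version(v: str) -> tuple:
    """Parse a dotted numeric version like '2.10.3' into a comparable tuple.

    Assumes both sides use the same number of components ('2.0' vs '2.0.0'
    would compare as different).
    """
    return tuple(int(part) for part in v.split("."))

def decide(local_version: str, manifest_json: str, metered: bool) -> str:
    """Decide what to offer after fetching a tiny 'latest version' manifest.

    Nothing large is downloaded here; the caller downloads only after the
    user says yes.
    """
    latest = json.loads(manifest_json)["version"]
    if parse_version(latest) <= parse_version(local_version):
        return "up_to_date"
    if metered:
        # Let the user defer the big download until they're off a metered link.
        return "offer_deferred_download"
    return "offer_download"

print(decide("2.9.0", '{"version": "2.10.0"}', metered=False))  # offer_download
```

The check itself costs a few hundred bytes, so even frequent checks stay cheap; the 250 MB transfer only happens once, after consent.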
First thing I thought as well. Every 5 minutes for a screen recording software is an absurd frequency. I doubt they release multiple new versions per day.
IIRC, every 5 minutes used to be the standard interval between email checks, back in the days of dial-up and desktop email clients.
How times have changed...
It's near-instant now, usually not because of more incessant polling, but because the client simply keeps the connection open (it can last many hours without a single byte being sent, depending also on the platform) and the server writes data onto it as needed (IMAP IDLE). If anything, this has gotten more efficient.
And because of how expensive they were in Portugal, I never did that; it was always on manual.
Depends on the application. I have my browser running for months at a time.
Yeah, but that should be a variable anyway, maybe even one provided by the server. In this case, though, it should be on demand: keep the old version cached and only download the new one when there actually is a new version, checking once a day.
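A sketch of the "variable provided by the server" idea with a client-side clamp, in Python. The bounds and default are made-up values; the point is that a missing, malformed, or absurd server value cannot push the client outside a sane range.

```python
from datetime import timedelta

# Hypothetical bounds the client enforces no matter what the server says,
# so a bad config value (or a bug) can't turn the updater into a hammer.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=7)
DEFAULT_INTERVAL = timedelta(days=1)

def check_interval(server_seconds) -> timedelta:
    """Turn a server-provided interval (seconds, possibly missing or junk) into a safe value."""
    try:
        interval = timedelta(seconds=float(server_seconds))
    except (TypeError, ValueError, OverflowError):
        # None, non-numeric strings, NaN, or infinity: fall back to the default.
        return DEFAULT_INTERVAL
    return max(MIN_INTERVAL, min(interval, MAX_INTERVAL))

print(check_interval(300))     # 1:00:00 (5 minutes is clamped up)
print(check_interval("oops"))  # 1 day, 0:00:00
```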
> Screen Studio is a screen recorder for macOS. It is desktop app. It means we need some auto-updater to allow users to install the latest app version easily.
No, it doesn't mean that.
The auto-updater introduced a series of bad outcomes:
- Downloading the update without consent, causing traffic for the client.
- Not only that, the download keeps repeating itself every 5 minutes? You did at least detect whether the user is on a metered connection, right...?
- A bug where the update popup interrupts the user's flow.
- A popup is a bad thing to do to your users in itself. I think it is OK to ask when the app is closing and let the rest be done in the background.
- Some people actually pay attention to the outgoing connections apps make, and even a simple update check every 5 minutes is excessive. Why even do it while the app is running? Do it on startup and ask on close. Again, some complexity: assume you may not be on a network, do it in the background, and don't bother retrying much.
- Additional complexity in the app, which caused all of the above. And it came with a price tag for the developer.
Wouldn't the App Store be the perfect way to handle updates in this case and offload the complexity there?
App Store updates are perfect: no unnecessary complications, no unnecessary work (assuming Screen Studio is published and properly updated in the App Store), and the worst-case scenario is notifications about a new Screen Studio version ending up in a Screen Studio recording in progress.
Come to think of it, the do-it-yourself update checking under discussion is so stupid that malice and/or other serious bugs should be assumed.
Does the App Store handle staged rollouts?
That was something I thought was missing from this writeup. Ideally you roll out the update to only a small percentage of users. You then check whether anything broke (no idea how long to wait, 1 day?). Then you increase the percentage a little (say, from 1% to 5%), wait a day again, and check. Finally you update everyone (who has updates on).
Yes, obviously something as mature as the App Store supports phased rollout. I believe it is even the default setting once you reach certain audience sizes. Updates are always spread over 7 days, slowly increasing the numbers.
Yes it does support this
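For an app that runs its own updater, a client-side version of the staged rollout described above could look like this Python sketch: each user gets a deterministic bucket from a hash, so raising the percentage from 1% to 5% only ever adds users. The function names and the schedule are hypothetical; in practice the current percentage would come from the server.

```python
import hashlib

def rollout_bucket(user_id: str, version: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given release.

    Mixing in the version reshuffles who goes first on each release, while a
    given user's bucket stays fixed for the whole rollout of one release.
    """
    digest = hashlib.sha256(f"{version}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_in_rollout(user_id: str, version: str, percent: int) -> bool:
    """True if this user should be offered `version` at the current percentage."""
    return rollout_bucket(user_id, version) < percent

# Hypothetical schedule from the comment: 1%, then 5%, then everyone, a day apart.
SCHEDULE = [1, 5, 100]
users = [f"user-{i}" for i in range(10_000)]
for percent in SCHEDULE:
    offered = sum(is_in_rollout(u, "2.0.0", percent) for u in users)
    print(f"{percent}%: {offered} users")
```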
While we're listing complaints... 250MB for a screen recorder update?
I am always kind of a stickler about code reviews. I once had a manager tell me that I should leave more to QA with an offhand comment along the lines of "what is the worst that could happen" to which I replied without missing a beat "We all lose our jobs. We are always one bad line of code away from losing our jobs"
The number of times I have caught junior or even experienced devs writing potential PII leaks is absolutely wild. It's just crazy easy in most systems to open yourself up to potential legal issues.
...And if there's no one around to review the code?
The website makes it seem like it's a one person shop.
Just amazed that ‘better testing’ isn’t one of the takeaways in the summary.
Just amazed. Yeah, suggesting ‘write code carefully’ as if that’ll fix it is a rookie mistake.
So so frustrating when developers treat user machines like their test bed!
Contrarian take: $8000 is not a lot in this context. What did the CEO think of this? Most of the time it is just a very small speed bump in the overall finances of the company.
Avoidable, unfortunate, but the cost of slowing down development by, e.g., 10% is much higher.
But I agree that senior gatekeepers should know by heart the places where review needs to be extra careful: security pitfalls, exponential backoff in error handling, and yeah, probably this.
How do you adjust your testing approach to catch cases like this? In my experience, timing related issues are hard to catch and can linger for years unnoticed.
I would mock/hook/monkey patch/whatever the functions to get the current time/elapsed time, simulate a period of time (a day/week/month/year/whatever), make sure the function to download the file is called the correct amount of times. That would probably catch this bug.
Although, after such a fuck-up, I would be tempted to make a pre-release check that tests the compiled binary rather than a unit test or whatever. Use LD_PRELOAD to hook the system timing functions (a quick google shows that libfaketime[0] exists, but I've never used it), launch the real program, and speed up time to make sure it doesn't try to download more than once.
[0] https://github.com/wolfcw/libfaketime
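A minimal Python sketch of the mock-the-clock approach described above. `FakeClock`, `Updater`, and `simulate` are hypothetical stand-ins, not Screen Studio's code (which is an Electron app); the point is that 30 days of 5-minute polling runs instantly and the download count becomes something a test can assert on.

```python
class FakeClock:
    """Manually advanced clock, so a test can 'run' for a month in milliseconds."""
    def __init__(self) -> None:
        self.now = 0.0  # seconds since the simulated app launch

    def advance(self, seconds: float) -> None:
        self.now += seconds


class Updater:
    """Hypothetical minimal updater that polls every `interval` seconds."""
    def __init__(self, clock, download, interval: float = 300.0,
                 stop_after_download: bool = True) -> None:
        self.clock = clock
        self.download = download
        self.interval = interval
        self.stop_after_download = stop_after_download
        self.next_check = clock.now
        self.done = False

    def tick(self, update_available: bool) -> None:
        """Called frequently by the app's event loop; only acts when a check is due."""
        if self.done or self.clock.now < self.next_check:
            return
        self.next_check = self.clock.now + self.interval
        if update_available:
            self.download()
            # The step the refactor dropped: stop polling once the file is downloaded.
            self.done = self.stop_after_download


def simulate(days: int, step: float = 60.0, **updater_kwargs) -> int:
    """Drive the updater for `days` of fake time and count the downloads."""
    clock = FakeClock()
    downloads = []
    updater = Updater(clock, download=lambda: downloads.append(clock.now), **updater_kwargs)
    for _ in range(int(days * 86_400 / step)):
        updater.tick(update_available=True)
        clock.advance(step)
    return len(downloads)


print(simulate(days=30))                             # 1
print(simulate(days=30, stop_after_download=False))  # 8640 (288 per day)
```

The second call reproduces the bug: with the stop missing, a month of uptime means 8,640 downloads, which a single `assert simulate(days=30) == 1` would have caught before release.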
> We decided to take responsibility and offer to cover all the costs related to this situation.
Good on them. Most companies would cap their responsibility at a refund of their own service's fees, which is understandable as you can't really predict costs incurred by those using your service, but this is going above and beyond and it's great to see.
"Luckily, it was not needed"
Why in the world does it need to check for updates every 5 minutes?
The author seemed to enjoy calculating the massive bandwidth numbers, but didn’t stop to question whether 5 minutes was a totally ridiculous interval.
‘Simple’ bugs get a bit more expensive than this…
https://en.m.wikipedia.org/wiki/Knight_Capital_Group#2012_st...
US$440M
The scale is astounding. I was briefly interested in the person that caused the error then immediately realized it was irrelevant because if a mechanism doesn't exist to catch an issue like that, then any company is living on borrowed time.
whyyy does wikipedia not redirect mobile links to the desktop website when you have a desktop UA?
Because people on desktops asking for the mobile site should be able to view the mobile site.
The url specifically asks Wikipedia to serve the mobile site.
Well when I follow a desktop link on my phone, it redirects me to the mobile version, despite the URL specifically asking to serve the desktop site, it just doesn't work the other way around. Plus I never asked to see the mobile site, I followed a link someone else posted
Why do they have a separate mobile website at all instead of writing proper CSS to make one website work on all devices?
See also: Every time a small error in a spreadsheet has caused a huge problem https://eusprig.org/research-info/horror-stories/
(2023)
Previous discussion: https://news.ycombinator.com/item?id=35858778
They had no cost usage alerts, so they didn't even know it was happening; they only realized when the first bill came.
I think that is the essence of what is wrong with cloud costs: the default is that everyone can scale rapidly, while in reality 99% have quite predictable costs month over month.
Not to mention the cost users paid to download 250 MB every 5 minutes.
It was mentioned, at the bottom. One customer even had their ISP cancel their service.
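For scale, a back-of-the-envelope figure for one machine left running, assuming the full 250 MB is re-downloaded on every 5-minute check and using decimal units:

```latex
\frac{24 \times 60\ \text{min}}{5\ \text{min}} = 288\ \text{downloads/day},\qquad
288 \times 250\ \text{MB} = 72\,000\ \text{MB} = 72\ \text{GB/day} = 2.16\ \text{TB per 30 days}
```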
It seems a bit self-centered to make their lost $8000 the focus of the article.
The title should have been: "how a single line of code cost our users probably more than $8000"
Why on earth are you checking for updates every 5 minutes to begin with?!
Seriously this alone makes me question everything about this app.
It would also be nice if the update archive wasn't 250MB. Sparkle framework supports delta updates, which can cut down the traffic considerably.
This is an electron app.
which is their design choice, not an obligation.
Electron really messed up a few things in this world
>Add special signals you can change on your server, which the app will understand, such as a forced update that will install without asking the user.
I understand the reasoning, but that makes it feel a bit too close to a C&C server for my liking. If the update server ever gets compromised, I imagine this could increase the damage done drastically.
I read that as "A single line of code costs $8000", which, judging from the comments, a few others did too. Reading the article, it's not "costs": the original title is "One line of code that did cost $8,000", so, as others have pointed out, it's a bug that cost $8000.
I have Screen Studio and I don't leave it open, but all I wish for now is that you disable the auto-updater. Provide an option to disable it and allow manual update checking. Checking for an update every 5 minutes is total overkill, and downloading the update automatically is just bad. What if I were on mobile internet with limited bandwidth and data? The last thing I want is an app downloading its own update without my consent and knowledge.
Bugs are great chances to learn.
What might be fun is figuring out all the ways this bug could have been avoided.
Another way to avoid this problem would have been using a form of “content addressable storage”. For those who are new, this is just a fancy way of saying: store/distribute the hash (e.g. SHA-256) of what you’re distributing, and store it on disk in a way that content can be effectively deduplicated by name.
It’s probably not so easy as to make it a rule, but most of the time, an update download should probably do this
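A minimal Python sketch of that idea, assuming the update server publishes the SHA-256 of each release alongside it. `fetch_update` and the injected `download` callable are hypothetical; the file is stored under its own hash, so asking again for the same release is a cache hit rather than another 250 MB transfer.

```python
import hashlib
from pathlib import Path
from typing import Callable

def fetch_update(expected_sha256: str, cache_dir: Path,
                 download: Callable[[], bytes]) -> Path:
    """Return a path to the update whose SHA-256 is `expected_sha256`, downloading only if needed."""
    target = cache_dir / expected_sha256
    if target.exists():
        return target  # already have these exact bytes; no network access
    data = download()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".part")
    tmp.write_bytes(data)
    tmp.replace(target)  # rename so a half-written file never looks complete
    return target
```

Even with the 5-minute loop left running, every call after the first would return the cached file without downloading anything.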
> out all the ways this bug could have been avoided.
The most obvious one is setting up billing alerts.
Past a certain level of complexity, you're often better off focusing on mitigation than trying to avoid every instance of a certain kind of error.
Note that billing alerts protect against unexpected network traffic, not directly against bugs and bad design in the software. Update checking remains a terrible idea.
Oh boy, I know of at least one case where a single line of code cost ~$500k…
Curious where the high-water mark is across all HNers (:
Others have reported higher already, but for data:
Our team had a bug that cost us about $120k over a week.
Another bug running on a large system had an unmeasurable cost. (Could be $K, could be $M.)
I would be surprised if half of the users on this site had _not_ created or personally seen a bug where a line cost way more than $8000.
$1.2 million, gone in about 30 minutes.
Plenty of (valid) criticism in the comments, but I appreciate the developer for publishing it.
Knowing where to put the line: $7999 (is sadly not the story)
Ever consider not using cloud for everything? Hosting this on traditional hosting would have limited the problem and the cost.
And in that case, the problem would not be discovered until 1) someone opened a bug report, which rarely happens, because any competent user would just disable auto-updates, and 2) that bug report would be investigated, which also rarely happens.
It's not like you are forbidden to monitor your services just because you didn't put them in big clown.
Would have cost $0 on Cloudflare's R2.
You need https://www.vantage.sh
Set up daily emails.
Set up cost anomaly alerts.
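The kind of rule a cost anomaly alert applies can be sketched in a few lines of Python. This is not how any particular billing tool works; `is_cost_anomaly`, its thresholds, and the cost history are illustrative assumptions.

```python
from statistics import mean

def is_cost_anomaly(previous_days: list, today: float,
                    factor: float = 3.0, floor: float = 10.0) -> bool:
    """Flag today's spend if it is well above the recent daily average.

    `factor` and `floor` (in dollars) are illustrative thresholds: the floor
    keeps a $0.40 day turning into a $1.50 day from paging anyone.
    """
    if not previous_days:
        return today > floor
    baseline = mean(previous_days)
    return today > max(floor, factor * baseline)

history = [20.0, 22.0, 19.0, 21.0]  # hypothetical daily egress cost in dollars
print(is_cost_anomaly(history, 25.0))   # False: normal fluctuation
print(is_cost_anomaly(history, 300.0))  # True: the kind of jump a polling bug causes
```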
I assume most of that 2PB of network traffic was not egress, right? Otherwise how did it "only" cost you $8k on Google Cloud? Even at $0.02 per GB, which is usually a few times lower than the actual prices I could find there, that would still result in an invoice of about $40k...
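The arithmetic behind that estimate, in decimal units:

```latex
2\ \text{PB} = 2 \times 10^{6}\ \text{GB},\qquad
2 \times 10^{6}\ \text{GB} \times \$0.02/\text{GB} = \$40\,000
```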
I'll let my employer know to update my salary or reduce my workload.
So did you pay, or did Google show you mercy by eating their potential earnings?
meanwhile the CTOs plan to apply AI into their production codebases :)
> While refactoring it, I forgot to add the code to stop the 5-minute interval after the new version file was available and downloaded.
I’m sorry, but it’s exactly cases like these that should be covered by some kind of test, especially when diving into a refactor. Admittedly it’s nice to hear people share their mistakes and horror stories, but I would get some stick for this at work.
(2023)
These articles are great, but I have to one-up the blog: I recently helped a small dev team clean up a one-line mistake that cost them $95,000... which they didn't notice for three months.
The relevance is that instead of checking for a change every 5 minutes, the delay wasn't working at all, so the check ran as fast as possible in a tight loop. This was between a server and a blob storage account, so there was no network bottleneck to slow things down either.
It turns out that if you read a few megabytes 1,000 times per second all day, every day, those fractions of a cent per request are going to add up!
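One cheap safety net against that failure mode, sketched in Python under the assumption that every poll goes through a single choke point: a sliding-window request budget that raises instead of letting a broken delay run at full speed. `RequestBudget` and its limits are hypothetical; the budget would be set well above the intended rate (a 5-minute poll is 12 requests an hour).

```python
import time
from collections import deque

class RequestBudget:
    """Refuse to make more than `max_requests` calls in any `window` seconds."""
    def __init__(self, max_requests: int, window: float, clock=time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.stamps = deque()  # timestamps of recent requests, oldest first

    def acquire(self) -> None:
        now = self.clock()
        # Drop timestamps that have fallen out of the sliding window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_requests:
            raise RuntimeError("request budget exceeded; polling loop is probably broken")
        self.stamps.append(now)

# Simulate the bug: a delay that is effectively 0, with a frozen clock.
frozen = RequestBudget(max_requests=20, window=3600.0, clock=lambda: 0.0)
try:
    while True:
        frozen.acquire()  # a real loop would read the blob here
except RuntimeError as err:
    print(err)
```

The loop stops after 20 requests instead of running at 1,000 per second for three months.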
In other news a screen recorder app is a 250MB (presumably compressed) download...
FWIW, OBS is ~150 MB, not an electron app and actually open source.
https://obsproject.com/
With public sector procurement, $8000 is a pretty standard price for a line of code.
Do you mean "a" line of code, or "each" line of code?
A dead-simple 1000-line app? $8 million from Accenture, IBM or similar