Inside Rust's std and parking_lot mutexes – who wins?

blog.cuongle.dev

115 points by signa11 5 days ago

dcrazy a minute ago

FYI, Apple platforms have had futexes since iOS 17.4 and macOS 14.4: https://developer.apple.com/documentation/os/synchronization...

kccqzy 6 hours ago

There was a giant super-long GitHub issue about improving Rust std mutexes a few years back. Prior to that issue Rust was using something much worse, pthread_mutex_t. It explained the main reason why the standard library could not just adopt parking_lot mutexes:

From https://github.com/rust-lang/rust/issues/93740

> One of the problems with replacing std's lock implementations by parking_lot is that parking_lot allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.

> After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and const constructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm.

> This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.

tialaramex 3 hours ago

> This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm.
Note that the SRW Locks are gone, except if you're on a very old Windows. So today the Rust built-in std mutex for your platform is almost certainly basically a futex though if it is on Windows it is not called a futex and from some angles is better - the same core ideas of the futex apply, we only ask the OS to do any work when we're contended, there is no OS limited resource (other than memory) and our uncontended operations are as fast as they could ever be.
SRW Locks were problematic because they're bulkier than a futex (though mostly when contended) and they have a subtle bug and for a long time it was unclear when Microsoft would get around to fixing that which isn't a huge plus sign for an important intrinsic used in all the high performance software on a $$$ commercial OS...
Mara's work (which you linked) is probably more work, and more important, but it's not actually the most recent large reworking of Rust's Mutex implementation.
wahern 6 hours ago

> Prior to that issue Rust was using something much worse, pthread_mutex_t
Presumably you're referring to this description, from the Github Issue:
> > On most platforms, these structures are currently wrappers around their pthread equivalent, such as pthread_mutex_t. These types are not movable, however, forcing us to wrap them in a Box, resulting in an allocation and indirection for our lock types. This also gets in the way of a const constructor for these types, which makes static locks more complicated than necessary.
pthread mutexes are const-constructible in a literal sense, just not in the sense Rust requires. In C you can initialize a pthread_mutex_t with the PTHREAD_MUTEX_INITIALIZER initializer list instead of pthread_mutex_init, and at least with glibc there's no subsequent allocation when using the lock. But Rust can't do in-place construction[1] (i.e. placement new in C++ parlance), which is why Rust needs to be able to "move" the mutex. Moving a mutex is otherwise non-sensical once the mutex is visible--it's the address of the mutex that the locking is built around.
The only thing you gain by not using pthread_mutex_t is a possible smaller lock--pthread_mutex_t has to contain additional members to support robust, recursive, and error checking mutexes, though altogether that's only 2 or 3 additional words because some are union'd. I guess you also gain the ability to implement locking, including condition variables, barriers, etc, however you want, though now you can't share those through FFI.
[1] At least not without unsafe and some extra work, which presumably is a non-starter for a library type where you want to keep it all transparent.
- nemetroid 3 hours ago
  
  > The effect of referring to a copy of the object when locking, unlocking, or destroying it is undefined.
  https://pubs.opengroup.org/onlinepubs/9699919799/functions/V...
  I.e., if I pthread_mutex_init(&some_addr, ...), I cannot then copy the bits from some_addr to some_other_addr and then pthread_mutex_lock(&some_other_addr). Hence not movable.
  > Moving a mutex is otherwise non-sensical once the mutex is visible
  What does "visible" mean here? In Rust, in any circumstance where a move is possible, there are no other references to that object, hence it is safe to move.
  - tialaramex 3 hours ago
    
    Well, technically if you only have a mutable borrow (it's not your object) then you can't move from it unless you replace it somehow. If you have two such borrows you can swap them, if the type implements Default you can take from one borrow and this replaces it with its default and if you've some other way to make one you can replace the one you've got a reference to with that one, but if you can't make a new one and don't have one to replace it with, then too bad, no moving the one you've got a reference to.
    
    nemetroid 3 hours ago
    
    You're right and I edited my comment.
- kccqzy 5 hours ago
  
  I’m actually thinking of the sheer size of pthread mutexes. They are giant. The issue says that they wanted something small, efficient, and const constructible. Pthread mutexes are too large for most applications doing fine-grained locking.
  - tialaramex 3 hours ago
    
    On a typical modern 64-bit Linux for example they're 40 bytes ie they are 320 bits. So yeah, unnecessarily bulky.
    On my Linux system today Rust's Mutex<Option<CompactString>> is smaller than the pthread mutex type whether it is locked and has the text "pthread_mutex_t is awful" inside it or maybe unlocked with explicitly no text (not an empty string), either would only take like 30-odd bytes, the pthread_mutex_t is 40 bytes.
    On Windows the discrepancy is even bigger, their OS native mutex type is this sprawling 80 byte monster while their Mutex<Option<CompactString> is I believe slightly smaller than on Linux even though it has the same features.
strbean 5 hours ago

Seems like the simple solution to this problem would be to have both, no?
A simple native lock in the standard library along with a nicer implementation (also in the standard library) that depends on the simple lock?
- scottlamb 2 hours ago
  
  The simplest solution is for `std::mutex` to provide a simple, efficient mutex which is a good choice for almost any program. And it does. Niche programs can pull in a crate.
  I doubt `parking_lot` would have been broadly used—maybe wouldn't even have been written—if `std` had this implementation from the start.
  What specifically in this comparison made you think that `parking_lot` is broadly needed? They had to work pretty hard to find a scenario in which `parking_lot` did much better in any performance metrics. And as I alluded to in another comment, `parking_lot::Mutex<InnerFoo>` doesn't have a size advantage over `std::mutex::Mutex<InnerFoo>` when `InnerFoo` has word alignment. That's the most common situation, I think.
  If I were to make a wishlist of features for `std::mutex` to just have, it wouldn't be anything `parking_lot` offers. It'd be stuff like the lock contention monitoring that the (C++) `absl::Mutex` has. (And at least on some platforms you can do a decent job of monitoring this with `std::mutex` by monitoring the underlying futex activity.)
- loeg 3 hours ago
  
  My takeaway is that the documentation should make more explicit recommendations depending on the situation -- i.e., people writing custom allocators should use std mutexes; most libraries and allocations that are ok with allocation should use parking_lot mutexes; embedded or libraries that don't want to depend on allocate should use std mutexes. Or maybe parking_lot is almost useless unless you're doing very fine-grained locking. Something like that.

pizlonator 7 hours ago

Author of the original WTF::ParkingLot here (what rust’s parking_lot is based on).

I’m surprised that this only compared to std on one platform (Linux).

The main benefit of parking lot is that it makes locks very small, which then encourages the use of fine grained locking. For example, in JavaScriptCore (ParkingLot’s first customer), we stuff a 2-bit lock into every object header - so if there is ever a need to do some locking for internal VM reasons on any object we can do that without increasing the size of the object

scottlamb 5 hours ago

> The main benefit of parking lot is that it makes locks very small, which then encourages the use of fine grained locking. For example, in JavaScriptCore (ParkingLot’s first customer), we stuff a 2-bit lock into every object header - so if there is ever a need to do some locking for internal VM reasons on any object we can do that without increasing the size of the object
IMHO that's a very cool feature which is essentially wasted when using it as a `Mutex<InnerBlah>` because the mutex's size will get rounded up to the alignment of `InnerBlah`. And even when not doing that, afaict `parking_lot` doesn't expose a way to use the remaining six bits in `parking_lot::RawMutex`. I think the new std mutexes made the right choice to use a different design.
> I’m surprised that this only compared to std on one platform (Linux).
Can't speak for the author, but I suspect a lot of people really only care about performance under Linux. I write software that I often develop from a Mac but almost entirely deploy on Linux. (But speaking of Macs: std::mutex doesn't yet use futexes on macOS. Might happen soon. https://github.com/rust-lang/rust/pull/122408)
- pizlonator 4 hours ago
  
  > I suspect a lot of people really only care about performance under Linux
  Yeah this is true
nextaccountic 5 hours ago

How can a parking_lot lock be less than 1 byte? does this uses unsafe?
Rust in general doesn't support bit-level objects unless you cast things to [u8] and do some shifts and masking manually (that is, like C), which of course is wildly unsafe for data structures with safety invariants
- pizlonator 4 hours ago
  
  Original post: https://webkit.org/blog/6161/locking-in-webkit/
  Post that mentions the two bit lock: https://webkit.org/blog/7122/introducing-riptide-webkits-ret...
  I don’t know the details of the Rust port but I don’t imagine the part that involves the two bits to require unsafe, other than in the ways that any locking algorithm dances with unsafety in Rust (ownership relies on locking algorithms being correct)
  - writebetterc 3 hours ago
    
    This is very similar to how Java's object monitors are implemented. In OpenJDK, the markWord uses two bits to describe the state of an Object's monitor (see markWord.hpp:55). On contention, the monitor is said to become inflated, which basically means revving up a heavier lock and knowing how to find it.
    I'm a bit disappointed though, I assumed that you had a way of only using 2 bits of an object's memory somehow, but it seems like the lock takes a full byte?
    
    zozbot234 3 hours ago
    
    The idea is that six bits in the byte are free to use as you wish. Of course you'll need to implement operations on those six bits as CAS loops (which nonetheless allow for any arbitrary RMW operation) to avoid interfering with the mutex state.
- bobbylarrybobby 23 minutes ago
  
  The lock uses two bits but still takes up a whole (atomic) byte
- Conscat 5 hours ago
  
  This article elaborates how it works.
  - scottlamb 2 hours ago
    
    Unhelpful response. This cuongle.dev article does not answer nextaccountic's question, and neither do the webkit.org articles that describe the parking lot concept but not this Rust implementation. The correct answer appears to be that it's impossible: `parking_lot::RawMutex` has private storage that owns the entire byte and does not provide any accessor for the unused six bits.
    https://docs.rs/parking_lot/0.12.5/parking_lot/struct.RawMut...
    (unless there's somewhere else in the crate that provides an accessor for this but that'd be a weird interface)
    (or you just use transmute to "know" that it's one byte and which bits within the byte it actually cares about, but really don't do that)
    (slightly more realistically, you could probably use the `parking_lot_core::park` portion of the implementation and build your own equivalent of `parking_lot::RawMutex` on top of it)
    (or you send the `parking_lot` folks a PR to extend `parking_lot::RawMutex` with interface you want; it is open source after all)
    
    loeg an hour ago
    
    The two bit lock was specifically refering to the C++ WTF::ParkingLot (and the comment mentioning it explicitly said that). nextaccountic is confused.
    
    scottlamb an hour ago
    
    No. nextaccountic's comment and the cuongle.dev article are both talking about Rust. The Rust `parking_lot` implementation only uses two bits within a byte, but it doesn't provide a way for anything else to use the remaining six.
    pizlonator's comments mention both the (C++) WTF::ParkingLot and the Rust `parking_lot`, and they don't answer nextaccountic's question about the latter.
    > nextaccountic is confused.
    nextaccountic asked how this idea could be applied to this Rust implementation. That's a perfectly reasonable question. pizlonator didn't know the anwer. That's perfectly reasonable too. Conscat suggested the article would be helpful; that was wrong.
    
    loeg an hour ago
    
    nextaccountic replied to this original comment: https://news.ycombinator.com/item?id=46035698
    Yes, nextaccountic's reply is confused about Rust vs C++ implementations. But the original mention was not talking about Rust.

adzm 6 hours ago

The original webkit blog post about parking lot mutex implementation is a great read https://webkit.org/blog/6161/locking-in-webkit/

kouteiheika 5 hours ago

> Poisoning: Panic Safety in Mutexes

This is one of the biggest design flaws in Rust's std, in my opinion.

Poisoning mutexes can have its use, but it's very rare in practice. Usually it's a huge misfeature that only introduces problems. More often than not panicking in a critical section is fine[1], but on the other hand poisoning a Mutex is a very convenient avenue for a denial-of-service attack, since a poisoned Mutex will just completely brick a given critical section.

I'm not saying such a project doesn't exist, but I don't think I've ever seen a project which does anything sensible with Mutex's `Poisoned` error besides ignoring it. It's always either an `unwrap` (and we know how well that can go [2]), or do the sensible thing and do this ridiculous song-and-dance:

    let guard = match mutex {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner()
    };

Suffice to say, it's a pain.

So in a lot of projects when I need a mutex I just add `parking_lot`, because its performance is stellar, and it doesn't have the poisoning insanity to deal with.

[1] -- obviously it depends on a case-by-case basis, but if you're using such a low level primitive you should know what you're doing

[2] -- https://blog.cloudflare.com/18-november-2025-outage/#memory-...

LegionMammal978 4 hours ago

> It's always either an `unwrap` (and we know how well that can go [2])
If a mutex has been poisoned, then something must have already panicked, likely in some other thread, so you're already in trouble at that point. It's fine to panic in a critical section if something's horribly wrong, the problem comes with blindly continuing after a panic in other threads that operate on the same data. In general, you're unlikely to know what that panic was, so you have no clue if the shared data might be incompletely modified or otherwise logically corrupted.
In general, unless I were being careful to maintain fault boundaries between threads or tasks (the archetypical example being an HTTP server handling independent requests), I'd want a panic in one thread to cascade into stopping the program as soon as possible. I wouldn't want to swallow it up and keep using the same data like nothing's wrong.
- kouteiheika 4 hours ago
  
  > If a mutex has been poisoned, then something must have already panicked, likely in some other thread, so you're already in trouble at that point.
  I find that in the majority of cases you're essentially dealing with one of two cases:
  1) Your critical sections are tiny and you know you can't panic, in which case dealing with poisoning is just useless busywork.
  2) You use a Mutex to get around Rust's "shared xor mutable" requirement. That is, you just want to temporarily grab a mutable reference and modify an object, but you don't have any particular atomicity requirements. In this case panicking is no different than if you would panic on a single thread while modifying an object through a plain old `&mut`. Here too dealing with poisoning is just useless busywork.
  > I'd want a panic in one thread to cascade into stopping the program as soon as possible.
  Sure, but you don't need mutex poisoning for this.
  - LegionMammal978 3 hours ago
    
    > 1) Your critical sections are tiny and you know you can't panic, in which case dealing with poisoning is just useless busywork.
    Many people underestimate how many things can panic in corner cases. I've found quite a few unsafe functions in various crates that were unsound due to integer-overflow panics that the author hadn't noticed. Knowing for a fact that your operation cannot panic is the exception rather than the rule, and while it's unfortunate that the std Mutex doesn't accomodate non-poisoning mutexes, I see poisoning as a reasonable default.
    (If Mutex::lock() unwrapped the error automatically, then very few people would even think about the "useless busywork" of the poison bit. For a similar example, the future types generated for async functions contain panic statements in case they are polled after completion, and no one complains about those.)
    > 2) You use a Mutex to get around Rust's "shared xor mutable" requirement. That is, you just want to temporarily grab a mutable reference and modify an object, but you don't have any particular atomicity requirements.
    Then I'd stick to a RefCell. Unless it's a static variable in a single-threaded program, in which case I usually just write some short wrapper functions if I find the manipulation too tedious.
- kprotty 4 hours ago
  
  > so you have no clue if the shared data might be incompletely modified or otherwise logically corrupted.
  One can make a panic wrapper type if they cared: It's what the stdlib Mutex currently does:
  MutexGuard checks if its panicking during drop using `std::thread::panicking()`, and if so, sets a bool on the Mutex. The next acquirer checks for that bool & knows state may be corrupted. No need to bake this into the Mutex itself.
  - LegionMammal978 3 hours ago
    
    My point is that "blindly continuing" is not a great default if you "don't care". If you continue, then you first have to be aware that a multithreaded program can and will continue after a panic in the first place (most people don't think about panics at all), and you also have to know the state of the data after every possible panic, if any. Overall, you have to be quite careful if you want to continue properly, without risking downstream bugs.
    The design with a verbose ".lock().unwrap()" and no easy opt-out is unfortunate, but conceptually, I see poisoning as a perfectly acceptable default for people who don't spend all their time musing over panics and their possible causes and effects.
sunshowers 4 hours ago

To the contrary, the projects I've been part of have had no end of issues related to being cancelled in the middle of a critical section [1]. I consider poisoning to be table stakes for a mutex.
[1] https://sunshowers.io/posts/cancelling-async-rust/#the-pain-...
- kouteiheika 4 hours ago
  
  Well, I mean, if you've made the unfortunate decision to hold a Mutex across await points...?
  This is completely banned in all of my projects. I have a 100k+ LOC project running in production, that is heavily async and with pervasive usage of threads and mutexes, and I never had a problem, precisely because I never hold a mutex across an await point. Hell, I don't even use async mutexes - I just use normal synchronous parking lot mutexes (since I find the async ones somewhat pointless). I just never hold them across await points.
  - sunshowers 2 hours ago
    
    As I said in the article, we avoid Tokio mutexes entirely for the exact reason that being cancelled in the middle of a critical section is bad. In Rust, there are two sources of cancellations in the middle of a critical section: async cancellations and panics. Ergo, panicking in the middle of a critical section is also bad, and mutexes ought to detect that and mark their internal state as corrupted as a result.
    
    kouteiheika 2 hours ago
    
    > Ergo, panicking in the middle of a critical section is also bad, and mutexes ought to detect that and mark their internal state as corrupted as a result.
    I fundamentally disagree with this. Panicking in the middle of an operation that is supposed to be atomic is bad. If it's not supposed to be atomic then it's totally fine, just as panicking when you hold a plain old `&mut` is fine. Not every use of a `Mutex` is protecting an atomic operation that depends on not being cancelled for its correctness, and even for those situations where you do it's a better idea to prove that a panic cannot happen (if possible) or gracefully handle the panic.
    I really don't see a point of mutex poisoning in most cases. You can either safely panic while you're holding a mutex (because your code doesn't care about atomicity), or you simply write your code in such a way that it's still correct even if you panic (e.g. if you temporarily `.take()` something in your critical section then you write a wrapper which restores it on `Drop` in case of a panic). The only thing poisoning achieves is to accidentally give you denial-of-service CVEs, and is actively harmful when it comes to producing reliable software.
    
    conradludgate 9 minutes ago
    
    You might not think you need atomicity, but some function you call that takes in a `&mut T` might actually expect it
JoshTriplett 4 hours ago

We're currently working on separating poison from mutexes, such that the default mutexes won't have poisoning (no more `.lock().unwrap()`), and if you want poisoning you can use something like `Mutex<Poison<T>>`.
- kouteiheika 3 hours ago
  
  Yeah, I'm looking forward to it!
  While we're at it, another thing that'd be nice to get rid of is `AssertUnwindSafe`, which I find even more pointless.
- sunshowers 2 hours ago
  
  I'm very disappointed at this. The path of least resistance ought to be the right thing to do.
thayne 5 hours ago

There are cases where it is useful.
I had a case where if the mutex was poisened it was possible to reset the lock to a safe state (by writing a new value to the locked content).
Or you may want to drop some resource or restart some operation instead of panicing if it is poisoned.
But I agree that the default behavior should be that the user doesn't have to worry about it.

ballpug an hour ago

For Cargo.toml, an error: invalid basic string, expected `"` for 10:11 std/parking_lot_mutexes.

Sourcing VS, documentation indicates Python, C/C++, GitHubCopilot, and an Extension Pack for Java in top extensions.

[1]: https://code.visualstudio.com/docs

kccqzy 4 hours ago

I will personally recommend that unless you are writing performance sensitive code*, don’t use mutexes at all because they are too low-level an abstraction. Use MPSC queues for example, or something like RCU. I find these abstractions much more developer friendly.

*: You may be, since you are using Rust.

fpoling 2 hours ago

I have found out that mutex solutions are more maintainable and amendable without big redesigns compared with channels or RCU.
Consider a simple case of single producer-single consumer. While one can use bounded channels to implement back-pressure, in practice when one wants to either drop messages or apply back-pressure based on message priority any solution involving channels will lead to pile of complex multi-channel solutions and select. With mutex the change will be a straightforward replace of a queue by a priority queue and an extra if inside the mutex.
jltsiren 3 hours ago

A mutex is a natural abstraction when there is exactly one of them. You have a bunch of tasks doing their own stuff, with shared mutable state behind the mutex. When you start thinking about using two mutexes, other abstractions often become more convenient.

Shelby-Thomas 5 hours ago

[dead]

Barry-Perkins 3 hours ago

[dead]

mgaunard 3 hours ago

tl;dr: the implementation that is designed for fairness has lower standard deviation under contention, but otherwise performs slightly worse.

Nothing too surprising.