Pretty State Machine Patterns in Rust (2016)

135 points by PaulHoule 6 months ago

phibz 6 months ago

I'm a bit surprised that they don't use the name "type state". Perhaps it wasn't in wide use when this post was originally written?

The important ideas here are that each state is moved in to the method that transitions to the next state. This way you're "giving away" your ownership of the data. This is great for preventing errors. You cannot retain access to stale state.

And by using the From trait to implement transitions, you ensure that improper transitions are impossible to represent.

It's a great pattern and has only grown in use since this was written.

jelder 6 months ago

Just a Nine Inch Nails fan who really wanted the title to be a pun on “Pretty Hate Machine.”
wging 6 months ago

I think I first learned of that term from this 2019 article: https://cliffle.com/blog/rust-typestate/ I can't be the only one...
- zokier 6 months ago
  
  Typestates were also notable feature in early Rust, albeit in a very different form. I do recall them mentioned often in presentations/talks/etc at the time.
  Tbh it would make interesting blog post to compare modern typestate patterns to the historical built-in typestate mechanism.
  https://github.com/rust-lang/rust/issues/2178

gsliepen 6 months ago

I don't understand why you would code these explicit state machines when you can just write normal code that is much more readable. The state machine example they start with could be written as:

  while (true) {
     wait();
     fill();
     finish();
  }

I don't think the approach from the article would have any benefits like less bugs or higher performance.

MrJohz 6 months ago

For a very simple example like this, your version will probably be okay, but it has its own set of problems:
* It's difficult to introspect the current state of the system. If I were to build an API that fetches the current state, I'd need to do something like add an extra state that the different functions could then update, which makes all the code more messy. By turning this into an explicit state machine, I can encode the states directly in the system and introspect them.
* Similarly it's often useful to be able to listen to state transitions in order to update other systems. I could include that as part of the functions themselves, but again, if I just encode this operation as an explicit state machine, the transition points fall out very nicely from that.
* Here there is no branching, but state machines can often have complicated branching logic as to which state should be called next. It's possible to write this directly as code, but in my experience, it often gets complicated more quickly than you'd think. This is just a simple example, but in practice a bottle filler probably has extra states to track whether the machine has been turned off, in which case if it's in the `fill` state it will switch to the `finish` or `wait` state to ensure the internal machinery gets reset before losing power. Adding this extra logic to a state machine is usually easier than adding it to imperative code.
* In your version, the different functions need to set up the state ready to be used by the next function (or need to rely on the expected internal state at the end of the previous function). This gives the functions an implicit order that is not really enforced anywhere. You can imagine in a more complicated state machine, easily switching up two functions and calling them in the wrong order. In OP's version, because the state is encoded in such a type-safe way, any state transitions must handle the data from the previous state(s) directly, and provide all the correct data for the next state. Even if you were to get some of the states the wrong way round, you'd still need to correctly handle that transition, which prevents anything from breaking even if the behaviour is incorrect.
theptip 6 months ago
If you try to implement an actual state machine (or even interlinked ones like TCP) in this style you will have a very bad time.
The FSM model is restrictive but this makes it much easier to exhaustively cover and validate all state combinations.
To be concrete, the kind of code you end up writing in your non-FSM approach (for a non-trivial example where you have, say, an RPC per state transition or other logic between the transitions) is
```
    def finish():
      if state == WAIT:
        #error
      elif state == FILL:
        …
```
And worse, you need to validate your state in each transition, which ends up being duplicative and error-prone vs. just specifying your transitions once and then depending on them to enforce that the states are valid.
FSMs are a great pattern when they apply cleanly.
IshKebab 6 months ago

One very common reason is to make the code non-blocking. In fact Rust's async/await system works by converting the code into a state machine.
Unfortunately Rust doesn't have proper support for coroutines or generators yet so often you'll want to use a hand written state machine anyway.
Even if it did, sometimes the problem domain means that a state machine naturally fits the semantics better than using a control flow based approach anyway.
sdenton4 6 months ago

I tend to think of state machines as becoming important when you're forced to deal with the unpredictably of the real world, rather than just pummeling bits until they repent.
You've got some complicated Thing to control which you don't have full visibility into... Like, say, a Bluetooth implementation, or a robot. You have a task to complete, which goes through many steps, and requires some careful reset operations before you can try again when things inevitably don't go according to plan. What steps are needed for the reset depends on where you're at in the process. Maybe you only need to retry from there steps ago instead of going all the way back to the beginning... The states help you keep track of where things are at, and more rigorously define the paths available.
crq-yml 6 months ago

It's a formal model that we can opt into surfacing, or subsume into convenient pre-packaged idioms. For engineering purposes you want to be aware of both.
It's way easier to make sense of why it's relevant to write towards a formalism when you are working in assembly code and what is near at hand is load and store, push and pop, compare and jump.
Likewise, if the code you are writing is actually concurrent in nature(such as the state machines written for video games, where execution is being handed off cooperatively across the game's various entities to produce state changes over time) most prepackaged idioms are insufficient or address the wrong issue. Utilizing a while loop and function calls for this assumes you can hand off something to the compiler and it produces comparisons, jumps, and stack manipulations, and that that's what you want - but in a concurrent environment, your concerns shift towards how to "synchronize, pause and resume" computations and effects, which is a much different way of thinking about control flow that makes the formal model relevant again.
joshka 6 months ago

There's definitely a missing part of this which talks about when to use this sort of approach. The answer is often when there's non trivial amounts of stuff that happens between the end of one method and the start of the next which is in control of the external system. That said, I often argue that async/await solves the majority of that problem by implicit modeling of the state machine while keeping the code readable.
asimpletune 6 months ago

Because to reason about things becomes harder as the stakes are raised. We had to implement paxos for distributed systems in college and my partner I started over probably about three times trying to code it normally. Then we switched to just focusing on defining states and the conditions that transition between them and our solution became much easier to code.
marginalia_nu 6 months ago

One benefit is that if you persist the state on transition, you can create a program that survives restarts and crashes. The pattern is very useful if you have tasks with very long run times that need to be orchestrated somehow.
There are also some parsers that can be very rough to implement in an imperative fashion.
imtringued 6 months ago

That's only because the diagram is completely wrong and your code is wrong accordingly, because it implements the incorrect diagram.
Usually a wait state has a self referential transition. You don't perform the wait once, you keep waiting until the condition for transitioning to the fill state is true.
The next problem is that you are doing polling. If you were to implement this on a microcontroller then your code cannot be called from an interrupt and let the microcontroller go to sleep to conserve power.
baq 6 months ago

That’s true for everything. If you feel confident you can safely refactor and modify a state machine encoded in this way, go for it. Most of us who have seen the trenches don’t feel confident and gladly accept tool assistance.
creatonez 6 months ago

That's just a minimalistic example. You might want to say, add a progress bar, or persist the state in an SQL database, which you can't do by letting the CPU ram through a huge code block.
skavi 6 months ago

allows trivially mocking effects for testing
eddd-ddde 6 months ago

Yeah, and while you are at it drop the types and just use normal assembly, that way you won't even have UB!

theOGognf 6 months ago

It’s funny seeing this blog post again. This is actually a reference I used to make a poker game as a state machine last year: https://github.com/theOGognf/private_poker

It made the development feel a lot safer and it’s nice knowing the poker game state cannot be illegally transitioned with the help of the type system

michalsustr 6 months ago

Well this is timely :) I’m in the middle of writing a library, based on rust-fsm crate, that adds nice support for Mealy automata, with extensions like

- transition handlers

- guards

- clocks

- composition of automata into a system.

The idea is to allow write tooling that will export the automata into UPPAAL and allow for model checking. This way you don’t need to make too much additional effort to ensure your model and implementation match/are up to date, you can run the checker during CI tests to ensure you don’t have code that deadlocks/ some states are always reachable etc.

I plan to post a link here to HN once finished.

dafelst 6 months ago

Please do!
- michalsustr 6 months ago
  
  I published first version: https://github.com/michalsustr/rust-automata
  Happy to get feedback.

mijoharas 6 months ago

Maybe I'm missing something, but it seems that this approach doesn't allow you to store external input that's provided when you transition states.

Say stateB is transitioned to from stateA and needs to store a value that is _not_ directly computed from stateA but is externally supplied at the point of transition.

As far as I understand this isn't possible with the proposed solution? Am I missing something? This seems like a pretty common use case to me.

vlovich123 6 months ago

impl From<(OldState, Input)> for NewState would be one way.
- fcoury 6 months ago
  
  Ok you just blew my mind with this pattern. I created a similar From trait only to take in an extra param so many times. Thanks for this.
- mijoharas 6 months ago
  
  Ahh, nice. Didn't think of that!
michalsustr 6 months ago

Hi, I just made a library which does that!
I published first version: https://github.com/michalsustr/rust-automata
Happy to get feedback.

gnabgib 6 months ago

(2016) Popular, but barely discussed at the time (62 points, 3 comments) https://news.ycombinator.com/item?id=12703623

nativeit 6 months ago

This isn't really relevant to the article's topic, but I noticed something in their stylesheet that has me intensely curious--can anyone explain to me what's happening with this CSS rule?

  * {
 --index: calc(1 * var(--prime2) * var(--prime3) * var(--prime5) * var(--prime7) * var(--prime11) * var(--prime13) * var(--prime17) * var(--prime19) * var(--prime23) * var(--prime29) * var(--prime31) * var(--prime37) * var(--prime41) * var(--prime43) * var(--prime47) * var(--prime53) * var(--prime59) * var(--prime61) * var(--prime67) * var(--prime71) * var(--prime73) * var(--prime79) * var(--prime83) * var(--prime89) * var(--prime97) * var(--prime101) * var(--prime103) * var(--prime107) * var(--prime109) * var(--prime113) * var(--prime127) * var(--prime131) * var(--prime137) * var(--prime139) * var(--prime149) * var(--prime151) * var(--prime157) * var(--prime163) * var(--prime167) * var(--prime173) * var(--prime179) * var(--prime181) * var(--prime191) * var(--prime193) * var(--prime197) * var(--prime199) * var(--prime211) * var(--prime223) * var(--prime227) * var(--prime229) * var(--prime233) * var(--prime239) * var(--prime241) * var(--prime251) * var(--prime257) * var(--prime263) * var(--prime269) * var(--prime271) * var(--prime277) * var(--prime281) * var(--prime283) * var(--prime293) * var(--prime307) * var(--prime311) * var(--prime313) * var(--prime317) * var(--prime331) * var(--prime337) * var(--prime347) * var(--prime349) * var(--prime353) * var(--prime359) * var(--prime367) * var(--prime373) * var(--prime379) * var(--prime383) * var(--prime389) * var(--prime397) * var(--prime401) * var(--prime409) * var(--prime419) * var(--prime421) * var(--prime431) * var(--prime433) * var(--prime439) * var(--prime443) * var(--prime449) * var(--prime457) * var(--prime461) * var(--prime463) * var(--prime467) * var(--prime479) * var(--prime487) * var(--prime491) * var(--prime499) * var(--prime503) * var(--prime509) * var(--prime521) * var(--prime523) * var(--prime541) * var(--prime547) * var(--prime557) * var(--prime563) * var(--prime569) * var(--prime571) * var(--prime577) * var(--prime587) * var(--prime593) * var(--prime599));
}

MarkSweep 6 months ago

It's to create random numbers to make the star field on the home page:
https://hoverbear.org/
It's defined in this file:
https://github.com/Hoverbear/hoverbear.org/blob/root/sass/_m...
Which links to an explanation here:
https://medium.com/hypersphere-codes/counting-in-css-unlock-...
nativeit 6 months ago

It appears to be related to how they draw a fancy SVG logo with a starfield and constellation. These variables relate to the positions of the stars in the starfield (or something similar). Clever.

skavi 6 months ago

i think stable coroutines [0] would be huge for rust. they would enable writing pure state machines in the form of straight line imperative code.

currently they’re used in the implementation of async/await, but aren’t themselves exposed.

[0]: https://doc.rust-lang.org/beta/unstable-book/language-featur...

locusofself 6 months ago

Is the title a nod to the Nine Inch Nails album "Pretty Hate Machine" ?

gnabgib 6 months ago

Partially, according to op: https://news.ycombinator.com/item?id=12708060

joshka 6 months ago

I prefer giving the transitions explicit names over relying on the From implemenations defined on the machine (defining them on the states still prevents bad transitions). The raft example drops a bunch of syntactic noise and repetition this way:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

    fn main() {
        let is_follower = Raft::new(/* ... */);
        // Raft typically comes in groups of 3, 5, or 7. Just 1 for us. :)
        
        // Simulate this node timing out first.
        let is_candidate = is_follower.on_timeout();
        
        // It wins! How unexpected.
        let is_leader = is_candidate.on_wins_vote();
        
        // Then it fails and rejoins later, becoming a Follower again.
        let is_follower_again = is_leader.on_disconnected();
        
        // And goes up for election...
        let is_candidate_again = is_follower_again.on_timeout();
        
        // But this time it fails!
        let is_follower_another_time = is_candidate_again.on_lose_vote();
    }
    
    
    // This is our state machine.
    struct Raft<S> {
        // ... Shared Values
        state: S
    }
    
    // The three cluster states a Raft node can be in
    
    // If the node is the Leader of the cluster services requests and replicates its state.
    struct Leader {
        // ... Specific State Values
    }
    
    // If it is a Candidate it is attempting to become a leader due to timeout or initialization.
    struct Candidate {
        // ... Specific State Values
    }
    
    // Otherwise the node is a follower and is replicating state it receives.
    struct Follower {
        // ... Specific State Values
    }
    
    impl<S> Raft<S> {
        fn transition<T: From<S>>(self) -> Raft<T> {
            let state = self.state.into();
            // ... Logic prior to transition
            Raft {
                // ... attr: val.attr 
                state,
            }
        }
    }
    
    // Raft starts in the Follower state
    impl Raft<Follower> {
        fn new(/* ... */) -> Self {
            // ...
            Raft {
                // ...
                state: Follower { /* ... */ }
            }
        }
        
        // When a follower timeout triggers it begins to campaign
        fn on_timeout(self) -> Raft<Candidate> {
            self.transition()
        }
    }
    
    
    
    impl Raft<Candidate> {
        // If it doesn't receive a majority of votes it loses and becomes a follower again.
        fn on_lose_vote(self) -> Raft<Follower> {
            self.transition()
        }
    
        // If it wins it becomes the leader.
        fn on_wins_vote(self) -> Raft<Leader> {
            self.transition()
        }
    }
    
    impl Raft<Leader> {
        // If the leader becomes disconnected it may rejoin to discover it is no longer leader
        fn on_disconnected(self) -> Raft<Follower> {
            self.transition()
        }
    }
    
    // The following are the defined transitions between states.
    
    // When a follower timeout triggers it begins to campaign
    impl From<Follower> for Candidate {
        fn from(state: Follower) -> Self {
            Candidate { /* ... */ }
        }
    }
    
    // If it doesn't receive a majority of votes it loses and becomes a follower again.
    impl From<Candidate> for Follower {
        fn from(state: Candidate) -> Self {
            Follower { /* ... */ }
        }
    }
    
    // If it wins it becomes the leader.
    impl From<Candidate> for Leader {
        fn from(val: Candidate) -> Self {
            Leader { /* ... */ }
        }
    }
    
    // If the leader becomes disconnected it may rejoin to discover it is no longer leader
    impl From<Leader> for Follower {
        fn from(val: Leader) -> Self {
            Follower { /* ... */ }
        }
    }

gnatolf 6 months ago

I did the same. In short examples like the ones used in the article, it's easy to reason about the states and transitions. But in a much larger codebase, it gets so much harder to even discover available transitions if one is leaning too much on the from/into implementations. Nice descriptive function names go a long way in terms of ergonomic coding.

raphinou 6 months ago

Is there any crate advised to be used when developing state machines? Any experience to share?

michalsustr 6 months ago

Hi, I made a new crate since I wasnt happy with the existing ones.
I published first version: https://github.com/michalsustr/rust-automata
Happy to get feedback.
acid_burn 6 months ago

I have a particular interest in hierarchical state machines, so I made moku [1] to take care of all the boilerplate associated with them.
Its ergonomics are definitely tailored for nested states, but it can generate flat machines perfectly fine.
[1] https://docs.rs/moku/latest/moku/

jll29 6 months ago

How about state machines with millions of transitions such as letter transducers?

pjmlp 6 months ago

Kind of interesting seeing folks rediscovering ideas from Standard ML.

jfauwasdf 6 months ago

Are you sure you aren't thinking of Idris or LinearML? Standard ML does not have linear or affine logic AFAIK.
- pjmlp 6 months ago
  
  The state machine approach described in the article has nothing to do with linear or affine types.
  - jfauwasdf 6 months ago
    
    With respect to state machine transitions
    FTA: "Changing from one state to another should consume the state so it can no longer be used."
    consume is a dog whistle for affine logic, is it not?
    also you are right in that state machines themselves don't have anything to do with linear or affine types but this article is about implementing one in rust which has affine logic.
    
    pjmlp 6 months ago
    
    It can be modeled as part of transition states on the type system, doesn't necessarily need affine logic.
    
    jfauwasdf 6 months ago
    
    Agreed but I never made the claim they couldn't be.
    Can you expand on the ideas from Standard ML that are referenced in the article? That's what I'm interested in and didn't intend to go on a tangent here. Apologies for that.

jesse__ 6 months ago

Honestly, I prefer the long-form first-cut that he dismisses to save 10 lines of code at the cost of using generics.

joshka 6 months ago

She, not he. https://www.linkedin.com/in/hoverbear/
jesse__ 6 months ago

On a second skim it is nice that errors are surfaced at compile time, but honestly I'm not sure that's worth the complexity.
- kadoban 6 months ago
  
  I feel like either one could be a reasonable choice, mostly depending on how complex the state machine is and how bad it is if it errors at runtime.