Frankenstein's `init`

ohadravid.github.io

99 points by todsacerdoti 3 months ago

mjr00 3 months ago

Doing anything like this in __init__ is crazy. Even `Path("config.json").read_text()` in a constructor isn't a good idea.

Friends don't let friends build complicated constructors that can fail; this is a huge violation of the Principle of Least Astonishment. If you require external resources like a zeromq socket, use connect/open/close/etc methods (and a contextmanager, probably). If you need configuration, create a separate function that parses the configuration, then returns the object.

I appreciate the author's circumstances may have not allowed them to make these changes, but it'd drive me nuts leaving this as-is.

echelon 3 months ago

Not just Python. Most languages with constructors behave badly if setup fails: C++ (especially), Java, JavaScript. Complicated constructors are a nuisance and a danger.
Rust is the language I'm familiar with that does this exceptionally well (although I'm sure there are others). It's strictly because there are no constructors. Constructors are not special language constructs, and any method can function in that way. So you pay attention to the function signature just like everywhere else: return Result<Self, T> explicitly, heed async/await, etc. A constructor is no different than a static helper method in typical other languages.
new Thing() with fragility is vastly inferior to new_thing() or Thing::static_method_constructor() without the submarine defects.
Enforced tree-style inheritance is also weird after experiencing a traits-based OO system where types don't have to fit onto a tree. You're free to pick behaviors at will. Multi-inheritance was a hack that wanted desperately to deliver what traits do, but it just made things more complicated and tightly coupled. I think that's what people hate most about "OOP", not the fact that data and methods are coupled. It's the weird shoehorning onto this inexplicable hierarchy requirement.
I hope more languages in the future abandon constructors and strict tree-style and/or multi-inheritance. It's something existing languages could bolt on as well. Loose coupling with the same behavior as ordinary functions is so much easier to reason about. These things are so dated now and are best left in the 60's and 70's from whence they came.
- rcxdude 3 months ago
  
  The annoying thing is you can actually just use the rust solution in C++ as well (at least for initial construction), but basically no-one does.
- hdjrudni 3 months ago
  
  I don't have enough experience with traits, but they also sound like a recipe for creating a mess. I find anything more than like 1 level of inheritance starts to create trouble. But perhaps that's the magic of traits? Instead of creating deep stacks, you mix-and-match all your traits on your leaf class?
  - echelon 3 months ago
    
    > I find anything more than like 1 level of inheritance starts to create trouble.
    That's the beauty of traits (or "type classes"). They're behaviors and they don't require thinking in terms of inheritance. Think interfaces instead.
    If you want your structure or object to print debug information when logged to the console, you custom implement or auto-derive a "Debug" trait.
    If you want your structure or object to implement copies, you write your own or auto-derive a "Clone" trait. You can control whether they're shallow or deep if you want.
    If you want your structure or object to be convertible to JSON or TOML or some other format, you implement or auto-derive "Serialize". Or to get the opposite behavior of hydrating from strings, the "Deserialize" trait.
    If you're building a custom GUI application, and you want your widget to be a button, you implement or auto-derive something perhaps called "Button". You don't have to shoehorn this into some kind of "GObject > Widget > Button" kind of craziness.
    You can take just what you need and keep everything flat.
    Here's a graphical argument I've laid out: https://imgur.com/a/bmdUV3H
- whatevaa 3 months ago
  
  In C#, it is also not a good idea to have constructors with failable io in them.
  - neonsunset 3 months ago
    
    Yup, and IO being async usually creates impedance mismatch in constructors.
    Had to refactor quite a few anti-pattern constructors like this into `async Task<Thing> Create(...)` back in the day. No idea what was the thought process of the original authors, if there was any...
- fouronnes3 3 months ago
  
  How would you do traits in Python?
  - vaylian 3 months ago
    
    Python has a trait-like system. It's called "abstract base classes": https://docs.python.org/3/library/abc.html
    The abstract base class should use the @abstractmethod decorator for methods that the implementing class needs to implement itself.
    Obviously, you can also use abstract base classes in other ways, but they can be used as a way to define traits/interfaces for classes.
    
    dragonwriter 3 months ago
    
    Protocols, too (they are defined on top of ABCs)
    https://typing.python.org/en/latest/spec/protocol.html#proto...
    
    gpderetta 3 months ago
    
    Protocols are definitely the closest thing to traits in Python. ABC are pretty much OoO interfaces.
  - echelon 3 months ago
    
    I'd really need to think long and hard about it, but my initial feeling is that we'd attach them to data classes or a similar new construct. I don't think you'd want to reason about the blast radius with ordinary classes. Granted, that's more language complexity, creates two unequal systems, and makes much more to reason about. There's a lot to juggle.
    As much fun as putting a PEP together might be, I don't think I have the bandwidth to do so. I would really like to see traits in Python, though.
  - svilen_dobrev 3 months ago
    
    there were "traits" in python :/ . Search for pyprotocols.
    (searching myself.. not much left of it)
    2004 - https://simonwillison.net/2004/Mar/23/pyprotocols/
    some related rejected PEP .. https://peps.python.org/pep-0246/ talking about something new to be expected.. in 2001?2005?
    no idea did it happen and which one would that be..
    
    dragonwriter 3 months ago
    
    Possibly Guido, in that rejection, was talking about what ended up as ABCs, original PEP: https://peps.python.org/pep-3119/
    Further work was done in this area, building on ABCs, with Protocols, original PEP: https://peps.python.org/pep-0544/
rcxdude 3 months ago

Why? An object encapsulates some state. If it doesn't do anything unless ypu call some other method on it first, it should just happen in the constructor. Otherwise you've got one object that's actually two types: the object half-initialised and the object fully initialised, and it's very easy to confuse the two. Especially in python there's basically no situation where you're going to need that half-state for some language restriction.
- mjr00 3 months ago
  It's a lot easier to reason about code if I don't need to worry that something as simple as
  my_variable = MyType()
  might be calling out to a database with invalid credentials, establishing a network connection which may fail to connect, or reading a file from the filesystem which might not exist.
  You are correct that you don't want an object that can be half-initialized. In that case, any external resources necessary should be allocated before the constructor is called. This is particularly true if the external resources must be closed after use. You can see my other comment[0] in this thread for a full example, but the short answer is use context managers; this is the Pythonic way to do RAII.
  [0] https://news.ycombinator.com/item?id=43736940
  - rcxdude 3 months ago
    
    Why, when
    my_var = MyType() my_var.actually_init()
    Works in exactly the same way? Errors get reported the same, it has the same semantics, you have the same need to keep track of things, it's just two lines instead of one and you're more likely to have a partially initialised object floating around. I really don't see the advantage at all.
    Context managers are handy if you need to guarantee cleanup in simple object lifetimes, and in that case __enter__ is probably the better place to do setup because it's closer to where you get guarantees about the cleanup happening, but context managers are not actually RAII, just a partial substitute.
    
    mjr00 3 months ago
    
    Your code is comparing to the wrong alternative; the comparison should be to
    connection = connect_to_database() my_var = MyType(connection)
    It's far more testable, maintainable, more explicit about when and which objects are created, and never has a partially initialized object.
  - lyu07282 3 months ago
    
    > might be calling out to a database with invalid credentials, establishing a network connection which may fail to connect, or reading a file from the filesystem which might not exist.
    In python its not unusual that,
    import some_third_partylib
    will do exactly that. I've seen libraries that will load a half gigabyte machine learning model into memory on import and one that sends some event to sentry for telemetry.
    People write such shit code, its unbelievable.
- hdjrudni 3 months ago
  
  > Otherwise you've got one object that's actually two types: the object half-initialised and the object fully initialised, and it's very easy to confuse the two.
  You said it yourself -- if you feel like you have two objects, then literally use two objects if you need to split it up that way, FooBarBuilder and FooBar. That way FooBar is always fully built, and if FooBarBuilder needs to do funky black magic, it can do so in `build()` instead of `__init__`.
  - rcxdude 3 months ago
    
    I mean, you can do that (and it kight be useful if you want to seperate where the construction arguments are from where when the initialisation happens), but I don't understand why not just do it in init instead of build. It's not all that special, it's literally just a method that's called when you make an instance of the class, if you need to do it before you use the class, just make it happen when you make the class.
  - exe34 3 months ago
    
    no thanks, that's how we end up with FactoryFactory. If the work needs to be done upon startup, then it needs to be done. if it is done in response to a later event, then it been done later.
    
    duncanfwalker 3 months ago
    
    For me the point is that __init__ is special - it's the language's the way to construct an object. If we want to side-effecting code we can with a non-special static constructor like
    class Foo @classmethod def connect(): return Foo()._connect()
    The benefit is that we can choose when we're doing 1) object construction, 2) side-effecting 3) both. The downside is client might try use the __init__() so object creation might need to be documented more than it would otherwise
    
    rcxdude 3 months ago
    
    Init isn't really special (especially compared to e.g. __new__). It has no special privileges, it's just a method that is called when you construct an instance of the class.
- banthar 3 months ago
  
  Python context managers somewhat require it. You have to create an object on which you can call `__enter__`.
  - jakewins 3 months ago
    
    This is the exact opposite? They explicitly encourage doing resource-opening in the __enter__ method call, and then returning the opened resource encapsulated inside an object.
    Nothing about the contract encourages doing anything fallible in __init__
  - ptsneves 3 months ago
    
    It is a tragedy that python almost got some form or RAII but then figured out an object has 2 stages of usage.
    I also strongly disagree constructors cannot fail. An object that is not usable should fail fast and stop code flow the earliest possible. Fail early is a good thing.
    
    jakewins 3 months ago
    
    There is no contradiction between “constructors cannot fail” and “fail early”, nobody is arguing the constructor should do fallible things and then hide the failure.
    What you should do is the fallible operation outside the constructor, before you call __init__, then ask for the opened file, socket, lock, what-have-you as an argument to the constructor.
    Fallible initialisation operations belong in factory functions.
    
    int_19h 3 months ago
    
    The real problem is that constructors and factory functions are distinct in the first place. They aren't, in Rust, and it's much easier to reason about and requires far less verbiage to write.
    
    throw-the-towel 3 months ago
    
    Why though?

abdusco 3 months ago

My go-to is a factory classmethod:

    class Foo:
        def __init__(self, config: Config): ...

        @classmethod
        def from_config_file(cls, filename: str):
          config = # parse config file
          return cls(config)

jhji7i77l 3 months ago

> Even `Path("config.json").read_text()` in a constructor isn't a good idea.
If that call is necessary to ensure that the instance has a good/known internal state, I absolutely think it belongs in __init__ (whether called directly or indirectly via a method).
- mjr00 3 months ago
  You're right that consistent internal state is important, but you can accomplish this with
  class MyClass: def __init__(self, config: str): self._config = config
  And if your reaction is "that just means something else needs to call Path("config.json").read_text()", you're absolutely right! It's separation of concerns; let some other method deal with the possibility that `config.json` isn't accessible. In the real world, you'd presumably want even more specific checks that specific config values were present in the JSON file, and your constructor would look more like
  def __init__(self, host: str, port: int):
  and you'd validate that those values are present in the same place that loads the JSON.
  This simple change makes code so much more readable and testable.
  - asplake 3 months ago
    
    Better still, pass the parsed config. Let the app decide how it is configured, and let it deal with any problems.

zokier 3 months ago

Pretty simple to fix by changing the _init to something like:

    def _init(self):
        init_complete = threading.Event()
        def worker_thread_start():
            FooWidget.__init__(self)
            init_complete.set()
            self.run()

        worker_thread = Thread(target=worker_thread_start, daemon=True)
        worker_thread.start()
        init_complete.wait()

Spawning worker thread from constructor is not that crazy, but you want to make sure the constructing is completed by the time you return from the constructor.

greatgib 3 months ago

To be clear, a good init is not supposed to create a thread or do any execution that might not be instant of things like that.
It would have been better to an addition start, run, exec method that does that kind of things. Even if for usage it is an inch more complicated to use with an additional call.
- lmm 3 months ago
  
  > It would have been better to an addition start, run, exec method that does that kind of things. Even if for usage it is an inch more complicated to use with an additional call.
  That breaks RAII - you shouldn't give the user a way to create the object in an invalid state. Although if it's intended to be used as a context manager anyway then maybe doing it on context enter rather than on init would be nicer.
  - greatgib 3 months ago
    
    Doesn't necessarily have to be in an invalid state. You can have your object with a none value for the socket or a state like "not_started" and the different method will look at that to handle everything properly.
    Also, it was not supposed to be used in a context manager as there is just a close method but it is the caller that decided to wrap it in a context.
    But as what you suggest, indeed it would have been a great idea to add an __enter__ and do the starting there are as it would be the proper way.
- crdrost 3 months ago
  
  I don't think it's totally crazy to, say, open a DB connection in __init__() even though that's not an instantaneous operation. That's not a hill I would die on, you can just say “open a connection and hand it to me as an argument,” but it just looks a little cleaner to me if the lifecycle of the connection is being handled inside this DB-operations class. (You can also defer creating the connection until it is actually used, or require an explicit call to connect it, but then you are also writing a bunch of boilerplate to make the type checker happy for cases where the class is having its methods called before it was properly connected.)
  - mjr00 3 months ago
    
    It's not totally crazy in that I see it all the time, but it's one of the two most common things I've found make Python code difficult to reason about.[0] After all, if you open a DB connection in __init__() -- how do you close it? This isn't C++ where we can tie that to a destructor. I've run into so many Python codebases that do this and have tons of unclosed connections as a result.
    A much cleaner way (IMO) to do this is use context managers that have explicit lifecycles, so something like this:
    @contextmanager def create_db_client(host: str, port: int) -> Generator[_DbClient, None, None]: try: connection = mydblib.connect(host, port) client = _DbClient(connection) yield client finally: connection.close() class _DbClient: def __init__(self, connection): self._connection = connection def do_thing_that_requires_connection(...): ...
    Which lets you write client code that looks like
    with create_db_client('localhost', 5432) as db_client: # port 3306 if you're a degenerate db_client.do_thing_that_requires_connection(...)
    This gives you type safety, connection safety, has minimal boilerplate for client code, and ensures the connection is created and disposed of properly. Obviously in larger codebases there's some more nuances, and you might want to implement a `typing.Protocol` for `_DbClient` that lets you pass it around, but IMO the general idea is much better than initializing a connection to a DB, ZeroMQ socket, gRPC client, etc in __init__.
    [0] The second is performing "heavy", potentially failing operations outside of functions and classes, which can cause failures when importing modules.
    
    tayo42 3 months ago
    
    I get the points everyone is making and they make sense, but sometimes you need persistent connections. Open and closing constantly like can cause issues
    
    mjr00 3 months ago
    
    There's nothing about the contextmanager approach that says you're open and closing any more or less frequently than a __init__ approach with a separate `close()` method. You're just statically ensuring 1) the close method gets called, and 2) database operations can only happen on an open connection. (or, notably, a connection that we expect to be open, as something external the system may have closed it in the meantime.)
    Besides, persistent connections are a bit orthogonal since you should be using a connection pool in practice, which most Python DB libs provide out of the box. In either case, the semantics are the same, open/close becomes lease from/return to pool.
    
    lijok 3 months ago
    
    > This isn't C++ where we can tie that to a destructor
    `def __del__`
    
    12_throw_away 3 months ago
    
    > def __del__
    Nope, do not ever do this, it will not do what you want. You have no idea _when_ it will be called. It can get called at shutdown where the entire runtime environment is in the process of being torn down, meaning that nothing actually works anymore.
    
    mjr00 3 months ago
    
    C++ destructors are deterministic. Relying on a nondeterministic GC call to run __del__ is not good code.
    Also worth noting that the Python spec does not say __del__ must be called, only that it may be called after all references are deleted. So, no, you can't tie it to __del__.
- jhji7i77l 3 months ago
  
  > To be clear, a good init is not supposed to create a thread or do any execution that might not be instant of things like that.
  Maybe it indirectly ensures that only one thread is created per instantiation; there are better ways of achieving that though.
- eptcyka 3 months ago
  
  What if the object is in an invalid state unless the thread is started?
  - greatgib 3 months ago
    
    Nothing prevents you to have a valid "not_yet_started" state.
    
    rcxdude 3 months ago
    
    No, but it does complicate things substantially. I don't understand why you would have a useless half-constructed state, especially in python where init isn't all that magical.
    
    lgas 3 months ago
    
    The benefits of making illegal states unrepresentable are just not widely understood outside of pure FP circles. I think it's hard for stuff like this to catch on in python in particular because everything is so accessible and mutable, it would be hard to talk about anything that provides the level of confidence that eg. ADTs in Haskell provide. And it's hard to understand why you would want to try to do something like that in a context like python unless you already have experience with it in a statically typed context.
    
    int_19h 3 months ago
    
    This is basically equivalent to what null is for pointers, and has all the same problems.
    
    eptcyka 3 months ago
    
    And what would you do with this object after construction? Call start() on it immediately?
    
    greatgib 3 months ago
    
    Kind of yes or no, depend on your usage. But clearly would have worked well for the case of the automatic test if the start was the context
ohr 3 months ago

Thanks, I hate it! Jk, but would you consider this a good solution overall?
- im3w1l 3 months ago
  
  Not him, but I would also consider making FooBarWidget have a FooWidget rather than be a FooWidget.
  Normally with a FooWidget you can create one on some thread and then perform operations on it on that thread. But in the case of the FooBarWidget you can not do operations on it because operations must be done in the special thread that is inaccessible.
- shiandow 3 months ago
  
  I can't come up with a single good reason why you would wish to use inheritance in this case.
  Except possibly for type checking, but
  1. Python has duck typing anyway
  2. It's debatable whether these two classes should be interchangeable
  3. You shouldn't need to use inheritance just to make the type checking work.
  It could be considered a clever hack in some situations, but it's completely unsurprising that it causes issues. Putting band-aids on it after you find a bug does not fix the real problem.
- gpderetta 3 months ago
  
  I would consider this an hack. But a reasonable hack if you can't change the base class.

wodenokoto 3 months ago

> solving an annoying problem with a complete and utter disregard to sanity, common sense, and the feelings of other human beings.

While I enjoy comments like these (and the article overall!), they stand stronger if followed by an example of a solution that regards sanity, common sense and the feelings of others.

crdrost 3 months ago

So in this case that would be, a FooBarWidget is not a subclass of FooWidget but maybe still a subclass of AbstractWidget above that. It contains a thread and config as its state variables. That thread instantiates a FooWidget with the saved config, and runs it, and finally closes its queue.
The problem still occurs, because you have to define what it means to close a FooBarWidget and I don't think python Thread has a “throw an exception into this thread” method. Just setting the should_exit property, has the same problem as the post! The thread might still be initing the object and any attempt to tweak across threads could sometimes tweak before init is complete because init does I/O. But once you are there, this is just a tweak of the FooWidget code. FooWidget could respond to a lock, a semaphore, a queue of requests, any number of things, to be told to shut down.
In fact, Python has a nice built-in module called asyncio, which implements tasks, and tasks can be canceled and other such things like that, probably you just wanted to move the foowidget code into a task. (See @jackdied's great talk “Stop Writing Classes” for more on this general approach. It's not about asyncio specifically, rather it's just about how the moment we start throwing classes into our code, we start to think about things that are not just solving the problem in front of us, and solving the problem in front of us could be done with just simple structures from the standard library.)

pjot 3 months ago

Rather than juggling your parent’s __init__ on another thread, it’s usually clearer to:

1. Keep all of your object–initialization in the main thread (i.e. call super().__init__() synchronously).

2. Defer any ZMQ socket creation that you actually use in the background thread into the thread itself.

ledauphin 3 months ago

yeah, this just feels like one of those things that's begging for a lazy (on first use) initialization. if you can't share or transfer the socket between threads in the first place, then your code will definitionally not be planning to use the object in the main thread.

smitty1e 3 months ago

To paraphrase Wheeler/Lampson: "All problems in python can be solved by another level of indiscretion."

jerf 3 months ago

Oh, that's good. That has some depth to it. I'm going to have to remember that one. Of course one can switch out "Python" for whatever the local situation is too.

_Algernon_ 3 months ago

fix: add time.sleep(0.1) in the .close method.

gnfargbl 3 months ago

    # NOTE: doubled this to 0.2s because of some weird test failures.

Const-me 3 months ago

Using sleep instead of proper thread synchronization is unreliable. Some people have slow computers like a cheap tablet or free tier cloud VM. Other people have insufficient memory for the software they run and their OS swaps.
When I need something like that, in my destructor I usually ask worker thread to shut down voluntarily (using a special posted message, manual reset event, or an atomic flag), then wait for the worker thread to quit within a reasonable timeout. If it didn’t quit within the timeout, I log a warning message if I can.
- crdrost 3 months ago
  
  I believe _Algernon_ was making a joke about how you could take this problem and make the code even worse, not proposing a solution to the problem.
  It's also not even a problem with slow computers or insufficient memory, __init__ does I/O here, it connects to ZeroMQ, so it could have arbitrary latency in various circumstances exceeding the 100 milliseconds that we would be sleeping for. So the joke is, this fixes the problem in your test environment where everything is squeaky clean and you know that ZeroMQ is reachable, and now you have bugs in prod still.
- btown 3 months ago
  
  Notably on this point, resource contention on e.g. a Kubernetes cluster can cause these kinds of things to fail spuriously. Sleeps have their uses, but assuming that work is done while you’re sleeping is rarely a good idea, because when it rains it will pour.

devrandoom 3 months ago

That's Python's constructor, innit?

draw_down 3 months ago

[dead]

Frankenstein's `__init__`

Frankenstein's `init`