Hacker News
Self Hell in Python (kmkeen.com)
67 points by renlinx on Jan 4, 2015 | 89 comments


I find the self in the method signature slightly awkward, but other than that I like writing a lot of 'self'. One of the biggest reasons I prefer Python to Ruby or Perl is that I really agree that "Explicit is better than implicit", and adding the self before your instance variables and method calls explicitly labels their namespace/scope, and I don't find it too distracting. I sometimes do it in Java too, prefixing instance variables and method calls with 'this'.

I do agree that OOP approaches can be a bit overused (especially if someone's been coding too much Java), and I've always appreciated that Python doesn't shove it down your throat. I like that you can lean towards more functional code like this example, or towards simple imperative code. But when I do want some classes, I find the Python mechanism mostly OK.


I like the self parameter, because it makes the language more powerful. You can now simply assign a function to a class and make it a member method if it accepts an appropriate first parameter. The opposite is equally true. You can do something like `map(Class.method, sequence)`.
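A quick sketch of both directions (the class and function names here are invented for illustration):

```python
class Greeter:
    def __init__(self, name):
        self.name = name

# A plain function with a suitable first parameter...
def shout(self):
    return self.name.upper()

# ...becomes a method simply by assigning it to the class.
Greeter.shout = shout
print(Greeter("ada").shout())  # prints ADA

# The opposite also holds: the unbound method works as a plain function.
print(list(map(Greeter.shout, [Greeter("ada"), Greeter("bob")])))  # prints ['ADA', 'BOB']
```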


In JavaScript you can rebind functions to any context without an explicit self in the signature. I admit this (no pun intended) was deeply confusing at first, but I tend to prefer this way now. It feels like a simpler and more general abstraction.


But then getting a reference to a bound method is simpler in Python: bound_method = object.method

In JavaScript one would have to write: var bound_method = object.method.bind(object);


Also, for unbound methods, to make them usable as functions.

Python:

    func = Foo.method
JavaScript:

    var func = Function.prototype.call.bind(Foo.prototype.method);


I like it in Go, too. Go requires it, the way Python does, but the convention is to use a real variable name rather than self, since there's no real difference between the function receiver variable and the rest of the function arguments. I also like the explicit namespacing.


In Ruby we use an @ / self for both instance variables (mandatory, use self to call the getter / setter and @ to interact with the variable directly) and methods (self optional, generally used to reduce confusion if there might be multiple things with the same name).


I agree. I even always have a "var self = this" in my JavaScript class methods so I don't ever have to worry about scope of "this".


For what it's worth, the perl object system was actually borrowed from Python, so perl code is full of $self (or $this) too.


Neat, I expected it was the other way around:

    " I don't really know much about Python. I only stole its object system for Perl 5. I have since repented."
Larry Wall http://www.perl.com/pub/2007/12/06/soto-11.html


The author points to this line as the main culprit for self-hell:

  self.minimum, self.maximum = min(self.minimum, self.maximum), max(self.minimum, self.maximum)
This line is needed to 'defensively protect the input', i.e. to try to compensate when minimum and maximum arrive out of order. Firstly, I can't understand exactly why this defensive protection is not done before assigning the attributes, i.e. simply:

  # not too hellish, more readable on two lines
  self.minimum = min(minimum, maximum)
  self.maximum = max(minimum, maximum)
Secondly, I'm not very up on my Python Zen, but I think this naming of arguments followed by defensive protection isn't particularly Pythonic anyway. How about something like this?

  class Window(object):
       def __init__(self, x1, x2):
           # pass your dimensions in any order
           self.minimum = min(x1, x2)
           self.maximum = max(x1, x2)
       def __call__(self, x):
           return self.minimum <= x <= self.maximum


The solution of adding the protection first is mentioned but there will be other situations later in which the use of a lot of self cannot be avoided, so it doesn't solve the general problem, if you call it that.

What is being called "defensive" I would rather call "permissive". If at all possible, I would prefer raising a ValueError if x1 >= x2.
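A sketch of that stricter variant, reusing the Window example from the article (the error message is made up):

```python
class Window(object):
    def __init__(self, x1, x2):
        if x1 >= x2:
            # Fail loudly instead of silently reordering the arguments.
            raise ValueError("expected x1 < x2, got %r >= %r" % (x1, x2))
        self.minimum = x1
        self.maximum = x2

    def __call__(self, x):
        return self.minimum <= x <= self.maximum
```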


This would also work:

    def make_window(min, max):
        min, max = sorted([min, max])
        return lambda val: min <= val <= max


Simple min and max on my computer reliably takes less time than the sorted solution, and also makes more sense to read. This is what I would intuitively expect anyway, given the overhead of building the list.

  >>> timeit.timeit('x=4; y=3; x, y = sorted([x, y])')
  0.49517297744750977
  >>> timeit.timeit('x=4; y=3; x, y = min(x, y), max(x, y)')
  0.2774021625518799


The Pythonic solution is named parameters:

  def __init__(self, min=x, max=y):


Any parameter/argument is named in Python. What you're providing are default values for arguments.


Of course you are correct. This is why I shouldn't post shortly after waking up.


Note that the name "self" is a convention, not a requirement. If it's just the amount of keystrokes that you're concerned about, name it something else like "o".

Of course, good luck having someone else understand your code...
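For what that looks like (not that anyone should do it), here's the article's example with the receiver renamed:

```python
class Window(object):
    def __init__(o, minimum, maximum):
        # 'o' is just a name; Python passes the instance as the first argument regardless.
        o.minimum = min(minimum, maximum)
        o.maximum = max(minimum, maximum)

    def __call__(o, x):
        return o.minimum <= x <= o.maximum
```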


Did not know this. Thought self was equivalent to this in Java. Mind = blown.


See, you pass it as a parameter

So it can be called anything you want

BUT DON'T DO IT! Because you'll break the convention and anyone who maintains your code after that will hate you.


You can ALSO declare a class member function in a different file. Since it just accepts self as the first parameter... It still works.


Peter Norvig once proposed using '_' but got shot down.


'_' is already idiomatic for throwaway variables. It wouldn't help matters to use it for OOP.


In the Python world, '_' is used in three different ways:

* As a throw-away variable:

    >>> a, b, _, c, d = f(t)
* As a gettext call:

    >>> print _('Welcome to Python')
* And at the interactive prompt as the result of the previous expression:

    >>> 3 + 4
    7
    >>> _ * 10
    70


As most of my Python involves localized Django, I twitch every time I see `_` used for anything other than some variation of gettext.


Weird, since the gettext use-case is definitely the least idiomatic and least common use of `_`.


As I said, most of my Python involves localized Django. So gettext is used everywhere and (apart from the REPL) it's the other uses of `_` that are the odd ones out.


That's why you should use '__' as a throw-away variable (two underscores).


Wow. How did I go years without knowing this about the repl?


It's a really crap variable name as well. I posted on Stack Overflow asking what it was doing, assuming it was some Python syntax, only to be pointed to the import at the top of the file, which was being renamed to "_". Give variables meaningful names, please.


Trying to avoid self in Python is just not a good idea. You break a basic convention of Python just because you don't like one thing in it.

I would recommend to the author of the article one of those things:

  (a) either find a programming language, where you like everything (I guess, unlikely)

  (b) or write your own programming language
After some years in the business, I have learnt that every programming language has its own "hell".

Some hells are hotter, some are relatively mild. When you try to avoid every hell, you will end up writing your own programming language ... and be very lonely for the rest of your computer-life.

Pythons oddities are one of the mildest I have found yet, IMHO.


but most importantly: (c) don't write much of this self-less code that nobody will be able to understand, test or statically analyze (since many tools assume OOP)


Yes, I implied that. But of course, when you do so, you create your own "flavor" of Python that will not be understandable to others.


After 5 years of Python, I don't even see self anymore (how zen).

Author could have written

    class Window(object):
        def __init__(self, min, max):
            if min > max:
                min, max = max, min
            self.min = min
            self.max = max
to avoid repetition. If many args are needed,

    def __init__(self, a, b, c, d, e, f, ...):
        for k, v in locals().items():
            if k != 'self':  # locals() includes self itself
                setattr(self, k, v)
Etc etc there are endless idiomatic ways to avoid boilerplate in Python


Or this:

    class Window(object):
        def __init__(self, minimum, maximum):
            self.minimum = min(minimum, maximum)
            self.maximum = max(minimum, maximum)
Why set attributes twice? AFAIAC you only need to write "self." once for each attribute in the constructor.


    class Window(object):
        def __init__(self, minimum, maximum):
            self.min_func = min
            self.max_func = max
            self.tuple_type = tuple
            self.minimum = minimum
            self.maximum = maximum
            self.minimum, self.maximum = self.tuple_type([self.min_func(*self.tuple_type([self.minimum, self.maximum])), self.max_func(*self.tuple_type([self.minimum, self.maximum]))])
    
        def __call__(self, x):
            self.x = x
            result = self.minimum <= self.x <= self.maximum
            del self.x
            return result


I agree, there is hardly any need for OPs post.


My problem with inner functions is that you can't create a unit test for them. They're inaccessible.

I would propose making your inner function into a classmethod and moving away from the state of the class. This structure is well defined and is easily testable.

Alternatively, assuming your feature doesn't get more complicated, you can make your min-max ordering happen in one line:

    self.minimum, self.maximum = sorted((minimum, maximum))
I think you can avoid self hell through careful programming.
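A sketch of that extraction (a staticmethod here, since the ordering needs no class state; the helper name is invented):

```python
class Window(object):
    def __init__(self, minimum, maximum):
        self.minimum, self.maximum = self._order(minimum, maximum)

    @staticmethod
    def _order(a, b):
        # Pulled out of __init__ so a unit test can call it directly.
        return min(a, b), max(a, b)

    def __call__(self, x):
        return self.minimum <= x <= self.maximum

# Testable without constructing anything:
assert Window._order(5, 1) == (1, 5)
```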


Inner functions in closures are tested in much the same way as methods are tested for classes.

   closure = OuterFunc(var1, var2)
   self.assertEqual(closure(var3), result)

   inst = SomeClass(var1, var2)
   self.assertEqual(inst.somemethod(var3), result)
When you think about it, there isn't much sense in calling an inner function without having called the enclosing function (the inner function typically depends on data closed over by the outer function).

Likewise, there isn't much sense in calling a method without having instantiated a class (methods typically depend on data stored in the instance).


You don't test inner functions the same way as you don't test private methods in a class. You only test for public API. They are implicitly tested when you test the higher level function.


Typically I would agree with you, but I've worked on a lot of old codebases with extensive testing for deep scary private parts. A good example is some public method which does something seemingly innocent and useful, but in its guts calls a bunch of black magic to deal with different operating systems and configuration options.


It sounds like the testing frameworks in Python must be very limited if this is true.


They're limited in the sense that they can't magically reach into a scope and inspect its private variables.

(Not technically true - deep code inspection is possible. That wouldn't be the same as unit testing, though.)


Why would you need to do this for a unit test? Can't you just test the properties of the return value of the outer function like you would for any other value? If the function returned an object, how would that be different?


OK, I think some wires got crossed. You can trivially create a unit test for the outer function that indirectly tests the output of the inner function. What the GGP meant was that you can't directly test the inner function because you can't access it and so can't call inner() from the unit test.

For example, given:

  def add(a, b):
      def inner():
          return a + b
      return inner()
You can write tests like:

  assert add(1, 2) == 3
But you can't write something like:

  assert add.inner() == 3
If inner() were doing something nontrivial, you'd probably want it to be standalone.


It's not about the testing framework, it's about not being able to access the inner function

http://stackoverflow.com/questions/7054228/accessing-a-funct...


It's true that people tend to force OOP onto python, especially when coming from other languages. But I think it's good to mention that in certain cases, keeping state by building a class is necessary, or at least easier than using a module-based approach.

  > The double checking costs a few CPU cycles but saves a lot of keystrokes.
Not sure I agree with this, idiomatic python should be explicit, even if it means more keystrokes.


Once Python got nested scoping, 'class' became redundant. It's a nuisance not just because of all the selves: say you started with

    def make_window(x1, x2):
        lo, hi = sorted((x1, x2))
        def window(x): return lo <= x <= hi
        return window
and decided to give the window a custom repr. You could write

        window.__repr__ = lambda: '<%s..%s>' % (lo, hi)
but this sort of thing gets less fun the further you take it. So you actually rewrite all of this code with 'self' and '__init__' and it gets much longer and if you're me you conceive an antipathy to classes. What I wish you could write is

    def make_window(x1, x2):
        lo, hi = sorted((x1, x2))
        def window:
            def __call__(x): return lo <= x <= hi
            def __repr__(): return '<%s..%s>' % (lo, hi)
        return window
BTW, 'foo hell' is too strong a term for most real foo in programming -- we don't need more encouragement to split into warring tribes.


You can sort of do this with type (and this probably makes python purists really angry, but I've done similar once or twice):

    def make_window(x1, x2):
      lo, hi = sorted( (x1, x2) )
      window = type('Window', (object,), {
        '__call__' : lambda _, x: lo <= x <= hi,
        '__repr__' : lambda _: '<%s..%s>' % (lo, hi)
        })
      return window()
That's actually making a new class called 'Window', inheriting from object, with a call and a repr special method. I still haven't personally decided whether it's gross or not, but I am pretty sure it's not pythonic.


Yeah, it's possible. My bigger point, which I should've made explicit:

Python with nested scopes and classes has two ways to make an object. (Originally it had only classes.) A Python-like language with just one way of defining objects, using nested scopes like my "def window:" above, would be simpler and more concise and more Pythonic (in the sense that there's one and only one obvious way to do it). It's obviously too late to call that language Python, which disappoints me.

When the obvious way to do it is very different depending on how many methods an object gets (a function for one, a class for more), refactorings like my example get kind of annoying.


The reason that is gross is that you are essentially creating a new class each time you call it, instead of just creating an instance.

Plus, if we really wanted to, we could do something insane, like

    def window_repr(self):
      return '<%s..%s>' % (self.__closure__[0], self.__closure__[1])
... but we are trying to program in python. We have standards.


With that approach, each call to ``make_window`` produces a distinct type, such that isinstance couldn't be used to compare two windows. That kind of breaks the idea of a "type", I think, as being a collection of values.


Python is my favorite language, but I think that explicitly having to pass self to methods was a huge mistake. It's a gotcha that catches even experienced programmers, the error message is confusing and unhelpful, it's verbose, and the one case it allows (calling the class directly instead of the instance?) is not generally that useful. It makes some metaprogramming stuff slightly cleaner sometimes, but that's not a great justification.


Well, look at JavaScript, where self (this) works exactly the same way (it's conceptually just another argument to a function) but is passed implicitly and declared as a parameter implicitly. People are not very happy with that either...


The problem with javascript's "this" isn't that it's implicit, it's that it acts in a surprising way because of the prototype-based type system. Normally you wouldn't expect a closure inside a method to introduce a new "this" variable, but in javascript it does. Other languages with an implicit "this" variable don't have that issue. In a language with a more traditional type system it's not really a problem.


Making this an explicit argument would remove that ambiguity, and JS isn't the only language to have done things that way, and to have thoroughly confused people by doing so.


The asymmetry in the Python signature is the confusing part at first: you call a method with N arguments, but the error will complain about N+1 being passed.


The author is right in that you should use iterators and generators whenever they are an elegant solution.

I do not think that you should use nested functions that often; as already mentioned, they are hard to test.

You get more readable and testable code, when you put the code in a new method or in another class instead. You end up with less lines per method and you can pass the data explicitly, getting rid of the self.


> they are hard to test.

I don't get this. Do you write tests for every private method in your class? That's totally unnecessary.


I don't test (2 + 2) to make sure it equals 4.


Huh, the last thing I would have expected to see on HN. My apologies for the generally muddled writing, this was one of my oldest posts.


I enjoyed it, and thought it was quite clear. I never use classes in my own Python code any more, treating it as a functional language, except when I'm interfacing with existing library code.


The body of your second version of __init__() in one line:

    self.min, self.max = sorted((min, max))

This may be what was meant with the sentence that claims "but eventually you'll need a line like this outside of init" but I'm not sure what that part means. (I generally support functional programming practices, which Python leaves a lot of room for).

Or, you could subclass tuple.
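The tuple subclass might look something like this (a sketch; __new__ is used because tuples are immutable, so the ordering can't happen in __init__):

```python
class Window(tuple):
    def __new__(cls, x1, x2):
        # Order the endpoints before the tuple is constructed.
        return super(Window, cls).__new__(cls, sorted((x1, x2)))

    def __call__(self, x):
        return self[0] <= x <= self[1]
```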


That line means that in some other method than __init__ you will likely need to write a line that refers to self attributes more than 4 times, and you will have the same issue, but this time it's not avoidable.


This is just a bad example. I would use a closure here for the simple fact that it's simpler. No need to create an extra class with __call__ magic when you can just return a closure.

Now, if you're doing something more complex, a class complete with self parameters is simply unavoidable.


It is a bad example. Or at least a contrived one. A class with one member, that member being __call__, would be an odd thing to write. Showing how to convert a class which should have been a function all along into a function is not hugely helpful.


Spoiler: he does not just advocate not using 'self'; he goes as far as advocating not using objects at all and using closures instead.


Anyone that wants to read more theory surrounding this should read a fantastic 2006 paper called 'Out of the Tar Pit'. It's available on the "papers we love" gh repo [1] and basically covers the issue of state and how it always adds avoidable complexity (and thus how you can avoid complexity by taking an alternative route).

[1] https://github.com/papers-we-love/papers-we-love/tree/master...


If it is too tiresome to write one more argument:

    class window:
        @staticmethod
        def __init__(min, max):
            window.min, window.max = sorted((min, max))

        @staticmethod
        def __call__(x):
            return window.min <= x <= window.max
Or if you are just bored of self:

    class window:
        @classmethod
        def __init__(cls, min, max):
            window.min, window.max = sorted((min, max))

        @classmethod
        def __call__(cls, x):
            return window.min <= x <= window.max
But my favorite is:

    window = lambda *args: (
        lambda (min, max)=sorted(args):
            type('', (), dict(min=min,
                              max=max,
                              __call__=lambda _, x: min <= x <= max))()
    )()


If "self" is a hell, it's such a super tiny small hell that I'm happy that's the worst thing someone has to say about it. I do think it's nice when it comes to class methods vs instance methods, being clear that there is nothing called "self" in the method, etc. Further, it's a nice indication of what OO is at a very very basic level - method organization that takes a self as a first argument.

I kind of like what Ruby did with "@" instead, but dislike some of the other syntactic choices.

Now, say, Python metaclasses -- there's a language feature I don't like.


Isn't there an issue in Python where closures don't work as expected? The following code will blow up:

https://gist.github.com/skatenerd/72281cadd2ae3e44d6cf

For some reason, stashing the variable inside of a list will fix the variable resolution..?

http://stackoverflow.com/questions/4851463/python-closure-wr...
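The gist isn't reproduced here, but a minimal reconstruction of the kind of code being described (names follow the discussion) would be:

```python
def make_accumulator():
    grand_total = 0
    def add(to_add):
        grand_total += to_add  # assignment makes grand_total local to add()
        return grand_total
    return add

add = make_accumulator()
try:
    add(1)
except UnboundLocalError:
    print("UnboundLocalError, as described")
```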


Good point. I think serious python people (not me, I had to search a bit) would understand the scoping well enough to recognise the gotcha.

Turns out the problem is the +=, perhaps it would be a bit clearer if written as:

    grand_total = grand_total + to_add
which introduces a new local variable called grand_total shadowing the outer one. So it's straightforward enough, the pitfall is missing the shadowing.

Note that you could use the outer grand_total without problems if you don't shadow it, e.g. using it as an rvalue (is that a pythonic term?) , and also note Python 3 provides a 'nonlocal' keyword to let you pull the outer variable into scope so it won't be shadowed.


The issue is with scoping, not the closure mechanism.

Python will by default shadow variables in an enclosing scope instead of overwriting them during assignment, to prevent accidental introduction of global state. You can use the "global" keyword to force overwriting the outer variable. The list hack also works because it is a mutable data structure, but I wouldn't recommend it.


What you actually want is "nonlocal". "global" will force the assignment to be on a global variable.


Interesting - I can read from the variable but I can't assign to it.


Python decides for each name if it refers to a local or non-local (or global) variable when the function is defined. If you assign to it, that fact is used to decide that the name refers to a local variable. Otherwise, it must refer to a global variable or a variable in some parent scope.

This means that in the following code, "x" will refer to a local variable for the whole function body:

  x = "foo"

  def f():
      print(x)
      x = 42

  f()
...so instead of printing "foo" (the value of the global variable) the print statement will raise an exception:

  UnboundLocalError: local variable 'x' referenced before assignment
If you meant it to refer to a global variable or something in the parent scope, simply put "global x" or "nonlocal x", respectively, at the start of the function. Again, that applies to the whole function body, so the assignment in this example would change the value of the global variable.


You can assign to it, you just have to be explicit about that being what you want to do. (Explicit being better than implicit is part of the python philosophy)


This is what "nonlocal" is for. That would make the original code work. And putting the value inside a list works because in that case you are mutating the list not assigning a new list to the variable.


http://melpon.org/wandbox/permlink/XsgfjNU0oTsj0GaN

nonlocal was added to fix this problem.

lists make a difference because they are a mutable container for other values, not because name resolution changes when you use them.
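A sketch of the nonlocal fix (Python 3 only; names follow the discussion above):

```python
def make_accumulator():
    grand_total = 0
    def add(to_add):
        nonlocal grand_total  # rebind the enclosing variable instead of shadowing it
        grand_total += to_add
        return grand_total
    return add

add = make_accumulator()
print(add(1))  # 1
print(add(5))  # 6
```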


The list solves the problem because it is okay to mutate an object from an outer scope, whereas the += tries to assign a new value to the name.

There are lots of other awful hacks:

    def a2(start):
        def a(v):
            a.v += v
            return a.v
        a.v = start
        return a


Okay, so if we're talking crimes against Pythonicity, how about avoiding self while still using a class?

    def Window(x1, x2):
        minimum = min(x1, x2)
        maximum = max(x1, x2)
        class _(object):
            def __call__(_, x):
                return minimum <= x <= maximum
        return _()
The trick here is that the object instance takes values from its creating context as closure variables, rather than constructor parameters. Hence, it doesn't need a constructor at all, or any self references. It doesn't even need to use the self parameter it implicitly receives - hence why I can call it _ even though Peter Norvig couldn't! I've also called the class _, partly because it's sort of throwaway, in that it's never referred to after the instantiation which immediately follows its declaration, and partly just to wind people up.

You can do this in Java too, using an anonymous class to avoid having to use a throwaway name, as long as the object you're creating implements an interface:

    public static Function<Double, Boolean> Window(double x1, double x2) {
        double minimum = Math.min(x1, x2);
        double maximum = Math.max(x1, x2);
        return new Function<Double, Boolean>() {
            public Boolean apply(Double x) {
                return (minimum <= x) && (x <= maximum);
            }
        };
    }

Neither of these are outlandish hacks; rather, they illustrate the fundamental connection between objects and closures.

In particular, the Java version is extra comical because the interface being implemented is functional, so there is a mechanical transformation (that my IDE will do for me!) which evaporates the anonymous class and uses a naked function, in the form of a lambda, just as the OP advocates:

    public static Function<Double, Boolean> Window(double x1, double x2) {
        double minimum = Math.min(x1, x2);
        double maximum = Math.max(x1, x2);
        return x -> (minimum <= x) && (x <= maximum);
    }


I like self. It's explicit.


An excellent counterpoint to object-oriented thinking.


> An excellent counterpoint to object-oriented thinking.

What exactly?

That you can write code that isn't OO? That's hardly a surprise, given that code has been written in other programming paradigms for 50 years, before and after the invention of great OO languages like Smalltalk.

Or that you can write good Python code without "self"? Also hardly a surprise, as the OO-ness has been tacked onto Python afterwards.

That's exactly the reason why there is the "self hell" in the first place.


> as the OO-ness has been tacked onto Python afterwards

This is a myth, if I recall correctly.

Python had the OO-ness from the very beginning. It might seem "tacked-on" if you dislike Guido's style of OO, though.


Well, they were added before the first public release (0.9.0 IIRC) but were not considered from the start, and if I remember Guido's write-up correctly, it was more of a design hack: somebody mentioned that, with Python's internal design at that time, it would have been easy to add some sort of classes.

I'm no expert in the internals of Python and its evolution, so I actually don't know how much that influenced the actual design of the classes later on but it always seemed plausible from what I've seen and also given the fact that Guido himself often says that the very first version didn't have a class statement.


Do you have a link to Guido saying that?


An extensive discussion on the early implementation of Python classes: http://python-history.blogspot.com/2009/02/adding-support-fo...

Or here in this also quite interesting talk about "21 Years of Python": https://www.youtube.com/watch?v=ugqu10JV7dk&t=49m50s


@staticmethod?



