I like that method of defining a named tuple much more. I understand why it's required, but having to duplicate the name of the named tuple always bothered me. The type hints are excellent to have too.
It really shouldn't. This is actually an example of pickle being broken.
The class exists, there is a valid reference to it, Python itself knows about it, the class knows its own name, and namedtuple generates the right __getnewargs__. Everything is there for this to "just work", but pickle expects that every class object will have a reference to it of the same name, which is kinda weird when you think about it.
You can see it with this stupid little program.
    from collections import namedtuple
    import pickle

    def get_all_subclasses(cls):
        all_subclasses = []
        for subclass in cls.__subclasses__():
            if subclass != type:
                all_subclasses.append(subclass)
                all_subclasses.extend(get_all_subclasses(subclass))
        return all_subclasses

    E = namedtuple('EEEEEEEEEEEEEEEEEEE', 'x')
    e = E(x='hello')

    for cls in get_all_subclasses(object):
        print(cls)

    pickle.dumps(e)
You'll see that the class is there and called the right thing! But pickle tries to look up a reference to it under __main__.
There’s a difference between the class's __qualname__ and the requirement that there be a reference to the class at, say, __main__.$qualname, which is how pickle actually finds the class object.
This is the part that, to me, is really odd because pickle knows (in theory) the class and module name of the thing it needs to instantiate and the class objects themselves know their name and module.
Setting aside the inefficiency of actually doing this, you could just enumerate all class objects to find the one with the right name and module name.
You get nested classes for free this way and you break the requirement that there be references to class objects of the same name in the module.
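To make the point concrete, here's a sketch showing that creating the one reference pickle expects, under the class's own __module__ and __qualname__, is all it takes for pickling to start working:

```python
import pickle
import sys
from collections import namedtuple

E = namedtuple('EEEEEEEEEEEEEEEEEEE', 'x')
e = E(x='hello')

# pickle records (module, qualname) and then requires that
# getattr(sys.modules[module], qualname) return the class.
try:
    pickle.dumps(e)
    failed = False
except Exception:
    failed = True
assert failed  # no attribute named EEEEEEEEEEEEEEEEEEE exists anywhere

# Creating the reference pickle expects is enough to fix it.
setattr(sys.modules[E.__module__], E.__qualname__, E)
assert pickle.loads(pickle.dumps(e)) == e
```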
That's awesome and I was not previously aware of it. Literally a decade ago I was asking on SO about exactly this kind of use-case [1] and at the time the answers were pretty unsatisfying; it's great to hear that the story is better now.
If that's all you're using your classes for, then a named tuple is probably a better solution, or a dataclass. Though I normally just use dicts in that situation. If I see someone create a class without any methods, or at least planned methods, I don't let it through code review.
EDIT: Also, Raymond Hettinger created named tuples. I'm not normally one for appeal to authority, or hero worship, but I am a huge fan of his. I recommend that anyone interested in Python should watch as many of his talks as they can.
EDIT2: As masklinn pointed out, another really good use of named tuples is when you're already returning a tuple, and you realize it would be better if it had names. You could change it to a named tuple without breaking any of the existing code. Unless they're doing something dumb like halfassing type checking at runtime. (This use case is in the article, which I didn't read at first.)
Well in and of itself none, in the sense that anything a namedtuple can do you could do by hand (it really just defines a class). However namedtuple:
* extends tuples, so a namedtuple is literally a tuple (which is useful)
* sets up a bunch of properties for the "named fields", which are basically just names on the tuple elements
* sets up a few other utility methods e.g. nice formatting, `_make`, `_asdict`, `_replace`
Now the latter two are nice, and mostly replicated by dataclasses (or attrs). The first one is the raison d'être of namedtuples though: originally their purpose was to "upgrade" tuple return values into richer / clearer types. E.g. urlparse originally returned a 6-tuple, which is not necessarily super wieldy / clear: you can probably infer that the 3rd element is the path but… after upgrading to namedtuple it's just `result.path`, which is usually much clearer.
And because namedtuples are still classes in and of themselves, you can inherit from them to create a class with a `__dict__` with relative ease.
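A quick sketch of that (the names are mine): subclass the generated namedtuple without declaring __slots__ and the instances pick up a regular __dict__, so you can even attach ad-hoc attributes:

```python
from collections import namedtuple

_PointBase = namedtuple('_PointBase', 'x y')

class Point(_PointBase):
    # No __slots__ here, so instances get a __dict__.
    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
assert isinstance(p, tuple)          # still literally a tuple
assert p.norm() == 5.0
p.label = 'origin-ish'               # works because of __dict__
assert p.__dict__ == {'label': 'origin-ish'}
```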
I feel like none of the sibling answers actually answer your question, to which the answer is "absolutely nothing." The function namedtuple is a code generator that constructs a class definition as source text and then exec()'s it.
The reason you reach for it is that it's tedious to write the same methods over and over to get things like a nice repr, methods to convert to and from dicts, or pickling support.
The source from Python 3.6 is much more readable than 3.9 so I recommend reading that if you want to see how it works.
NamedTuple or namedtuple instances are tuple instances that have the same properties that regular tuples have. They are immutable (you cannot reassign their fields), you can index into them (a[0], a[1] instead of a.x and a.y), you can unpack them with *a. They can have methods like regular classes can, including additional @property methods. A NamedTuple class cannot inherit from another class, not even other named tuples.
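All of which you can check in a few lines (Point and magnitude are illustrative names):

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

    @property
    def magnitude(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3.0, 4.0)
assert p[0] == p.x == 3.0        # indexable like any tuple
x, y = p                         # unpackable with *
assert (x, y) == (3.0, 4.0)
assert p.magnitude == 5.0        # extra @property methods work
try:
    p.x = 1.0                    # immutable: fields can't be reassigned
except AttributeError:
    pass
```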
You can make an env.py, override the module's __getattr__, and then import environment variables just like they're regular Python objects (even booleans, floats, collections, etc., despite .env files being string-only).
Huge force multiplier for ML, because then you can do hyperparameter optimization just by passing different environment variables in an outer loop (even inside your infrastructure as code).
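A minimal sketch of that trick, assuming a hypothetical env.py that uses PEP 562's module-level __getattr__ plus ast.literal_eval to parse values; the module is built in-memory here so the example is self-contained:

```python
import ast
import os
import sys
import types

# Hypothetical env.py source: a module-level __getattr__ (PEP 562)
# that resolves any attribute from the process environment.
ENV_SOURCE = '''
import ast, os

def __getattr__(name):
    try:
        raw = os.environ[name]
    except KeyError:
        raise AttributeError(name)
    try:
        return ast.literal_eval(raw)  # True, 3.14, [1, 2], ...
    except (ValueError, SyntaxError):
        return raw                    # fall back to a plain string
'''

os.environ['LEARNING_RATE'] = '0.01'
os.environ['USE_DROPOUT'] = 'True'

env = types.ModuleType('env')
exec(ENV_SOURCE, env.__dict__)
sys.modules['env'] = env

# Environment variables now import like ordinary Python objects.
from env import LEARNING_RATE, USE_DROPOUT
assert LEARNING_RATE == 0.01
assert USE_DROPOUT is True
```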
A little unrelated, but this brings up a question I've had for a while.
Seems like one day, everyone around me was using dataclasses. I had not even heard of them. It felt like I had missed some memo or newsletter. It felt weird.
Here's my question: what should I have been reading / where should I have been "hanging out" online, so that I would have known that dataclasses were a thing? What are your go-tos for news about new language features, libraries that everyone is using, etc?
Hacker news is great, but it doesn't quite fill that need for me, it seems.
For Python, you pretty much just need to be aware of when the new major version is released because the "what's new" pages are pretty good. Here's the one in which data classes were released: https://docs.python.org/3/whatsnew/3.7.html
Concretely, I found out about dataclasses by using [pydantic](https://pydantic-docs.helpmanual.io/) and seeing their drop-in `@dataclass` annotation - it got me curious about the adjacent stdlib class. I was using pydantic because I started using FastAPI to build a REST interface, which has pydantic deeply integrated.
Generally, I find out about new features through PEP posts, and I reach those by seeing a keyword that I don't know in random code I read online
I follow the language specific sub-reddits, and I read release notes for major releases of languages (so for python that would be 3.X) even if I wasn't going to jump to the version set.
> I read release notes for major releases of languages
_This_, so much.
If you are a heavy user of a language / library, it's immensely helpful to look at the release notes every once in a while. Even if you don't plan to upgrade now, it gives you an idea of where things are going (and may eventually tip the scales to a "fuck it, it's now worth upgrading" moment).
I found out about data classes on hn, before they were in the standard library. I also regularly search for python to see what stories I missed.
I also like to keep up to date with the PyCon videos, as well as some of the other Python conferences. But, as others have said, the release notes are the main source for what's new, if a bit dry.
That said, I never actually use data classes. I normally just use dicts, and occasionally named tuples.
I read the release notes. I often see posts for releases here on HN or on reddit, but often I will check in on the official repos or websites to see what's new.
I like to spend a few hours a week reading up on what's happening, or trying something new to keep up. Checking out new language features is part of that process for me.
Dataclasses and Enums have, since their introduction, taken over as my foundation of Python data structures. They've obsoleted NamedTuple, namedtuple, and traditional classes in my code.
They're a clean way of defining how different functions, methods, etc. should interact with each other.
Y'all may be interested in a fast dataclass-like library I maintain called msgspec (https://jcristharif.com/msgspec/) that provides many of the benefits of dataclasses (mutable, type declarations), but with speedy performance. The objects are mainly meant to be used for (de)serialization (currently only msgpack is supported, but JSON support is in the works), with native type validation (think a faster pydantic).
Mirroring the author's initialization benchmark:
    In [1]: import msgspec

    In [2]: from typing import NamedTuple

    In [3]: class Point(msgspec.Struct):
       ...:     x: int
       ...:     y: int
       ...:

    In [4]: class PointNT(NamedTuple):
       ...:     x: int
       ...:     y: int
       ...:

    In [5]: %timeit Point(1, 2)
    48.4 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [6]: %timeit PointNT(1, 2)
    185 ns ± 0.851 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
After your example, `Point` is a class. You are not creating a record, you’re creating the class that defines the schema of fields in any record. A class can be defined without using the `class foo:` syntax, and in this case the class is returned by the function `namedtuple` to produce an equivalent effect.
As for the unusual space-delimited syntax, the missing context here is that namedtuple is a very, very old part of Python that predates the conventions now considered good style. Using space delimiters for lists of strings is a common idiom in Perl scripting due to the `qw()` quote syntax. Note the archetypical context where namedtuple was imagined to apply (record-oriented processing of logs and SQL result sets) was commonly handled using Perl before Python became dominant.
Namedtuple is definitely the most prominent example of this syntax convention in Python, but other libraries use it too. `enum.Enum` supports a function-like interface directly modeled off namedtuple. It’s a mildly bad idea to keep using it IMHO because it complicates static analysis or refactoring. If someone does a wide search-and-replace of a field name literal it’s easy to miss this edge case.
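For reference, the Enum functional API the comment mentions looks like this (Color is an illustrative name):

```python
from enum import Enum

# Functional API modeled on namedtuple: members from a
# space-separated string, with values auto-numbered from 1.
Color = Enum('Color', 'RED GREEN BLUE')
assert Color.RED.value == 1
assert Color['BLUE'] is Color.BLUE

# The class-based equivalent, friendlier to static analysis
# and to search-and-replace refactors:
class Color2(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3
```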
> a very, very old part of Python that predates the conventions now considered good style. Using space delimiters for lists of strings is a common idiom in Perl scripting due to the `qw()` quote syntax.
Note that the Smalltalk-80 class creation method also expects a space-separated string of instance variable names, and that's an environment considerably older than Perl.
The function was defined to either take a space separated list of names or a sequence of names. The docs seem pretty clear to me:
> The field_names are a sequence of strings such as ['x', 'y']. Alternatively, field_names can be a single string with each fieldname separated by whitespace and/or commas, for example 'x y' or 'x, y'.
It's a class factory function, so right off the bat it's a bit weird. The original intent of using spaces was probably to minimize typing. Since they're attribute names they can't have spaces in them, so it's a safe delimiter.
You could imagine the function dynamically creates the class by manipulating the underlying dictionary (or whatever the "slot" alternative uses). At that level of python, attributes are strings anyway. Handling spaces is just a matter of calling .split().
In modern python, there's a whole metaclass system that would possibly let you do the equivalent without getting your hands dirty with internal data structures.
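A toy sketch of what such a factory can look like without exec, using type() and property, just to show that at this level field names really are just strings to .split() (tiny_namedtuple is my name, not anything in the stdlib):

```python
import operator

def tiny_namedtuple(typename, field_names):
    # Accept 'x y', 'x, y', or a sequence of names, like namedtuple does.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    ns = {
        '__slots__': (),
        '__new__': lambda cls, *args: tuple.__new__(cls, args),
    }
    # Each "named field" is just a property reading a tuple index.
    for i, name in enumerate(field_names):
        ns[name] = property(operator.itemgetter(i))
    return type(typename, (tuple,), ns)

Point = tiny_namedtuple('Point', 'x, y')
p = Point(1, 2)
assert (p.x, p.y) == (1, 2)
assert p == (1, 2)   # still an ordinary tuple underneath
```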
Yeah, I think more time is wasted in confusion and arguing about style than is saved in keyboard strokes.
There's definitely a class of persnickety coders out there though. As a technical leader within a growing organization, sometimes the bulk of my time spent in a code review turns into style guide enforcement. It can get old arguing about the subtle merits of someone's preferred but style-violating syntax over and over, especially when all I care about is maintaining a standard of consistency.
Using a list or tuple for the fields is generally best:
    Point = namedtuple('Point', ('x', 'y'))
Support for a space and/or comma separated strings was requested by users. It made life easier for them when syncing with other space/comma separated strings. For example, an SQL query, "SELECT name, rank, serial_number FROM Soldiers;" would have a corresponding named tuple where the field names could be cut-and-pasted from the SQL query.
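A rough illustration of that workflow (Soldier is a hypothetical example matching the query above):

```python
from collections import namedtuple

query = "SELECT name, rank, serial_number FROM Soldiers;"

# Field names pasted straight from the SQL query, commas and all:
Soldier = namedtuple('Soldier', 'name, rank, serial_number')
assert Soldier._fields == ('name', 'rank', 'serial_number')

row = Soldier('Kilgore', 'Lt. Colonel', 1234)
assert row.rank == 'Lt. Colonel'
```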
I think it comes down to the idea that going out of your way to make the library work either way makes it easier for people to use, even if it makes the library itself a bit more complicated.
I wish more library devs would go out of their way to add such niceties.
A big one that I always do is if I'm expecting an iterator of objects I make it just work with one.
    from collections.abc import Iterable

    def my_function(arg):
        # slightly different if you're looking for a collection of strings or bytes
        if not isinstance(arg, Iterable):
            arg = [arg]
        for item in arg:
            do_the_thing(item)
Or if you have a specific type of object you want, it goes like this:

    def my_function(arg):
        if isinstance(arg, MyObjectIWant):
            arg = [arg]
        for item in arg:
            do_the_thing(item)
I like to think of my libraries as mini programs for users, and I hate when validation is too strict, when it could be so easy to fix. Like when a phone number validator insists on (XXX)XXX-XXXX or XXX.XXX.XXXX or XXXXXXXXXX when it could just ignore everything that isn't a number and make sure there are 10 of them.
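That lenient phone check is a one-liner with re (normalize_phone is my name for it):

```python
import re

def normalize_phone(raw: str) -> str:
    """Ignore everything that isn't a digit; require exactly 10."""
    digits = re.sub(r'\D', '', raw)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits

assert normalize_phone('(555) 123-4567') == '5551234567'
assert normalize_phone('555.123.4567') == '5551234567'
assert normalize_phone('5551234567') == '5551234567'
```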
This sounds like a nice idea in theory, and makes a lot of sense for polished, publicly visible libraries where convenience trumps simplicity, but the edge cases can lead to confusing failures and bloat otherwise simple code — as you noted, your example code appears to work for arbitrary objects but actually fails for `str` or `bytes`.
A great case study in the issues here is Pandas, which routinely allows arguments to be columns, lists of columns, string column labels, lists of string column labels, and so on. It works surprisingly well, but at the cost of inventing a new semantic distinction between `list` objects and other sequence types like `tuple` — someone unfamiliar with Pandas who thinks “Why does this need to be a list comprehension when a generator expression will do?” is likely introducing a bug.
Another subtle issue is that code permissive with inputs is harder to extend via wrapper code. Suppose you have a function that does some sort of processing for any number of given datetimes, but also accepts integer seconds since 1970-01-01, a formatted date string, or any mixed sequence of these types. If you need to write a wrapper that first rounds all times to the most recent hour, your task is much easier if the only accepted type is `Iterable[datetime]`.
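For what it's worth, the str/bytes pitfall is easy to guard against explicitly if you do go the permissive route; a sketch:

```python
from collections.abc import Iterable

def ensure_iterable(arg):
    # str and bytes are iterable, but almost never what the caller
    # meant by "a collection of items" -- treat them as scalars.
    if isinstance(arg, (str, bytes)) or not isinstance(arg, Iterable):
        return [arg]
    return arg

assert list(ensure_iterable('abc')) == ['abc']   # not ['a', 'b', 'c']
assert list(ensure_iterable([1, 2])) == [1, 2]
assert list(ensure_iterable(42)) == [42]
```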
I’d speculate it’s meant to mimic Perl’s `qw()` operator, which is like `str.split()` in Python. The module was originally written for contexts where you’re processing SQL result sets with fixed schemas, and before Python these tasks were traditionally handled in Perl. Python inherits a lot of these loose traditions. Similarly, some parts of the standard-lib (`sys`, `os`) follow shell- or C-like naming conventions that would seem bizarre to someone who’s never used a shell prompt.
With a non-trivial example that uses readable attribute names, a single, long, space-delimited string becomes more a burden than a convenience, I think. Also, the amount of time saved typing is minuscule in comparison to all the rest of the development work that'll happen.
> With a non-trivial example that uses readable attribute names, a single, long, space-delimited string becomes more a burden than a convenience, I think.
For a suitable definition of “non-trivial” and “readable” (where the former is “long list of attributes” and the latter is “long attribute names”), I’d agree, but plenty of real, serious namedtuple use is for namedtuples with small numbers of short attribute names, and those are more readable (in the literal sense) this way.
OTOH, for the nontrivial, static uses, you probably want to skip right past namedtuple() with a static list of string literals for names and go straight to typing.NamedTuple with its dataclass-like syntax, including type hints, since it's more readable and also supports typing.
> Also, the amount of time saved typing is minuscule in comparison to all the rest of the development work that'll happen.
Sure, but if you start passing on providing (or, on the other side, using) conveniences because each is small in isolation, the aggregate cost ends up being high.
They are dynamically creating a named tuple class (or prototype). The namedtuple implementation in the Python standard library indeed accepts a space-separated list of field names. Once it's been defined (as above), you can create instances (records) like Point(2, 3).
> Are you dynamically creating a named tuple (a record?)
No, it's creating a namedtuple type, aka a subclass of `tuple`. So the field names are literally the names given to the tuple's items: Point is a pair (a 2-tuple) whose 0th item can be accessed as `x` and whose 1st item as `y`.
> Are you dynamically creating a named tuple (a record?) by passing a space separated list of field names? Why?
Very little in python is bound statically. This is akin to a type definition. The type will behave as an ordered tuple that can be indexed but also alias these attribute names to those ordinals.
> Are you dynamically creating a named tuple (a record?) by passing a space separated list of field names? Why?
I believe that's called "procedural record interface" in Scheme and it does have its uses, for example if you need to create records for data the structure of which you don't know in advance.
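For example (a sketch using a CSV header discovered at runtime):

```python
import csv
import io
from collections import namedtuple

# The record structure isn't known until the data arrives:
# build the record type from the CSV header at runtime.
data = io.StringIO("name,age\nAda,36\nAlan,41\n")
reader = csv.reader(data)
Row = namedtuple('Row', next(reader))   # field names from the header
rows = [Row(*r) for r in reader]

assert rows[0].name == 'Ada'
assert rows[1].age == '41'              # csv yields strings
```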
Speaking of namedtuple, I would encourage anybody who uses Python and wants to learn a thing or two to read the source code for them. At least one of the things you learn should probably fall in the "what not to do" category. There's a lot going on in there to support all that magic you see from the outside, and it's a little scary in there.
I have been spending an embarrassing amount of time trying to merge a one-off version of dataclasses with dataclasses to pick up the codegen based on type hints. This stuff is nasty and subtle under the covers. I would rather be dealing with a real closure or with real compositional capabilities. Dataclasses are gross in the weeds.
For example, y = dataclass(x) mutates its argument; afterwards, y is x.
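Easy to verify:

```python
from dataclasses import dataclass, is_dataclass

class Plain:
    x: int = 0

Decorated = dataclass(Plain)
# dataclass() modified Plain in place and returned the same object:
assert Decorated is Plain
assert is_dataclass(Plain)
```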
I thought I'd add to this (because I was surprised) that the equivalent namedtuple is also 56 bytes (I expected it to be larger) and the equivalent dataclass is a mere 48 bytes. (although there's overhead for defining a namedtuple or a dataclass, on the order of a constant 1kb).
Although there seems to be some kind of trickery happening there, because if I make the class accept 3 floats instead of two, neither the class nor the instance get larger.
> Although there seems to be some kind of trickery happening there, because if I make the class accept 3 floats instead of two, neither the class nor the instance get larger.
The dataclass stores stuff in a normal instance __dict__ and sys.getsizeof is not recursive.
namedtuples are variable-size instances, they store the attributes “inline”.
sys.getsizeof is not recursive so that's just the size of the collection itself, excluding the stuff it references (so both keys and values are additional, here the strings are literal so they're interned and part of the program constants).
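You can see the non-recursive behaviour directly:

```python
import sys

short = ["a"]
long_ = ["a" * 10_000]

# Both lists hold exactly one pointer; the referenced string's
# size isn't counted, so the two lists report the same size.
assert sys.getsizeof(short) == sys.getsizeof(long_)

# The big string itself is a separate (large) allocation.
assert sys.getsizeof("a" * 10_000) > 10_000
```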
Not to mention that the first case gets assistance from autocomplete in an IDE or ipython session. That speeds up typing long, descriptive names so much.
In cases where you don't care about immutability, I'd think of it as a better version of TypedDict (though TypedDict still has its place). It makes my IDE more helpful, makes my code more self-documenting, and allows mypy to tell me when I'm being dumb.
I guess one difference is that when you inspect it, it doesn't indicate that it is a `point`, just that it's a named tuple of two variables, so it's not exactly equivalent.
https://docs.python.org/3/library/typing.html#typing.NamedTu...
This is equivalent to: