Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Go’s map iteration order is random (nathanleclaire.com)
46 points by zenlikethat on April 27, 2014 | hide | past | favorite | 55 comments


It's completely unclear from this post whether the poster is a) a student who doesn't understand hash tables and is using the word "random" in its colloquial sense of "arbitrary", or b) someone who did know that hash tables store in a non-obvious order and is using the word "random" in its technical sense.

Given that the poster says that "you would think" the iteration order has anything to do with the insertion order, and doesn't show multiple iteration runs with different results, I'm rather inclined to assume the former. Is there evidence in the other direction?


Here: http://blog.golang.org/go-maps-in-action

"When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next."

Also, "Since Go 1 the runtime randomizes map iteration order, as programmers relied on the stable iteration order of the previous implementation."



Well, sort of. The hash table stores elements in a semi-deterministic order, dependent only on the keys and a per-map hash seed. Every time you iterate over it, you start at a randomly-chosen point in the table and iterate sequentially.

So although you can't rely on seeing elements in the same order on any particular iteration, the order itself is far from uniformly random.


To play devils advocate here... There's nobody saying a map has to be a hash map. It could be any sort of map.

Look at JavaScript. Those are just plain maps. Why should expecting sophisticated hashes for simple maps be the default?


JS Object is actually an ordered map, because they failed to specify whether it was ordered or not early on, and we ended up stuck with the particular map implementation IE used (there are now lots of applications out there that depend on getting back k/v pairs in the order they were added).


In the ECMAScript language spec, at least as far back as 1999, it quite specifically does say "unordered".


yes, but kevingadd was (almost certainly) talking about in practice. In fact, I vaguely recall reading that the V8 team originally wanted to abandon insertion-order enumeration for performance but couldn't because too much code broke on the web.

It's not quite insertion order, though, which is one of the issues with just specifying it in ES6 as it's implemented. If you insert with a numeric key (or a string that's strictly a number), I believe the JS engines don't agree on the order in which the numeric key will be enumerated in relation to the alphanumeric ones (v8 and spidermonkey will put the numeric ones ahead of all the keys, sorted ascending, then do the rest in insertion order, for instance).


I'm not saying it has to be a hash map either; just that the author's post is consistent with plain old hashtable behaviour (i.e. the results are not iterated in insertion order, nor in natural order). By posting just a single example of a map iteration, the author has not demonstrated that the iteration order is "random" nor even that it changes each time through (though from other posts here, I gather that it does).

Essentially, I'm not calling out the author's claim as unbelievable, I'm calling out the article for being unconvincing.


Another good example is C++, std::map is (usually) a red-black tree.


Hmm, seems like part of the artcile went over your head, especially the references to the intentional change of this behaviors from Go's designers.

The story with regards to random traversal is about hiding an implementation detail the users should not rely on.


I think this reflects a lot of the philosophy of Go (and, probably the goals of Google in supporting it). They didn't just want a language that was productive and shiny and new, but a language whose features would actively prevent coders (Google employees in particular) from writing bad code, sometimes even at the cost of brevity and coder convenience.

So in addition to this feature (actively preventing developers from depending on a particular hash map implementation), there are:

1. No implicit type conversions.

2. No assignment/increment operators as expressions (only as statements).

3. No unchecked array dereferences (though this is pretty standard in new languages).

4. You have to go out of your way to do manual locking.

5. A canonical whitespace style, and tools for enforcing it.

etc. Some of these are useful if you're careful, but the language assumes that you're not careful and will mess it up if you try doing something tricky.

This can be annoying, but it also creates this feeling that, if your code compiles, it doesn't have the same kind of hidden problems that, say C++ or Java code usually does, which is a big draw. (If I understand correctly, those are the other two main languages used at Google on the server-side.)


> ...if your code compiles, it doesn't have the same kind of hidden problems...

Except that Golang has the equivalent of C void*, or Java's Object. This is an issue with working with common data structures like Linked Lists or Trees as Go doesn't have generics. The data section of these in go are all interface{}, so they have to be cast to use the data. This cast can cause unexpected behavior if you're not careful.

EDIT: example[0] here. This is a linked list. PushBack takes an interface{}, and the ".Value" is also an interface{}. The ".(int)" and ".(string)" are casts to get the data out. The assignment to 'b' is a runtime error (not compile time). You can get the type using switch statements or reflection, but there are no compile-time checks for common data structures.

[0] http://play.golang.org/p/K6PpnOkOq6


You're blaming a language for a bug in your code.

interface{} is nothing like void* because it's type-safe. interface{} is (type, value) i.e. it remembers the type of the value it contains. void * is just a value.

Your code snippet has a bug because you didn't check the type assertion result. Here's a fixed version: http://play.golang.org/p/vlsXtEmgMs

In C a cast is unsafe because it doesn't tell you if you did something wrong.

In Go, type assertion is safe. It tells you if it succeeded and if you ignore it, it'll panic, informing you that you have tried to perform an illegal operation.

For more see: http://golang.org/ref/spec#Type_assertions


I was partially incorrect. I said "reflection" could look it up, but you can check it during the conversion as you said (from Effective Go[0]). But my original point (before the edit) still holds true. You need to convert any data stored in a common data structure implementations and you don't easily know its type. So you have to track the type of the data in that structure yourself. So if you pass around a tree, linked list, or whatever, it's not clear what the type is of the stored data by the type information.

But you are right, I'm new enough to Go to forget about the 'ok' check on the type conversion. Checking errors on any unsafe operations is really the best way to go. If you are unsure of the type of a structure you are getting, you really should be checking it.

[0] http://golang.org/doc/effective_go.html#interface_conversion...


> 1. No implicit type conversions.

There are implicit type conversions: http://golang.org/ref/spec#Assignability

> 4. You have to go out of your way to do manual locking.

I don't understand what this means. Go certainly does have a model that permits data races.


(...on the server-side?)


... whoops.

Fixed.


Kind of strange they don't just have a type for that. In Java we have unsorted ones (hashmap), linked ones that retain the sort order (linkedhashmap), and even automatically sorting trees (treemap / sortedmap). Honestly, considering users were relying on the key order being different from the hash order, the default type should have just been made the sorted or linked one and the type added just made unsortedmap or something. Seems like a failing of the language libraries that you have to code your own of something so basic


For those interested, this was added a few months before Go 1.0:

https://codereview.appspot.com/5285042/

This has proven surprisingly useful in cutting down on fragile tests, where order was relied on for a set of items, the code was later changed to insert in a different order, and then the test was exposed as never being right.

Also of note, if you range over map of comparable keys with text/template, the order is reliable: http://golang.org/pkg/text/template/


Perl started to enforce this in 5.18 due to security issues. This also gives wider latitude to the data structures used in implementations due to the looser guarantees.

http://onionstand.blogspot.com/2012/12/are-you-relying-on-ha...


A minor detail: The key order in Perl hashes was always arbitrary (as opposed to in the order the keys were added) but identical hashes had the keys returned (by keys(), values(), etc) in the same order before Perl 5.18 (released about a year ago) [1].

[1] https://metacpan.org/pod/release/RJBS/perl-5.18.0/pod/perlde...


Since everyone here seems to be confused: yes, the Go compiler actively randomizes the order of the keys beyond the unpredictability inherent in hash tables. If your function iterates over a map twice in a row without modifying it, you will get different orders.


That is empirically not true, at least under go 1.2.1. The poster's own code produces identical iteration results even when iterating the same unchanged Map multiple times in a single code execution.


>The Go language designers noticed that people were relying on the fact that keys were normally stored in the order they were added in, so they randomized the order in which the keys are iterated over.

Um, I'm just a first-year CS student here, but isn't that a core part of how hash tables work?

They can so efficiently store and lookup data because they essentially store contents in a list with the indices as the hashes of the content.

As such, one can never reliably predict the order of the keys in a hash table, by design. That's not a feature of Go.


Yes, but there's more going on. Simple hash tables use a fixed hash function-- the order of keys is unpredictable, but consistent between runs of your program. This allows algorithmic denial of service attacks-- if you have a web server that puts querystring parameters into a hash table, and you know what bucket a key will hash to, you can force collisions and O(n^2) runtime complexity.

Hash randomization avoids this attack. Go bug: https://code.google.com/p/go/issues/detail?id=2630


Hash randomization avoids the O(n) attack on collisions if the randomized seed is not detectable. But if you don't sort the output of keys, rather depend on the random seed to iterate over the keys, the seed provides no protection anymore for the O(n) attack. It is very exposed and a minor security risc for DDoS attacks.

Perl is bit better than Go here, but not as good as Python. The real problem is exposing too much information of the seed by the ordering of the keys, and not preventing the algorithmic explosion on collision attacks. Such languages should really rework their collision handling strategy and not rely on randomized seeds (which can be brute forced or easily guessed such as here) or even worse, by strengthening the hash function.

TL;DR: The poster has no idea about hash tables, what Go did, why they did it, what the problem is and what should be improved.

PS: His cygwin git blog post is also completely wrong. He uses msysgit with some msys openssl version and complains about cert warnings. Of course, he is not using cygwin.


Python solves that by generating salt on start that is used for hash generation.

This way it is still extremely hard to commence such attack and at the same time not lose performance due to randomizing elements on each iteration.

BTW: I'm not convinced this randomization even solves this vulnerability. The patch referenced doesn't look like the one that implemented this behavior. I suspect we are mixing up two different behaviors.


True, but since it works this way in practice in other languages most of the time, particularly for small maps, people come to depend on it, then get surprised when it isn't the case.

Might as well keep people aware that key ordering can't be relied upon.


If a map is implemented as a hash table then yes, that is a property. A map could also be implemented as a balanced tree, then sorted iteration order and range queries would be a feature to expose.


Just tried go1.2.1. Hashmap iteration order was not random, it was same every time. Same happens with Go playground: (http://play.golang.org/p/px7C_lCn6d).

I got this output by running same locally (go1.2.1.darwin-amd64-osx10.8.pkg):

  $ go run test.go 
  There are 500 views for unix
  There are 300 views for python
  There are 100 views for go
  There are 4000 views for javascript
  There are 800 views for testing
  There are 500 views for unix
  There are 300 views for python
  There are 100 views for go
  There are 4000 views for javascript
... same order is repeated forever.


The Go playground and tour won't generate random numbers: "Note: the environment in which these programs are executed is deterministic, so rand.Intn will always return the same number."

Could this cause iteration order to be the same when executed there?


Perhaps, but I'm getting analogous results when running exactly same code locally.


The ordering is only quasi-random, depending on how keys are distributed across hash buckets. Your particular example will no longer have a consistent iteration order starting in Go 1.3.

https://code.google.com/p/go/issues/detail?id=6719


So random order in Go1.3. Sounds good to me.

Tried anyways with ten items, the results are interesting:

http://play.golang.org/p/dVZvmBOu_g

With ten items, it seems to randomly (?) alternate between two iteration orders.

Local Go1.2.1 has similar behavior.


There are at least 4 different types of ordering behavior in a map:

1. Returning in insertion order. Quite strange thing to do as it requires extra bookkeeping.

2. An order unrelated to insertion order that is consistently the same for a given set of keys. Using a hash it would be sorting by the hash value, in a balanced tree, sorting by the value itself.

3. An order unrelated to insertion order and that can be different for the same set of keys but is always the same on repeated iterations on the same map. This could be a hash table where on creation a random salt is generated on creation and hashed together.

4. An order that actually changes on every iteration. Is there an efficient way of doing this?

From the discussion here and some quick testing:

Type1: Ruby 1.9.3 (very strange)

Type2: Ruby 1.8.3, python 2.7.3, Perl <5.18

Does anyone know if Go and Perl 5.18 are doing 3 or 4? Ruby >1.9 behavior is also quite strange but seems to be intended, they've implemented a hash together with a doubly linked list[1].

[1] http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-h...


People needs to stop confusing "random" with "non-deterministic". It is a stated attribute of a Hash Map (leaving aside special classes, such as a Tree Map), that iteration of keys will occur in a non-deterministic order.

That is different than random. Random implies that iterating the same Map twice (without intervening updates) will produce different results. I've never met a Hash Map for which that was true.

The irony is that the poster's own code violates the point he was trying to make -- when run under go 1.2.1, it produces insertion order results every time (at least for me). I've also never met a Hash Map (until this example in Go) which predictably produced insertion order key iteration. I am sure even here, it is mere coincidence.

I do believe the poster lacks a formal understanding of data structures. My apologies if that is incorrect, and I am missing his point.


Yes, you are missing the point.

In Go, iteration order of keys was always non-deterministic.

At some point it was changed to be also random i.e. iterating the same hash the second time would produce a different sequence of key/values than the first time.

This is exactly what the article says, except using more words.


Listen. Order of key iteration is well understood attribute of a Hash Map, and it's always non-deterministic. That's what I have said, and I am not missing the point. The poster tries to imply an insertion order iteration in hash maps that has never existed.

My second, arguably more interesting point, is that the poster's own code violates the premise of his post.


Order of key iteration in a Hash Map is not non-deterministic, unless the hash is using a random salt. Unless you are using a random salt for each Hash Map, hashes can be precomputed and will always be the same for the same input of keys, therefore deterministic even though it may look non-deterministic in nature.

On the other hand, I agree that the author of the article, does not seem to understand the underlying data structures. Either that, or the way he has written the article portrays lack of understanding.

It is quite possible that pre 1.0 they may not have been using a hash map at all, and instead they were using an ordered map, which would have given an insertion order iteration.

Although note this is pure speculation as I have not tested this on pre 1.0, and in fact you may be absolutely correct that he is implying an iteration order that never existed.

What should be noted though is this: https://code.google.com/p/go/issues/detail?id=6719. It looks like that as of Go-1.3, there will be a sort of semi-random iteration, where while iterating each bucket in the hash map will be iterated over in increasing or decreasing order, chosen at random. Which is good as it is not too much of a performance hit, with the benefit that the iteration order will be non-deterministic for small maps, which is currently not the case.

EDIT:

Here is an explanation of iteration over Maps in Go <1.3 and >1.3:

Map iteration previously started from a random bucket, but walked each bucket from the beginning. Now, iteration always starts from the first bucket and walks each bucket starting at a random offset. For performance, the random offset is selected at the start of iteration and reused for each bucket.


What? Is unordered traversal of a hashmap unexpected? I would be surprised if it did anything different.


In Java, you could write an unittest that inserts 10 fixed items in a hashmap, checks if the first one is "foobar", and it'd always pass, since the order is predictable - it's essentially sorted by the fixed hashfunction of those keys.

In Golang, it'd change between each iteration.


With uniquely keyed hashes for each hashmap, all environments that iterate in hash bucket order should have effectively truly random iteration order: across equal copies of the same hash map, each copy should have its own random order.

Python 3 dicts iterate in random order; random across invocations of the interpreter in this case.


Ok here is a correct clarification of what is happening when iterating over maps in Go. The author of this article doesn't seem to understand what is going on.

Ever since Go 1.0, they implemented a feature where when iterating over a map it will choose a random bucket, in the map, to start from and iterate from the start of each bucket to the end of the bucket in order and then move to the next bucket in order. The non-deterministic nature in this stems from the initial bucket to start from, which was chosen at random. This means that the number of different possible iterations of the same map is equal to the number of buckets in the map.

This leads to a problem with small maps (those less than 8 items). Because the max size of a bucket is 8 items, if a map has less than 8 items there will be only one ordering, and therefore for small maps (those with less than 8 items) the programmer could still rely on the iteration order of the map, which is not desired.

So along came this issue: https://code.google.com/p/go/issues/detail?id=6719 which seems to have a fix implemented in Go version 1.3. (Here is the revision log for the committed change: https://code.google.com/p/go/source/detail?r=6ea672cbfe43).

So as of Go version 1.3, the behaviour will be as follows. Buckets will be iterated over from bucket 1 to bucket N. But within each bucket iteration will start at a random index and then iterate through the bucket. For performance reasons, the random starting index for within each bucket will be determined at the start of iteration and will be the same for every bucket. Iteration over a map with 8 or fewer elements, a single bucket, will now be non-deterministic. It should be noted that this also means that the max number of possible different iterations of a map is 8, but that does not matter as it solves the problem of having programmers relying on iteration order no matter what size the map is (except for size 0 and 1).


Wait, what exactly is happening? Do they randomize iterations or just use a hash algorithm that will spread the keys more randomly into the table? The former would lead to cache misses on big hash tables, while the latter would still performs the same, but it wouldn't be random. It just would be more obviously not the insertion order.


I'm lost after reading this. Is the order actually randomized, or does it merely reflect the underlying storage order?


Put this in a test.go file locally and run it repeatedly with "go run test.go":

    package main

    func main() {
        x := map[string]string{"A": "", "B": "", "C": "", "D": "", "E": ""}
        for key, _ := range(x) {
             print(key + "\n")
        }
        print("\nNext iter:\n")
        for key, _ := range(x) {
             print(key + "\n")
        }
    }
You'll see the order can change even within a single run. It does not appear to be uniformly random; I had to put 5 keys in because it happened far less often than expected with 3 keys, but you can see randomization often enough with this.

Normally, I'd link to the Go Playground, but this doesn't manifest there "correctly", probably due to caching of the results.


Iteration is in storage order, but the start point for that iteration is chosen at random.


I think this is a good thing to get right in a new language. The .NET hash function for dictionaries is fixed because of this problem -- there is too much legacy code which depends on the ordering of that particular hash function to change it.


Nice post. I admire the tenacity of the golang team to keep pushing this language forward. Adoption was slow at first, and Google can be somewhat quick to drop projects as we all know, so it's nice to see this one moving on.


        fmt.Println("Key:", k, "Value:", m[k])
Interesting that, given the assumption that it makes regarding the implementation of the method. (Hash order is order just the same.)


Fuzzing as a language feature!

Hmm... What other things could be done like this, I wonder?


Either is is hash mapped and the order should be based on the hash value.

Or golang wastes shitloads of good CPU power to help a few developers to write correct loops, help the developers so they don't get blamed for their faulty loops when shit goes haywire.


Or you missed some tricks on writing hash maps.


Thats not surprising at all to me.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: