It's completely unclear from this post whether the poster is a) a student who doesn't understand hash tables and is using the word "random" in its colloquial sense of "arbitrary", or b) someone who did know that hash tables store in a non-obvious order and is using the word "random" in its technical sense.
Given that the poster says that "you would think" the iteration order has anything to do with the insertion order, and doesn't show multiple iteration runs with different results, I'm rather inclined to assume the former. Is there evidence in the other direction?
"When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next."
Also, "Since Go 1 the runtime randomizes map iteration order, as programmers relied on the stable iteration order of the previous implementation."
Well, sort of. The hash table stores elements in a semi-deterministic order, dependent only on the keys and a per-map hash seed. Every time you iterate over it, you start at a randomly-chosen point in the table and iterate sequentially.
So although you can't rely on seeing elements in the same order on any particular iteration, the order itself is far from uniformly random.
JS Object is actually an ordered map, because they failed to specify whether it was ordered or not early on, and we ended up stuck with the particular map implementation IE used (there are now lots of applications out there that depend on getting back k/v pairs in the order they were added).
yes, but kevingadd was (almost certainly) talking about in practice. In fact, I vaguely recall reading that the V8 team originally wanted to abandon insertion-order enumeration for performance but couldn't because too much code broke on the web.
It's not quite insertion order, though, which is one of the issues with just specifying it in ES6 as it's implemented. If you insert with a numeric key (or a string that's strictly a number), I believe the JS engines don't agree on the order in which the numeric key will be enumerated in relation to the alphanumeric ones (v8 and spidermonkey will put the numeric ones ahead of all the keys, sorted ascending, then do the rest in insertion order, for instance).
I'm not saying it has to be a hash map either; just that the author's post is consistent with plain old hashtable behaviour (i.e. the results are not iterated in insertion order, nor in natural order). By posting just a single example of a map iteration, the author has not demonstrated that the iteration order is "random" nor even that it changes each time through (though from other posts here, I gather that it does).
Essentially, I'm not calling out the author's claim as unbelievable, I'm calling out the article for being unconvincing.
I think this reflects a lot of the philosophy of Go (and, probably the goals of Google in supporting it). They didn't just want a language that was productive and shiny and new, but a language whose features would actively prevent coders (Google employees in particular) from writing bad code, sometimes even at the cost of brevity and coder convenience.
So in addition to this feature (actively preventing developers from depending on a particular hash map implementation), there are:
1. No implicit type conversions.
2. No assignment/increment operators as expressions (only as statements).
3. No unchecked array dereferences (though this is pretty standard in new languages).
4. You have to go out of your way to do manual locking.
5. A canonical whitespace style, and tools for enforcing it.
etc. Some of these are useful if you're careful, but the language assumes that you're not careful and will mess it up if you try doing something tricky.
This can be annoying, but it also creates this feeling that, if your code compiles, it doesn't have the same kind of hidden problems that, say C++ or Java code usually does, which is a big draw. (If I understand correctly, those are the other two main languages used at Google on the server-side.)
> ...if your code compiles, it doesn't have the same kind of hidden problems...
Except that Golang has the equivalent of C void*, or Java's Object. This is an issue with working with common data structures like Linked Lists or Trees as Go doesn't have generics. The data section of these in go are all interface{}, so they have to be cast to use the data. This cast can cause unexpected behavior if you're not careful.
EDIT: example[0] here. This is a linked list. PushBack takes an interface{}, and the ".Value" is also an interface{}. The ".(int)" and ".(string)" are casts to get the data out. The assignment to 'b' is a runtime error (not compile time). You can get the type using switch statements or reflection, but there are no compile-time checks for common data structures.
interface{} is nothing like void* because it's type-safe. interface{} is (type, value) i.e. it remembers the type of the value it contains.
void * is just a value.
In C a cast is unsafe because it doesn't tell you if you did something wrong.
In Go, type assertion is safe. It tells you if it succeeded and if you ignore it, it'll panic, informing you that you have tried to perform an illegal operation.
I was partially incorrect. I said "reflection" could look it up, but you can check it during the conversion as you said (from Effective Go[0]). But my original point (before the edit) still holds true. You need to convert any data stored in a common data structure implementations and you don't easily know its type. So you have to track the type of the data in that structure yourself. So if you pass around a tree, linked list, or whatever, it's not clear what the type is of the stored data by the type information.
But you are right, I'm new enough to Go to forget about the 'ok' check on the type conversion. Checking errors on any unsafe operations is really the best way to go. If you are unsure of the type of a structure you are getting, you really should be checking it.
Kind of strange they don't just have a type for that. In Java we have unsorted ones (hashmap), linked ones that retain the sort order (linkedhashmap), and even automatically sorting trees (treemap / sortedmap). Honestly, considering users were relying on the key order being different from the hash order, the default type should have just been made the sorted or linked one and the type added just made unsortedmap or something. Seems like a failing of the language libraries that you have to code your own of something so basic
This has proven surprisingly useful in cutting down on fragile tests, where order was relied on for a set of items, the code was later changed to insert in a different order, and then the test was exposed as never being right.
Perl started to enforce this in 5.18 due to security issues. This also gives wider latitude to the data structures used in implementations due to the looser guarantees.
A minor detail: The key order in Perl hashes was always arbitrary (as opposed to in the order the keys were added) but identical hashes had the keys returned (by keys(), values(), etc) in the same order before Perl 5.18 (released about a year ago) [1].
Since everyone here seems to be confused: yes, the Go compiler actively randomizes the order of the keys beyond the unpredictability inherent in hash tables. If your function iterates over a map twice in a row without modifying it, you will get different orders.
That is empirically not true, at least under go 1.2.1. The poster's own code produces identical iteration results even when iterating the same unchanged Map multiple times in a single code execution.
>The Go language designers noticed that people were relying on the fact that keys were normally stored in the order they were added in, so they randomized the order in which the keys are iterated over.
Um, I'm just a first-year CS student here, but isn't that a core part of how hash tables work?
They can so efficiently store and lookup data because they essentially store contents in a list with the indices as the hashes of the content.
As such, one can never reliably predict the order of the keys in a hash table, by design. That's not a feature of Go.
Yes, but there's more going on. Simple hash tables use a fixed hash function-- the order of keys is unpredictable, but consistent between runs of your program. This allows algorithmic denial of service attacks-- if you have a web server that puts querystring parameters into a hash table, and you know what bucket a key will hash to, you can force collisions and O(n^2) runtime complexity.
Hash randomization avoids the O(n) attack on collisions if the randomized seed is not detectable. But if you don't sort the output of keys, rather depend on the random seed to iterate over the keys, the seed provides no protection anymore for the O(n) attack. It is very exposed and a minor security risc for DDoS attacks.
Perl is bit better than Go here, but not as good as Python. The real problem is exposing too much information of the seed by the ordering of the keys, and not preventing the algorithmic explosion on collision attacks. Such languages should really rework their collision handling strategy and not rely on randomized seeds (which can be brute forced or easily guessed such as here) or even worse, by strengthening the hash function.
TL;DR: The poster has no idea about hash tables, what Go did, why they did it, what the problem is and what should be improved.
PS: His cygwin git blog post is also completely wrong. He uses msysgit with some msys openssl version and complains about cert warnings. Of course, he is not using cygwin.
Python solves that by generating salt on start that is used for hash generation.
This way it is still extremely hard to commence such attack and at the same time not lose performance due to randomizing elements on each iteration.
BTW: I'm not convinced this randomization even solves this vulnerability. The patch referenced doesn't look like the one that implemented this behavior. I suspect we are mixing up two different behaviors.
True, but since it works this way in practice in other languages most of the time, particularly for small maps, people come to depend on it, then get surprised when it isn't the case.
Might as well keep people aware that key ordering can't be relied upon.
If a map is implemented as a hash table then yes, that is a property. A map could also be implemented as a balanced tree, then sorted iteration order and range queries would be a feature to expose.
Just tried go1.2.1. Hashmap iteration order was not random, it was same every time. Same happens with Go playground: (http://play.golang.org/p/px7C_lCn6d).
I got this output by running same locally (go1.2.1.darwin-amd64-osx10.8.pkg):
$ go run test.go
There are 500 views for unix
There are 300 views for python
There are 100 views for go
There are 4000 views for javascript
There are 800 views for testing
There are 500 views for unix
There are 300 views for python
There are 100 views for go
There are 4000 views for javascript
The Go playground and tour won't generate random numbers: "Note: the environment in which these programs are executed is deterministic, so rand.Intn will always return the same number."
Could this cause iteration order to be the same when executed there?
The ordering is only quasi-random, depending on how keys are distributed across hash buckets. Your particular example will no longer have a consistent iteration order starting in Go 1.3.
There are at least 4 different types of ordering behavior in a map:
1. Returning in insertion order. Quite strange thing to do as it requires extra bookkeeping.
2. An order unrelated to insertion order that is consistently the same for a given set of keys. Using a hash it would be sorting by the hash value, in a balanced tree, sorting by the value itself.
3. An order unrelated to insertion order and that can be different for the same set of keys but is always the same on repeated iterations on the same map. This could be a hash table where on creation a random salt is generated on creation and hashed together.
4. An order that actually changes on every iteration. Is there an efficient way of doing this?
From the discussion here and some quick testing:
Type1: Ruby 1.9.3 (very strange)
Type2: Ruby 1.8.3, python 2.7.3, Perl <5.18
Does anyone know if Go and Perl 5.18 are doing 3 or 4? Ruby >1.9 behavior is also quite strange but seems to be intended, they've implemented a hash together with a doubly linked list[1].
People needs to stop confusing "random" with "non-deterministic". It is a stated attribute of a Hash Map (leaving aside special classes, such as a Tree Map), that iteration of keys will occur in a non-deterministic order.
That is different than random. Random implies that iterating the same Map twice (without intervening updates) will produce different results. I've never met a Hash Map for which that was true.
The irony is that the poster's own code violates the point he was trying to make -- when run under go 1.2.1, it produces insertion order results every time (at least for me). I've also never met a Hash Map (until this example in Go) which predictably produced insertion order key iteration. I am sure even here, it is mere coincidence.
I do believe the poster lacks a formal understanding of data structures. My apologies if that is incorrect, and I am missing his point.
In Go, iteration order of keys was always non-deterministic.
At some point it was changed to be also random i.e. iterating the same hash the second time would produce a different sequence of key/values than the first time.
This is exactly what the article says, except using more words.
Listen. Order of key iteration is well understood attribute of a Hash Map, and it's always non-deterministic. That's what I have said, and I am not missing the point. The poster tries to imply an insertion order iteration in hash maps that has never existed.
My second, arguably more interesting point, is that the poster's own code violates the premise of his post.
Order of key iteration in a Hash Map is not non-deterministic, unless the hash is using a random salt. Unless you are using a random salt for each Hash Map, hashes can be precomputed and will always be the same for the same input of keys, therefore deterministic even though it may look non-deterministic in nature.
On the other hand, I agree that the author of the article, does not seem to understand the underlying data structures. Either that, or the way he has written the article portrays lack of understanding.
It is quite possible that pre 1.0 they may not have been using a hash map at all, and instead they were using an ordered map, which would have given an insertion order iteration.
Although note this is pure speculation as I have not tested this on pre 1.0, and in fact you may be absolutely correct that he is implying an iteration order that never existed.
What should be noted though is this:
https://code.google.com/p/go/issues/detail?id=6719.
It looks like that as of Go-1.3, there will be a sort of semi-random iteration, where while iterating each bucket in the hash map will be iterated over in increasing or decreasing order, chosen at random. Which is good as it is not too much of a performance hit, with the benefit that the iteration order will be non-deterministic for small maps, which is currently not the case.
EDIT:
Here is an explanation of iteration over Maps in Go <1.3 and >1.3:
Map iteration previously started from a random bucket, but walked each
bucket from the beginning. Now, iteration always starts from the first
bucket and walks each bucket starting at a random offset. For
performance, the random offset is selected at the start of iteration
and reused for each bucket.
In Java, you could write an unittest that inserts 10 fixed items in a hashmap, checks if the first one is "foobar", and it'd always pass, since the order is predictable - it's essentially sorted by the fixed hashfunction of those keys.
With uniquely keyed hashes for each hashmap, all environments that iterate in hash bucket order should have effectively truly random iteration order: across equal copies of the same hash map, each copy should have its own random order.
Python 3 dicts iterate in random order; random across invocations of the interpreter in this case.
Ok here is a correct clarification of what is happening when iterating over maps in Go. The author of this article doesn't seem to understand what is going on.
Ever since Go 1.0, they implemented a feature where when iterating over a map it will choose a random bucket, in the map, to start from and iterate from the start of each bucket to the end of the bucket in order and then move to the next bucket in order. The non-deterministic nature in this stems from the initial bucket to start from, which was chosen at random. This means that the number of different possible iterations of the same map is equal to the number of buckets in the map.
This leads to a problem with small maps (those less than 8 items). Because the max size of a bucket is 8 items, if a map has less than 8 items there will be only one ordering, and therefore for small maps (those with less than 8 items) the programmer could still rely on the iteration order of the map, which is not desired.
So as of Go version 1.3, the behaviour will be as follows. Buckets will be iterated over from bucket 1 to bucket N. But within each bucket iteration will start at a random index and then iterate through the bucket. For performance reasons, the random starting index for within each bucket will be determined at the start of iteration and will be the same for every bucket. Iteration over a map with 8 or fewer elements, a single bucket, will now be non-deterministic. It should be noted that this also means that the max number of possible different iterations of a map is 8, but that does not matter as it solves the problem of having programmers relying on iteration order no matter what size the map is (except for size 0 and 1).
Wait, what exactly is happening? Do they randomize iterations or just use a hash algorithm that will spread the keys more randomly into the table? The former would lead to cache misses on big hash tables, while the latter would still performs the same, but it wouldn't be random. It just would be more obviously not the insertion order.
Put this in a test.go file locally and run it repeatedly with "go run test.go":
package main
func main() {
x := map[string]string{"A": "", "B": "", "C": "", "D": "", "E": ""}
for key, _ := range(x) {
print(key + "\n")
}
print("\nNext iter:\n")
for key, _ := range(x) {
print(key + "\n")
}
}
You'll see the order can change even within a single run. It does not appear to be uniformly random; I had to put 5 keys in because it happened far less often than expected with 3 keys, but you can see randomization often enough with this.
Normally, I'd link to the Go Playground, but this doesn't manifest there "correctly", probably due to caching of the results.
I think this is a good thing to get right in a new language. The .NET hash function for dictionaries is fixed because of this problem -- there is too much legacy code which depends on the ordering of that particular hash function to change it.
Nice post. I admire the tenacity of the golang team to keep pushing this language forward. Adoption was slow at first, and Google can be somewhat quick to drop projects as we all know, so it's nice to see this one moving on.
Either is is hash mapped and the order should be based on the hash value.
Or golang wastes shitloads of good CPU power to help a few developers to write correct loops, help the developers so they don't get blamed for their faulty loops when shit goes haywire.
Given that the poster says that "you would think" the iteration order has anything to do with the insertion order, and doesn't show multiple iteration runs with different results, I'm rather inclined to assume the former. Is there evidence in the other direction?