Hacker News
Programming error invalidates US Diversity Visa lottery results (state.gov)
52 points by troydavis on May 13, 2011 | 65 comments


Q: Why was it necessary to invalidate the names that were selected?

U.S. law requires that Diversity Immigrant visas be made available through a strictly random process. A computer programming error resulted in a selection that was not truly random.

Maybe they should be more careful with their wording, since "not truly random" would basically disqualify any code-based random generator.

Overview of PRNG, TRNG for reading if you're unfamiliar: http://www.random.org/randomness/


My guess is that there is a NIST definition of "truly random" which they failed to satisfy.


I doubt PRNG is the issue. PRNGs satisfy the criteria for equal likelihood of being selected which I believe is what they want. A truly stochastic process is not necessary.


I wasn't suggesting that it was the issue. I was pointing out that their Q&A section uses the phrase "truly random" to explain why that specific set wasn't valid. Taken literally, "truly random" disqualifies all of their previous sets and not just this one. Their Q&A section probably should have said something more along the lines of "not appearing random enough".


One issue with PRNGs used for lotteries like this is the seed size. For example, the law might say that every combination of 12 eligible people is equally likely to be selected for a jury. But the number of possible sets of 12 people can be very large, say 2^1024 for some population. If the initial seed of the PRNG only takes 320 bits, then it violates the law: the PRNG can only ever produce 2^320 different combinations of 12 people, which means that by definition not all of the 2^1024 possible sets are equally likely (many have probability 0).

If the law had said instead that each PERSON must be equally likely to be selected for the jury, then the PRNG would have been fine since the number of people is much lower than the seed space.

I don't believe this was the issue here, but it's interesting.

Reference with more examples: http://portal.acm.org/citation.cfm?id=769827
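
The counting argument in this comment can be checked directly. The sketch below is only an illustration: the function names and the population of one million are assumptions, not figures from the thread. It compares the number of possible groups with the number of distinct seeds a PRNG can start from.

```python
import math


def seed_bits_needed(population: int, group_size: int) -> int:
    """Smallest b such that 2**b >= number of possible groups."""
    outcomes = math.comb(population, group_size)
    # For outcomes >= 1, (outcomes - 1).bit_length() equals ceil(log2(outcomes)).
    return (outcomes - 1).bit_length()


def can_reach_every_group(population: int, group_size: int, seed_bits: int) -> bool:
    """True if a PRNG with this many seed bits has at least one seed per group.

    This is a necessary condition only: enough seeds do not by themselves
    guarantee that every group is reachable or equally likely.
    """
    return 2 ** seed_bits >= math.comb(population, group_size)


if __name__ == "__main__":
    # Hypothetical numbers: groups of 12 drawn from one million people.
    print(seed_bits_needed(10**6, 12))
    print(can_reach_every_group(10**6, 12, 32))
```

With a 32-bit seed the check fails for this hypothetical population, which is the comment's point: some groups would have probability 0 no matter how good the generator is otherwise.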


"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin" - John von Neumann


I was about to ask how are they generating their random numbers now.


Further, a "truly random" process is not necessarily uniformly random - although the colloquial definition of "random" implies uniformly random, a number that comes from any arbitrary probability distribution is a random number.


Posted results have been rescinded and the lottery is being redone. state.gov site links to video with more info: http://link.brightcove.com/services/player/bcpid1857622883?b...


Important bit that's in the video but not on the page:

>"...[a programming error had caused] more than 90% of the selectees to come from the first two days of the registration period ..."

The registration period was 30 days, and had "many" entries each day. That's... shockingly bad.


Would be much more interesting if we knew how the results wound up nonrandom.


Maybe something like the following?

  randWinnerID = (int) Math.random();
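
For readers who miss the joke: truncating a value in [0, 1) to an integer always gives 0, so the same entry would win every time. A rough Python equivalent, with hypothetical function names, next to a version that can reach every index:

```python
import random


def broken_pick(num_entries: int) -> int:
    # Mirrors `(int) Math.random()`: a float in [0.0, 1.0) truncates to 0,
    # so num_entries is never even consulted.
    return int(random.random())


def fixed_pick(num_entries: int) -> int:
    # randrange returns an integer in 0..num_entries-1, each equally likely.
    return random.randrange(num_entries)


if __name__ == "__main__":
    print({broken_pick(1000) for _ in range(10_000)})  # only ever contains 0
    print(fixed_pick(1000))
```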


It was much worse, all the "winners" were the people who submitted their applications in the first 2 or 3 days.


Do you have a reference? That would be really messed up...


It says that in the video message posted: http://link.brightcove.com/services/player/bcpid1857622883?b...


Crucial, actually. They should provide enough transparency to assure that "not random" != "favored persons not selected".


That's a bit conspiracy-heavy for my tastes. If someone in a position of power in the State Department wants to give a particular individual a greencard there are easier ways than keeping on re-running the greencard lottery 'til they get one.


You'll note I make no accusations. Such a reversal requires transparency as an ordinary matter of course.

There is no reason they shouldn't do this on a Linux system, using published code, under a published procedure. Had they done that, they wouldn't have screwed it up in the first place.


>The results were not valid because they did not represent a fair, random selection of entrants, as required by U.S. law.

How does one get so bad at using random numbers? Surely selecting random entries from a list isn't hard. `entries[rand*entries.count]`
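
A lottery needs many distinct winners rather than one index, so a closer analogue of the one-liner is sampling without replacement. This is a minimal sketch under assumptions of my own (the function name, and the choice of the operating system's entropy source via `random.SystemRandom`); nothing here reflects how the State Department's system actually works.

```python
import random


def draw_winners(entries, num_winners, rng=None):
    """Pick num_winners distinct entries, each subset equally likely."""
    # SystemRandom draws from the OS entropy source instead of a seeded PRNG.
    rng = rng or random.SystemRandom()
    # sample() selects without replacement, so nobody can win twice.
    return rng.sample(entries, num_winners)


if __name__ == "__main__":
    applicants = [f"applicant-{i}" for i in range(1, 1001)]  # made-up IDs
    print(draw_winners(applicants, 5))
```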


Could've been a bad shuffle implementation. There are lots of ways to do that: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Im...
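
One of the classic mistakes described on that page is swapping each position with any position in the list rather than only with positions not yet fixed. The sketch below (function names are mine) enumerates every possible run of both variants for a small list instead of sampling, so the bias shows up exactly.

```python
from collections import Counter
from itertools import product


def _count_orderings(n, choice_ranges):
    """Apply every possible sequence of swap targets and tally final orderings."""
    counts = Counter()
    for choices in product(*choice_ranges):
        items = list(range(n))
        for i, j in enumerate(choices):
            items[i], items[j] = items[j], items[i]
        counts[tuple(items)] += 1
    return counts


def naive_shuffle_outcomes(n):
    # Naive variant: position i may swap with ANY position 0..n-1 (n**n runs).
    return _count_orderings(n, [range(n) for _ in range(n)])


def fisher_yates_outcomes(n):
    # Fisher-Yates: position i swaps only with positions i..n-1 (n! runs).
    return _count_orderings(n, [range(i, n) for i in range(n)])


if __name__ == "__main__":
    print(sorted(naive_shuffle_outcomes(3).values()))
    print(sorted(fisher_yates_outcomes(3).values()))
```

For three items the naive variant has 27 equally likely runs spread over 6 orderings, and 27 is not divisible by 6, so it cannot be uniform. Fisher-Yates has exactly one run per ordering.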


If "rand" is generated by a computer and not from a stochastic process, it's not technically random.


I'm fairly sure other, accepted government tasks have used PRNGs. So there would be precedence for doing so.

If not, I'd roll a 10,000 sided die, and type it in by hand. (edit: or cow pie bingo. That has a long tradition of unbiased results.)


Then sell the video rights to cow pie bingo to a global audience, as people from around the world watch in eager anticipation hoping their future is the one that gets shit upon! I love it as a farce - add a love interest and it's a short film waiting to happen.


Precedence and precedent have very different meanings. I think you meant to say that there is a precedent.

That said, it sounds like a definition of random is clearly specified and it was not met, so precedent isn't important.


Rolling a die isn't truly random either. Given the same environmental conditions and the same force vectors, you will always get the same result.

I think their definition of true randomness is flawed. It does not matter whether it's truly random; rather, it should be about whether it's random to a point where it's out of the operator's control and ability to predict.


Given absolutely identical conditions, yes. Ignoring quantum effects. But rolling a die is definitely in the realm of chaotic behavior, especially when any reasonable force is applied, so extremely minor changes cause massive, unpredictable end results.

In practice, such precise conditions are completely impossible, and the end result is unpredictable as long as it tumbles enough times. Shake it around in a cup to randomize the starting location, and you're as good as you can get.


As another poster stated, the drawing pulled 90% of the applicants from the first two days of the application period.

Perhaps the bug was something like "bool choose_this_applicant = random() < 1/total_applicants_received_so_far". It would have similar behavior, might appear correct at first glance (at least to the sort of programmer who settles for a government job) and would be utterly, completely wrong.
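
To see how such a bug would skew the draw, assume (this is my reading of the hypothetical, not a known fact about the real system) that the test is applied to each applicant in arrival order and that everyone who passes is kept. Applicant number i is then kept with probability 1/i, so the expected share of winners from the early days is a ratio of harmonic sums. The same comparison is correct only when a pass replaces the current single winner, which is reservoir sampling of size one.

```python
import random


def expected_share_from_early_days(total_applicants, days=30, early_days=2):
    """Expected fraction of kept applicants who applied in the first early_days,
    assuming equal arrivals per day and keep-probability 1/i for applicant i."""
    early_cutoff = (total_applicants // days) * early_days
    early = sum(1.0 / i for i in range(1, early_cutoff + 1))
    total = sum(1.0 / i for i in range(1, total_applicants + 1))
    return early / total


def reservoir_pick(stream, rng=random):
    """Correct use of the 1/i test: each pass REPLACES the single winner,
    which leaves every item equally likely to be the final pick."""
    winner = None
    for seen, applicant in enumerate(stream, start=1):
        if rng.random() < 1.0 / seen:
            winner = applicant
    return winner


if __name__ == "__main__":
    # Hypothetical volume: 300,000 applicants spread evenly over 30 days.
    print(expected_share_from_early_days(300_000))
    print(2 / 30)  # share the first two days would get in a fair draw
    print(reservoir_pick(range(300_000)))
```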


Good: I got rejected, I get a second shot.

Bad: How can the State Department be so incompetent as to fuck this up? Also, why does it take months to generate this? I could do it with 15min and a simple script. Granted, any code generator is never truly random (Newtonian laws of physics pretty much forbid randomness,) but for all intents and purposes they're more than capable of handling a simple lottery.


Uhh, Newtonian laws have really nothing to do with this. It's more about information theory: you can't get more entropy out than you put in. Also, it is clear now that the Newtonian model is not an accurate view of the universe, and that there are fairly random events such as radioactive decay.


There's an interesting article on random.org that talks about just that. http://www.random.org/randomness/

This is delving into philosophy, but some argue that the universe is deterministic. With absolute knowledge of every particle in the universe, you could predict the entire course of everything that would happen in the future.

The key is not to find true randomness, which can reasonably be argued not to exist, but rather to find a system that is chaotic enough to be beyond control or predictability, and use that to generate random numbers. Weather systems, radioactivity, etc. are examples of this.

Now I don't know much about quantum physics, and I hear that that might be an alternative to that—but my point is that I don't think it matters. A generator doesn't have to be truly random, it should just be random enough.


With absolute knowledge of every particle in the universe...

That's not possible, as stated by the Heisenberg uncertainty principle. The more you know about a particle's position, the less you know about its momentum, and vice-versa. This property is largely agreed upon to be an inherent property of the universe, and not just a technical limitation.


You need to look even deeper. Waaaay deeper, beyond the science, more into the realm of philosophy. The Heisenberg uncertainty principle is about the here and now, but when you go deeper, with more dimensions, then you're looking past that.

At that point, you're seeing the mechanics that make the HUP work, the formulas that govern everything, the formulas that make things un-formulable... Basically, beyond comprehension :)


I've read your comment several times, but each time I am unable to attribute any meaning to your words.


It sounds like a very poor description of hidden variable theory[1]. Hidden variable theory hasn't been in favor in a very long time, nor is it "beyond comprehension." In fact, I really hate the phrase "beyond comprehension" applied to almost anything because it has a tendency to mystify science, which can always be comprehended and expressed in the language of mathematics.

[1] http://en.wikipedia.org/wiki/Hidden_variable_theory


This is delving into philosophy, but some argue that the universe is deterministic. With absolute knowledge of every particle in the universe, you could predict the entire course of everything that would happen in the future.

Actually, this is impossible. There's an elegant proof of this statement based on Cantor's diagonalization argument which shows that the computational device required to perform such an analysis cannot exist in the universe, via contradiction.


Can you give me a google-able name (or description) for this proof?



Clearly we need to import better programmers.


State is better than Interior, read some articles on their "use" of technology.


According to Dutch teletext, the error was a very simple one: not all applicants were entered into the draw. It does not give further details.


Why will it take two months to randomly select entries from a list?


They probably want to make extra sure that there are no errors this time.


That's the problem with randomness, you can never be sure.



Have they used the same software in prior years also, or is this because of new software?


My bet is that the problem was much worse and more subtle than the other commenters have noted. They posit a slight imbalance due to rounding or PRNG characteristics.

I suspect that it is like the problem of choosing values randomly from a sparse hash table. Choosing a random slot and scanning forward to the next occupied slot results in a bias where closely spaced entries have a much lower chance of selection than entries preceded by a wider gap.

It is easy to make that same mistake when trying to pick a random record number when record numbers have been assigned in chunks (like social security numbers).
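
The size of that bias can be computed exactly. In this sketch (the function name, table size, and slot numbers are illustrative assumptions), a slot is chosen uniformly at random and the scan moves forward, wrapping around, until it hits an occupied slot. Each entry's chance of being picked is then proportional to the gap in front of it.

```python
from fractions import Fraction


def scan_forward_probabilities(occupied_slots, table_size):
    """Exact pick probability per occupied slot for 'random slot, scan forward'."""
    occupied = sorted(occupied_slots)
    probs = {}
    for idx, slot in enumerate(occupied):
        prev = occupied[idx - 1]  # idx 0 wraps around to the last occupied slot
        # Number of starting slots that end at this entry: those after prev,
        # up to and including slot. A lone entry owns the whole table.
        gap = (slot - prev) % table_size or table_size
        probs[slot] = Fraction(gap, table_size)
    return probs


if __name__ == "__main__":
    # Three entries packed together and one preceded by a long empty run.
    for slot, p in scan_forward_probabilities({0, 1, 2, 9}, 10).items():
        print(slot, p)
```

In this toy table the entry at slot 9 is picked seven times as often as each of the three clustered entries, even though the starting slot is perfectly uniform.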


I participated, and things looked bad enough when I saw that IE7 or lower was required... The Vista boot option, which I never use, alas did not help.


I was one of those picked and am really pissed. This is so screwed up! Does your employer ask you to pay the extra monthly salary back when they made a mistake in their system? "NO!". They should have made the exception to re-draw "randomly(!)" among the non winners and give them another chance. But do not take away what you promised because of your inaptitude!


uh, imagine if you are one of the winners :/


More importantly, for how many years has the selection process not been "truly random"? If it was really fucked up it could even mean some people could never be selected given their name and DoB.


They should have given it to those who were selected despite the non-random draw. C'mon, they must be freaking depressed now!


BTW, if you win, you have to pay the Diversity Visa Lottery fee $440/person


If that's true, it would be a small price to pay for a green card.


The Visa Lottery is infuriating. It's almost as stupid as the family sponsorship driving our visa allocations.

The purpose of an immigration policy is to serve the citizenry. Only the most useful and "profitable" people should be let in, period.


my mom won the green card lottery 10 years ago, when I was still young and surely wouldn't have been very useful or profitable. 10 years later, I'm getting my master's at one of the best universities I could've hoped for, and a pretty good chance of being able to do interesting work with competitive pay. I'm extremely grateful for the lottery, which is the one thing that made all of this possible.

my mom also got a chance to actually apply her computer science degree (which she couldn't do before because of the explicit sexism which meant women were generally employed as, perhaps glorified, receptionists), and now works as a senior software engineer at a large company. I imagine her true potential would also have been difficult to estimate prior to the move, as she'd done very little actual programming after college - but she effectively got a fresh start here, and was able to quickly advance in an environment which allowed her to be judged solely on technical competency.


Good for you. Are the American people better off thanks to you and your mother's presence here? Or would they have got along just as well without you? That's the salient question.


I imagine they probably could have, but even looking at it from a strictly utilitarian point of view, both me and her are now American citizens producing a fair amount of value by plying our trade. I suppose you could debate whether or not skilled workers like ourselves were truly needed, but I think being able to find employment is a good enough proxy for being a productive member of society, so I would say the American people as a whole are 'better' than they would have been without us - although clearly this hinges on the actual criteria you choose.


So much for "All men are created equal," eh?

Over ten million people each year apply through this program for only 50,000 green cards. In a nation of 300 million people, I think we can manage this.


All men (and women) are created equal, of course. What rubashov is talking about is what they do with their lives after they're created.


Stop and think about how absurd this is. Ten million people apply and we pick randomly, rather than take the cream.

Is this not plainly insane?


Given the values that the United States was founded on, in the purest sense,

Why?


I'm not sure what founding values you're talking about. A cursory reading of the founders plainly reveals they viewed America as a nation of and for Europeans. Immigration policy for most of American history explicitly excluded non-whites, and often sought to exclude southern and eastern Europeans.

I'm not advocating a racist immigration policy, as the US was founded on, just one that serves the people living here. After all the purpose of the constitution is to "secure the blessings of Liberty to ourselves and our posterity". That means the country is for citizens and their bloodlines, not foreigners.


I suppose I mean more "all men created equal". And the "American Dream" of coming to this country from nothing and making it big. A meritocratic process, or a process that has a likelihood of being gamed/manipulated/bought, would seem to detract from that possibility.

I suppose, technically, that's not what this country's Constitution has written on it; but, perhaps I'm an idealist and would like to think that, in the hearts and minds of the founders, they were intending to create a confederacy of freedom, and of equal opportunity. Why would the bar for entry into the country be any different?

[edit: finished my thought, then changed wording slightly]


Meanwhile, US foreign economic policies like free trade agreements displace people from their country into the US in order to survive, by becoming second class citizens and cheap labor for big corporations.

Meanwhile, rednecks scream in anger, teir took uor jobs!!

Do you remember that time when the sons of immigrants (because in America everybody is the son or the grandson of an immigrant) were not angry at the newcomers? Neither do I, because it has always been this way.


A points-based system like the one used in other countries may be better. You get points for education, experience, financial status, family ties, age, origin, etc.



