Day 1: IBM Watson Ties for first on Jeopardy (venturebeat.com)
124 points by nyellin on Feb 15, 2011 | 96 comments


After not being able to find a suitable online stream of this event (I don't have cable), I gave up and built a rudimentary antenna for my TV. Basically, I just wound a bunch of copper wire around a piece of cardboard and stuck it in the window. The other end of the wire went into the cable port on the back of my TV. Surprisingly, I was able to pick up the game in crystal clear, digital 1080p!


There's something fitting about fashioning your own antenna to watch AI compete on Jeopardy. True hacker.


It may have been crystal clear, but it wasn't in 1080p. It was either 1080i or 720p. 1080p over-the-air broadcasts still do not exist.


You're right, my mistake, it was 1080i.


That's pretty impressive, mind sharing a photo with us? I just want to understand what this level of awesome looks like.


Thanks, I just wrote it up and took some pictures:

http://news.ycombinator.com/item?id=2222827


Antennas made of a straight bit of wire of about 1/4 the wavelength usually work well. Think FM radios: freq about 100 MHz -> lambda = c/freq = 300 cm -> antenna of 75 cm (30 in).
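The quarter-wave arithmetic above is easy to check in a few lines of Python. This is only a sketch of the rule of thumb from the comment; the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def quarter_wave_length_cm(freq_hz: float) -> float:
    """Length of a quarter-wavelength antenna, in centimetres."""
    wavelength_m = SPEED_OF_LIGHT_M_PER_S / freq_hz  # lambda = c / f
    return wavelength_m / 4 * 100


if __name__ == "__main__":
    # FM broadcast band, roughly 100 MHz as in the comment above
    print(f"{quarter_wave_length_cm(100e6):.1f} cm")
```

For 100 MHz this comes out at just under 75 cm, which matches the figure quoted above.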


The non-realtime/non-hacking way: http://www.youtube.com/watch?v=ZLdkJpAtt1I (part1, part2 link on screen)


I grabbed it out of youtube not long after it played on the East coast.

youtube search: jeopardy watson

search options: upload today

edit: I think I may have included the date in the search


For anyone searching for it, including today's show:

http://www.youtube.com/user/Rashad8821


Before I signed up for cable, I built this and had pretty great reception. Just a couple of parts from Home Depot/Radio Shack and you're good to go:

http://blog.makezine.com/archive/2009/01/maker-workshop-dtv-...


For the sake of clarity, could you explain the setup? How many channels could your copper wire pick up?


Watching Watson's avatar gave a little insight into what the software may have been doing. Here are the results, based on that avatar. The % is its confidence in its #1 response.

  Category     Clue           Watson #1 resp   %  Result  Outcome
  -----------  -------------  --------------  --  ------  -------
  Alt Meaning  belief         view            70  right   Brad
  Alt Meaning  horse foot     shoe            68  right   won
  Lit Chars    split person   Hyde            71  right   won
  Beatles      guy pain       Jude            98  right   won
  Olympic Odd  2008 perfect   Michael Phelps  93  right   won
  Name Decade  Disneyland     1950s           87  right   Ken
  Final Front  black hole     Event horizon   97  right   won
  Lit Chars    Beowulf        Grendel         97  right   won
  Final Front  Michelangelo   Last Judgement  97  right   won
  Beatles      title gal      Lady Madonna    90  right   won
  Olympic Odd  1908 city      London          69  right   won
  Name Decade  Emp St Bldg    1930s           50  right   Brad
  Beatles      Silver Hammer  Maxwell         98  right   won
  Lit Chars    his victims    Harry Potter    37  wrong   Brad
  Alt Meaning  piece wood     stick           96  right   won
  Final Front  Latin ending   finis           97  wrong   lost
  Beatles      John mother    Julia           97  right   Ken
  Lit Chars    Les Miz        Jean Valjean    76  right   won
  Olympic Odd  1976 epee      Pentathlon      85  right   won
  Name Decade  Klaus Barbie   2002            11  wrong   Ken
  Olympic Odd  George Eyser   leg             61  wrong   lost
  Alt Meaning  bent arm       Knee            40  wrong   Ken
  Name Decade  Oreos          1920s           57  wrong   lost
  Final Front  paper limit    Envelope        61  right   Brad
  Alt Meaning  students       chic            82  wrong   lost
  Final Front  summit         peak            65  wrong   Brad
  Name Decade  Kitty Hawk     1900s           17  right   Brad
  Olympic Odd  sole member    Olympic Games   20  wrong   Ken
  Beatles      died church    Eleanor Rigby   98  right   won
  Lit Chars    evilness       Sauron          74  right   won
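
One way to read the table above is to ask how often the top response was right at or above a given confidence. The sketch below just tallies the 30 rows as transcribed; the cut-offs (0, 50, 90) are arbitrary picks for illustration, not Watson's actual buzz threshold, and the function name is hypothetical.

```python
# (confidence %, was Watson's top response right?) for the 30 rows above,
# in table order.
ROWS = [
    (70, True), (68, True), (71, True), (98, True), (93, True),
    (87, True), (97, True), (97, True), (97, True), (90, True),
    (69, True), (50, True), (98, True), (37, False), (96, True),
    (97, False), (97, True), (76, True), (85, True), (11, False),
    (61, False), (40, False), (57, False), (61, True), (82, False),
    (65, False), (17, True), (20, False), (98, True), (74, True),
]


def accuracy_at_or_above(rows, threshold):
    """Return (right, total) for rows whose confidence is >= threshold."""
    picked = [right for conf, right in rows if conf >= threshold]
    return sum(picked), len(picked)


if __name__ == "__main__":
    for threshold in (0, 50, 90):  # arbitrary cut-offs, not Watson's real one
        right, total = accuracy_at_or_above(ROWS, threshold)
        print(f">= {threshold:2d}%: {right}/{total} right")
```

On these rows that gives 21/30 overall, 20/25 at 50% or higher, and 10/11 at 90% or higher.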


I also put together the full game stats: http://news.ycombinator.com/item?id=2220667


I attended a talk by one of the designers. Some of Watson's 'wrong' answers he described were pretty funny. There was one where the clue (paraphrasing) was 'Punch below the beltline that rhymes' and the right answer was 'Low blow', but Watson came up with 'Wang bang'. In another one the clue was something like 'The end of this had an exclamation mark in headlines in 1919', with the right answer being 'World War I', whereas Watson answered 'Sentence'.



Thanks for the links. I love the bits when Trebek visits Watson in the lab - it's the sort of thing where kids in 20 years will giggle at the fact that their pocket computer is more powerful.


I couldn't have wished for a more exciting opening to this game. Watson just shot right off into the lead like there was no tomorrow, jetting ahead by thousands! I was blown away. Then he slowly started to crumble a bit and his answers were faulty more and more. But his strategy is amazing! He hops from category to category, while most humans pick a category and stick with it. I wonder if IBM has him do that purely because it might throw the humans off a bit. Totally awesome. I can't wait for tonight's show!


> He hops from category to category, while most humans pick a category and stick with it.

Ken Jennings himself did that for the express reason of throwing off his opponents. As the category selector he would have an extra couple seconds to start thinking in that direction. Totally fair game for Watson to do too.

I agree that Watson started with a higher value clue in order to look for the Daily Double. Statistically the fourth row has the highest concentration of them. DDs are valuable not just because of the opportunity to double up or bet more than the clue's actual value, but also because you get exclusive access to the question without buzzer competition.

As for Watson's early lead, that happened because the clues chosen early happened to be the types of questions it could answer quickly and reliably. Computers are great at stuff like 4-letter words and searching Beatles song lyrics. Computers are less great at clues where the hint drives the answer, like the one about "don't name Voldemort" or the "Final" category where the hint is in the category name. It just so happened that those harder clues for Watson came after the easier clues.


A while back, before the game, I wrote to Ken about one of the videos IBM made about Watson. In it, someone said that Watson "learns" the category. The best strategy for humans, then, is to start at the highest value questions.

When Ken jumped into a category, notice that he started at the highest values.


> The best strategy for humans, then, is to start at the highest value questions.

Also because doing that searches for Daily Doubles, which are concentrated at the higher value clues.


> He hops from category to category, while most humans pick a category and stick with it.

I noticed the same thing, but I really hoped it was unintentional. It doesn't feel like it's a fair fight otherwise, since it means he's exploiting a weakness intrinsic to his competitors. The whole point of this competition is to show that a computer can compete and win at a human level. If it weren't, then why force him to physically push a button to ring in? If Watson is intentionally jumping categories to confuse humans, then it sort of feels like he's cheating a bit.

As I write this, I realize a lot of this comment is motivated by my bruised human ego, but it just doesn't feel quite right to me.


The first rule of competitive game playing is that anything in the rules is... game.

If it helps your bruised human ego, another human player could attempt to exploit this exact same weakness.

(I don't know if Watson is programmed to do this or if it is an actual weakness of human players.)


I think this is precisely the thing about AI which makes it so interesting. An intelligence which is not a mere emulation of the human will be alien, and through its strangeness we can both learn more about our limitations and complement our own abilities.


> It doesn't feel like it's a fair fight otherwise, since it means he's exploiting a weakness intrinsic to his competitors

Late in the first round you can see Jennings change his strategy a bit -- he starts ringing in before he knows the answer. Jennings is exploiting the fact that Watson won't ring in until it has confidence in an answer.


That's awesome. He can make that strategy pay off because he's Ken freaking Jennings. Usually the money penalty for buzzing in but answering incorrectly (or not at all) would dissuade a contestant from buzzing in before actually knowing the answer.


I wonder if the Watson developers did this because they were unable to simulate the benefit human players get by graduating through progressively difficult questions in a category.


I suspect the category hopping was to try and buy time to calculate what sort of questions would be asked in each. But its first move of getting into the lead (and the daily double) I would say is a combination of good strategy and PSYOP.


It looked like he went to the 4th row to find the daily double, then just went through all the cheapest questions, as he was most likely to answer those. He finished all of the $200s, then went to the $400s, etc. I don't think it was to trip up his opponents, I think it was to have a better chance at answering.


I thought Watson was searching for the daily double spot to jump ahead $1000 right away.


And also to steal the humans' chance for a late comeback in the game.


oooh, good strategy.


The daily double is equally likely to be under any unrevealed spot, so jumping between categories has no effect on the probability of hitting it.


Ok, this is wrong. Ken Jennings said in an interview that the DDs are more likely to be in the bottom rows, so you actually can hunt for it.


I've seen elsewhere it was suggested that whereas humans will want to get used to the question style of each category and learn from each one as they progress to harder answers, Watson has no such issues with jumping straight for high-value questions in all categories.


Incorrect. Watch the IBM video where they explicitly state that Watson learns the category, as well.


That Trebek gave Watson credit for "Maxwell's silver hammer" when the clue was looking for the person – Maxwell – gave me a moment's pause.

Jeopardy does tend to be a little more forgiving of contestants during the first round – such as giving them a reminder to phrase-as-a-question, and letting them correct an initially wrong or incomplete utterance, if done almost instantly. Still, I'm not sure if such an interpretive error by a person would have been overlooked.


I noticed that, too; but a little later in the game, Trebek declared "leg" (instead of "he was missing a leg") wrong without hesitation.

So it's definitely being held to some standard.


Actually, according to arstechnica, that lack of hesitation was before the segment was reshot.

A human had answered "missing a hand," which created context that means "What is a leg" could be interpreted as "he was missing a leg". Initially Alex did give him credit, before realizing that Watson couldn't have been using that context.

http://arstechnica.com/media/news/2011/02/ibms-watson-tied-f...


OK, so I have a question about this, how was it legitimate for Brad to answer at this point? Brad would have seen Alex initially credit Watson with the right answer and then reverse the judgement, rendering the answer incorrect. Brad would have been able to infer the correct answer at this point.


In fact, Brad didn't answer the question... I was wondering about that since the answer was obvious just from Ken's guess and Watson's response.


Very interesting; thanks for the clarification. I was fooled.


Apparently he actually initially declared that as correct before realizing that Watson didn't have the context to be going off of Ken's incorrect answer. Hurrah for editing.


This sort of overanswering seems to be usually accepted, as long as the correct answer is contained in the wrong one.


Going the Sci-Fi route, I would love this technology to expand and become an interactive Wikipedia. How cool would it be to go to your local library, (you remember those?) and ask it a series of questions and get answers.

But thinking about how quickly the library is becoming archaic these days, I see it going this route: You call this service from your computer or phone and ask it anything. But by then Google will already have that feature. Google AI (Beta)


I too am inspired by all the possibilities this implies for human-computer interaction. I picture a portable device that records events from your surroundings and provides personalized reasoning similar to this. It could remind you of people's names, it would know all the details of your personal finances and could provide budget and stock tips, remind you of events if you have a pattern of forgetting to plan ahead, etc. Aside from the fact that Watson fills up a whole room and its knowledge is tailored to Jeopardy, it really doesn't seem that far off combined with a few other technologies like machine vision and translation that are already out there.


Have you seen Qwiki? http://www.qwiki.com/


I could beat Watson. "I'll take quote semi-colon drop table categories for 200"



If you want to watch it but have no way of doing so, the game is (still?) available on YouTube and easily findable. It will also be available on the IBM Watson website (http://ibmwatson.com/) the day after tomorrow.


http://news.ycombinator.com/item?id=2220667

For those that missed the game, or can't remember every question, a nice spreadsheet/empty discussion made by ckwalsh.


If Watson wins, it should have to keep coming back as defending champion until someone else beats it. It should also qualify to return in the Tournament of Champions if it gets that far.

Now that would be a true test.


It would be quite entertaining, but that would mean either more permanently moving Jeopardy! filming to IBM's campus (cost prohibitive) or rebuilding Watson on Jeopardy!'s set (also cost prohibitive) and you see why this is just a three day exhibition match.


Understood.

Does anyone know what these IBM supercomputers ultimately are used for after they're done winning Jeopardy or beating Garry Kasparov? I'd imagine something top-secret for the US government, but it would be nice to know exactly what Deep Blue is doing now.


Does anyone want to create a community site for the contest? I am looking for another Django developer who can participate in an all-night code sprint. The site would:

1. Show an online score board.

2. Track Watson's performance (like shkb is doing on Google Docs)

3. Aggregate pictures, videos, and written analysis of the event


I think you should've been doing this on Sunday...


I didn't think of it until now. I assumed that the official IBM site would have a way of tracking Watson's progress.


Just to clarify, I didn't do anything but link to what ckwalsh made and posted at, I guess, a bad time to draw attention. At least here; the reddit discussion seems to be going well.


I don't have the time to build a site at the moment (curse you schoolwork!), but if you'd like to help maintain/update the docs I'd love the help.


Thank you for the offer. I did some more research and I found a few good resources for normal Jeopardy games. I have less of an itch to scratch right now, so I think I will just hack on my main project tonight.


Spoiler not appreciated by many, I'm guessing. Next time, please don't do that.


Well, I guess some people DVR everything, but I don't think this is a spoiler any more than ESPN's website the day after the Super Bowl, or a newspaper headline reading OBAMA WINS.

Maybe if it had been posted at 7pm EST yesterday.


I'm confused. Where's the spoiler? I don't remember seeing anything in the article (and I'm glad I didn't; I wouldn't have appreciated the spoiler).


The title of the submission is the spoiler.


Sorry about that. If I post the scores again, I'll do so w/o giving away anything in the title.


=====MILD SPOILERS FOR DAY 2 BELOW=====

(I posted this in another thread, but this one seems to have most of the action.)

I think it's become apparent that most of Watson's advantage is in signaling speed. Figure that a top human player like Jennings knows about 80% of the answers and Watson about 90%. Watson should be winning, but not by nearly that much.

Jeopardy signaling 101 for those that don't know: An offscreen operator presses a button to enable the buzzers when Alex is done speaking. If you buzz too early, there's a 300 ms lockout until you can buzz again. A light near the board lights up when the buzzers are open. Watson monitors that light (whether by electrical connection or optical sensor, they haven't said) and physically presses the signaling buzzer. Its reaction speed must be faster than the humans' and it will never miss and buzz too early.
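
To make the lockout rule concrete, here is a toy model of it. It is a sketch under assumptions: the 300 ms figure is the one quoted above, presses made during a lockout are assumed to be ignored without extending it, and the function name and timings are made up for illustration.

```python
LOCKOUT_MS = 300  # figure quoted in the comment above


def first_valid_buzz(open_ms, presses_ms, lockout_ms=LOCKOUT_MS):
    """Return the time of the first press that registers, or None.

    A press before the buzzers open starts a lockout; presses made during
    a lockout are ignored (assumed here not to extend it).
    """
    locked_until = float("-inf")
    for t in sorted(presses_ms):
        if t < locked_until:
            continue  # still locked out, press is ignored
        if t < open_ms:
            locked_until = t + lockout_ms  # too early: start a lockout
            continue
        return t
    return None


if __name__ == "__main__":
    # Buzzers open at t=1000 ms. A human who jumps the gun by 10 ms only
    # registers at 1300; a player who waits for the light registers at 1010.
    print(first_valid_buzz(1000, [990, 1100, 1300]))
    print(first_valid_buzz(1000, [1010]))
```

In this model a player who is 10 ms early loses to one who is 10 ms late, which is why never buzzing early matters so much.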

Also, Watson's clue selection pattern shows that it definitely starts by searching for Daily Doubles. It picks the bottom 3 clues of each category, where the DDs are statistically concentrated, before anything in the top 2 rows.


The longer the question the longer Watson has to generate an answer. Isn't that an important advantage?


I'm not sure I understand your point. All of the contestants "see" the answer at the same time, and none of them are allowed to buzz in until the magic light goes on.


But if the question takes 20 seconds to read, Watson has 20 seconds to narrow down his confidence to a particular answer. If he only had 10 seconds, his calculations might not have gone far enough to bubble the correct answer to the top. It would be interesting to see a chart of confidence of answers vs time for Watson.


But the same is true for the human contestants; at this level especially, I'm sure that they scan the whole question in a second and then have the rest of the spoken time (which, even for long questions, is nowhere near 20 seconds - probably more like 4) to think about it.


I'd like to see the next version take the avatar concept further. They went half way by making it physically buzz in. Go the rest of the way, and make it only receive input from the avatar. Add a computer vision and speech recognition component, and make it read the board and listen to Trebek like the human players. Then, make it mobile, and put the avatar in the actual Jeopardy! studio.

EDIT: Clarified "human" players. Welcome to the future!


According to the designers, they considered it but finally decided to concentrate on the question answering. OCR for this situation could probably be done in a fast and reliable way, so it wouldn't make much of a difference. Speech recognition would be harder and less reliable, but redundant most of the time (though it presumably could have helped for the 1920s question).


Understandable, and I agree that they definitely tackled the meat of the problem. I think there could be some interesting problems in the computer vision aspect of it, though. Seeing and interpreting the board is subtly different than a straight OCR problem.

Maybe I'm overstating the difference, though.


Does anyone know why Watson was unable to buzz in immediately in some cases when he had 97% confidence in his answer? Seems odd that a computer would not be able to beat the human opponents at hitting the buzzer.

(It's possible that he didn't have the answer until after the human opponents, but that seems unlikely.)


Watson needs a few seconds to calculate the answer and will only hit the buzzer when it’s finished calculating. Its opponents can buzz in even if they are not completely sure that they have an answer (or they simply might be faster than Watson). It’s important to note that it is only possible to buzz in after the question has been read out. (Someone behind the stage flips a switch or something like that. Lights indicate that the buzzers are open for the humans; they are locked out for a few hundred milliseconds when they press the buzzer too early. Watson never buzzes in prematurely.)

I would like to know whether it is possible to beat Watson to the buzzer even if you and Watson both already know the answer. Watson has probably better reaction times but humans can anticipate when the host is finished reading. That probably depends highly on the consistency of the person behind the stage flipping the switch and opening the buzzers.

Here are some stats from the first round: Watson was the first to buzz in 16 times (with two wrong answers). It was above its buzzing threshold and didn’t manage to buzz in seven times. (It would have been wrong three times if it had made it to the buzzer first.) It was below its buzzing threshold six times. (The remaining one is the daily double which Watson got correct.)

This needs to be in a table. Numbers in brackets indicate wrong or potentially wrong answers:

  Confident & first:  14+(2)
  Confident & beaten:  4+(3)
  Not confident:       6
  Daily Double:        1
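
The counts in the table above can be sanity-checked and turned into a couple of rates. This is just arithmetic on the numbers as given; the names are made up for illustration.

```python
CONFIDENT_FIRST = {"right": 14, "wrong": 2}
CONFIDENT_BEATEN = {"right": 4, "wrong": 3}  # "wrong" = would have been wrong
NOT_CONFIDENT = 6
DAILY_DOUBLE = 1


def summarize():
    """Derive totals and rates from the first-round counts above."""
    first = sum(CONFIDENT_FIRST.values())
    beaten = sum(CONFIDENT_BEATEN.values())
    return {
        "total_clues": first + beaten + NOT_CONFIDENT + DAILY_DOUBLE,
        # share of clues above the buzz threshold where Watson rang in first
        "buzz_win_rate": first / (first + beaten),
        # share of those first buzzes that were answered correctly
        "accuracy_when_first": CONFIDENT_FIRST["right"] / first,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, round(value, 3))
```

The four buckets add up to the 30 clues of the round; Watson won the buzzer on 16 of the 23 clues where it was confident, and was right on 14 of those 16.
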
Edit: Some think that Watson’s (probably) superior reaction time gives it an unfair edge. I don’t really agree because fast reactions are simply a part of the game but I can sort of see the point that it’s not really about reaction times. We already know that computers can be better than humans when it comes to those.

I propose the following modification: One of the contestants who manages to buzz in within the first 100ms (the human reaction time) is randomly selected; buzzing in after those 100ms works as usual. Contestants are also no longer locked out for buzzing in too early.
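
A rough sketch of how that rule could be resolved, assuming each contestant's time is their first press at or after the buzzers open (earlier presses are assumed to be ignored rather than penalised). The function name, contestant timings, and the tie handling are illustrative only.

```python
import random

WINDOW_MS = 100  # window proposed in the comment above


def pick_buzz_winner(buzz_times_ms, rng=random):
    """Pick who gets to answer under the proposed rule.

    buzz_times_ms maps contestant name -> first press time in ms, measured
    from the moment the buzzers open. Returns None if nobody buzzed.
    """
    if not buzz_times_ms:
        return None
    in_window = sorted(n for n, t in buzz_times_ms.items() if t <= WINDOW_MS)
    if in_window:
        return rng.choice(in_window)  # everyone inside the window has equal odds
    return min(buzz_times_ms, key=buzz_times_ms.get)  # otherwise earliest wins


if __name__ == "__main__":
    # Both Watson and Ken are inside the window, so either may be picked.
    print(pick_buzz_winner({"Watson": 10, "Ken": 60, "Brad": 300}))
```

The design point is that reaction speed inside the window stops mattering, while slower buzzes are still first come, first served.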


I'm a bit too lazy to look up the source, but one article mentioned that the engineers were surprised that humans could get down to the 10 - 15 ms range by anticipating the light rather than reacting to it. They said this beats the mechanical buzzer, and it does happen, though not terribly often.

So yes, it is possible to beat Watson to the buzzer even if you both know the answer.

I suspect they could have improved the buzzer if they really wanted to, but it might have made things too lopsided.


Other articles, not linked directly here, indicate that there is a lockout on the buzzer if Watson hits it too soon. I've heard anywhere from 300-1000ms delay if you punch in the buzzer before Trebek stops speaking.

I also read that due to buzzer experience, Ken Jennings also advocated to the producers that competitors should be given practice on the buzzer before doing the game. It would make a more fair game. Honorable "battle", if you will.


According to the designers Watson waits for the light indicators (not seen on TV) that you can press the button, so no lockout. Humans often try to predict when the light will turn on instead of waiting, so sometimes they press the button faster than Watson.


Has anyone considered building an open source Watson?


Not very many people have a state-of-the-art supercomputer in their basement, so it wouldn't be that useful.


True, but you can rent something like a state-of-the-art supercomputer from Amazon for a fairly reasonable hourly rate. Watson has 90 8-core servers and 16TB of RAM - to rent 100 Quadruple Extra Large instances (1600 cores, but "only" ~2.3TB of RAM) would cost you $160/hour. I'm sure there are many other differences (and the fact that you can't fire up that many cluster instances without calling Amazon first), but the idea that an open source package like Watson wouldn't be useful doesn't make sense to me.
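
Breaking down the figures quoted above (taken as given, not checked against Amazon's price list), a quick sketch of the per-unit cost and the RAM gap looks like this. The names are made up for illustration.

```python
# All inputs are the numbers quoted in the comment above.
INSTANCES = 100
TOTAL_CORES = 1600
TOTAL_USD_PER_HOUR = 160.0
CLUSTER_RAM_TB = 2.3
WATSON_RAM_TB = 16.0


def rental_breakdown():
    """Per-unit rental cost and how far the rented RAM falls short."""
    return {
        "usd_per_instance_hour": TOTAL_USD_PER_HOUR / INSTANCES,
        "usd_per_core_hour": TOTAL_USD_PER_HOUR / TOTAL_CORES,
        "ram_shortfall_factor": WATSON_RAM_TB / CLUSTER_RAM_TB,
    }


if __name__ == "__main__":
    for key, value in rental_breakdown().items():
        print(f"{key}: {value:.2f}")
```

That works out to $1.60 per instance-hour and $0.10 per core-hour, with roughly 7x less RAM than the quoted Watson figure.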


And IBM did state that they could run the Watson software on 1 system. (It just took 7 hours to get the confidence level - not 3 seconds)


IBM's using two racks of computers and a huge SAN. Maybe it couldn't compete with Watson, but a decent setup could be cobbled together for 5,000. It wouldn't be as fast, or as comprehensive, but it could be done.

The harder part is the tens of thousands of man hours with very smart people.


I think a little project called Linux got tens of thousands of man hours with very smart people ... Just saying.


I think this is a bit short-sighted. Attempting the problem within constraints imposed by not having deep pockets could be very useful.


"Watson can’t adjust its answers to what the other players say and so it simply answers with whatever comes up as its top answer."

This is false, in fact they specifically mentioned Watson learning from the answers of the other players.

In the game, Watson answers "The 1920s" after Jennings answers incorrectly with "the 20s". Basically, it didn't write off its own answer because it thought Jennings' was different enough that it might have been incorrect for the way it was phrased. The same way you might correctly answer "inner ear" after someone else incorrectly answers "ear".


I was at an event at MIT with one of Watson's designers as I watched the show. I asked him about the 20s thing, and he said they talked about it when they were designing Watson but they figured the scenario wouldn't happen where another contestant gave the wrong answer and Watson gave the same-but-different wrong answer. Edge case!


Capt. Ed Murphy was a smart man.


He was Major Ed Murphy.


I believe that it learns from the correct answers. i.e. once the question is complete, it can use the known correct answer in its analysis. But it can't "hear" anything.


That's correct. Watson only learns from correct answers after a question has been completed. This is mentioned specifically in the Nova documentary.


Looks like this isn't too clear. An IBM spokesperson said he can't hear what other players are saying at all and only takes input from the question board. (http://latimesblogs.latimes.com/technology/2011/02/ibms-wats...)

Regardless, Watson only selectively takes input, if it does at all.


Not sure if it does it during the question though. The NOVA documentary on Watson[1] mentions having problems specifically with time phrases like the '20s, and it also has a part on giving Watson text input of the correct answer to improve performance after the question is answered. Nothing on getting every response from the other players.

[1] http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth....


They say that Watson has no "ears or eyes". So I assume it's not aware what others answer in real-time, hence that mistake. Were it not for this trivial mistake, Watson could be winning!


It's a baby god.


Take note fellow hackers:

We're witnessing a truly monumental event in human history and technological development.

Computers competing with humans on gameshows as if they are intellectually equivalent...

We live in the freaking future now, man.



