
The tribal thesis in the AI world seems to be that AI workers don't need subject matter expertise, as the AI will figure it out during training. In fact, subject matter expertise can be a negative because it's a distraction from making the AI good enough to figure it out on its own.

This assumption has proven very fragile, but I don't think the AI bigwigs have accepted that yet; they're still flush from the success of things like AlphaZero, where the thesis held up much better.



An old story:

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.


Maybe I'm naive - but I don't get it.

Closing your eyes doesn't make the room empty. And in the same way not programming preconceptions into the neural net doesn't make the preconceptions go away?

I realize explaining a joke or something like this takes away some of the charm (sorry), but would love to get the point :)


Randomly wiring a neural net doesn't remove preconceptions, it just means you don't know what they are. Similarly, closing your eyes doesn't make a room empty, it just means you can't see what's there. Minsky is pointing out that Sussman's underlying assumption, that randomness removes preconceptions, is logically flawed.


Gotcha - I read it as Sussman didn't want to program his preconceptions into the network (for which randomness seemed suitable, which is why I was confused). Your explanation makes more sense.


There's an old series of - stories? jokes? called 'unix koans' [1] which always end with a master answering a question in a very unclear but profound-sounding way, then the line 'Upon hearing this, [someone] was enlightened.'

I never found them laugh-out-loud funny myself.

This is probably a reference to those.

[1] http://www.catb.org/~esr/writings/unix-koans/


A koan (公案) is a concept from Zen Buddhism. Zen made its way into pop culture (or at least the popular counter-culture movement) in the US in the 1960s via several writers of the Beat Generation. Project MAC at MIT was founded in the early 60s (Gerald Sussman started there in '64, I think?), so a number of faux koans were in circulation in the AI crowd by about 1970.


Unfortunately, because it's delivered as a koan, we'll never know whether he's talking about the fact that the random weights determine the nearest local minimum, or about the hyperparameters.


Also unfortunately, a "real" koan often relies on contrasting expectations and conditioning with direct experience, where direct experience is shown to produce an "impossible" result, teaching that the conditioned, subjective mind does not see fundamental aspects of reality, though it thinks it has the answers.

These technical mimics of that structure echo a time when a lot of people, relatively, were experimenting with disregarding personal subjectivity in favor of direct experience and deeper Truth. In the modern tech versions, that is rarely if ever part of the story?


I think you’ve exactly described the point.


I interpret that as Minsky being... an unpleasant person, to say the least.


Every time I read this I appreciate its sublimity.


The problem with preconceptions about your parameters is that you might be missing some crazy cool path to your goal, one you might only find by randomly exploring your sample space. I remember seeing this same principle in MCMC samplers using uniform priors. Why is this so crazy?
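
As a toy illustration (my own sketch, nothing canonical): with a flat prior, only the likelihood shapes where a Metropolis sampler wanders, so it is free to stumble onto whatever region the data actually supports.

    import math
    import random

    # Toy Metropolis sampler with a uniform ("flat") prior on [-10, 10].
    # Hypothetical example: infer the mean of a Gaussian with known sigma=1.

    def log_likelihood(theta, data):
        return -0.5 * sum((x - theta) ** 2 for x in data)

    def log_prior(theta):
        # flat inside the bounds, impossible outside
        return 0.0 if -10.0 <= theta <= 10.0 else float("-inf")

    def metropolis(data, steps=5000, step_size=0.5):
        theta = random.uniform(-10, 10)   # random starting state
        samples = []
        for _ in range(steps):
            proposal = theta + random.gauss(0, step_size)
            log_accept = (log_likelihood(proposal, data) + log_prior(proposal)
                          - log_likelihood(theta, data) - log_prior(theta))
            if random.random() < math.exp(min(0.0, log_accept)):
                theta = proposal
            samples.append(theta)
        return samples

    data = [random.gauss(3.0, 1.0) for _ in range(50)]
    samples = metropolis(data)
    print(sum(samples[1000:]) / len(samples[1000:]))  # posterior mean, roughly 3.0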


It's predicated on the assumption that a random discovery from a zero-comprehension state is more likely to get you to a goal than an evolution from a state that has at least some correctness.

More generally, it disingenuously disregards the fact that the definition of the problem brings with it an enormous set of preconceptions. Reductio ad absurdum, you should just start training a model on completely random data in search of some unexpected but useful outcome.

Obviously we don't do this; by setting a goal and a context we have already applied constraints, and so this really just devolves into a quantitative argument about the set of initial conditions.

(This is the entire point of the Minsky / Sussman koan.)


> from a zero-comprehension state is more likely to get you to a goal than an evolution from a state that has at least some correctness.

I get that starting from a point with "some correctness" makes sense if you want to use such information (e.g. a standard starting point). However, such information is a preconceived solution to the problem, which might not be that useful after all. The fact is that you might not need such information at all to find an optimal solution to a given problem.

> by setting a goal and a context we have already applied constraints.

I might be missing your point here, since the goal and constraints must come from the real-world problem to solve, which is independent of the method used to solve it. Unless you're describing p-value hacking your way out, which is a broader problem.


With exploring, the starting state should only affect which local maximum you end up in. Therefore you need to make an argument that a random starting state is likely to end up in a higher local maximum than a non-random starting state.

There is always a starting state; using a random one only means you don't know what it is.
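
A contrived little sketch of that point (plain hill climbing on a bumpy 1-D function, not real training): the starting point decides which peak you climb to, and a random start just means you don't know which peak in advance.

    import math
    import random

    # Gradient ascent on a function with several local maxima.
    # Hypothetical illustration: the starting point picks the peak.

    def f(x):
        return math.sin(3 * x) - 0.1 * x * x

    def grad(x, h=1e-5):
        return (f(x + h) - f(x - h)) / (2 * h)  # numerical derivative

    def ascend(x, lr=0.01, steps=2000):
        for _ in range(steps):
            x += lr * grad(x)
        return x

    for start in (-4.0, 0.0, 2.0, random.uniform(-5, 5)):
        peak = ascend(start)
        print(f"start {start:+.2f} -> local max at x={peak:+.2f}, f={f(peak):.3f}")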


Exactly, but why do so many people seem to have a problem with this? Sounds like a political problem to me instead of a scientific one.


There are a lot of problems that arise from lack of domain expertise, but they can be overcome with a multidisciplinary team.

The biggest and most defeating problem for pure AI teams is that they don't understand the domain well enough to know whether their data sets are representative. Humans are great at salience assessments, and can ignore tons of the examples and features they witness when drawing on their experience. This affects dataset curation. When a naive ML system trains on this data, it won't appreciate the often implicit curation decisions that were made, and will thus be miscalibrated for the real world.
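
To make that concrete with a deliberately contrived sketch (toy numbers of my own, not from any real project): suppose the real-world positive rate is 2% but the curator, reasonably from their point of view, balanced the training set 50/50. A model that learns the training base rate is badly miscalibrated before it ever sees a deployment case.

    import random

    # Hypothetical toy: implicit curation (class balancing) vs. the real world.
    random.seed(0)
    real_world = [1 if random.random() < 0.02 else 0 for _ in range(100_000)]

    positives = [x for x in real_world if x == 1]
    negatives = [x for x in real_world if x == 0]
    curated = positives + random.sample(negatives, len(positives))  # balanced set

    trained_prior = sum(curated) / len(curated)     # ~0.5, what the model learns
    true_prior = sum(real_world) / len(real_world)  # ~0.02, what deployment sees

    print(f"base rate the model learned:     {trained_prior:.2f}")
    print(f"base rate it will actually face: {true_prior:.3f}")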

A domain expert can offer a lot of benefits. They may know how to engineer features in a way that is resilient to these salience issues. They can immediately recognize when a system is making stupid decisions on out-of-sample data. And if the ML model allows for introspection, the domain expert can assess whether the model's representations look sensible.

In scenarios where datasets actually do accurately resemble the "real world", it is possible for ML to transcend human experts. Linguistics is a pretty good example of this.


It makes sense to have a domain expert and an AI expert working together, but I'd offer two important modifications:

1) The AI expert is auxiliary here, and the domain expert is in the driver's seat. How can it be otherwise? You no more put the AI expert in charge than you'd put an electronic health record IT specialist in charge of the hospital's processes. The relationship needs to be outcome-focused, not technology-focused.

2) The end result is most likely to be a productivity tool which augments the abilities/accuracy/speed of human experts rather than replacing them. AGI being more fiction than science, we aren't likely to actually be diagnosed by an AI radiologist in our lifetimes, nor will an AI scientist make an important scientific discovery. Ditch the hype and get to work on those productivity tools, because that's all you can do for the foreseeable future. That might seem like a disappointing reduction in ambition, but at least it's reality-based.


Unless of course the "domain experts" have fundamental disagreements, or have equally limited knowledge of what is important when extrapolating data beyond their own scope. E.g., in comp sci there might be multiple comparable ways to accomplish n, but which is best to reliably accomplish an unknown or unforeseen n+1... it depends.


> Humans are great at salience assessments, and can ignore tons of the examples and features they witness when using their experience

This is called the frame problem in AI.


> The tribal thesis in the AI world seems to be that AI workers don't need subject matter expertise

Not throwing any stones here, because I've been guilty of the same sort of arrogance in other contexts. But I think the same thing happened a ton during Bubble 1.0 and the software-is-eating-the-world thing. And it's hardly limited to tech: https://xkcd.com/793/

For me, at least, where this came from was ignorance and naivete. Three things cured me. One was getting deeper mastery of particular things, and experiencing a fair bit of frustration when dealing with people who didn't understand those things or respect my expertise. The second was truly recognizing there were plenty of equally smart people who'd spent just as long on other things. And the third was working in close, cross-functional team contexts with those people, where mutual listening and respect were vital to the team doing our best work.

So here's hoping that the AI bigwigs learn that one way or another.


Not only AlphaZero: didn't the whole field of computer vision (which LeCun specialises in) have its major breakthrough by letting the AI figure out the features (i.e. CNNs)?


Yes, but they needed 2 billion training images to get to the point where the AI usually draws the correct number of limbs...

Any radiology AI that needs millions of training sets is useless in practice.


> Any radiology AI that needs millions of training sets is useless in practice.

Why? I have no doubt that radiology AI might not be that useful (though radiologist friends of mine say AI is making an increasing impact on their field). But this logic doesn't make sense. So what if an AI needs a million training examples or even a million training sets? Once your topology and weights are set, that net can be copied/used by others and you get a ready-to-go AI. There's an argument to be made that if training scale is what's needed to get to AGI, then maybe AGI is unrealizable, but that's not the same as saying a domain-specific AI is useless because it needs a large training set.


It helps that human scientific expertise in the topic of "recognizing objects" is limited.



