Machine Learning’s ‘Amazing’ Ability to Predict Chaos (quantamagazine.org)
272 points by adenadel on April 18, 2018 | 74 comments


Don't be too seduced by the enticing ideas at the end of the article. The disconnect here is that success in learning how to predict the results of an algorithmic simulation is not really indicative of how it would perform with the decidedly non-algorithmic natural behavior of weather or earthquakes, phenomena which don't operate in a closed system with predefined limits and parameters. It sounds like the next step, but even if weather were reducible to machine-discoverable patterns, you first must face the need to collect an immense amount of high-resolution condition data from around the globe on an ongoing basis. I'm guessing our current mesh of satellites and weather stations, as impressive as it is, comes nowhere close to enough raw data to sufficiently feed the beast in this case. And even that assumes that we _know about_ and can measure all of the factors that go into weather patterns. I sincerely doubt that's the case.


> It sounds like the next step, but even if weather were reducible to machine-discoverable patterns, you first must face the need to collect an immense amount of high-resolution condition data from around the globe on an ongoing basis.

There has been some success using Ensemble Kalman Filters (EnKF) to predict hurricanes [1]. I think that these filters sit somewhere between machine learning and deterministic models. The filters are based on a semi-realistic mathematical model but are updated with statistics every few hours so that they can improve their predictions.

As you said, going full-on machine learning "sounds like the next step". However, I would disagree that the ideas at the end of the article are unrealistic. If the new technique can beat the Kalman filters, it is already useful.

[1] http://hfip.psu.edu/realtime/AL2016/forecast_track.html
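
For intuition, the analysis step of an EnKF can be sketched in a few lines. This is a toy version with made-up dimensions and an observation of just the first state variable, nowhere near the operational hurricane setup:

    import numpy as np

    def enkf_step(ensemble, model, obs, obs_var):
        """One EnKF cycle: ensemble is (n_members, n_state), n_state > 1;
        obs is a scalar observation of the first state variable."""
        # Forecast: propagate every ensemble member through the (imperfect) model.
        forecast = np.array([model(x) for x in ensemble])
        n, d = forecast.shape
        H = np.zeros((1, d)); H[0, 0] = 1.0    # observe the first variable only
        P = np.cov(forecast.T)                 # covariance estimated from the ensemble
        K = P @ H.T / (H @ P @ H.T + obs_var)  # Kalman gain, shape (d, 1)
        # Analysis: each member assimilates a perturbed copy of the observation.
        # This is the "updated with statistics every few hours" part.
        perturbed = obs + np.random.normal(0.0, np.sqrt(obs_var), size=(n, 1))
        return forecast + (perturbed - forecast @ H.T) @ K.T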


> machine learning and deterministic models

This is a confusing distinction. Aren't lots of machine learning algorithms deterministic? Sure, some include stochastic elements or components, but even those can be deterministic, e.g. by reusing the same seed for the random number generators.
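
For example (a trivial sketch), a "stochastic" computation repeated with the same seed is bit-for-bit deterministic:

    import numpy as np

    def noisy_result(seed):
        rng = np.random.default_rng(seed)   # all randomness flows from the seed
        return float(rng.normal(size=100).sum())

    print(noisy_result(42) == noisy_result(42))   # True: same seed, same output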


It's actually kind of unclear what it means to predict results from an algorithmic simulation. Surely that's just trying to use a different method to simulate the process?

I also don't see them mentioning anything about noise, so what exactly are they trying to achieve? Surely perfect prediction should already be possible, otherwise what are they comparing the output to?


Yeah, they're predicting the outcome of a chaotic but deterministic process based on the output data given.

Surely they're discovering hidden (or obvious) regularities based specifically on the process being deterministic.

As a counter example, the Smale Horseshoe [1] is an example of an inherently unpredictable map - it's approximately equivalent to choosing a real number "at random" and attempting to "predict" each digit as you find it. Sure, if some implementation has chosen a number with repeating digits, you can "predict" those digits, but this would have nothing to do with being an oracle for the algorithm itself.

More broadly, I think it's pretty established that any realistic model of weather is kind of an extension of this concept - it depends unstably on its initial conditions (small changes in initial conditions result in large changes in the final result after some period).

[1] https://en.wikipedia.org/wiki/Horseshoe_map
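
For a feel of why it's unpredictable: the horseshoe's dynamics are conjugate to a shift on symbol sequences, and the binary doubling map is the standard toy with the same digit-revealing behavior. Each iteration just exposes one more digit of the initial condition, so "prediction" is exactly knowing those digits in advance (a minimal sketch):

    x = 0.612357911317192      # initial condition; its binary digits drive everything
    for step in range(8):
        digit = int(x >= 0.5)  # the binary digit exposed at this step
        print(step, digit, x)
        x = (2 * x) % 1.0      # doubling (shift) map: shift one digit out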


This interpretation sounds right to me. I found it cool that the article characterizes success in terms of prediction accuracy out to N "Lyapunov times", a concept that is supposed to incorporate the unpredictability of the system. So perhaps discovering these regularities proves that in this implementation, the true Lyapunov time is longer.
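
For the curious, a Lyapunov exponent (and hence a Lyapunov time, its reciprocal) can be estimated numerically; here's a rough textbook-style sketch for the logistic map, not the article's system:

    import math

    # lambda = average of log|f'(x)| along an orbit of x -> r*x*(1-x)
    r, x = 4.0, 0.3
    total, n = 0.0, 10000
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))   # log of the local stretching rate
        x = r * x * (1 - x)
    lam = total / n
    print("Lyapunov exponent ~", lam, "; Lyapunov time ~", 1 / lam)   # ~ ln 2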


I believe (but I might be wrong) that the results of two different methods are being compared: (a) "algorithmic", where solution(t+1) = algo(solution(t)) and errors accumulate with the time steps, vs. (b) data-driven, where, given sufficiently many available pairs (initial_1, solution_1), ..., (initial_N, solution_N), each solution_i a function of time t in [0, T], a NN (or another gadget) builds a function ALGO(initial) = solution such that the errors of solution(t) with respect to real_solution(t) are uniform in time, where real_solution is the true solution for the initial condition initial.
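
In code, the contrast between the two schemes would look something like this (names are mine, purely illustrative):

    def iterate_forward(one_step, x0, n):
        """Scheme (a): roll a learned one-step model forward; errors compound."""
        x = x0
        for _ in range(n):
            x = one_step(x)
        return x

    def predict_direct(learned_map, x0, t):
        """Scheme (b): a learned map straight from the initial condition to the
        state at time t, aiming for error that is uniform in time."""
        return learned_map(x0, t)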


Check this out: https://www.wunderground.com/wundermap?lat=40&lon=-105

Just start breaking the problem down. We have modestly accurate ideas of air pressure features across most of the world. Not enough to predict every breeze, but enough to predict what days it'll be stormy in a given location about a week in advance, plus or minus a few days, and plus or minus a bit of severity. With continued training, the models can only get more accurate. Personal weather stations are a thing now, and only growing in numbers.

Combine that with radar and satellite data for an even more accurate picture.

Now, if only we could get a log of every (5min? 1min?) reading of every station...


(Part of) the idea seems to be synthesizing missing data from later observations of the known data, i.e. "to arrive at this state (A_t,C_t,E_t), the initial state must have been...(A_t-1,B_t-1,C_t-1,E_t-1)..."

I have my doubts that this can just overcome the fundamental problem of chaos, but it doesn't sound impossible.
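
One classical tool along these lines is delay embedding (Takens' theorem): reconstruct the hidden state from a window of past observations of the variables you can see. A minimal sketch (function name and parameters are illustrative):

    import numpy as np

    def delay_embed(series, dim=3, lag=2):
        """Turn a scalar time series into delay vectors
        (series[i], series[i+lag], ..., series[i+(dim-1)*lag])."""
        n = len(series) - (dim - 1) * lag
        return np.array([series[i:i + (dim - 1) * lag + 1:lag] for i in range(n)])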


Well, but then this is just a fancy name for combinatorial optimisation.


Why did they choose such a specific task to make a general statement? Wouldn't it make more sense to say, predict the real world complex motion of a double pendulum?


I think you're correct about collecting weather data. That said, the prediction doesn't have to be perfect - like race horses or the lottery - in the short term it just needs to be relatively close. It all depends on how far out your long term needs to be.


What's a good middle ground?

A "chaotic system" that is (a) real (b) a system and (c) condition data is a crossable hurdle. Ideas?


I wouldn't say that natural behavior is non-algorithmic. It's more large-scale, hidden-information algorithmic.


I agree, but the point was to denigrate the computer simulation, which is literally algorithmic.


Can we have more publishing like Quanta magazine? It's just perfect in terms of reporting from the frontier, being easy to understand, and not talking down to you.


It's funded by Jim Simons, of the hedge fund Renaissance Technologies. A mathematician with money, trying to promote mathematics. Quanta magazine is a fantastic service.

But the other Renaissance magnate, Bob Mercer, funded Breitbart News as his own media hobby.


I used to be part of a collaboration at a particle accelerator called RHIC. It was the most powerful accelerator of its type before the LHC was built, and a lot of really great research was done there. One year, the budget got slashed and there just wasn't enough funding for it to operate. Simons donated a massive amount of his own money, and organized fundraising from other sources, both of which played central roles in the experiments being able to continue on with their research. There's now a road inside of Brookhaven National Lab called "Renaissance" in honor of him to celebrate his contributions. It's actually pretty sad that it was necessary for something like that to happen, but it certainly speaks volumes about Simons.


> But the other Renaissance magnate, Bob Mercer, funded Breitbart News as his own media hobby.

And was one of the major sources of funds behind Cambridge Analytica.


Yeah, it's kind of sad to me. Quanta is amazing, but I'm not sure I could reasonably expect it to exist without Simons or someone like him privately funding it.


Why do you feel sad about the fact that an individual who values science is "privately funding" a great publication? Surely you don't object to private wealth and/or activity?


Being dependent on the whim of a wealthy individual is saddening.


Could one be, instead, appreciative of and grateful to the considered, generous decision of that individual? How do you think he would respond to someone expressing one kind of feeling or the other?


I assume that he became a multibillionaire in part by not troubling himself with the opinions of nobodies.


Why do you think that I'm not? Lol


Would it be any less saddening to depend on the whim of an uneducated, innumerate and manipulable mob?


Better would be the considered preferences of an educated, numerate mob.


Please don't put opinions in other people's mouths. Not everyone believes that extreme levels of wealth controlled by one individual are good.


I was going to comment in here about how the Simons Foundation runs Quanta. Glad other people have mentioned it. Simons is an amazing guy.


I always wondered how Quanta put such high quality material out in a traditionally low-profit industry. Now I know!


It would be fascinating to apply this technique to card shuffling. Shuffling techniques are not even close to random...casinos hope for chaos at best. It would be interesting to see if a machine learning algorithm could come up with an approach that could be carried out by humans to, for example, predict whether or not the next shoe of a hand-shuffled blackjack game is going to have a positive or negative expectation for the player.


You can card count on apps already. The repeated linear operator + nonlinearity of the ESN (the reservoir) could implement a card-counting memory but, you know, so could a card counting memory in a normal program.
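
For reference, a minimal echo state network looks roughly like this (sizes, scalings, and the ridge parameter are illustrative, not the paper's settings); only the linear readout is trained:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res = 1, 300
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))   # fixed, random input weights
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))     # fixed, random recurrent weights
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))      # spectral radius < 1 (echo state)

    def run_reservoir(inputs):
        """Drive the reservoir with an input sequence and collect its states."""
        state, states = np.zeros(n_res), []
        for u in inputs:
            state = np.tanh(W @ state + W_in @ np.atleast_1d(u))
            states.append(state.copy())
        return np.array(states)

    def train_readout(states, targets, ridge=1e-6):
        """Ridge-regress the only trained weights: reservoir state -> target."""
        return np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                               states.T @ targets)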


Oh I wasn't talking about card counting. I was talking about being able to determine, knowing the order of the cards going into the shuffle, whether or not the next shoe (after the shuffle) would have an overall positive or negative expectation. The theoretical expectation accounting for every single possible order of cards is slightly negative. However, if a massive percentage of the possible orders that the cards can be in is eliminated, the expectation may either be much more positive or negative than the overall theoretical return.


Novel paper. But it seems like a lot of the excitement is because of the fusion of two buzz-words, one from the 80s and 90s and another from the 2010s. So, the output is a bunch of stuff coming out of a kinda simple dynamical system. Chaotic for sure, but still simple. Deep learning (and more generally, recurrent neural nets, LSTMs, and derivatives thereof) has been shown capable of learning much more complex nonlinear systems, including human perception. Given this, I think the OP paper is a low-hanging fruit. It was only a matter of time before someone figured out a way to learn specific chaotic nonlinear dynamical systems. Nice work nonetheless.


In the example they give they predict the future of a simulation. If you have a perfect simulation, with perfect observation, why not just run the simulation forwards? Well, the goal is to apply this to the real world, where you possibly have only an approximate model and observations - which are noisy and imprecise. So, instead, try predicting the future based on noisy observations. Due to exponential divergence, it seems unlikely this would work. Looking through the paper, it looks like they do not analyze the performance under noisy observation - they just analyze their ability to estimate the Lyapunov exponents under noise, which is much easier.

So the real world application (in terms of forward forecasting) seems like it's limited to cases where chaotic divergence between simulation and real world is due to simulation model errors rather than observation error - the latter is still a fundamental limit. Otherwise, this demonstrates that DNNs can be trained to solve diff eqs well, which is fairly well-trodden work.

As a result, claiming that this can enhance e.g. weather prediction is highly misleading at this point, as nothing has demonstrated any prediction performance improvements under noisy observation, which is really the fundamental (and practical) limit in chaotic systems. While it still may be possible to use DNNs to do that (by e.g. learning how to most accurately estimate the components of state with the largest Lyapunov exponents), I don't think this work demonstrates the feasibility of that yet. The language in the paper is more reserved, in that they primarily claim that they can use the DNN to learn a dynamic model even with observation noise, and then that model can be used to estimate Lyapunov exponents accurately. That's much more reasonable.
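
A quick illustration of that observation-noise limit (logistic map stand-in, not the paper's system): two orbits that start eps apart diverge at a rate set by the Lyapunov exponent, so noise of size eps caps the useful forecast horizon at roughly log(1/eps)/lambda steps no matter how good the learned model is:

    x_true, x_obs, r = 0.3, 0.3 + 1e-10, 4.0   # eps = 1e-10 "observation error"
    for t in range(60):
        if abs(x_true - x_obs) > 0.1:
            print("orbits visibly diverged at step", t)   # ~ log(1e10)/log 2 ~ 33
            break
        x_true = r * x_true * (1 - x_true)
        x_obs = r * x_obs * (1 - x_obs)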


That's like saying, "it was only a matter of time someone figured out time traveling", when it happens. The whole point of the investigation was about chaotic systems, regardless of whether they're complex or not.


Nope, given our current theories, time travel is practically impossible. On the other hand, we know that chaotic systems are functions and that matrices can approximate any function.


> matrices can approximate any function

I'm not sure what point you're trying to make. Matrices only perform linear transformations, so matrices only approximate functions linearly, which in general, is a terrible approximation globally.


Especially if you have complex systems where discretization and linearization aren't computationally achievable and/or numerically accurate ... like predicting global weather patterns or even very small experiments. I think about the phrase: All models are wrong; some models are useful.


I was pretty sure someone was going to focus on the example I gave instead of the idea :), very predictable. Chaotic systems are not just functions; that's why there's an entire discipline that studies them. Many ways to approximate functions have been developed over a long time, and all have had trouble approximating chaotic functions.


If you’re going to be that pedantic, I’ll see your pedantry and raise you. Time travel is not only possible but routine, just in the forward direction. It’s only travel into the past which is probably impossible, and to be even more pedantic and abstract, impossible only in our observable spacetime geometry.


So it is with chaotic systems: we know for sure that approximating them is just a matter of boundary conditions; that is, the more you know about the starting points, the better you approximate the real world.

I mean, there might be a case where we will finally get to the point where we are just "good enough" at measuring the boundary conditions to predict weather for the next year, but it has got nothing to do with obtaining the knowledge about chaotic systems themselves.


ITYM "A multilayer perceptron is a universal function approximator."


I'm not sure if human perception is a chaotic system.

Chaos is defined as "small change in input -> large change in output".

Perception is actually the opposite, with small input changes (changes in light, different angles,...) leading to fundamentally unchanged perception ("It's a tree").


First, sensitivity to initial conditions is a necessary but not sufficient property for chaos. Some sort of folding/mixing is also necessary, which can be guaranteed by a bounded state space.

Second, it's definitely true that information processing systems, if they are to be reliable enough to be useful, are not going to be chaotic throughout their state space. They need to return the same output given a certain input. But I'd imagine there are also at least a few noisy regions.


I always thought that a perception like "it's a tree" that remains stable could possibly be an attractor in a chaotic system. If you look at the trajectory of a particle around an attractor, its position is very unpredictable after a while, but which attractor it is orbiting is not so random or unstable. Thinking of the visual of the Lorenz attractor.
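
A rough Lorenz-63 sketch of that picture (standard parameters, naive Euler stepping just for illustration): the precise state becomes unpredictable quickly, but a coarse observable like "which wing of the attractor" (the sign of x) is far more stable:

    def lorenz_step(x, y, z, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
        # One Euler step of the Lorenz-63 equations.
        return (x + dt * s * (y - x),
                y + dt * (x * (r - z) - y),
                z + dt * (x * y - b * z))

    x, y, z = 1.0, 1.0, 1.0
    for t in range(2001):
        x, y, z = lorenz_step(x, y, z)
        if t % 400 == 0:
            print(t, "wing:", "+x" if x > 0 else "-x")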


You also need ergodicity and mixing, as jessriedel states, meaning respectively that you explore every area of your phase space with equal probability, and that two trajectories that begin arbitrarily close together do in fact diverge instead of sticking together, at infinite time.


Doesn't that mean that the neural net taught itself a numerical method to compute the solution of the equation, one that is close enough in terms of approximation up to 7 Lyapunov times, after which the approximation is no longer good enough and the system can't predict?

It doesn't sound too groundbreaking...


I think it's blurring the line between modelling and simulation. You can find an efficient route down a hillside by pouring water down it, or a line of least resistance through a system by passing a current through it. Is the system learning a numerical solution, or performing one? I think this is like building a model of a system that is much closer to the territory than the map compared to a normal model, but still easier to work with than the territory.


> Is the system learning a numerical solution, or performing one?

Is there a real difference? Any learning must be, fundamentally, an algorithm, so learning is performing.


This is a deep topic, but one good treatment of it is in the works of David Deutsch: https://www.cs.indiana.edu/~dgerman/hector/deutsch.pdf


Very cool! This looks like a very promising way to improve weather forecasts, predict wildfire movements (maybe?), and perhaps control chaotic systems (among other uses, of course).

It feels like every day I'm seeing deep learning make more mathematical tools obsolete. It's amazing how useful this tool has been.


When we started stacking layers and gained nonlinear activation functions, deep learning gained the ability to approximate any function, given enough time and data. Those are big caveats.

RL can function with a lot less data. SVMs can run with a lot less time and space. Partial function application and expert modelling reduce data needed - and some of the best results are from ensemble suites. It's not obsoleting, it's another tool in the box.


I was really hoping this was about weather forecasting.


It can be!


From the article:

“This paper suggests that one day we might be able perhaps to predict weather by machine-learning algorithms and not by sophisticated models of the atmosphere,” Kantz said.


If there's an algorithm to create this 'randomness' then how can it be all that random?

Isn't it just learning to make an approximation of that algorithm, rather than actually predicting chaos?


It’s working with imperfect information about the current state, which makes the algorithm that generates it useless.


Can anyone speak to whether PRNGs might be affected?


If you know the seed and the used algorithm, you are already able to "predict" the pseudo-random sequence perfectly.
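
E.g., re-running the generator is the whole "prediction" (trivial sketch):

    import random

    a = random.Random(42)   # the "target" PRNG
    b = random.Random(42)   # a predictor who knows the seed and algorithm
    print([a.random() for _ in range(3)] == [b.random() for _ in range(3)])  # True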


Why wouldn't it? A lot of misunderstanding out there in both machine learning and nonlinear dynamics.


I'm curious, has anyone tried to apply ML to solving large linear systems?


Yes. Vladimir Vapnik's work on SVMs (1995) is probably the most recent, but linear regression is an ML technique as well.


I want to know how this does with traffic. Pretty remarkable.


I'd like to apply it to the stock market.


Ergodicity becomes a problem here. The nonlinear systems typically studied by complexity and chaos theorists have strong fixed rules that do not change in time. Markets have systematic and structural changes which can make prior observations completely irrelevant.


even after cross-validation?


Isn't deterministic-chaos an oxymoron?


The specific kind of chaos here implies (roughly) that a system starting in state X and a system starting arbitrarily close to X will at some point in the future behave totally differently. Not different as in "they'll drift further apart," but different as in "this one rolls off the cliff and this doesn't."

It's a fascinating field of dynamical systems, which is one of the coolest subfields of math (since the world is dynamic). Here's the wikipedia page https://en.wikipedia.org/wiki/Chaos_theory


No. A chaotic dynamical system can be completely deterministic.


Not only can chaos be exhibited in fully deterministic systems, it can be exhibited in absurdly simple systems. The logistic map is an incredibly simple iterated function that exhibits highly chaotic behavior.

It remains one of the most mysterious things in mathematics, as far as I can tell. Studying it has consumed a lot of very smart people's careers.


The fact that the answer to this is "no" has fed entire subfields of mathematics for a generation.


The double pendulum is a nice example https://en.wikipedia.org/wiki/Double_pendulum


> when the powerful algorithm known as “deep learning”

Quanta is normally better than this (and the rest of the article is decent). But still :(



