
If you want to work in a Hilbert space (i.e. have angles), then you are stuck with the Euclidean metric, up to scaling of the axes.


Indeed … by definition … artifact of the choice then :-)


The term "matrix derivative" is a bit loaded - you can either mean the derivative of functions with matrix arguments, or functions with vector arguments that have some matrix multiplication terms. Either way, I don't really understand what the confusion is about - if you slightly modify the definition of a derivative to be directional (e.g. lim h->0 (f(X + hA) - f(X))/h) then all of this stuff looks the same (vector derivatives, matrix derivatives and so forth). Taking this perspective was very useful during my PhD where I had to work with analytic operator valued functions.


There’s a little more to it than that. Matrices have several properties like being symmetric or unitary (maybe even diagonal) that scalars don’t. Those enable rewritings to make systems more stable or computationally efficient.


That bears no relation to the symbolic differentiation in the OP though.

Even plain old multiplication and division, and even addition and subtraction have stability and efficiency problems on floats, which don't appear in symbolic solvers.
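Two standard illustrations with plain IEEE 754 floats, nothing library-specific:

    # float addition is not associative:
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

    # absorption: the 1 is smaller than the spacing between floats near 1e16,
    # so it vanishes, although symbolically (x + 1) - x is exactly 1
    x = 1e16
    print((x + 1) - x)   # 0.0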


Can you give an example? I am unfamiliar (and interested).


I made a public bet with Norman in early 2020 (late Feb) about whether the university we were at would hold classes until the end of the term (he believed they would not due to covid). Norman was the only one with the guts to publicly point out the obvious (anybody with the Wikipedia page for China case numbers and a spreadsheet could have figured out what was going to happen in late Feb). Among other things he was a great lecturer and a talented go player, although I never really thought that rational trig was worth all the effort - regular math seems to have plenty of utility even if you think it is built on shaky philosophical foundations.


> I never really thought that rational trig was worth all the effort - regular math seems to have plenty of utility even if you think it is built on shaky philosophical foundations

For those familiar with "normal" (irrational?) trig, it's certainly an extra effort to learn, and may hinder communication, which gives a poor cost/benefit ratio.

For those unfamiliar with "normal" trig (e.g. not having reached that stage in school, or having later forgotten it all), rational trig certainly seems easier to learn and understand. I also like the way it generalises, e.g. to hyperbolic space (relativity), by simply changing the dot product.


Part of the utility of rational trig seems to be that it can work in any field, while functions like sin and cos only work for the reals. I.e. you can use rational trig to describe the "geometry" of a "triangle" whose points have coordinates in a finite field.
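As a toy illustration (my own sketch, working mod 13): quadrance and spread are the rational-trig replacements for distance and angle, and both are just field arithmetic, so they make sense in GF(p) where sin and cos do not.

    p = 13  # the finite field GF(13)

    def quadrance(a, b):
        # "squared distance": no square roots needed
        return ((b[0] - a[0])**2 + (b[1] - a[1])**2) % p

    def spread(a, b, c):
        # rational analogue of sin^2 of the angle at vertex a:
        # s = 1 - (v.w)^2 / (Q(v) * Q(w)), division via modular inverse
        v = (b[0] - a[0], b[1] - a[1])
        w = (c[0] - a[0], c[1] - a[1])
        dot = (v[0] * w[0] + v[1] * w[1]) % p
        q = quadrance(a, b) * quadrance(a, c) % p  # assumed nonzero (non-null vectors)
        return (1 - dot * dot * pow(q, -1, p)) % p

    A, B, C = (0, 0), (3, 4), (2, 1)  # a "triangle" with coordinates in GF(13)
    print(quadrance(A, B), spread(A, B, C))  # 12 8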


> Norman was the only one with the guts to publicly point out...

Sorry; what context was this in? UNSW maths department or some other social circle? I know I was anticipating lockdowns some time around then for exactly the same reasons as you list, but I assumed that all the mathematicians would have figured it out around the same moment.


Yes UNSW maths. Maybe everybody else had it figured out, but certainly Norman was the first to publicly acknowledge that all the learning would have to be online within a few months.


In my CS department this was anticipated by a few of us, pretty universally so by the end of February.


> even if you think it is built on shaky philosophical foundations.

What does this mean? What shaky philosophical foundations?


NJW posits that real numbers and set theory are based on the notion that it is possible to do an infinite number of calculations, which he considers disingenuous.

He highlights in some of his YouTube videos that in respected math textbooks the definition of the real numbers is left vague.

In his opinion set theory has the same kind of holes: we are expected to accept that we can add an infinite quantity of things to a set by describing a function, or simply by having a description of the elements of the set.


Within ZFC there are precise models of the real numbers and it’s not at all vague or on a loose footing.


https://www.youtube.com/watch?v=jlnBo3APRlU

The gist of the argument is that addition and other operations on non-computable numbers (which the real numbers contain) require infinite algorithms or something similar (unlike addition of computable irrational numbers, which may require infinite work, but where the algorithms are finite). You can therefore get situations where, say, the tenths digit of a sum of non-computable numbers is not defined because of potentially infinite carries, and there's no way to determine if the sequence of carries terminates or not. He discusses the problem in the context of different representations of real numbers, including infinite decimals, Cauchy sequences, Dedekind cuts, etc. This is just the gist of it.
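A toy version of the carry problem (my own sketch, not his): scanning a sum from the front, the first digit pair summing to at most 8 absorbs any carry, the first pair summing to at least 10 pushes a carry through the preceding 9s, and while the sums stay exactly 9 nothing can be decided.

    from itertools import islice

    def integer_part_of_sum(xs, ys, max_digits=10_000):
        # xs, ys: decimal digit streams of x, y in [0, 1)
        for dx, dy in islice(zip(xs, ys), max_digits):
            s = dx + dy
            if s <= 8:
                return 0   # a carry from deeper digits can't reach the front
            if s >= 10:
                return 1   # a carry propagates through the preceding 9s
        return None        # every pair so far sums to 9: still undecided

    def const(d):
        while True:
            yield d

    # 0.444... + 0.555...: no finite scan distinguishes "0.999..." from
    # "1.000...", and for non-computable inputs nothing guarantees a decision
    print(integer_part_of_sum(const(4), const(5)))   # None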


This is different than saying the definition of the reals is “vague”. The models and various definitions are not vague. They are as precise as the axioms of Euclidean geometry or any other axiomatic system. His objections are reasons why he doesn’t like the axioms. One can either accept or reject the axiomatic system but it isn’t vague.


Maybe not "vague", but "underspecified".

ZFC models the set of real numbers, but only pins down a measure-zero set of specific individual real numbers. For the rest it just says "yeah, they exist".

People like Wildberger believe that anything that exists in math should have some way of determining its exact value, otherwise, what is "it"?


That I understand. I objected to the idea that the theory was vague rather than that the theory contains objects that are vague. Even so, if one thinks the theory contains vague objects then we run into considerations such as the following:

https://mathoverflow.net/questions/44102/is-the-analysis-as-...

I suppose being a finitist allows one to get around what Hamkins wrote.


Wildberger is an ultrafinitist


I'd like to plug riptables (https://github.com/rtosholdings/riptable), which is (more-or-less) a performance upgrade to pandas.


Looks nice!


Earthsea is great. Ursula also wrote a lot of amazing science fiction: The Left Hand of Darkness, The Dispossessed, The Lathe of Heaven, The Word for World Is Forest, etc. All are worth reading.


I would appreciate references on the time complexity of merging sorted lists (if I am not worried about space complexity at all), for arbitrary n and m. There is an excellent discussion in Volume 3 of TAOCP, but the references within are quite old.


Why do you say it fits very few use cases?


Numba can compile in “no Python” mode with only a subset of Python. E.g. support for classes is limited and still experimental. Also, I think string manipulation is slow, but the docs have details.

If you want to specify the types (for example for AOT compilation, or just to make them explicit), then the call signature is less flexible.

In short, pick any random Python library and you’d find very few places where you can JIT-accelerate something effectively. It is for numerical code.

Even for numerical code, it is more like writing C functions than, say, C++ (with classes etc).

But it does make accelerating vectorized code very easy. If you have a function that uses NumPy, it is likely you can speed it up with just a Numba decorator.

But when it doesn’t work, it is often not clear why until you get some experience.
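To make the decorator-only case concrete, a minimal sketch (the function is a made-up example; assumes numba and numpy are installed):

    import numpy as np
    from numba import njit

    @njit(cache=True)  # nopython mode; compiled on first call, cached on disk
    def pairwise_sq_dist(pts):
        n, d = pts.shape
        out = np.empty((n, n))
        for i in range(n):        # plain nested loops: numba compiles them
            for j in range(n):
                acc = 0.0
                for k in range(d):
                    diff = pts[i, k] - pts[j, k]
                    acc += diff * diff
                out[i, j] = acc
        return out

    pts = np.random.rand(500, 3)
    pairwise_sq_dist(pts)  # first call pays the compile cost; later calls are fast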


The ahead-of-time compilation output is... well... let's say difficult to package _properly_ (compare it to Cython, where this is well supported and documented). That makes it useless for production, unless you want to ship giant containers with compilers etc.


In theory, a compiler toolchain is not required since Numba already ships with LLVM, i.e. for JIT compilation no additional compiler is necessary.

In the past, that was also possible for AOT compilation [1], but that technique broke during some update and it seems like there is no one left who knows how to fix this.

[1] https://stackoverflow.com/a/42198101


Jax also has experimental support for persisting its JIT cache on the filesystem in the ‘jax.experimental.compilation_cache’ module.
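If I remember right, usage looked roughly like this at the time (the API has moved around between JAX versions, so treat this as a sketch):

    from jax.experimental.compilation_cache import compilation_cache as cc

    cc.initialize_cache("/tmp/jax_cache")  # compiled executables persist here,
                                           # so warm starts skip recompilation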


How does jax compare to numba?


numba is more general. Any change to the shapes of arrays triggers a JIT recompilation in jax; numba is a bit more forgiving. jax has autodiff, which numba doesn't. Also, JAX supports TPUs, which numba doesn't support (yet).
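A small sketch of the shape-specialization point:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def f(x):
        return (x * x).sum()

    f(jnp.ones(10))   # traces and compiles for shape (10,)
    f(jnp.ones(10))   # same shape: reuses the compiled version
    f(jnp.ones(20))   # new shape: traces and compiles again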


What??? Numba has more usage in the AI/ML community than Cython has ever had by anyone, ever.

"Fits very few use cases" LOL okay without numba there's no UMAP and HDBScan and those are pretty popular and important libraries that come to mind just off the top of my head...

Also, claiming Cython is well documented gets a huge LOL from me as someone who's actually written a bit of Cython.


I have written quite a bit of cython code as well, and at least the last time I looked cython was much better documented than numba (it has been a couple of years though, so things might have improved on the numba side), and I would agree with the previous poster that it is generally quite well documented.


Also, I am specifically referring to the documentation on creating ahead-of-time compiled packages with Numba vs Cython, but perhaps that was unclear.


FWIW Numba's JIT caches the compiled function as long as you don't call it again with different type signatures (e.g. int32[] vs int64[]).

I've successfully deployed numba code in an AWS Lambda, for instance -- llvmlite takes a lot of your 250 MB package budget, but once the Lambda is "warm" the JIT lag isn't an issue.

That said, if you absolutely want AOT you'll have to use Cython or some horrible hack dumping the compiled function binary.


Exactly!


You realize that Scikit-learn is written mostly in Cython (where high performance is needed)? It is a part of the most influential ML library in existence.


Also pandas


You're aware pandas and most of scipy is cython, right?

I like numba, but cython is clearly used more in the popular packages


It’s not really gonna be used in your database REST API, is it?


I assume the parent comment was talking about the context of computations where numba is supposed to be a drop-in for wherever numpy is used.

And I agree that it's not actually usable everywhere, since the support for numpy's feature set is quite limited, especially around multidimensional arrays. I had to effectively rewrite my logic to make use of numba. Still, it is pretty worth it imo, given how it can add parallelism for free. And conforming to numba's allowed subset of numpy usually results in simpler and more efficient code. In my case I had to work around the lack of support for multidimensional arrays, but I ended up with a more efficient solution that relies on low-dimensional arrays being broadcast, which eliminated a lot of duplicate computation.


I've had success with numba speeding up code that worked on Apache Arrow data returned by DuckDB, which might just go into a REST API.


Having a paper published can take a very, very long time; a year is quite short, and 6 months is basically the minimum wait. My last paper took 2 years to be accepted from the first time I submitted it, despite being largely the same as the initial submission, having been accepted as part of my thesis over a year ago, and having several citations. It is very frustrating, and it also means that simpler (and less original) work is easier to publish.


But the symptoms of covid are very similar to e.g. the flu, and this would make you and your partner some of the first people in Europe to have had covid (given the exponential increase in case counts, it cannot have circulated for long before early 2020). My prior on anybody having covid in Europe in Dec 2019 is extremely low, and I think yours should be too - just having been really sick is no evidence at all.


As a point of curiosity, what do you mean by optimal?

To me, I can see at least two (potentially different) cost functions that a Wordle strategy can aim to minimise. Firstly, you can try to minimise the expected number of guesses, and secondly you can try to minimise the expected number of guesses conditional on never losing the game. In principle you could optimise for the first but not the second by allowing a small list of hard words to take > 6 guesses on average.

Part of my point is the lack of an absolute definition of optimal: strategies are only optimal up to the specified cost function.


IMO, never losing the game is the bare minimum for any reasonable cost function: never take > 6 guesses. Beyond that, you can have different optimization functions based on whether you're trying to minimize the worst-case depth of the decision tree (number of guesses) for each position, or the average depth, or some other reasonable function of the distribution over number of guesses.

Although you could have arbitrarily many cost functions, a fairly general class is the following: pick some nonnegative constants (w1, w2, w3, w4, w5, w6, w7), and for a certain strategy, define the cost as:

    w1*n1 + w2*n2 + w3*n3 + w4*n4 + w5*n5 + w6*n6 + w7*n7
where nk (for 1≤k≤6) is the number of hidden (solution) words for which the strategy takes k guesses, and n7 is the number of words for which the strategy loses the game. Then,

• Setting w7 infinite / very high is a way of encoding the condition that you not lose the game. (You can set it finite/small if you don't mind occasionally losing the game for some reason!)

• Setting w1 = 1, w2 = 2, …, w6 = 6 will simply minimize the average number of guesses,

• Having them grow exponentially, e.g. setting wk = c^(k-1), where c is some constant larger than 12972 (the number of acceptable guess words), will make sure that the minimal-cost strategy first minimizes the maximum number of guesses, then breaks ties by having the fewest words take that many guesses, and so on.
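As a sketch, scoring a strategy from its guess-count histogram under these weights (the histogram below is made up; 2315 was the size of the original Wordle solution list):

    def cost(counts, weights):
        # counts[k-1] = words solved in k guesses (k = 1..6); counts[6] = failures
        return sum(w * n for w, n in zip(weights, counts))

    counts = [1, 131, 999, 919, 233, 32, 0]   # hypothetical histogram, sums to 2315
    weights = [1, 2, 3, 4, 5, 6, 10**9]       # huge w7 encodes "never lose"
    print(cost(counts, weights) / 2315)       # average number of guesses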


> IMO, never losing the game is the bare minimum for any reasonable cost function: never take > 6 guesses.

I wouldn’t think so. If I get 99.9% of rounds correct with 3 guesses (and fail to find a solution to 0.1% of the rounds), and you get 100% of rounds with 6 guesses, I’d say I’ve soundly defeated you 99.9% of the time.


Right, that point of view makes sense if you imagine two players separately playing Wordle and comparing their results, and indeed that's what the OP may have been thinking. But as far as I know, Wordle is primarily a game for one player to play against the computer, where eventually getting to the word (or not) is the most prominent outcome (the only notion of "win" or "defeat" in the game itself), and solving it in fewer guesses is just a bonus. But sure, if you don't care about occasionally losing, you can just set an appropriate weight for w7 within the same family of cost functions.


Indeed, it’s essentially a single-player game, but we’re talking about which bot a player would want to use, which is inherently a notion of bots competing against each other.


If you look at the failed attempts in the article, you'll see it's not so easy to get 100% correct with 6 guesses. For example:

SLATE, CRONY, HOUND, BOUND, POUND, MOUND (correct: WOUND)

SLATE, SHARE, SHAME, SHAPE, SHAKE, SHADE (correct: SHAVE)

...so sometimes even if it gets 4 of 5 letters right on the second try, it still has too many options left. Of course in cases like this it could go "one step back" and try a word with (most of) the letters that are different between these words (HBPMW or MPKDV). So try different words until the number of options is narrowed down sufficiently.


> I’d say I’ve soundly defeated you 99.9% of the time.

Except it wouldn't be 99.9% of the time. Just the games where you guessed fewer than GP.

GP's solver would expect to guess the correct word 100% of the time in at most 6 guesses, but not that it would always take 6 guesses.


I was saying if you hypothetically got every correct guess in exactly 6 tries. You would pass the suggested “bare minimum” while my 99.9% strategy doesn’t, yet I claim my strategy is better.


For some value of better, sure.


Sure, where "some value of better" is "wins in the overwhelming majority of rounds."


> never take > 6 guesses

I suspect that with most languages it’s impossible to achieve 100% win rate for five or more letters. So w7 should be by far the largest weight but I don’t think it can be infinite.


I suspect the opposite. While the theoretical number of combinations is huge, the amount of information given by each guess grows faster than the number of real words of longer lengths.


They specified that in the post, they were able to constrain to a strategy that guarantees 6 or fewer and then optimize for EV to get 3.55, but could get down to 3.52 by allowing some small number of words to take more than 6.

I would be interested to know if 6 is the lowest you can constrain it. Can you guarantee a solution in 5, and if so what is the best EV with that constraint?


Yes, you can -- I believe the EV was around 3.47 for that. The "greedy" strategy (minimizing the worst-case size of the largest remaining set) also ends up using at most 5 guesses.

Perhaps unsurprisingly, you can't guarantee at most 4 guesses.


You can get EV of 3.46393 with a guaranteed maximum of five tries. (I don't know whether this bound is tight.)



Good to know.

Another possible question is: What is the best strategy minimizing (at any point) _first_ the maximum number of tries, and second the EV (average number of tries over all possible words)? That's subtly different from either answer.


Funnily enough, I think neither of those notions are the correct thing to optimize. For me, you want to minimize the maximum number of guesses. This leads to an equilibrium. Otherwise, your opponent can just abuse your strategy.


My point is exactly that, though - the choice of cost function is subjective. There is no best cost function, just a mapping from cost functions to optimal strategies.


The (exact) optimal EV-minimizing strategies (assuming each target word is equally likely) always solve in 5 or fewer guesses in normal mode, and always solve within 6 guesses in hard mode. If you wish to guarantee 5 guesses in hard mode, there's a different (optimal) strategy with slightly higher EV. See: http://sonorouschocolate.com/notes/index.php?title=The_best_...

I don't think a maximum-number-of-guesses optimizer leads to an equilibrium (if you mean Nash equilibrium), assuming you're playing the game where the score for the setter is the number of guesses. In particular, if the solver's strategy is deterministic, the setter will always pick one of the words that needs 5 guesses. There's no reason to think the Nash equilibrium can be easily found.


Adversarial strategies are another type of problem entirely (as in, if you're playing Word Mastermind). Wordle is a 1-player game, so I find hgibbs's comment totally reasonable. I would think that 'optimal' is more accurately his second definition - minimizing avg guesses but always winning (as in, losing is +inf guesses).


Isn't that just #2, but changing the definition of losing to be the least number of rounds in which you can always win (which for Wordle is 5)? These small changes to the definition don't seem to change the EV (or the code needed to generate it) that much.

If you don't care about optimizing the EV at all, the greedy tree already has the property you're interested in.


What do you mean by opponent?

The entity choosing the word in a game like Wordle or Hangman is more like the dungeon master in a game of D&D than an opponent in the traditional sense. They are there to help you have fun.


If two players are competing in a single round of wordle, then surely the player who guesses the correct word in the fewer number of guesses wins, right? Then it seems natural to say that the better of two players is the one with the most wins after playing all possible rounds.

The only problem is that I think this definition leads to the possibility of non-transitivity, where A beats B, B beats C, and C beats A.

