More

arithma · on Aug 6, 2019

I don’t understand this argument. Can’t the Chinese buy a LOT of influence, real estate, and US properties with all the dollars? Why is printing money “free”? How is this not either hyperinflation on one end or just regular currency value on the other. On one end it hurts the issuing country, on the other it’s mutually beneficial. How does printed-paper for goods fit into that spectrum.

I truly can’t wrap my head around this argument: been on the back of my head since I heard I think Peter Thiel say it (iirc.)

arithma · on Aug 4, 2019

As long as Bob doesn’t observe himself and tries to improve his performance.

arithma · on July 30, 2019

A good self-rating strategy in an interview would be to try and percentile yourself among the people based on your knowledge and try to justify it with humility. C++ is huge, I expect I know more than 80% of the people due to (years of experience, different kinds of projects, yada yada), but I feel i have only been able to explore like 50% of the language so far (concretely learn that is) and there’s more to learn like (1, 2, 3..) and probably more that I might have missed completely.

thechao · on Aug 1, 2019

It’s almost always 7. The actual question I ask is “rate yourself on a scale from one to Bjarne”. Bjarne Stroustrup was the principle in the group I did my PhD, and he rates himself a 7, so when interviewees give ‘7’ as the answer, it makes me laugh.

arithma · on July 22, 2019

Not disagreeing, but just to add on the other side of the coin: language can be a driver for thought and ideas. This might feel wishywashy without an example, but I usually feel that I need to reinterpret algebraic manipulation that I have done to understand what significance they have.

So in a way, I allowed myself to be a mechanical executer through the rules of algebra/calculus I've used, and then I re-engaged my intuition to understand what is happening.

It might even be my unwillingness to accept the mechanical approach purely that poses a big hindrance to fast tracking a lot of learning in this domain I have to do. Maybe I should just shut up and calculate as they say, but it's hard.

This just makes it all the more true that learning "languages" is the best way to become smarter. "Smarter" is a weird personal word here, but I'm at a loss for a better way to put it. What am going for is: languages can be an abstract lever that allows our limited cognitive potential to build culturally, which is distinct and will definitely interact with the knowledge ladder, but in a distinct way. "Languages" here are more general than just the textual/oral/syntactic, and am not talking about Russian/Arabic/French either.

TuringTest · on July 23, 2019

> I usually feel that I need to reinterpret algebraic manipulation that I have done to understand what significance they have. So in a way, I allowed myself to be a mechanical executer through the rules of algebra/calculus I've used, and then I re-engaged my intuition to understand what is happening.

That's a sensible approach, but it works because you're executing mechanical in a well-defined problem domain, so there's an intuition attached to each term used, and you can build an overall intuition. You might as well use random mechanical transformations over an algebra and they wouldn't make any sense, just like if you attached random dictionary words forming grammatically correct sentences.

I see usage of formal languages as mathematicians and philosophers making "very precise sentences", i.e. being extremely precise with the meaning attached to each term so that it remains consistent among all the transformations of discourse (in fact, there's a formal theory of meaning which does exactly that).

But the main value of such exercise is in knowing what you want to express, not in the fact that you're being extremely precise while doing it; the latter can help you convince others of what you're saying, but they need to agree to the original axioms of your theory, otherwise they'll just be able to point very precisely where they think you are wrong.

arithma · on July 9, 2019

App becomes a local backend server that has privileged access to the machine (with control over what it accesses, maybe give it a specific user), and a web UI to access the app.

Someone in the tree of replies for your parent explained and was contemplating actually going through with it.

arithma · on July 7, 2019

Terribly tangential, but is there a possibility at all that the higher core counts will converge with GPU core counts in the future, in a large CPU, many smaller medium cores, and hundreds/thousands of micro cores.

Programming those would require a new paradigm possibly.

yarg · on July 7, 2019

I'd say no (but you might need to clarify what you consider a core to be and whether you're confusing it with a thread). The way that SIMD processors work is fundamentally different from the way that a CPU works.

GPU cores run the same operations across a large number of "threads" (32, 64, etc) - they're no really threads.

When you have branching code running on a GPU, the threads are split into the branches, and each branch gets executed separately.

CPU threads run independently of each other (there's often locking for access of shared data, but that's not what I'm talking about).

Even when multiple threads on a single CPU core are running (SMT) they're still not performing the same operation.

Per thread (it's not quite what they are, but a more appropriate terminology escapes me) there is and (unless it's a very stripped down CPU) will always be a far greater silicon overhead for CPUs compared with GPUs.

That wasn't the best description of the way things work, but I'm rather tired and not a hardware guy.

jb1991 · on July 7, 2019

> When you have branching code running on a GPU, the threads are split into the branches, and each branch gets executed separately.

I'm not an expert on GPU hardware but in my readings on this, I've seen it said that if there is a branch in your code, all cores take that branch even if it is a no-op for many of them, thus all branches are taken by all cores when the code is not coherent. Hence why making GPU parallelization can be quite a different programming paradigm to really take advantage of how it works, and it is more challenging to do properly than truly independent CPU threads that do not affect each other when they branch.

jdashg · on July 7, 2019

That's right! It's called "wavefront divergence".

Ideally all "threads" of the wavefront take the same side of a branch, so it can skip the not-taken side of the branch. Wavefront divergence is when a single wavefront has threads that take different sides of a branch, so the whole wavefront runs both sides and masks out the results based on branch direction per thread.

yarg · on July 7, 2019

Thanks for the reply - I hadn't actually thought in any depth about what's going on, and I'd like to get in GPU programming at some stage.

Also, sorry about the barely prompted and completely unjustified wall of rambling text - this is something that I really should've given more consideration to already.

That's a programming model that has the potential to go very wrong, very fast if you don't think in depth about what you're doing. Branches with branches are going to be very problematic (with exponentially decaying throughput), although multiple branches at the same level are handled very cleanly.

To be honest, I don't know how I thought that it might be rescheduling things instead.

The sort of rescheduling that I seem to have been thinking of, could only make a difference in cases where the batch size exceeds the wave size, and my guess is that the factor of difference would need to be large to make an impact.

At the base case, NOPs and rescheduling would perform identical operations - so all the scheduler would get you is a hardware overhead (the time overhead could be mitigated when running on a single wave).

The scheduler would also introduce latency since different waves in the same batch would need to wait for each other before rescheduling could proceed.

You'd cause problems for your memory layout - programmers referencing values from threads that have branched would probably need to be treated as undefined behaviour.

You'd also need to introduce a stack, per thread, to keep track of the wave histories - allowing you to unsort and re-reschedule at the ends of branches.

All this to run parallel execution on what seems to be uncommonly large batch sizes (I believe AMD just dropped their wave size down from 64 to 32 - I don't know if this is because 32 is a Goldilocks batch size, or if it's simply to achieve better performance on Nvidia optimised applications).

yarg · on July 8, 2019

> all branches are taken by all cores

Perhaps this should be threads, but certainly not cores - it's a single core running the same operations on different data across multiple threadish things.

I'm not sure if there's a more technically accurate terminology for quite what they are.

gumby · on July 7, 2019

That is basically what a GPU is. However NUMA architectures are often really "NU" so the GPU doesn't have all the cache infrastructure that a CPU is because of the CPU's typically MIMD workload.

If you're interested in an earlier mass market phase of moving GP computation of the von Neumann CPU approach, check out the PlayStation 3's "cell" architecture. Devs really struggled with it and Sony went back to von Neumann for the PS4.

Drdrdrq · on July 7, 2019

If anyone else is curious: http://www.redgamingtech.com/sony-playstation-3-post-mortem-...

Const-me · on July 7, 2019

I think that’s unlikely to happen. See what happened to Intel’s Larrabee / Knights Landing / Xeon Phi. They have 50-70 x86 cores (initially Pentium, later Atom) on the chip, with extra AVX512 vector units.

CPU cores are spending a lot of transistors minimizing latency: branch prediction, microops fusion, sophisticated multi-level caches. GPUs are fine without most of that (they do have caches but much simpler ones), they are spending majority of their transistors and electricity on ALUs. Instead of fighting latency, GPU cores embrace it and optimize for bandwidth on massively parallel workloads: they have cheap hardware threads so they switch threads instead of stalling the pipeline.

mantap · on July 7, 2019

In that case it would make more sense to move the smaller cores to a PCI card a la Xeon Phi.

The programming paradigm to enable this exists, it's just pure functional programming. But people are intimidated by Haskell.

We need more investment in functional programming before the dream of a 1000 core computer can be realized.

yvdriess · on July 7, 2019

On paper.

Parallel programming in Haskell is hard. GHC's style of by-demand lazy evaluation and the ubiquitous use of monads impose a lot of sequential execution.

There has been good research around parallel FP programming languages, but that was mostly around the 90s (Sisal, pH)

tempguy9999 · on July 7, 2019

> The programming paradigm to enable this exists, it's just pure functional programming

Whether more cores can be used is ultimately down to the problem, not any language. Some problems are naturally paralisable, some just aren't.

FP maybe exposes a bit more parallelism, but it may introduce more overheads such as less efficient cache use. FP is not a solution, it may be part of the solution.

arithma · on July 1, 2019

I think it's saying that "software politics" and productivity are not one in the same.

arithma · on July 1, 2019

For a course at uni (am not a practitioner in this domain in any measurement) we used cvxpy (and there was cvxopt for the matlab inclined).

Are those within the ballpark of what you need to do?

--

Am adding this response as both:

-1 Hey maybe you didn't know about this

-2 A question: Is CVXpy even in the same domain as those other tools listed here in the replies

leethargo · on July 1, 2019

cvxpy is only for linear programming (LP), but does not support mixed-integer programming (MIP), so you can only use it for some problems.

Also, I think cvxpy might implement an interior point method, not a simplex method, so you don't get a "vertex solution", which often has some nice sparsity attributes.

arithma · on June 11, 2019

A different pattern I have thought of would be to provide an accessor interface object, and certain functionality is limited to be accessible only through that object. Controlling who has access to that interface can be flexible (possibly returning it along with construction, or passing it in at creation time...) Albeit, this is not without overhead, and for that, Badge is actually better, in terms of performance, if it fits the use case. What happens when you have multiple Device objects though, in this case, they can all access VFS..

--

C++'s private inheritance can be used here to avoid some of the gymnastics, wherein, that private aspect of the class can be passed out into the appropriate party at creation time or otherwise.

To illustrate: https://gist.github.com/arithma/0d96aec5c07e2d8dc24cfc6cd31f...

Note: I have never used this in production as far as I remember.

mmkos · on June 11, 2019

Pretty sure you described the attorney-client pattern.

arithma · on June 11, 2019

If it is, it's the first time I hear about it. It always gives me a kick when I make up a similar design pattern to some grey beard's invention. Alas, I should read up on those patterns to avoid reinventing stuff and sharing things with people with appropriate names.

arithma · on June 9, 2019

A bit OT, but I wait for the day where bandwidth isn't a limiting factor and we can host machines doing computations anywhere, instead of having to rent out economies of scale. With Kubernetes and other private-cloud-like solutions, wouldn't it eventually be meaningless to depend on centralized cloud clusters and instead host your own and tap into an equal-footing infrastructure. Projects like vast.ai (no affiliation) are examples of the ideal I hope for.