Hacker News | torusle's comments

Back then Microsoft was lobbying as hard as they could to reverse that decision to move to Linux.

They knew: if Linux made it in Munich, it would likely spread, and they would lose tons of contracts with other German states.


"I could only find a couple tutorials/guides and both were imperative"

Aren't quadtrees covered by almost all basic data-structure books? They are the simplest way of taking the binary tree into the next (2D) dimension.


The problem is rather that most data structure tutorials and books don't even get the idea to introduce a purely functional version, but merely present the imperative versions. Coming up with functional versions of data structures can be difficult. Papers about it can be hard to understand and often require you to already know some niche language that a researcher used for the paper. Even if you can find a functional implementation of a data structure, there is often no good explanation and you need to, sort of, reverse engineer it and translate it to the language you are using.

In short, it seems relatively few people have the skills to implement them, and even fewer have the skills to come up with functional versions of ordinary data structures. They are almost completely absent from university lectures as well, as far as I am aware. For AVL trees, for example, I could only find a document from ETH from a lecture that no longer exists or is no longer taught. The language is Isabelle, and I need to understand its syntax first before being able to translate it to Scheme.

If anyone has an obscure source for implementations one can learn from and implement oneself in another language, please share.


I've been looking for the same for Scheme and Clojure. Here are a few I've found:

Functional Data Structures and Algorithms, A Proof Assistant Approach by Tobias Nipkow (Ed.) [https://fdsa-book.net/functional_data_structures_algorithms....]

Purely Functional Data Structures thesis by Chris Okasaki [https://www.cs.cmu.edu/~rwh/students/okasaki.pdf]

https://en.wikipedia.org/wiki/Purely_functional_data_structu...


[1] are the exercises with solutions (!) of a lecture [2] at ETH Zurich, which include AVL trees in Isabelle. The whole list of exercise PDFs is at [3]. I started porting the AVL trees to Scheme [4], but have not worked on this lately. I also try to make the implementation easy to understand by providing explanatory comments.

[1]: https://archiv.infsec.ethz.ch/education/permanent/csmr/exerc...

[2]: https://archiv.infsec.ethz.ch/education/permanent/csmr.html

[3]: https://archiv.infsec.ethz.ch/education/permanent/csmr/exerc...

[4]: https://codeberg.org/ZelphirKaltstahl/guile-data-structures/...


Okasaki's is basically the Bible for this stuff. Anyone writing data structure libraries in a functional language will have read this or have it on their to-read list.

I have the book, but it also doesn't contain that many data structures. Some of the most commonly used ones (the AVL tree?) are not in there.

Not all exercises have solutions. I get stuck on some exercises and have a hard time finding solutions that I can compare with my implementations in Scheme. Not being a Haskell user (yet), I also have some issues translating Haskell to Scheme. In languages like Haskell one often controls flow by pattern matching on types. In Scheme one needs to define structures or records explicitly and use their predicates to explicitly check the type. It's all possible, but not exactly great for learning; the road has many sticks and stones to stumble over, so to speak.
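
To make that contrast concrete, here is a small hypothetical sketch (my own, not from any of the sources above) of a purely functional insert into a binary search tree, using Python 3.10+ just because it has structural pattern matching; the Leaf/Node names are made up:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Leaf:          # empty tree
        pass

    @dataclass(frozen=True)
    class Node:          # immutable node: left subtree, key, right subtree
        left: object
        key: int
        right: object

    def insert(tree, key):
        """Return a new tree containing key; the old tree is never mutated."""
        match tree:
            case Leaf():
                return Node(Leaf(), key, Leaf())
            case Node(left, k, right) if key < k:
                # only the nodes on the search path are rebuilt, the rest is shared
                return Node(insert(left, key), k, right)
            case Node(left, k, right) if key > k:
                return Node(left, k, insert(right, key))
            case _:
                return tree  # key already present

In Scheme the same dispatch would be spelled out with record predicates (leaf?, node?) and accessors instead of the match form, which is exactly the translation friction I mean.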


You can even build them with basically one line of code by sorting points using Morton / Z-curve order. It's linear time if you use a counting/radix sort.

Edit: lol, downvoted for this post. Never change, HN.


I'd love to see that. Could you link me to an implementation or explain this in more detail please?

Here's a 3D version used in the creation of sparse voxel octrees:

https://forceflow.be/2013/10/07/morton-encodingdecoding-thro...

Here's an example from AWS, where lat/long pairs are put into a Z-index, which is used as a DynamoDB sort key, letting you efficiently query for items near a point.

https://aws.amazon.com/blogs/database/z-order-indexing-for-m...


I would like to know more about this, too. Is there code anywhere, ideally with comments? But I am fine without comments too; I would just like to see the code, possibly with an example usage.

Okay, I was intrigued and did some digging. Morton / Z-order is all about interleaving the individual bits of the x and y coordinates. You end up grouping by quadrants. Python one-liner:

    points.sort(key=lambda p: sum(((p[0]>>i&1)<<(2*i))|((p[1]>>i&1)<<(2*i+1)) for i in range(16)))
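
Spelled out a bit (same idea as the one-liner above, just with names and comments; this assumes non-negative coordinates that fit in 16 bits):

    def morton_key(x, y, bits=16):
        """Interleave the low `bits` bits of x and y: bit i of x lands at
        position 2*i, bit i of y at position 2*i + 1."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)
            key |= ((y >> i) & 1) << (2 * i + 1)
        return key

    # Sorting by this key groups points quadrant by quadrant, i.e. in the
    # order a depth-first quadtree traversal would visit them.
    points = [(3, 1), (0, 0), (2, 2), (1, 3)]
    points.sort(key=lambda p: morton_key(p[0], p[1]))

A counting/radix sort on these keys gives the linear-time construction mentioned above.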

Yeah, good point, they are downsides for sure, but it's a simple enough approach and most of all it can be shoved into a database (or B-tree or any 1D-sorted data structure).

And for a z-curve, the order is basically a depth-first traversal of a quadtree.


It’s super easy to click the downvote button by accident on mobile when you meant to upvote. And this UI will never be fixed because this is HN after all.

I just implemented an HN UI. This is good feedback, as I'm aiming to have a mobile-friendly web version.

https://proc0.github.io/HackerZen (it's also open source)


This is precisely the reason that I do not log in to HN on my phone. My phone is read-only, and if I want to upvote or comment then I have to switch to my laptop. Pretty easy with Firefox because I can send tabs to other devices.

Zoom in or unvote/revote

This is true; however, you can see if you downvoted by the "undown" link displayed instead of "unvote".

That is why I like the Harmonic app: there is an invite button separating the upvote and downvote, so I'm never going to have this kind of issue.

this

For those who don't know: Vias are not only used to get an electrical connection from one side of the PCB to another.

You also need them to keep radiation in check and often to move heat away.

With this technique, good luck getting through EMC testing for anything but trivial circuits.


> You also need them to keep radiation in check and often to move heat away.

Hence my post saying vias are useful at all stages of prototyping.

> good luck getting through EMC testing for anything but trivial circuits

It's for prototyping! Nobody says you can't add more in a board spin.


Honestly,

this is really bad. It might be a breakthrough for what you are doing, but when I listen to the output, all of the timing and phrasing is awful.


Alright then, post your ML simulation of a piano player, I'll wait :)


There are a couple of tricks you can do if you fiddle with the bits of a floating point value using integer arithmetic and binary logic.

That was a thing back in the '90s.

I wonder how hard the performance hit from moving values between integer and float pipeline is nowadays.

The last time I looked into that was on the Cortex-A8 (first iPhone era). Doing that kind of trick cost around 26 cycles (back and forth) due to pipeline stalls back then.
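
The classic example of that kind of trick is the fast inverse square root. A rough sketch in Python (using struct to reinterpret the bits; the magic constant is the well-known 0x5F3759DF, and the accuracy shown is only illustrative):

    import struct

    def fast_inv_sqrt(x):
        """Approximate 1/sqrt(x) by treating the float's bit pattern as an integer."""
        i = struct.unpack("<I", struct.pack("<f", x))[0]  # float bits -> uint32
        i = 0x5F3759DF - (i >> 1)                         # halve/negate the exponent via the magic constant
        y = struct.unpack("<f", struct.pack("<I", i))[0]  # uint32 -> float bits
        return y * (1.5 - 0.5 * x * y * y)                # one Newton-Raphson step to refine

    print(fast_inv_sqrt(4.0))  # roughly 0.5

The two reinterpretations are exactly the int<->float moves that caused those pipeline stalls on real hardware.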


There are basic integer operations in the FP/SIMD units on most CPUs, so there's generally no need to “move back and forth” unless you need to branch on the result of a comparison, use a value as an address, or do some more specialized arithmetic.


On x86 there is sometimes (depending on the specific microarchitecture) an extra cycle of latency when using an integer operation on an xmm register last used with a float operation.

I have seen it explained as the integer and float ALUs being physically distant and the forwarding network needing an extra cycle to transport the operands.


This is correct, but it’s happily pretty rare for it to matter in practice (because the domain bypass penalty is small and does not directly impact throughput, only latency).


(For that matter, though, most modern FP/SIMD units have a direct approximate-reciprocal instruction that is single-cycle throughput or better and much more accurate--generally around 10-12 bits, so there's no need for this sort of thing. See, e.g. FRECPE on ARM NEON and [V]RCPP[S/D] on x86.)


These kinds of tricks are still used today. They're not so useful if you need a reciprocal or square root, since CPUs now have dedicated hardware for that, but it's different if you need a _cube_ root or x^(1/2.4).


I wonder to what extent the dedicated hardware is essentially implementing the same steps but at the transistor level.


The big cores do. They essentially pump division through something like an FMA (fused multiply-add) unit, possibly the same unit that is used for multiplication and addition. That's for the Newton-Raphson steps, or Goldschmidt steps.

In hardware it's much easier to do a LUT-based approximation for the initial estimate rather than the subtraction trick, though.

It's common for CPUs to give 6-8 accurate bits in the approximation. x86 gives 13 accurate bits. Back in 1975, the Cray 1 gave 30 (!) accurate bits in the first approximation, and it didn't even have a division instruction (everything about that machine was big and fast).
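
For illustration only (this is not how any particular CPU wires it up, and the seed value below is made up to stand in for a ~13-bit hardware estimate), the refinement part looks roughly like this:

    def refine_reciprocal(x, estimate, steps=2):
        """Refine an initial approximation of 1/x with Newton-Raphson steps;
        each step roughly doubles the number of accurate bits."""
        y = estimate
        for _ in range(steps):
            y = y * (2.0 - x * y)  # Newton-Raphson update for f(y) = 1/y - x
        return y

    # With a ~13-bit-accurate seed, two steps already get close to double precision.
    print(refine_reciprocal(3.0, 0.333373))  # ~0.3333333333333333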


There is also the case of machines that lack FP support, like some ARM Cortex-M variants.


I've worked in the payment industry and among other things built a payment terminal, so I know a thing or two about it.

1st: The message overhead during communication is not an issue. It is tiny compared to the time it takes for the credit card to do its crypto thing.

2nd: This won't ever be adopted. There is simply way too much working infrastructure out there built on the APDU protocol. And there are so many companies and stakeholders involved that any change will take years. If they start aligning on a change, it will be something that makes a difference, not something that saves time on the order of 5 to 20 milliseconds per payment.


Hi there, author here! I think you've highlighted a couple of things worth clarifying in the doc:

1. Apple having just announced it is opening up NFC to developers means that both major mobile platforms can now act as responding devices; so widely distributing new NFC protocols to consumer devices has become very fast and inexpensive through an update to the OS or installing a third party app.

2. Mobile consumer hardware is sufficiently fast for the application operations (eg. Cryptographic operations) so that these roundtrip and metadata overheads of APDU do actually make a meaningful contribution to the total time it takes to complete a transaction. Experiencing this in my development efforts here was the motivation for designing this alternative.

3. A1 is interoperable with APDU infrastructure and can therefore be adopted by terminals immediately, since reader terminals can attempt an A1 initiation and any APDU response from a legacy device is considered a failure; at which point the terminal can fall back to its default APDU message exchange.

I will update the doc to clarify these points, what do you think?

Given your experience I'd be interested in your detailed feedback, maybe we could jump on a call soon if you have time?


> 1. Apple having just announced it is opening up NFC to developers means that both major mobile platforms can now act as responding devices;

> 2. Mobile consumer hardware is sufficiently fast for the application operations (eg. Cryptographic operations)

You are right here. It is possible to emulate a card using mobile phones. We've been able to shim/emulate any card for much longer.

The thing is: To connect to the payment system you need a certificate. And you simply don't get it unless you can prove that you have all kinds of security measures applied.

For Android and Apple, the actual payment stuff runs inside an isolated little micro-controller which has been certified and is tamper-proof. This little thing is fortified to the point that it will destroy itself when you try to open it up. There are alarm meshes, light sensors and much more inside the chip to detect any intrusion, just to protect the certificate.

If you don't have that security, the payment providers will deny you their certificate, plain and simple.

You can build your own thing using card emulation via apps, but you will take all the risks.

How it works in practice is the following: these tamper-proof micro-controllers can run Java Card code. You can write an applet for them and get it installed (if you have the key from Apple/Google - not easy). Then you have two-way communication: your applet inside the secure world talking to the payment world, and your ordinary mobile app showing things on the screen.


"You can build your own thing using card emulation via apps, but you will take all the risks." right! This is exactly what I've developed. I've built a new NFC application, with its own PKI infrastructure, that's deployed onto users' mobile devices via an app install. It works over APDU, but it would be more efficient if A1 was made available by the two major mobile OS. It would likely shave off >10% of the total time to complete which makes a material difference to the failure rate (ie. customer's lifting away too early or other interference) and therefore improves the overall UX.


Or as simple as using the hardware-accelerated CRC32 that we have in our x86 CPUs.

Last time I checked, CRC32 worked surprisingly well as a hash.
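
A quick way to play with this in plain Python is zlib.crc32 (caveat: zlib implements the classic CRC-32 polynomial, while the x86 CRC32 instruction computes CRC-32C/Castagnoli, but as a hash they behave similarly). A rough sketch of the bucket spread:

    import zlib
    from collections import Counter

    def crc_hash(key, buckets=64):
        """Use CRC32 as a cheap hash: checksum the bytes, then reduce to a bucket."""
        return zlib.crc32(key.encode()) % buckets

    # Rough check of how evenly sequential keys spread across buckets.
    counts = Counter(crc_hash(f"key-{i}") for i in range(100_000))
    print(min(counts.values()), max(counts.values()))  # should be reasonably close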


Hah, neat.

The 'weird' instructions can often be 3-4 orders of magnitude slower than arithmetic instructions, though I doubt that matters here.


CRC32 is the same speed as an integer multiply, going all the way back to Nehalem (2008). 3 cycles latency, and you can start a new one each cycle (or more than one, on Zen 5).


Sure, in the same way SIMD instructions get a linear speedup in theory.

If you Google the words "CRC32 is slow", you can see hundreds of people complaining about this.


Another bonus quirk, from the 486 and Pentium era...

BSWAP EAX converts from little endian to big endian and vice versa. It was a 32-bit instruction to begin with.

However, we have the 0x66 prefix that switches between 16- and 32-bit mode. If you apply that to BSWAP EAX, undefined funky things happen.

On some CPU architectures (Intel vs. AMD) the prefix was just ignored. On others it did something that I call an "inner swap": of the four bytes stored in EAX, bytes 1 and 2 are swapped.

  0x11223344 became 0x11332244.
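
Just to illustrate the two byte shuffles being described (a plain Python sketch, not an emulation of any specific CPU):

    def bswap32(v):
        """Normal BSWAP: reverse all four bytes (endianness conversion)."""
        return int.from_bytes(v.to_bytes(4, "big")[::-1], "big")

    def inner_swap32(v):
        """The 'inner swap' described above: only the two middle bytes trade places."""
        b = v.to_bytes(4, "big")
        return int.from_bytes(bytes([b[0], b[2], b[1], b[3]]), "big")

    print(hex(bswap32(0x11223344)))       # 0x44332211
    print(hex(inner_swap32(0x11223344)))  # 0x11332244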


Also known as "bswap ax", and research shows that it does something surprising but consistent on almost all hardware: It zeros the register.

https://www.ragestorm.net/blogs/?p=141

https://gynvael.coldwind.pl/?id=268

However, this page, now gone, suggests that some CPUs (early 486s?) did something different: http://web.archive.org/web/20071231192014/http://www.df.lth....

Unfortunately I have not found any evidence or reason to believe that this "inner swap" behaviour you mention exists in some CPU --- except perhaps in some flawed emulators?


Nah, it is not that bad.

Sure, you can mess up your performance by picking bad compiler options, but most of the time you are fine with just the default optimizations enabled, letting the compiler do its thing. No need to understand the black magic behind it.

This is only really necessary if you want to squeeze the last bit of performance out of a piece of code. And honestly, how often does this occur in day-to-day coding unless you write a video or audio codec?


The main flags to look at:

* mtune/march - specifying a value of native optimizes for the current machine, x86-64-v1/v2/v3/v4 for generations or you can specify a specific CPU (ARM has different naming conventions). Recommendation: use the generation if distributing binaries, native if building and running locally unless you can get much much more specific

* -O2 / -O3 - turn on most optimizations for speed. Alternatively Os/Oz for smaller binaries (sometimes faster, particularly on ARM)

* -flto=thin - get most of the benefits of LTO with minimal compile time overhead

* pgo - if you have a representative workload you can use this to replace compiler heuristics with real world measurements. AutoFDO is the next evolution of this to make it easier to connect data from production environments to compile time.

* math: -fno-math-errno and -fno-trapping-math are “safe” subsets of ffast-math (i.e. don’t alter the numerical accuracy). -fno-signed-zeros can also probably be considered if valuable.


Also, I learned recently that there's `-Og`, which enables optimizations suitable for debug builds.


In practice I’ve had limited success with that flag. It still seems to enable optimizations that make debugging difficult.


Agreed. I like to compile most translation units with -O3 and then only compile the translation units that I want to debug with -O0. That way I can often end up with a binary that's reasonably fast but still debuggable for the parts that I care about.


Yup that’s what I’ve resorted to (in Rust I do it at the crate level instead of translation unit). The only downside is forgetting about it and then wondering why stepping through is failing to work properly.


Correct.

And every graphics programmer worth their salt had a copy of "Computer Graphics: Principles and Practice" on the desk and followed whatever came out of the "Graphics Gems" series.

We knew about BSP, Octrees and all that stuff. It was common knowledge.

