Performant execution for any vector language (APL/J/K) requires amortizing the array actions. For example, summing an array should invoke an action in the VM that adds together all of the elements of the array with only a single type look-up.
I am skeptical of targeting an array language to a VM that requires a type check on every element of the array. Numpy handles this by performing all its actions in a C library independently of the CPython VM.
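To make the point concrete, here is a rough Python sketch of the difference (the generic loop stands in for a dynamic VM; NumPy is the amortized case the comment describes):

```python
import numpy as np

def sum_generic(xs):
    # Generic VM-style loop: the runtime re-checks each
    # element's type on every iteration of the dispatch loop.
    total = 0
    for x in xs:
        total = total + x
    return total

# NumPy amortizes the check: the dtype is looked up once, then
# a tight C loop runs over the homogeneous buffer.
xs = np.arange(100_000, dtype=np.int64)
assert sum_generic(xs) == xs.sum() == 4999950000
```

Both produce the same sum; the difference is one type look-up versus one hundred thousand.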
> I am skeptical of targeting an array language to a VM that requires a type check on every element of the array. Numpy handles this by performing all its actions in a C library independently of the CPython VM.
If arrays are implemented as binaries then you should be fine.
> For example, summing an array should invoke an action in the VM that adds together all of the elements of the array with only a single type look-up.
Erlang makes it easy to add a NIF when you want sum to be a little bit faster. I think this is the last problem anyone needs to solve, though; the elegance of array-language solutions is far more important than their performance. That they're also performant is how we know all other languages are wrong :)
I believe that HiPE-compiled BEAM code already has the type checks lifted to the earliest point in the module it can manage. The resulting code would still do an initial type check on each element of the input list (essentially, to prove that the thing you were handed is a homogeneous array), but all subsequent transformations of that list would carry the homogeneity through where possible. It isn't always possible, since if you allow custom mapping functions there's no way to require that a fun has a particular return type; otherwise, though, I believe it "just works."
Agreed. BEAM is absolutely fantastic, and enabling other languages to leverage its power can only be a good thing.
I have high hopes for Alpaca personally, a super modern ML that targets BEAM. In a similar way to how an APL-like would allow leveraging BEAM's building blocks in unique ways, an ML type system that's aware of BEAM would allow for some seriously powerful, seriously correct programs. There's a lot of research into algebras for concurrent and parallel processing, which slot nicely into ML-likes.
I feel like a smooth interface between BEAM and Julia would be particularly useful. Elixir would be a nice platform since both have lisp-y macros and 1st class AST support, so each language could pass ASTs to each other for execution. Julia could be used for heavy computational loads (which it excels at), and BEAM could handle IO, reliability, supervision, and process management (which it excels at).
An APL on BEAM would probably be terrible, since BEAM intentionally does not support arrays. Making all lists linked lists is a deliberate choice: you get O(1) immutable copy, which is absolutely critical for passing data between procs without "sharing memory" and for building a safe concurrent system. Given the relative heavy weight of BEAM and the feature set it provides, it's quite impressive that for I/O- and routing-heavy tasks its performance is within an order of magnitude of, say, C.
You could maybe bootstrap arrays as binaries, but as you start making extensive modifications to the array you will get heavy fragmentation of the binary data, and performance will die.
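A small Python sketch of why repeated updates to an immutable binary hurt (Python `bytes` stands in for a BEAM binary here; the representation is my own illustration):

```python
# "Updating" one element of an immutable byte string forces a
# full copy, so n single-element updates cost O(n^2) overall.
buf = bytes(8)

def set_byte(b, i, v):
    # copy-on-write update: rebuild the whole binary
    return b[:i] + bytes([v]) + b[i + 1:]

buf2 = set_byte(buf, 3, 255)
assert buf == bytes(8)   # original untouched, as on BEAM
assert buf2[3] == 255
```

Each update is cheap in isolation, but a tight loop of element writes degenerates badly, which is the fragmentation/copy cost the comment is pointing at.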
I am writing another post which will describe better the niche I think it should fill - an auxiliary language used to write library modules which would be called from applications written in Elixir or Erlang.
The functionality that would be replaced is currently written by traversing lists - this really is a business-rules language add-on to existing infrastructure rather than a high-performance array processing language.
Julia is rolling out an all-Julia database solution, and I just wish they had used BEAM (probably Elixir) for supervisors on top of Julia worker processes instead of relying on Julia alone, principally because of the lack of reliability guarantees in Julia (I love Julia, but it's not really a programming language designed for that).
> since BEAM intentionally does not support arrays.
I think tuples count: they have O(1) element access and fixed length.
Binaries probably work better for ints though, since a NIF can do heavy work if you solve all the other problems.
> Elixir would be a nice platform since both have lisp-y macros and 1st class AST support, so each language could pass ASTs to each other for execution. Julia could be used for heavy computational loads (which it excels at), and BEAM could handle IO, reliability, supervision, and process management (which it excels at).
Unfortunately, neither have what APL has: Good notation.
> Tuples don't really count. For performance computing you really want arrays to have mutable contents.
I don't think performance is the biggest win here, but: q/kdb+ doesn't have mutable arrays -- everything is copy-on-write, and it's a lot faster than Julia. If this hypothetical APLish language is otherwise useful, pulling a similar trick would benefit the rest of the Erlang ecosystem.
> Julia's notation is incredibly math-oriented.
APLs almost universally have better notation though. k/q follows:
f:{1+x} /or even just f:1+ // apl is: {1+⍵}
M:2 2#2 0 0 2f // apl is 2 2⍴2 0 0 2
v:1 2f
M mmu v // k can be M$v; apl is M+.×v
v mmu v // v+.×v
v mmu M mmu v
q is less good with your "subtleties" since it doesn't have rank (unlike other APLish languages), but I'm including it for completeness.
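For readers without a k/q interpreter, here is a rough NumPy rendering of the snippet above (`mmu` is q's matrix multiply, which works on floats, hence the `2f` float literals in the k code):

```python
import numpy as np

f = lambda x: 1 + x           # k: f:{1+x}
M = np.array([[2., 0.],       # k: M:2 2#2 0 0 2f
              [0., 2.]])
v = np.array([1., 2.])        # k: v:1 2f

Mv  = M @ v                   # k: M mmu v
vv  = v @ v                   # k: v mmu v
vMv = v @ M @ v               # k: v mmu M mmu v

assert (Mv == np.array([2., 4.])).all()
assert vv == 5.0
assert vMv == 10.0
```

The `@` operator plays the role of both `mmu` and APL's `+.×` here.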
How can you be "a lot faster" than Julia? Julia is regularly within 1.2x of C, and sometimes faster than C when leveraging built-in Fortran libraries.
What's mmu? Is it matrix multiply? Why isn't it *?
What is the +.× operator? Can you show me where this operator is described in a non-APL math paper?
I'm sorry, but when I see APL, I just see a lot of symbols I can't be bothered to look up to understand. Code reviews would be a nightmare.
In Julia, if I were to write a numerical type for Galois fields (and I have), I could write compile-time-optimized code for performing Reed-Solomon erasure coding using the built-in matrix operators - the code would be exactly the notation you see in the papers describing erasure coding.
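A sketch of the idea (in Python rather than Julia, and over the prime field GF(7) rather than the GF(256) used by Reed-Solomon, purely to keep the arithmetic short): define the field element type once, and generic matrix code written in ordinary notation then works over it unchanged.

```python
class GF7:
    """Element of the prime field GF(7); illustrative stand-in
    for a real Galois-field type."""
    P = 7
    def __init__(self, v): self.v = v % self.P
    def __add__(self, o):  return GF7(self.v + o.v)
    def __mul__(self, o):  return GF7(self.v * o.v)
    def __eq__(self, o):   return self.v == o.v
    def __repr__(self):    return f"GF7({self.v})"

def matvec(M, v):
    # plain textbook matrix-vector product; it never mentions GF7
    return [sum((m * x for m, x in zip(row, v)), GF7(0)) for row in M]

M = [[GF7(3), GF7(5)],
     [GF7(1), GF7(6)]]
v = [GF7(2), GF7(4)]
assert matvec(M, v) == [GF7(5), GF7(5)]   # 26 mod 7 = 5 in both rows
```

Julia's multiple dispatch does this with the actual built-in `*` on matrices; the sketch only shows the principle of the math-looking code surviving the change of field.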
describes a number of benchmarks for an interesting data problem. kdb+ is the third-fastest performer and the fastest CPU performer. kdb+ is 1000x faster than another database written in C (kdb+ is written in k), but only about 100x faster than other C-based databases. The next closest looks like Yandex ClickHouse, which is written in C++ and is still less than half the speed of kdb+.
How much work is spent getting C++ to approach half the speed of an interpreted language?
> How can you be "a lot faster" than Julia?
That is an incredibly loaded question, but a big part of "how" is the notation: because the program is written in a language that is very dense and puts parts of the solution near each other, it is very easy to be correct, and to find repetitive parts of the program that can be coalesced and merged.
> Im sorry, but when i see apl, I just see a lot of symbols I can't bother to look up to understand.
It never ceases to amaze me when a programmer thinks that just because they can't do something, it somehow follows that nobody else should either.
I mean, seriously!
> Code reviews would be a nightmare.
Why would you do code reviews in a language you don't understand?
The dense and carefully chosen symbols and meanings in an APLish language are its value. Not that it's fast (although it is), but specifically that it's easy to understand. Heck, the article we're commenting on mentions several algorithms that are much easier to understand in an array language than in anything else.
> Why isn't [matrix multiply] * [instead of something else]
I think you added this question after I responded, or maybe I missed it. It's a good question.
Because inner product (apl: +.×) is row-wise multiplication then column-wise summation. You can do a row-wise = then a column-wise > to see if the bottoms of a column end in a value (2>.=M).
There's also an outer product (apl: ∘.×), which applies × between every element of the left argument and every element of the right.
* usually means power in APL, not multiply. APL uses the standard × character for multiplication.
APL is designed to be read, but it's still describing numerical methods and operations as opposed to a mathematical notation which frequently hides complexity.
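A NumPy sketch of the three products discussed above may help readers who don't have an APL handy (the `∨.=` any-equal example is my own illustrative choice of f.g, not from the thread):

```python
import numpy as np

M = np.array([[2, 0],
              [0, 2]])
v = np.array([1, 2])

# apl inner product +.× : elementwise ×, then a +-reduction.
# For a matrix and a vector this is exactly matrix multiply.
inner = (M * v).sum(axis=1)
assert (inner == M @ v).all()

# apl generalized inner product f.g swaps in other operations,
# e.g. ∨.= ("any equal") tests membership column by column.
any_equal = np.logical_or.reduce(M == 2, axis=0)

# apl outer product ∘.× : every element of the left argument
# against every element of the right.
outer = np.multiply.outer(v, v)
assert (outer == np.array([[1, 2],
                           [2, 4]])).all()
```

The point of the APL notation is that all three are the same two glyphs with different operations plugged in; NumPy needs a different spelling for each.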
> in Julia, if I were to write a numerical type for galois fields (and I have), I could write compile-time optimized code for performing reed-solomon erasure coding using the builtin matrix operators - the code would be exactly the notation you see in the papers describing erasure coding.
That sounds great, but I don't understand any of that or why it's valuable.
Maybe that's because I don't typically read mathematical journals and type in the algorithm presented.
However you gave me an idea. Take a trie as a simple data structure with a great deal of literature about it. Some googling leads me here:
Now I can write "find" in k/q as @/ if I implement a 256-trie. Maybe the author doesn't know APLish languages and that's why they chose a poor notation. Maybe not.
But having to decode a paper like this comes up much more often than wanting to implement a bunch of type code the compiler can optimise just so I can copy and paste mathematical notation.
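The `@/` claim is easy to see in Python: "find" in a trie is just a fold of indexing. Here's a toy sketch (the dict-of-dicts representation and the `None` sentinel are my own; the thread's point is only the shape of `find`):

```python
from functools import reduce

# A toy 256-trie: each node is a dict keyed by byte value,
# with a None key marking "value stored here".
def insert(trie, key, value):
    node = trie
    for b in key:
        node = node.setdefault(b, {})
    node[None] = value

def find(trie, key):
    # k/q @/ : repeatedly index the current node by the next byte
    node = reduce(lambda n, b: n[b], key, trie)
    return node[None]

t = {}
insert(t, b"cat", 1)
insert(t, b"car", 2)
assert find(t, b"cat") == 1
assert find(t, b"car") == 2
```

The entire lookup is the one `reduce` line, which is what `@/` spells in two characters.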
As much as I love APL for its notation, Erlang does provide equivalent high-level operators for processing sequences of elements. It's not fair to compare your high-level implementation in APL to your low-level optimized (and quite unreadable) solution in Erlang. Here's the equivalent algorithm:
    is_descendant(A, B) ->
        lists:all(fun ({Actor, Event}) ->
                      Event =< proplists:get_value(Actor, B, -1)
                  end, A).

    compare(A, B) ->
        A_in_B = is_descendant(A, B),
        B_in_A = is_descendant(B, A),
        {A_in_B,
         B_in_A,
         A_in_B and B_in_A,
         not (A_in_B or B_in_A)}.
The function `is_descendant` is nearly a translation of your APL code. I'm not convinced that we need APL on the BEAM, for the quality of this VM certainly isn't in its array-processing capability... but I was expecting the argument "such an array-oriented language would motivate improvements in this direction", which is definitely true!
APL enthusiasts often forget that APL forces you to use its built-in operators, while modern languages provide the means to implement them (and aren't restricted to arrays). lists:all isn't a builtin in Erlang, but does it really matter that you can write it as `^/` in APL?
Note for functional programmers: `^` is the boolean AND operator, `/` is `fold`, and somehow there's an APL builtin hack so that `fold` knows that 1 (true) is the monoid identity of `AND`... but really `^/` is a compact notation for `fold (&&) True`
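In Python terms, `^/` is just a fold of AND with the identity of the monoid supplied as the seed:

```python
from functools import reduce
import operator

# APL's ^/ : fold boolean AND over a vector. Seeding the fold
# with True (the identity of AND) makes the empty case work,
# which is the "builtin hack" mentioned above.
def and_fold(xs):
    return reduce(operator.and_, xs, True)

assert and_fold([True, True, True]) is True
assert and_fold([True, False]) is False
assert and_fold([]) is True   # identity, like ^/ over an empty vector
```

That's also exactly what the builtin `all()` does; the fold form is shown only to mirror `fold (&&) True`.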
Well, I'm an Erlang enthusiast, not an APL one; my APL experience is four hours reading a manual on a plane and four hours writing APL.
There are a squad of specific set-related problems in Riak that the Basho engineers skirted round or were afraid of - they were all about set operations - so APL's focus on tiny notational programmes that are graspable in the mind is appealing on that basis.
Is this the most considered blog post? No, hell no ;-) but APL struck me as fascinating...
Is the remark in parentheses just meant to make the previous statement more precise, as in "I'd like to have an APL dialect in Prolog; more precisely, I'd like to work in SWI-Prolog and implement something more like J"? Or is there some other distinction between APL/Prolog on the one hand and J/SWI-Prolog on the other hand that you are making?
Saw your comment on Reddit about this, am still confused by why you think this is a good idea. The integration of Prolog variables and backtracking is going to make a real mess of the array operations, don't you think, or is it going to be a total embedding? If it's a strict embedding, why not just have a J process and call it from SWI-Prolog?
Well, think of the `is` operator:
Z is X + Y. That exists and works well.
X and Y can't be variables; only Z really can be. It can be annoying, but that's standard Prolog.
So for the sake of discussion let's say we have
apl(Expression, Result).
Where Expression can't be a variable. If it fails to evaluate or has an error, it will return false. If the APL evaluates without error, it will return true with Result bound.
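A Python sketch of the interface being proposed: evaluate or fail silently, the way a Prolog goal would, rather than raising. The tiny expression language here (vector addition only) is a stand-in for a real APL evaluator:

```python
# Sketch of apl(Expression, Result): evaluate a toy expression,
# returning the result on success and False on any failure,
# mimicking Prolog goal failure instead of an exception.
def apl(expression):
    try:
        # stand-in evaluator: only vector addition, e.g. "1 2 + 3 4"
        left, right = expression.split("+")
        xs = [int(t) for t in left.split()]
        ys = [int(t) for t in right.split()]
        if len(xs) != len(ys):
            raise ValueError("length error")
        return [x + y for x, y in zip(xs, ys)]
    except Exception:
        return False   # fail, don't throw

assert apl("1 2 + 3 4") == [4, 6]
assert apl("1 2 + 3") is False   # length error -> failure
```

In SWI-Prolog the same contract would be a foreign predicate that unifies Result on success and simply fails otherwise.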
Prolog can be fast with lists if you use difference lists, but it can be slow for some vector calculations. Using CHR/CLP(FD) one can do some mathematical calculations quickly, but it runs out of memory fast for large datasets.
I might be wrong, but I figure it will be a nice experiment. My idea is to leave all things logical and one-off code to Prolog, and things that happen in groups to APL.
I don't think clpfd or chr are especially large hogs of memory. Have you run into issues there?
If you can write a smallish program that uses a subset of J (actually APL might be easier, since it is less likely to interfere with SWI's parser) I would be willing to try and figure out how to run it. Aim for 5-10 lines. Email me. I like the idea I'm just not totally convinced.
I feel like if I got better at J, my problems with it probably would be resolved. For instance, it feels to me like J is really bad at parsing (beyond regular languages anyway) and tree structures. But there are built-in verbs for these things, so I could be mistaken. I only threw a few months at it last year. And I kept being indecisive about APL and J. It didn't help that Aaron Hsu was such a passionate advocate for APL, and Roger Hui apparently now works for Dyalog and has tried to close the loop on a lot of the ways J improved on APL there.
My back-of-the-envelope sense, from writing a small amount of J and a fairly significant amount of Prolog, is that J is way better at solving the core of a certain class of problem, but you spend a lot more of your life converting the input into an internal representation you can use and back to a result a human might want to see. Prolog really likes a certain class of problem which is somewhat less general than J, but it makes up for that limitation somewhat by being amazing at parsing and having great internal representations of things. In most of the cases where I would want non-determinism in J, I would probably try to rework my problem so that I'm doing all the possibilities at once and then detecting which ones succeeded. If I had non-trivial calculations to do, I probably wouldn't use Prolog, but if I had to, I would certainly consider calling out to C or something. My day-to-day interests seem to coincide more with what Prolog does well than what J and APL do well.
Anyway, I think it might be a profitable area to explore. I was curious about Brachylog for similar reasons. https://github.com/JCumin/Brachylog
Anyway, I'm curious if you could show me what you are thinking. There must have been a smallish program you thought would work well with this combination. Can you share it?
But none of the examples I cite need 'high performance' (well, CRDTs do). An expressive language for sets would remove accidental complexity and make more complex things simpler and less buggy.
FYI, while Basho is dead, their assets were picked up by bet365, a UK-based gambling company that, like some other similar companies, uses Erlang extensively. Riak MDC -- i.e. the Multi-datacenter replication piece, the closed-source bit that Basho made its money from -- was open-sourced and development of Riak continues.
BTW, basho.com, and most of the stuff that had been on it, is online (I think bet365 resurrected it as part of their asset-purchase).
Riak, the database so indestructible that it survives its parent company's demise, is going into code freeze for its first post-bankruptcy release even as we speak.