Julia does it an order of magnitude better, though.
A generic `f` will probably be similar to the Common Lisp one. However, as soon as you use `f` in a context where types are known (e.g. if you use multimethods to overload a function that calls `f`), it will be specialized and JIT-compiled for the specific type (int, float, matrix, ...), with no extra work for the programmer.
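The dispatch side of this can be sketched in Python with `functools.singledispatch` (a loose analogy only: Python picks the method at run time but never specializes machine code the way Julia's JIT does):

```python
from functools import singledispatch

@singledispatch
def double(x):
    # generic fallback, analogous to an unspecialized method
    return x + x

@double.register
def _(x: list):
    # behavior chosen by the argument's type, as in multiple dispatch
    return [v + v for v in x]

print(double(21))
print(double([1, 2]))
```

In Julia, each such call site with a known argument type would additionally get its own compiled, type-specific native code.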
> The JIT compiler works incrementally as functions are applied, but the JIT compiler makes only limited use of run-time information when compiling procedures, since the code for a given module body or lambda abstraction is compiled only once. The JIT’s granularity of compilation is a single procedure body, not counting the bodies of any lexically nested procedures.
If I understand correctly, functions are inlined[0] (not saying this is bad, just different). If you declare functions as inline then you can get rid of the dispatching code in contexts where the type is known in advance.
Yes, small and explicitly tagged functions can inline directly into the caller. But even when that doesn't occur, Julia is able to avoid dynamic dispatch overhead by doing the dispatch at compile-time, making the function call point directly at the specialized method.
In order to do that with CLOS, we have to add an explicit "inline" directive (as shown by lispm in another comment), because methods can be redefined anytime. Taking a shortcut to an effective method could break the code.
[Edit: Julia prides itself in being a fully generic language. From its paper:
"Generic functions have appeared in several object
systems in the past, notably CLOS [15] and Dylan [28]. Julia is distinguished from these in that it uses generic functions as its primary abstraction mechanism, putting it in the company of research languages like Diesel [10] and Cecil [9]." :-) ]
> You can externalize Common Lisp type specialization with its own platform-specific file full of DEFTYPE and/or declaim forms.
I'm not really familiar with CL, so forgive the perhaps dumb question: does what you mentioned permit the user to write the completely generic algorithm while guaranteeing that it'll be specialized/monomorphized automatically by the compiler/runtime[1] when possible? As I understood it, what the PP cared about was whether it was guaranteed to occur and whether it was automatic.
Compile-time monomorphization happens only when you request it from the compiler, because it relies on the unsafe assumption that your code won't be redefined.
Common Lisp is designed to be dynamic, and the code that is produced should be able to work with values of type T (anything) and dispatch them to the correct specific code. SBCL performs static analysis, but the only useful way of doing static typing is to assume that known types won't change; otherwise you are going to widen types quickly and treat everything as type T.
The compiler can avoid doing redundant checks, in all modes.
But it is allowed to blindly trust static analysis only if you allow it, because that's a dangerous thing to do. SBCL also has a special operator named TRULY-THE, a more aggressive version of THE (http://clhs.lisp.se/Body/s_the.htm), which declares that an expression is of some type U (e.g. (the single-float x)).
Its use is reserved for cases where you really know what you are doing.
Is static typing as done here useless? Not at all, because sometimes you know that values cannot be redefined, for example inside a function.
I was writing a state machine, with local functions being used as states and a local variable that would be set to the function representing the current state. At one point, I compiled the code and got a note about code removal being performed. The compiler guessed, based on the different `(setf state ...)` expressions, the range of possible values for the state variable and determined that one state was not reachable. This is the kind of thing for which the type system is useful, and dead-code elimination was safe to perform here because everything was done in the scope of a single function.
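A rough Python analogue of that pattern (hypothetical names; Python's compiler won't do the dead-code elimination described above, this just shows the shape of the state machine):

```python
def run_machine(events):
    # local functions act as states; `state` holds the current one
    def idle(ev):
        return running if ev == "start" else idle

    def running(ev):
        return idle if ev == "stop" else running

    state = idle
    for ev in events:
        # each transition sets `state` to another local function,
        # so the set of possible values is statically knowable
        state = state(ev)
    return state.__name__

print(run_machine(["start", "stop"]))
```

Because every assignment to `state` is visible inside the one enclosing function, a compiler like SBCL's can enumerate the possible states and prove some unreachable.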
When I read this, all I can think about is the enormous amount of man-hours people have put in to make Python the go-to for Julia's target market.
As the author says, the biggest drawback is libraries, and that is (1 of) the big attraction(s) towards languages.
Nobody wants to have to switch between C, Python, Assembly, etc. to make their code faster, but even fewer people want to write bindings to underlying C code. This also goes exactly against what the author doesn't want to do, which is to yak-shave around the language to get things working.
It's extremely common to prototype in a language like python or R to be able to bang out a draft idea before diving into making it fast. If I'm not sure how awesome something is going to be, there's no sense in wasting weeks getting a C implementation working. And prototyping in a higher level language often reveals a lot of the architectural complexity you'll have to deal with in a lower level implementation. So I think this guy's use-case is more common than you think.
Python addresses a good deal of the problem with Cython, which lets you declare variables and compile to get ~90% of the speed gain you would get from a pure C implementation. And then you can just link to a C implementation if you still find you need to build one...
>It's extremely common to prototype in a language like python or R to be able to bang out a draft idea before diving into making it fast.
I think what's far more common is prototyping something in Python or R, and then realizing that 99% of its CPU time is already spent in wrapped C code. And then you call it a day and move on to other things :)
This is why Julia hasn't taken off nearly as quickly as I initially expected - Python/R/etc are highly optimized for their common use-cases.
> I think what's far more common is prototyping something in Python or R, and then realizing that 99% of its CPU time is already spent in wrapped C code. And then you call it a day and move on to other things.
Someone has to write those libraries.
> This is why Julia hasn't taken off nearly as quickly as I initially expected
Not sure what you were expecting, but consider:
- the most conservative userbase estimate I would believe is 20k users (based on mailing list subscriptions, website stats, and download numbers).
Yes, but the discussion is about the "extremely common" case of prototyping, which definitely should not require library building, and usually should not require veering much from established libraries.
>Not sure what you were expecting
The data community can coalesce around a tool extremely quickly, in the matter of a year or two. Spark is about the same age as Julia and has a thriving ecosystem around it. In 2004 R was a fairly esoteric analysis tool, but by 2008-2010 it was the de facto data science language. Python made similar advancements in just a few years in the data science community.
Julia? I don't know a single person who uses it day-to-day, but I know a lot of people who tried very hard (myself included). The critical mass simply is not there: people aren't building packages because the users don't exist, and they don't exist because the packages don't exist. Your chart shows a linear growth in packages, which implies a constant amount of development work. This means it's not a growing language. This is what a growing language looks like: http://blog.revolutionanalytics.com/2010/01/r-package-growth...
Numba accelerates loopy Python code using an LLVM backend in the same spirit as Julia, so there's not much reason to look at Julia for speed alone.
You can use python libraries in Julia. And it is really good at binding to C and Fortran.
Python is old and stuck with some serious design flaws which cannot easily be fixed. The question is: can Python get fixed in less time than it takes to give Julia a competitive set of libraries?
I would love to know what fundamental design flaws there are in Python, in your opinion... The GIL? No problem with Cython. Unicode? It is ugly, but it could be done. Other things, like the lack of a static type system, are about the language itself, and I don't think they need to be addressed as flaws to be fixed; they're just a different preference.
For me, there isn't anything I can think of that is such a deal breaker that I'd switch to another language. More likely it's not ideal but could be addressed by the Python ecosystem itself.
The question is not "can these problems be more or less solved" with Cython or Numpy, but rather: we shouldn't have these problems in the first place. We haven't talked about deployment yet, which is another wart of Python. I guess it can also be kind of solved with another yet-to-come library or tool, but how many of these things are we going to pile up, whereas a newer language could get it right, or at least better, from the word go?
But I don't think Julia is going to be so well-rounded in the face of those problems. Why is deployment hard?
Because Python has so many C/Fortran external dependencies.
Is Julia going to change that? By itself maybe, but currently it still borrows from Python in order to enrich its ecosystem, so basically adding even more layers. That, for me, speaks volumes: the language is not really the pain point here; the libraries are.
Real-world problems are ugly down to their core, and the requirements and constraints are ever changing. Who thought GPU programming would be such a big deal just 5 years ago, if not for the DNN revolution completely turning things around? And I don't think Julia's improvements really prepare it for this. Even more, if Tensorflow-style dataflow programming becomes widely accepted, then probably any language could be used to just draw the computation graph, and the heavy lifting will be done by the framework anyway. It is going to be harder to persuade programmers to adopt a language just for perks that are only peripheral.
Julia's GPU capabilities are _far_ better than python's. There are near complete, free wrappers for all the CUDA libraries. The same is not true for python (e.g. CUSPARSE).
This is a consequence of how easy it is to interface with languages like C in Julia. I.e. whenever a new CUDA library or change happens, it will take the Julia devs less effort to update their wrappers.
Add to this Julia's HPC support and it is by far the best offering if you want to do REPL style programming on a GPU cluster.
Wow! Thanks for the CUSPARSE.jl shoutout (I'm the maintainer of that package)!
I must take issue with "near complete" - CUSOLVER.jl, my other CUDA wrapper, is missing a lot of the RF (refactorization) functionality and doesn't really have a high-level API yet. The doc and testing situation is also pretty bad. Everyone else's packages in JuliaGPU (the overarching GPU org on GitHub) are in a much better state.
You are right, though, that writing CUDA bindings in Julia is very easy - so easy that I can do it! It's also thanks to packages like Clang.jl, which make it easy for us to automate the procedure of wrapping the low-level interfaces.
Finally I must add the caveat that although I should test CUSPARSE.jl and CUSOLVER.jl with MPI (and GPUDirect) I've been extremely busy recently and not able to do so. If anyone wants to help out in this regard I would be very appreciative!
This is a good argument. But that is not the market where the majority of Python programmers are.
Once the wrappers are done, for most developers it becomes a matter of layering those building blocks to build their own functionality. It is not a strong enough argument that people should use Julia because it is a better language to write wrappers in. Let alone that the Python wrappers for data science stuff are already like a de facto builtin.
Not saying Python is perfect, but IMHO, Julia is not hitting the right spot in order to really stand up against Python.
You have to some extent switched to a different language when you're using Cython. Do you teach Cython to newcomers when you're introducing them to Python? If no, why not? Performance and parallelism shouldn't require hurdles.
I love Python (despite the community). Sorry to be that guy, but Python is always the second best, and for me that usually means I end up using Python less and less. There isn't one area where Python is the best tool, and sadly that is why it just isn't taking over the programming world.
I wrote that article. I still bet on Julia. We just used pre-production software in a production environment and the tooling wasn't ready. I still love the language and believe in its future, and I can see us building future services in the language.
They tried to use Julia for things like networking instead of confining it to the purely numeric parts. No idea why anyone would use a DSL for "general purpose" stuff.
Of course it can be used for general-purpose stuff, but it should not be, because there are always much better tools that are specialised for those domains. It's never a good idea to use a language for something it was not designed for.
Technically Julia is designed for general purpose computing. It just happens to be really good at numerical applications. The main thing holding it back is lack of tooling, debugging, and somewhat largish changes in syntax still. Having said that writing typed, expressive, and _performant_ DSL's in Julia is easy if not even kinda fun.
I think it will all come down to web assembly. Why?
Python has blaze, numba, dynd and dask going for it. These all ameliorate many of the disadvantages of Python (and exceed Julia in some respects), including fast user-defined types. Then there are the libraries: while some can be used from Julia, you will never get full reliability, ease of use and compatibility.
On the other hand, Julia has amazing metaprogramming and much cleaner scientific syntax.
I think it will come down to whether Python can compile to fast LLVM regular standard lib programs.
Once (if?) Julia can be run in the browser with web assembly (using ahead of time compilation to produce small binaries), python has no chance if it doesn't follow suit. Python has no advantage that can permanently match Julia's potential ability to run on mobile, front end web, back end etc all from one beautiful codebase.
Can numba do this yet? Without having to write special "numba classes" code?
Blaze has a giant serialization wall between it and the JVM which it needs to solve to ever be competitive in its target market. Ibis is the much more compelling effort here.
Dynd is an overly complex C++ disaster that is trying way too hard to solve problems that aren't what anyone really struggles with. The number of real world problems for which homogeneous dense multidimensional arrays of floating point numbers is the solution isn't growing much. Real problems have interesting structure that your data structures need to be able to exploit on a problem specific basis. Doing the array layer all in templated C++ with Python bindings is not a recipe for making something easy to use and effective at problems it wasn't previously designed and compiled to solve. Displacing numpy is not going to happen in the Python space.
Dask has good ideas, but the insistence on implementing everything in pure python means you're stuck with the GIL and lousy performance of your scheduler. This needs to be low overhead if they want anyone to actually use it.
Granted, Numba classes need special syntax, and they are nowhere near as elegant or integrated as Julia's.
Dask distributed interfaces directly with HDFS...no need for serialization.
Dynd is specifically designed to deal with arrays of custom types... the criticism sounds like it would be more aptly directed towards numpy.
Once it gains Numba bindings, it will lose the Python overhead.
Re Dask... is 1ms per task too much? It has a threading scheduler that can work with numpy arrays to release the GIL, so numerical code is not bound by that.
Regarding regular list and text processing... Julia's poor performance with heterogeneous data actually makes it slower than Python for cleaning the corresponding dirty and text-heavy datasets.
Interfaces to HDFS how? PyObject has to get translated somewhere.
Arrays of custom element types is a good first start, but boy is dynd a seriously overengineered way to accomplish that. I'm referring to array structure, sparsity, symmetry, linear algebraic properties that should be reflected in the type system. Python's type system is lousy for this, and C++ isn't extensible.
Julia can fix performance on heterogeneous data with gradual compiler improvements, and the string representation is due for a major rework. Python can't fix the fact that the language and libraries were not designed to be efficiently JITted, and extension interfaces are closely coupled to the CPython interpreter's API.
I don't see why all that can't be reflected in dynd type system and array metadata. It's just early and foundations are still being laid.
Regarding Julia union etc. performance... are these improvements a given, hypothetical, or a hope? Doesn't fast code on these types fly in the face of Julia's static-optimization ethos?
Also, serious question: dask's use of fast Python data structures like dictionaries gives it a ~1ms per-task overhead. Is that slow? How does it compare to other DAG frameworks, like Julia's?
> Doesnt fast code on these types fly [in] the face of julia static optimization ethos?
What does that even mean? Julia's union types aren't intentionally slow, they just aren't implemented very efficiently yet. Major revisions of how they're implemented are definitely on the roadmap, not far away.
For fine-grained parallelism of the type you'd use MPI for, 1ms overhead (assuming that's pure overhead above and beyond the actual cost of data movement) could be significant, sure. If you have calculations that need to go for thousands of individually cheap iterations, it adds up.
> Julia's potential ability to run on mobile, front end web,...
Julia is designed for interactive, exploratory scientific programming, like Matlab and R. I don't see any reason someone would want to do that on a phone or in a web browser when it already runs natively on a PC. Then there's the matter of Julia being designed to interface efficiently with optimized native C/Fortran libraries.
This. Scientific computing is not interested in running in web browsers.
As a former neuroscientist, I occasionally had connectivity analyses that would take 3 weeks to run. No way would I risk slowing it down with anything unnecessary. A 25% slowdown for using VMs in the browser would mean milliseconds for users and days for me.
R's "shiny" is a counterpoint. Obviously something that takes a week to run isn't going on the web. But a complex model and visualization that can be incrementally updated quickly and in real time? That would be cool.
+1 for this! For large programs -- server side Julia, but having browser native Julia via WebAssembly would make visualization and interactive analysis for data science fantastic. Performance of running large GUI applications and libraries would be much faster since WebAssembly could take advantage of the huge amount of work done for optimizing JS by tiered interpretation/JIT'ing.
I agree there's value on the visualization side. Most scientists still do a poor job of sharing results to the public and other scientists (other than by printed images in your article...)
I understand why some may see Julia as 'designed for scientific computing', but this is mostly a reflection of the work that's been put in to build up the library ecosystem. Julia could end up being a good general purpose language too, the only thing holding it back from this is the lack of library diversity and size.
WebAssembly is a neat idea, and could be useful for creating cool ways of interacting with scientific data. But a lot of code scientists write only needs to run once on one machine. And in that case, there's no reason to take the inevitable performance hit associated with WebAssembly.
> Once Julia can be run in the browser with web assembly [...]
That will never happen for Julia, because the language is designed to be fast for scientific computing... targeting web assembly is a pointless additional layer of abstraction that only negatively impacts performance.
A "compiled binary emitted by a hypothetical slimmed down non-numerical Julia" seems to be the going theory about WebAssembly support in that issue thread, as the numerical C/Fortran libraries needed by Julia have inline assembly too. Doesn't sound like there is a direct way forward.
I see value in Julia web notebooks, where the lifting is done by a server (or localhost), but what are the motivations for running Julia in a browser?
There are definitely a lot of hypotheticals here, but many of those goals are desired in their own right and aren't specific to web assembly.
Interestingly, there's a slow-but-steady effort underway to implement a generic BLAS and LAPACK in Julia itself, since this allows for computations with custom datatypes that aren't supported by the Fortran libraries. For example, you can now do QR factorizations and solves with any numeric type, including Rationals and Quaternions: http://stackoverflow.com/questions/20985783/rational-matrix-...
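The appeal of type-generic linear algebra is easy to see even outside Julia. Here is a hypothetical Python sketch of an exact 2x2 linear solve over `fractions.Fraction`, the kind of thing a Float64-only Fortran BLAS simply cannot do:

```python
from fractions import Fraction

def solve2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] exactly via Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

# Fraction division stays exact, so the answers are true rationals,
# not floating-point approximations
x, y = solve2x2(*map(Fraction, (1, 2, 3, 4, 5, 6)))
print(x, y)
```

In Julia the same genericity falls out of dispatch: one method body, specialized per element type, whether that's Float64, Rational, or a Quaternion.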
Julia's dispatch system allows for these kinds of layered algorithms very nicely, and they could just as well live in separate external packages.
Web assembly would be great. Most browsers support WebGL now. Julia can utilize the GPU via the browser; the gain would be huge. Second, browsers are perfect for SETI-style distributed computation. Imagine a beowulf cluster of these browsers running downloaded Julia snippets...
> Julia breaks down the second wall — the wall between your high-level code and native assembly [...] you can take a peek behind the curtain of any function into its LLVM Intermediate Representation as well as its generated assembly code — all within the REPL
While I agree with the author in "betting on Julia", I do not think inspecting the LLVM IR or assembly of individual functions in the REPL is as useful a feature as the author thinks, because it gives an inaccurate picture of what the actual output will be.
Peeking at the code at the level of functions means a non-inlined version of the function must be produced at the REPL, and anywhere else this function is used in your actual code, it will probably be inlined (the author's example certainly will) so it can be further optimized/eliminated by later passes. Furthermore, for a non-trivial function's dump, fundamental optimizations that increase code size (inlining, unrolling loops etc) would make the output confusing to compare with the Julia equivalents.
Thus, I imagine a lower optimization level is used to produce readable REPL dumps. For a performance minded person (the only type of person who would care about ASM/LLVM dumps at a REPL) even this is not helpful.
Yes, when a function is inlined, the caller's context can dramatically affect optimizations. And inspecting varargs functions can be tricky.
But beyond that, it generally displays exactly what gets executed. You can verify this yourself by looking at functions with different `-O` or `--inline=no` flags. And if you're interested how a function optimizes with inlining under typical conditions, you can just inspect the outer function.
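For comparison, CPython offers something vaguely similar at the bytecode level via the `dis` module, though it sits much further from the hardware than Julia's `@code_llvm` / `@code_native` (the function name here is just an illustrative example):

```python
import dis

def axpy(a, x, y):
    # a small numeric kernel to inspect
    return a * x + y

# print the function's bytecode, loosely analogous to inspecting
# a function's lowered form in the Julia REPL
dis.dis(axpy)
```

The key difference is that Julia shows you type-specialized native code, while `dis` shows untyped interpreter bytecode, which is exactly why the same inspection is far more informative for performance work in Julia.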
Language design is vital for a language that can evolve. If you build a language for "cowboys" with the immediately-useful functionality that you can see, but without the "language geeks" insisting on the underlying principles, you end up with a language like perl - one that can dominate a particular use case for a while, but will be virtually impossible to evolve, and ultimately fall out of favour as it's overtaken by languages with firmer foundations.
That's why I'm betting on the strongly-typed functional tradition - OCaml/Haskell/etc. These language already have the killer feature from this article: they can be (almost) as expressive as Python and (almost) as performant as (naive) C++. But they also have really solid design with a lot of thought put into it that will let them evolve to meet the needs of the future.
Strongly typed and dynamically typed, which means that you can get runtime errors. Also, the "type system" with "dependent types" is really just equivalent to an ML datatype that's dynamically type checked.
I'm a bit mystified with respect to what passes as research in the field of type systems for dynamic languages.
Yes. I think good programming language design is necessarily general, and the best way to achieve that is with a language that's been validated in widely varied domains. Certainly my experience is that specialized languages are consistently worse than general-purpose ones.
Julia is general-purpose, but there is a deliberate orientation towards scientific computing. That's a partly non-technical choice about what is going to be optimized in priority.
> For me, code is like a car. It's a means to an end. The "expressiveness" of a piece of code is about as important to me as the "expressiveness" of a catalytic converter.
I'm a sculptor, but all I make is coffee cups and ashtrays, so I don't need to worry about the minor details of things... This is what abstraction gets us.
Julia is getting plenty of attention. It's pretty obvious at this point that Julia is destined to eventually be a major player in some fields.
The main problem facing the language is maturity. That will come with time. Until it has fully stabilised, has a mature runtime, has a mature library ecosystem, and has strong debug and post-mortem tooling, it is not something everyone should use.
They're smart alright, but I think Julia has gotten a lot of play on other channels besides the Matlab/tech world. The stopper for me is use cases. I play with ideas in Mathematica (a notebook IDE for decades) because it has everything at my fingertips, keep the J programming language open on my desktop for quick prototypes (not just as a desk calculator), and if I need technical libraries and speed, I just use C with the relevant libraries (even though I should learn Fortran :) ). I like Julia's syntax, and I think it will become more general purpose as it matures, but it is already the wild mistress of the Matlab playboy.
I meant that as a PL being pitched as a Matlab replacement, and for math and science communities, it has had some highly visible, hyperbolic articles in Wired ("Man Creates One Programming Language to Rule Them All"), Venture Beat, Fast Company and other more broad startup and popular tech magazines. I like the programming language J, and I think I can truly say it doesn't get a lot of exposure outside of its niche along these lines, so I wouldn't say Julia doesn't have good marketing; J has yet to be featured in Wired ;)
What Julia really lacks at this moment is something along the lines of the Matlab IDE. The community is aware of this though and there is something in the works that even I, someone that lives happily with a ssh/tmux/vim set-up, would deem as interesting.
Since we are on Hacker News, maybe I should also spread a rumour. MathWorks, allegedly, in a recent settled lawsuit added a condition stating that the other company was not to create any Julia bindings for its product. If this is true, which I have many reasons to believe, it is a major recognition of how at least MathWorks sees Julia as a real threat to the Matlab market share.
> MathWorks sees Julia as a real threat to the Matlab market share
I think Mathworks is feeling a lot of pressure from both Python & Julia. In the r2016a release of MATLAB, Mathworks essentially released a Jupyter (one of the first environments for Julia, and formerly iPython Notebooks) competitor with the 'Live Notebook'. Mathematica has had notebooks for quite a while, and the best Mathworks had was their MuPad suite. I think the prevalence of Jupyter definitely led to this development.
On the other side of things, are you aware of any efforts in the open source community to close the gap between jupyter and Mathematica notebooks (i.e. editing of pretty-printed math expressions)?
I would really like to not give Stephen Wolfram money every year, but evidently not quite as much as I really like to be free of paragraph-form monospace code blobs from hell. I was hoping that MathJax meant I could wait a few years and have my cake and eat it too, but so far I haven't seen anything that looks serious :(
LyX and TeXMacs can do that kind of thing through Maxima plugins. The Maxima frontend itself has some prettifying capabilities. Of course, if you aren't doing symbolic math that is going to be less useful to you.
Certainly using Julia with Jupyter gets you much closer to having a reasonable IDE, though you'll have to be comfortable with cell-mode debugging instead of breakpoints and single stepping. I don't personally use it but I'm intrigued and would love to switch away from Matlab if there were reasonable feature parity or an alternate model that seemed reasonable for most basic tasks.
I'd really like to have a replacement for cell arrays and struct arrays that has some of the convenient features of those structures. I know you can get this with lists of dictionaries in Python, for example, but it's not the same.
I've played around with them a little, but part of my issue is that our real-time data acquisition (in a neurophysiology lab) uses matlab/simulink so I'd need to create some middleware serializer to go from matlab data structures into dataframes. Not too difficult but there's a lot of momentum behind the matlab tooling unfortunately.
It is free/open source. And, it benefits from being a younger language and therefore has the hindsight of the quirks in Matlab's design. Coming from Python, the 1-indexed arrays are annoying but it's great for math.
That's interesting, because the 1-indexed arrays are the primary reason I don't ever seriously consider using Julia (or Lua); I can only shudder and imagine the obnoxious number of off-by-one mistakes I'd make.
If I can switch between programming languages that use 0-based and 1-based indexing without getting confused, surely you could too. If you don't think that learning to use 1-based indexing is worth it, that's fair enough, but it's pretty low on my list of things that give me trouble when I translate code between languages.
I think this is a mental block people have. I grew up on zero based arrays in C, but I can't say this has caused me any issues thus far using Julia.
I don't know how it is for others, but I just get into this mode where I think about Julia as doing math, and I know in math you start at 1 for vectors and matrices.
Besides the way functions and APIs work, you seldom have to write code that really depends on the start index.
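The same advice translates to Python: iterate over the container (or enumerate it) rather than hard-coding a start index, and most code never mentions 0 or 1 at all. A small sketch (hypothetical function, just to show the pattern):

```python
def total_and_positions(xs):
    # no literal 0 or len(xs) - 1 anywhere: the iteration protocol
    # hides the start index, much like `for x in xs` / eachindex in Julia
    total = 0
    positions = {}
    for i, x in enumerate(xs, start=1):  # 1-based labels, if that's your convention
        total += x
        positions[x] = i
    return total, positions

print(total_and_positions([10, 20, 30]))
```

Written this way, the code would port between 0-based and 1-based languages with essentially no off-by-one risk.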
What you are saying is conjecture if you haven't tried Julia or Lua. The reality is that 1-based indexing is trivial compared to the difficulties of a complex program.
Yeah, it just feels wrong. Of course that's just habit and in fact starting counts at 1 is actually the more user-centric way of doing it.
I wish I could get used to Julia but for some reason I just think the source looks..ugly. Maybe it's my dark past with php that makes me shudder at the sight of avg(x) (instead of x.avg).
I know that's stupid (and the syntax is actually required for multiple dispatch). Maybe Julia and I should try couples' therapy.
Basically every mathematical/data science language uses 1-based arrays -- R, MATLAB, and Mathematica in addition to Julia. Probably because traditionally (and to a degree even now) the standard numerical language was FORTRAN.