
Can you give an example of the copy/paste you're worried about?

I agree that working with numpy arrays in Cython introduces some semantic overhead. Then again, newer versions of Cython have introduced the more generic concept of typed memoryviews [0]. I had a lot of success recently using this to generate some very efficient multithreaded code with a minimum of hassle.

[0] http://docs.cython.org/src/userguide/memoryviews.html



It's things like the square brackets and triangular brackets, e.g. def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g)

I know what this is for, but I have no idea how this is implemented - is it some sort of library, some sort of macro, something inside the compiler, or what? It feels very opaque. I don't understand the lifetime of these objects or what could go wrong.

Even with C++ templates, I can usually work out how something is happening, but not here without research. The Cython-specific syntax also feels pretty unpythonic. The feeling I get from using it is that it's an ugly Python/C/C++ mutant with a load of hidden rules.

What I really would like is a set of templates or even macros for C++ which would allow easy wrappers to be written for different Python implementations, handling the reference counting, etc. Does such a thing exist?


The example you give is the old way to specify the type of a numpy array. With typed memory views it's more straightforward IMHO:

    cdef convolve(double [:, :] f, double [:, :] g)

I don't know if there are templates or macros for C++ which do the same thing, but if you look at the output of Cython and the complexity and amount of work it's doing, I don't think it's likely something like that could work.

The generated code is of course clunky and difficult to follow, but I found that when trying to understand specific things it's easy to follow what parts of the Python C API are being invoked and consider the overhead involved. The annotated HTML output is useful for this (cython -a).


And usually you want to avoid invoking the Python C API from Cython unless you have to use it for some reason. The whole point of the typed memoryviews is that you get direct access to the raw memory buffer and can access it like a numpy array or even like a pointer array.
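
To make the "direct access to the raw memory buffer" point a bit more concrete, here is a rough sketch in plain Python rather than Cython. It uses the built-in memoryview, which relies on the same buffer protocol that Cython's typed memoryviews are built on. The names storage and grid are made up for the example, and it only demonstrates that the view shares memory with the underlying buffer; it says nothing about the speed you get from compiled Cython.

```python
from array import array

# Six doubles in one contiguous block ('storage' is a hypothetical name).
storage = array("d", [0.0] * 6)

# View the same memory as a 2x3 grid. The cast goes through bytes because
# memoryview.cast requires one side of the cast to be a byte format.
grid = memoryview(storage).cast("B").cast("d", shape=[2, 3])

# Writing through the view changes the original buffer: no copy was made.
grid[1, 2] = 7.0
last_value = storage[5]

# Writing to the buffer is likewise visible through the view.
storage[0] = 1.5
first_value = grid[0, 0]
```

In Cython the equivalent declaration would be something like double[:, :] grid, and indexing it compiles down to plain pointer arithmetic instead of going through Python objects.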

This blog post [0] incrementally shows how to get microoptimizations out of your cython code.

In the end, the autogenerated code is usually not too bad to follow, since it's a one-to-one transpilation from the Cython to C. cython -a helps a ton, or if you can use it, the cythonmagic [1] for the IPython notebook embeds the output of cython -a inline in a notebook, making the iteration process much quicker.

[0] https://jakevdp.github.io/blog/2012/08/08/memoryview-benchma...

[1] http://ipython.org/ipython-doc/2/config/extensions/cythonmag...


In the specific case you pasted, the [] are just syntactic sugar to specify information about the arrays that you are passing in to Cython.

I find that far more legible than complicated template expressions - but that's likely a matter of habit.

About the object lifetime: there is a lot of good information on the Cython webpage (although it is a bit spread out). Unless I'm writing a Cython-only application, though, I tend to sidestep the question: I write tight Cython numerical routines that take pre-allocated input and output vectors and just update the output vector, so the calling Python code owns all the memory.
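
A minimal sketch of that pattern, written in plain Python so it runs anywhere (scale_into and the variable names are hypothetical, not from any library). In real Cython the two arguments would be typed, for example as double[:] memoryviews, but the ownership story is the same: the caller allocates both buffers and the routine only writes into the output.

```python
from array import array


def scale_into(src, dst, factor):
    """Write factor * src[i] into dst[i]; the caller owns both buffers."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    for i in range(len(src)):
        dst[i] = factor * src[i]


src = array("d", [1.0, 2.0, 3.0])
dst = array("d", [0.0] * len(src))  # allocated once, by the caller
scale_into(src, dst, 2.0)
result = list(dst)
```

Because nothing is allocated or returned inside the routine, there is no object created on the Cython side whose lifetime you have to reason about.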



