Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Are you able to talk about your dataflow language and/or compilation techniques? I'd love to hear about them, if possible, as its something I'm very interested in and am looking into myself. I've found it dificult to find information on compilation techniques for compiling a dataflow language to a traditional processor so have been making most of it up as I need, but my own work is far from optimal. Any insights you can provide here would be appreaciated.

Are you simply converting dataflow code to blocks of imperative code and have these blocks running concurrently (eg, in threads) and communicating with each other? Or is there a more elaborate instruction scheduling mechanism at work?



Without going into too much detail for obvious reasons, we are doing as you mention - converting dataflow into blocks of imperative code and scheduling them on threads. Although the scheduling we do both at the thread level and also at the block level is non-trivial and reasonably optimized. I'm sure most of what I'll mention below you've already discovered if you're implementing a dataflow language, but just in case:

When implementing/generating code for a dataflow language it really depends on the level of granularity you want for your parallelism. On one extreme, you have super-fine-grained parallelism and can schedule each operation independently. On the other extreme, you have little or no parallelism and pre-schedule all of your operations in one big chunk. In the middle somewhere is typically a sweet spot where you have chunks of operations pre-scheduled, and then dynamically schedule those chunks.

A key to optimal dataflow code generation (while retaining parallelism) is figuring out where to draw that line regarding parallelism granularity. Too granular and you spend all your time scheduling. Too coarse and you lose all of your parallelism. The ideal point somewhere in the middle likely depends on the operations themselves (i.e. the composition of your language). In some cases it is a decision you have to make by trial and error. In other cases you can make more informed decisions based on the nature of the operations and the context of their use within the program.

Communication is really important as well. CPU caches encourage you to minimize data copies and operate on the data in place when possible (to minimize cache line fetches). Thus if your communication is based on a scheme where you need separate copies for each operation then your performance will really suffer. Message passing can work, so long as you are passing things by pointer and aren't having to make copies.


Thank you!!

My own simple implementation currently compiles mathematical expressions with an (optional) pre- and post-condition into a single block and then executes these blocks in a thread pool (ie, they don't have dedicated threads, but share a number of pre-created threads). There is obviously a significant overhead in passing data between the blocks, determining when they are ready to execute and so on. I have not yet looked at merging blocks, at some point I think I'll JIT merge multiple small blocks into a single large block.

As for data, my current system is statically typed, so it knows ahead of time what the datatype is. Simple values are passed in place and copied as needed. Complex values (structures, lists etc) are (or will be, I still have some work to do here) passed as pointers and the pointers copied as needed. In this case, the data itself uses a copy-on-write scheme, so that if it is only read, then no copies are made.

My implementation is currently not very intelligent. Operations are executed and scheduled in a virtual machine of sorts, though I will be JIT compiling expressions and conditions within the next week or two. Theres still loads of room for optimization though and I'm still trying to think of ways to make the block scheduling cheaper.

Always interested in hearing how people go about these problems, so thanks again for sharing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: