Hacker News | new | past | comments | ask | show | jobs | submit | l0stman's comments

Well, the wayback machine is your friend. http://web.archive.org/web/20111001124736/http://log.nadim.c...


I was just wondering if you'd planned on open-sourcing the code from the very beginning or if the idea came much later. Anyway, congrats on launching.


We wanted to open-source all along, but there were concerns raised by investors about IP, etc. It took us a bit of time to work through all the issues people raised, which is why it took so long.


How do you plan on monetizing the business? Are you looking at some kind of consultancy play, à la Red Hat / MySQL (pre-Oracle)?


We want everyone who has a reason to use the product to be able to get it and use it for free (as in beer and speech). As companies grow and operations become more important, we'll be there to provide paid support. It's actually a great business model because you don't have to hire a really expensive enterprise sales team (you still need a sales team, but it mostly has to handle inbound requests).

We're also looking into launching services, which is a great revenue stream coming from people who prefer to pay for the convenience of not having to deal with operations at all.


Well played. For a couple of seconds I was wondering why I had this impression of déjà vu; then I found this (http://hackerne.ws/item?id=4692794) and burst out laughing.


Haha you got me. PG can rest assured that middlebrow dismissal apparently only gets a disproportionate share of upvotes when it's about health or nutrition :P


    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
        where
            lesser  = filter (< p) xs
            greater = filter (>= p) xs
I know nothing about Haskell, but I don't think this code implements the original quicksort algorithm, which sorts the input in place. Moreover, the two passes of filter over the list and the concatenations cause unnecessary overhead. So even if the sample code is simple and elegant, a real-world implementation would surely be a bit longer than the given example.


I find this a recurring problem when trying to learn Haskell. On the one hand you have lots of books and tutorials talking about how simple and elegant Haskell is, showing all the awesome things you can do with two lines of code. Then I try to write simple and elegant Haskell like that and find it runs an order of magnitude slower than my Python code.

I wish more Haskell proponents would spend less time showing off simple and elegant code, and more time showing fast and correct code.


This is the real "sort" used by GHC 7: http://hackage.haskell.org/packages/archive/base/4.3.0.0/doc...

It's pretty elegant too, if less so than the inefficient pseudo-quicksort shown by the OP.


Correct me if I am wrong, but they are actually using mergesort (mergeAll), as in Haskell it is significantly faster than an in-place quicksort (qsort).

That means there is a lot of overhead for doing something simple like an in-place qsort that should be much faster than a mergesort.

I am at a loss as to why the provided code is any more elegant than this: http://en.literateprograms.org/Merge_sort_%28C_Plus_Plus%29

or this (also taken from rosetta code)?

  #include <iterator>
  #include <algorithm> // for std::partition
  #include <functional> // for std::less
  
  template<typename RandomAccessIterator,
           typename Order>
  void quicksort(RandomAccessIterator first, RandomAccessIterator last, Order order)
  {
    if (last - first > 1)
    {
      RandomAccessIterator split = std::partition(first+1, last, std::bind2nd(order, *first));
      std::iter_swap(first, split-1);
      quicksort(first, split-1, order);
      quicksort(split, last, order);
    }
  }

  template<typename RandomAccessIterator>
  void quicksort(RandomAccessIterator first, RandomAccessIterator last)
  {
    quicksort(first, last, std::less<typename std::iterator_traits<RandomAccessIterator>::value_type>());
  }
The above code is more verbose?


> That means there is a lot of overhead for doing something simple like an in-place qsort

Well, the problem is not so much that there is a lot of overhead for doing an in-place qsort. The problem is that "in-place" anything is impossible in Haskell; it violates referential transparency.

Merge sort, however, needs no in-place updates and can trivially (in the Haskell world) be parallelized (the merge step is associative, so you can use it as the reduce step of MapReduce). So you are correct in concluding that mergesort is used in Haskell.

You state "something simple like an in-place qsort", but your statement about qsort being simple makes some assumptions about your programming-language world, and they are not the same for all languages, especially not Haskell. Now, whether all this is a pro or a con depends on what sort of coding you're doing, of course.
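To sketch what I mean, here's a minimal purely functional top-down merge sort (my own toy version, not GHC's; the real Data.List sort is a smarter bottom-up variant that exploits already-sorted runs):

```haskell
-- A toy top-down merge sort: no in-place updates anywhere, and the two
-- recursive calls are independent, which is what makes parallelizing easy.
msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort front) (msort back)
  where
    (front, back) = splitAt (length xs `div` 2) xs
    -- The associative merge step: exactly the combine you'd hand to a reduce.
    merge [] ys = ys
    merge ys [] = ys
    merge l@(x:xt) r@(y:yt)
      | x <= y    = x : merge xt r
      | otherwise = y : merge l yt
```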

I think that people call Haskell's code more elegant because they (like me) find C++ template and type annotations to be annoyingly verbose and messy for no good reason. For example, for the two quicksort functions:

"template<typename RandomAccessIterator> void quicksort(RandomAccessIterator first, RandomAccessIterator last)"

vs

"quicksort :: (Ord a) => [a] -> [a]" (where "Ord a" means that a is an instance of the typeclass of ordered items).


If you want to be terse, you can just use typedefs, but most ISO C++03 libraries just assume your editor can autocomplete quickly. The culture is different: absolute clarity is preferred over terseness.

There's a lot of infrastructure behind MapReduce and I bet you any implementation is going to have a lot more code in it than what is in Data.List.

Once things are in RAM (say, less than 1 GB), quicksort is the fastest algorithm (edit: or some other optimized in-place variation on the theme, such as introsort). You can parallelize across cores all you want, but all that is going to do is crap up your caches and cause bus contention, making your code more complex and slower. The bottleneck is your data bus, not the CPU; after all, what is the cost of a compare in clock cycles? This is also why mergesort is slower and parallelizing is not going to fix it.

If you want to see things that are parallelizable, check out vectorizing libraries such as Thrust that run on massively parallel GPGPUs, which have the bus design and RAM to handle that kind of parallelization. They are written in C++, and both the implementation and the code to use them look almost the same as the code I posted. The compiler technology of other languages is years behind and currently can't even target these kinds of architectures.

I've seen GPU massive parallelization in Haskell, and all it did was generate C code and then invoke the CUDA compiler. This was in a research paper. In the meantime you can fire up nvcc with Thrust and sort a bazillion keys with less than 10 lines of C++ using a well-tested library.

Edit: BTW, anyone (and I really don't know who would be doing this) who is even thinking of running a MapReduce Haskell cluster must have lots of free time and $$$ to burn on wasted CPU cycles.


Well, I didn't mean an actual MapReduce cluster; that's just silly. I was just pointing out that Haskell compilers can trivially convert these sorts of things to multicore code. Parallel Haskell basically does this, converting normal code to easy multicore code.

My point wasn't about using Haskell for high performance computing; my point was that Haskell's compiler can convert normal Haskell code to multicore code without (much, if any) work from the programmer. There is a large gap between a sequential single-threaded program and a full MapReduce cluster, and I think there is a significant amount of software written in this gap where Haskell-like languages could help with the parallelism/concurrency. The only reason I referenced MapReduce is that it's a well-known example even for people who are not well versed in functional programming (it's a very common concept in functional programming).


No, because it is solving a different problem. Quicksort cannot sort linked lists, not even in C++. I think most implementations of std::list<>::sort (in C++) use some variant of mergesort.

And it is not true that in-place modification is not possible in Haskell. It is just not considered elegant, but it can be encapsulated in a perfectly Haskell-y way. (Hint: ST monad.)

Also I did not claim that my linked code is more elegant than some other code. Where did I make that claim? So I don't have to defend the claim I did not make, right?
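To illustrate the hint, here's my own sketch (not library code) of an in-place quicksort on a mutable STUArray, wrapped in runSTUArray so callers still see a pure function. The Lomuto partition scheme with the last element as pivot is chosen purely for brevity:

```haskell
import Control.Monad (when)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (elems)

-- In-place quicksort hidden behind a pure interface via the ST monad.
sortInPlace :: [Int] -> [Int]
sortInPlace xs = elems $ runSTUArray $ do
    arr <- newListArray (0, length xs - 1) xs
    qsort arr 0 (length xs - 1)
    return arr
  where
    qsort arr l r = when (l < r) $ do
        p <- partitionLomuto arr l r
        qsort arr l (p - 1)
        qsort arr (p + 1) r
    -- Lomuto partition: pivot is the last element; returns its final index.
    partitionLomuto arr l r = do
        pivot <- readArray arr r
        let go i j
              | j >= r    = swap arr i r >> return i
              | otherwise = do
                  x <- readArray arr j
                  if x <= pivot
                    then swap arr i j >> go (i + 1) (j + 1)
                    else go i (j + 1)
        go l l
    swap arr i j = do
        a <- readArray arr i
        b <- readArray arr j
        writeArray arr i b >> writeArray arr j a
```

It's noticeably longer than the two-liner, which is rather the point of this whole thread.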


In fairness, in-place sorting usually isn't what you want in functional programming. Immutable data structures help enforce the pure functional constraints. Of course, in practice, your library code will probably make a mutable copy, sort that in-place and then return an immutable copy/representation of the result, which is hopefully more efficient. (Disclaimer: I'm no Haskell programmer; my FP experience is limited to Lisp dialects)

EDIT: As a curious example, Clojure's sort function actually converts the sequence to a (Java) Array and calls java.util.Arrays.sort() on it, then turns it back into a sequence:

https://github.com/clojure/clojure/blob/b578c69d7480f621841e...


Advancing compilers is hard. When people argue efficiency as a compiler implementation detail that is going to get worked out, they forget about many who have fallen before them.

You can argue some older languages were written in a way in which it was (reasonably) easy to write a compiler that generates code with little performance overhead compared to assembly (at worst a factor of 2 to 4, back in the 80s). Some popular languages keep adding abstractions and constructs that the compiler can't actually deal with without some new compiler-research breakthrough.


> When people argue efficiency as a compiler implementation detail that is going to get worked out, they forget about many who have fallen before them.

Yes; see also http://prog21.dadgum.com/40.html


Thanks. Beautifully written, and I love the print-shop analogy.

It's nice that he is into the C2 Portland wiki; there are golden nuggets of wisdom everywhere. I am reposting the link from his blog: http://c2.com/cgi/wiki?SufficientlySmartCompiler

Personally, of the higher-level languages, I found SBCL to be crazy good at optimizing, of course given a few nudges with a compiler directive or two. By inspecting the disassembly, I could validate it was doing the "right thing" (TM), but then again, I do the same to check C/C++ compiled code.


> Thus, even if the sample code is simple and elegant, a real-world implementation would surely be a bit longer than the given example.

Nice thought. What's the purpose of showing code which you won't be using in a real-world application? Making you think the language is more expressive than it really is?

Many people say Haskell code is elegant and concise. Would this be the case if the code we were shown were real-world Haskell?


Not only is it making two passes for filtering, it also has to do another pass to append "lesser" onto [p] and "greater". And that is for a single recursion, so these multiple passes are happening per recursive call.

Here's a page with some actual Haskell quicksorts:

http://www.haskell.org/haskellwiki/Introduction/Direct_Trans...
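For comparison, the naive version can at least be collapsed to a single traversal per level with Data.List.partition (still not a real in-place quicksort, just a less wasteful toy):

```haskell
import Data.List (partition)

-- One traversal of xs per level instead of two filter passes.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort lesser ++ p : qsort greater
  where
    (lesser, greater) = partition (< p) xs
```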


Ok, I am still learning Haskell, but I thought I would try to fix the above so that there is only one filter pass:

    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort [x]    = [x]
    qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
        where
            (lesser, greater) = foldl split ([], []) xs
            split (left, right) x = if x < p then (left:x, right) else (left, right:x)
Does this work?


I had to look up what the `:' operator does: it prepends an element to the front of a list. So the last line should be

    split (left, right) x = if x < p then (x:left, right) else (left, x:right)
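Putting it together, here's a version of the above that compiles and runs (a sketch; foldl' from Data.List keeps the accumulator from piling up thunks, and the sublists come out reversed relative to the input, which is harmless for sorting):

```haskell
import Data.List (foldl')

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort [x]    = [x]
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
  where
    -- Single pass over xs, accumulating both partitions at once.
    (lesser, greater) = foldl' split ([], []) xs
    split (left, right) x
      | x < p     = (x : left, right)
      | otherwise = (left, x : right)
```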


It would be pretty difficult to write an in-place sort in Haskell, given that it only has mutable state through monads (and I doubt anyone wants to go into a monad just to sort something). GHC is pretty good at optimizing list concatenations, so that's probably not a big deal either. You're absolutely right about the two-pass filter, though.


It's not that good, which is why they gave up on using quicksort in Data.List.


Indeed. It's amazing just how difficult it is to write a correct in-place quicksort that handles all edge cases. It won't be short and elegant either. Just go ahead and try it.


http://pastie.org/1367726 -- it's not that hard, really :)

* especially with naive pivot from the middle of array


Don't ever use that pivot code for more than 2^30 elements.

http://googleresearch.blogspot.com/2006/06/extra-extra-read-...
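The bug in the linked post is easy to demonstrate in any language with fixed-width indices. A sketch in Haskell, using Int32 to stand in for a 32-bit array index (the names midNaive/midSafe are mine):

```haskell
import Data.Int (Int32)

-- With 32-bit signed indices, l + r wraps negative once it passes
-- 2^31 - 1, which a quicksort or binary search can hit past ~2^30 elements.
midNaive, midSafe :: Int32 -> Int32 -> Int32
midNaive l r = (l + r) `div` 2      -- the classic, broken midpoint
midSafe  l r = l + (r - l) `div` 2  -- never overflows when 0 <= l <= r
```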


Your code:

  program qsortTest;
  const N = 1000;
  var
    a: array [0..N] of integer;
    i: integer;

  procedure quicksort(var a: array of integer; l, r: integer);
  var
    i, j, temp: integer;
    pivot: integer;
  begin
    pivot := a[(l+r) div 2]; i := l; j := r;
    while i < j do 
    begin
      while a[i] < pivot do inc(i);
      while a[j] > pivot do dec(j);
      if i <= j then 
      begin
        temp := a[i];
        a[i] := a[j];
        a[j] := temp;
        inc(i); dec(j);
      end;
    end;
    if j > l then quicksort(a, l, j);
    if i < r then quicksort(a, i, r);
  end;

  begin
    { populating the array }
    for i := 0 to N do
      a[i] := random(N);
    writeln('initial array');
    for i := 0 to N do write(a[i]:6);
    writeln;
    quicksort(a, 0, N);

    writeln;
    writeln('sorted array');
    for i := 0 to N do write(a[i]:6);
    writeln;

  end.
Do for loops in Delphi/Pascal iterate 'up to' or 'up to and including'?

If they iterate 'up to' then:

It would seem to me that on the initial call to 'quicksort' the '1000' gets passed in to 'r', and the value of 'r' then gets passed in to 'j'.

Now 'a[j] > pivot' will compare the uninitialized value in a[1000] against the pivot; if it isn't greater, the first decrement will not take place, i <= j, and the code will swap a[i] with a[1000].

This should cause the uninitialized value (0?) to end up as the first element in the sorted output array.

I hope I analyzed that correctly; I don't have access to a Pascal/Delphi compiler here.

When you run your code, do you see the output starting with a '0' element?

Is another element from the input array missing from the output?

If Pascal/Delphi loops iterate up to and including, did you actually intend to sort an array of 1001 elements?


For loops are "Up to and including".

The array boundaries are [0..N] -> [0..1000] (1001 elements actually; sorry if that confused you).

[add]: I could rewrite it in C; it should be a trivial bijection, I believe.


I suspected as much, but indeed it confused the hell out of me. I've never written a line of Pascal, so my buffer-overflow detector false-triggered :)


Just finished ``On Lisp''. This is an advanced Lisp book and shouldn't be your first one. It assumes some familiarity with the language, so you're better off reading ``Practical Common Lisp'' first if you're diving into Lisp for the first time. The book focuses almost exclusively on advanced use of macros and metaprogramming.

Interestingly, I've also been reading SICP in parallel -- I'm currently on the last chapter -- and there's some overlap in the topics treated in the two books. I'm also working through CLRS and TAOCP at the moment, albeit reading the latter at a slower pace than the former.

On the non-technical side, I'm reading ``Drawing on the Right Side of the Brain'' by Betty Edwards. I picked up this book when I decided to resume drawing a year ago. I highly recommend it if you struggle to draw a realistic rendering of a real-life object or a landscape.


Just add the following lines in your .bashrc:

    export GIT_PS1_SHOWDIRTYSTATE=yes
    export GIT_PS1_SHOWUNTRACKEDFILES=yes


Thanks. I didn't know about those. I've configured them in my ~/.bashrc now: http://github.com/avar/dotfiles/commit/43454870


Note also that there's a more feature-complete version that accomplishes the same tasks in the contrib directory of git (http://git.kernel.org/?p=git/git.git;a=blob;f=contrib/comple...).

It's implemented in bash but I use it with almost no modification with the Korn shell.


You linked to the wrong version, given the topic of this thread.

Recently ZSH support has been added to git-completion.bash, but it's still only in the "next" branch of git: http://git.kernel.org/?p=git/git.git;a=blob_plain;f=contrib/...


This makes me very happy, thanks!


Actually, I posted this article yesterday (http://news.ycombinator.com/item?id=1624402) but it went unnoticed. The author was working on a search feature for his hex editor and tried to outperform GNU grep by using Boyer-Moore. After multiple attempts, he finally discovered the hard way all the tricks Mike Haertel used to make GNU grep run faster than his implementation.


I read that one years ago; thanks for the repost. I missed it in the submissions; they go by so fast now. I think we're too late to bring it up, though; it would need quite a large number of upvotes to come back after dropping down this far.


Thanks for sharing this, it was extremely brave of you to do so. Speaking of one's failures in public requires a high degree of courage. Besides, from reading your other posts on HN, I thought you were what I would generally consider a successful guy. But like you said, there's always a flip side in every situation.


Since this issue has been raised many times on HN, could you show the kind of queries that bring up Mahalo in the results? I use Google daily, but it has never happened to me.


Seconding this. After years of heavy googling, I just came upon my first Mahalo page a few days ago. It was in probably the 6th or 7th position on the first page, and gave me the answer I needed.

