Both your snippets are exceedingly long, verbose and painful to understand. In C...

kbp · on July 14, 2018

Sure, there are other ways to write it. The point of the post was Lisp's syntax, not the algorithm I used to demonstrate some features of it. You could write examples like yours in Lisp or Python, too. Also, both of your examples slurp the whole file into memory, which mine avoided.
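As a rough illustration of the memory point, here is a Python sketch. It is not the code from the post: the function names and the use of collections.Counter are assumptions made for this example. The first function only ever holds one line of input at a time; the second needs the entire text as a single string.

```python
import io
from collections import Counter

def count_words_streaming(lines):
    """Count words from any iterable of lines, e.g. an open file object."""
    counts = Counter()
    for line in lines:
        # Only the current line and the running counts are in memory.
        counts.update(line.split())
    return counts

def count_words_slurping(text):
    """Count words from a string that already holds the whole input."""
    return Counter(text.split())

sample = "a b\nb c\n"
streamed = count_words_streaming(io.StringIO(sample))
slurped = count_words_slurping(sample)
print(streamed == slurped)
```

Both produce the same counts; the difference is only in how much of the input has to be resident at once.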

edit: And your C# version is only 24 non-whitespace characters shorter than the Python version, and 20 of those are because you called your variables 'f' instead of 'filespec' and 'w' instead of 'word'. A 4 non-whitespace character difference makes it exceedingly long and verbose? Or are you just advocating 1-letter variable names and avoiding newlines?

tigershark · on July 14, 2018

The C# version calls only 4 methods, and in C# you obviously have to declare the types. It's the difference between declarative and imperative style that I wanted to highlight. Obviously Python is more compact than C#, but written that way it becomes more verbose and more difficult to read and write. Write something like that in Lisp and let's see how it compares.

kazinator · on July 14, 2018

Isn't "kv => kv.Key" a lambda function that is called?

kazinator · on July 14, 2018

TXR Lisp:

  (defun count-words (filespec)
    [(opip file-get-string
           (tok #/[^\s]+/)
           (group-reduce (hash) identity
                         (do inc @1) @1 0))
     filespec])

We build a function which gets the file as a string, then tokenizes non-space-character chunks out of it, which are then group-reduced to a histogram hash. We pass the filespec to this function.

I posted this yesterday but deleted it, because grandparent's point wasn't about code golfing but just comparing Lisp and Python syntax.

There is a group-by function in TXR Lisp, but group-reduce is more efficient, because by using it we avoid building the group lists and counting. It's something I invented. Basically it performs multiple left folds in parallel, using the entries in a hash as multiple accumulators. Items from the sequence are hashed to their respective accumulator entry and injected through it. 0 is the initial value for the accumulator when it doesn't exist, functioning exactly like the initial value in a regular fold. (do inc @1) expands to a function which just increments its left argument (the accumulator) and returns it. We cannot use succ because it takes exactly one argument.

group-reduce has an added flexibility in that it doesn't construct a new hash, but takes an existing one as an argument. This adds the (hash) verbosity to the code, since we have to construct the hash ourselves. But with that we could run multiple successive group-reduce jobs that go into the same hash table. Also, the function is spared from having to provide a way to pass through hash arguments for different kinds of hash tables with different options.

The identity argument is needed because group-reduce takes a function that projects the items to keys; in this case the items themselves are the keys so we use identity.
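For readers who don't know TXR Lisp, the behavior described above can be approximated in Python. This is only a sketch, not TXR's implementation: the name group_reduce, its argument order, and the dict standing in for the hash are assumptions chosen to mirror the Lisp call.

```python
def group_reduce(table, key_fn, fold_fn, items, init=None):
    """Run one left fold per key, using entries of `table` as accumulators.

    Hypothetical Python analogue of TXR Lisp's group-reduce.
    """
    for item in items:
        key = key_fn(item)
        # A missing entry starts from `init`, like the initial value of a fold.
        acc = table.get(key, init)
        table[key] = fold_fn(acc, item)
    return table

def identity(x):
    return x

def increment(acc, _item):
    # Plays the role of (do inc @1): bump the accumulator, ignore the item.
    return acc + 1

words = "the cat saw the other cat".split()
counts = group_reduce({}, identity, increment, words, 0)
print(counts)

# Because the table is passed in, a second job can accumulate into it.
group_reduce(counts, identity, increment, ["the", "dog"], 0)
print(counts)
```

No per-key list is ever built; each word goes straight into its counter.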

Here is an interactive session showing the gist of how to solve the problem succinctly using group-by and then counting lengths:

  This is the TXR Lisp interactive listener of TXR 198.
  Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
  1> [group-by identity '(1 2 2 2 3 3 3 3 3 4 4)]
  #H(() (1 (1)) (2 (2 2 2)) (3 (3 3 3 3 3)) (4 (4 4)))
  2> [hash-update *1 len]
  #H(() (1 1) (2 3) (3 5) (4 2))

This leads to the following solution:

  (defun count-words (filespec)
    [(opip file-get-string
           (tok #/[^\s]+/)
           (group-by identity)
           (hash-update @1 len))
     filespec])

Easy to follow, and brief, but wastefully conses up lists just for the sake of obtaining their lengths.
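The same trade-off can be shown in Python. This is a sketch under assumptions: group_by here is a hand-written stand-in, not a library function, and it uses the same sample data as the listener session above.

```python
from collections import defaultdict

def group_by(key_fn, items):
    """Collect items into lists keyed by key_fn(item) (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)

items = [1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4]

# Step 1: build every group as a full list.
groups = group_by(lambda x: x, items)

# Step 2: throw the lists away, keeping only their lengths.
lengths = {key: len(group) for key, group in groups.items()}
print(lengths)  # {1: 1, 2: 3, 3: 5, 4: 2}
```

Every element is stored in an intermediate list that is used once, for its length, and then discarded.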

> verbose and painful to understand.

Doesn't seem honest. I don't know Python, yet I can understand what that is doing, and might even be able to spot a logic error, if it had one (though not some issue of syntax).