A good compiler will make the lists disappear in many cases, with no runtime overhead. I actually love singly linked lists as a way to break down sequences of problem steps.
I believe it's possible in theory - Koka has a whole "functional but in-place" set of compiler optimisations that essentially translate functional algorithms into imperative ones. But I think that's also possible in Koka in part because of its ownership rules that track where objects are created and freed, so might also not be feasible for OCaml.
I mean, the set of valid deforestation transformations you could do to an OCaml program is not literally the empty set, but OCaml functions can do I/O, update refs, and throw exceptions, as well as failing to terminate, so you would have to be sure that none of the code you were running in the wrong order did any of those things. I don't think the garbage collection issues you mention are a problem, though maybe I don't understand them?
Part of what Koka's functional-but-in-place system relies on is the Perceus program analysis, which, as I understand it, is kind of like a limited, Rust-like lifetime analysis that can statically determine the lifetime of different objects and whether they can be reused or discarded. That way, if you're, say, mapping over a linked list, rather than constructing a whole new list for the new entries, the Koka compiler simply mutates the old list with the new values. You write a pure, functional algorithm, and Koka converts it to the imperative equivalent.
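To make the reuse idea concrete, here's a rough hand-written illustration in Python (not Koka; the names are made up, and Koka derives this rewrite automatically from the pure version):

# Pure version, as written in the source program: a fresh cons cell is
# allocated for every element.
class Cons:
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

def map_pure(f, xs):
    return None if xs is None else Cons(f(xs.head), map_pure(f, xs.tail))

# Roughly what a Perceus-style compiler can lower it to when it proves the
# input list is uniquely referenced: update each existing cell in place
# instead of allocating new ones.
def map_reusing(f, xs):
    node = xs
    while node is not None:
        node.head = f(node.head)
        node = node.tail
    return xs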
That said, I think this is somewhat unrelated to the idea of making linked lists disappear - Koka is still using linked lists, but optimising their allocation and deallocation, whereas Haskell can convert a linked list to an array and do a different set of optimisations there.
Why would a specific way of structuring data in memory be relevant to breaking down sequences of problem steps?
If what you mean is the ability to think in terms of "first" and "rest", that's just an interface that doesn't have to be backed by a linked list implementation.
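For instance, a toy sketch (made-up class, Python just for illustration) of a first/rest view backed by a plain array:

class Seq:
    """Head/tail-style decomposition over a contiguous list, no cons cells."""
    def __init__(self, items, start=0):
        self._items = items
        self._start = start

    def __bool__(self):
        return self._start < len(self._items)

    @property
    def first(self):
        return self._items[self._start]

    @property
    def rest(self):
        # O(1): just advances an index rather than allocating a node
        return Seq(self._items, self._start + 1)

def total(xs):
    return xs.first + total(xs.rest) if xs else 0

print(total(Seq([1, 2, 3])))  # 6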
Not GP, but bump allocation (OCaml's GC uses a bump allocator into the young heap) mitigates this somewhat: list nodes tend to be allocated near each other. It is worse than the guaranteed contiguous access patterns of a vector, but it's not completely scattered either.
My theory is that only code written in functional languages has complex properties you can actually test.
In imperative programs, you might have a few utils that are appropriate for property testing - things like to_title_case(str) - but the bulk of program logic can only be tested imperatively with extensive mocking.
I strongly disagree, but I think there are a few problems:
1. Lots of devs just don't know about it.
2. It's easy to do it wrong and try and reimplement the thing you're testing.
3. It's easy to try and redo all your testing like that and fail and then give up.
Here are some I've used:
Tabbing changes focus for UIs with more than one element. Shift-tabbing the same number of times takes you back to where you came from.
This one on TVs with u/d/l/r nav -> if pressing a direction changes focus, pressing the opposite direction takes you back to where you came from.
An extension of the last -> regardless of the set of API calls used to make the user interface, the same is true.
When finding text ABC in a larger string and getting back `Match(x, start, end)`, if I take the string and chop out string[start:end] then I get back exactly ABC. This failed because of a dotted I that, when lowercased during a normalisation step, resulted in two characters, so all the positions were shifted. Hypothesis found this and was able to give me a test like "find x in 'İx' -> fail". (A Hypothesis sketch of this one follows below.)
No input to the API should result in a 500 error. N PUT requests, where N > 0, result in one item created.
Clicking around the application should not result in a 404 page or error page.
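The find/Match one translates almost directly into Hypothesis. A minimal sketch (`find` here is a made-up stand-in; the real bug lived in a normalisation step that this toy version doesn't have):

from hypothesis import given, strategies as st

def find(needle, haystack):
    # Stand-in for the real search function under test.
    i = haystack.find(needle)
    return None if i == -1 else (i, i + len(needle))

@given(st.text(min_size=1), st.text(), st.text())
def test_found_span_is_the_needle(needle, prefix, suffix):
    haystack = prefix + needle + suffix
    span = find(needle, haystack)
    assert span is not None
    start, end = span
    assert haystack[start:end] == needle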
Overall I think there are lots of wider things you can check, because we should have UIs and tools that give simple rules and guarantees to users.
I think testing culture in general is suffering because the most popular styles/runtimes don’t support it easily.
Most apps (at least in my part of the world) these days are absolutely peppered with side effects. At work our code is mostly just endpoints that trigger tons of side effects, then return some glob of data returned from some of those effects. The joys of micro services!!
If you're designing from the ground up with testing in mind, you can make things super testable. Defer the actual execution of side effects. Group them together and move local biz logic to a pure function. But when you have a service that's just a 10,000 line tangle of reading and writing to queues, databases and other services, it's really hard to do ANY kind of testing.
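As a minimal sketch of what I mean by deferring side effects (made-up names, not our actual service code): the business logic returns a description of the effects, and a thin shell performs them.

# Pure core: easy to unit-test and property-test, no mocks needed.
def decide_actions(order, stock):
    """Given an order and current stock levels, return the effects to perform."""
    if stock.get(order["sku"], 0) >= order["qty"]:
        return [("reserve_stock", order["sku"], order["qty"]),
                ("enqueue_shipment", order["id"])]
    return [("enqueue_backorder", order["id"])]

# Imperative shell: the only place that actually touches queues and databases.
def handle_order(order, stock, effects):
    for name, *args in decide_actions(order, stock):
        effects[name](*args)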
I think that’s why unit testing and full on browser based E2E testing are popular these days. Unit testing pretends the complexity isn’t there via mocks, and lets you get high test coverage to pass your 85% coverage requirement. Then the E2E tests actually test user stories.
I’m really hoping there’s a shift. There are SO many interesting and comprehensive testing strategies available that can give you such high confidence in your program. But it mostly feels like an afterthought. My job has 90% coverage requirements, but not a single person writes useful tests. We have like 10,000 unit tests literally just mocking functions and then spying on the mocked return.
For anybody wanting to see a super duper interesting use of property based testing, check out “Breaking the Bank with test contract”, a talk by Allen Rohner. He pretty much uses property based testing to verify that mocks of services behave identically to the actual services (for the purpose of the program) so that you can develop and test against those mocks. I’ve started implementing a shitty version of this at work, and it’s wicked cool!!
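I won't do the talk justice, but the rough shape of the idea, as I understand it (a crude, made-up sketch, not what the talk actually builds):

from hypothesis import given, strategies as st

# Made-up "real" service and an in-memory mock of it.
class RealUserService:
    def __init__(self):
        self._users = {}
    def create_user(self, name):
        if not name.strip():
            return {"ok": False, "error": "empty name"}
        self._users[name] = True
        return {"ok": True, "name": name}

class MockUserService:
    def __init__(self):
        self._names = set()
    def create_user(self, name):
        if not name.strip():
            return {"ok": False, "error": "empty name"}
        self._names.add(name)
        return {"ok": True, "name": name}

# The contract property: for any input, the mock's observable behaviour
# matches the real service's, so tests written against the mock transfer.
@given(st.text(max_size=30))
def test_mock_agrees_with_real_service(name):
    assert MockUserService().create_user(name) == RealUserService().create_user(name)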
The Python survey data (https://lp.jetbrains.com/python-developers-survey-2024/) holds pretty consistently at 4% of Python users saying they use it, which isn't as large as I'd like, but given that only 64% of people in the survey say they use testing at all, it isn't doing too badly, and I think it certainly falsifies the claim that Python programs don't have properties you can test.
Thanks for sharing! Your article illustrates well the benefits of this approach.
One drawback I see is that property-based tests inevitably need to be much more complex than example-based ones. This means that bugs are much more likely, they're more difficult to maintain, etc. You do mention that it's a lot of code, but I wonder if the complexity is worth it in the long run. I suppose that since testing these scenarios any other way would be even more tedious and error-prone, the answer is "yes". But it's something to keep in mind.
> One drawback I see is that property-based tests inevitably need to be much more complex than example-based ones.
I don't think that's true, I just think the complexity is more explicit (in code) rather than implicit (in the process of coming up with examples). Example-based testing usually involves defining conditions and properties to be tested, then constructing sets of examples to test them which attempt to anticipate edge cases, either from the description of the requirements (black box) or from knowledge of how the code is implemented (white box).
Property-based testing involves defining the conditions and properties, writing code that generates the conditions, and, for each property, writing a bit of code that can refute it by passing if and only if it is true of the subject under test for a particular set of inputs.
With a library like Hypothesis which both has good generators for basic types and good abstractions for combining and adapting generators, the latter seems to be less complex overall, as well as moving the complexity into a form where it is explicit and easy to maintain/adapt, whereas adapting example-based tests to requirements changes involves either throwing out examples and starting over or revalidating and updating examples individually.
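To give a feel for what combining and adapting generators looks like in Hypothesis (a toy sketch; the User type and round-trip functions are made up):

from dataclasses import dataclass, asdict
from hypothesis import given, strategies as st

@dataclass
class User:
    name: str
    age: int
    tags: list

def serialize(user):
    return asdict(user)

def deserialize(data):
    return User(**data)

# Strategies compose: map/filter adapt existing generators, builds combines them.
names = st.text(min_size=1).map(str.strip).filter(bool)
users = st.builds(User,
                  name=names,
                  age=st.integers(min_value=0, max_value=120),
                  tags=st.lists(st.sampled_from(["admin", "beta", "trial"])))

@given(users)
def test_serialization_round_trips(user):
    assert deserialize(serialize(user)) == user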
> Property-based testing involves defining the conditions and properties, writing code that generates the conditions, and, for each property, writing a bit of code that can refute it by passing if and only if it is true of the subject under test for a particular set of inputs.
You're downplaying the amount of code required to properly set up a property-based test. In the linked article, the author implemented a state machine to accurately model the SUT. While this is not the most complex of systems, it is far from trivial, and certainly not a "bit of code". In my many years of example-based unit/integration/E2E testing, I've never had to implement something like that. The author admits that the team was reluctant to adopt PBT partly because of the amount of code.
This isn't to say that example-based tests are simple. There can be a lot of setup, mocking, stubbing, and helper code to support the test, but this is usually a smell that something is not right. Whereas with PBT it seems inevitable in some situations.
But then again, I can see how such tests can be invaluable, and how the same coverage would be very difficult and likely more complex to achieve otherwise. So, as with many things, it's a tradeoff. I think PBT doesn't replace EBT, nor vice versa, but they complement each other.
My experience is that PBT tests are mostly hard in devising the generators, not in the testing itself.
Since it came up in another thread (yes, it's trivial), a function `add` is no easier or harder to test with examples than with PBT. Here are some of the tests in both PBT and example-based style:
@given(st.integers())
def test_left_identity_pbt(a):
    assert add(a, 0) == a

def test_left_identity():
    assert add(10, 0) == 10

@given(st.integers(), st.integers())
def test_commutative_pbt(a, b):
    assert add(a, b) == add(b, a)

@pytest.mark.parametrize("a,b", examples)
def test_commutative(a, b):
    assert add(a, b) == add(b, a)
They're the same test, but one is more comprehensive than the other. And you can use them together. Supposing you do find an error, you add it to your example-based tests to build out your regression test suite. This is how I try to get people into PBT in the first place: just take your existing example-based tests and build a generator. If they start failing, that means your examples weren't sufficiently comprehensive (not surprising). Because PBT systems like Hypothesis run so many tests, though, you may need to either restrict the number of generated examples for performance reasons or break up complex tests into a set of smaller, but faster running, tests to get the benefit.
Other things become much simpler, or at least simpler to test comprehensively, like stateful and end-to-end tests (assuming you have a way to programmatically control your system). Real-world, I used Hypothesis to drive an application by sending a series of commands/queries and seeing how it behaved. There are so many possible sequences that manually developing a useful set of end-to-end tests is non-trivial. However, with Hypothesis it just generated sequences of interactions for me and found errors in the system. After each command (which may or may not change the application state) it issued queries in the invariant checks and verified the results against the model. Like with example-based testing, these can be turned into hard-coded examples in your regression test suite.
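For anyone curious, the shape of a stateful test in Hypothesis is roughly this (a minimal sketch; the Counter is a made-up stand-in for the real application, and the model is just an integer):

from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

class Counter:
    """Toy system under test."""
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
    def reset(self):
        self.value = 0

class CounterMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.sut = Counter()
        self.model = 0  # expected state

    @rule(n=st.integers(min_value=0, max_value=1000))
    def add(self, n):
        self.sut.add(n)
        self.model += n

    @rule()
    def reset(self):
        self.sut.reset()
        self.model = 0

    @invariant()
    def agrees_with_model(self):
        assert self.sut.value == self.model

# Hypothesis generates sequences of add/reset calls and checks the invariant
# after every step; picked up by pytest/unittest via the generated TestCase.
TestCounter = CounterMachine.TestCase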
> Since it came up in another thread (yes, it's trivial), a function `add` is no easier or harder to test with examples than with PBT
Come on, that example is practically useless for comparing both approaches.
Take a look at the article linked above. The amount of non-trivial code required to set up a PBT should raise an eyebrow, at the very least.
It's quite possible that the value of such a test outweighs the complexity overhead, and that implementing all the test variations with EBT would be infeasible, but choosing one strategy over the other should be a conscious decision made by the team.
So as much as you're painting PBT in a positive light, I don't see it that clearly. I think that PBT covers certain scenarios better than EBT, while EBT can be sufficient for a wide variety of tests, and be simpler overall.
But again, I haven't actually written PBTs myself. I'm just going by the docs and articles mentioned here.
> Come on, that example is practically useless for comparing both approaches.
Come on, I admitted it was trivial. It was a quick example that fit into a comment block. Did you expect a dissertation?
> that implementing all the test variations with EBT would be infeasible
That's kind of the point to my previous comment. PBTs will generate many more examples than you would create by hand. If you have EBTs already, you're one step away from PBTs (the generators, I never said this was trivial just to preempt another annoying "Come on"). And then you'll have more comprehensive testing than you would have had sticking to just your carefully handcrafted examples. This isn't the end of property-based testing, but it's a really good start and the easiest way to bring it into an existing project because you can mostly reuse the existing tests.
Extending this, once you get used to it, to stateful testing (which many PBT libraries support, including Hypothesis) you can generate a lot of very useful end-to-end tests that would be even harder to come up with by hand. And again, if you have any example-based end-to-end tests or integration tests, you need to come up with generators and you can start converting them into property-based tests.
> but choosing one strategy over the other should be a conscious decision made by the team.
Ok. What prompted this? I never said otherwise. It's also not an either/or situation, which you seem to want to make it. As I wrote in that previous comment, you can use both and use the property-based tests to bolster the example-based tests, and convert counterexamples into more example-based tests for your regression suite.
Probably because it's difficult to build simplistic intuition around them in languages that don't have robust built-in mechanisms that enable that kind of thinking.
For example, gen-testing in Clojure feels simple and intuitive, but if I have to do something similar, hmmm... I dunno, say in Java, I wouldn't even know where to start - the code inevitably would be too verbose; mutable state would obscure invariants; and the required boilerplate alone would cloud the core idea. While in Clojure, immutability-by-default feels transformative for gen-testing; data generation is first-class - generators are just values or functions; composition is natural; debugging is easier; invariants are verifiable; the FP mindset is already baked in, making it easier to reason about stateful sequences/event streams; and there's far less ceremony overall.
But if I'm forced to write Java and see a good case for prop-based testing, I'd still probably try doing it. But only because I already acquired some intuition for it. That's why it is immensely rewarding to spend a year or more learning an FP language, even if you don't see it used at work or every day.
So, the honest answer to your question would probably be: "because FP languages are not more popular"
I think the core problem with property-based testing is that the property/specification needs to be quite simple compared to the implementation.
I did some property-based testing in Haskell and in some cases the implementation was the specification verbatim. So what properties should I test?
It was clearer where my function should be symmetric in its arguments or where there is a neutral element, etc.
If the property is basically your specification, which (as the language is very expressive) is basically your implementation, then you're just going in circles.
Yeah, reimplementing the solution just to have something to check against is a bad idea.
I find that most tutorials talk about "properties of function `foo`", whereas I prefer to think about "how is function `foo` related to other functions". Those relationships can be expressed as code, by plugging outputs of one function into arguments of another, or by sequencing calls in a particular order, etc. and ultimately making assertions. However, there will usually be gaps; filling in those gaps is what a property's inputs are for.
Another good source of properties is trying to think of ways to change an expression/block which are irrelevant. For example, when we perform a deletion, any edits made beforehand should be irrelevant; boom, that's a property. If something would filter out negative values, then it's a property that sprinkling negative values all over the place has no effect. And so on.
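The negative-values one, as a minimal Hypothesis sketch (`total_of_positives` is a made-up example of "something that filters out negative values"):

from hypothesis import given, strategies as st

def total_of_positives(xs):
    return sum(x for x in xs if x >= 0)

# Adding extra negative values to the input must not change the result.
@given(st.lists(st.integers()),
       st.lists(st.integers(max_value=-1)))
def test_negatives_are_irrelevant(xs, noise):
    assert total_of_positives(xs + noise) == total_of_positives(xs)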
That's not exactly for slowness reasons. The creators of uv have stated that if pip followed the same algorithms they'd see similar performance benefits. People greatly overstate the Python performance penalty.
... Actually, could you show me where they said it? I've been explaining it from first principles for a while now and it would be nice to be able to go "but don't take my word for it" as well.
(Python does incur a hefty performance penalty for things that are actually CPU bound. But that doesn't describe most of the process of installing Python packages; and the main part that is CPU bound is implemented by CPython in C.)
Python is slow due to design decisions in the language. For example, operator dispatch is slow without some kind of static analysis, and that analysis is hindered by how dynamic the language is.
It's hard to make Python run fast when it pervasively uses duck typing. It makes types only resolvable at runtime. JIT is the only thing that can work here at the moment, but I think that needs to make very similar assumptions to a branch predictor, plus it needs to identify lexical regions (is that what they're called?). People here have criticised PyPy, but I've forgotten why.
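A toy illustration of the dispatch problem (observable behaviour only, nothing about CPython internals): the same `+` in the same function body takes completely different paths depending on the runtime types, so without analysis or JIT assumptions there is no single fast path to compile.

class Metres:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        if isinstance(other, Metres):
            return Metres(self.n + other.n)
        return NotImplemented

def add(a, b):
    # Nothing about a and b is known until run time: this one expression may
    # resolve to int.__add__, str.__add__, Metres.__add__, or type(b).__radd__.
    return a + b

print(add(2, 3))                    # 5       (int path)
print(add("ab", "cd"))              # 'abcd'  (str path)
print(add(Metres(2), Metres(3)).n)  # 5       (user-defined path)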
When DuckDB queries across multiple sources (say, Postgres and a CSV) does it first load all data into DuckDB or is it smart enough to only pull minimal data needed for the query on the fly?
"DuckDB offers robust capabilities for querying data stored partially on S3, particularly when dealing with Parquet files. This is achieved through several optimization techniques:
Predicate Pushdown: When you apply a WHERE clause to filter data, DuckDB can "push down" this filter directly into the Parquet file scan. If the Parquet file contains zonemaps (metadata about value ranges within columns), DuckDB can use this information to skip reading entire sections of the file that do not contain relevant data, significantly reducing the amount of data transferred from S3.
Projection Pushdown: When you select only specific columns in your SELECT statement, DuckDB automatically reads only those required columns from the Parquet file. This means you avoid downloading and processing unnecessary data, leading to faster queries and reduced S3 transfer costs.
HTTP Range Reads: DuckDB leverages HTTP range headers when interacting with S3 (or other object storage supporting range reads). This allows it to fetch only the necessary parts of the Parquet file, such as metadata or specific column chunks, rather than downloading the entire file."
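From the Python API that looks roughly like this (a minimal sketch; the bucket, paths and column names are made up, and credential/region configuration is omitted):

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # remote (S3/HTTP) file support
con.execute("LOAD httpfs")

# Only the selected columns (projection pushdown) and the row groups whose
# metadata can match the filter (predicate pushdown) are fetched, using
# ranged reads rather than downloading whole files.
rows = con.execute("""
    SELECT customer_id, amount
    FROM read_parquet('s3://example-bucket/orders/*.parquet')
    WHERE order_date >= DATE '2024-01-01'
""").fetchall()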
Type checks are proofs across all inputs. Tests are proofs across a given set of inputs. They are complementary. If you want really robust code, use both.
I've generally found the converse to be true as well: once you have decent type modeling, tests don't find very much, if anything. I've primarily only seen modeling done well in Scala though.
Lots, but it's Pareto principle all over again. Those who wanted a sweeter Java were the majority of users. They have no real reason to switch to Scala 3 when Java 25 has 20% of Scala features that provide 80% of the benefits.