JSON Patch is a bizarre Frankenstein's monster made of the cognitive dissonance of REST aficionados.
JSON Patch is not REST. It is not representational state. But it's also not the way any sane person has ever done RPC over HTTP. Typically RPCs accomplish mutation by just supplying whatever is the most ergonomic at the time; if you want to be able to rename an entity, you implement a /rename endpoint, etc.
In Javascript, there was always a normal, obvious way to handle Patch, which was to leverage null vs. undefined properties in javascript. However, in statically typed languages, people wanted to deserialize data into static types that don't support the concept of "undefined." This was the beginning of the end, because deserializing into a static type is actually a bug for an API that has an evolving schema. No longer can you distinguish between a null and undefined property if a client omitted it.
So instead we end up with yet another preposterous proposed "standard" of JSON Patch, a completely insane and difficult to read instruction-set-over-JSON that is truly a hodgepodge.
Remember, mutations like this need to be assembled by the client. That means not only the code but also the user experience. The classic REST-like ways of doing these mutations originate in the "for free" usage of the HTML form element. That's gone, now you need some insane JsonPatcher library, and you need to somehow manipulate your UI around it.
> However, in statically typed languages, people wanted to deserialize data into static types that don't support the concept of "undefined." This was the beginning of the end, because deserializing into a static type is actually a bug for an API that has an evolving schema
No, that's just a bug in your type.
    data FieldPatch a
      = Value a                  -- a value that unmarshals to type `a` was given; apply the patch
      | SetToNull                -- null was given; patch the field to null
      | Undefined                -- field was not set; do not patch
      | UnexpectedValue RawJSON  -- optional; allows manual handling of unexpected types
You can do this in things like Java too with interfaces if you can tolerate openness, or if you want closed types like in the example, through the visitor pattern.
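For anyone who wants the same idea outside Haskell, here is a minimal Python sketch of a typed patch that still distinguishes "omitted" from "null". The `UserPatch` type, its fields, and the `UNSET` sentinel are hypothetical names made up for illustration; nothing here comes from a real library beyond the standard `json` and `dataclasses` modules.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


class _Unset:
    """Sentinel type meaning 'the client did not send this field'."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


@dataclass
class UserPatch:
    # Hypothetical fields; each defaults to UNSET rather than None,
    # so an omitted key and an explicit JSON null stay distinguishable.
    name: Any = UNSET
    nickname: Any = UNSET

    @classmethod
    def from_json(cls, body: str) -> "UserPatch":
        raw = json.loads(body)
        known = {f.name for f in fields(cls)}
        # Keys absent from the body keep the UNSET default; JSON null becomes None.
        return cls(**{k: v for k, v in raw.items() if k in known})


def apply_user_patch(user: dict, patch: UserPatch) -> dict:
    """Return a copy of `user` with only the fields the client actually sent."""
    updated = dict(user)
    for f in fields(patch):
        value = getattr(patch, f.name)
        if value is not UNSET:
            updated[f.name] = value  # this includes an explicit None
    return updated
```

With this shape, a body of `{"nickname": null}` clears the nickname and leaves `name` alone, while `{}` changes nothing.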
I think your comment is funny, I love a good rant.
I don't think it's as terrible as you describe. But I see your point. We can already do this kind of thing in many different ways that seem preferable over this.
> JSON Patch is not REST. It is not representational state.
This is a perfectly reasonable take, although it fits well in systems otherwise generally more inclined to use HTTP semantics. I know that’s not REST, but it’s certainly easier to reason about and clearly document (eg OpenAPI) than its RPC analogues.
> But it's also not the way any sane person has ever done RPC over HTTP.
Why not?
Aside: I’ve also used it as a format for internal calls and change tracking. It’s surely not ideal, but it was already in the service and trivial to repurpose. Turned out to be an extremely lucky decision, as it also became a crucial tool for forensic analysis in a major recovery effort.
> I work in an industry of idiots.
You could just… not be a jerk. You have that option.
Why is it problematic to use a JsonPatcher library over some browser specific feature? This is like refusing to use screws because you own a hammer and don't want to buy a drill.
I had to go double check some words because of your comment, you are correct though. Turns out 'screwdriver' can refer to manual or powered, where I had been using 'screwdriver' to mean manual and 'drill' was pulling double duty as both powered screwdriver and drills of all sorts. For anyone else curious, while they are frequently used interchangeably, the difference is that an electric screwdriver goes slower to give you better precision.
> In Javascript, there was always a normal, obvious way to handle Patch, which was to leverage null vs. undefined properties in javascript
Yeah, no, patch as you describe it is not a patch. Given this:
{field: [1,2,3]}
And given that two clients want to each add 4 and 5 respectively to the document, one will overwrite the other, which breaks the very principle of a patch since you end up with either [1,2,3,4] or [1,2,3,5] instead of [1,2,3,4,5] (or [1,2,3,5,4] depending on who was faster).
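To make the lost update concrete, here is a small Python sketch. It simulates the server as a plain dict and the two clients as sequential calls; the function names are made up for illustration, and real concurrency is not modelled.

```python
def overwrite_field(server_doc, new_value):
    """Whole-value update: the client sends the complete new list."""
    server_doc["field"] = list(new_value)


def append_to_field(server_doc, item):
    """Operation-style update: the client only says what to add."""
    server_doc["field"].append(item)


def simulate_overwrites():
    server_doc = {"field": [1, 2, 3]}
    # Both clients read the same state before either writes.
    seen_by_a = list(server_doc["field"])
    seen_by_b = list(server_doc["field"])
    overwrite_field(server_doc, seen_by_a + [4])
    overwrite_field(server_doc, seen_by_b + [5])  # silently discards A's 4
    return server_doc["field"]


def simulate_appends():
    server_doc = {"field": [1, 2, 3]}
    append_to_field(server_doc, 4)
    append_to_field(server_doc, 5)
    return server_doc["field"]
```

Here `simulate_overwrites()` ends with `[1, 2, 3, 5]` and `simulate_appends()` ends with `[1, 2, 3, 4, 5]`.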
Then you'd have to implement a version or hash field on the object and verify it before updating and fail on mis-match and handle the version mis-match error on the client and re-retrieve and update again.
GET: {version: 1, field: [1,2,3]}
PATCH: {version: 1, field: [1,2,3,4]} # succeeds
PATCH: {version: 1, field: [1,2,3,5]} # fails with HTTP 409 response
# Client displays some sort of warning that the update failed, GETs the object again, merges the new state from the server with its own state, and can retry the update
GET: {version: 2, field: [1,2,3,4]}
PATCH: {version: 2, field: [1,2,3,4,5]} # succeeds
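The exchange above can be sketched in Python roughly like this. The `Store` class is a hypothetical in-memory stand-in for the server, and `VersionConflict` stands in for the HTTP 409 response; it is a sketch of the version check only, not of any particular framework.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 response."""


class Store:
    """Hypothetical in-memory stand-in for the server-side resource."""

    def __init__(self, field):
        self.version = 1
        self.field = list(field)

    def get(self):
        return {"version": self.version, "field": list(self.field)}

    def patch(self, body):
        # Reject the write if the client based it on a stale version.
        if body["version"] != self.version:
            raise VersionConflict(
                f"expected version {self.version}, got {body['version']}"
            )
        self.field = list(body["field"])
        self.version += 1


def append_with_retry(store, item, max_attempts=3):
    """GET, modify, PATCH; on conflict, re-GET and try again."""
    for _ in range(max_attempts):
        current = store.get()
        try:
            store.patch(
                {"version": current["version"], "field": current["field"] + [item]}
            )
            return store.get()
        except VersionConflict:
            continue
    raise VersionConflict("gave up after repeated conflicts")
```

The retry loop is the part the client has to own: every conflict costs an extra GET before the next attempt.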
Sure, though I'd prefer to use If-Unmodified-Since or If-Match, as those are standard and caches support them properly. Another issue: how do you support both removing `field` from the document and setting it to null? Sending `field: null` will only work for one of those cases.
Likewise, the moment you start using this architecture is the moment you fail to scale. You're putting unnecessary load on your backend by forcing the client to refetch the resource in order to make another update, not to mention the scenario where a lot of clients are trying to add items to field.
> GETs the object again and merges the new state from the server
At some critical point all those clients will start blocking each other as they constantly make PATCHes that fail and re-GET only to fail again because another client managed to make their PATCH before them. The problem is made worse by network latency and app code.
Patches that work with changesets will support a much higher concurrent write load, as the concurrency/serialization is done server side and the changesets will be small rather than the entire value. Under load, individual requests may take longer to complete, but they will all ultimately succeed. Concurrency conflicts may still occur, but they will be a couple of orders of magnitude less frequent.
Depending on the type of update, eg. adding items to a list, this is what you want. Removing / updating values otoh may be classed as something you'd prefer to have constraint check via If- headers.
That is a problem with representation and not a real problem. If you have multiple rows in a db, and want to update a row, do you send all rows or describe precisely how to update the rows? No, you send an update and a predicate on which rows to update.
That's why I tend to prefer using PUT for idempotent updates... at least the semantics are (should be?) a little more clear. But if you really want to avoid users stomping on each other's changes you basically have to implement concurrency checks throughout your whole application.
Nope, they're very different things. Database transactions protect you against data corruption if two commands try to modify the same row at the same time. Concurrency checks (which are arguably poorly named) prevent changes from being overwritten because someone attempts an update based on stale data.
Imagine that you're using an app to manage customer information. For whatever reason you need to update Jane Doe. You pull up her records and start to make the change, but then realize you're late for a meeting. Afterwards, you finish your changes but you notice that someone accidentally entered her name as lowercase ("doe"), so you correct it and save. Cool! Except it isn't cool, because while you were in your meeting she called in to change her name because she got married and is now Jane Smith. You just reverted her back to Jane Doe. If the app included concurrency checks, it would've used something like the SQL Server rowversion column type or the Postgres xmin system column to detect that Jane's data had been modified after you loaded her user record, and you'd have gotten an error prompting you to refresh the page and make your changes again.
It's one of those things that happens a lot more often than you might think in the real world, but yet it's rare enough (or not considered a major problem) so that many apps get away with ignoring it entirely.
Thanks for clarifying, so this isn't about concurrency but rather handling updates based on stale data.
I think this really should be up to the client to decide what is required, since there will be cases where the 'nonconcurrent' behavior is in fact expected. In terms of HTTP that would be by using the If-Unmodified-Since header, but yes, it requires server side checks although one can be clever and use a common updated field on all resources.
Typically you want concurrency checks to be based on something managed by the database (rowversion and xmin being the go to options) rather than timestamps, because timestamps can lie. IME at least, last modified timestamps are usually more like a record of "has this record logically changed" rather than a literal "has the row been modified," so it's not unusual to have things like internal data operations that modify a row without setting the updated timestamp. The ETag header [0] is meant for carrying this info, but TBH I don't think I've ever actually seen it used. Most of the time resources that need concurrency checks end up carrying around a version property or something similar.
The back end really has to make the determination of what resources need protection from concurrent updates and enforce it, because of the above and because if you're going to let the client decide whether or not to participate in concurrency checks you really end up with no protection at all.
I actually have a need for something like this. I wrote a single-user flashcard app which supports multi-device sync. The way I sync is having each device write to its own journal file a list of commands, such as "create card" or "update card with ID 1234" or "insert card evaluation". To "sync", each device just pushes up its journal to a common file store and pulls down each other device's journals. Then it just plays back all the edits in timestamp order. (It works fine for single-user since you're very unlikely to be using two devices at the exact same time.)
Anyway, I never did end up implementing an optimal "update card with ID 1234". If you edit a flashcard's front text, instead of updating the front text field alone, I ended up just writing out an entirely new copy of the flashcard in the journal (with the ID 1234 encoded). Super inefficient.
I was toying with the idea of coming up with a simpler format that does basically what this JSON Patch does, where you can just encode "update object 1234's field foo.bar.baz to string ABC".
I figured I'd only need a handful of operations, to insert a new object, replace a node in an object's property tree given a key path, delete an object, append to an array property, etc.
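A rough Python sketch of what such a journal could look like, purely as an illustration: the entry shape (`ts`, `op`, `id`, `path`, `value`, `card`) and the dotted key paths are assumptions invented here, not an existing format.

```python
import copy


def _walk_to_parent(obj, key_path):
    """Follow a dotted path like 'foo.bar.baz', creating dicts as needed."""
    keys = key_path.split(".")
    node = obj
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    return node, keys[-1]


def replay(journals):
    """Merge every device's journal and apply the edits in timestamp order."""
    entries = sorted(
        (entry for journal in journals for entry in journal),
        key=lambda entry: entry["ts"],
    )
    cards = {}
    for entry in entries:
        op, card_id = entry["op"], entry["id"]
        if op == "create":
            cards[card_id] = copy.deepcopy(entry["card"])
        elif op == "delete":
            cards.pop(card_id, None)
        elif op in ("set", "append"):
            card = cards.get(card_id)
            if card is None:
                continue  # edit targets a card that is unknown or already deleted
            parent, last_key = _walk_to_parent(card, entry["path"])
            if op == "set":
                parent[last_key] = entry["value"]
            else:
                parent.setdefault(last_key, []).append(entry["value"])
        else:
            raise ValueError(f"unknown journal op: {op!r}")
    return cards
```

An edit to the front text is then a single small `set` entry instead of a full copy of the card.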
If anyone has solutions other than JSON Patch, I'd love to hear about them.
You should also check out Yjs/Yrs or Automerge, which implement merging data structures using CRDTs. Most people think of them as useful for collaborative real-time editing by multiple users, but they are just as good at offline merges. They are perfect for situations like the one you describe, where you have one user with multiple devices.
Apple Notes uses CRDTs internally for exactly this.
Last year I experimented with an app architecture that used CouchDB/PouchDB for synchronising data for a single-user, multi-device app, then used Yjs to merge the conflicting edits. It worked incredibly well.
If I had the time I would love to build a Yjs/CRDT native CouchDB like database that could use the Yjs state vectors as a wire protocol for syncing…
If I have a little free time someday, I'm going to spend a day polishing what I was building and open-source it. It was a kind of investigation into a few different ideas; the synced data store was the more interesting thing that came out of it.
This is the very rough code behind the PouchDB/Yjs datastore. Effectively each Pouch/Couch document is actually "managed" by Yjs, with all changes/operations going through it. It then saves the binary Yjs blob as an attachment on the Pouch document, with the current Yjs state exported as JSON for the main Pouch document. This gives you all the indexing/search you get with Pouch/Couch but with automatic merging of conflicting edits.
Ultimately though I don't think PouchDB is a good platform for this, building something that is native Yjs would be much better. If anyone is interested I would love to hear from them though!
I'm also interested in following updates to your approach here.
Something that stands out immediately to me is that reliance on binary attachments. In my own CouchDB ecosystem work, binary attachments have turned out to be just about the worst part of the ecosystem. PouchDB stores them pretty reliably, but every other CouchDB ecosystem database (Couchbase, Cloudant), including different versions of CouchDB itself (1.x is different from 2.x is different from 3.x in all sorts of ways), has very different behavior when synchronizing attachments, the allowed size of attachments, the allowed types of attachments, and the allowed characters in attachment names. In general the sync protocol itself is prone to failures/timeouts with large attachments that are tough to work around because they break in the middle of replications. The number of times I've had to delete an attachment that PouchDB stored just fine to get a sync operation to complete with another server has been way too many already.
I've had to build bespoke attachment sync tools because I haven't been able to rely on attachments working in the CouchDB ecosystem.
Probably more evidence to back up your gut instinct that a native Yjs-oriented store may be better in the long term.
(Admittedly, too, I've been thinking that I need to replace the CouchDB ecosystem as a whole. PouchDB is great, but the flux I've seen in the Apache CouchDB project and the issues I've had with the managed service providers especially Cloudant after IBM makes it really hard to recommend the ecosystem. Overall it seems unhealthy/in-decline, which is sad when the core sync infrastructure seems so nice to work with when it works.)
There is a diff and patch API described here - the underlying database (TerminusDB) also has the concept of merging records, which could fit to your use case: https://terminusdb.com/docs/index/json-diff-and-patch
I feel like nearly every team which has worked with JSON for a certain number of projects automatically starts building out their own "spec" for how to handle additions/deletions/updations to JSON instances on-the-fly.
I've built at least 3 different specs for what could be described as JSON Patch for different projects, and each has slightly different semantics. E.g., in one implementation (the approach someone suggested in another comment), instead of explicitly passing an "operation", I just pass "key": null and it gets interpreted as delete everything below this key. In another, "key": null and "key" not existing are treated as different cases, etc.
TBH I'm glad that there's some form of standardization for this, but I wish it was more publicized so folks would stop building their own specs for this.
Am I the weird one out? I've always just built plain old CRUD applications and never really had to PATCH JSON. I have some alternative ideas, like just sending the lastUpdated value and IDs to the back-end and getting back any records that are more up to date, but I wouldn't even bother diffing it unless there's a good strategy for that, aside from caching it or some fancy database feature I've never heard of.
Genuinely curious how you troubleshoot JSON Patch; it feels like something that adds more mental overhead than it would be worth if it breaks. I understand how it would work in practice from a front-end perspective, I'm unsure about the backend though.
My issue is, if I know what's changed I'd rather just send the IDs and values that changed to the front-end at that stage, or just those records respectively.
Patch and variations are popular in software such as Kubernetes, where multiple actors are writing to the same field sets. In distributed or highly concurrent environments it's quite likely to get a conflict on update, but in those situations where actors have interest in different subsets of the data, touching only those fields you "own" can avoid the read-update-conflict retry cycle.
It is sort of fun to watch the cycle where json replaced xml mostly because it was simpler. Then to see things that look like xslt, xpath, etc, come back around.
That's true, but XML tends to map rather poorly to basic data types in common programming languages. With JSON you usually get to work with native objects or map structures; with XML, you're stuck with either tree or event APIs, which are a lot more complex and add another layer of indirection.
The tragic flaw of XML was that there was no xml.parse equivalent that could take a generic XML document and map it to actually useful types. I guess you could get a Tree<XMLNode> but that's not really all that useful when people want arrays, numbers, hashes, and strings. XML could have been amazing with a common schema for "basic types" with a clear "next step" for structs.
XML is too flexible for its own good. It's not hard to come up with a simple XML schema for basic data that's functionally equivalent to json, but there are probably a dozen different ways that you could structure the schema to achieve the same result. And of course with XML being XML, most people trying to make that "basic type" schema would probably end up throwing in extra metadata, events, or things of that nature.
I agree. It's one of those "why am I paying for all these features no one uses" situations. And then everyone poorly reimplements the 10-15% of features they were using. And there are no standards for those 10-15%.
Meanwhile, I'm actually not aware of any XML patch standard.
Yup. We implemented our own thing a few companies ago. It wasn't great but we didn't know about JSON Patch back then. It would have saved us a lot of time and pain.
I know the first example in the docs is intentionally short and simple but it did make me laugh that the patch was at least double the byte count of the thing it was patching.
Yeah, though I could see how this would be much more useful where the data is much larger. My only reservation is that crafting the diffs on the back-end would confuse me, both implementing it and troubleshooting it.
The impedance mismatch is that JSON is a fantastic data structure format but a poor code language. Sexp's are a poor data structure format (briefly, lacking a dictionary format and mixing function invocation with lists) but a fantastic code language.
But now you need an s-expression parser (OK, so a very simple thing to do) and a JSON parser. I don't like JSON Patch, but at least it's dogfooding its parser and not inventing yet another text representation.
Most of the time the patch documents are not being written by a human anyway.
I will happily follow you, once you have achieved your goal of unifying all languages and markup in this enlightening new future. I look forward to being issued with a lisp machine.
Given the number of security holes we ended up with in JavaScript and HTML, I think possibly a strong-ish case could even be made that reducing it all to LISP would have made the world objectively worse.
"One hammer, and everything is a nail" sounds like a smorgasbord for malicious actors in a world where we know how often SQL injection and HTML injection proved to be common threat vectors (both being "trick the system into interpreting symbols that look the same in a different way" types of attacks). Imagine if all layers of the HTTP story had been the same language. Yikes.
Honest question, why aren't s-expressions in their purest form more prevalent?
This seems to happen constantly on HN:
Post: "Behold, a data format!"
Commenter: "This is just poorly implemented s-expressions!"
I know some heavily idolized LISP person out there made the over-quoted statement that all languages end up implementing s-expressions badly.
I did LISP in college. I had my moment of transcendence with it. So I get the fandom for it. But I don't get why people keep paraphrasing this one sentence over and over, yet absolutely no one uses s-expressions outside of LISP. The one person I know who uses LISP in the private sector (via Clojure which many people refuse to recognize as a LISP for some reason?) is being hired to do very academia type research and publish white papers about it. The rest of LISP users are quirky members of academia who complain endlessly about Python's prevalence.
It just kinda seems like if LISP or s-expressions were truly that great, there would be more convergence around it for expressing data structures.
Generally, any time you have a pairing of an "op" to a variadic list of arguments in a transfer format, you have an s-expression. This method, like many others, introduces a bunch of new characters:
- "op", "path", "value"
- quotes around paths
- colons, commas
- curly braces
In exchange for parentheses.
More generally though, we have an isomorphism. If transformation between the forms is trivial, then use what form is more comfortable for you, but I think being able to identify the 'op' -> 'var-args' pairing is a good thing to keep an eye on. It's macros' bread and butter.
Agreed. The patch key-value syntax is self-describing and extensible. The s-expression syntax's reliance on positional inputs is less self-describing and extension can only be done at the end of the s-expressions.
And the s-expressions don't even solve the problem of terseness for wire-format reasons; protobufs beat them out in that category.
Positional syntaxes are extensible: you can add new optional arguments at the tail end, such that when those are missing, the old behavior is provided.
The JSON could use positional syntax, since JSON has [ ] arrays, and that would reduce its clutter.
The S-exp could use keyword arguments; it would still be nicer. Note that if we do so, we still want the operation to be positionally determined; i.e. to be the first element.
> you can add new optional arguments at the tail end
I never claimed they weren't extensible. What you're describing as a feature I'm declaring a bug. Named syntaxes self-describe in a way positional syntaxes don't.
(... he says as his IDE helpfully kicks in to tell him what each of the twelve arguments to this auto-generated Java object constructor are in his source code... ;) ).
> JSON becomes even more verbose to achieve something like this
It doesn't because it has key-value pairs as a top-level construct via the object type. LISP represents those by creating positional local dependencies between positional arguments (which is a fine solution, but at that point they're equivalently verbose and if I go with the JSON approach I can take advantage of the JSON libraries already existing in the client and server of my web applications).
In an alternate universe where JavaScript had been just LISP, your approach would be great, but that ship sailed decades ago.
Yes you did because you claimed that it was an advantage of the original. The only purpose of such a remark would be to point out a contrast: extensibility is lost in Y, so I'm pointing out that X has it.
> Named syntaxes self-describe
That's nice but it doesn't connect with the extensibility point.
Named can make it easy to know which argument is serving which role, if you already understand the function and its arguments, where you would otherwise have to remember the argument positions: like which of two identically typed arguments is the destination of the data transfer, the other being source.
> It doesn't because it has key-value pairs as a top-level construct via the object type.
Yes, but this object has no order; there is no way to indicate the first element so you have to use a keyword to indicate the operator too. I believe I communicated this with the example.
> Yes you did because you claimed that it was an advantage of the original. The only purpose of such a remark would be to point out a contrast: extensibility is lost in Y, so I'm pointing out that X has it.
My original claim was "The s-expression syntax's reliance on positional inputs is less self-describing and extension can only be done at the end of the s-expressions." You "solved" the issue of "positional inputs... less self-describing" by doubling the number of symbols in your s-expression example so that (with the exception of the first input) they are no longer positional. That doesn't actually solve the problem; it just demonstrates you can do named arguments in s-expressions too. Yes, of course you can; but why bother when my browser already supports JSON?
> That's nice but it doesn't connect with the extensibility point
Correct; they are two separate points. Positional inputs are less self-describing and also extension can only be done at the end of the s-expression. Two criticisms.
> there is no way to indicate the first element so you have to use a keyword to indicate the operator too
Technically, this is untrue; order doesn't matter since the operator can simply be any key that isn't already reserved. So `"add": true` would work to indicate an add stanza.
But I would not recommend this approach. I'd just instead do what the RFC does and specify the operator as another argument: `"op": "add"`. The advantage to this approach (unlike yours) is that, indeed, order does not matter; I do not need the operator name to be the first thing in the list.
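For reference, the `"op": "add"` style can be applied with very little code. Below is a minimal Python sketch handling only `add`, `replace` and `remove`, with JSON Pointer paths. It is an illustration under those assumptions, not a complete or validated RFC 6902 implementation (no `move`, `copy`, `test`, no whole-document path, and only basic error checks).

```python
import copy


def _resolve_parent(doc, pointer):
    """Return (container, final_token) for a JSON Pointer such as '/a/b/0'."""
    if not pointer.startswith("/"):
        raise ValueError("this sketch only handles pointers that start with '/'")
    # JSON Pointer escapes: '~1' means '/', '~0' means '~' (decoded in that order).
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node, tokens[-1]


def apply_json_patch(doc, operations):
    """Apply add/replace/remove operations to a deep copy of `doc`."""
    result = copy.deepcopy(doc)
    for operation in operations:
        op = operation["op"]  # looked up by name, so key order is irrelevant
        if op not in ("add", "replace", "remove"):
            raise ValueError(f"unsupported op in this sketch: {op!r}")
        parent, token = _resolve_parent(result, operation["path"])
        if isinstance(parent, list):
            # '-' refers to the position just past the end of the array.
            index = len(parent) if token == "-" else int(token)
            if op == "add":
                if index > len(parent):
                    raise IndexError("array index out of range")
                parent.insert(index, operation["value"])
            elif op == "replace":
                parent[index] = operation["value"]
            else:
                del parent[index]
        else:
            if op == "add":
                parent[token] = operation["value"]
            elif op == "replace":
                if token not in parent:
                    raise KeyError(token)
                parent[token] = operation["value"]
            else:
                del parent[token]
    return result
```

Because the operator is read by key, `{"value": 4, "path": "/field/-", "op": "add"}` behaves the same as the conventional ordering.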
The two approaches in the named-argument structure are just about equivalent, except that the JSON approach has the advantage that browsers already have built-in JSON support (which gives that format a huge practical advantage over introducing a novel s-expression format in this problem domain, unfortunately).
"add" : true is not required to print first in the { ... } object syntax, which is an impediment to readability.
Actually, named arguments as such are an impediment to readability when you're already familiar with the API.
This is because they allow the order to be scrambled. No two invocations of the API have to use the same order, so you're always going to have to parse the names, which doubles the effort.
E.g. in C, memcpy(a, b, size) calls don't tell you which is the destination. But once you remember that it's the leftmost argument, it's moot. And it's better than dealing with six possible permutations:
memcpy(dst: a, src: b, size: s);
memcpy(src: b, dst: a, size: s);
memcpy(size: s, src: b, dst: a);
memcpy(size: s, dst: a, src: b);
memcpy(src: b, size: s, dst: a);
memcpy(dst: a, size: s, src: b);
If I drill down on any one of these, and read it carefully, of course there is no mistaking it, but I'd rather have it in a consistent order. Even if the arguments could be optionally labeled with names, I'd want it to work like this:
memcpy(src: b, size: s, dst: a); // ERROR: the first argument of memcpy is not called "src".
At least the darned memcpy is out on the left, which it wouldn't be if the function is also another named argument:
(src: b, op: memcpy, ...) // 24 permutations!
I'm not a big fan of keyword arguments. I've kept them out of the TXR Lisp language, except for their availability via a "parameter list macro" (a mechanism of my invention for transforming function parameter lists).
Once you're reaching for keyword arguments, it means you have a bad API. The fixed, required positional parameters of an API are usually the only part that is designed. The optionals are often less so, and keywords are just a bag of ad hoc afterthoughts which only care about getting some needed effect.
This entire argument is a bit of a honey-pot on my part, because the real answer is "Neither is ideal; the tooling around the language should allow the developer to switch back-and-forth arbitrarily, as needed. It should also allow sorting of named arguments."
Since I don't get that tooling for wire-protocol content(1), I prefer named properties there because the alternative is usually some nasty flavor of binary alignment or picking semantics out of a densely-packed forest of newline-stripped character streams. But the correct answer in an actual programming language is "Both. Your IDE should make it easy to toggle the names on and off for positional arguments."
If a dev house is already using JSON for all the reasons one might do so, the flat presentation given by the RFC is the right presentation. But the real correct answer is "match your tools to the problem and you don't have to trade off between brevity and clarity."
(1) unless I do. gRPC is a treat to work with because it's thin on-the-wire but the Chrome devtool extensions give you the context. It's not the right solution for the JSON patch protocol because, well, right there in the name... It's JSON. But it's the solution I'd actually recommend to a team putting something new together if they don't need to interface to the outside world / aren't concerned the outside world will balk at a basically binary-on-the-wire data format.
> Positional syntaxes are extensible: you can add new optional arguments at the tail end, such that when those are missing, the old behavior is provided.
That's just one direction of "extensible". You can't remove existing arguments, which you can with named fields.
You cannot remove an argument from a function without checking all the code which calls it, to make sure it's not using that argument.
Removal isn't extension; so we are no longer on the topic of extensibility. For instance, it wouldn't be considered a compiler/vendor extension if the "for" statement were removed; it would be a (severe) nonconformance.
It may be easier to remove a named argument, if that argument is rarely used, because you can look for uses of that argument by name. If the hits are nonexistent to rare, you have very little work. In a public API, where you don't have access to all the code base which uses the function, you may not have any way of evaluating "rarely used" other than polling the user base.
Needless to say, a data format for a patch operation is a public API. You don't even control the software which implements an argument to be able to remove it. You can write some blurbs in the spec which declare something to be deprecated, and 20 years later, maybe everyone will have stopped using it and removed support.
v1.js:
    /// Every `{ "type": "foo" }` object must have a `bar` field.
    function foo(obj) {
      if (obj.type === "foo") {
        reticulate(obj.bar);
      }
    }
v2.js:
    /// Relax requirement that every `{ "type": "foo" }` object must have a `bar` field.
    function foo(obj) {
      if (obj.type === "foo" && typeof obj.bar !== "undefined") {
        reticulate(obj.bar);
      }
    }
Neither sexpr nor JSON are self-describing. They are both dumb, barely more than syntax formats that you can build stuff on top of if you wish so, which then might be self-describing and extensible.
With that said, there's EDN and fressian which solve the problems mentioned.
In EDN you could encode an instruction like so for example `(replace :path "/foo" :value 1)`
This reads like a function call because it is a representation of a function call. The semantics are already there.
Even better, you could use namespaced keywords and symbols where applicable, and suddenly we're in a world where we can convey context without nesting, out-of-band information/schema or funny string formats.
Self-describing, not self-documenting. I'm claiming that `"path": "foo", "value": "bar"` is a bit easier to pick up and learn than `["add" "/path" "value"]`.
This is probably a personal-preference thing; speaking from years of slinging both JSON and elisp, I have to go to the docs to remember the contents of a JSON object far less often than I do to remember the relevant positional arguments to a defun.
It's inferior because there's a bunch of useless clutter that gets in the way in the JSON patch version.
For the same reason
> If a person puts such a question to you as: ‘I have divided ten into two parts, and multiplying one of these by the other the result was twenty-one;’ then you know that one of the parts is thing and the other is ten minus thing.
is inferior to
> x(10-x)=21
The succinctness helps your brain focus on the important information so you don't get lost in the noise. If you find it hard to read at first, your brain can adapt to the patterns just fine.
> X times open parenthesis ten minus X close parenthesis equals twenty one
I'm not sure if your point is supposed to be that it's easier to understand if read another way or something, but no matter what the symbol version is the easiest for your brain (which is very good at pattern recognition) to see and pick up on.
Maybe I missed your point, so you'll have to be explicit.
By the way, I didn't contrive the example. I googled "before math notation" and chose it from the second result[1]. I had seen similar examples of what math looked like before notation, so I looked for an example.
These are basically reified function calls with known arity, this solution is less XMLish and more lispy, and can be efficiently applied with a bit of destructuring and dispatch on [0].
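To make the "destructuring and dispatch on [0]" idea concrete, here is a rough sketch. It assumes ops are encoded as arrays like ["replace", "/foo", 1]; that layout and the names walk, handlers and applyOp are made up for illustration and are not the RFC 6902 wire format. It only handles plain objects, with no array indices and no ~0/~1 escaping.

```javascript
// Hypothetical array-encoded ops: [verb, path, value?]. Not the RFC 6902 format.
// Only plain objects are handled; no arrays, no ~0/~1 escaping.
function walk(doc, path) {
  const keys = path.split("/").slice(1); // "/a/b" -> ["a", "b"]
  const last = keys.pop();
  let parent = doc;
  for (const key of keys) parent = parent[key];
  return [parent, last];
}

// One handler per verb; each receives the remaining array elements as arguments.
const handlers = {
  add: (doc, path, value) => {
    const [parent, key] = walk(doc, path);
    parent[key] = value;
  },
  replace: (doc, path, value) => {
    const [parent, key] = walk(doc, path);
    if (!(key in parent)) throw new Error(`replace: ${path} does not exist`);
    parent[key] = value;
  },
  remove: (doc, path) => {
    const [parent, key] = walk(doc, path);
    delete parent[key];
  },
};

// Destructure the op and dispatch on its first element.
function applyOp(doc, [verb, ...args]) {
  const handler = handlers[verb];
  if (!handler) throw new Error(`unknown op: ${verb}`);
  handler(doc, ...args);
  return doc;
}

const doc = { foo: 0, nested: { bar: "x" } };
const ops = [
  ["replace", "/foo", 1],
  ["add", "/nested/baz", true],
  ["remove", "/nested/bar"],
];
ops.forEach((op) => applyOp(doc, op));
console.log(JSON.stringify(doc)); // {"foo":1,"nested":{"baz":true}}
```

The dispatch table keeps each verb's arity visible in its handler signature, which is roughly the point being made about known-arity function calls.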
Hmm, is readability the most important thing here? It seems like terseness may be indicated if this is for PATCH requests. Request bodies are almost never compressed. If the patch is too big, you'd be better off just sending the whole file rather than preparing a patch. The unix diff tool's output is very terse and it's a reliable workhorse.
When git had the chance to create its own diff format in 2005, it copied the unix diff format. I don't think it's a given that just because it's old, all of its design decisions are wrong. The lack of compression for PATCH request bodies and the need to be smaller than the original file seems, to me, to indicate terseness.
I wasn't implying diff's design decisions were wrong, just that when we greenfield things these days, self-description is a feature where it decidedly wasn't in software built during the era where characters cost money (in ink and paper) to represent.
Git was wise to copy diff for backwards compatibility with existing CLI tools because it is primarily accessed via command line. No such backwards-compatibility gain exists with a PATCH protocol over HTTP.
In terms of terseness: when servers care about terseness these days, they use a non-ASCII-on-the-wire transport like gRPC. I believe that is out of scope for this RFC.
I think (without any kind of supporting evidence) that git chose to use terse diffs so they could be emailed to mailing lists. Git's diffs aren't backwards compatible with patch by default; I'm not sure that was the reason. All of that is to say: I think there are reasons why a modern patch format might reasonably choose terseness.
(Personally, I would just push for request body compression. It's not common but it's not impossible, and then you get the best of both worlds. I bet the terse format compresses to roughly the same size as the verbose format.)
It's VSO, there is always an operator (six of them), the subject [1] is always a path, and the only one missing an object is remove, which of course has none.
Here:
const [op, from, to] = patch
Now you can read it again. If you want to hashmap from {+, -, !, |, >, ?} to words, that's understandable.
Edit: We could simplify this to five verbs by letting + take one or two arguments, where the 1adic form sets the value to null. I mean I wouldn't, but we could.
Have seen JSON patch successfully reduce complexity in two separate projects now. My thoughts:
The Good:
1. Depending on your ORM or data access model, JSON patch is a fantastic option. If you’ve got existing services that can hand you deeply nested models, you can start writing PUT/PATCH routes that take JSON patch changesets and let it build things up automatically. Makes it as easy as checking a resource out of storage, applyPatch or applyOperation, check it back in.
2. You don’t really need to worry about bad types getting dumped into your base object, as you should be deferring type safety to the route DTO level anyway, and everything can trickle from there.
3. The ‘-‘ semantic works fine for signaling an intent to insert a new element.
4. 99% of the time the “protocol” seems to just work.
5. YMMV, but there’s a good chance that switching to JSON patch will let you delete front end code as well, since the front end models can in theory also use the very commands they pass to the back end to locally update models. It integrates very well with a redux pattern and seems like a base technology for undo/redo.
The Bad:
1. On the one hand ‘/‘ as the primary delimiter irks me. I suppose ‘.’ characters in a property name are more common than slashes, but still not my favorite. Furthermore, the escaped characters not using backslashes is weird. You write ~0 if you need to refer to ~ literally and ~1 for /, if I am not mistaken.
2. There’s a missing contextual mode for the ‘add’ op: creating missing objects along the path you’re talking about. For instance if your base object is {}, then add at path ‘/foo/bar’ will fail since foo is undefined. Should be a mode where I can say “yes, please create missing stuff along the way, thank you.”
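For the second "bad" point, a sketch of what such a mode could look like. The createMissing option is hypothetical (RFC 6902 has no such flag, and a standard "add" fails when the parent is absent), and this version handles only plain objects without ~0/~1 unescaping.

```javascript
// Sketch of "add" on plain objects only, with a hypothetical createMissing flag.
// RFC 6902 itself has no such flag; a standard "add" errors if the parent is absent.
function add(doc, path, value, { createMissing = false } = {}) {
  const keys = path.split("/").slice(1); // no ~0/~1 unescaping in this sketch
  const last = keys.pop();
  let parent = doc;
  for (const key of keys) {
    if (parent[key] === undefined) {
      if (!createMissing) throw new Error(`add: parent of ${path} does not exist`);
      parent[key] = {}; // create the missing intermediate object
    }
    parent = parent[key];
  }
  parent[last] = value;
  return doc;
}

// Default behaviour: adding at /foo/bar on {} fails because foo is missing.
let strictFailed = false;
try {
  add({}, "/foo/bar", 1);
} catch (e) {
  strictFailed = true;
}

// Opt-in behaviour: intermediate objects are created along the way.
const lenient = add({}, "/foo/bar", 1, { createMissing: true });
console.log(strictFailed, JSON.stringify(lenient)); // true {"foo":{"bar":1}}
```

Whether missing segments should become objects or arrays is ambiguous in general, which may be one reason the spec leaves this out.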
“On the JSON patch, feel your GraphQL cravings subside in as little as thirty minutes!”
One final “bad”: the update operation (known as replace) is also slightly lacking in my opinion. I’d like to be able to update an object with a partial fragment, eg:
Will update “a”, but will throw away existing property /foo/c.
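A rough illustration of that complaint, using a made-up document that has /foo/a and /foo/c. The helper names replaceFoo and mergeFoo are hypothetical; the merge variant is the wished-for behaviour, not something RFC 6902 defines.

```javascript
// Hypothetical document; the property names a and c follow the comment above.
const original = { foo: { a: 1, c: 3 } };

// What "replace" at /foo does: the whole value is swapped out.
function replaceFoo(doc, value) {
  return { ...doc, foo: value };
}

// What the comment asks for: a shallow merge of the fragment (not part of RFC 6902).
function mergeFoo(doc, fragment) {
  return { ...doc, foo: { ...doc.foo, ...fragment } };
}

const replaced = replaceFoo(original, { a: 2 });
const merged = mergeFoo(original, { a: 2 });
console.log(JSON.stringify(replaced)); // {"foo":{"a":2}}        c is gone
console.log(JSON.stringify(merged));   // {"foo":{"a":2,"c":3}}  c is kept
```

With plain JSON Patch the usual workaround is to emit one replace per changed leaf (here, a replace at /foo/a) instead of replacing the parent.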
This is dumb. The other problems like it are dumb. Like so dumb it’s hard to not ruminate, so everybody here that complains is right. But not so much that the technology is void of use, as some seem to suggest.
It’s so funny that JSON Patch 2 hasn’t come out with more extensions/improvements. Perhaps this Ycomb thread is the little push over the edge that we need to get a ball rolling!
Use case example: in a strategy game, we're implementing this format so that each weapon, skill, terrain, and unit has the chance to add patch commands during a specific part of the turn, which are then applied to a document representing the entire state of a battle. This makes it easier to handle conflicting effects and changes etc.
How do you deal with the fact that the patch operations are not associative? This seems like a recipe for conflicting effects... you don't want the result to depend on the order you picked up items. Do you batch the effects? (e.g. deletes, then inserts, then modifications?)
What is the behavior of your chosen library--mutable or immutable operations? I.e. do you pay the penalty of making a deeply nested copy of your scene graph for each turn?
Great question! We chose to make the order explicit and important - for example if we have two battling units, one with "unit reduces counter attack strength by 50%" and one with "unit counter attacks with 200% of original attack strength", order is very important. So the game designer has to add an 'order' value to every object in the game.
So far the favourite library is Fast-JSON-Patch[1]. The method we're trying is to build up a sequence of patches - so every object in the game can add a patch to the queue, and then once the system is ready, it can apply the patches and validate the result.
This particular library allows the code to directly manipulate the object, and generate a patch from the changes made - so it keeps the code simple and easy to understand.
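A toy sketch of the explicit-order idea, not the actual game code and not the Fast-JSON-Patch API. It assumes each effect carries a designer-assigned `order` and a hypothetical toPatch function that produces a "replace" op from the current state; the effects and numbers are invented, and only top-level keys are supported.

```javascript
// Sketch only: each effect carries a designer-assigned `order` and produces a
// "replace" op from the current state. Names and numbers are made up.
function applyReplace(state, op) {
  const key = op.path.slice(1); // top-level keys only in this sketch
  return { ...state, [key]: op.value };
}

// Sort by `order`, then apply each effect's patch to the running state.
function resolve(state, effects) {
  return [...effects]
    .sort((a, b) => a.order - b.order)
    .reduce((current, effect) => applyReplace(current, effect.toPatch(current)), state);
}

const halve = (s) => ({ op: "replace", path: "/counterAttack", value: s.counterAttack * 0.5 });
const capAtEight = (s) => ({ op: "replace", path: "/counterAttack", value: Math.min(s.counterAttack, 8) });

const battle = { counterAttack: 10 };
const halveFirst = resolve(battle, [
  { order: 2, toPatch: capAtEight },
  { order: 1, toPatch: halve },
]);
const capFirst = resolve(battle, [
  { order: 1, toPatch: capAtEight },
  { order: 2, toPatch: halve },
]);
console.log(halveFirst.counterAttack, capFirst.counterAttack); // 5 4
```

The two runs differ only in the `order` values, which is why making that field mandatory pushes the conflict resolution onto the game designer rather than onto pickup order.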
We used this to make updates to the NHS Patient Demographic Service (which stores names, addresses, contact information). The records are pretty detailed (FHIR) and there's a lot of business rules and deep nesting.
It worked really well, and our users had no problems making JSON Patch requests.
Why are pointers strings? They are really multi-indexes, so why are they not arrays of strings (for hashes/"objects") and numbers (for arrays)? This would have avoided the need for escaping and for string operations in general, which complicate the whole thing.
It's also less parsing, and for many systems a less unwieldy, less error-prone entity to pass around your runtime. You unpack it at the time you need to dereference the JSON pointer.
I guess it's more user friendly to have the path as a string, except of course most of these patches will be machine generated and processed, so the user friendliness is a bit pointless.
I think it's natural to do it this way, since it's common for a REST API to also support GETs inside objects using URLs. That is, `GET /` can return `{"top": [{"key": "value"}]}`, while `GET /top/1` can return `{"key": "value"}`. So, reusing the same syntax to refer to fields comes very naturally if you already have this support.
I don’t get this either, the byte savings are trivial and escaping/unescaping logic adds needless encoding complexity to a concept that otherwise does not need string encoders!
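For reference, this is roughly what the escaping logic amounts to when converting between an RFC 6901 pointer string and an array of tokens. The function names parsePointer and formatPointer are my own, and the sketch returns every token as a string (it does not try to tell array indices from object keys).

```javascript
// Convert between an RFC 6901 pointer string and an array of reference tokens.
function parsePointer(pointer) {
  if (pointer === "") return []; // the empty pointer refers to the whole document
  if (!pointer.startsWith("/")) throw new Error("pointer must start with '/'");
  return pointer
    .slice(1)
    .split("/")
    // Order matters: decode ~1 first, then ~0, so "~01" becomes "~1" and not "/".
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));
}

function formatPointer(tokens) {
  // Encode ~ first, then /, the reverse of decoding.
  return tokens
    .map((token) => "/" + String(token).replace(/~/g, "~0").replace(/\//g, "~1"))
    .join("");
}

console.log(parsePointer("/a~1b/m~0n/0"));     // [ 'a/b', 'm~n', '0' ]
console.log(parsePointer("/~01"));             // [ '~1' ]
console.log(formatPointer(["a/b", "m~n", 0])); // /a~1b/m~0n/0
```

With an array-of-tokens representation on the wire, neither function would be needed, which is the argument being made above.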
At Notion, we use Array<string | number> as you suggest.
I did the same for a CQRS system. It's been humming away in production for at least 6 years. I've been surprised at how reliable it's been, 0 patch bugs.
Where would this be useful? I am not trying to bring the article down but I'm genuinely interested in knowing where/how people would use this. It seems like just locally modifying a payload would be more efficient?
Arrays require some care to handle, which is why I've got some extensions in my client, but a superpower that JSON patching enables is that it can be algebraic in nature. That is, we can define a patching which is associative (another extension where you preserve nulls) such that (X + da) + db = X + (da + db) = X + dc where da + db = dc.
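A small sketch of that property, under heavy assumptions: it uses flat merge-style patches (not RFC 6902 operation lists) where a null value means "delete this key", and the names applyPatch and compose are hypothetical. Nested objects and arrays need more care than this shows.

```javascript
// Flat patches only: a value of null means "delete this key".
function applyPatch(doc, patch) {
  const result = { ...doc };
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) delete result[key];
    else result[key] = value;
  }
  return result;
}

// Composition keeps nulls so a later delete still overrides an earlier write.
function compose(da, db) {
  return { ...da, ...db };
}

const X = { a: 1, b: 2 };
const da = { a: 10, c: 3 };
const db = { c: null, b: 20 };

const stepwise = applyPatch(applyPatch(X, da), db); // (X + da) + db
const combined = applyPatch(X, compose(da, db));    // X + (da + db)
console.log(JSON.stringify(stepwise)); // {"a":10,"b":20}
console.log(JSON.stringify(combined)); // {"a":10,"b":20}
```

Because pending patches can be folded into one before sending, a client on a congested link can send a single combined patch instead of a backlog.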
This allows you to leverage flow control when congestion happens (especially on mobile connections).
I can think of realtime collaborative document editing - patches are sent to the server instead of sending the whole JSON-based document every time and the app resolves potential conflicts.
Software such as Kubernetes uses it with different processes writing to the same data fields, but different subsets. If you have a typical API with only one main blob that gets updated in its entirety, you might not avoid conflicts using the PATCH approach.
We have a supplier who uses it to distribute daily updates to their (large) product catalog. It helps that most of the mutations are to specific fields like availability and price, while EAN/GTIN etc tend to be stable. Deletions are trivial as well.
It's clear you want to change baz value to boo, but did you mean to delete foo or just ignore it? If "ignore", how do you indicate delete? If "delete", you will now have to send the entire JSON every time, anyway, so it's not a patch. Whatever your answers, there, it's not a "why not just..." situation.
You have a deeply nested property in a gigantic JSON document that you want to move, "why not just" send
That doesn't work, because DELETE is defined to delete the entire resource. PUT is defined to replace the resource with the body, which would replace the resource with the patch. Only PATCH is defined to accept a patch and do something special with it.
Abusing HTTP methods like this is something ElasticSearch does on some of their APIs. This is annoying because spec-compliant tools don't always play nice with them.
For example: ElasticSearch can take a body on some GET requests, but many tools will not let you do this because a GET request should not have a body.
If you have a proxy, cache, or middlebox in your network, and you use DELETE to mean "remove this key from the JSON object" your proxy won't know that, and will happily delete the entire resource, and tell everyone else that the resource has been deleted, and then your data will disappear, and that's not very fun.
What if you want to delete one attribute, and update another, within the same request? Seems like utilizing HTTP DELETE would not work well for this, unless you can specify it at the attribute level, leading you back towards something like JSON Patch.
There are all kinds of solutions ("why not just use undefined instead?"), but whichever it is, a decision has to be made, documented, remembered, and edge cases decided. Whatever you have, now the API has a non-obvious custom spec just for that API alone. It's not a "why not just".
I guarantee, another dev would come along, look at the custom spec, and think "Why not just use JSON Patch?"
undefined would in theory be slightly "better" for deleting values (the value { key: undefined } serializes to {}), but, alas, it isn't supported by JSON, so you'd have to use a custom serializer and deserializer for it to even work.
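A quick check of that serialization behaviour with the standard JSON.stringify (the object contents are just sample data):

```javascript
// Standard JSON.stringify behaviour: undefined properties are dropped, null is kept.
const withUndefined = JSON.stringify({ key: undefined, other: 1 });
const withNull = JSON.stringify({ key: null, other: 1 });
const inArray = JSON.stringify([undefined]); // array slots become null instead

console.log(withUndefined); // {"other":1}
console.log(withNull);      // {"key":null,"other":1}
console.log(inArray);       // [null]
```

So on the wire an `undefined` property is indistinguishable from one that was never set, and the receiver only ever sees "absent" or "null".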
the partial updates approach works ok for updating 'schemaful' data, where it could only ever mean to set the field to null and could never delete the field from the object altogether
but for editing a JSON document or other schemaless data then both operations are possible and you'd need to be able to distinguish them
You can interpret a null field the same as a missing field. You only need to worry if you have arbitrary dictionary keys and need to differentiate a null value from a missing key.
> You can interpret a null field the same as a missing field.
Not in the general case no, they are two different states, which may have entirely different semantics.
For instance a missing field can mean "don't touch this field" while a field set to null means "set this field to null".
Some schemas might opt to ignore this distinction and collapse the states, but that is a specific decision, and an express loss of information.
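In JavaScript terms, a sketch of a partial-update handler that keeps the two states apart. The record fields and the applyPartial name are made up; the convention shown (absent = don't touch, null = set to null) is the one described above, not a standard.

```javascript
// Sketch of a partial-update handler: absent key = leave alone, null = set to null.
function applyPartial(record, update) {
  const result = { ...record };
  for (const key of Object.keys(update)) {
    result[key] = update[key]; // null is stored as an explicit null
  }
  return result;
}

const record = { name: "Ada", phone: "555-0100" };
const update = JSON.parse('{"phone": null}');

console.log("name" in update);  // false: name was omitted, so it is left alone
console.log("phone" in update); // true:  phone was sent as an explicit null
console.log(JSON.stringify(applyPartial(record, update)));
// {"name":"Ada","phone":null}
```

A deserializer that maps both cases to the same "empty" value in a static type throws this distinction away before the handler ever sees it.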
> You only need to worry if you have arbitrary dictionary keys and need to differentiate a null value from an missing key.
There are other situations where that's an issue, like the above, or when you're using the object as a set (which JSON doesn't have), or when you've defined the schema to never have missing fields (but possibly have some set to `null`) so clients will break, ...
Null could imply the field was present and a value not entered or not necessary.
Missing could imply the field was never known about at all. The context is important.
In case you're trying to open the RFC links on the site, the IETF site no longer supports http and does not redirect to https, hence you see a 404 error. You can manually open https versions of the urls to visit them
I've used this in the past on projects -- overall a helpful set of functionality, and simplified some front-end stuff (less code written is always good in my books).
Very useful within a business I'm working with. There are multiple data providers whose data has to fall into a single domain data model, and they are providing that data in vast amounts. We deal with that by normalizing the data, comparing it to the present state, and enqueueing patches into a queue from which the application aggregating that data picks up and applies updates. Maybe one small thing regarding Java libs - jsonpatch comes under a dual license and is not suitable for commercial use so beware, better use zjsonpatch (I also like it more because it's more intuitive).
An alternative approach is to use jsonpath; for python, the jsonpath-ng implementation is useful but could use more friendly documentation.
it contains XPath-style selectors and the ability to directly modify matched nodes in a Dict, including filtering (as indirect deletion) and upserts; the only thing that it's missing is sufficient addressability with regards to Lists (arrays), so I see myself building a similar 'op' processing but using jsonpath instead of jsonpatch language.
How about a library to enable operational transformation for this format? I wrote one once, but got really burned out with some of the stretch goals (log compression for 3+ participants).
I don't know if there are good use cases for JSON patch, but I found it pretty inconvenient to use when implementing a SCIM server (which uses JSON patch for PATCH method updates)
Escaping is weird. Why ~0 and ~1 instead of \ and \\ ? Also
> Finally, if you need to refer to the end of an array you can use - instead of an index. For example, to refer to the end of the array of biscuits above you would use /biscuits/-
"/biscuits/-" addresses "this value" because biscuits is not an array.
If you check out the implementations at the bottom of the post, you'll see that, at runtime, if biscuits is an object they index into `obj["-"]`. And if it's an array, they special-case "-" and index into `array[length]`.
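Roughly, that runtime special-casing looks like the following. This is my own sketch, not code taken from any of the linked implementations, and addAt is a hypothetical name for the last step of an "add".

```javascript
// Sketch of how an "add" might resolve the last token of a pointer.
function addAt(container, token, value) {
  if (Array.isArray(container)) {
    // "-" means the position just past the last element; otherwise require a plain index.
    const index =
      token === "-" ? container.length : /^(0|[1-9]\d*)$/.test(token) ? Number(token) : NaN;
    if (!Number.isInteger(index) || index > container.length) {
      throw new Error(`invalid array index: ${token}`);
    }
    container.splice(index, 0, value); // insert, shifting later elements right
  } else {
    container[token] = value; // on an object, "-" is just an ordinary key
  }
  return container;
}

console.log(JSON.stringify(addAt(["a", "b"], "-", "c"))); // ["a","b","c"]
console.log(JSON.stringify(addAt(["a", "b"], "0", "c"))); // ["c","a","b"]
console.log(JSON.stringify(addAt({}, "-", "c")));         // {"-":"c"}
```

The meaning of "-" therefore depends on the type of the value found at runtime, not on the pointer string alone.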
Edit: I suppose I shouldn't be surprised there are multiple standards with varying degrees of acceptance: https://datatracker.ietf.org/doc/html/rfc6901 so maybe it is fine to use this expression language...
I didn't mean to say that all there was were reference impls. I just clicked through the impls to quickly answer your question out of curiosity instead of scanning an RFC.
I've used it in the past for document storage updates when documents can be rather large. It is always a toss up, design wise, to patch a single object or to save a log of actions (applied either on the server or the client). The former uses less space and has faster "time to final object state" on the client whereas the latter allows for time travelling and can make debugging much easier.
I've seen several roll-your-own solutions to this, but I have yet to see one that supports "copy" and "move". And rarely do I see one that supports "test". I like those features, even if I think the "op" syntax is clumsy as anything and add/update/remove could just be a JSON object of key: value / new_value / undefined.
In addition to the other limitations, this doesn't let you edit strings. You can't insert text into a string, or delete text. This makes it useless for collaborative editing.
Yjs and Automerge are not patch formats. They are CRDT libraries. You are comparing apples and oranges. If you want to send Yjs or Automerge patches over the network, you still need a patch format.
Isn't it amazing how JSON reinvents pretty much every technology XML ever had? This includes the well-forgotten XQuery, which is in a way what jq and JMESPath offer, too.
If you use JSON for configuration (as it should be), why not update the whole file and get change tracked in VCS? If you use it as a database, why not Mongo then?
I've used JSON Patch via Kustomize to perform last mile changes for k8s manifests. For example, add `label=dev` to everything in this chart. Much easier to manage differences across the environments.
For strictly one way one producer to one consumer state updates, I benchmarked this versus writing the new object to a zstd stream and flushing. Zstd won.
XSLT can do too much, you can even draw a map with XSLT with styles like Osmarender[1]. Javascript is already the XSLT of json and it is good enough at giving alternatives to a locked down format.
I'm also using a JSON-diff format in my data store[1], which is accumulated during insertions/updates/deletes in-memory and serialized during a commit, admittedly for large JSON instances (a few Gb and much more).
A lot of that comes down to internal application architecture. If you're working in Javascript / Typescript, where objects are inherently dictionaries / hashsets with ordering; yes, this is true. You will need to sort before serialization in order for the diff to be useful.
But: If you're working in a strongly-typed language like C# / Java / Rust / C++, and using an object -> JSON -> object serializer; the order of the elements is often quite explicit and never changes.
At first it looked interesting, but as pointed out by other people, it's enough to have a 'partial' JSON with the fields that need updating. The syntax stays the same, you just include what you want to update. We do exactly that and it's very easy.
The advantage is in mutating arrays. You don't need to send the entire array back, just the commands for which index gets added/ removed / replaced.
Also, there's no way to differentiate between deleting a key from an object and leaving it unchanged if you send a partial patch, assuming that `null` is a distinct value different than actually deleting the key itself.
To your second point, the JSON spec allows `null` but not `undefined`, so most js serializers (all?) don't send (or throw an exception) for any `undefined` from any object you attempt to send. To delete a property, you would send the entire surrounding object. Or you can simply define your data-layer without transient key/values. It's certainly not a necessary feature to have.
How do you delete fields? What about complex documents with dozens of nested fields?
For really simple use cases you don't need this but I've used something similar for an application where the JSON documents could get in the megabytes and it was a lifesaver.
JSON Patch is not REST. It is not representational state. But it's also not the way any sane person has ever done RPC over HTTP. Typically RPCs accomplish mutation by just supplying whatever is the most ergonomic at the time; if you want to be able to rename an entity, you implement a /rename endpoint, etc.
In JavaScript, there was always a normal, obvious way to handle Patch, which was to leverage null vs. undefined properties. However, in statically typed languages, people wanted to deserialize data into static types that don't support the concept of "undefined." This was the beginning of the end, because deserializing into a static type is actually a bug for an API that has an evolving schema. No longer can you distinguish between a null and an undefined property if a client omitted it.
So instead we end up with yet another preposterous proposed "standard" of JSON Patch, a completely insane and difficult to read instruction-set-over-JSON that is truly a hodgepodge.
Remember, mutations like this need to be assembled by the client. That means not only the code but also the user experience. The classic REST-like ways of doing these mutations originate in the "for free" usage of the HTML form element. That's gone, now you need some insane JsonPatcher library, and you need to somehow manipulate your UI around it.
I work in an industry of idiots.