Eventual Consistency
subtitle: (celebrating 50 years of collaborative grocery lists)
NB: 90% complete, sections are made for convenience of editing and might not be useful for the publication
Contents
prologue: got milk? (1968)
A now canonized moment in many tellings of the history of computation is the so-called Mother of all demos when Douglas Engelbart and a team of (student) engineers performed a kind of technical theater at a technical computer conference in San Francisco in 1968. In the demo, Engelbart sits in front of computer display and proceeds to use something that (to a modern eye) seems to fuse elements of a text editor, outliner, drawing program, and web browser. At one point he begins to prepare a grocery list starting with a flat list of items needed, then proceeding to show off the systems ability to category and display the list in a variety of orders and eventually even produces a graphical optimal route to go home to complete his to dos. At some point (through a bit of stage magic using then state of the art video screen splitting equipment), Engelbart appears to initiate a kind of video conference with a colleague at a remote location and the two proceed to edit some computer code together.[1]
The wikipedia article Collaborative real-time editor was first created by user MattisManzel May 30 2005 [2], and initially listed just two software examples: SubEthaEdit and MoonEdit. On November 19 2008, a link is added to a new wikipedia page (also created that day) called Etherpad. A reference to Douglas Engelbart and the “mother of all demos” is later added on December 7, 2009. Despite getting marked as vandalism and removed the same day, the reference returns the 20th of December where it has remained until today.[3]
etherpad + OT
This essay started as a research into the algorithms underlying the Etherpad application[4], a browser-based collaborative text editor that has become central to many practices around Constant, (a Brussels-based arts and new media organization …). In my shared role as a Constant system administrator, and as a participant in the Diversions working session, I have developed a somewhat intimate relationship to this software object. The particularities of the software, and the centrality of its role to the much of the collaborative work around Constant, has significantly driven my own (and others around me) software practices. The necessity to manage problems such as etherpad’s ever-growing (and eventually unmanageably large) database size as well as the need to provide continuous access to documents given the server’s (and related ecosystem of plugins) often unstable nature has led to the development of scripts to manage static snapshots of etherpad content[5]. Similarly, the software’s high demands on network connectivity when collaborating with a large numbers of users working together in the same (physical) place led to extensive thinking about and software development for local/portable infrastructures[6].
If you want to find out how Etherpad’s Easysync works (the library that makes it really realtime), start with this PDF (complex, but worth reading).[7]
My explorations started in the Etherpad documentation, my interest piqued by that phrase complex but worth reading that appears in the README of the project’s source code. This led, in typical web fashion, to a mesh of interlinking (video) presentations and academic papers describing algorithms and techniques around collaborative (digital) writing systems. Two particular algorithms were repeatedly invoked and appeared central to this research: the operational transformation (OT) and the conflict-free replicated data type (CRDT).
David Greenspan, one of the original developers of Etherpad (and presumably an author of that complex, but worth reading PDF document), gave a helpful presentation at a special “Etherpad Meetup” in San Francisco in 2013 providing a technical description of Operational Transformation situated in the particular history of the etherpad project.[8] Greenspan starts his presentation by describing his current work for a startup web-framework project. Etherpad was itself born from another web framework called AppJet, a project Greenspan co-founded with two ex-Google engineers in 2007 to push the limits of applications based on a browser/server infrastructure rather working in isolatation like most traditional (offline) PC software. In mid 2008 he writes an email to colleagues proposing that they should try to build a cross between superhappychat (another early AppJet prototype) and SubEthaEdit (a Mac-based collaborative editor) where “merging should be done to an extent that people can comfortably co-edit a document, and perhaps text could be colored by author.”. The project is announced publically just months later, on November 19.
Documents
- A document is a list of characters, or a string.
- A document can also be represented as a list of changesets.
Changesets
- A changeset represents a change to a document.
- A changeset can be applied to a document to produce a new document.[9]
Greenspan proceeds to describe how etherpad documents are based on the concept of changesets – a concept he claims he “developed as the result of an all-nighter”. Changesets are a concise representation of just the changes an author makes to a particular text. For instance, if on were to type the following into etherpad:
this is a text
And then change the last word from “text” to “test”, the complete document as a list of changesets would be:
Changeset Interpretation -------------------- -------------------- Z:1>3*0+3$thi insert 3 characters: thi Z:4>2=3*0|1+2$s\n keep 3 chars, insert 1 line: s (newline) Z:6>2|1=5*0+2$is keep 1 line, insert 2 characters: is Z:8>1|1=5=2*0|1+1$\n keep 1 line, keep 2 characters, insert (newline) Z:9>2|2=8*0|1+2$a\n keep 2 lines, insert 1 line: a (newline) Z:b>2|3=a*0+2$te keep 3 lines, insert 2 characters: te Z:d>2|3=a=2*0+2$xt keep 3 lines, keep 2 characters, insert xt Z:f<1|3=a=2-1$ keep 3 lines, keep 2 characters, delete 1 character Z:e>1|3=a=2*0+1$s keep 3 lines, keep 2 characters, insert 1 character: s
Representing the “document as list of changesets” has something in common with a wikipedia article, where internally, a database is used to record the entire history of changes editors have made. In this way, it’s possible to view the entire timeline of edits, view differences between those edits, and eventually make “reversions” to previously written formulations or remove unwanted edits. In the case of wikipedia, the granularity of this history is each time an editor saves a version. In the case of etherpad, the granularity of editing is much finer, often reducing edits to a few keystrokes that in effect are automatically committed as each editor types. As each changeset is considered a “revision” it is usual for a document to have tens of thousands of revisions. Combined with the added overhead that each changeset also is timestamped and associated with an author, makes the seamingly compact representation grow to potentially unmanageable database sizes when used for a long period of time with many pads and authors.
While the design of the changeset representation and accompanying easysync algorithm is not for compactness it is instead for the speed of distributing these changes to other authors and for automatically merging different editors contributions. To demonstrate this, Greenspan gives the example of the text “Here is a very nice sentence” changed by two editors simultaneously into: “Here is a sentence” and “I like a nice sentence”. The goal of easysync is to combine or merge these two edits into a shared common version.
In the slide red cross-outs (deletions) and the green additions (insertions) show the changesets. The merge algorithm is designed to honor both editors’ deletions, then performs both of their insertions. In this case, the resulting merged sentence is “HereI like a sentence.”. Any user of etherpad will recognize the multi-colored formatting of the final sentence, with the two authors voices spliced together like wedges of neapolitan ice cream (each assigned a distinct color). Also familar would be the way using the software involves moments of rupture, where collectively written texts often temporarily break, or swell with duplications to then be resutured or whittled down in the process of further editing. The important point is that the software doesn’t so much choose a final version of the text, but rather the merging happens quickly enough to catalyze a social process of negotiation to an eventual correction or reworking of a text. Greenspan, for his part, is cognisant of this, and at one point reminds the audience that the algorithm is really only good for merging changes made in a very short time span (something on the order 30 seconds). The merging is purely to manage the kind of real or near real time conflicts where both editors are assumed present and actively engaged with the text. Using the algorithm to merge changes made by independent editing made out of the context of this live social negotiation simply wouldn’t make sense.
Though Greenspan only mentions it by name once at the very beginning of this presentation, the easysync algorithm is in fact an implementation of an *Operational Transformation*. In order to zoom into the algorithm, and this idea of transforming operations, Greenspan pares back the edits to an even simpler moment where the text "Hello" is again edited simultaneously to "Hello!" and "Yellow". He shows a series of diagrams where the two edits are represented as change vectors extending from a common starting point to two points of divergence (opposite corners of the diagram). The key idea of OT is how changes are exchanged and transformed among editors so that changes made by others can be (sensibly) applied to the one’s own local (and often changed) context. In other words, editor 2 will transform editor one’s exclamation point to add it to their edit Yellow, while editor one will receive editor 2’s “Y” and “w” and apply that to their edit “Hello!”. Ultimately, in addition to being design to be efficient, the goal is for the result of all the transformations to converge – that is produce the same final text. Thus OT “squares” the diagram with the result of the transformed transformations (in principle) returning back to a shared state.
Part of the challenge in working with OT is that, (at least to me) there's something kind of enchanting in it's complexity that teeters between fascination for how it appears to work, and yet ultimately remaning just out of reach of full comprehension. The very idea of transforming editing operations, themselves already a kind of transformation (insert/delete characters) seems entincingly recursive. Greenspan: "I put this in my head as you're transforming one operation against another, you are transforming it to allow for this other operation to already have happened". Greenspan also notes, with a muted sense of wonder, how each of the 5 possible transitions shown in the diagram each represent a (slightly) different transformation, and yet somehow they all relate to each other. Hearing this explanation, somewhere resonance in my head to something I once read from Donna Haraway:
Diffraction does not produce "the same" displaced, as reflection and refraction do. Diffraction is a mapping of interference, not of replication, reflection, or reproduction. A diffraction pattern does not map where differences appear, but rather maps where the effects of difference appear.[10]
And yet, the results ("HereI like a sentence") don't yet seem to quite live up to the speculative potentials of such radical diffractive representation.
CRDT
In a 2018 presentation, researcher Martin Kleppman describes an alternative technique to OT called Conflict-free replicated data type (CRDT). Kleppman begins his presentation introducing a generalized idea of “collaborative applications” and starts with an example that would be familiar to his technical audience, that of programmers commiting code with git, the popular distributed version control system.
He then moves to a “textual” example nearly identical to Greenspan, again showing two simple parallel edits merged into one. As in etherpad, the edits are shown as arrows from one text to another and labeled as “transformations” (insert “World” after “Hello”). Here, Kleppman notes that this is indeed the way that “Google Docs” work. Though Kleppman never refers to etherpad, the points he makes are valid to both as each is in fact based on the same OT-grounded principles.
Kleppman then goes on to draw a line between OT and a more recently developed concept, that of the Conflict-free replicated data type (CRDT). Kleppman, a researcher in distributed systems at the University of Cambridge, reminds his audience that OT, though originating in 1989, and implemented in a number of projects, has a checkered academic history with parts of its theories later disproven. In particular, many “edge cases” have been demonstrated where diverging edits will not converge unless somehow constrained. He notes how the particular algorithm employed by Google Docs (and etherpad) works only because it relies upon the operation of a single centralized server to ensure a consistent shared state. This, he notes, has serious consequences such as preventing say a mobile phone to synchronize data directly with a nearby laptop; indeed Google docs requires both devices to contact a “cloud server” in a data center. Thus, he gives a clear example of how a particularity in an algorithm can shape huge patterns of use with social and ecological implications.
The CRDT models Kleppman presents, in contrast, have been subject to rigorous formal testing and shown to demonstrate a property that the academic literature terms “Strong Eventual Consistency” – a guarantee that given enough time for messages from each editor to circulate to other editors, each document is guaranteed to arrive at the same resulting document state. Thus CRDTs (at least theoretically) seem better able to exploit a potentially more radical peer to peer network infrastructure.
Something interesting occurs however as Kleppman generalizes the technique of CRDTs to "data types" other than text (the DT in CRDT afterall refers to potentially many data types). He follows the “Hello World” slide by two other examples of “collaborative editing”, the graphical similarity of the diagrams suggesting a congruity between each considered case. The first shows that instead of text, editors might be editing sets. Starting from set {a, b}, one editor removes element {b} while another adds element {c} and voila: the algorithm correctly merges these to {a, c}. The second example is one that often appears in discussions of CRDTs as the simplest of cases: a “counter”. Here the two editors are shown with an initial state that says “c = 1”. Arrows are then shown labeled “increment” transforming the “text” into “c = 2”. Kleppman notes here that you might be tempted to say that both editors are in agreement with c = 2, but that this would in fact “lose information”. Instead, he shows how that when the two edits are merged, both editors arrive at the final state “c = 3”. To support this he notes:
We need to consider not just the state – the value at any one time – because if we compare just 2 and 2 they look the same, but actually we need to capture the sequence of changes that were made to the data and re-apply those, and that will allow us to reach a sensible merged outcome.
Here Kleppman makes a significant and subtle shift from speaking of text to speaking of data. Instead of insertions and deletions to text, the object at hand is "data" and "information" that presumably are unambiguous and thus can be “sensibly merged" with the precision of a logical operation.
In a textual sense, what exactly is meant with two authors write c + 1 = 2 depends on the context. If they were transcribing an algebra lecture, the repetition may indeed simply be that, an insistence upon the fact that c = 2 is indeed what was said. But even if one accepts the shift from “textual state” to “data”, the question of context remains. If indeed the “incrementer” is a “like counter”[11] then indeed doubly incrementing makes sense, but instead if the editors are collectively counting (in drinking-game style) the number of times a speaker uses a certain word, then c=3 would represent a double count.
Kleppman’s theoretical precision obscures a lack of consideration of any role the social context around the systems might in fact play a role in assessing the qualities of “collaboration” possible in such a system. While Greenspan makes reference to the fact that any kind of “auto merge” algorithm necessarily involves questions of “preserving intention”, Kleppman falls more or less into the trap of the evident clarity of precisely defined data structures. When Greenspan’s describes a “weird case” of two editors simultaneously deleting then editing opposing halves of text and notes that the algorithm is “guaranteed to look weird, but also guaranteed to work”, he’s assuming that editors are engaged with each other in the realtime loop of the editor and thus are likely to stop and negotiate. In contrast, when Kleppman at the end of his talk moves to describing a practical implementation of CRDTs it in a system called “automerge”, and the example a shopping list. XXXXXXXXXXXXXXX todo: make a point here ?
neats vs. scruffies
Friction between engineers formally defined systems, and the messy realties of (collaborative) writing practices. Is this just another case of neats vs. scruffies with Greenspan’s scruffy acceptance of the messiness of social process reflected in a lack of theoretical precision, or does Kleppman’s neat precision indeed obscure a dangerous lack of interest in the actual qualities of collaboration present, despite the logical correctness of their data structures. Blog posts such as Raph Levien's recent Toward a unified theory of OT and CRDT seem to point to a direction of healing between the "two camps".
( CUT??? It’s important to note though that here too, what the “correct” behavior is depends on the situation of use. For example, using etherpad to collectively transcribe a live event – a situation often used during Constant working sessions – it often arises that two people type duplications of the same text. Etherpads behaviour, which despite its flaws in determinism, also merges the disparate changes by including both versions of the text leads to the result that in this situation, the desire to produce a “clean” transcription results in one of the two authors needing to notice the duplication and choosing to delete either their own or the others input. It is conceivable, that a different modality of a merge algorithm might instead align the two editors contributions and instead perhaps highlight or emphasize the portions typed by both (thus reinforcing a kind of “insistence” rather than “repetition”). )
epilogue: collaborative technotexts
Technotexts: When a literary work interrogates the inscription technology that produces it, it mobilizes relexive oos between its imaginative world and the material apparatus embodying that creatin as a physical presence. [12]
TODO: epilogue: two alternative examples of “collaborative writing sytems” that function as “techno texts” (Hayles), making use of the particularity of their materiality.
- Your world of text … (simple character based level of conflict resolution – graffiti conflicts) – also anonymity ;; lots of racist / dicks etc. (without any care)
- Epicpedia, Annemieke van der Hoek (2008) (care community of wikipedia – exposing the underlying structure obscured by the wikipedia interface)
- ↑ Bootstrapping, Bardini, p. 141
- ↑ See https://en.wikipedia.org/w/index.php?title=Collaborative_real-time_editor&oldid=14449448
- ↑ See https://en.wikipedia.org/w/Collaborative_real-time_editor
- ↑ See https://etherpad.org/
- ↑ See https://gitlab.constantvzw.org/aa/etherdump
- ↑ See https://networksofonesown.constantvzw.org/etherbox/
- ↑ See https://github.com/ether/etherpad-lite/#things-you-should-know
- ↑ Transforming Text by David Greenspan, 2013 presentation at “Etherpad SF Meetup” https://www.youtube.com/watch?v=bV2JGpS-1R0
- ↑ See https://github.com/ether/etherpad-lite/blob/develop/doc/easysync/easysync-full-description.pdf
- ↑ Donna Haraway in The Promise of Monsters, p. 70, The Haraway Reader, Routlege 2004
- ↑ See for instance Consistency without consensus in production systems, which describes how Soundcloud uses CRDTs to efficiently maintain “like counts” on a globally accessible service that scales to artists with millions of followers.
- ↑ Hayles, Writing Machines, p. 25, MIT press (2002)