Datalog in JavaScript

jitl · on April 27, 2022

It’s fascinating to see so many different parties converging on Datalog for reactive apps & UI.

- There are several such talks at https://www.hytradboi.com/ (happening this Friday)

- Roam Research and its clones Athens, Logseq, use Datascript / ClojureScript https://github.com/tonsky/datascript

- differential-datalog isn’t an end-to-end system, but is highly optimized for quick reactivity https://github.com/vmware/differential-datalog

- Datalog UI is a Typescript port of some of differential-datalog’s ideas https://datalogui.dev/

I think an under appreciated aspect of Datalog is as a pure specification language for relationships. Schema languages like GraphQL specify a relation between types exists, but not how it is computed. Prisma and other ORM DSLs like ActiveRecord do a better job specifying how to compute a relation from SQL tables, but support only a few strict computation types that can be modeled using SQL JOIN on columns.

Datalog lets you purely specify arbitrarily complex logical relationships between data types. Even if you were to only use Datalog as a rules engine at test time, it could still be quite powerful for validating logic implemented in different languages on the same data model, to say keep your Typescript, Kotlin and Swift logic in sync.

simongray · on April 27, 2022

It is really great that people are discovering the utility of triplestores through their ubiquity in the Clojure ecosystem (e.g. Datomic and its many clones).

I do feel saddened that few seem to realise that the fundamental idea actually comes from RDF (i.e. the semantic web) and that Datomic's Datalog dialect is actually very similar to SPARQL. Datomic's primary innovation is the immutability/time travel feature, not the fact that is uses triples and works like a graph database. If you find triplestores fascinating I implore you to explore the semantic web stack too.

j-pb · on April 27, 2022

The most important thing that datomic and friends brought, is usability and pragmatism.

While RDF has an elegant core, it is bogged down by the complixity of the web and ancillary technologies.

XML Types, cURIes, OWL, the different encodings.

They all make RDF a nightmare to work with, and to implement in practice.

The is simply no good RDF software ecosystem, yet the Spec ecosystem continues to grow (wildly out of control).

On a technical note, SPARQL and Datomic Datalog are very different beasts.

The former is conjunctive queries with optionals and regular path expressions (i.e. Conjunctive queries + XPath + nonmonotonic defaults) while the latter is recursive conjunctive queries (i.e. Datalog).

greendream17 · on April 27, 2022

If PostgreSQL is to SQL what the browser is to HTML then what is X to SPARQL. One of the reasons why I don't think X "exists" is because it's not a good way of working with graph. I sometimes think there should be some alternative tech stack to RDF/SPARQL that isn't part of the semantic web but uses many of the same ideas but you know something that people can use.

amanj41 · on April 27, 2022

I suppose X in this case would be any of the number of graph databases out there. Ontotext GraphDB, Virtuoso, AWS Neptune.

Of course there are other GraphDBs with additional properties and custom query languages such as Neo4j and its query language Cypher. I Find that Neo4j's property graph is more intuitive to model data with than a plain RDF store and Cypher has some advantages over SPARQL as well

macmac · on April 27, 2022

Rich often mentions the RDF heritage. I think the separation of reads and writes through the transactor is also a significant innovation.

cscheid · on April 27, 2022

I'm sorry, but Datalog has existed since 1977; RDF, since 1999.

simongray · on April 27, 2022

Good thing I was talking triplestores and not Datalog then.

j-pb · on April 27, 2022

> In Datalog databases, there are no tables. Or really everything is just stored in one table, the triple table

The author conflates Datomic family databases which use triples as their data model and datalog as their query engine, with datalog the pure prolog subset / conjunctive query + recursion fragment.

Not a great start for a novel DB product...

turminal · on April 27, 2022

How does datalog the pure prolog subset store its data?

emmanueloga_ · on April 27, 2022

SWI Prolog's RDF store is implemented as a C extension. There's a description of the details here [1]. Apparently, it uses skip-lists for indexes.

As a curiosity... I implemented a very simple toy triplestore in Kotlin that used AVL TreeSets. [2] (only the storage layer, it has no other querying than a simple for-loop like mechanism :-) ).

1: https://github.com/SWI-Prolog/packages-semweb/blob/master/do...

2: https://gist.github.com/EmmanuelOga/1d52adb79bdb6092fb698ed5...

maweki · on April 27, 2022

Usually you'd have a set-like storage of n-tuples/structs per relation, as a row/tuple of a relation has fixed arity. Tree-sets of rows, for example. But you could just use SQL tables as a storage engine with one table per datalog relation.

stonemetal12 · on April 27, 2022

Datalog is a query language similar to SQL. It doesn't store data. Some database would store data however it likes then provide a datalog query interface.

triska · on April 27, 2022

Datalog also lets you define facts and rules, in addition to posting queries. A key attraction of Datalog is that the same simple and very uniform syntax is used in all cases. All these cases are special cases of a single unifying language element called Horn clauses, which are clauses with at most one positive literal: Queries have no positive literal, rules have exactly one positive literal and at least one negative literal, and facts consist of a single positive literal.

trurl · on April 27, 2022

If there is no support for recursion, this can hardly be called "Datalog".

stopachka · on April 27, 2022

If interested in supporting recursion, the SICP tutorial does a great job supporting it! I chose not to include this for the essay, but it's fun to hack. If you end up doing this, here's a reference Clojure implementation:

https://github.com/nezaj/clj-sicp/blob/master/src/logic_inte...

(The main ingredient, if I remember correctly, was `unify-match`)

refset · on April 27, 2022

Agreed! For example in DataScript's Datalog: https://github.com/tonsky/datascript/blob/4f1af628d5650e0ca0...

weatherlight · on April 27, 2022

I feel like, it would be hard to do initially

You 'd have to implement some sort of stratification to effectively make sure Datalog stays Turing incomplete when allowing for recursion.

mmarx · on April 27, 2022

No, Datalog always has a finite (though potentially exponentially large) grounding, since no rule can introduce new values to the database (i.e., it has no “value invention”/“function terms”/“existential quantification”).

Some form of stratification is usually required when you add Negation (which the demo also doesn't seem to have). But even here it's not because the language would become Turing complete, but rather to ensure that it has a consistent semantics. Consider, e.g., a program with two rules: `A :- not B`, `B :- not A`. Unlike Datalog programs, this does not have a unique minimal model, since both `{A}` and `{B}` are minimal models. Stratification is then a syntactic restriction that prevents such situtations.

zambal · on April 27, 2022

But what about potential loops, as in a cyclic graph. How to make sure they terminate?

mmarx · on April 27, 2022

For a Datalog program and a specific database, there is always a finite set of facts the program can possibly derive (that is the “finite grounding” from above), obtained in the following way: for every predicate that appears in the “head” of a rule (i.e., any predicate for which facts are potentially derived by the program), instantiate it with all combinations of values that occur in the database or (as constants) in the program (the so-called “active domain”). In general, this will be exponentially many facts (with respect to the size of the program), and polynomially many facts (with respect to the size of the database).

Entailment in Datalog is monotonic in the sense that once a fact has been derived by the program, no further rule application can invalidate this fact (this is no longer true when adding negation, and is indeed what we recover by stratifying).

Taken together, this means that you can always compute all facts derived by a Datalog program on a database by computing the least fixed point under immediate consequences (roughly “apply all rules and add the derived facts”; this approach is called “naive evaluation” in the literature, in practice, more optimised evaluation strategies are useful), and you are guaranteed to obtain this after polynomially many (with respect to the database) or exponentially many (with respect to the program) steps – every step derives at least one fact; once you have reached a state where there are no more immediate consequences, you have derived every fact that you will ever derive and can stop.

weatherlight · on April 28, 2022

very insightful, thanks!

substation13 · on April 27, 2022

Can you give an example of a query that won't work with this demo?

mmarx · on April 27, 2022

Anything involving transitive closure, for example. Say you have a directed graph, encoding an edge from `x` to `y` as `edge(x, y)` (or as a triple `(x, edge, y)`, and you want to find all vertices reachable from `a`. In standard Datalog, you'd have two rules `reachable(x, y) :- edge(x, y)` and `reachable(x, z) :- reachable(x, y), edge(y, z)`, with the second being the recursive rule.

haolez · on April 27, 2022

I was expecting to get a fish (e.g. a JS library), but the author gave me a fishing pole. Well played!

graderjs · on April 27, 2022

This is awesome. Thank you stopachka for teaching me so much incredibly valuable "force multiplying" stuff, and for de-intimidating SICP!! And all in a single post <3

stopachka · on April 27, 2022

This brightened my day to read. Thank you!

graderjs · on April 27, 2022

I hope you have a fantastic rest of your day, seriously. You're a phenom!

netcraft · on April 27, 2022

I really wish datomic was free and open source. I would love to play with it and datalog a lot more to see how things really work outside of tutorial space, but having to pay to even try it is an activation energy I don't expect I will ever overcome.

jitl · on April 27, 2022

There is a similar open-source database called XTDB: https://xtdb.com/

I still haven’t found the activation energy, though.

efitz · on April 27, 2022

The title seemed like a non sequitur to me, like “Rembrandt is n finger paints” or “Bach on the kazoo”.