- Datalog UI is a Typescript port of some of differential-datalog’s ideas https://datalogui.dev/
I think an under appreciated aspect of Datalog is as a pure specification language for relationships. Schema languages like GraphQL specify a relation between types exists, but not how it is computed. Prisma and other ORM DSLs like ActiveRecord do a better job specifying how to compute a relation from SQL tables, but support only a few strict computation types that can be modeled using SQL JOIN on columns.
Datalog lets you purely specify arbitrarily complex logical relationships between data types. Even if you were to only use Datalog as a rules engine at test time, it could still be quite powerful for validating logic implemented in different languages on the same data model, to say keep your Typescript, Kotlin and Swift logic in sync.
It is really great that people are discovering the utility of triplestores through their ubiquity in the Clojure ecosystem (e.g. Datomic and its many clones).
I do feel saddened that few seem to realise that the fundamental idea actually comes from RDF (i.e. the semantic web) and that Datomic's Datalog dialect is actually very similar to SPARQL. Datomic's primary innovation is the immutability/time travel feature, not the fact that is uses triples and works like a graph database. If you find triplestores fascinating I implore you to explore the semantic web stack too.
The most important thing that datomic and friends brought, is usability and pragmatism.
While RDF has an elegant core, it is bogged down by the complixity of the web and ancillary technologies.
XML Types, cURIes, OWL, the different encodings.
They all make RDF a nightmare to work with, and to implement in practice.
The is simply no good RDF software ecosystem, yet the Spec ecosystem continues to grow (wildly out of control).
On a technical note, SPARQL and Datomic Datalog are very different beasts.
The former is conjunctive queries with optionals and regular path expressions (i.e. Conjunctive queries + XPath + nonmonotonic defaults) while the latter is recursive conjunctive queries (i.e. Datalog).
If PostgreSQL is to SQL what the browser is to HTML then what is X to SPARQL.
One of the reasons why I don't think X "exists" is because it's not a good way of working with graph.
I sometimes think there should be some alternative tech stack to RDF/SPARQL that isn't part of the semantic web but uses many of the same ideas but you know something that people can use.
I suppose X in this case would be any of the number of graph databases out there. Ontotext GraphDB, Virtuoso, AWS Neptune.
Of course there are other GraphDBs with additional properties and custom query languages such as Neo4j and its query language Cypher. I Find that Neo4j's property graph is more intuitive to model data with than a plain RDF store and Cypher has some advantages over SPARQL as well
> In Datalog databases, there are no tables. Or really everything is just stored in one table, the triple table
The author conflates Datomic family databases which use triples as their data model and datalog as their query engine, with datalog the pure prolog subset / conjunctive query + recursion fragment.
SWI Prolog's RDF store is implemented as a C extension. There's a description of the details here [1]. Apparently, it uses skip-lists for indexes.
As a curiosity... I implemented a very simple toy triplestore in Kotlin that used AVL TreeSets. [2] (only the storage layer, it has no other querying than a simple for-loop like mechanism :-) ).
Usually you'd have a set-like storage of n-tuples/structs per relation, as a row/tuple of a relation has fixed arity. Tree-sets of rows, for example. But you could just use SQL tables as a storage engine with one table per datalog relation.
Datalog is a query language similar to SQL. It doesn't store data. Some database would store data however it likes then provide a datalog query interface.
Datalog also lets you define facts and rules, in addition to posting queries. A key attraction of Datalog is that the same simple and very uniform syntax is used in all cases. All these cases are special cases of a single unifying language element called Horn clauses, which are clauses with at most one positive literal: Queries have no positive literal, rules have exactly one positive literal and at least one negative literal, and facts consist of a single positive literal.
If interested in supporting recursion, the SICP tutorial does a great job supporting it! I chose not to include this for the essay, but it's fun to hack. If you end up doing this, here's a reference Clojure implementation:
No, Datalog always has a finite (though potentially exponentially large) grounding, since no rule can introduce new values to the database (i.e., it has no “value invention”/“function terms”/“existential quantification”).
Some form of stratification is usually required when you add Negation (which the demo also doesn't seem to have). But even here it's not because the language would become Turing complete, but rather to ensure that it has a consistent semantics. Consider, e.g., a program with two rules: `A :- not B`, `B :- not A`. Unlike Datalog programs, this does not have a unique minimal model, since both `{A}` and `{B}` are minimal models. Stratification is then a syntactic restriction that prevents such situtations.
For a Datalog program and a specific database, there is always a finite set of facts the program can possibly derive (that is the “finite grounding” from above), obtained in the following way: for every predicate that appears in the “head” of a rule (i.e., any predicate for which facts are potentially derived by the program), instantiate it with all combinations of values that occur in the database or (as constants) in the program (the so-called “active domain”). In general, this will be exponentially many facts (with respect to the size of the program), and polynomially many facts (with respect to the size of the database).
Entailment in Datalog is monotonic in the sense that once a fact has been derived by the program, no further rule application can invalidate this fact (this is no longer true when adding negation, and is indeed what we recover by stratifying).
Taken together, this means that you can always compute all facts derived by a Datalog program on a database by computing the least fixed point under immediate consequences (roughly “apply all rules and add the derived facts”; this approach is called “naive evaluation” in the literature, in practice, more optimised evaluation strategies are useful), and you are guaranteed to obtain this after polynomially many (with respect to the database) or exponentially many (with respect to the program) steps – every step derives at least one fact; once you have reached a state where there are no more immediate consequences, you have derived every fact that you will ever derive and can stop.
Anything involving transitive closure, for example. Say you have a directed graph, encoding an edge from `x` to `y` as `edge(x, y)` (or as a triple `(x, edge, y)`, and you want to find all vertices reachable from `a`. In standard Datalog, you'd have two rules `reachable(x, y) :- edge(x, y)` and `reachable(x, z) :- reachable(x, y), edge(y, z)`, with the second being the recursive rule.
This is awesome. Thank you stopachka for teaching me so much incredibly valuable "force multiplying" stuff, and for de-intimidating SICP!! And all in a single post <3
I really wish datomic was free and open source. I would love to play with it and datalog a lot more to see how things really work outside of tutorial space, but having to pay to even try it is an activation energy I don't expect I will ever overcome.
- There are several such talks at https://www.hytradboi.com/ (happening this Friday)
- Roam Research and its clones Athens, Logseq, use Datascript / ClojureScript https://github.com/tonsky/datascript
- differential-datalog isn’t an end-to-end system, but is highly optimized for quick reactivity https://github.com/vmware/differential-datalog
- Datalog UI is a Typescript port of some of differential-datalog’s ideas https://datalogui.dev/
I think an under appreciated aspect of Datalog is as a pure specification language for relationships. Schema languages like GraphQL specify a relation between types exists, but not how it is computed. Prisma and other ORM DSLs like ActiveRecord do a better job specifying how to compute a relation from SQL tables, but support only a few strict computation types that can be modeled using SQL JOIN on columns.
Datalog lets you purely specify arbitrarily complex logical relationships between data types. Even if you were to only use Datalog as a rules engine at test time, it could still be quite powerful for validating logic implemented in different languages on the same data model, to say keep your Typescript, Kotlin and Swift logic in sync.