git.vger.kernel.org archive mirror
From: Cedric Sodhi <manday@openmail.cc>
To: Simon Richter <Simon.Richter@hogyros.de>
Cc: git@vger.kernel.org
Subject: Re: Git for structured data
Date: Sun, 7 Dec 2025 18:23:30 +0100	[thread overview]
Message-ID: <aTW4EsiThW7WFwSl@air> (raw)
In-Reply-To: <2ae0a2d5-e909-4c51-9459-83f5c6950d51@hogyros.de>

Hi Simon

If you are suggesting that I implement a VCS inside the database instead of building on Git, I'm sceptical: I'd end up re-implementing Git's features.

I agree with many things you say. There is no magic recipe to apply Git to relational databases; specific tools -- at the very least one per database type, but possibly tailored further to the specific data it holds -- would have to be written.

However, I do think Git generalizes to RDBMSs more readily than it may seem. In fact, a single method for mapping a DB onto something isomorphic to a filesystem, which Git already knows how to handle, would fit 99% of all cases. Your KiCad example (whose content can be thought of as being stored in an RDBMS) is a good illustration:

If you normalize the file (one element per line) and sort the lines by the UIDs of the individual elements, you'd see no diff unless a UID or an element's contents change. And UIDs typically wouldn't change unless you actually delete and recreate something. Every table in the DB that is normalized this way could be mapped to a single file. Of course, a finer-grained mapping than one table per file, such as one row per file, would also be possible. In fact, mapping one row (or one element of the schematic/layout) per file would exploit Git's ability to detect "moves" (renames), meaning you wouldn't even need UIDs for the elements to get good diffs. There is power in the semantics of the filesystem hierarchy, which you lose when all contents become a single database or KiCad file.
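To make the normalization idea concrete, here is a minimal Python sketch, assuming an SQLite database and a hypothetical `footprint` table keyed by a text `uid` column (this is not KiCad's actual file format). The hypothetical `dump_table_sorted` writes one table as one text file, one row per line, sorted by UID; the hypothetical `dump_table_as_files` shows the finer-grained row-per-file mapping. All names here are illustrative only.

```python
import os
import sqlite3
import tempfile


def dump_table_sorted(conn, table, key):
    """Serialize one table as text: a header line, then one row per line,
    ordered by the stable key column so the output ignores storage order."""
    # Table/column names are interpolated directly: only use trusted names.
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"')
    cols = [d[0] for d in cur.description]
    lines = [",".join(cols)]
    for row in cur:
        # repr() is a simple, unambiguous encoding chosen for this sketch only.
        lines.append(",".join(repr(v) for v in row))
    return "\n".join(lines) + "\n"


def dump_table_as_files(conn, table, key, root):
    """Finer-grained mapping: <root>/<table>/<key> holds one row,
    one 'column=value' pair per line. Returns the sorted file names."""
    tdir = os.path.join(root, table)
    os.makedirs(tdir, exist_ok=True)
    cur = conn.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    kidx = cols.index(key)
    for row in cur:
        with open(os.path.join(tdir, str(row[kidx])), "w") as f:
            for col, val in zip(cols, row):
                f.write(f"{col}={val!r}\n")
    return sorted(os.listdir(tdir))


def make_db(rows):
    """Build an in-memory DB with a hypothetical 'footprint' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE footprint (uid TEXT PRIMARY KEY, ref TEXT, x REAL, y REAL)"
    )
    conn.executemany("INSERT INTO footprint VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    rows = [("a1", "R1", 10.0, 5.0), ("b2", "C1", 12.5, 5.0), ("c3", "U1", 20.0, 8.0)]
    before = dump_table_sorted(make_db(rows), "footprint", "uid")
    # Same rows inserted in a different order: identical text, so no diff.
    shuffled = dump_table_sorted(make_db(rows[::-1]), "footprint", "uid")
    print(before == shuffled)
    # Move one element: exactly one line differs.
    moved = [rows[0], ("b2", "C1", 14.0, 5.0), rows[2]]
    after = dump_table_sorted(make_db(moved), "footprint", "uid")
    print([l for l in after.splitlines() if l not in before.splitlines()])
    with tempfile.TemporaryDirectory() as root:
        print(dump_table_as_files(make_db(rows), "footprint", "uid", root))
```

Inserting the same rows in a different order yields byte-identical output, and editing one element changes exactly one line. In this row-per-file variant the file names are still the keys; relying on Git's rename detection instead of UIDs would require naming the files some other way, which this sketch does not attempt.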

In short: in my opinion, there really doesn't seem to be any algorithmic difficulty. The only thing that stops Git from practicably versioning databases is its inability to access their contents transparently, even in the most trivial manner.

Best,
Cedric


Thread overview: 6+ messages
2025-12-05 16:51 Git for structured data Cedric Sodhi
2025-12-06 16:27 ` René Scharfe
2025-12-06 18:47   ` Cedric Sodhi
2025-12-06 21:02     ` Christian Couder
2025-12-07  5:26 ` Simon Richter
2025-12-07 17:23   ` Cedric Sodhi [this message]
