From: David Aguilar <davvid@gmail.com>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: sebastianspublicaddress@googlemail.com, git@vger.kernel.org
Subject: Re: How do you best store structured data in git repositories?
Date: Thu, 3 Dec 2009 16:14:10 -0800
Message-ID: <20091204001359.GA6709@gmail.com>
In-Reply-To: <32541b130912021317y705d1d4cj28e230a3e727df2e@mail.gmail.com>

On Wed, Dec 02, 2009 at 04:17:10PM -0500, Avery Pennarun wrote:
> On Wed, Dec 2, 2009 at 4:08 PM, Sebastian Setzer
> <sebastianspublicaddress@googlemail.com> wrote:
> > Do you store everything in a single file and configure git to use
> > special diff- and merge-tools?
> > Do you use XML for this purpose?
> 
> XML is terrible for most data storage purposes.  Data exchange, maybe,
> but IMHO the best thing you can do when you get XML data is to put it
> in some other format ASAP.

I agree 100%.

JSON's not too bad for data structures and is known to
be friendly to XML expats.

http://json.org/


> That said, however, you should still try to make your files as stable
> as possible, because:
> 
> - If your program outputs the data in random order, it's just being
> sloppy anyway
> 
> - 'git diff' doesn't work usefully otherwise (for examining the data
> and debugging)


If you are using Python + simplejson, then passing the
sort_keys=True flag when serializing keeps your output
stable, because dictionary keys are always written in a
deterministic (sorted) order.
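
A minimal sketch of that idea, using the standard library's json
module (simplejson exposes the same dumps() interface). The
dump_stable name and the indent=2 choice are my own assumptions for
illustration, not anything prescribed:

```python
import json


def dump_stable(data):
    """Serialize data with sorted keys and fixed indentation.

    Two dictionaries with the same contents always produce the same
    text, regardless of insertion order, which keeps 'git diff' quiet.
    """
    return json.dumps(data, sort_keys=True, indent=2) + "\n"


# Same contents, different insertion order.
a = {"name": "widget", "id": 7, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "id": 7, "name": "widget"}

print(dump_stable(a) == dump_stable(b))
```

Note that sort_keys only orders dictionary keys; lists are written in
whatever order your program holds them, so those need to be kept
stable separately.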

Since I mentioned JSON and git in the same email, I might as
well also mention an old UGFWIINI candidate:

http://www.ordecon.com/2009/04/22/is-git-more-than-just-a-version-control-system/


Lastly, BERT might not be a good choice for storing inside
of a git repository, but it is a nice format for representing
data structures:

http://github.com/blog/531-introducing-bert-and-bert-rpc


We've been using git for tracking changes to a large set of
JSON files at $dayjob and it's worked out pretty well.

I'd suggest that you try to break your data up into multiple
files if possible.  As someone else mentioned, it's often
easier to diff and merge stuff if you structure things in a
merge-friendly way.
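
As a rough sketch of what "multiple files" could look like, here is
one way to write each top-level record to its own JSON file. The
write_records name and the one-file-per-key layout are assumptions
made up for this example, not a description of our actual setup:

```python
import json
import os


def write_records(records, directory):
    """Write each record in 'records' to '<directory>/<key>.json'.

    Keys are processed in sorted order and each file is serialized
    with sorted keys, so re-running this on unchanged data rewrites
    identical bytes and an edit to one record touches only one file.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for key in sorted(records):
        path = os.path.join(directory, key + ".json")
        with open(path, "w") as f:
            json.dump(records[key], f, sort_keys=True, indent=2)
            f.write("\n")
        paths.append(path)
    return paths
```

With a layout like this, two people editing different records change
different files, so their branches merge without conflicts.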

One feature that we've implemented is file referencing,
where a data file can "#include" another data file.  That is
the kind of thing that can make things easier on you if
you foresee having a lot of common data that can be
shared among several files.
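
To give a flavor of file referencing, here is a hypothetical sketch,
not our actual implementation. It assumes each file holds a JSON
object, that a "#include" key names another file relative to the
including one, and that local keys override included ones. All of
those conventions, and the load_with_includes name, are invented for
this example:

```python
import json
import os

INCLUDE_KEY = "#include"  # hypothetical key name chosen for this sketch


def load_with_includes(path, _stack=()):
    """Load a JSON object, merging in the file named by its "#include" key.

    Included paths are resolved relative to the including file.
    Keys in the including file override keys from the included file.
    """
    path = os.path.abspath(path)
    if path in _stack:
        raise ValueError("circular include: " + path)
    with open(path) as f:
        data = json.load(f)
    included = data.pop(INCLUDE_KEY, None)
    if included is None:
        return data
    base_path = os.path.join(os.path.dirname(path), included)
    merged = load_with_includes(base_path, _stack + (path,))
    merged.update(data)  # local keys win over included ones
    return merged
```

So a part.json containing {"#include": "common.json", "scale": 2}
would pick up everything in common.json and replace only "scale".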

-- 
		David
