git.vger.kernel.org archive mirror
* git-svn has a _lot_ of metadata
@ 2007-10-16 10:22 Karl Hasselström
  2007-10-16 13:22 ` Sam Vilain
From: Karl Hasselström @ 2007-10-16 10:22 UTC
  To: Eric Wong; +Cc: git list

I just imported an svn repository with about 120 tags and 140
branches, and with some repacking got the pack file down to a
comfortable 80 MB. However, .git is over 600 MB, owing to about 520 MB
of git-svn metadata. (This wasn't a problem when I only tracked a
handful of branches, since they're only a few megs apiece.)

There appear to be two kinds of metadata that take up a significant
fraction of the space.

  * An index file is saved for each branch and tag. I presume this
    corresponds to the branch head, and is used to speed up importing
    of new revisions to that branch. However, recreating an index with
    git-read-tree is very fast, so I don't think these need to be
    saved between git-svn runs.

  * A "rev_db" file is saved for each branch and tag. This is a text
    file with one sha1 per line -- I seem to remember that line X of
    this file is the commit sha1 of svn revision X. For revisions that
    didn't touch this branch/tag, there's a line of 40 zeros. And
    since a revision typically touches just one branch, each file is
    almost all zeros unless the number of branches is very small.

    This could probably be stored _much_ more efficiently. Just
    gzipping it with the standard options shrinks it by a factor of
    between 4 (for one of the busiest branches) and 300 (for a tag,
    which is written just once). But I understand that we need quick
    random access here?
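
To make the rev_db layout described above concrete, here is a minimal
Python sketch. It is not git-svn code. It assumes the layout is as
remembered above (one 40-digit sha1 plus a newline per revision), that
records start at revision 0, and that a lookup is a slice at a computed
byte offset. The names build_rev_db and lookup, the tag revision 5000,
and the 10000-revision history are all made up for illustration.

```python
import gzip
import hashlib

SHA1_HEX_LEN = 40
RECORD_LEN = SHA1_HEX_LEN + 1  # 40 hex digits plus a newline
NULL_SHA1 = "0" * SHA1_HEX_LEN


def build_rev_db(commits, max_rev):
    """Return rev_db-style bytes: one fixed-width line per svn revision.

    `commits` maps revision number -> 40-digit hex sha1. Revisions that
    did not touch the branch get a line of 40 zeros. Assumption: the
    first line corresponds to revision 0.
    """
    lines = [commits.get(rev, NULL_SHA1) for rev in range(max_rev + 1)]
    return ("\n".join(lines) + "\n").encode("ascii")


def lookup(rev_db, rev):
    """Random access: the record for `rev` starts at byte rev * RECORD_LEN."""
    start = rev * RECORD_LEN
    record = rev_db[start:start + SHA1_HEX_LEN].decode("ascii")
    if len(record) < SHA1_HEX_LEN or record == NULL_SHA1:
        return None  # past the end of the file, or revision not on this branch
    return record


# Hypothetical tag: written exactly once, at revision 5000 of 10000.
tag_sha1 = hashlib.sha1(b"hypothetical tag commit").hexdigest()
rev_db = build_rev_db({5000: tag_sha1}, max_rev=10000)

# Compare raw size with gzip at level 6 (the gzip command's default level).
print(len(rev_db), len(gzip.compress(rev_db, compresslevel=6)))
print(lookup(rev_db, 5000) == tag_sha1, lookup(rev_db, 4999))
```

The fixed record width is what makes the lookup cheap, and it is also
why a file for a rarely touched branch or tag is mostly zeros.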

The index files should be easy enough to erase between runs, if they
indeed just correspond to the branch head. The rev_db files are
trickier; exactly what kind of lookups are required? Could it perhaps
be done with just one file, instead of one per branch/tag?
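
One possible direction, sketched in Python purely as an illustration:
store only the revisions that actually touched a branch, as sorted
fixed-width binary records, and find a revision by binary search. This
is a hypothetical format, not something git-svn is known to do here. The
record layout (4-byte revision number plus 20-byte raw sha1) and the
names pack_sparse and lookup_sparse are assumptions.

```python
import struct

# Hypothetical record: 4-byte big-endian revision number, 20-byte raw sha1.
RECORD = struct.Struct(">I20s")


def pack_sparse(commits):
    """Pack {revision: hex sha1} into records sorted by revision number."""
    return b"".join(
        RECORD.pack(rev, bytes.fromhex(sha1))
        for rev, sha1 in sorted(commits.items())
    )


def lookup_sparse(data, rev):
    """Binary search over the fixed-width records; None if `rev` is absent."""
    lo, hi = 0, len(data) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        mid_rev, raw = RECORD.unpack_from(data, mid * RECORD.size)
        if mid_rev == rev:
            return raw.hex()
        if mid_rev < rev:
            lo = mid + 1
        else:
            hi = mid
    return None


# Made-up example: a branch touched by only three revisions.
sparse = pack_sparse({7: "cc" * 20, 10: "aa" * 20, 5000: "bb" * 20})
print(len(sparse), lookup_sparse(sparse, 5000), lookup_sparse(sparse, 11))
```

With this layout the file size grows with the number of commits on the
branch rather than with the highest svn revision number, at the cost of
O(log n) lookups instead of a single seek.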

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle
