git.vger.kernel.org archive mirror
From: Adam Heath <doogie@brainfood.com>
To: git@vger.kernel.org
Subject: large(25G) repository in git
Date: Mon, 23 Mar 2009 16:10:11 -0500
Message-ID: <49C7FAB3.7080301@brainfood.com>

We maintain a website in git.  This website has a bunch of backend
server code, and a bunch of data files.  A lot of these files are full
videos.

We use git so that the distributed nature of website development can
be supported.  Quite often, you'll have a production server, with
online changes occurring (we support in-browser editing of content), a
preview server, where large-scale code changes can be previewed, and
then a development server, one or more per programmer.

Last Friday, I was doing a checkin on the production server, and found
1.6G of new files.  git had no trouble committing that.  However,
pushing was problematic.  I was pushing over ssh, so a new ssh
connection was opened to the preview server.  After that, git tried
to create a new pack file.  This took *ages*, and the ssh connection
died.  So did git, when it finally finished the new pack and
discovered the ssh connection was gone.

So, to work around that, I ran git gc.  When done, I discovered that
git repacked the *entire* repository.  While not something I care for,
I can understand that, and live with it.  It just took *hours* to do so.

Then, what really annoys me is that when I finally did the push, it
tried sending the single 27G pack file, when the remote already had
25G of the repository in several different packs (the site was an
hg->git conversion).  This part is just unacceptable.

So, here are my questions/observations:

1: Handle the case of the ssh connection dying during git push (seems
simple).

2: Is there an option to tell git to *not* be so thorough when trying
to find similar files?  videos/doc/pdf/etc aren't always very
deltafiable, so I'd be happy to just do full content compares.

3: Delta packs seem to be poorly done.  It seems that if one repo gets
repacked completely, the entire new pack gets sent, even when the
target has most of the objects already.

4: Are there any config options I can set to help with this?  There are
tons of options, and some documentation as to what each one does, but
no recommended-practices doc that describes what should be done
for different kinds of workflows.
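
For questions 2 and 4, the kind of settings in question might look
like the sketch below.  It is only an illustration: the file
extensions and size values are assumptions, not tested
recommendations, and core.bigFileThreshold may not be available in
older git releases.  The first fragment marks paths so that git does
not attempt delta compression on them; the second limits how much
work and memory a repack spends.

```
# .gitattributes at the top of the repository
# Paths with the "delta" attribute unset are packed without any
# attempt at delta compression.  Extensions here are examples only.
*.avi -delta
*.mp4 -delta
*.pdf -delta
```

```
# .git/config (or ~/.gitconfig); all values are illustrative
[core]
	# Blobs larger than this are stored without attempting deltas.
	bigFileThreshold = 64m
[pack]
	# Upper bound on memory each thread uses for the delta window.
	windowMemory = 256m
	# When repacking to disk, split the result into packs of at
	# most this size instead of one huge pack.
	packSizeLimit = 2g
```

Neither fragment changes what a push has to transfer; they only
affect how objects are packed.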

ps: Thank you for your time.  I hope that someone has answers for me.

pps: I'm not subscribed, please cc me.  If I need to be subscribed,
I'll do so, if told.

