From: "Marcel M. Cary" <marcel@oak.homeunix.org>
To: Adam Heath <doogie@brainfood.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: large(25G) repository in git
Date: Thu, 26 Mar 2009 08:43:39 -0700
Message-ID: <49CBA2AB.30304@oak.homeunix.org>
In-Reply-To: <49C7FAB3.7080301@brainfood.com>

Adam Heath wrote:
> We maintain a website in git.  This website has a bunch of backend
> server code, and a bunch of data files.  A lot of these files are full
> videos.
>
> We use git, so that the distributed nature of website development can
> be supported.  Quite often, you'll have a production server, with
> online changes occurring (we support in-browser editing of content), a
> preview server, where large-scale code changes can be previewed, then
> a development server, one per programmer (or more).

My company manages code in a similar way, except we avoid this kind of
issue (with 100 gigabytes of user-uploaded images and other data) by not
checking in the data.  We even went so far as to halve the size of
our repository by removing 2GB of non-user-supplied images -- rounded
corners, background gradients, logos, etc.  This made Git
noticeably faster.

While I'd love to be able to handle your kind of use case and data size
with Git in that way, it's a little beyond the intended usage to handle
hundreds of gigabytes of binary data, I think.

I imagine that as your web site grows, which I'm assuming is your goal,
scaling Git will continue to be a challenge.

Maybe you can find a way to:

* Get along with less data in your non-production environments; we're
hoping to be able to do this eventually

* Find other ways to copy it; we use rsync even though it does take
forever to crawl over the file system

* Put your data files in a separate Git repository, at least, assuming
you check in, update, and release code more often than your video files.
That way you'll experience pain less often, and maybe even be able to
tune your repository differently.

Marcel
