git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Ali Tofigh <alix.tofigh@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: many files, simple history
Date: Fri, 14 May 2010 00:05:59 -0400	[thread overview]
Message-ID: <20100514040559.GB6075@coredump.intra.peff.net> (raw)
In-Reply-To: <AANLkTinHZbJ4obpa1FpT8boFWjNYpgU184HUTvki_A0G@mail.gmail.com>

On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:

> short version: will git handle large number of files efficiently if
> the history is simple and linear, i.e., without merges?

Short answer: large number of files, yes, large files, not really. The
shape of history is largely irrelevant.

Longer answer:

Git separates the conceptual structure of history (the digraph of
commits, and the pointers of commits to trees to blobs) from the actual
storage of objects representing that history. Problems with large files
are usually storage issues. Copying them around in packfiles is
expensive, storing an extra copy in the repo is expensive, trying deltas
and diffs is expensive. None of those things has to do with the shape of
your history. So I would expect git to handle such a load with a linear
history about as well as a complex history with merges.

For large numbers of files, git generally does a good job, especially if
those files are distributed throughout a directory hierarchy. But keep
in mind that the git repo will store another copy of every file. They
will be delta-compressed between versions, and zlib compressed overall,
but you may potentially be doubling the amount of disk space required if
you have a lot of uncompressible binary files.

For large files, git expects to be able to pull each file into memory.
Sometimes two versions if you are doing a diff. And it will copy those
files around when repacking (which you will want to do for the sake of
the smaller files). So files on the order of a few megabytes are not a
problem. If you have files in the hundreds of megabytes or gigabytes,
expect some operations to be slow (like repacking).

Really, I would start by just "git add"-ing your whole filesystem, doing
a "git repack -ad", and seeing how long it takes, and what the resulting
size is.

-Peff

  parent reply	other threads:[~2010-05-14  4:06 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-14  2:57 many files, simple history Ali Tofigh
2010-05-14  3:27 ` Michael Witten
2010-05-14 16:25   ` Ali Tofigh
2010-05-14  4:05 ` Jeff King [this message]
2010-05-14 16:18   ` Ali Tofigh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100514040559.GB6075@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=alix.tofigh@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).