From: Linus Torvalds <torvalds@linux-foundation.org>
To: Alan Manuel Gloria <almkglor@gmail.com>
Cc: Jeff King <peff@peff.net>, Nicolas Pitre <nico@cam.org>,
Jakub Narebski <jnareb@gmail.com>,
Christopher Jefferson <caj@cs.st-andrews.ac.uk>,
git@vger.kernel.org
Subject: Re: Problem with large files on different OSes
Date: Wed, 27 May 2009 18:56:49 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0905271825520.3435@localhost.localdomain>
In-Reply-To: <f95910c20905271609u63d04965oa38b8af34d7704c1@mail.gmail.com>
On Thu, 28 May 2009, Alan Manuel Gloria wrote:
>
> If you'd prefer someone else to hack it, can you at least give me some
> pointers on which code files to start looking? I'd really like to
> have proper large-file-packing support, where large file is anything
> much bigger than a megabyte or so.
>
> Admittedly I'm not a filesystems guy and I can just barely grok git's
> blobs (they're the actual files, right? except they're named with
> their hash), but not packs (err, a bunch of files?) and trees (brown
> and green stuff you plant?). Still, I can try to learn it.
The packs are a big part of the complexity.
If you were to keep the big files as unpacked blobs, that would be
fairly simple - but the pack-file format is needed for fetching and
pushing things, so it's not really an option.
For your particular case, the simplest approach is probably to just
limit the delta search. Something like just saying "if the object is
larger than X, don't even bother to try to delta it, and just pack it
without delta compression".
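As a rough illustration of that size check, here is a minimal standalone C sketch. The function name worth_trying_delta() and the EXAMPLE_DELTA_LIMIT constant are hypothetical; the 64 MiB default and the "a limit of 0 means no limit" convention are taken from the patch at the end of this message.

```c
#include <stdbool.h>

/* Hypothetical constant mirroring the patch's default: 64 MiB = 67108864 bytes. */
#define EXAMPLE_DELTA_LIMIT (64UL * 1024 * 1024)

/*
 * Sketch of the size gate: refuse to even attempt a delta for a target
 * object whose size reaches the limit, so it gets stored whole. A limit
 * of 0 disables the check, like the "pack_delta_limit && ..." test.
 */
bool worth_trying_delta(unsigned long target_size, unsigned long limit)
{
	if (limit && target_size >= limit)
		return false;	/* too big: pack as-is, no delta search */
	return true;		/* small enough: let the delta search run */
}
```

Note that the check only costs a comparison per candidate pair, so it is cheap compared to the delta search it skips.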
The code would still load that whole object in one go, but it sounds like
you can handle _one_ object at a time. So for your case, I don't think you
need a fundamental git change - you'd be ok with just an inefficient pack
format for large files that are very expensive to pack otherwise.
You can already do that by name, using .gitattributes to turn off delta
compression for matching paths, but maybe it's worth doing explicitly by size too.
I realize that the "delta" attribute is apparently almost totally
undocumented. But if your big blobs have a particular name pattern, what
you should try is to do something like
- in your '.gitattributes' file (or .git/info/attributes if you don't
want to check it in), add a line like
    *.img -delta
which now sets the 'delta' attribute to false for all objects that
match the '*.img' pattern.
- see if pack creation is now acceptable (i.e. do a "git gc" or try to push
  somewhere)
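To make the attribute semantics concrete, here is a heavily simplified C model of how a line that turns the attribute off applies to paths. In gitattributes syntax a '-' prefix (as in "*.img -delta") marks an attribute false, while a '!' prefix only leaves it unspecified. This is not git's attribute code: struct delta_rule and path_allows_delta() are hypothetical names, and the sketch assumes only the simple case of a pattern with no slash, which gitattributes matches against the file's basename, with a later matching line overriding an earlier one.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one attributes line, e.g. "*.img -delta". */
struct delta_rule {
	const char *pattern;	/* slash-free glob such as "*.img" */
	bool delta_allowed;	/* false for "-delta", true for plain "delta" */
};

/* Return the last path component, e.g. "disk.img" for "vm/disk.img". */
static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Walk the rules in order; a later matching rule overrides an earlier
 * one. Paths matching no rule keep the default: delta is attempted.
 */
bool path_allows_delta(const char *path, const struct delta_rule *rules,
		       size_t nr_rules)
{
	bool allowed = true;

	for (size_t i = 0; i < nr_rules; i++)
		if (fnmatch(rules[i].pattern, base_name(path), 0) == 0)
			allowed = rules[i].delta_allowed;
	return allowed;
}
```

Real attribute handling also covers directory-relative patterns, macros and precedence between several attribute files, none of which this sketch attempts.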
Something like the following may also work, as a more generic "just don't
even bother trying to delta huge files".
Totally untested. Maybe it works. Maybe it doesn't.
Linus
---
Documentation/config.txt | 7 +++++++
builtin-pack-objects.c | 9 +++++++++
2 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2c03162..8c21027 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1238,6 +1238,13 @@ older version of git. If the `{asterisk}.pack` file is smaller than 2 GB, howeve
 you can use linkgit:git-index-pack[1] on the *.pack file to regenerate
 the `{asterisk}.idx` file.
 
+pack.packDeltaLimit::
+	The default maximum size of objects that we try to delta.
++
+Big files can be very expensive to delta, and if they are large binary
+blobs, there is likely little upside to it anyway. So just pack them
+as-is, and don't waste time on them.
+
 pack.packSizeLimit::
 	The default maximum size of a pack. This setting only affects
 	packing to a file, i.e. the git:// protocol is unaffected. It
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9742b45..9a0072b 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -85,6 +85,7 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static unsigned long pack_delta_limit = 64*1024*1024;
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -1270,6 +1271,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	if (trg_entry->type != src_entry->type)
 		return -1;
 
+	/* If we limit delta generation, don't even bother for larger blobs */
+	if (pack_delta_limit && trg_entry->size >= pack_delta_limit)
+		return -1;
+
 	/*
 	 * We do not bother to try a delta that we discarded
 	 * on an earlier try, but only when reusing delta data.
@@ -1865,6 +1870,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		pack_size_limit_cfg = git_config_ulong(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.packdeltalimit")) {
+		pack_delta_limit = git_config_ulong(k, v);
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
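The patch reads the limit with git_config_ulong(), and git's size-valued settings generally accept k, m and g suffixes, so a value like "64m" should mean 64 MiB. Below is a hedged re-implementation sketch of that kind of parsing, not git's own code: parse_size() is a hypothetical name, suffixes are treated as case-insensitive powers of 1024, and overflow checking is omitted for brevity.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Parse "1024", "2k", "64m", "1g" and similar into a byte count.
 * Returns false for empty input or an unrecognized suffix.
 * NOTE: no overflow detection; large "g" values can wrap on 32-bit.
 */
bool parse_size(const char *value, unsigned long *out)
{
	char *end;
	unsigned long val;

	if (!value || !*value)
		return false;			/* empty values are rejected */
	val = strtoul(value, &end, 0);	/* base 0: 0x... hex and leading-0 octal also accepted */
	if (*end) {
		if (!strcasecmp(end, "k"))
			val *= 1024UL;
		else if (!strcasecmp(end, "m"))
			val *= 1024UL * 1024;
		else if (!strcasecmp(end, "g"))
			val *= 1024UL * 1024 * 1024;
		else
			return false;	/* unknown suffix or trailing junk */
	}
	*out = val;
	return true;
}
```

Under that reading, setting packDeltaLimit = 64m in the [pack] section would give the same 67108864-byte limit the patch uses as its built-in default, and a value of 0 would turn the size check off.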