git.vger.kernel.org archive mirror
* Leaving large binaries out of the packfile
@ 2010-06-10  6:25 Joshua Jensen
From: Joshua Jensen @ 2010-06-10  6:25 UTC
  To: git@vger.kernel.org

  Hi.

I've been dealing with a Subversion repository that contains a lot of
large binaries.  Git generally seems to handle them reasonably well,
although it chokes under the pressure of a 'git gc' with this git-svn
repository.  The repository packs total 2.7 gigabytes.  As it turns out,
the 250 individual blob revisions' worth of large binaries account for
about 2.4 gigabytes of that.

Sometimes, 'git gc' runs out of memory.  I have to discover which file 
is causing the problem, so I can add it to .gitattributes with a 
'-delta' flag.  Mostly, though, the repacking takes forever, and I dread 
running the operation.

As an experiment, I added a '-pack' flag to .gitattributes.  With this
flag, objects for paths matching the .gitattributes entry are left loose
in the repository instead of being packed.  During a 'git gc', instead
of recopying gigabytes of data each time, the loose objects are used.
The 'git gc' process runs very quickly with this change.
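
For illustration, a .gitattributes using both flags might look like the
fragment below.  The file patterns are made up for the example.  '-delta'
is understood by stock git, while '-pack' only has an effect with the
patch in this message applied.

```
# Hypothetical patterns; substitute whatever is large in your tree.
# Skip delta compression for these blobs (stock git):
*.psd -delta
# Leave these blobs loose instead of packing them (needs the patch below):
*.iso -pack
```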

The only issue I've found is in too_many_loose_objects().  gitk is 
always telling me the repository needs to be packed, obviously because 
of all the loose objects.

I haven't yet come up with a good idea for handling this.  I thought 
about putting the forced loose objects in a separate directory.  (This 
idea goes along with another that I want to build on top of this 
functionality, the ability to commit and have -pack binaries go to an 
alternates location.)  I have also thought about writing out a file with 
the count of forced loose objects and using that to drive the 
guesstimate made by too_many_loose_objects() down.
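
A minimal sketch of the count-file idea, assuming the check works the way
I understand it: one of the 256 fan-out directories is sampled and its
entry count is compared against (gc.auto + 255) / 256.  The count file,
read_forced_loose_count() and too_many_loose_adjusted() are hypothetical
names for illustration, not existing git code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the number of intentionally loose objects
 * from a count file.  Returns 0 when the file is missing or malformed,
 * so the estimate falls back to the unadjusted behaviour.
 */
static unsigned long read_forced_loose_count(const char *path)
{
	unsigned long n = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%lu", &n) != 1)
		n = 0;
	fclose(f);
	return n;
}

/*
 * sampled:      loose objects seen in the one sampled fan-out directory
 * forced_total: repository-wide count of forced-loose objects
 * gc_auto:      the gc.auto threshold (assumed nonzero here)
 *
 * Returns 1 when the adjusted estimate still exceeds the threshold.
 */
static int too_many_loose_adjusted(unsigned long sampled,
				   unsigned long forced_total,
				   unsigned long gc_auto)
{
	unsigned long per_dir_threshold, forced_per_dir;

	assert(gc_auto > 0);
	per_dir_threshold = (gc_auto + 255) / 256;

	/*
	 * Object names are spread evenly over the 256 directories, so
	 * about 1/256 of the forced-loose objects land in the sample.
	 */
	forced_per_dir = forced_total / 256;

	if (sampled <= forced_per_dir)
		return 0;
	return sampled - forced_per_dir > per_dir_threshold;
}
```

The caller would pass read_forced_loose_count() on the count file as
forced_total.  Rounding the per-directory share down keeps the adjustment
conservative; rounding up would suppress the warning more eagerly.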

Does anyone have any thoughts?

Thanks!

Josh

---
  builtin/pack-objects.c |   25 +++++++++++++++++++++++++
  1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 214d7ef..f33a7fb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -644,6 +644,28 @@ static int no_try_delta(const char *path)
      return 0;
  }

+static void setup_pack_attr_check(struct git_attr_check *check)
+{
+    static struct git_attr *attr_pack;
+
+    if (!attr_pack)
+        attr_pack = git_attr("pack");
+
+    check[0].attr = attr_pack;
+}
+
+static int must_pack(const char *path)
+{
+    struct git_attr_check check[1];
+
+    setup_pack_attr_check(check);
+    if (git_checkattr(path, ARRAY_SIZE(check), check))
+        return 1;
+    if (ATTR_FALSE(check->value))
+        return 0;
+    return 1;
+}
+
  static int add_object_entry(const unsigned char *sha1, enum object_type type,
                  const char *name, int exclude)
  {
@@ -667,6 +689,9 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
      if (!exclude && local && has_loose_object_nonlocal(sha1))
          return 0;

+    if (name && !must_pack(name))
+        return 0;
+
      for (p = packed_git; p; p = p->next) {
          off_t offset = find_pack_entry_one(sha1, p);
          if (offset) {
--
1.7.1.msysgit.3.1.g108b5.dirty

Thread overview: 5+ messages
2010-06-10  6:25 Leaving large binaries out of the packfile Joshua Jensen
2010-06-10 18:04 ` Shawn O. Pearce
2010-06-11 15:29   ` Paolo Bonzini
2010-06-11 16:17     ` Shawn O. Pearce
2010-06-24  6:32   ` Joshua Jensen
