git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: david@lang.hm, Junio C Hamano <junkio@cox.net>,
	Dana How <danahow@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Prevent megablobs from gunking up git packs
Date: Thu, 24 May 2007 20:55:07 -0400
Message-ID: <20070525005507.GR28023@spearce.org>
In-Reply-To: <Pine.LNX.4.64.0705241828160.4648@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Thu, 24 May 2007, david@lang.hm wrote:
> > On Thu, 24 May 2007, Shawn O. Pearce wrote:
> > 
> > > Now #3 is actually really important here.  Don't forget that we
> > > *just* disabled the fancy "new loose object format".  It doesn't
> > > exist.  We can read the packfile-like loose objects, but we cannot
> > > write them anymore.  So let's say we explode a megablob into a loose
> > > object, and it's 800 MiB by itself.  Now we have to send that object
> > > to a client.  Yes, that's right, we must *RECOMPRESS* 800 MiB for
> > > no reason.  Not the best choice.  Maybe we shouldn't have deleted
> > > that packfile formatted loose object writer...
> > 
> > when did the object store get changed so that loose objects aren't
> > compressed?
> 
> That never happened. But we had a different file format for loose objects, 
> which was meant to make it easier to copy as-is into a pack. That file 
> format went away, since it was not as useful as we hoped.

That "different file format" thing was added exactly for this type
of problem.  Someone added a bunch of large blobs to their repository
and then spent a lot of time decompressing and recompressing them
during their next repack.

The recompression must happen because the deflate stream in a
standard (aka legacy) loose object contains both the Git object
header and the raw data; in a packfile the Git object header is
stored outside the deflate stream.  The "different file format"
used the packfile format, which let us store the Git object header
outside the deflate stream as well.  That meant we could just copy
the raw bytes as-is from the loose object into the packfile.
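
To make the difference concrete, here is a rough Python sketch of the
two layouts.  It is a simplified model, not Git's actual code: the
function names are made up for illustration, deltas and the packfile
framing (signature, trailer checksum) are ignored, and zlib's default
compression level is assumed.

```python
import zlib

# Object type codes used in packfile entry headers.
PACK_TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}


def legacy_loose_object(obj_type: str, data: bytes) -> bytes:
    """Legacy loose layout: "<type> <size>\\0" is deflated together with the data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return zlib.compress(header + data)


def pack_entry_header(obj_type: str, size: int) -> bytes:
    """Pack entry header, stored uncompressed in front of the deflate stream.

    First byte: continuation bit, 3 bits of type, low 4 bits of size.
    Following bytes: continuation bit plus 7 more bits of size each.
    """
    byte = (PACK_TYPE_CODES[obj_type] << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def pack_entry(obj_type: str, data: bytes) -> bytes:
    """Pack layout: external header, then a deflate stream of the data only."""
    return pack_entry_header(obj_type, len(data)) + zlib.compress(data)


def loose_to_pack_entry(loose: bytes) -> bytes:
    """Convert a legacy loose object to a pack entry.

    The header sits inside the deflate stream, so the whole object has
    to be inflated, split, and deflated again.
    """
    raw = zlib.decompress(loose)
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match payload length")
    return pack_entry(obj_type, data)
```

In this model a loose object written in the pack layout would already
be the output of pack_entry(), so moving it into a packfile is a plain
byte copy.  The legacy layout has to go through loose_to_pack_entry(),
and that is where the inflate and deflate cost for an 800 MiB blob
comes from.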

So we still store loose objects compressed, it's just that we can
no longer create loose objects that can be copied directly into
a packfile without recompression.  And that is sort of Dana's
problem here.  OK, not entirely, but whatever.

-- 
Shawn.
