From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>, Nicolas Pitre <nico@fluxnic.net>
Subject: Re: [PATCH] fast-import: Stream very large blobs directly to pack
Date: Fri, 29 Jan 2010 07:22:54 -0800
Message-ID: <20100129152254.GC21821@spearce.org>
In-Reply-To: <7vockdjx6w.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > +static void stream_blob(
> > +	uintmax_t len,
> > +	unsigned char *sha1out,
> > +	uintmax_t mark)
> 
> A funny way to indent and line wrap...

Consistent with something else in the same file that has a similar
task... store_object().  I literally copied the lines from there
and pasted them here.
 
> > +	/* Determine if we should auto-checkpoint. */
> > +	if ((pack_size + 60 + len) > max_packsize
> > +		|| (pack_size + 60 + len) < pack_size)
> > +		cycle_packfile();
> 
> What's "60" in this math?

IIRC, 60 is 20 bytes for the SHA-1 footer, plus another 40 bytes of
padding for the object header and the deflate() wrapping.

Again, the 60 was stolen from the existing store_object(), which
already makes the same assumption.  Only there I think len is the
fully compressed size, so the deflate() wrapping is already being
accounted for.
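
For what it's worth, spelled out with a named constant the check
would read something like this (PACK_SIZE_SLOP and
maybe_cycle_packfile are names I'm making up for illustration, not
what fast-import.c actually uses):

  #include <stdint.h>

  /* Globals and helpers as in fast-import.c, declared here for context. */
  extern uintmax_t pack_size;
  extern uintmax_t max_packsize;
  extern void cycle_packfile(void);

  /*
   * Hypothetical name for the flat 60-byte allowance: 20 bytes for
   * the pack's trailing SHA-1, plus ~40 bytes of headroom for the
   * object header and the deflate() wrapping.
   */
  #define PACK_SIZE_SLOP 60

  static void maybe_cycle_packfile(uintmax_t len)
  {
          /* The second clause catches uintmax_t wraparound on huge blobs. */
          if ((pack_size + PACK_SIZE_SLOP + len) > max_packsize
              || (pack_size + PACK_SIZE_SLOP + len) < pack_size)
                  cycle_packfile();
  }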
 
> If the data is not compressible, we could even grow and the end result
> might be more than (pack_size + len), busting max_packsize.

Yea, that's a good point.  I probably should run the length through
deflateBound() and use that here in the test.  But I didn't think it
would make a huge difference.  If the file isn't compressible, what
is the real overhead on top of the uncompressed size?  There's a
header and footer, and instructions in the stream to indicate
literal data, but it's not like it's doubling in size.
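
Roughly like this, if we did it (checkpoint_worst_case is a made-up
name; the flat 20 keeps only the SHA-1 trailer, since deflateBound()
already covers the deflate() framing):

  #include <stdint.h>
  #include <string.h>
  #include <zlib.h>

  extern uintmax_t pack_size;
  extern uintmax_t max_packsize;
  extern void cycle_packfile(void);

  /*
   * Sketch only, not what the patch does.  Note uLong is 32 bits on
   * some platforms, so a real version would have to watch for
   * truncation on very large blobs.
   */
  static void checkpoint_worst_case(uintmax_t len)
  {
          z_stream s;
          uLong worst;

          memset(&s, 0, sizeof(s));
          deflateInit(&s, Z_DEFAULT_COMPRESSION);
          worst = deflateBound(&s, (uLong)len);  /* upper bound on deflated size */
          deflateEnd(&s);

          if ((pack_size + 20 + worst) > max_packsize
              || (pack_size + 20 + worst) < pack_size)
                  cycle_packfile();
  }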

> As we are
> streaming out, we cannot say "oops, let me try again after truncating and
> closing the current file and then opening a new file", and instead may
> have to copy the data from the current one to a new one, and truncate the
> current one.  Is this something worth worrying about?

Hmm.  I'm not that worried about it, but then there's the case of
a blob that is larger than the max pack size.  We can't store that,
period.  Should we try to exceed max pack size for that one object?
Or should we refuse?

The major reason for this test is to ensure an object's offset
starts before the 4 GiB boundary, so idx v1 can hold the offset.
A much less important reason is to try to support older 32-bit
filesystems, which can't do more than 2 GiB per file.
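
If we go the refuse route, the guard could be as small as this
(check_blob_fits is illustrative only; there's no such function in
the patch):

  #include <stdint.h>
  #include <inttypes.h>

  extern uintmax_t max_packsize;
  extern void die(const char *err, ...);  /* git's usual fatal-error helper */

  /*
   * Illustrative only.  A blob bigger than the pack limit can never
   * fit, so refusing up front beats writing a pack that idx v1 could
   * not index anyway.
   */
  static void check_blob_fits(uintmax_t len)
  {
          if (len > max_packsize - 60)  /* leave room for header and trailer */
                  die("blob of %" PRIuMAX " bytes exceeds the pack size limit", len);
  }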

-- 
Shawn.

Thread overview: 27+ messages
2010-01-29  1:23 [PATCH] fast-import: Stream very large blobs directly to pack Shawn O. Pearce
2010-01-29  2:33 ` Nicolas Pitre
2010-01-29  2:37   ` Shawn O. Pearce
2010-01-29  5:29 ` Junio C Hamano
2010-01-29 15:22   ` Shawn O. Pearce [this message]
2010-01-29 16:38     ` [PATCH v2] " Shawn O. Pearce
2010-01-29 18:29       ` Jakub Narebski
2010-01-29 18:30         ` Shawn O. Pearce
2010-01-29 23:02           ` A Large Angry SCM
2010-01-30  7:17             ` Junio C Hamano
2010-01-29 18:35 ` [PATCH] " Sverre Rabbelier
2010-01-29 18:37   ` Shawn O. Pearce
2010-01-29 18:41     ` Sverre Rabbelier
2010-01-29 18:44       ` Shawn O. Pearce
2010-01-30  3:41     ` Junio C Hamano
2010-01-30  6:19       ` Junio C Hamano
2010-01-30  7:33         ` Junio C Hamano
2010-02-01 15:28           ` Shawn O. Pearce
2010-02-01 20:14             ` Junio C Hamano
2010-02-04  2:01             ` Junio C Hamano
2010-02-04  2:07               ` Shawn O. Pearce
2010-02-04  2:25                 ` Junio C Hamano
2010-02-04  2:27                   ` Junio C Hamano
2010-02-04  2:30                     ` Shawn O. Pearce
2010-02-04  2:28               ` Nicolas Pitre
2010-02-01 15:41           ` [PATCH] fast-import: Document the core.bigFileThreshold configuration setting Shawn O. Pearce
2010-02-01 15:23         ` [PATCH] fast-import: Stream very large blobs directly to pack Shawn O. Pearce
