From: Johan Herland <johan@herland.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded
Date: Mon, 23 May 2011 19:07:27 +0200
Message-ID: <201105231907.27639.johan@herland.net>
In-Reply-To: <BANLkTimQURetDxYeAJr5sVRB-ja9yuqT7Q@mail.gmail.com>
On Monday 23. May 2011, Shawn Pearce wrote:
> We can still get a tighter estimate if we wanted to. I wouldn't mix
> it into this patch, but make a new one on top of it. During delta
> compression we hold onto deltas, or at least compute and retain the
> size of the chosen delta. We could re-check the pack size after the
> Compressing phase by including the delta sizes in the estimate, and
> if we are over, abort before writing.
Ok. Not sure when I'll have the time/courage to dive into this, but I'll
at least give it a try.
> For non-delta, non-reuse we may be able to guess by just using the
> loose object size. The loose object is most likely compressed at the
> same compression ratio as the outgoing pack stream will use, so a
> deflate(inflate(loose)) cycle is going to be very close in total
> bytes used. If we overshoot the limit by more than some fudge
> factor (say 8K in 1M limit, or about 0.8%), abort before writing.
I already have an unsubmitted patch on top of the series that includes
the on-disk/compressed size of loose objects in the estimate. However,
it's quite intrusive (it needs to extend sha1_object_info() to return
the compressed size of loose objects). Also, since I don't yet take
delta compression into account, these numbers are obviously unreliable.
That said, in the cases where loose objects are not deltified it seems
the compressed/loose versions are about 3 to 7 bytes larger than the
corresponding compressed/packed versions. I guess this is due to the
loose files using a "<type> SP <size> NUL" text header (deflated),
whereas the pack uses a more compact binary format (not deflated).
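
To illustrate the header difference described above, here is a minimal
Python sketch (not code from the patch series). It assumes zlib's default
compression level for both forms, and the function names are hypothetical.
The loose form deflates the text header together with the content, while
the packed form prepends an undeflated type/size varint to the deflated
content.

```python
import zlib

# Object type codes used in the pack entry header.
PACK_TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}


def loose_object_size(obj_type: str, content: bytes) -> int:
    """Size of deflate("<type> SP <size> NUL" + content), as in a loose object file."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return len(zlib.compress(header + content))


def pack_header(obj_type: str, size: int) -> bytes:
    """Binary pack entry header for a non-delta object.

    First byte: continuation bit, 3-bit type, low 4 bits of the size.
    Following bytes: continuation bit plus 7 more size bits each.
    """
    byte = (PACK_TYPE_CODES[obj_type] << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def packed_entry_size(obj_type: str, content: bytes) -> int:
    """Size of an undeltified pack entry: raw header plus deflated content."""
    return len(pack_header(obj_type, len(content))) + len(zlib.compress(content))


if __name__ == "__main__":
    data = b"hello, pack size estimate\n" * 40
    print("loose :", loose_object_size("blob", data))
    print("packed:", packed_entry_size("blob", data))
```

The exact difference depends on the content and on the zlib settings, so
the sketch prints both sizes rather than asserting a particular gap.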
We could test a large corpus (e.g. linux-kernel) to find the average
difference between compressed/loose size and compressed/packed size, and
then multiply this by the number of non-delta, non-reuse objects to
determine the fudge factor you describe above.
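
As a rough sketch of that calculation (in Python, with hypothetical
function names and made-up sample sizes, not measurements from any real
repository): take the mean loose-minus-packed difference over a sample,
scale it by the object count, and abort only if the loose-based estimate
exceeds the limit by more than that amount.

```python
from typing import Iterable, Tuple


def mean_overhead(samples: Iterable[Tuple[int, int]]) -> float:
    """Average of (loose size - packed size) over (loose, packed) pairs."""
    diffs = [loose - packed for loose, packed in samples]
    return sum(diffs) / len(diffs)


def fudge_factor(avg_overhead: float, n_objects: int) -> float:
    """Expected over-estimate for n non-delta, non-reuse objects."""
    return avg_overhead * n_objects


def should_abort(estimate: int, limit: int, fudge: float) -> bool:
    """Abort before writing only if the estimate overshoots the limit by more than the fudge."""
    return estimate > limit + fudge


if __name__ == "__main__":
    # Illustrative numbers only.
    avg = mean_overhead([(105, 100), (57, 50)])
    fudge = fudge_factor(avg, n_objects=2)
    print(avg, fudge, should_abort(1_000_020, 1_000_000, fudge))
```

Whether a mean is the right statistic (rather than, say, an upper
percentile) is left open here; that depends on how conservative the abort
should be.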
Have fun! :)
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net