From: Shawn Pearce <spearce@spearce.org>
To: Johan Herland <johan@herland.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded
Date: Mon, 23 May 2011 09:11:15 -0700 [thread overview]
Message-ID: <BANLkTimQURetDxYeAJr5sVRB-ja9yuqT7Q@mail.gmail.com> (raw)
In-Reply-To: <1306111923-16859-10-git-send-email-johan@herland.net>
On Sun, May 22, 2011 at 17:52, Johan Herland <johan@herland.net> wrote:
> Currently, when pushing a pack to the server that has specified a pack size
> limit, we don't detect that we exceed that limit until we have already
> generated (and started transmitting) that much pack data.
>
> Ideally, we should be able to predict the approximate pack size _before_ we
> start generating and transmitting the pack data, and abort early if the
> estimated pack size exceeds the pack size limit.
>
> This patch tries to provide such an estimate: It looks at the objects that
> are to be included in the pack, and for already-packed objects, it assumes
> that their compressed in-pack size is a good estimate of how much they will
> contribute to the pack currently being generated. This assumption should be
> valid as long as the objects are reused as-is.
This looks good to me.
> I'm not really happy with excluding loose objects in the pack size
> estimate. However, the size contributed by loose objects varies wildly
> depending on whether a (good) delta is found. Therefore, any estimate
> done at an early stage is bound to be wildly inaccurate. We could maybe
> use some sort of absolute minimum size per object instead, but I
> thought I should publish this version before spending more time futzing
> with it...
>
> A drawback of not including loose objects in the pack size estimate,
> is that pushing loose objects is a very common use case (most people
> push more often than they 'git gc'). However, for the pack sizes that
> servers are most likely to refuse (hundreds of megabytes), most of
> those objects will probably already be packed anyway (e.g. by
> 'git gc --auto'), so I still hope the pack size estimate will be useful
> when it really matters.
That is my impression too. Most servers using this feature will
probably put a limit of at least 10MB. Once you get into the 25-100M
range, the client probably has already packed the bulk of that
content. Especially if we also have Junio's new stream large blobs to
packs during git add patch. So as you point out, cases where this is
mostly useful (really huge push) this is likely to still trigger
correctly.
We can still get a tighter estimate if we wanted to. I wouldn't mix it
into this patch, but make a new one on top of it. During delta
compression we hold onto deltas, or at least compute and retain the
size of the chosen delta. We could re-check the pack size after the
Compressing phase by including the delta sizes in the estimate, and if
we are over, abort before writing.
For non-delta, non-reuse we may be able to guess by just using the
loose object size. The loose object is most likely compressed at the
same compression ratio as the outgoing pack stream will use, so a
deflate(inflate(loose)) cycle is going to be very close in total bytes
used. If we over shoot the limit by more than some fudge factor (say
8K in 1M limit or 0.7%), abort before writing.
--
Shawn.
next prev parent reply other threads:[~2011-05-23 16:11 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-23 0:51 [PATCHv4 00/10] Push limits Johan Herland
2011-05-23 0:51 ` [PATCHv4 01/10] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
2011-05-23 0:51 ` [PATCHv4 02/10] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
2011-05-23 20:06 ` Junio C Hamano
2011-05-23 22:58 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 03/10] Tighten rules for matching server capabilities in server_supports() Johan Herland
2011-05-23 0:51 ` [PATCHv4 04/10] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
2011-05-23 20:21 ` Junio C Hamano
2011-05-24 0:16 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 05/10] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
2011-05-23 23:17 ` Junio C Hamano
2011-05-24 0:18 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 06/10] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
2011-05-23 23:39 ` Junio C Hamano
2011-05-24 1:11 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 07/10] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
2011-05-24 0:09 ` Junio C Hamano
2011-05-24 1:15 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 08/10] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
2011-05-24 0:12 ` Junio C Hamano
2011-05-23 0:52 ` [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded Johan Herland
2011-05-23 16:11 ` Shawn Pearce [this message]
2011-05-23 17:07 ` Johan Herland
2011-05-24 0:18 ` Junio C Hamano
2011-05-24 1:17 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 10/10] receive-pack: Allow server to refuse pushes with too many objects Johan Herland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=BANLkTimQURetDxYeAJr5sVRB-ja9yuqT7Q@mail.gmail.com \
--to=spearce@spearce.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johan@herland.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).