From: Johan Herland <johan@herland.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Shawn Pearce <spearce@spearce.org>,
Johan Herland <johan@herland.net>,
git@vger.kernel.org
Subject: [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded
Date: Mon, 23 May 2011 02:52:02 +0200
Message-ID: <1306111923-16859-10-git-send-email-johan@herland.net>
In-Reply-To: <1306111923-16859-1-git-send-email-johan@herland.net>
Currently, when pushing a pack to the server that has specified a pack size
limit, we don't detect that we exceed that limit until we have already
generated (and started transmitting) that much pack data.
Ideally, we should be able to predict the approximate pack size _before_ we
start generating and transmitting the pack data, and abort early if the
estimated pack size exceeds the pack size limit.
This patch tries to provide such an estimate: It looks at the objects that
are to be included in the pack, and for already-packed objects, it assumes
that their compressed in-pack size is a good estimate of how much they will
contribute to the pack currently being generated. This assumption should be
valid as long as the objects are reused as-is.
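
As an illustration of the estimate described above, here is a minimal, self-contained sketch. It is not the patch's code: the toy_* names and the sentinel-terminated offset array are assumptions made for this example, standing in for the pack's reverse index. The in-pack size of an object is taken as the distance from its offset to the next object's offset, and the fixed overhead is the 12-byte pack header plus the 20-byte SHA-1 trailer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a reverse-index entry: object offsets in an
 * existing pack, sorted ascending, followed by one sentinel entry that
 * marks where the last object ends.
 */
struct toy_revindex_entry {
	uint64_t offset;
};

/*
 * In-pack size of the object at index `pos`: the distance from its
 * offset to the offset of the next entry. `nr` counts real objects;
 * the array holds nr + 1 entries (sentinel at index nr).
 */
static uint64_t toy_in_pack_size(const struct toy_revindex_entry *idx,
				 size_t nr, size_t pos)
{
	if (pos >= nr)
		return 0; /* out of range: contributes nothing */
	return idx[pos + 1].offset - idx[pos].offset;
}

/* Sum the in-pack sizes of the selected objects, plus fixed overhead. */
static uint64_t toy_estimate_pack_size(const struct toy_revindex_entry *idx,
				       size_t nr, const size_t *selected,
				       size_t nr_selected)
{
	uint64_t sum = 12 + 20; /* pack header + SHA-1 trailer */
	size_t i;

	for (i = 0; i < nr_selected; i++)
		sum += toy_in_pack_size(idx, nr, selected[i]);
	return sum;
}
```

The sum is only as good as the reuse assumption: an object that ends up re-deltified or recompressed may contribute a different number of bytes.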
For loose objects that are to be included in the pack, we currently have no
good estimate as to how much they will contribute to the pack size. Since
it's better to underestimate (because an overestimation will prevent us
from sending a pack that might actually be within the pack size limit),
we don't include loose objects at all in the pack size estimate. This makes
the estimate of limited use in the common workflow where most of the pushed
objects are still loose at push time.
The estimate is generated before the "Compressing" and "Writing" phases of
the push, so if the estimate exceeds the pack size limit, we abort before
sending any pack data to the server.
If the estimate turns out to be too low (e.g. because we're pushing many
loose objects), there is still code in place to abort the push when we
reach the pack size limit during transmission.
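
The two lines of defense described above (an early check against the estimate, then a running check while pack data is written) can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual send-pack/pack-objects code: toy_push() and its result codes are hypothetical names, and the real limit check may be placed differently relative to the write.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_push_result {
	TOY_PUSH_OK,
	TOY_PUSH_REJECTED_EARLY, /* estimate already above the limit */
	TOY_PUSH_ABORTED_MIDWAY  /* limit reached while writing pack data */
};

/*
 * `estimate` is the (possibly too low) size predicted up front.
 * `actual_sizes` holds the bytes each object really contributes when
 * written. A `limit` of 0 means "no limit". `*sent` receives the number
 * of bytes written before the function returned.
 */
static enum toy_push_result toy_push(uint64_t estimate,
				     const uint64_t *actual_sizes, size_t nr,
				     uint64_t limit, uint64_t *sent)
{
	size_t i;

	*sent = 0;

	/* Stage 1: refuse before any pack data is produced. */
	if (limit && estimate > limit)
		return TOY_PUSH_REJECTED_EARLY;

	/* Stage 2: the estimate may have been too low, so keep counting. */
	for (i = 0; i < nr; i++) {
		if (limit && *sent + actual_sizes[i] > limit)
			return TOY_PUSH_ABORTED_MIDWAY;
		*sent += actual_sizes[i];
	}
	return TOY_PUSH_OK;
}
```

With a limit of 500 and objects of 100, 200 and 300 bytes, an estimate of 700 is rejected with nothing sent, while an underestimate of 100 gets as far as 300 bytes before the second check stops it.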
Signed-off-by: Johan Herland <johan@herland.net>
---
I'm not really happy with excluding loose objects from the pack size
estimate. However, the size contributed by loose objects varies wildly
depending on whether a (good) delta is found. Therefore, any estimate
done at an early stage is bound to be wildly inaccurate. We could maybe
use some sort of absolute minimum size per object instead, but I
thought I should publish this version before spending more time futzing
with it...
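
For reference, the "absolute minimum size per object" idea could look something like the sketch below. Everything here is hypothetical: the toy_* names are made up for illustration, and the floor value (roughly one object-header byte plus an empty zlib stream) is a placeholder assumption, not a measured or agreed-upon number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lower bound for an object not yet in a pack (placeholder). */
#define TOY_MIN_LOOSE_BYTES 9

struct toy_entry {
	int is_packed;         /* nonzero if reusable from an existing pack */
	uint64_t in_pack_size; /* meaningful only when is_packed is set */
};

/*
 * Packed objects contribute their in-pack size; loose objects contribute
 * only the fixed floor, so the result remains a deliberate underestimate.
 */
static uint64_t toy_estimate_with_floor(const struct toy_entry *entries,
					size_t nr)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		sum += entries[i].is_packed ? entries[i].in_pack_size
					    : TOY_MIN_LOOSE_BYTES;
	return sum;
}
```

Such a floor only matters when a push contains a very large number of loose objects; for a handful of large loose blobs it changes almost nothing.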
A drawback of not including loose objects in the pack size estimate
is that pushing loose objects is a very common use case (most people
push more often than they 'git gc'). However, for the pack sizes that
servers are most likely to refuse (hundreds of megabytes), most of
those objects will probably already be packed anyway (e.g. by
'git gc --auto'), so I still hope the pack size estimate will be useful
when it really matters.
...Johan
builtin/pack-objects.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e226053..c0c6a0a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1141,23 +1141,46 @@ static int pack_offset_sort(const void *_a, const void *_b)
(a->in_pack_offset > b->in_pack_offset);
}
+static unsigned long estimate_packed_size(const struct object_entry *entry)
+{
+ unsigned long ret;
+ if (entry->in_pack) {
+ /* Assume that all packed objects are reused as-is */
+ struct revindex_entry *revidx = find_pack_revindex(
+ entry->in_pack,
+ entry->in_pack_offset);
+ return revidx[1].offset - entry->in_pack_offset;
+ }
+ return 0;
+}
+
static void get_object_details(void)
{
uint32_t i;
struct object_entry **sorted_by_offset;
+ unsigned long sum_size = 0;
sorted_by_offset = xcalloc(nr_objects, sizeof(struct object_entry *));
for (i = 0; i < nr_objects; i++)
sorted_by_offset[i] = objects + i;
qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
+ if (pack_to_stdout && pack_size_limit)
+ sum_size = sizeof(struct pack_header) + 20; /* pack overhead */
+
for (i = 0; i < nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
if (big_file_threshold <= entry->size)
entry->no_try_delta = 1;
+ if (pack_to_stdout && pack_size_limit && !entry->preferred_base)
+ sum_size += estimate_packed_size(entry);
}
+ if (pack_to_stdout && pack_size_limit && sum_size > pack_size_limit)
+ die("estimated pack size exceeds the pack size limit (%lu bytes)",
+ pack_size_limit);
+
free(sorted_by_offset);
}
--
1.7.5.rc1.3.g4d7b
Thread overview: 26+ messages
2011-05-23 0:51 [PATCHv4 00/10] Push limits Johan Herland
2011-05-23 0:51 ` [PATCHv4 01/10] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
2011-05-23 0:51 ` [PATCHv4 02/10] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
2011-05-23 20:06 ` Junio C Hamano
2011-05-23 22:58 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 03/10] Tighten rules for matching server capabilities in server_supports() Johan Herland
2011-05-23 0:51 ` [PATCHv4 04/10] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
2011-05-23 20:21 ` Junio C Hamano
2011-05-24 0:16 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 05/10] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
2011-05-23 23:17 ` Junio C Hamano
2011-05-24 0:18 ` Johan Herland
2011-05-23 0:51 ` [PATCHv4 06/10] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
2011-05-23 23:39 ` Junio C Hamano
2011-05-24 1:11 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 07/10] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
2011-05-24 0:09 ` Junio C Hamano
2011-05-24 1:15 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 08/10] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
2011-05-24 0:12 ` Junio C Hamano
2011-05-23 0:52 ` Johan Herland [this message]
2011-05-23 16:11 ` [PATCHv4 09/10] pack-objects: Estimate pack size; abort early if pack size limit is exceeded Shawn Pearce
2011-05-23 17:07 ` Johan Herland
2011-05-24 0:18 ` Junio C Hamano
2011-05-24 1:17 ` Johan Herland
2011-05-23 0:52 ` [PATCHv4 10/10] receive-pack: Allow server to refuse pushes with too many objects Johan Herland