From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/7] oid_array: use size_t for count and allocation
Date: Tue, 14 Apr 2020 18:27:58 -0600 [thread overview]
Message-ID: <20200415002758.GC7457@syl.local> (raw)
In-Reply-To: <20200330140309.GA2456038@coredump.intra.peff.net>
On Mon, Mar 30, 2020 at 10:03:09AM -0400, Jeff King wrote:
> The oid_array object uses an "int" to store the number of items and the
> allocated size. It's rather unlikely for somebody to have more than 2^31
> objects in a repository (the sha1's alone would be 40GB!), but if they
> do, we'd overflow our alloc variable.
>
> You can reproduce this case with something like:
>
> git init repo
> cd repo
>
> # make a pack with 2^24 objects
> perl -e '
> my $nr = 2**24;
>
> for (my $i = 0; $i < $nr; $i++) {
> print "blob\n";
> print "data 4\n";
> print pack("N", $i);
> }
> ' | git fast-import
>
> # now make 256 copies of it; most of these objects will be duplicates,
> # but oid_array doesn't de-dup until all values are read and it can
> # sort the result.
> cd .git/objects/pack/
> pack=$(echo *.pack)
> idx=$(echo *.idx)
> for i in $(seq 0 255); do
> # no need to waste disk space
> ln "$pack" "pack-extra-$i.pack"
> ln "$idx" "pack-extra-$i.idx"
> done
>
> # and now force an oid_array to store all of it
> git cat-file --batch-all-objects --batch-check
>
> which results in:
>
> fatal: size_t overflow: 32 * 18446744071562067968
>
> So the good news is that st_mult() sees the problem (the large number is
> because our int wraps negative, and then that gets cast to a size_t),
> doing the job it was meant to: bailing in crazy situations rather than
> causing an undersized buffer.
>
> But we should avoid hitting this case at all, and instead limit
> ourselves based on what malloc() is willing to give us. We can easily do
> that by switching to size_t.
>
> The cat-file process above made it to ~120GB virtual set size before the
> integer overflow (our internal hash storage is 32-bytes now in
> preparation for sha256, so we'd expect ~128GB total needed, plus
> potentially more to copy from one realloc'd block to another)). After
> this patch (and about 130GB of RAM+swap), it does eventually read in the
> whole set. No test for obvious reasons.
;). This patch looks good, and makes immediate sense given your
explanation.
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> sha1-array.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sha1-array.h b/sha1-array.h
> index dc1bca9c9a..c5e4b9324f 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -49,8 +49,8 @@
> */
> struct oid_array {
> struct object_id *oid;
> - int nr;
> - int alloc;
> + size_t nr;
> + size_t alloc;
> int sorted;
> };
>
> --
> 2.26.0.597.g7e08ed78ff
Thanks,
Taylor
next prev parent reply other threads:[~2020-04-15 0:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-30 14:02 [PATCH 0/7] oid_array cleanups Jeff King
2020-03-30 14:03 ` [PATCH 1/7] oid_array: use size_t for count and allocation Jeff King
2020-03-30 14:09 ` Jeff King
2020-04-15 0:27 ` Taylor Blau [this message]
2020-03-30 14:03 ` [PATCH 2/7] oid_array: use size_t for iteration Jeff King
2020-03-30 14:03 ` [PATCH 3/7] oid_array: rename source file from sha1-array Jeff King
2020-04-15 0:34 ` Taylor Blau
2020-03-30 14:04 ` [PATCH 4/7] test-tool: rename sha1-array to oid-array Jeff King
2020-03-30 14:04 ` [PATCH 5/7] bisect: stop referring to sha1_array Jeff King
2020-03-30 14:04 ` [PATCH 6/7] ref-filter: stop referring to "sha1 array" Jeff King
2020-03-30 14:04 ` [PATCH 7/7] oidset: stop referring to sha1-array Jeff King
2020-04-15 0:35 ` [PATCH 0/7] oid_array cleanups Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200415002758.GC7457@syl.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).