From: Thomas Rast <trast@inf.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Jeff King" <peff@peff.net>
Subject: Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
Date: Thu, 25 Apr 2013 23:09:33 +0200
Message-ID: <87bo92gvw2.fsf@hexa.v.cablecom.net>
In-Reply-To: <7vobd22wto.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 25 Apr 2013 13:13:07 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> Thomas Rast <trast@inf.ethz.ch> writes:
>
>> So we take a slightly different approach, and trade some memory for
>> better cache locality.
>
> Interesting. It feels somewhat bait-and-switch to reveal that the
> above "some" turns out to be "double" later, but the resulting code
> does not look too bad, and the numbers do not look insignificant.
Oh, that wasn't the intent. I was too lazy to gather some memory
numbers, so here's an estimate on the local effect and some measurements
on the global one.
struct object is at least 24 bytes (flags etc. and sha1). We grow the
hash by 2x whenever it reaches 50% load, so it is always at least 25%
loaded.
A 25% loaded hash table used to consist of 75% empty slots (a NULL
pointer, 8 bytes) and 25% slots pointing to a struct object (8 + 24 = 32
bytes), for an average of 14 bytes per slot.
Now it's 22 bytes per slot (one more unsigned long), i.e., an increase
of roughly 60% (22/14 is about 1.57) for the data managed by the hash
table.
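To make the per-slot arithmetic easy to recheck, here is a small sketch of it. The constant and function names are made up for illustration; the sizes (8-byte pointers and unsigned long, 24-byte struct object, 25% load) are the assumptions stated in the estimate, not measured values.

```python
# Back-of-the-envelope model of the obj_hash estimate (illustrative names).
POINTER_BYTES = 8     # one pointer per slot, assuming a 64-bit platform
OBJECT_BYTES = 24     # lower bound for struct object, as stated in the text
EXTRA_KEY_BYTES = 8   # the added unsigned long chunk of the sha1 per slot
LOAD = 0.25           # worst-case load factor right after the table doubles


def bytes_per_slot(object_bytes, key_bytes=0):
    """Average memory per slot: every slot carries a pointer (plus the
    optional key), and a LOAD fraction of the slots also owns an object."""
    return POINTER_BYTES + key_bytes + LOAD * object_bytes


before = bytes_per_slot(OBJECT_BYTES)
after = bytes_per_slot(OBJECT_BYTES, EXTRA_KEY_BYTES)
increase = after / before - 1

print(f"{before:.0f} -> {after:.0f} bytes per slot, +{increase:.0%}")
```

This gives 14 and 22 bytes per slot, so the "60%" is a rounded-up 57%.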
But that's using the crudest estimates I could think of. If we assume
that an average blob or tree is at least as big as the smallest
possible commit, we'd guess that objects are at least ~240 bytes (this
is still somewhat of an estimate and assumes that you don't go and
handcraft commits with single-digit timestamps). So the numbers above
go up by 25% * 240 = 60 bytes per average slot (74 before, 82 after),
and work out to an overall increase of about 11%.
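The same estimate with the ~240 bytes of object data folded in can be sketched as follows. The variable names are illustrative, and the 14 and 22 bytes per slot are taken from the cruder estimate earlier in this message.

```python
# Overall estimate once the object data itself is counted (illustrative names).
LOAD = 0.25                 # fraction of slots that hold an object
OBJECT_PAYLOAD_BYTES = 240  # guessed lower bound for the data behind an object
SLOT_BEFORE = 14            # bytes per average slot without the extra key
SLOT_AFTER = 22             # bytes per average slot with the extra unsigned long

payload_per_slot = LOAD * OBJECT_PAYLOAD_BYTES  # extra bytes per average slot
total_before = SLOT_BEFORE + payload_per_slot
total_after = SLOT_AFTER + payload_per_slot
overall_increase = total_after / total_before - 1

print(f"{total_before:.0f} -> {total_after:.0f} bytes, +{overall_increase:.1%}")
```

Since 240 bytes is meant as a lower bound, larger objects would push the relative increase further down.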
Here are some real numbers from /usr/bin/time git rev-list --all --objects:
before:
  2.30user 0.02system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 247760maxresident)k
  0inputs+0outputs (0major+17844minor)pagefaults 0swaps
after:
  2.18user 0.02system 0:02.21elapsed 99%CPU (0avgtext+0avgdata 261936maxresident)k
  0inputs+0outputs (0major+18202minor)pagefaults 0swaps
So that would be about 14MB or 5.7% of extra memory.
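For anyone who wants to recheck those figures, a short sketch that derives them from the two /usr/bin/time reports. It assumes maxresident is reported in kilobytes (as the trailing "k" suggests); the variable names are illustrative.

```python
# Figures copied from the two /usr/bin/time reports (maxresident assumed in KB).
before_kb, after_kb = 247760, 261936
before_user_s, after_user_s = 2.30, 2.18

extra_kb = after_kb - before_kb
extra_mb = extra_kb / 1024                     # KB -> MB (binary)
extra_fraction = extra_kb / before_kb          # relative memory increase
user_time_saved = (before_user_s - after_user_s) / before_user_s

print(f"extra memory: {extra_kb} KB (~{extra_mb:.1f} MB), +{extra_fraction:.1%}")
print(f"user time saved: {user_time_saved:.1%}")
```

So the trade is roughly 5.7% more peak memory for roughly 5% less user time on this run.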
--
Thomas Rast
trast@{inf,student}.ethz.ch