From: Joshua Redstone <joshua.redstone@fb.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
Thomas Rast <trast@inf.ethz.ch>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 0/3] On compresing large index
Date: Mon, 6 Feb 2012 15:54:39 +0000
Message-ID: <CB555FF2.40859%joshua.redstone@fb.com>
In-Reply-To: <CACsJy8AnGg11PeCGFs_BxOM3wAjwzs2tOCWJV31_2_KMFTxhDA@mail.gmail.com>

FWIW, regarding 'git ls-files' specifically: since it is a relatively
rare operation, it's probably OK if it's a bit slow. I know you chose it
as a good benchmark of index-reading performance. I only mention it
because, in some hypothetical wild-and-crazy world in which we had a
git-aware file system layer, one could imagine doing away with most of the
index file and instead querying the file system for what has changed, the
SHA1s of subtrees, etc.

Do you have a sense of which index operations are high-value pain
points for large repositories? I can imagine things like 'git-add' and
'git-commit', but I'm not very familiar with the other common operations
the index plays a role in.

Josh

On 2/5/12 8:35 PM, "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> wrote:
>2012/2/6 Thomas Rast <trast@inf.ethz.ch>:
>>> We need to figure out what git uses 4s user time for.
>>
>> When I worked on the cache-tree stuff, my observation (based on
>> profiling, so I had actual data :-) was that computing SHA1s absolutely
>> dominates everything in such operations. It does that when writing the
>> index to write the trailing checksum, and also when loading it to verify
>> that the index is valid.
>
>You're right. This is on another machine but with same index (2M
>files), without SHA1 checksum:
>
>$ time ~/w/git/git ls-files --stage|head > /dev/null
>real 0m1.533s
>user 0m1.228s
>sys 0m0.306s
>
>and with SHA-1 checksum:
>
>$ time git ls-files --stage|head > /dev/null
>real 0m7.525s
>user 0m7.257s
>sys 0m0.268s
>
>I guess we could fall back to cheaper digests for such a large index.
>Still more than one second for doing nothing but reading index is too
>slow to me.
>
>> ls-files shouldn't be so slow though. A quick run with callgrind in a
>> linux-2.6.git tells me it spends about 45% of its time on SHA1s and a
>> whopping 25% in quote_c_style(). I wonder what's so hard about
>> quoting...
>
>That's why I put "| head" there, to cut output processing overhead
>(hopefully).
>--
>Duy
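
For readers following the checksum discussion quoted above: the index file ends with a SHA-1 computed over everything that precedes it, and that trailer is recomputed on write and checked on load. The Python sketch below only illustrates that idea on an in-memory byte string. It is not git's implementation (which is in C), and the names build_index_like_blob and verify_trailing_sha1 are hypothetical, chosen for this example.

```python
import hashlib

TRAILER_LEN = 20  # size of a SHA-1 digest in bytes


def build_index_like_blob(body: bytes) -> bytes:
    """Hypothetical helper: append a SHA-1 trailer to arbitrary bytes."""
    return body + hashlib.sha1(body).digest()


def verify_trailing_sha1(blob: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of all bytes before them."""
    if len(blob) < TRAILER_LEN:
        return False
    body, trailer = blob[:-TRAILER_LEN], blob[-TRAILER_LEN:]
    # Hashing touches every byte of the body, so the cost grows linearly
    # with the size of the file being verified.
    return hashlib.sha1(body).digest() == trailer


if __name__ == "__main__":
    # Stand-in payload, not a real index: a 4-byte signature plus padding.
    blob = build_index_like_blob(b"DIRC" + b"\x00" * 1024)
    print(verify_trailing_sha1(blob))
    corrupted = b"X" + blob[1:]  # flip the first byte
    print(verify_trailing_sha1(corrupted))
```

Because the whole body has to be hashed before the trailer can be compared, the check costs time proportional to the file size, which is consistent with the gap between the two timings quoted above for a 2M-file index.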