From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee <dstolee@microsoft.com>,
	git@vger.kernel.org, git@jeffhostetler.com, sbeller@google.com
Subject: Re: [PATCH v3 3/5] sha1_name: Unroll len loop in find_unique_abbrev_r
Date: Wed, 4 Oct 2017 09:06:32 -0400
Message-ID: <ff54de0c-cd7c-bc3d-dd18-5fc248f3f573@gmail.com>
In-Reply-To: <xmqqpoa3cujz.fsf@gitster.mtv.corp.google.com>

On 10/4/2017 2:10 AM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> ...
>> I understand that this patch on its own does not have good numbers. I
>> split the patches 3 and 4 specifically to highlight two distinct changes:
>>
>> Patch 3: Unroll the len loop that may inspect all files multiple times.
>> Patch 4: Parse less while disambiguating.
>>
>> Patch 4 more than makes up for the performance hits in this patch.
> Now you confused me even more.  When we read the similar table that
> appears in [Patch 4/5], what does the "Base Time" column mean?
> Vanilla Git with [Patch 3/5] applied?  Vanilla Git with [Patch 4/5]
> alone applied?  Something else?

In PATCH 3, 4, and 5, I reported the perf numbers commit by commit, so
the "Base Time" for PATCH 4 is the time measured with PATCH 3 applied.
The table in the [PATCH 0/5] message includes the relative change for
all commits.

I recalculated the relative change for each patch relative to the
baseline (PATCH 2). Looking again, it appears I misspoke: PATCH 4
does include a +8% change for a fully-repacked Linux repo relative to
PATCH 2. Since PATCH 5 includes an optimization targeted directly at
large packfiles, the final performance gain is significant in the
fully-packed cases.

It is also worth looking at the absolute times for these cases, since 
the fully-packed case is significantly faster than the multiple-packfile 
case, so the relative change impacts users less.

One final note: the improvement was clearer in test p0008.1 when the
test included "sort -R" to shuffle the known OIDs. Providing OIDs in
lexicographic order significantly affects the performance and does not
reflect real-world usage. I removed the "sort -R" because it is a
GNU-ism, but if there is a good cross-platform alternative I would be
happy to replace it.
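
For illustration only, the shuffling step could be sketched without
"sort -R" as below. This is a minimal sketch in Python, not something
taken from the patch series; the function name shuffle_lines and the
sample OID strings are hypothetical, and whether a Python dependency is
acceptable in the perf suite is a separate question. A fixed seed keeps
the order reproducible between runs, which makes timings comparable.

```python
import random


def shuffle_lines(lines, seed=None):
    """Return a shuffled copy of `lines`; a fixed seed gives a repeatable order."""
    rng = random.Random(seed)   # private generator, does not touch global state
    shuffled = list(lines)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # in-place Fisher-Yates shuffle
    return shuffled


# Hypothetical, shortened OIDs standing in for the list of known objects.
oids = ["0a1b2c", "1c2d3e", "2e3f4a", "3a4b5c", "4c5d6e"]
shuffled = shuffle_lines(oids, seed=1)

# Same elements, and the same order again when the seed is reused.
print(sorted(shuffled) == sorted(oids))
print(shuffled == shuffle_lines(oids, seed=1))
```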

p0008.1: find_unique_abbrev() for existing objects
--------------------------------------------------

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

| Repo  | Baseline | Patch 3 | Rel % | Patch 4 | Rel % | Patch 5 | Rel % |
|-------|----------|---------|-------|---------|-------|---------|-------|
| Git   | 0.09     | 0.06    | -33%  | 0.05    | -44%  | 0.05    | -44%  |
| Git   | 0.11     | 0.08    | -27%  | 0.08    | -27%  | 0.08    | -27%  |
| Git   | 0.09     | 0.07    | -22%  | 0.06    | -33%  | 0.06    | -33%  |
| Linux | 0.13     | 0.32    | +146% | 0.14    | + 8%  | 0.05    | -62%  |
| Linux | 1.13     | 1.12    | - 1%  | 0.94    | -17%  | 0.88    | -22%  |
| Linux | 1.08     | 1.05    | - 3%  | 0.86    | -20%  | 0.80    | -26%  |
| VSTS  | 0.12     | 0.23    | +92%  | 0.11    | - 8%  | 0.05    | -58%  |
| VSTS  | 1.02     | 1.08    | + 6%  | 0.95    | - 7%  | 0.95    | - 7%  |
| VSTS  | 2.25     | 2.08    | - 8%  | 1.82    | -19%  | 1.93    | -14%  |

(Each repo has three versions, in order: 1 packfile, multiple packfiles, 
and multiple packfiles and loose objects.)
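
The "Rel %" columns can be reproduced from the times in the table. The
sketch below assumes each percentage is the change relative to the
Baseline column, rounded to the nearest integer; the helper name
rel_change is hypothetical. It uses the single-packfile Linux row as
input.

```python
def rel_change(baseline: float, patched: float) -> int:
    """Percent change of `patched` relative to `baseline`, rounded to an integer."""
    return round((patched - baseline) / baseline * 100)


# Linux repo, single packfile: Baseline and the times for Patch 3, 4, and 5.
baseline = 0.13
for name, seconds in [("Patch 3", 0.32), ("Patch 4", 0.14), ("Patch 5", 0.05)]:
    print(f"{name}: {rel_change(baseline, seconds):+d}%")
```

The same row also shows why the absolute times matter: +146% on a
0.13 baseline is a smaller absolute cost than a few percent on the
multiple-packfile rows.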

Thanks,
-Stolee

