git.vger.kernel.org archive mirror
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Julian Phillips <julian@quantumfyre.co.uk>,
	Martin Fick <mfick@codeaurora.org>,
	Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>,
	Thomas Rast <trast@student.ethz.ch>
Subject: Re: [PATCH v3] refs: Use binary search to lookup refs faster
Date: Fri, 30 Sep 2011 17:30:04 +0200
Message-ID: <4E85E07C.5070402@alum.mit.edu>
In-Reply-To: <7v62karjv3.fsf@alter.siamese.dyndns.org>

On 09/30/2011 01:48 AM, Junio C Hamano wrote:
> This version looks sane, although I have a suspicion that it may have
> some interaction with what Michael may be working on.

Indeed, I have almost equivalent changes in the giant patch series that
I am working on [1].  The branch is very experimental.  The tip
currently passes all the tests, but it has a known performance
regression when "git fetch" is used to fetch many commits.


But before comparing ref-related optimizations, we have an *urgent* need
for a decent performance test suite.  There are many slightly different
scenarios that have very different performance characteristics, and we
have to be sure that we are optimizing for the whole palette of
many-reference use cases.  So I made an attempt at a kludgey but
somewhat flexible performance-testing script [2].  I don't know whether
something like this should be integrated into the git project, and if so
where; suggestions are welcome.


To run the tests, from the root of the git source tree:

    make # make sure git is up-to-date
    t/make-refperf-repo --help
    t/make-refperf-repo [OPTIONS]
    t/refperf
    cat refperf.times # See the results

The default repo has 5k commits in a linear series with one reference on
each commit.  (These numbers can both be adjusted.)

The reference namespace can be laid out a few ways:

* Many references in a single "directory" vs. sharded over many
"directories"

* In lexicographic order by commit, in reverse order, or "shuffled".
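
To make the layout options concrete, here is a rough sketch of how such
reference names could be generated.  It is not the make-refperf-repo
code; the function name, the "refs/heads/" prefix, the zero-padded
labels, and the choice of the first two digits as the "directory" are
all assumptions made for illustration.

```python
import random


def ref_names(count, sharded=False, order="forward", seed=0):
    """Return a list whose item i is a (hypothetical) ref name for commit i.

    order="forward":  names sort in the same order as the commits
    order="reverse":  names sort in the opposite order
    order="shuffled": names are assigned to commits in random order
    """
    indices = list(range(count))
    if order == "reverse":
        indices.reverse()
    elif order == "shuffled":
        # A fixed seed keeps the generated layout reproducible.
        random.Random(seed).shuffle(indices)
    elif order != "forward":
        raise ValueError(f"unknown order: {order}")

    # Zero-pad so that lexicographic order matches numeric order.
    width = max(4, len(str(max(count - 1, 0))))
    names = []
    for index in indices:
        label = f"{index:0{width}d}"
        if sharded:
            # Leading digits pick the "directory", so sorting is unaffected.
            names.append(f"refs/heads/{label[:2]}/{label}")
        else:
            names.append(f"refs/heads/{label}")
    return names


if __name__ == "__main__":
    for name in ref_names(5, sharded=True, order="reverse"):
        print(name)
```

Sharding on the leading digits keeps the sharded and unsharded layouts
in the same sort order, so the two options can be varied independently.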

By default, the repo is written to "refperf-repo".

The time it takes to create the test repository is itself also an
interesting benchmark.  For example, on the maint branch it is terribly
slow unless it is passed either the --pack-refs-interval=N (with N, say
100) or --no-replace-object option.  I also noticed that if it is run like

    t/make-refperf-repo --refs=5000 --commits=5000 \
            --pack-refs-interval=100

(one ref per commit), git-pack-refs becomes precipitously and
dramatically slower after the 2000th commit.

I haven't had time yet for systematic benchmarks of other git versions.

See the refperf script for the sorts of benchmarks that I have built
into it so far.  The refperf test is non-destructive; it always copies
from "refperf-repo" to "refperf-repo-copy" and does its tests in the
copy; therefore a test repo can be reused.  The timing data are written
to "refperf.times" and other output to "refperf.log".
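
The copy-then-measure idea can be sketched roughly as follows.  This is
not the refperf script itself: the helper name and its parameters are
hypothetical, and the "seconds label" line format is only inferred from
the results listed in this message.  This sketch also makes a fresh copy
for every command, which may not match what refperf does.

```python
import shutil
import subprocess
import time
from pathlib import Path


def time_in_copy(source, copy, label, command, times_path, log_path):
    """Copy `source` to `copy`, run `command` there, and record the time.

    The original repository is never modified, so it can be reused.
    """
    source, copy = Path(source), Path(copy)
    if copy.exists():
        shutil.rmtree(copy)  # start every measurement from a pristine copy
    shutil.copytree(source, copy, symlinks=True)

    # Command output goes to the log so the timing file stays clean.
    with open(log_path, "a") as log:
        start = time.perf_counter()
        subprocess.run(command, cwd=copy, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
        elapsed = time.perf_counter() - start

    with open(times_path, "a") as times:
        times.write(f"{elapsed:.2f} {label}\n")
    return elapsed


if __name__ == "__main__":
    time_in_copy("refperf-repo", "refperf-repo-copy", "git for-each-ref",
                 ["git", "for-each-ref"], "refperf.times", "refperf.log")
```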

Here are my refperf results for the "maint" branch on my notebook with
the default "make-refperf-repo" arguments (times in seconds):

3.36 git branch (cold)
0.01 git branch (warm)
0.04 git for-each-ref
3.08 git checkout (cold)
0.01 git checkout (warm)
0.00 git checkout --orphan (warm)
0.15 git checkout from detached orphan
0.12 git pack-refs
1.17 git branch (cold)
0.00 git branch (warm)
0.17 git for-each-ref
0.95 git checkout (cold)
0.00 git checkout (warm)
0.00 git checkout --orphan (warm)
0.21 git checkout from detached orphan
0.18 git branch -a --contains
7.67 git clone
0.06 git fetch (nothing)
0.01 git pack-refs
0.05 git fetch (nothing, packed)
0.10 git clone of a ref-packed repo
0.63 git fetch (everything)

Probably we should test with even more references than this, but this
test already shows that some commands are quite sluggish.

There are some more things that could be added, like:

* Branches vs. annotated tags

* References on the tips of branches in a more typical "branchy" repository.

* git describe --all

* git log --decorate

* git gc

* git filter-branch
  (This has very different performance characteristics because it is a
  script that invokes git many times.)

I suggest that we try to do systematic benchmarking of any changes that
we claim are performance optimizations and share before/after results in
the cover letter for the patch series.
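
As an illustration of what such a before/after report might look like,
here is a minimal sketch that lines up two timing files side by side.
It assumes that each line has the form "seconds label", as in the
results above, and that both runs executed the same benchmarks in the
same order; the function names are made up for this example.  Rows are
matched by position rather than by label because some labels (such as
"git branch (cold)") appear more than once in a run.

```python
def parse_times(text):
    """Parse 'seconds label' lines into a list of (seconds, label) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        seconds, label = line.split(maxsplit=1)
        rows.append((float(seconds), label))
    return rows


def compare(before_text, after_text):
    """Return a plain-text table pairing up the timings of two runs."""
    before = parse_times(before_text)
    after = parse_times(after_text)
    if [label for _, label in before] != [label for _, label in after]:
        raise ValueError("the two runs did not execute the same benchmarks")

    lines = [f"{'before':>8} {'after':>8}  benchmark"]
    for (old, label), (new, _) in zip(before, after):
        lines.append(f"{old:8.2f} {new:8.2f}  {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names: one refperf.times saved from each build.
    with open("refperf.times.before") as b, open("refperf.times.after") as a:
        print(compare(b.read(), a.read()))
```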

Michael

[1] branch hierarchical-refs at git://github.com/mhagger/git.git
[2] branch refperf at git://github.com/mhagger/git.git

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
