From: Derrick Stolee <stolee@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, sbeller@google.com, dstolee@microsoft.com,
jrnieder@gmail.com, jonathantanmy@google.com,
mfick@codeaurora.org
Subject: Re: [PATCH 00/23] Multi-pack-index (MIDX)
Date: Thu, 7 Jun 2018 10:54:13 -0400 [thread overview]
Message-ID: <b81eedfe-e789-7ce3-cdd5-2754c8bbc40a@gmail.com> (raw)
In-Reply-To: <877enazmdd.fsf@evledraar.gmail.com>
On 6/7/2018 10:45 AM, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Jun 07 2018, Derrick Stolee wrote:
>
>> To test the performance in this situation, I created a
>> script that organizes the Linux repository in a similar
>> fashion. I split the commit history into 50 parts by
>> creating branches on every 10,000 commits of the first-
>> parent history. Then, `git rev-list --objects A ^B`
>> provides the list of objects reachable from A but not B,
>> so I could send that to `git pack-objects` to create
>> these "time-based" packfiles. With these 50 packfiles
>> (deleting the old one from my fresh clone, and deleting
>> all tags as they were no longer on-disk) I could then
>> test 'git rev-list --objects HEAD^{tree}' and see:
>>
>> Before: 0.17s
>> After: 0.13s
>> % Diff: -23.5%
>>
>> By adding logic to count hits and misses to bsearch_pack,
>> I was able to see that the command above calls that
>> method 266,930 times with a hit rate of 33%. The MIDX
>> has the same number of calls with a 100% hit rate.
> Do you have the script you used for this? It would be very interesting
> as something we could stick in t/perf/ to test this use-case in the
> future.
>
> How does this & the numbers below compare to just a naïve
> --max-pack-size=<similar size> on linux.git?
>
> Is it possible for you to tar this test repo up and share it as a
> one-off? I've been polishing the core.validateAbbrev series I have, and
> it would be interesting to compare some of the (abbrev) numbers.
Here is what I used. You will want to adjust your constants for whatever
repo you are using. This is for the Linux kernel which has a
first-parent history of ~50,000 commits. It also leaves a bunch of extra
files around, so it is nowhere near incorporating into the code.
#!/bin/bash
for i in `seq 1 50`
do
ORDER=$((51 - $i))
NUM_BACK=$((1000 * ($i - 1)))
echo creating batch/$ORDER
git branch -f batch/$ORDER HEAD~$NUM_BACK
echo batch/$ORDER
git rev-parse batch/$ORDER
done
lastbranch=""
for i in `seq 1 50`
do
branch=batch/$i
if [$lastbranch -eq ""]
then
echo "$branch"
git rev-list --objects $branch | sed 's/ .*//'
>objects-$i.txt
else
echo "$lastbranch"
echo "$branch"
git rev-list --objects $branch ^$lastbranch | sed 's/
.*//' >objects-$i.txt
fi
git pack-objects --no-reuse-delta
.git/objects/pack/branch-split2 <objects-$i.txt
lastbranch=$branch
done
for tag in `git tag --list`
do
git tag -d $tag
done
rm -rf .git/objects/pack/pack-*
git midx write
next prev parent reply other threads:[~2018-06-07 15:30 UTC|newest]
Thread overview: 192+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-07 14:03 [PATCH 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-06-07 14:03 ` [PATCH 01/23] midx: add design document Derrick Stolee
2018-06-11 19:04 ` Stefan Beller
2018-06-18 18:48 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 02/23] midx: add midx format details to pack-format.txt Derrick Stolee
2018-06-11 19:19 ` Stefan Beller
2018-06-18 19:01 ` Derrick Stolee
2018-06-18 19:41 ` Stefan Beller
2018-06-07 14:03 ` [PATCH 03/23] midx: add midx builtin Derrick Stolee
2018-06-07 17:20 ` Duy Nguyen
2018-06-18 19:23 ` Derrick Stolee
2018-06-11 21:02 ` Stefan Beller
2018-06-18 19:40 ` Derrick Stolee
2018-06-18 19:55 ` Stefan Beller
2018-06-18 19:58 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 04/23] midx: add 'write' subcommand and basic wiring Derrick Stolee
2018-06-07 17:27 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 05/23] midx: write header information to lockfile Derrick Stolee
2018-06-07 17:35 ` Duy Nguyen
2018-06-12 15:00 ` Duy Nguyen
2018-06-19 12:54 ` Derrick Stolee
2018-06-19 14:59 ` Duy Nguyen
2018-06-19 15:24 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 06/23] midx: struct midxed_git and 'read' subcommand Derrick Stolee
2018-06-07 17:54 ` Duy Nguyen
2018-06-20 13:13 ` Derrick Stolee
2018-06-07 18:31 ` Duy Nguyen
2018-06-20 13:33 ` Derrick Stolee
2018-06-20 15:07 ` Duy Nguyen
2018-06-20 16:39 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 07/23] midx: expand test data Derrick Stolee
2018-06-07 14:03 ` [PATCH 08/23] midx: read packfiles from pack directory Derrick Stolee
2018-06-07 18:03 ` Duy Nguyen
2018-06-20 16:33 ` [PATCH] packfile: generalize pack directory list Derrick Stolee
2018-06-07 14:03 ` [PATCH 09/23] midx: write pack names in chunk Derrick Stolee
2018-06-07 18:26 ` Duy Nguyen
2018-06-21 15:25 ` Derrick Stolee
2018-06-21 17:38 ` Junio C Hamano
2018-06-22 18:25 ` Derrick Stolee
2018-06-22 18:31 ` Junio C Hamano
2018-06-22 18:32 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 10/23] midx: write a lookup into the pack names chunk Derrick Stolee
2018-06-09 16:43 ` Duy Nguyen
2018-06-21 17:23 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 11/23] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-06-09 17:07 ` Duy Nguyen
2018-06-21 17:54 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 12/23] midx: write object ids in a chunk Derrick Stolee
2018-06-09 17:25 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 13/23] midx: write object id fanout chunk Derrick Stolee
2018-06-09 17:28 ` Duy Nguyen
2018-06-21 19:49 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 14/23] midx: write object offsets Derrick Stolee
2018-06-09 17:41 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 15/23] midx: create core.midx config setting Derrick Stolee
2018-06-07 14:03 ` [PATCH 16/23] midx: prepare midxed_git struct Derrick Stolee
2018-06-09 17:47 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 17/23] midx: read objects from multi-pack-index Derrick Stolee
2018-06-09 17:56 ` Duy Nguyen
2018-06-21 20:03 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 18/23] midx: use midx in abbreviation calculations Derrick Stolee
2018-06-09 18:01 ` Duy Nguyen
2018-06-22 18:38 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 19/23] midx: use existing midx when writing new one Derrick Stolee
2018-06-07 14:03 ` [PATCH 20/23] midx: use midx in approximate_object_count Derrick Stolee
2018-06-09 18:03 ` Duy Nguyen
2018-06-22 18:39 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 21/23] midx: prevent duplicate packfile loads Derrick Stolee
2018-06-09 18:05 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 22/23] midx: use midx to find ref-deltas Derrick Stolee
2018-06-07 14:03 ` [PATCH 23/23] midx: clear midx on repack Derrick Stolee
2018-06-09 18:13 ` Duy Nguyen
2018-06-22 18:44 ` Derrick Stolee
2018-06-07 14:06 ` [PATCH 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-06-07 14:45 ` Ævar Arnfjörð Bjarmason
2018-06-07 14:54 ` Derrick Stolee [this message]
2018-06-25 14:34 ` [PATCH v2 00/24] " Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 01/24] multi-pack-index: add design document Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 02/24] multi-pack-index: add format details Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 03/24] multi-pack-index: add builtin Derrick Stolee
2018-06-25 19:15 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 04/24] multi-pack-index: add 'write' verb Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 05/24] midx: write header information to lockfile Derrick Stolee
2018-06-25 19:19 ` Junio C Hamano
2018-07-05 19:13 ` Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 06/24] multi-pack-index: load into memory Derrick Stolee
2018-06-25 19:38 ` Junio C Hamano
2018-07-05 14:19 ` Derrick Stolee
2018-07-05 18:58 ` Eric Sunshine
2018-07-06 19:20 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 07/24] multi-pack-index: expand test data Derrick Stolee
2018-06-25 19:45 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 08/24] packfile: generalize pack directory list Derrick Stolee
2018-06-25 19:57 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 09/24] multi-pack-index: read packfile list Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 10/24] multi-pack-index: write pack names in chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 11/24] midx: read pack names into array Derrick Stolee
2018-06-25 23:52 ` Eric Sunshine
2018-06-25 14:34 ` [PATCH v2 12/24] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 13/24] midx: write object ids in a chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 14/24] midx: write object id fanout chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 15/24] midx: write object offsets Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 16/24] config: create core.multiPackIndex setting Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 17/24] midx: prepare midxed_git struct Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 18/24] midx: read objects from multi-pack-index Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 19/24] midx: use midx in abbreviation calculations Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 20/24] midx: use existing midx when writing new one Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 21/24] midx: use midx in approximate_object_count Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 22/24] midx: prevent duplicate packfile loads Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 23/24] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 24/24] midx: clear midx on repack Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 00/24] Multi-pack-index (MIDX) Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 01/24] multi-pack-index: add design document Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 02/24] multi-pack-index: add format details Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 03/24] multi-pack-index: add builtin Derrick Stolee
2018-07-06 3:54 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 04/24] multi-pack-index: add 'write' verb Derrick Stolee
2018-07-06 4:07 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 05/24] midx: write header information to lockfile Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 06/24] multi-pack-index: load into memory Derrick Stolee
2018-07-06 4:19 ` Eric Sunshine
2018-07-06 5:18 ` Eric Sunshine
2018-07-09 19:08 ` Junio C Hamano
2018-07-12 16:06 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 07/24] multi-pack-index: expand test data Derrick Stolee
2018-07-06 4:36 ` Eric Sunshine
2018-07-06 5:20 ` Eric Sunshine
2018-07-12 14:10 ` Derrick Stolee
2018-07-12 18:02 ` Eric Sunshine
2018-07-12 18:06 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 08/24] packfile: generalize pack directory list Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 09/24] multi-pack-index: read packfile list Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 10/24] multi-pack-index: write pack names in chunk Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 11/24] midx: read pack names into array Derrick Stolee
2018-07-06 4:58 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 12/24] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 13/24] midx: write object ids in a chunk Derrick Stolee
2018-07-06 5:04 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 14/24] midx: write object id fanout chunk Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 15/24] midx: write object offsets Derrick Stolee
2018-07-06 5:27 ` Eric Sunshine
2018-07-12 16:33 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 16/24] config: create core.multiPackIndex setting Derrick Stolee
2018-07-06 5:39 ` Eric Sunshine
2018-07-12 13:19 ` Derrick Stolee
2018-07-12 16:30 ` Derrick Stolee
2018-07-11 9:48 ` SZEDER Gábor
2018-07-12 13:01 ` Derrick Stolee
2018-07-12 13:31 ` SZEDER Gábor
2018-07-12 15:40 ` Derrick Stolee
2018-07-12 17:29 ` Junio C Hamano
2018-07-06 0:53 ` [PATCH v3 17/24] midx: prepare midxed_git struct Derrick Stolee
2018-07-06 5:41 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 18/24] midx: read objects from multi-pack-index Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 19/24] midx: use midx in abbreviation calculations Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 20/24] midx: use existing midx when writing new one Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 21/24] midx: use midx in approximate_object_count Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 22/24] midx: prevent duplicate packfile loads Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 23/24] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 24/24] midx: clear midx on repack Derrick Stolee
2018-07-06 5:52 ` Eric Sunshine
2018-07-12 19:39 ` [PATCH v4 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 01/23] multi-pack-index: add design document Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 02/23] multi-pack-index: add format details Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 03/23] multi-pack-index: add builtin Derrick Stolee
2018-07-20 18:22 ` Junio C Hamano
2018-07-20 22:15 ` brian m. carlson
2018-07-20 22:28 ` Junio C Hamano
2018-07-12 19:39 ` [PATCH v4 04/23] multi-pack-index: add 'write' verb Derrick Stolee
2018-07-12 22:56 ` Eric Sunshine
2018-07-12 19:39 ` [PATCH v4 05/23] midx: write header information to lockfile Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 06/23] multi-pack-index: load into memory Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 07/23] t5319: expand test data Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 08/23] packfile: generalize pack directory list Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 09/23] multi-pack-index: read packfile list Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 10/23] multi-pack-index: write pack names in chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 11/23] midx: read pack names into array Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 12/23] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 13/23] midx: write object ids in a chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 14/23] midx: write object id fanout chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 15/23] midx: write object offsets Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 16/23] config: create core.multiPackIndex setting Derrick Stolee
2018-07-12 21:05 ` Junio C Hamano
2018-07-13 0:50 ` Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 17/23] midx: read objects from multi-pack-index Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 18/23] midx: use midx in abbreviation calculations Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 19/23] midx: use existing midx when writing new one Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 20/23] midx: use midx in approximate_object_count Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 21/23] midx: prevent duplicate packfile loads Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 22/23] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 23/23] midx: clear midx on repack Derrick Stolee
2018-07-12 21:11 ` [PATCH v4 00/23] Multi-pack-index (MIDX) Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b81eedfe-e789-7ce3-cdd5-2754c8bbc40a@gmail.com \
--to=stolee@gmail.com \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=mfick@codeaurora.org \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.