* Test failure in p5332-multi-pack-reuse.sh
@ 2025-04-22 2:01 Philippe Blain
2025-04-22 4:06 ` Junio C Hamano
2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
0 siblings, 2 replies; 6+ messages in thread
From: Philippe Blain @ 2025-04-22 2:01 UTC (permalink / raw)
To: Git mailing list; +Cc: Taylor Blau
Hi Taylor,
I noticed that p5332-multi-pack-reuse.sh, which you added in
ba47d88795 (t/perf: add performance tests for multi-pack reuse,
2023-12-14) fails early on in the second test ("setup bitmaps for
1-pack scenario"). Since perf tests run with '--immediate', I do not
know if further tests in that file also fail. It is reproducible on macOS [1] as
well as Linux [2] (I don't know if these logs are public though).
I also tested on Linux on version 2.44.0 which is the first release
in which this test was added, and it also failed similarily.
Sidenote: on GitHub CI, I could not demonstrate the failure on Linux
because all Linux jobs run in containers, and the images we use do
not have Git installed, such that actions/checkout@v4 uses the GitHub
API to download the repository instead of cloning it [3]. This leads
die_if_build_dir_not_repo from perf-lib.sh to fail with
"No $GIT_PERF_REPO defined, and your build directory is not a repo" [4].
We could fix that by installing the 'git' package before the 'actions/checkout'
step, but we would need to account for the different package managers of
the distros we test on.
Cheers,
Philippe.
[1] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421311#step:4:896
[2] https://gitlab.com/phil-blain/git/-/jobs/9780586827#L2889
[3] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421399#step:4:28
[4] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421399#step:8:838
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Test failure in p5332-multi-pack-reuse.sh 2025-04-22 2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain @ 2025-04-22 4:06 ` Junio C Hamano 2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King 1 sibling, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2025-04-22 4:06 UTC (permalink / raw) To: Philippe Blain; +Cc: Git mailing list, Taylor Blau Philippe Blain <levraiphilippeblain@gmail.com> writes: > Sidenote: on GitHub CI, I could not demonstrate the failure on Linux > because all Linux jobs run in containers, and the images we use do > not have Git installed, such that actions/checkout@v4 uses the GitHub > API to download the repository instead of cloning it [3]. This leads > die_if_build_dir_not_repo from perf-lib.sh to fail with > "No $GIT_PERF_REPO defined, and your build directory is not a repo" [4]. > We could fix that by installing the 'git' package before the 'actions/checkout' > step, but we would need to account for the different package managers of > the distros we test on. Not limited to this topic, but wouldn't it make more sense to first run install-dependencies (including "/usr/bin/git") and then invoke the actions/checkout thing, I have to wonder. We were bitten by a separate topic due to the same issue quite recently. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] p5332: drop "+" from --stdin-packs input 2025-04-22 2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain 2025-04-22 4:06 ` Junio C Hamano @ 2025-04-22 11:16 ` Jeff King 2025-04-22 15:49 ` Junio C Hamano 2025-04-22 17:24 ` Taylor Blau 1 sibling, 2 replies; 6+ messages in thread From: Jeff King @ 2025-04-22 11:16 UTC (permalink / raw) To: Philippe Blain; +Cc: Junio C Hamano, Git mailing list, Taylor Blau On Mon, Apr 21, 2025 at 10:01:25PM -0400, Philippe Blain wrote: > I noticed that p5332-multi-pack-reuse.sh, which you added in > ba47d88795 (t/perf: add performance tests for multi-pack reuse, > 2023-12-14) fails early on in the second test ("setup bitmaps for > 1-pack scenario"). Since perf tests run with '--immediate', I do not > know if further tests in that file also fail. It is reproducible on macOS [1] as > well as Linux [2] (I don't know if these logs are public though). > > I also tested on Linux on version 2.44.0 which is the first release > in which this test was added, and it also failed similarily. I think the patch below is probably the right solution. With it I got the output I'd expect (multi-pack reuse with many packs yields a CPU speedup at the cost of increased size): Test this tree ---------------------------------------------------------------------------------- 5332.3: clone for 1-pack scenario (single-pack reuse) 6.66(37.73+0.19) 5332.4: clone size for 1-pack scenario (single-pack reuse) 117.0M 5332.5: clone for 1-pack scenario (multi-pack reuse) 6.89(38.71+0.25) 5332.6: clone size for 1-pack scenario (multi-pack reuse) 117.0M 5332.9: clone for 10-pack scenario (single-pack reuse) 5.67(35.65+0.37) 5332.10: clone size for 10-pack scenario (single-pack reuse) 125.1M 5332.11: clone for 10-pack scenario (multi-pack reuse) 2.47(5.71+0.15) 5332.12: clone size for 10-pack scenario (multi-pack reuse) 134.3M 5332.15: clone for 100-pack scenario (single-pack reuse) 14.50(130.54+0.55) 5332.16: clone size for 100-pack scenario (single-pack reuse) 224.2M 5332.17: clone for 100-pack scenario (multi-pack reuse) 3.34(3.69+0.18) 5332.18: clone size for 100-pack scenario (multi-pack reuse) 307.3M -- >8 -- Subject: [PATCH] p5332: drop "+" from --stdin-packs input This perf script creates a midx by running "git multi-pack-index write" with the "--stdin-packs" option. We feed that stdin by running "find" on .git/objects/pack, using sed to strip off everything but the basename. But that sed invocation also does something peculiar: it adds a "+" to the start of each pack name. This causes the multi-pack-index command to barf. The modified name does not match any pack it knows about, so it ends up with an empty list of packs to put in the midx. And thus nothing matches the --preferred-pack option we pass, which causes it die(). The fix is to remove the extra "+" (which also lets us simplify the sed invocation a bit, as it is now just stripping the leading directories). But that leaves the mystery of why it was ever there in the first place. The answer is that an earlier iteration of the patch series had a concept of "disjoint" packs in the midx. And one of its patches here: https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/ taught read_packs_from_stdin() to treat a leading "+" as marking a disjoint pack. But in the second version of the series, which was ultimately merged, that disjoint concept went away, and the code to parse "+" did likewise. The regular regression tests were adjusted to match, but this case in t/perf was forgotten. Signed-off-by: Jeff King <peff@peff.net> --- t/perf/p5332-multi-pack-reuse.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh index d1c89a8b7d..0a2525db44 100755 --- a/t/perf/p5332-multi-pack-reuse.sh +++ b/t/perf/p5332-multi-pack-reuse.sh @@ -58,7 +58,7 @@ do ' test_expect_success "setup bitmaps for $nr_packs-pack scenario" ' - find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" | + find $packdir -type f -name "*.idx" | sed -e "s/.*\///" | git multi-pack-index write --stdin-packs --bitmap \ --preferred-pack="$(find_pack $(git rev-parse HEAD))" ' -- 2.49.0.682.g886cb1c59a ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] p5332: drop "+" from --stdin-packs input 2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King @ 2025-04-22 15:49 ` Junio C Hamano 2025-05-01 16:03 ` Jeff King 2025-04-22 17:24 ` Taylor Blau 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2025-04-22 15:49 UTC (permalink / raw) To: Jeff King Cc: Philippe Blain, Git mailing list, Taylor Blau, Martin von Zweigbergk, Nico Williams, Edwin Kempin, Elijah Newren Jeff King <peff@peff.net> writes: > > -- >8 -- > Subject: [PATCH] p5332: drop "+" from --stdin-packs input > > This perf script creates a midx by running "git multi-pack-index write" > with the "--stdin-packs" option. We feed that stdin by running "find" on > .git/objects/pack, using sed to strip off everything but the basename. > > But that sed invocation also does something peculiar: it adds a "+" to > the start of each pack name. This causes the multi-pack-index command to > barf. The modified name does not match any pack it knows about, so it > ends up with an empty list of packs to put in the midx. And thus nothing > matches the --preferred-pack option we pass, which causes it die(). > > The fix is to remove the extra "+" (which also lets us simplify the sed > invocation a bit, as it is now just stripping the leading directories). > > But that leaves the mystery of why it was ever there in the first place. > The answer is that an earlier iteration of the patch series had a > concept of "disjoint" packs in the midx. And one of its patches here: > > https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/ > > taught read_packs_from_stdin() to treat a leading "+" as marking a > disjoint pack. But in the second version of the series, which was > ultimately merged, that disjoint concept went away, and the code to > parse "+" did likewise. The regular regression tests were adjusted to > match, but this case in t/perf was forgotten. > > Signed-off-by: Jeff King <peff@peff.net> > --- > t/perf/p5332-multi-pack-reuse.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks. I wonder if we had some tools and mechanisms people were discussing to track changes on changsets, such a mishap could have been caught more easily. [jc: random folks from that discussion CC'ed, just in case they are interested]. > > diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh > index d1c89a8b7d..0a2525db44 100755 > --- a/t/perf/p5332-multi-pack-reuse.sh > +++ b/t/perf/p5332-multi-pack-reuse.sh > @@ -58,7 +58,7 @@ do > ' > > test_expect_success "setup bitmaps for $nr_packs-pack scenario" ' > - find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" | > + find $packdir -type f -name "*.idx" | sed -e "s/.*\///" | > git multi-pack-index write --stdin-packs --bitmap \ > --preferred-pack="$(find_pack $(git rev-parse HEAD))" > ' ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] p5332: drop "+" from --stdin-packs input 2025-04-22 15:49 ` Junio C Hamano @ 2025-05-01 16:03 ` Jeff King 0 siblings, 0 replies; 6+ messages in thread From: Jeff King @ 2025-05-01 16:03 UTC (permalink / raw) To: Junio C Hamano Cc: Philippe Blain, Git mailing list, Taylor Blau, Martin von Zweigbergk, Nico Williams, Edwin Kempin, Elijah Newren On Tue, Apr 22, 2025 at 08:49:45AM -0700, Junio C Hamano wrote: > > The fix is to remove the extra "+" (which also lets us simplify the sed > > invocation a bit, as it is now just stripping the leading directories). > > > > But that leaves the mystery of why it was ever there in the first place. > > The answer is that an earlier iteration of the patch series had a > > concept of "disjoint" packs in the midx. And one of its patches here: > > > > https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/ > > > > taught read_packs_from_stdin() to treat a leading "+" as marking a > > disjoint pack. But in the second version of the series, which was > > ultimately merged, that disjoint concept went away, and the code to > > parse "+" did likewise. The regular regression tests were adjusted to > > match, but this case in t/perf was forgotten. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > t/perf/p5332-multi-pack-reuse.sh | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > Thanks. I wonder if we had some tools and mechanisms people were > discussing to track changes on changsets, such a mishap could have > been caught more easily. [jc: random folks from that discussion > CC'ed, just in case they are interested]. IMHO it would probably not help that much, because the error was in the other direction. I.e., the issue was that something _didn't_ change between two versions of the series, but should have. In general I think we'd usually rely on tests or compiler analysis (e.g., leftover unused variables) to catch this kind of thing. It's just that hardly anybody actually runs the perf tests. I know there was some discussion about running them regularly in CI, but I'm skeptical that the CPU time / utility tradeoff is very good there. In some sense, this case was the process working as designed. Running the test _did_ catch the problem, but we didn't notice because nobody ran it for a while. So in the interim, nobody was hurt. ;) I'm mostly joking. It is certainly more convenient to catch these things earlier, but latent buggy code that nobody runs might not be that big a worry. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] p5332: drop "+" from --stdin-packs input 2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King 2025-04-22 15:49 ` Junio C Hamano @ 2025-04-22 17:24 ` Taylor Blau 1 sibling, 0 replies; 6+ messages in thread From: Taylor Blau @ 2025-04-22 17:24 UTC (permalink / raw) To: Jeff King; +Cc: Philippe Blain, Junio C Hamano, Git mailing list On Tue, Apr 22, 2025 at 07:16:32AM -0400, Jeff King wrote: > --- > t/perf/p5332-multi-pack-reuse.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) My apologies for the mistake in the first place, but thank you for digging and providing the fix. Acked-by: Taylor Blau <me@ttaylorr.com> Thanks, Taylor ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-01 16:03 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-22 2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain 2025-04-22 4:06 ` Junio C Hamano 2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King 2025-04-22 15:49 ` Junio C Hamano 2025-05-01 16:03 ` Jeff King 2025-04-22 17:24 ` Taylor Blau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).