git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Test failure in p5332-multi-pack-reuse.sh
@ 2025-04-22  2:01 Philippe Blain
  2025-04-22  4:06 ` Junio C Hamano
  2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Philippe Blain @ 2025-04-22  2:01 UTC (permalink / raw)
  To: Git mailing list; +Cc: Taylor Blau

Hi Taylor,

I noticed that p5332-multi-pack-reuse.sh, which you added in 
ba47d88795 (t/perf: add performance tests for multi-pack reuse,
2023-12-14) fails early on in the second test ("setup bitmaps for
1-pack scenario"). Since perf tests run with '--immediate', I do not
know if further tests in that file also fail. It is reproducible on macOS [1] as 
well as Linux [2] (I don't know if these logs are public though).

I also tested on Linux on version 2.44.0 which is the first release
in which this test was added, and it also failed similarily.

Sidenote: on GitHub CI, I could not demonstrate the failure on Linux
because all Linux jobs run in containers, and the images we use do 
not have Git installed, such that actions/checkout@v4 uses the GitHub
API to download the repository instead of cloning it [3]. This leads 
die_if_build_dir_not_repo from perf-lib.sh to fail with
"No $GIT_PERF_REPO defined, and your build directory is not a repo" [4].
We could fix that by installing the 'git' package before the 'actions/checkout'
step, but we would need to account for the different package managers of 
the distros we test on.

Cheers,

Philippe.

[1] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421311#step:4:896
[2] https://gitlab.com/phil-blain/git/-/jobs/9780586827#L2889
[3] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421399#step:4:28
[4] https://github.com/phil-blain/git/actions/runs/14580975799/job/40897421399#step:8:838

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Test failure in p5332-multi-pack-reuse.sh
  2025-04-22  2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain
@ 2025-04-22  4:06 ` Junio C Hamano
  2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2025-04-22  4:06 UTC (permalink / raw)
  To: Philippe Blain; +Cc: Git mailing list, Taylor Blau

Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Sidenote: on GitHub CI, I could not demonstrate the failure on Linux
> because all Linux jobs run in containers, and the images we use do 
> not have Git installed, such that actions/checkout@v4 uses the GitHub
> API to download the repository instead of cloning it [3]. This leads 
> die_if_build_dir_not_repo from perf-lib.sh to fail with
> "No $GIT_PERF_REPO defined, and your build directory is not a repo" [4].
> We could fix that by installing the 'git' package before the 'actions/checkout'
> step, but we would need to account for the different package managers of 
> the distros we test on.

Not limited to this topic, but wouldn't it make more sense to first
run install-dependencies (including "/usr/bin/git") and then invoke
the actions/checkout thing, I have to wonder.  We were bitten by a
separate topic due to the same issue quite recently.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] p5332: drop "+" from --stdin-packs input
  2025-04-22  2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain
  2025-04-22  4:06 ` Junio C Hamano
@ 2025-04-22 11:16 ` Jeff King
  2025-04-22 15:49   ` Junio C Hamano
  2025-04-22 17:24   ` Taylor Blau
  1 sibling, 2 replies; 6+ messages in thread
From: Jeff King @ 2025-04-22 11:16 UTC (permalink / raw)
  To: Philippe Blain; +Cc: Junio C Hamano, Git mailing list, Taylor Blau

On Mon, Apr 21, 2025 at 10:01:25PM -0400, Philippe Blain wrote:

> I noticed that p5332-multi-pack-reuse.sh, which you added in 
> ba47d88795 (t/perf: add performance tests for multi-pack reuse,
> 2023-12-14) fails early on in the second test ("setup bitmaps for
> 1-pack scenario"). Since perf tests run with '--immediate', I do not
> know if further tests in that file also fail. It is reproducible on macOS [1] as 
> well as Linux [2] (I don't know if these logs are public though).
> 
> I also tested on Linux on version 2.44.0 which is the first release
> in which this test was added, and it also failed similarily.

I think the patch below is probably the right solution. With it I got
the output I'd expect (multi-pack reuse with many packs yields a CPU
speedup at the cost of increased size):

  Test                                                            this tree
  ----------------------------------------------------------------------------------
  5332.3: clone for 1-pack scenario (single-pack reuse)           6.66(37.73+0.19)
  5332.4: clone size for 1-pack scenario (single-pack reuse)               117.0M
  5332.5: clone for 1-pack scenario (multi-pack reuse)            6.89(38.71+0.25)
  5332.6: clone size for 1-pack scenario (multi-pack reuse)                117.0M
  5332.9: clone for 10-pack scenario (single-pack reuse)          5.67(35.65+0.37)
  5332.10: clone size for 10-pack scenario (single-pack reuse)             125.1M
  5332.11: clone for 10-pack scenario (multi-pack reuse)          2.47(5.71+0.15)
  5332.12: clone size for 10-pack scenario (multi-pack reuse)              134.3M
  5332.15: clone for 100-pack scenario (single-pack reuse)        14.50(130.54+0.55)
  5332.16: clone size for 100-pack scenario (single-pack reuse)            224.2M
  5332.17: clone for 100-pack scenario (multi-pack reuse)         3.34(3.69+0.18)
  5332.18: clone size for 100-pack scenario (multi-pack reuse)             307.3M

-- >8 --
Subject: [PATCH] p5332: drop "+" from --stdin-packs input

This perf script creates a midx by running "git multi-pack-index write"
with the "--stdin-packs" option. We feed that stdin by running "find" on
.git/objects/pack, using sed to strip off everything but the basename.

But that sed invocation also does something peculiar: it adds a "+" to
the start of each pack name. This causes the multi-pack-index command to
barf. The modified name does not match any pack it knows about, so it
ends up with an empty list of packs to put in the midx. And thus nothing
matches the --preferred-pack option we pass, which causes it die().

The fix is to remove the extra "+" (which also lets us simplify the sed
invocation a bit, as it is now just stripping the leading directories).

But that leaves the mystery of why it was ever there in the first place.
The answer is that an earlier iteration of the patch series had a
concept of "disjoint" packs in the midx. And one of its patches here:

  https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/

taught read_packs_from_stdin() to treat a leading "+" as marking a
disjoint pack. But in the second version of the series, which was
ultimately merged, that disjoint concept went away, and the code to
parse "+" did likewise. The regular regression tests were adjusted to
match, but this case in t/perf was forgotten.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/perf/p5332-multi-pack-reuse.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh
index d1c89a8b7d..0a2525db44 100755
--- a/t/perf/p5332-multi-pack-reuse.sh
+++ b/t/perf/p5332-multi-pack-reuse.sh
@@ -58,7 +58,7 @@ do
 	'
 
 	test_expect_success "setup bitmaps for $nr_packs-pack scenario" '
-		find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" |
+		find $packdir -type f -name "*.idx" | sed -e "s/.*\///" |
 		git multi-pack-index write --stdin-packs --bitmap \
 			--preferred-pack="$(find_pack $(git rev-parse HEAD))"
 	'
-- 
2.49.0.682.g886cb1c59a


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] p5332: drop "+" from --stdin-packs input
  2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
@ 2025-04-22 15:49   ` Junio C Hamano
  2025-05-01 16:03     ` Jeff King
  2025-04-22 17:24   ` Taylor Blau
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2025-04-22 15:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Philippe Blain, Git mailing list, Taylor Blau,
	Martin von Zweigbergk, Nico Williams, Edwin Kempin, Elijah Newren

Jeff King <peff@peff.net> writes:

>
> -- >8 --
> Subject: [PATCH] p5332: drop "+" from --stdin-packs input
>
> This perf script creates a midx by running "git multi-pack-index write"
> with the "--stdin-packs" option. We feed that stdin by running "find" on
> .git/objects/pack, using sed to strip off everything but the basename.
>
> But that sed invocation also does something peculiar: it adds a "+" to
> the start of each pack name. This causes the multi-pack-index command to
> barf. The modified name does not match any pack it knows about, so it
> ends up with an empty list of packs to put in the midx. And thus nothing
> matches the --preferred-pack option we pass, which causes it die().
>
> The fix is to remove the extra "+" (which also lets us simplify the sed
> invocation a bit, as it is now just stripping the leading directories).
>
> But that leaves the mystery of why it was ever there in the first place.
> The answer is that an earlier iteration of the patch series had a
> concept of "disjoint" packs in the midx. And one of its patches here:
>
>   https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/
>
> taught read_packs_from_stdin() to treat a leading "+" as marking a
> disjoint pack. But in the second version of the series, which was
> ultimately merged, that disjoint concept went away, and the code to
> parse "+" did likewise. The regular regression tests were adjusted to
> match, but this case in t/perf was forgotten.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/perf/p5332-multi-pack-reuse.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks.  I wonder if we had some tools and mechanisms people were
discussing to track changes on changsets, such a mishap could have
been caught more easily.  [jc: random folks from that discussion
CC'ed, just in case they are interested].

>
> diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh
> index d1c89a8b7d..0a2525db44 100755
> --- a/t/perf/p5332-multi-pack-reuse.sh
> +++ b/t/perf/p5332-multi-pack-reuse.sh
> @@ -58,7 +58,7 @@ do
>  	'
>  
>  	test_expect_success "setup bitmaps for $nr_packs-pack scenario" '
> -		find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" |
> +		find $packdir -type f -name "*.idx" | sed -e "s/.*\///" |
>  		git multi-pack-index write --stdin-packs --bitmap \
>  			--preferred-pack="$(find_pack $(git rev-parse HEAD))"
>  	'

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] p5332: drop "+" from --stdin-packs input
  2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
  2025-04-22 15:49   ` Junio C Hamano
@ 2025-04-22 17:24   ` Taylor Blau
  1 sibling, 0 replies; 6+ messages in thread
From: Taylor Blau @ 2025-04-22 17:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Philippe Blain, Junio C Hamano, Git mailing list

On Tue, Apr 22, 2025 at 07:16:32AM -0400, Jeff King wrote:
> ---
>  t/perf/p5332-multi-pack-reuse.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

My apologies for the mistake in the first place, but thank you for
digging and providing the fix.

  Acked-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] p5332: drop "+" from --stdin-packs input
  2025-04-22 15:49   ` Junio C Hamano
@ 2025-05-01 16:03     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2025-05-01 16:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Philippe Blain, Git mailing list, Taylor Blau,
	Martin von Zweigbergk, Nico Williams, Edwin Kempin, Elijah Newren

On Tue, Apr 22, 2025 at 08:49:45AM -0700, Junio C Hamano wrote:

> > The fix is to remove the extra "+" (which also lets us simplify the sed
> > invocation a bit, as it is now just stripping the leading directories).
> >
> > But that leaves the mystery of why it was ever there in the first place.
> > The answer is that an earlier iteration of the patch series had a
> > concept of "disjoint" packs in the midx. And one of its patches here:
> >
> >   https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/
> >
> > taught read_packs_from_stdin() to treat a leading "+" as marking a
> > disjoint pack. But in the second version of the series, which was
> > ultimately merged, that disjoint concept went away, and the code to
> > parse "+" did likewise. The regular regression tests were adjusted to
> > match, but this case in t/perf was forgotten.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  t/perf/p5332-multi-pack-reuse.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Thanks.  I wonder if we had some tools and mechanisms people were
> discussing to track changes on changsets, such a mishap could have
> been caught more easily.  [jc: random folks from that discussion
> CC'ed, just in case they are interested].

IMHO it would probably not help that much, because the error was in the
other direction. I.e., the issue was that something _didn't_ change
between two versions of the series, but should have.

In general I think we'd usually rely on tests or compiler analysis
(e.g., leftover unused variables) to catch this kind of thing. It's just
that hardly anybody actually runs the perf tests. I know there was some
discussion about running them regularly in CI, but I'm skeptical that
the CPU time / utility tradeoff is very good there.

In some sense, this case was the process working as designed. Running
the test _did_ catch the problem, but we didn't notice because nobody
ran it for a while. So in the interim, nobody was hurt. ;) I'm mostly
joking. It is certainly more convenient to catch these things earlier,
but latent buggy code that nobody runs might not be that big a worry.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-05-01 16:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-22  2:01 Test failure in p5332-multi-pack-reuse.sh Philippe Blain
2025-04-22  4:06 ` Junio C Hamano
2025-04-22 11:16 ` [PATCH] p5332: drop "+" from --stdin-packs input Jeff King
2025-04-22 15:49   ` Junio C Hamano
2025-05-01 16:03     ` Jeff King
2025-04-22 17:24   ` Taylor Blau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).