* [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin
@ 2026-01-16 20:39 Ramsay Jones
2026-01-19 6:50 ` Patrick Steinhardt
0 siblings, 1 reply; 4+ messages in thread
From: Ramsay Jones @ 2026-01-16 20:39 UTC (permalink / raw)
To: GIT Mailing-list; +Cc: Johannes Schindelin, Junio C Hamano, Patrick Steinhardt
Test #29 ('ref transaction: corrupted tables cause failure') started to
fail intermittently for me (from v2.52.0-rc0) when running the testsuite
with '-j8'. (Also, having moved to a new laptop and windows 11, rather
than windows 10). If the test is run by hand, or without any parallelism,
then it passes without issue.
When the test fails (e.g. 1 out of 32 parallel runs) the cause is due to
a permission error while corrupting a table file:
./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
This corruption is done in a shell loop, directly after a 'test_commit',
which uses an ': >"$f"' expression to truncate the file. Adding a sleep
of one second after the 'test_commit' and before the shell loop fixes
the test (it is not clear why). Replacing the redirection shell expression
with a 'test-tool truncate "$f" 0' invocation also provides a fix, which
could simply be another way to change the timing sufficiently to win the
race.
During a debug session, I tried looking at the strace output for the
shell redirection:
$ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
-rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
$
$ strace -o zzz bash -c ': >/tmp/hello'
$
Similarly, for the test-tool solution:
$ strace -o xxx ./t/helper/test-tool truncate /tmp/hello 0
$
When comparing the output, the differences seemed to be what you would
expect and, if anything, the shell redirect probably would have taken
longer than the test-tool solution (many fcntl() calls to dup the stdout
to the <fd>). The call to the win32 api NtCreateFile() was identical,
apart from the first (FileHandle) parameter, of course.
In order to fix this flaky test on cygwin, despite not knowing why it
works, replace the shell redirection with the above 'test-tool truncate'
invocation.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
---
t/t0610-reftable-basics.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 6575528f21..e19e036898 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' '
test_commit file1 &&
for f in .git/reftable/*.ref
do
- : >"$f" || return 1
+ test-tool truncate "$f" 0 || return 1
done &&
test_must_fail git update-ref refs/heads/main HEAD
)
--
2.52.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin
2026-01-16 20:39 [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin Ramsay Jones
@ 2026-01-19 6:50 ` Patrick Steinhardt
2026-01-19 17:10 ` Ramsay Jones
0 siblings, 1 reply; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-19 6:50 UTC (permalink / raw)
To: Ramsay Jones; +Cc: GIT Mailing-list, Johannes Schindelin, Junio C Hamano
On Fri, Jan 16, 2026 at 08:39:56PM +0000, Ramsay Jones wrote:
>
> Test #29 ('ref transaction: corrupted tables cause failure') started to
> fail intermittently for me (from v2.52.0-rc0) when running the testsuite
> with '-j8'. (Also, having moved to a new laptop and windows 11, rather
> than windows 10). If the test is run by hand, or without any parallelism,
> then it passes without issue.
>
> When the test fails (e.g. 1 out of 32 parallel runs) the cause is due to
> a permission error while corrupting a table file:
>
> ./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
This rings a bell. I remember that we discussed a case at some point in
time where a redirect converted to `test-tool truncate` fixed a flake on
Cygwin.
> This corruption is done in a shell loop, directly after a 'test_commit',
> which uses an ': >"$f"' expression to truncate the file. Adding a sleep
> of one second after the 'test_commit' and before the shell loop fixes
> the test (it is not clear why). Replacing the redirection shell expression
> with a 'test-tool truncate "$f" 0' invocation also provides a fix, which
> could simply be another way to change the timing sufficiently to win the
> race.
>
> During a debug session, I tried looking at the strace output for the
> shell redirection:
>
> $ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
> -rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
> $
>
> $ strace -o zzz bash -c ': >/tmp/hello'
> $
>
> Similarly, for the test-tool solution:
>
> $ strace -o xxx ./t/helper/test-tool truncate /tmp/hello 0
> $
>
> When comparing the output, the differences seemed to be what you would
> expect and, if anything, the shell redirect probably would have taken
> longer than the test-tool solution (many fcntl() calls to dup the stdout
> to the <fd>). The call to the win32 api NtCreateFile() was identical,
> apart from the first (FileHandle) parameter, of course.
Too bad. I stil wonder whether it is the extra process that we spawn
that ends up fixing the issue.
> In order to fix this flaky test on cygwin, despite not knowing why it
> works, replace the shell redirection with the above 'test-tool truncate'
> invocation.
>
> Helped-by: Patrick Steinhardt <ps@pks.im>
Oh, so is this the exact case that we were talking about? If so, it
might make sense to link to the mail thread so that folks can also read
a bit into our discussion around this.
> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
> ---
> t/t0610-reftable-basics.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 6575528f21..e19e036898 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' '
> test_commit file1 &&
> for f in .git/reftable/*.ref
> do
> - : >"$f" || return 1
> + test-tool truncate "$f" 0 || return 1
> done &&
> test_must_fail git update-ref refs/heads/main HEAD
> )
In any case, if it seems to reliably fix the issue I'd say we just merge
it. It's unfortunate that we haven't been able to figure out the root
cause, but so be it.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin
2026-01-19 6:50 ` Patrick Steinhardt
@ 2026-01-19 17:10 ` Ramsay Jones
2026-01-20 5:49 ` Patrick Steinhardt
0 siblings, 1 reply; 4+ messages in thread
From: Ramsay Jones @ 2026-01-19 17:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: GIT Mailing-list, Johannes Schindelin, Junio C Hamano
On 19/01/2026 6:50 am, Patrick Steinhardt wrote:
> On Fri, Jan 16, 2026 at 08:39:56PM +0000, Ramsay Jones wrote:
>>
>> Test #29 ('ref transaction: corrupted tables cause failure') started to
>> fail intermittently for me (from v2.52.0-rc0) when running the testsuite
>> with '-j8'. (Also, having moved to a new laptop and windows 11, rather
>> than windows 10). If the test is run by hand, or without any parallelism,
>> then it passes without issue.
>>
>> When the test fails (e.g. 1 out of 32 parallel runs) the cause is due to
>> a permission error while corrupting a table file:
>>
>> ./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
>
> This rings a bell. I remember that we discussed a case at some point in
> time where a redirect converted to `test-tool truncate` fixed a flake on
> Cygwin.
Indeed, the mail thread starts at:
https://lore.kernel.org/git/f22c95ad-43c8-41de-8315-e707224e830b@ramsayjones.plus.com/
>> This corruption is done in a shell loop, directly after a 'test_commit',
>> which uses an ': >"$f"' expression to truncate the file. Adding a sleep
>> of one second after the 'test_commit' and before the shell loop fixes
>> the test (it is not clear why). Replacing the redirection shell expression
>> with a 'test-tool truncate "$f" 0' invocation also provides a fix, which
>> could simply be another way to change the timing sufficiently to win the
>> race.
>>
>> During a debug session, I tried looking at the strace output for the
>> shell redirection:
>>
>> $ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
>> -rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
>> $
>>
>> $ strace -o zzz bash -c ': >/tmp/hello'
>> $
>>
>> Similarly, for the test-tool solution:
>>
>> $ strace -o xxx ./t/helper/test-tool truncate /tmp/hello 0
>> $
>>
>> When comparing the output, the differences seemed to be what you would
>> expect and, if anything, the shell redirect probably would have taken
>> longer than the test-tool solution (many fcntl() calls to dup the stdout
>> to the <fd>). The call to the win32 api NtCreateFile() was identical,
>> apart from the first (FileHandle) parameter, of course.
>
> Too bad. I stil wonder whether it is the extra process that we spawn
> that ends up fixing the issue.
Well, a 'sleep 1' before the shell loop also fixes the issue. I hate to
mention the 'windows delays updating some file attributes until after the
process has exited' conspiracy theory, but ... :) (yeah, I just don't
think that is possible, except ...)
>> In order to fix this flaky test on cygwin, despite not knowing why it
>> works, replace the shell redirection with the above 'test-tool truncate'
>> invocation.
>>
>> Helped-by: Patrick Steinhardt <ps@pks.im>
>
> Oh, so is this the exact case that we were talking about? If so, it
> might make sense to link to the mail thread so that folks can also read
> a bit into our discussion around this.
Indeed! I thought about referencing the email thread, but I decided that it
didn't really offer any more supporting evidence than the commit message
(in fact less - it doesn't mention the 'strace' scan).
I can add that (again [1]), if you think it's worth it, but I just re-read
the email thread and I'm not convinced it offers much extra value. So, I would
rather not re-roll, but I will if you think it worth it. Let me know.
[1] https://lore.kernel.org/git/f22c95ad-43c8-41de-8315-e707224e830b@ramsayjones.plus.com/
>> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
>> ---
>> t/t0610-reftable-basics.sh | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
>> index 6575528f21..e19e036898 100755
>> --- a/t/t0610-reftable-basics.sh
>> +++ b/t/t0610-reftable-basics.sh
>> @@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' '
>> test_commit file1 &&
>> for f in .git/reftable/*.ref
>> do
>> - : >"$f" || return 1
>> + test-tool truncate "$f" 0 || return 1
>> done &&
>> test_must_fail git update-ref refs/heads/main HEAD
>> )
>
> In any case, if it seems to reliably fix the issue I'd say we just merge
> it. It's unfortunate that we haven't been able to figure out the root
> cause, but so be it.
Agreed!
Thanks!
ATB,
Ramsay Jones
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin
2026-01-19 17:10 ` Ramsay Jones
@ 2026-01-20 5:49 ` Patrick Steinhardt
0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-20 5:49 UTC (permalink / raw)
To: Ramsay Jones; +Cc: GIT Mailing-list, Johannes Schindelin, Junio C Hamano
On Mon, Jan 19, 2026 at 05:10:46PM +0000, Ramsay Jones wrote:
> On 19/01/2026 6:50 am, Patrick Steinhardt wrote:
> > On Fri, Jan 16, 2026 at 08:39:56PM +0000, Ramsay Jones wrote:
> >> In order to fix this flaky test on cygwin, despite not knowing why it
> >> works, replace the shell redirection with the above 'test-tool truncate'
> >> invocation.
> >>
> >> Helped-by: Patrick Steinhardt <ps@pks.im>
> >
> > Oh, so is this the exact case that we were talking about? If so, it
> > might make sense to link to the mail thread so that folks can also read
> > a bit into our discussion around this.
>
> Indeed! I thought about referencing the email thread, but I decided that it
> didn't really offer any more supporting evidence than the commit message
> (in fact less - it doesn't mention the 'strace' scan).
>
> I can add that (again [1]), if you think it's worth it, but I just re-read
> the email thread and I'm not convinced it offers much extra value. So, I would
> rather not re-roll, but I will if you think it worth it. Let me know.
>
> [1] https://lore.kernel.org/git/f22c95ad-43c8-41de-8315-e707224e830b@ramsayjones.plus.com/
Fair enough, let's just keep it as-is. Thanks!
Patrick
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-01-20 5:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-16 20:39 [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin Ramsay Jones
2026-01-19 6:50 ` Patrick Steinhardt
2026-01-19 17:10 ` Ramsay Jones
2026-01-20 5:49 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox