* v2.52.0-rc0 test failure on cygwin
@ 2025-11-04 23:49 Ramsay Jones
2025-11-06 10:53 ` Patrick Steinhardt
0 siblings, 1 reply; 6+ messages in thread
From: Ramsay Jones @ 2025-11-04 23:49 UTC (permalink / raw)
To: GIT Mailing-list; +Cc: Junio C Hamano, Adam Dinwoodie, Patrick Steinhardt
Hi Junio,
Just a quick heads up: the rc0 build on cygwin has a flaky test, thus:
$ tail test-out-2-52-rc0
Test Summary Report
-------------------
t0610-reftable-basics.sh (Wstat: 256 (exited 1) Tests: 90 Failed: 1)
Failed test: 29
Non-zero exit status: 1
Files=1024, Tests=32232, 2703 wallclock secs (23.38 usr 60.53 sys + 7886.88 cusr 10419.88 csys = 18390.67 CPU)
Result: FAIL
make[1]: *** [Makefile:78: prove] Error 1
make[1]: Leaving directory '/home/ramsay/git/t'
make: *** [Makefile:3327: test] Error 2
$
Initially, while investigating the failure, I was running the test by hand and it
didn't fail ... So, I tried a stess test, like so:
$ cd t
$ ./t0610-reftable-basics.sh --run=29 --stress-limit=10
FAIL 3.1
OK 7.1
OK 19.1
OK 18.1
OK 2.1
OK 6.1
OK 9.1
OK 0.1
OK 21.1
OK 8.1
OK 4.1
OK 10.1
OK 12.1
OK 26.1
OK 29.1
OK 17.1
OK 1.1
OK 23.1
OK 5.1
OK 11.1
OK 27.1
OK 16.1
OK 14.1
OK 30.1
OK 15.1
OK 25.1
OK 31.1
OK 22.1
OK 24.1
OK 20.1
OK 28.1
OK 13.1
Log(s) of failed test run(s):
Contents of '/home/ramsay/git/t/test-results/t0610-reftable-basics.stress-3.out':
Initialized empty Git repository in /home/ramsay/git/t/trash directory.t0610-reftable-basics.stress-3/.git/
ok 1 # skip pack-refs does not crash with -h (--run)
ok 2 # skip init: creates basic reftable structures (--run)
...
ok 27 # skip clone: can clone reffiles into reftable repository (--run)
ok 28 # skip clone: can clone reftable into reffiles repository (--run)
expecting success of 0610.29 'ref transaction: corrupted tables cause failure':
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
test_commit file1 &&
for f in .git/reftable/*.ref
do
: >"$f" || return 1
done &&
test_must_fail git update-ref refs/heads/main HEAD
)
++ test_when_finished 'rm -rf repo'
++ test 0 = 0
++ test_cleanup='{ rm -rf repo
} && (exit "$eval_ret"); eval_ret=$?; :'
++ git init repo
Initialized empty Git repository in /home/ramsay/git/t/trash directory.t0610-reftable-basics.stress-3/repo/.git/
++ cd repo
++ test_commit file1
++ local notick=
++ local echo=echo
++ local append=
++ local author=
++ local signoff=
++ local indir=
++ local tag=light
++ test 1 '!=' 0
++ case "$1" in
++ break
++ indir=
++ local file=file1.t
++ test -n ''
++ echo file1
++ git add -- file1.t
++ test -z ''
++ test_tick
++ test -z ''
++ test_tick=1112911993
++ GIT_COMMITTER_DATE='1112911993 -0700'
++ GIT_AUTHOR_DATE='1112911993 -0700'
++ export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
++ git commit -m file1
[main (root-commit) 69af168] file1
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+)
create mode 100644 file1.t
++ case "$tag" in
++ git tag file1
++ for f in .git/reftable/*.ref
++ :
./test-lib.sh: line 1017: .git/reftable/0x000000000001-0x000000000002-3b7f1804.ref: Permission denied
error: last command exited with $?=1
not ok 29 - ref transaction: corrupted tables cause failure
#
# test_when_finished "rm -rf repo" &&
# git init repo &&
# (
# cd repo &&
# test_commit file1 &&
# for f in .git/reftable/*.ref
# do
# : >"$f" || return 1
# done &&
# test_must_fail git update-ref refs/heads/main HEAD
# )
#
1..29
$
Note the 'Permission denied' error when attempting to corrupt (empty)
the '*.ref' tables. (Also, this shows 1 error in 32 parallel attempts).
However, the 'trash' directory shows that the file permissions are set
such that I should be able to corrupt them:
$ cd trash\ directory.t0610-reftable-basics.stress-failed/
$ cd repo
$ ls -l .git/reftable
total 3.0K
-rw-r--r-- 1 ramsay None 296 Nov 4 17:56 0x000000000001-0x000000000002-3b7f1804.ref
-rw-r--r-- 1 ramsay None 139 Nov 4 17:56 0x000000000003-0x000000000003-66444a41.ref
-rw-r--r-- 1 ramsay None 86 Nov 4 17:56 tables.list
$
Indeed, I can do just that:
$ : >.git/reftable/0x000000000001-0x000000000002-3b7f1804.ref
$ ls -l .git/reftable
total 2.0K
-rw-r--r-- 1 ramsay None 0 Nov 4 18:54 0x000000000001-0x000000000002-3b7f1804.ref
-rw-r--r-- 1 ramsay None 139 Nov 4 17:56 0x000000000003-0x000000000003-66444a41.ref
-rw-r--r-- 1 ramsay None 86 Nov 4 17:56 tables.list
$ git update-ref refs/heads/main HEAD
fatal: HEAD: not a valid SHA1
$
So, this appears to be a timing issue (a bit difficult to think how it
actually could be, but ...), so maybe try:
$ cd ../..
$ vim t0610-reftable-basics.sh
$ git diff
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 3ea5d51532..4fc5cbca99 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -205,6 +205,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' '
(
cd repo &&
test_commit file1 &&
+ sleep 1 &&
for f in .git/reftable/*.ref
do
: >"$f" || return 1
$ ./t0610-reftable-basics.sh --run=29 --stress-limit=10
OK 0.1
OK 13.1
OK 4.1
OK 1.1
OK 2.1
OK 3.1
OK 10.1
OK 8.1
OK 26.1
OK 21.1
OK 7.1
OK 19.1
OK 9.1
OK 5.1
OK 14.1
OK 22.1
OK 11.1
OK 27.1
OK 17.1
OK 29.1
OK 6.1
OK 30.1
OK 15.1
OK 23.1
OK 12.1
OK 24.1
OK 25.1
OK 16.1
OK 18.1
OK 28.1
OK 31.1
OK 20.1
OK 0.2
OK 13.2
OK 4.2
OK 10.2
OK 2.2
OK 3.2
OK 1.2
OK 22.2
OK 8.2
OK 21.2
OK 19.2
OK 5.2
OK 9.2
OK 7.2
OK 26.2
OK 11.2
OK 14.2
OK 25.2
OK 28.2
OK 15.2
OK 17.2
OK 27.2
OK 6.2
OK 18.2
OK 30.2
OK 31.2
OK 29.2
OK 20.2
OK 16.2
OK 12.2
OK 23.2
OK 24.2
OK 13.3
OK 4.3
OK 0.3
OK 3.3
OK 10.3
OK 2.3
OK 1.3
OK 8.3
OK 22.3
OK 19.3
OK 5.3
OK 14.3
OK 26.3
OK 7.3
OK 25.3
OK 15.3
OK 28.3
OK 21.3
OK 9.3
OK 11.3
OK 12.3
OK 23.3
OK 16.3
OK 17.3
OK 18.3
OK 29.3
OK 6.3
OK 30.3
OK 27.3
OK 24.3
OK 31.3
OK 20.3
$
[I hit ctrl-c here]
So, not really an answer, but I have noted several times over the years
that cygwin seems to delay setting some file attributes until after the
process has exited ... [yeah, I don't see how either! ;) ].
I noted the above last night and, unfortunately, I haven't had any
time to look into this tonight. (hopefully tomorrow).
[I haven't tried bisecting because, well ... flaky test! ;) ]
ATB,
Ramsay Jones
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: v2.52.0-rc0 test failure on cygwin 2025-11-04 23:49 v2.52.0-rc0 test failure on cygwin Ramsay Jones @ 2025-11-06 10:53 ` Patrick Steinhardt 2025-11-06 18:27 ` Johannes Sixt 2025-11-06 20:28 ` Ramsay Jones 0 siblings, 2 replies; 6+ messages in thread From: Patrick Steinhardt @ 2025-11-06 10:53 UTC (permalink / raw) To: Ramsay Jones; +Cc: GIT Mailing-list, Junio C Hamano, Adam Dinwoodie On Tue, Nov 04, 2025 at 11:49:46PM +0000, Ramsay Jones wrote: > Just a quick heads up: the rc0 build on cygwin has a flaky test, thus: > > $ tail test-out-2-52-rc0 > Test Summary Report > ------------------- > t0610-reftable-basics.sh (Wstat: 256 (exited 1) Tests: 90 Failed: 1) > Failed test: 29 > Non-zero exit status: 1 > Files=1024, Tests=32232, 2703 wallclock secs (23.38 usr 60.53 sys + 7886.88 cusr 10419.88 csys = 18390.67 CPU) > Result: FAIL > make[1]: *** [Makefile:78: prove] Error 1 > make[1]: Leaving directory '/home/ramsay/git/t' > make: *** [Makefile:3327: test] Error 2 > $ > > Initially, while investigating the failure, I was running the test by hand and it > didn't fail ... So, I tried a stess test, like so: Interesting. My first hunch is that the root cause is auto-maintenance. git-maintenance(1) spawns `git pack-refs --auto`, and that process will open the stack so that it can verify whether it needs to be packed or not. And Windows being Windows, the file being open may mean that it cannot be written by another process at the same point in time. In any case, I was able to reproduce the issue. But disabling auto maintenance with the following patch does not fix the flake. diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 3ea5d51532..52bbf4fe57 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -204,6 +204,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' ' git init repo && ( cd repo && + git config set maintenance.auto false && test_commit file1 && for f in .git/reftable/*.ref do And I guess that makes sense? I'd assume that Cygwin already knows to open files with POSIX semantics, so it should be possible to write to the file even if it was held open by another Git process. [snip] > So, not really an answer, but I have noted several times over the years > that cygwin seems to delay setting some file attributes until after the > process has exited ... [yeah, I don't see how either! ;) ]. What? That's horrible if true. How doesn't this cause more issues? I wonder whether the issue is surfaced because we use the shell to truncate the file. If you instead use `file-tool truncate 0` for example then I cannot reproduce the flake anymore: diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 3ea5d51532..1058f83993 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' ' test_commit file1 && for f in .git/reftable/*.ref do - : >"$f" || return 1 + test-tool truncate "$f" 0 || return 1 done && test_must_fail git update-ref refs/heads/main HEAD ) But this may very well just be due to timing again -- spawning the process will be slower than using shell redirection to trim the file. All of this is quite curious. I don't really have any better idea than to use something like the above patch. It's ugly, doubly so because I don't understand either the root cause nor why the patch properly fixes it. So I'd be grateful if anyone were to enlighten me :) > I noted the above last night and, unfortunately, I haven't had any > time to look into this tonight. (hopefully tomorrow). > > [I haven't tried bisecting because, well ... flaky test! ;) ] I have verified that the flake already exists in Git 2.51, so at least it's not a regression in the current release cycle. Thanks! Patrick ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: v2.52.0-rc0 test failure on cygwin 2025-11-06 10:53 ` Patrick Steinhardt @ 2025-11-06 18:27 ` Johannes Sixt 2025-11-06 21:00 ` Ramsay Jones 2025-11-06 20:28 ` Ramsay Jones 1 sibling, 1 reply; 6+ messages in thread From: Johannes Sixt @ 2025-11-06 18:27 UTC (permalink / raw) To: Patrick Steinhardt, Ramsay Jones Cc: GIT Mailing-list, Junio C Hamano, Adam Dinwoodie Am 06.11.25 um 11:53 schrieb Patrick Steinhardt: > On Tue, Nov 04, 2025 at 11:49:46PM +0000, Ramsay Jones wrote: >> So, not really an answer, but I have noted several times over the years >> that cygwin seems to delay setting some file attributes until after the >> process has exited ... [yeah, I don't see how either! ;) ]. > > What? That's horrible if true. How doesn't this cause more issues? Unlike POSIX write(), Windows's WriteFile() doesn't update the modification time stamp immediately. It's only updated when the last file handle is closed. https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile > When writing to a file, the last write time is not fully updated until > all handles used for writing have been closed. -- Hannes ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: v2.52.0-rc0 test failure on cygwin 2025-11-06 18:27 ` Johannes Sixt @ 2025-11-06 21:00 ` Ramsay Jones 0 siblings, 0 replies; 6+ messages in thread From: Ramsay Jones @ 2025-11-06 21:00 UTC (permalink / raw) To: Johannes Sixt, Patrick Steinhardt Cc: GIT Mailing-list, Junio C Hamano, Adam Dinwoodie On 06/11/2025 6:27 pm, Johannes Sixt wrote: > Am 06.11.25 um 11:53 schrieb Patrick Steinhardt: >> On Tue, Nov 04, 2025 at 11:49:46PM +0000, Ramsay Jones wrote: >>> So, not really an answer, but I have noted several times over the years >>> that cygwin seems to delay setting some file attributes until after the >>> process has exited ... [yeah, I don't see how either! ;) ]. >> >> What? That's horrible if true. How doesn't this cause more issues? > > Unlike POSIX write(), Windows's WriteFile() doesn't update the > modification time stamp immediately. It's only updated when the last > file handle is closed. > > https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile > >> When writing to a file, the last write time is not fully updated until >> all handles used for writing have been closed. > Yep, I deliberately said 'seems to ...' because I have very little by way of hard facts! ;) Well, except, in approx 1996 on windows NT3.51 (on the only occasion that I did any commercial windows programming), I had a 'problem' with what appeared to be a 'late' update to a file attribute (I can't remember which one - it could have been time related, rather than permissions). A 'windows expert' (which I have never been) gave me a workaround for the 'windows filesystem bug' (his words). It was at this time that I first used cygwin (beta 14 I think - I still have the cygnus solutions CD somewhere). I had managed to not use windows of any description before then, so I was absolutely horrified by the awful development environment that greeted me! (even with the visual C++ GUI). So, on every laptop since then, I have immediately installed cygwin, along with dual-booting linux. In a small way, I have tried to 'pay back' the cygwin developers, who saved me from going crazy during that project! ;) ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: v2.52.0-rc0 test failure on cygwin 2025-11-06 10:53 ` Patrick Steinhardt 2025-11-06 18:27 ` Johannes Sixt @ 2025-11-06 20:28 ` Ramsay Jones 2025-11-07 6:04 ` Patrick Steinhardt 1 sibling, 1 reply; 6+ messages in thread From: Ramsay Jones @ 2025-11-06 20:28 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: GIT Mailing-list, Junio C Hamano, Adam Dinwoodie On 06/11/2025 10:53 am, Patrick Steinhardt wrote: > On Tue, Nov 04, 2025 at 11:49:46PM +0000, Ramsay Jones wrote: >> Just a quick heads up: the rc0 build on cygwin has a flaky test, thus: >> >> $ tail test-out-2-52-rc0 >> Test Summary Report >> ------------------- >> t0610-reftable-basics.sh (Wstat: 256 (exited 1) Tests: 90 Failed: 1) >> Failed test: 29 >> Non-zero exit status: 1 >> Files=1024, Tests=32232, 2703 wallclock secs (23.38 usr 60.53 sys + 7886.88 cusr 10419.88 csys = 18390.67 CPU) >> Result: FAIL >> make[1]: *** [Makefile:78: prove] Error 1 >> make[1]: Leaving directory '/home/ramsay/git/t' >> make: *** [Makefile:3327: test] Error 2 >> $ >> >> Initially, while investigating the failure, I was running the test by hand and it >> didn't fail ... So, I tried a stess test, like so: > > Interesting. My first hunch is that the root cause is auto-maintenance. > git-maintenance(1) spawns `git pack-refs --auto`, and that process will > open the stack so that it can verify whether it needs to be packed or > not. And Windows being Windows, the file being open may mean that it > cannot be written by another process at the same point in time. > > In any case, I was able to reproduce the issue. But disabling auto > maintenance with the following patch does not fix the flake. > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > index 3ea5d51532..52bbf4fe57 100755 > --- a/t/t0610-reftable-basics.sh > +++ b/t/t0610-reftable-basics.sh > @@ -204,6 +204,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' ' > git init repo && > ( > cd repo && > + git config set maintenance.auto false && > test_commit file1 && > for f in .git/reftable/*.ref > do > Thanks for looking into this - yesterday was unexpectedly busy, so I didn't have time to look at this myself. :( I also thought, briefly, about 'git maintenance' since the error seems to happen in parallel heavy workloads. You probably didn't notice that the test finished in 45min, because I ran the test with '-j8'. I have recently replaced my win10 laptop. On my old laptop, the non-parallel test run used to take 6+ hours. With my new laptop it is 4+ hours, so it is still a long time to wait. However, the 'meson test', which by default runs the tests in parallel, was much faster (about 80-90min). So, it was worth a try... At the moment the parallel tests hang about half of the time (prove hangs right at the very end!), so I am still experimenting. [snip] > I wonder whether the issue is surfaced because we use the shell to > truncate the file. If you instead use `file-tool truncate 0` for example > then I cannot reproduce the flake anymore: > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > index 3ea5d51532..1058f83993 100755 > --- a/t/t0610-reftable-basics.sh > +++ b/t/t0610-reftable-basics.sh > @@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' ' > test_commit file1 && > for f in .git/reftable/*.ref > do > - : >"$f" || return 1 > + test-tool truncate "$f" 0 || return 1 > done && > test_must_fail git update-ref refs/heads/main HEAD > ) > > But this may very well just be due to timing again -- spawning the > process will be slower than using shell redirection to trim the file. I tried this patch tonight, letting: $ ./t0610-reftable-basics.sh --run=29 --stress-limit=10 finish, which it did without failure. So that's 32 * 10 successful runs. (I had expected 16 * 10 yesterday, ie 2 * cores * 10, but this laptop has 8 cores 16 threads, so 'getconf _NPROCESSORS_ONLN' returns 16 not 8). > > All of this is quite curious. I don't really have any better idea than > to use something like the above patch. It's ugly, doubly so because I > don't understand either the root cause nor why the patch properly fixes > it. So I'd be grateful if anyone were to enlighten me :) Me too! :) > I have verified that the flake already exists in Git 2.51, so at least > it's not a regression in the current release cycle. OK, that's good to know. Despite the mystery, I think a patch based on the above would be the best solution for now. (Assuming nobody has a better idea). Thanks. ATB, Ramsay Jones ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: v2.52.0-rc0 test failure on cygwin 2025-11-06 20:28 ` Ramsay Jones @ 2025-11-07 6:04 ` Patrick Steinhardt 0 siblings, 0 replies; 6+ messages in thread From: Patrick Steinhardt @ 2025-11-07 6:04 UTC (permalink / raw) To: Ramsay Jones; +Cc: GIT Mailing-list, Junio C Hamano, Adam Dinwoodie On Thu, Nov 06, 2025 at 08:28:35PM +0000, Ramsay Jones wrote: > On 06/11/2025 10:53 am, Patrick Steinhardt wrote: > > I wonder whether the issue is surfaced because we use the shell to > > truncate the file. If you instead use `file-tool truncate 0` for example > > then I cannot reproduce the flake anymore: > > > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > > index 3ea5d51532..1058f83993 100755 > > --- a/t/t0610-reftable-basics.sh > > +++ b/t/t0610-reftable-basics.sh > > @@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' ' > > test_commit file1 && > > for f in .git/reftable/*.ref > > do > > - : >"$f" || return 1 > > + test-tool truncate "$f" 0 || return 1 > > done && > > test_must_fail git update-ref refs/heads/main HEAD > > ) > > > > But this may very well just be due to timing again -- spawning the > > process will be slower than using shell redirection to trim the file. > > I tried this patch tonight, letting: > > $ ./t0610-reftable-basics.sh --run=29 --stress-limit=10 > > finish, which it did without failure. So that's 32 * 10 successful runs. > > (I had expected 16 * 10 yesterday, ie 2 * cores * 10, but this laptop > has 8 cores 16 threads, so 'getconf _NPROCESSORS_ONLN' returns 16 not 8). Nice :) > > All of this is quite curious. I don't really have any better idea than > > to use something like the above patch. It's ugly, doubly so because I > > don't understand either the root cause nor why the patch properly fixes > > it. So I'd be grateful if anyone were to enlighten me :) > > Me too! :) > > > I have verified that the flake already exists in Git 2.51, so at least > > it's not a regression in the current release cycle. > OK, that's good to know. > > Despite the mystery, I think a patch based on the above would be > the best solution for now. (Assuming nobody has a better idea). Please feel free to take it and turn it into a proper patch. My main goal was to verify that this is not a regression and that nothing new broke in the reftable backend. I'm happy to let you take over from here, as I'm a bit short on time otherwise. Thanks! Patrick ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-07 6:04 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-04 23:49 v2.52.0-rc0 test failure on cygwin Ramsay Jones 2025-11-06 10:53 ` Patrick Steinhardt 2025-11-06 18:27 ` Johannes Sixt 2025-11-06 21:00 ` Ramsay Jones 2025-11-06 20:28 ` Ramsay Jones 2025-11-07 6:04 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).