From: Patrick Steinhardt <ps@pks.im>
To: Ramsay Jones <ramsay@ramsayjones.plus.com>
Cc: GIT Mailing-list <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: v2.47.0-rc1 test failure on cygwin
Date: Fri, 4 Oct 2024 08:13:05 +0200
Message-ID: <Zv-HbT8qrM6IYKb4@pks.im>
In-Reply-To: <Zv9oIrKveu-JAGQM@pks.im>
On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote:
> On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote:
> > Hi Patrick,
> >
> > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref transaction: many
> > concurrent writers) fails on cygwin. The tail end of the debug output for this
> > test looks like:
> >
> [snip]
> >
> > t0610-reftable-basics.sh passed on 'rc0', but this test (and the timeout facility)
> > is new in 'rc1'. I tried simply increasing the timeout (10 fold), but that didn't
> > change the result. (I didn't really expect it to - the 'reftable: transaction
> > prepare: I/O error' does not look timing related!).
> >
> > Again, just a heads up. (I can't look at it until tomorrow now; any ideas?)
>
> This failure is kind of known and discussed in [1]. Just to make it
> explicit: this test failure doesn't really surface a regression, the
> reftable code already failed for concurrent writes before. I fixed that
> and added the test that is now flaky, as the fix itself is seemingly
> only sufficient on Linux and macOS.
>
> I didn't yet have the time to look at whether I can fix it, but should
> finally find the time to do so today.
Hm, interestingly enough I cannot reproduce the issue on Cygwin myself,
but I can reproduce the issue with MinGW. And in fact, the logs you have
sent all indicate that we cannot acquire the lock; there is no sign of
I/O errors here. So I guess you're running into timeout issues. Does the
following patch fix this for you?
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..b5cad805d4 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
 	git init repo &&
 	(
 		cd repo &&
-		# Set a high timeout such that a busy CI machine will not abort
-		# early. 10 seconds should hopefully be ample of time to make
-		# this non-flaky.
-		git config set reftable.lockTimeout 10000 &&
+		git config set reftable.lockTimeout -1 &&
 		test_commit --no-tag initial &&
 		head=$(git rev-parse HEAD) &&
The issue on Win32 is different: we cannot commit the "tables.list" lock
via rename(3P) because the target file may be open for reading by a
concurrent process. I guess that Cygwin has proper POSIX semantics for
rename(3P) and thus doesn't hit the same issue.
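To make the "commit the lock via rename" step concrete, here is a minimal POSIX sketch of the general lock-file pattern: create "<file>.lock" exclusively, write the new contents, then rename the lock file over the target. The helper names, the 1ms polling interval, and the timeout convention (0 for a single attempt, negative for no limit) are assumptions for illustration and are not taken from the reftable code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create "<path>.lock" exclusively, retrying roughly
 * once per millisecond. Assumed semantics: timeout_ms == 0 means a single
 * attempt, a negative value means retry without limit.
 * Returns an open file descriptor, or -1 with errno set.
 */
static int sketch_lock_acquire(const char *path, long timeout_ms)
{
	char lock[4096];
	long waited_ms = 0;

	assert(path);
	if (snprintf(lock, sizeof(lock), "%s.lock", path) >= (int)sizeof(lock)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	for (;;) {
		struct timespec ts = { 0, 1000000L };
		/* O_EXCL fails with EEXIST while another writer holds the lock. */
		int fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0666);

		if (fd >= 0 || errno != EEXIST)
			return fd;
		if (timeout_ms >= 0 && waited_ms >= timeout_ms)
			return -1; /* timed out; errno is still EEXIST */
		nanosleep(&ts, NULL);
		waited_ms++;
	}
}

/*
 * Hypothetical helper: write the new contents into the lock file and
 * commit it by renaming it over the target. Returns 0 on success.
 */
static int sketch_lock_commit(const char *path, int fd, const char *data)
{
	char lock[4096];
	size_t len;

	assert(path && data);
	len = strlen(data);
	snprintf(lock, sizeof(lock), "%s.lock", path);
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(lock);
		return -1;
	}
	if (close(fd) < 0) {
		unlink(lock);
		return -1;
	}
	/* With POSIX rename() this works even while readers hold the target open. */
	if (rename(lock, path) < 0) {
		unlink(lock);
		return -1;
	}
	return 0;
}
```

The final rename() is the step that is at stake on Win32: on POSIX systems it succeeds while a reader has the target open, whereas the Win32 equivalent can be refused in that situation.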
We already try to emulate POSIX semantics somewhat in `mingw_rename()`
by using a retry-loop when we hit `ERROR_ACCESS_DENIED`, which is what
we get when the target file is open in another process. But that
seemingly isn't enough when there is a lot of contention around a file.
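The retry idea can be sketched in portable C as follows. This is not the code of `mingw_rename()`: the function names and the backoff schedule are made up for illustration, errno `EACCES` stands in for `ERROR_ACCESS_DENIED`, and the rename operation is passed in as a function pointer so that contention can be simulated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

typedef int (*sketch_rename_fn)(const char *src, const char *dst);

/* Backoff schedule in milliseconds; the values are illustrative only. */
static const int sketch_delay_ms[] = { 0, 1, 10, 20, 40 };
#define SKETCH_NUM_DELAYS (sizeof(sketch_delay_ms) / sizeof(sketch_delay_ms[0]))

/*
 * Call do_rename() until it succeeds, sleeping between attempts. Only
 * "access denied" is treated as transient; once the schedule is used up,
 * the last error is reported to the caller.
 */
static int sketch_rename_with_retry(const char *src, const char *dst,
				    sketch_rename_fn do_rename)
{
	size_t tries = 0;

	assert(src && dst && do_rename);
	for (;;) {
		struct timespec ts;

		if (do_rename(src, dst) == 0)
			return 0;
		if (errno != EACCES || tries >= SKETCH_NUM_DELAYS)
			return -1;
		ts.tv_sec = sketch_delay_ms[tries] / 1000;
		ts.tv_nsec = (sketch_delay_ms[tries] % 1000) * 1000000L;
		nanosleep(&ts, NULL);
		tries++;
	}
}

/* Hypothetical stand-in for a contended target: fails a set number of times. */
static int sketch_failures_left;
static int sketch_calls;

static int sketch_contended_rename(const char *src, const char *dst)
{
	(void)src;
	(void)dst;
	sketch_calls++;
	if (sketch_failures_left > 0) {
		sketch_failures_left--;
		errno = EACCES;
		return -1;
	}
	return 0;
}
```

With a fixed schedule like this, the loop absorbs a few short collisions, but as soon as the target stays busy for longer than the total delay budget the rename fails anyway. That is the behaviour one would expect when many writers contend for the same file.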
So I'm currently investigating whether we can adopt something similar to
what Cygwin is doing for Win32, as well. I assume that they use
`FILE_RENAME_INFORMATION_EX` with `FILE_RENAME_POSIX_SEMANTICS`, which
should give us what we're looking for.
Ugh, well. Turns out the implementation of rename(3P) in Cygwin is 500
lines long. I guess this is a non-trivial problem :) But they of course
have to handle a whole lot more cases than we have to. But my guess was
correct: they do use `FILE_RENAME_POSIX_SEMANTICS`. The catch is that
this flag only exists in Windows 10 and newer. But that should be a fine
compromise.
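As a rough illustration of what such a rename could look like through the Win32 API, the sketch below uses `SetFileInformationByHandle()` with the `FileRenameInfoEx` information class, which is the Win32 counterpart of the NT-level names mentioned above. Several things here are assumptions: the function name `sketch_replace_file()`, the locally defined flag constants and struct (declared here so the sketch does not depend on the SDK version), and the UTF-8 path conversion. The Windows branch is untested and is neither Cygwin's nor Git's implementation; on other platforms the function simply falls back to rename().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>

/* Flag values as documented for FILE_RENAME_INFO; defined locally on purpose. */
#define SKETCH_RENAME_REPLACE_IF_EXISTS 0x00000001
#define SKETCH_RENAME_POSIX_SEMANTICS   0x00000002
#endif

/* Replace dst with src. Returns 0 on success, -1 on failure. */
static int sketch_replace_file(const char *src, const char *dst)
{
	assert(src && dst);
#ifdef _WIN32
	{
		/* Same field layout as FILE_RENAME_INFO, with a fixed-size name. */
		struct {
			DWORD Flags;
			HANDLE RootDirectory;
			DWORD FileNameLength;
			WCHAR FileName[MAX_PATH];
		} info;
		WCHAR wsrc[MAX_PATH], wdst[MAX_PATH];
		size_t len;
		HANDLE h;
		BOOL ok;

		if (!MultiByteToWideChar(CP_UTF8, 0, src, -1, wsrc, MAX_PATH) ||
		    !MultiByteToWideChar(CP_UTF8, 0, dst, -1, wdst, MAX_PATH))
			return -1;
		len = wcslen(wdst);

		/* Renaming by handle requires DELETE access on the source. */
		h = CreateFileW(wsrc, DELETE,
				FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
				NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
		if (h == INVALID_HANDLE_VALUE)
			return -1;

		memset(&info, 0, sizeof(info));
		info.Flags = SKETCH_RENAME_REPLACE_IF_EXISTS |
			     SKETCH_RENAME_POSIX_SEMANTICS;
		info.FileNameLength = (DWORD)(len * sizeof(WCHAR));
		memcpy(info.FileName, wdst, len * sizeof(WCHAR));

		/* 22 is FileRenameInfoEx in FILE_INFO_BY_HANDLE_CLASS. */
		ok = SetFileInformationByHandle(h, (FILE_INFO_BY_HANDLE_CLASS)22,
						&info, sizeof(info));
		CloseHandle(h);
		/* On Windows versions without this class the call is expected to fail. */
		return ok ? 0 : -1;
	}
#else
	/* POSIX rename() already replaces an open target. */
	return rename(src, dst);
#endif
}
```

Because the flag only exists in Windows 10 and newer, a caller would presumably have to detect the failure on older systems and fall back to the existing retry-based path.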
I'll try to wrap my head around how all of this works.
Patrick