git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Ramsay Jones <ramsay@ramsayjones.plus.com>
Cc: GIT Mailing-list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: v2.47.0-rc1 test failure on cygwin
Date: Fri, 4 Oct 2024 08:13:05 +0200
Message-ID: <Zv-HbT8qrM6IYKb4@pks.im>
In-Reply-To: <Zv9oIrKveu-JAGQM@pks.im>

On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote:
> On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote:
> > Hi Patrick,
> > 
> > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref transaction: many
> > concurrent writers) fails on cygwin. The tail end of the debug output for this
> > test looks like:
> > 
> [snip]
> > 
> > t0610-reftable-basics.sh passed on 'rc0', but this test (and the timeout facility)
> > is new in 'rc1'. I tried simply increasing the timeout (10 fold), but that didn't
> > change the result. (I didn't really expect it to - the 'reftable: transaction
> > prepare: I/O error' does not look timing related!).
> > 
> > Again, just a heads up. (I can't look at it until tomorrow now; any ideas?)
> 
> This failure is kind of known and discussed in [1]. Just to make it
> explicit: this test failure doesn't really surface a regression, the
> reftable code already failed for concurrent writes before. I fixed that
> and added the test that is now flaky, as the fix itself is seemingly
> only sufficient on Linux and macOS.
> 
> I didn't yet have the time to look at whether I can fix it, but should
> finally find the time to do so today.

Hm, interestingly enough I cannot reproduce the issue on Cygwin myself,
but I can reproduce it with MinGW. And in fact, the logs you have sent
all indicate that we cannot acquire the lock; there is no sign of I/O
errors in them. So I guess you're running into timeout issues. Does the
following patch fix this for you?

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..b5cad805d4 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
 	git init repo &&
 	(
 		cd repo &&
-		# Set a high timeout such that a busy CI machine will not abort
-		# early. 10 seconds should hopefully be ample of time to make
-		# this non-flaky.
-		git config set reftable.lockTimeout 10000 &&
+		git config set reftable.lockTimeout -1 &&
 		test_commit --no-tag initial &&
 
 		head=$(git rev-parse HEAD) &&

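For readers unfamiliar with the setting: as far as I understand the
documented semantics of `reftable.lockTimeout`, 0 means "do not retry"
and a negative value means "retry indefinitely". The following is a
minimal POSIX sketch of that behavior, not the reftable code itself;
`acquire_lock()`, `now_ms()` and the 1ms pause are made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds on a monotonic clock. */
static long long now_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: create `path` exclusively as a lock file.
 *   timeout_ms == 0: try exactly once
 *   timeout_ms  > 0: keep retrying until that many ms have passed
 *   timeout_ms  < 0: retry forever
 * Returns an open file descriptor, or -1 on failure.
 */
int acquire_lock(const char *path, long timeout_ms)
{
	const struct timespec pause = { 0, 1000000L }; /* 1ms between attempts */
	long long deadline = now_ms() + timeout_ms;

	assert(path);
	for (;;) {
		int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
		if (fd >= 0)
			return fd;
		if (errno != EEXIST)
			return -1; /* a real error, not contention */
		if (timeout_ms >= 0 && now_ms() >= deadline)
			return -1; /* gave up waiting for the lock holder */
		nanosleep(&pause, NULL);
	}
}
```

With a negative timeout a writer can no longer fail just because the
machine is busy, which is why it removes the timing dependency from the
test, at the cost of hanging forever if a lock is never released.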
The issue on Win32 is different: we cannot commit the "tables.list" lock
via rename(3P) because the target file may be open for reading by a
concurrent process. I guess that Cygwin has proper POSIX semantics for
rename(3P) and thus doesn't hit the same issue.
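To illustrate the POSIX behavior described above, here is a small
sketch that renames a lock file over a target while the target is still
open for reading. The function name and the file contents are invented
for the example. On a POSIX system the rename succeeds and the reader
keeps seeing the old contents through its descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path`, truncating any existing file. */
static int write_file(const char *path, const char *text)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(text, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical demo: rename `lock` over `target` while `target` is
 * held open for reading. Returns 0 if the rename succeeded and the
 * already-open descriptor still reads the old contents.
 */
int rename_over_open_reader(const char *target, const char *lock)
{
	char buf[16] = "";
	ssize_t n;
	int fd, ret = -1;

	assert(target && lock);
	if (write_file(target, "old\n") || write_file(lock, "new\n"))
		return -1;

	/* Simulates the concurrent reader of "tables.list". */
	fd = open(target, O_RDONLY);
	if (fd < 0)
		return -1;

	if (rename(lock, target) == 0) {
		/* The descriptor still refers to the replaced file. */
		n = read(fd, buf, sizeof(buf) - 1);
		if (n == 4 && !strcmp(buf, "old\n"))
			ret = 0;
	}
	close(fd);
	unlink(target);
	unlink(lock); /* only exists if the rename failed */
	return ret;
}
```

On Win32, the equivalent rename is the step that fails when another
process has the target open.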

We already try to emulate POSIX semantics somewhat in `mingw_rename()`
by using a retry-loop when we hit `ERROR_ACCESS_DENIED`, which is what
we get when the target file is open in another process. But that
seemingly isn't enough when there is a lot of contention around a file.
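The general shape of such a retry loop can be sketched as follows. This
is not the code of `mingw_rename()`; it is a portable illustration in
which `EACCES` stands in for `ERROR_ACCESS_DENIED`, and the function
name, the `attempts` out-parameter and the delay schedule are all
assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Sleep for `ms` milliseconds; does nothing for ms <= 0. */
static void sleep_ms(int ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	if (ms > 0)
		nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: retry rename() with growing delays while it
 * fails with EACCES. Any other error aborts immediately. The number
 * of rename() calls made is stored in `*attempts` if non-NULL.
 */
int rename_with_retry(const char *from, const char *to, int *attempts)
{
	static const int delay_ms[] = { 0, 1, 10, 20, 40 }; /* assumed schedule */
	size_t i;
	int tries = 0, ret = -1;

	assert(from && to);
	for (i = 0; i < sizeof(delay_ms) / sizeof(delay_ms[0]); i++) {
		sleep_ms(delay_ms[i]);
		tries++;
		if (rename(from, to) == 0) {
			ret = 0;
			break;
		}
		if (errno != EACCES)
			break; /* not contention, retrying will not help */
	}
	if (attempts)
		*attempts = tries;
	return ret;
}
```

The weakness is visible in the structure: the total wait is bounded, so
with enough writers hammering the same file every attempt can land in a
window where some reader has the target open.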
So I'm currently investigating whether we can adopt something similar to
what Cygwin is doing for our Win32 port as well. I assume that they use
`FILE_RENAME_INFORMATION_EX` with `FILE_RENAME_POSIX_SEMANTICS`, which
should give us what we're looking for.

Ugh, well. Turns out the implementation of rename(3P) in Cygwin is 500
lines long. I guess this is a non-trivial problem :) But they of course
have to handle a whole lot more cases than we have to. But my guess was
correct: they do use `FILE_RENAME_POSIX_SEMANTICS`. The catch is that
this flag only exists in Windows 10 and newer. But that should be a fine
compromise.

I'll try to wrap my head around how all of this works.

Patrick
