From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Ramsay Jones <ramsay@ramsayjones.plus.com>,
Junio C Hamano <gitster@pobox.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Jeff King <peff@peff.net>, Josh Steadmon <steadmon@google.com>
Subject: [PATCH] t0610: work around flaky test with concurrent writers
Date: Fri, 4 Oct 2024 14:16:45 +0200
Message-ID: <f83e23f1e76454a80e3e53cd02b3bb5bba6b8da1.1728044178.git.ps@pks.im>
In-Reply-To: <b1b5fb40-f6c2-4621-b58c-9b7c8c64cc01@ramsayjones.plus.com>
In 6241ce2170 (refs/reftable: reload locked stack when preparing
transaction, 2024-09-24) we introduced a new test that exercises how
the reftable backend behaves with many concurrent writers all racing
with each other. This test was added after a couple of fixes in this
area that should make concurrent writes behave gracefully. As it turns
out though, Windows systems do not yet handle concurrent writes
properly: we have received two reports of this newly added test
failing, one on Cygwin and one on MinGW.
The root cause of this is how we update the "tables.list" file: when
writing a new stack of tables we first write the data into a lockfile
and then rename that file into place. But Windows forbids us from doing
that rename when the target path is open for reading by another process.
And as the test races both readers and writers with each other we are
quite likely to hit this edge case.
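The update pattern described above can be sketched in a few lines of
shell. This is only an illustration under stated assumptions: the helper
name "update_list" and the table names are made up, and the real logic
lives in the reftable library's C code, not in a script.

```shell
#!/bin/sh
# Illustrative sketch only: "update_list" and the table names are
# hypothetical and do not mirror the reftable library's C implementation.

# Replace "$1/tables.list" with the lines given as remaining arguments.
update_list () {
	dir=$1
	shift
	# With noclobber (set -C), ">" fails if the lockfile already exists,
	# so only one writer at a time can take the lock.
	( set -C && printf '%s\n' "$@" >"$dir/tables.list.lock" ) 2>/dev/null ||
		return 1
	# Publish the new list by renaming the lockfile into place. On POSIX
	# systems this works even while a reader has the target open; this
	# is the step that can fail on Windows.
	mv -f "$dir/tables.list.lock" "$dir/tables.list"
}

demo=$(mktemp -d) &&
update_list "$demo" table-1.ref &&
update_list "$demo" table-1.ref table-2.ref &&
cat "$demo/tables.list"
```

A second writer that finds "tables.list.lock" already present gets a
non-zero return from the helper and would have to retry, which is where
the lock timeout comes into play.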
Now the two reports are somewhat different from one another:
  - On Cygwin we hit timeouts because we fail to lock the "tables.list"
    file within 10 seconds. The renames themselves succeed because
    Cygwin provides extensive compatibility logic to make them work
    even when the target file is already open.
  - On MinGW we hit I/O errors on rename. While we do have some retry
    logic in place to make the rename work in some cases, this is
    seemingly not sufficient when there is this much contention around
    the files.
Neither of these cases is a regression: the logic didn't work before the
mentioned commit, and after the commit it performs well on Linux, macOS
and in Cygwin, and at least a bit better with MinGW. But the tests show
that we need to put more thought into how to make this work properly on
MinGW systems.
The fact that Cygwin can work around this issue with better emulation of
POSIX-style atomic renames shows that we can in theory make MinGW work
better, as well. But doing so likely requires quite some fiddling with
Windows internals, and Git v2.47 is about to be released in a couple of
days. This makes any potential fix quite risky as it would have to
happen deep down in our rename(3P) implementation in "compat/mingw.c".
Let's instead work around both issues by disabling the test on MinGW
and by significantly increasing the locking timeout for Cygwin. This
bumped timeout also helps when running with e.g. the address and memory
sanitizers, which also tend to significantly extend the runtime of this
test.
This should be revisited after Git v2.47 is out.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
This fix can be applied to remove some of the stress with the Git v2.47
release pending. It would of course be preferable to find an alternate
fix that makes MinGW work as required, but if you take the 500 lines of
code that make up the rename(3P) implementation of Cygwin as a hint,
you quickly figure out that this is a rather complex problem.
Patrick
t/t0610-reftable-basics.sh | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..86a746aff0 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,15 +450,27 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
)
'
-test_expect_success 'ref transaction: many concurrent writers' '
+# This test fails most of the time on Windows systems. The root cause is
+# that Windows does not allow us to rename the "tables.list.lock" file into
+# place when "tables.list" is open for reading by a concurrent process.
+#
+# The same issue does not happen on Cygwin because its implementation of
+# rename(3P) is emulating POSIX-style renames, including renames over files
+# that are open.
+test_expect_success !MINGW 'ref transaction: many concurrent writers' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
- # Set a high timeout such that a busy CI machine will not abort
- # early. 10 seconds should hopefully be ample of time to make
- # this non-flaky.
- git config set reftable.lockTimeout 10000 &&
+ # Set a high timeout. While a couple of seconds should be
+ # plenty, using the address sanitizer will significantly slow
+ # us down here. Furthermore, Cygwin is also way slower due to
+ # the POSIX-style rename emulation. So we are aiming way higher
+ # than you would ever think is necessary just to keep us from
+ # flaking. We could also lock indefinitely by passing -1, but
+ # that could potentially block CI jobs indefinitely if there
+ # was a bug here.
+ git config set reftable.lockTimeout 300000 &&
test_commit --no-tag initial &&
head=$(git rev-parse HEAD) &&
base-commit: 111e864d69c84284441b083966c2065c2e9a4e78
--
2.47.0.rc0.dirty