From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Patrick Steinhardt <ps@pks.im>
Cc: Ramsay Jones <ramsay@ramsayjones.plus.com>,
GIT Mailing-list <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: v2.47.0-rc1 test failure on cygwin
Date: Fri, 4 Oct 2024 13:11:15 +0200 (CEST) [thread overview]
Message-ID: <f29241a7-aadd-e824-97f3-a95ac6619951@gmx.de> (raw)
In-Reply-To: <Zv--68J5qv60IuQz@pks.im>
Hi Patrick,
On Fri, 4 Oct 2024, Patrick Steinhardt wrote:
> On Fri, Oct 04, 2024 at 11:13:34AM +0200, Johannes Schindelin wrote:
> >
> > On Fri, 4 Oct 2024, Patrick Steinhardt wrote:
> >
> > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> > > index 2d951c8ceb..b5cad805d4 100755
> > > --- a/t/t0610-reftable-basics.sh
> > > +++ b/t/t0610-reftable-basics.sh
> > > @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
> > > git init repo &&
> > > (
> > > cd repo &&
> > > - # Set a high timeout such that a busy CI machine will not abort
> > > - # early. 10 seconds should hopefully be ample of time to make
> > > - # this non-flaky.
> > > - git config set reftable.lockTimeout 10000 &&
> > > + git config set reftable.lockTimeout -1 &&
> > > test_commit --no-tag initial &&
> > >
> > > head=$(git rev-parse HEAD) &&
> >
> > That looks plausible to me, as this test is exercising the POSIX emulation
> > layer of Cygwin much more than Git itself (and therefore I expect this
> > test case to take a really long time on Cygwin). If it turns out that this
> > works around the issue, I would propose to use something like this
> > instead:
> >
> > # Cygwin needs a slightly longer timeout
> > if have_prereq !CYGWIN
> > then
> > git config set reftable.lockTimeout 10000
> > else
> > git config set reftable.lockTimeout 60000
> > fi &&
>
> We also saw that this creates an issue when running e.g. with the memory
> and leak sanitizers. So I'd just generally raise the timeout value to
> something way higher to avoid flakiness, like 5 minutes.
Sounds good!
> > > The issue on Win32 is different: we cannot commit the "tables.list" lock
> > > via rename(3P) because the target file may be open for reading by a
> > > concurrent process. I guess that Cygwin has proper POSIX semantics for
> > > rename(3P) and thus doesn't hit the same issue.
> >
> > Indeed, this is where the symptom lies. I worked around it in Git for
> > Windows v2.47.0-rc1 via this patch:
> >
> > -- snipsnap --
> > diff --git a/compat/mingw.c b/compat/mingw.c
> > index c1030e40f227..56db193d1360 100644
> > --- a/compat/mingw.c
> > +++ b/compat/mingw.c
> > @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...)
> > int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
> > wchar_t wfilename[MAX_LONG_PATH];
> > open_fn_t open_fn;
> > + int tries = 0;
> >
> > va_start(args, oflags);
> > mode = va_arg(args, int);
> > @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...)
> > else if (xutftowcs_long_path(wfilename, filename) < 0)
> > return -1;
> >
> > - fd = open_fn(wfilename, oflags, mode);
> > + do {
> > + fd = open_fn(wfilename, oflags, mode);
> > + } while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION &&
> > + retry_ask_yes_no(&tries, "Opening '%s' failed because another "
> > + "application accessed it. Try again?", filename));
> >
> > if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) {
> > _mode_t wsl_mode = S_IFREG | (mode&0777);
>
> Wait, this is surprising to me as I saw the error happening when calling
> rename, not open. So why would retries in `open()` fix the symptom? I'm
> probably missing something.
I am sorry, I did not really read the bug report in detail, I only meant
to unblock Git for Windows v2.47.0-rc1 and thought I had a fix in hand. It
certainly fixed the failures on my local machine, but it unfortunately did
not fix the problems in CI.
I tried to debug in CI a bit, but it is a gnarly bug to investigate, what
with plenty of processes the intentionally block each other, and no `gdb`
to help.
> In any case I also tried something like the below patch (sorry,
> whitespace-broken).
Oh, you reminded me that the `mingw_rename()` function looks
_substantially_ different in Git for Windows than in Git. I did not have
the time (or the strength of mind, more like) to upstream those changes
yet.
> But unfortunately this still caused permission errors when the new path
> was held open by another process.
Yes, this will _always_ be a problem, I think. The
`FILE_RENAME_POSIX_SEMANTICS` as per its documentation should help, but if
it does not in your tests it might actually not quite work as advertised
(wouldn't be the first time I encounter such an issue).
I tried to read through the code (it's a lot!) to figure out whether there
is potentially any situation when the `tables.list` file is opened but not
closed immediately, but couldn't find any. Do you know off-hand of any
such scenario?
> I think for now I'd still lean into the direction of adding the !WINDOWS
> prerequisite to the test and increasing timeouts such that I can
> continue to investigate without time pressure.
Let me bang my head against this problem for a little while longer. You
might be right, though, that this is a thing we cannot fix in time for
v2.47.0, which would be sad.
Ciao,
Johannes
>
> Patrick
>
> @ -2152,9 +2152,12 @@ int mingw_accept(int sockfd1, struct sockaddr *sa, socklen_t *sz)
> int mingw_rename(const char *pold, const char *pnew)
> {
> DWORD attrs, gle;
> - int tries = 0;
> + int tries = 0, wpnew_len, wpold_len;
> wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
> - if (xutftowcs_path(wpold, pold) < 0 || xutftowcs_path(wpnew, pnew) < 0)
> +
> + wpold_len = xutftowcs_path(wpold, pold);
> + wpnew_len = xutftowcs_path(wpnew, pnew);
> + if (wpold_len < 0 || wpnew_len < 0)
> return -1;
>
> /*
> @@ -2166,6 +2169,58 @@ int mingw_rename(const char *pold, const char *pnew)
> if (errno != EEXIST)
> return -1;
> repeat:
> + {
> +#define FileRenameInfoEx 22
> + enum {
> + FILE_RENAME_REPLACE_IF_EXISTS = 0x01,
> + FILE_RENAME_POSIX_SEMANTICS = 0x02,
> + FILE_RENAME_IGNORE_READONLY_ATTRIBUTE = 0x40,
> + };
> + typedef struct FILE_RENAME_INFORMATION {
> + union {
> + BOOLEAN ReplaceIfExists;
> + ULONG Flags;
> + };
> + HANDLE RootDirectory;
> + ULONG FileNameLength;
> + WCHAR FileName[1];
> + } *PFILE_RENAME_INFORMATION;
> + size_t wpnew_size = wpnew_len * sizeof(wchar_t);
> + PFILE_RENAME_INFORMATION fri = NULL;
> + HANDLE handle = NULL;
> + BOOL success;
> + int ret;
> +
> + handle = CreateFileW(wpold, DELETE,
> + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL,
> + OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
> + if (handle == INVALID_HANDLE_VALUE) {
> + errno = ENOENT;
> + ret = -1;
> + goto out;
> + }
> +
> + fri = xcalloc(1, sizeof(*fri) + wpnew_size);
> + fri->Flags = FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS |
> + FILE_RENAME_IGNORE_READONLY_ATTRIBUTE;
> + fri->FileNameLength = wpnew_len;
> + memcpy(fri->FileName, wpnew, wpnew_size);
> +
> + success = SetFileInformationByHandle(handle, FileRenameInfoEx,
> + fri, sizeof(*fri) + wpnew_size);
> + if (!success) {
> + errno = err_win_to_posix(GetLastError());
> + ret = -1;
> + goto out;
> + }
> +
> + ret = 0;
> +out:
> + CloseHandle(handle);
> + free(fri);
> + return ret;
> + }
> +
>
>
next prev parent reply other threads:[~2024-10-04 11:11 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-04 1:02 v2.47.0-rc1 test failure on cygwin Ramsay Jones
2024-10-04 3:59 ` Patrick Steinhardt
2024-10-04 6:13 ` Patrick Steinhardt
2024-10-04 9:13 ` Johannes Schindelin
2024-10-04 10:09 ` Patrick Steinhardt
2024-10-04 11:11 ` Johannes Schindelin [this message]
2024-10-04 11:32 ` Patrick Steinhardt
2024-10-04 16:09 ` Junio C Hamano
2024-10-04 17:14 ` Patrick Steinhardt
2024-10-04 17:54 ` Junio C Hamano
2024-10-04 12:16 ` [PATCH] t0610: work around flaky test with concurrent writers Patrick Steinhardt
2024-10-04 14:47 ` Ramsay Jones
2024-10-04 15:26 ` Patrick Steinhardt
2024-10-04 16:32 ` Junio C Hamano
2024-10-04 16:22 ` Junio C Hamano
2024-10-04 15:32 ` [PATCH v2] " Patrick Steinhardt
2024-10-04 16:32 ` Ramsay Jones
2024-10-04 16:35 ` Junio C Hamano
2024-10-04 22:41 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f29241a7-aadd-e824-97f3-a95ac6619951@gmx.de \
--to=johannes.schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ps@pks.im \
--cc=ramsay@ramsayjones.plus.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).