All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Ramsay Jones <ramsay@ramsayjones.plus.com>,
	GIT Mailing-list <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: v2.47.0-rc1 test failure on cygwin
Date: Fri, 4 Oct 2024 13:32:01 +0200	[thread overview]
Message-ID: <Zv_SKtehwx2dsflo@pks.im> (raw)
In-Reply-To: <f29241a7-aadd-e824-97f3-a95ac6619951@gmx.de>

On Fri, Oct 04, 2024 at 01:11:15PM +0200, Johannes Schindelin wrote:
> Hi Patrick,
> > > > The issue on Win32 is different: we cannot commit the "tables.list" lock
> > > > via rename(3P) because the target file may be open for reading by a
> > > > concurrent process. I guess that Cygwin has proper POSIX semantics for
> > > > rename(3P) and thus doesn't hit the same issue.
> > >
> > > Indeed, this is where the symptom lies. I worked around it in Git for
> > > Windows v2.47.0-rc1 via this patch:
> > >
> > > -- snipsnap --
> > > diff --git a/compat/mingw.c b/compat/mingw.c
> > > index c1030e40f227..56db193d1360 100644
> > > --- a/compat/mingw.c
> > > +++ b/compat/mingw.c
> > > @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...)
> > >  	int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
> > >  	wchar_t wfilename[MAX_LONG_PATH];
> > >  	open_fn_t open_fn;
> > > +	int tries = 0;
> > >
> > >  	va_start(args, oflags);
> > >  	mode = va_arg(args, int);
> > > @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...)
> > >  	else if (xutftowcs_long_path(wfilename, filename) < 0)
> > >  		return -1;
> > >
> > > -	fd = open_fn(wfilename, oflags, mode);
> > > +	do {
> > > +		fd = open_fn(wfilename, oflags, mode);
> > > +	} while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION &&
> > > +		 retry_ask_yes_no(&tries, "Opening '%s' failed because another "
> > > +			"application accessed it. Try again?", filename));
> > >
> > >  	if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) {
> > >  		_mode_t wsl_mode = S_IFREG | (mode&0777);
> >
> > Wait, this is surprising to me as I saw the error happening when calling
> > rename, not open. So why would retries in `open()` fix the symptom? I'm
> > probably missing something.
> 
> I am sorry, I did not really read the bug report in detail, I only meant
> to unblock Git for Windows v2.47.0-rc1 and thought I had a fix in hand. It
> certainly fixed the failures on my local machine, but it unfortunately did
> not fix the problems in CI.

Nothing to be sorry about. Quite on the contrary, thanks for chiming in
here, I was hoping for your input.

> I tried to debug in CI a bit, but it is a gnarly bug to investigate, what
> with plenty of processes the intentionally block each other, and no `gdb`
> to help.

Yup, indeed it is.

> > In any case I also tried something like the below patch (sorry,
> > whitespace-broken).
> 
> Oh, you reminded me that the `mingw_rename()` function looks
> _substantially_ different in Git for Windows than in Git. I did not have
> the time (or the strength of mind, more like) to upstream those changes
> yet.
> 
> > But unfortunately this still caused permission errors when the new path
> > was held open by another process.
> 
> Yes, this will _always_ be a problem, I think. The
> `FILE_RENAME_POSIX_SEMANTICS` as per its documentation should help, but if
> it does not in your tests it might actually not quite work as advertised
> (wouldn't be the first time I encounter such an issue).

It could of course be that the code I wrote is just plain wrong. I had
to stub out the struct as well as the constants because those are not
available on my Windows system, due to whatever reason. Not even when
bumping _WIN32_WINNT to 0x0A00 (W10+). Maybe this is because I'm using
Windows 10, but I thought that thish should have been available starting
with Windows 10 RC1.

> I tried to read through the code (it's a lot!) to figure out whether there
> is potentially any situation when the `tables.list` file is opened but not
> closed immediately, but couldn't find any. Do you know off-hand of any
> such scenario?

We indeed do keep the `tables.list` file descriptor open as part of our
stat cache. But that only happens on non-Windows system. This all
happens in `reftable_stack_reload_maybe_reuse()` and is documented quite
extensively in there.

> > I think for now I'd still lean into the direction of adding the !WINDOWS
> > prerequisite to the test and increasing timeouts such that I can
> > continue to investigate without time pressure.
> 
> Let me bang my head against this problem for a little while longer. You
> might be right, though, that this is a thing we cannot fix in time for
> v2.47.0, which would be sad.

A bit sad, yes, but as I mentioned this is at least not a regression.
The test just demonstrates that the improvements I did in this area are
not yet sufficient for all platforms and that I need to spend some more
time on it. Or that the central "tables.list" mechanism is not a good
fit for Windows, which would be a shame.

I'll send the patch as a reply to this thread so that it can be picked
up as required.

Patrick

  reply	other threads:[~2024-10-04 11:32 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-04  1:02 v2.47.0-rc1 test failure on cygwin Ramsay Jones
2024-10-04  3:59 ` Patrick Steinhardt
2024-10-04  6:13   ` Patrick Steinhardt
2024-10-04  9:13     ` Johannes Schindelin
2024-10-04 10:09       ` Patrick Steinhardt
2024-10-04 11:11         ` Johannes Schindelin
2024-10-04 11:32           ` Patrick Steinhardt [this message]
2024-10-04 16:09           ` Junio C Hamano
2024-10-04 17:14             ` Patrick Steinhardt
2024-10-04 17:54               ` Junio C Hamano
2024-10-04 12:16 ` [PATCH] t0610: work around flaky test with concurrent writers Patrick Steinhardt
2024-10-04 14:47   ` Ramsay Jones
2024-10-04 15:26     ` Patrick Steinhardt
2024-10-04 16:32     ` Junio C Hamano
2024-10-04 16:22   ` Junio C Hamano
2024-10-04 15:32 ` [PATCH v2] " Patrick Steinhardt
2024-10-04 16:32   ` Ramsay Jones
2024-10-04 16:35   ` Junio C Hamano
2024-10-04 22:41   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zv_SKtehwx2dsflo@pks.im \
    --to=ps@pks.im \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.