public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs
@ 2025-06-05 10:16 Johannes Schindelin via GitGitGadget
  2025-06-05 11:10 ` Patrick Steinhardt
  2025-06-05 13:30 ` Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-06-05 10:16 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In the GitHub workflow used in Git's CI builds, the `vs test` jobs use a
subset of a specific revision of Git for Windows' SDK to run Git's test
suite. This revision is validated by another CI workflow to ensure that
said revision _can_ run Git's test suite successfully, skipping buggy
updates in Git for Windows' SDK.

The `win+Meson test` jobs do things differently, quite differently. They
use the Bash of the Git for Windows version that is installed on the
runners to run Git's test suite.

This difference has consequences.

When 68cb0b5253a0 (builtin/receive-pack: add option to skip connectivity
check, 2025-05-20) introduced a test case that uses `tee <file> | git
receive-pack` as `--receive-pack` parameter (imitating an existing
pattern in the same test script), it hit just the sweet spot to trigger
a bug in the MSYS2 runtime shipped in Git for Windows v2.49.0. This
version is the one currently installed on GitHub's runners.

The problem is that the `git receive-pack` process finishes while the
`tee` process does not need to write anything anymore and therefore does
not receive an EOF. Instead, it should receive a SIGPIPE, but the bug in
the MSYS2 runtime prevents that from working as intended. As a
consequence, the `tee` process waits for more input from the `git.exe
send-pack` process but none is coming, and the test script patiently
waits until the 6h timeout hits.

Only every once in a while, the `git receive-pack` process manages to
send an EOF to the `tee` process and no hang occurs. Therefore, the
problem can be worked around by cancelling the clearly-hanging job after
twenty or so minutes and re-running it, repeating the process about half
a dozen times, until the hang was successfully avoided.

This bug in the MSYS2 runtime has been fixed in the meantime, which is
the reason why the same test case causes no problems in the `win test`
and the `vs test` jobs.

This will continue to be the case until the Git for Windows version on
the GitHub runners is upgraded to a version that distributes a newer
MSYS2 runtime version. However, as of time of writing, this _is_ the
latest Git for Windows version, and will be for another 1.5 weeks, until
Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
Windows v2.50.0). Traditionally it takes a while before the runners pick
up the new version.

We could just wait it out, six hours at a time.

Here, I opt for an alternative: Detect the buggy MSYS2 runtime and
simply skip the test case. It's not like the `receive-pack` test cases
are specific to Windows, and even then, to my chagrin the CI runs in
git-for-windows/git spend around ten hours of compute time each and
every time to run the entire test suite on all the platforms, even the
tests that cover cross-platform code, and for Windows alone we do that
three times: with GCC, with MSVC, and with MSVC via Meson. Therefore, I
deem it more than acceptable to skip this test case in one of those
matrices.

For good luck, also the preceding test case is skipped in that scenario,
as it uses the same `--receive-pack=tee <file> | git receive-pack`
pattern, even though I never observed that test case to hang in
practice.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    t5410: avoid hangs in CI runs in the win+Meson test jobs
    
    I finally had a chance to look more closely at this problem. Here is my
    alternative to what Patrick proposed in
    https://lore.kernel.org/git/aD7tKfXD7YxprSZh@pks.im/.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1932%2Fdscho%2Ft5410-hangs-in-win%2BMeson-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1932/dscho/t5410-hangs-in-win+Meson-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1932

 t/t5410-receive-pack.sh | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
index f76a22943ef..09d6bfd2a10 100755
--- a/t/t5410-receive-pack.sh
+++ b/t/t5410-receive-pack.sh
@@ -41,7 +41,19 @@ test_expect_success 'with core.alternateRefsPrefixes' '
 	test_cmp expect actual.haves
 '
 
-test_expect_success 'receive-pack missing objects fails connectivity check' '
+# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
+# when spawned from `git.exe` and piping its output to `git.exe`. This seems
+# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
+# skip the tests that need to exercise this when the faulty MSYS2 runtime is
+# detected; The test cases are exercised enough in other matrix jobs of the CI
+# runs.
+test_lazy_prereq TEE_DOES_NOT_HANG '
+	test_have_prereq !MINGW &&
+	case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
+'
+
+test_expect_success TEE_DOES_NOT_HANG \
+	'receive-pack missing objects fails connectivity check' '
 	test_when_finished rm -rf repo remote.git setup.git &&
 
 	git init repo &&
@@ -62,7 +74,8 @@ test_expect_success 'receive-pack missing objects fails connectivity check' '
 	test_must_fail git -C remote.git cat-file -e $(git -C repo rev-parse HEAD)
 '
 
-test_expect_success 'receive-pack missing objects bypasses connectivity check' '
+test_expect_success TEE_DOES_NOT_HANG \
+	'receive-pack missing objects bypasses connectivity check' '
 	test_when_finished rm -rf repo remote.git setup.git &&
 
 	git init repo &&

base-commit: 0d42fbd9a1f30c63cf0359a1c5aaa77020972f72
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs
  2025-06-05 10:16 [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs Johannes Schindelin via GitGitGadget
@ 2025-06-05 11:10 ` Patrick Steinhardt
  2025-06-05 13:30 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Patrick Steinhardt @ 2025-06-05 11:10 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, Jun 05, 2025 at 10:16:45AM +0000, Johannes Schindelin via GitGitGadget wrote:
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> index f76a22943ef..09d6bfd2a10 100755
> --- a/t/t5410-receive-pack.sh
> +++ b/t/t5410-receive-pack.sh
> @@ -41,7 +41,19 @@ test_expect_success 'with core.alternateRefsPrefixes' '
>  	test_cmp expect actual.haves
>  '
>  
> -test_expect_success 'receive-pack missing objects fails connectivity check' '
> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> +	test_have_prereq !MINGW &&
> +	case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'
> +
> +test_expect_success TEE_DOES_NOT_HANG \
> +	'receive-pack missing objects fails connectivity check' '
>  	test_when_finished rm -rf repo remote.git setup.git &&
>  
>  	git init repo &&

Quite interesting. I any case, I think this is a sensible fix for now.
It's a known bug, we know it's fixed, we just have to wait. And the fact
that this prereq will basically auto-disarm itself once we have the new
version is nice.

I did wonder whether we can maybe rewrite the test so that we compute
our own packfile instead of intercepting the one from git-send-pack(1).
But I'm not sure whether that's really worth it.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs
  2025-06-05 10:16 [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs Johannes Schindelin via GitGitGadget
  2025-06-05 11:10 ` Patrick Steinhardt
@ 2025-06-05 13:30 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2025-06-05 13:30 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> ...
> This bug in the MSYS2 runtime has been fixed in the meantime, which is
> the reason why the same test case causes no problems in the `win test`
> and the `vs test` jobs.
>
> This will continue to be the case until the Git for Windows version on
> the GitHub runners is upgraded to a version that distributes a newer
> MSYS2 runtime version. However, as of time of writing, this _is_ the
> latest Git for Windows version, and will be for another 1.5 weeks, until
> Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
> Windows v2.50.0). Traditionally it takes a while before the runners pick
> up the new version.
> ...
>     I finally had a chance to look more closely at this problem. Here is my
>     alternative to what Patrick proposed in
>     https://lore.kernel.org/git/aD7tKfXD7YxprSZh@pks.im/.

Superb.  It must have taken a truly heroic effort.

Thanks and congratulations for finally solving the puzzle.

I do agree that Patrick's "wrap the same in a script" smelled like
shifting a timing issue and not truly a solution.

> +# The `tee.exe` shipped in Git for Windows v2.49.0 is known to hang frequently
> +# when spawned from `git.exe` and piping its output to `git.exe`. This seems
> +# related to MSYS2 runtime bug fixes regarding the signal handling; Let's just
> +# skip the tests that need to exercise this when the faulty MSYS2 runtime is
> +# detected; The test cases are exercised enough in other matrix jobs of the CI
> +# runs.
> +test_lazy_prereq TEE_DOES_NOT_HANG '
> +	test_have_prereq !MINGW &&
> +	case "$(uname -a)" in *3.5.7-463ebcdc.x86_64*) false;; esac
> +'

That's very specific ;-).

As this is not in a library-ish part, it does not have to be lazy.
Anybody running this test script need to tell if their environment
satisfies the prerequisite, but lazy one does have a documentation
value, I guess, and a bit of extra indirection does not hurt.

Will queue.  Thanks again.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-06-05 13:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 10:16 [PATCH] t5410: avoid hangs in CI runs in the win+Meson test jobs Johannes Schindelin via GitGitGadget
2025-06-05 11:10 ` Patrick Steinhardt
2025-06-05 13:30 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox