From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Michael Montalbo <mmontalbo@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFH] Why do osx CI jobs so unreliable?
Date: Mon, 22 Jun 2026 12:29:53 +0200
Message-ID: <ajkOoRhqaAcy6gBg@pks.im>
In-Reply-To: <ajkGkB2ckf3p43QR@pks.im>

On Mon, Jun 22, 2026 at 11:55:31AM +0200, Patrick Steinhardt wrote:
> On Mon, Jun 22, 2026 at 11:48:01AM +0200, Patrick Steinhardt wrote:
> > On Mon, Jun 22, 2026 at 06:42:24AM +0200, Patrick Steinhardt wrote:
> > > On Sun, Jun 21, 2026 at 05:34:07PM -0400, Jeff King wrote:
> > > > On Sat, Jun 20, 2026 at 08:33:13AM -0700, Michael Montalbo wrote:
> > [snip]
> > > > > When it is wedged the whole chain sits at 0% CPU. upload-pack is
> > > > > blocked in write() on the ls-refs advertisement, curl blocked in
> > > > > select(). So it looks like an HTTP/2 flow-control stall on the
> > > > > response side. The same stall resets itself after ~60-85s on my Linux
> > > > > box and on a bare-metal Mac, but not on the GitHub runner; I haven't
> > > > > pinned down why yet.
> > > > 
> > > > We had some HTTP/2 stalls/deadlocks in the past, and they were dependent
> > > > on libcurl and apache (actually mod_http2) versions. IIRC some of the
> > > > non-TLS code paths for HTTP/2 were not well tested, which led to
> > > > 8f2146dbf1 (t5559: make SSL/TLS the default, 2023-02-23). Of course
> > > > after that commit those cleartext code paths should not be a problem, so
> > > > that is probably not exactly the issue now.
> > > > 
> > > > But it might be worth checking the versions you're running locally
> > > > versus what's in the GitHub runner.
> > > 
> > > I didn't observe any similar hangs in GitLab's CI systems, so I wonder
> > > whether this is because of different versions of curl. And indeed we use
> > > different versions:
> > > 
> > >   - On GitHub we use 8.6.0.
> > > 
> > >   - On GitLab we use 8.7.1.
> > > 
> > > Now this of course doesn't mean that updating the curl version is the
> > > fix to this whole issue, as there's a ton of other factors that could
> > > play a role in whether or not the test hangs. We could of course just
> > > upgrade parts of the stack and cross our fingers, but that feels rather
> > > unsatisfactory. Still, one place to start could be to update our build
> > > images to macOS 15.
> > > 
> > > But the big question to me is whether the hang is because of a bug in
> > > Git with how we drive curl, a bug in curl itself, or a bug in Apache.
> > 
> > I noticed that an osx-clang job failed today in t5551 [1]. This time it
> > didn't hang, but produced an actual error:
> > 
> >     2026-06-22T09:25:45.1984230Z ++ git -C too-many-refs fetch -q --tags
> >     2026-06-22T09:25:45.1984420Z error: RPC failed; curl 18 transfer closed with outstanding read data remaining
> >     2026-06-22T09:25:45.1984520Z fatal: expected flush after ref listing
> >     2026-06-22T09:25:45.1984610Z error: last command exited with $?=128
> >     2026-06-22T09:25:45.1984660Z ++ rm -f tags
> >     2026-06-22T09:25:45.1984710Z ++ :
> >     2026-06-22T09:25:45.1984830Z not ok 35 - http can handle enormous ref negotiation
> > 
> > There was a second test failing similarly.
> 
> Oh, and Linux is also failing in the same test suite [1]. The job logs
> are truncated though, so it's hard to say whether it's the same failure
> or not.
> 
> There certainly seems to be a deeper issue here. We could of course just
> disable the test again, but by now I do wonder whether this would paper
> over an actual bug.
> 
> Patrick
> 
> [1]: https://github.com/git/git/actions/runs/27940620478/job/82672854864

Sorry for the repeated spam.

I think the issue is rather simple: we're hitting timeouts in Apache. If
you apply the following diff:

diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 40a690b0bb..4054fe008f 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -302,3 +302,5 @@ RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
 		SVNPath "${LIB_HTTPD_SVNPATH}"
 	</Location>
 </IfDefine>
+
+Timeout 1

Then you'll see the same errors locally:

    $ GIT_TEST_LONG=Yes meson test t5551-http-fetch-smart --test-args=-ix -i
    Failed to clone 'sub'. Retry scheduled
    Cloning into '/home/pks/Development/git/build/test-output/trash directory.t5551-http-fetch-smart/sub'...
    error: RPC failed; curl 18 transfer closed with outstanding read data remaining
    fatal: early EOF
    fatal: fetch-pack: invalid index-pack output
    fatal: clone of 'http://127.0.0.1:5551/smart_headers/repo.git' into submodule path '/home/pks/Development/git/build/test-output/trash directory.t5551-http-fetch-smart/sub' failed
    Failed to clone 'sub' a second time, aborting
    error: last command exited with $?=1
    not ok 36 - custom http headers
    #	
    #		test_must_fail git -c http.extraheader="x-magic-two: cadabra" \
    #			fetch "$HTTPD_URL/smart_headers/repo.git" &&
    #		git -c http.extraheader="x-magic-one: abra" \
    #		    -c http.extraheader="x-magic-two: cadabra" \
    #		    fetch "$HTTPD_URL/smart_headers/repo.git" &&
    #		git update-index --add --cacheinfo 160000,$(git rev-parse HEAD),sub &&
    #		git config -f .gitmodules submodule.sub.path sub &&
    #		git config -f .gitmodules submodule.sub.url \
    #			"$HTTPD_URL/smart_headers/repo.git" &&
    #		git submodule init sub &&
    #		test_must_fail git submodule update sub &&
    #		git -c http.extraheader="x-magic-one: abra" \
    #		    -c http.extraheader="x-magic-two: cadabra" \
    #			submodule update sub
    #	
    1..36

And Apache also logs this as a timeout:

    [Mon Jun 22 10:26:52.115717 2026] [cgi:warn] [pid 3686957:tid 3686957] [client 127.0.0.1:55114] AH01220: Timeout waiting for output from CGI script /home/pks/Development/git/build/git-http-backend
    [Mon Jun 22 10:26:52.115748 2026] [core:error] [pid 3686957:tid 3686957] (70007)The timeout specified has expired: [client 127.0.0.1:55114] AH00574: ap_content_length_filter: apr_bucket_read() failed
    [Mon Jun 22 10:27:01.567533 2026] [cgi:warn] [pid 3686958:tid 3686958] [client 127.0.0.1:54384] AH01220: Timeout waiting for output from CGI script /home/pks/Development/git/build/git-http-backend
    [Mon Jun 22 10:27:01.567559 2026] [core:error] [pid 3686958:tid 3686958] (70007)The timeout specified has expired: [client 127.0.0.1:54384] AH00574: ap_content_length_filter: apr_bucket_read() failed
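
To check whether a given run hit this, counting the AH01220 warnings in
the Apache error log should be enough. The following is only a rough
sketch: the helper name is made up, and the log path in the usage comment
is an assumption about where the test harness puts Apache's error log.

```shell
# Hypothetical helper: print how many CGI timeout warnings (AH01220)
# an Apache error log contains.
count_cgi_timeouts () {
	# grep -c prints the number of matching lines. It exits non-zero
	# when there are none, which we deliberately ignore here so that
	# "0" is a normal result.
	grep -c 'AH01220' "$1" || :
}

# Usage (the path is an assumption, adjust it to your setup):
#   count_cgi_timeouts "trash directory.t5551-http-fetch-smart/httpd/error.log"
```

A non-zero count means Apache gave up waiting for git-http-backend at
least once during the run.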

This is because our keepalive mechanisms aren't helping:

  - The TCP-level keepalives don't help, as Apache is waiting for output
    from the CGI script and not for traffic on the socket.

  - The application-level sideband keepalives don't apply to the
    "ls-refs" endpoint.

Whether that's the same issue as the one we sometimes see on macOS is a
different question.

Patrick
