Git development
* Re: [RFH] Why do osx CI jobs so unreliable?
From: Michael Montalbo @ 2026-06-20 15:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

Patrick Steinhardt <ps@pks.im> writes:
> So I strongly suspect that it must be one of the t555* tests.
> [...]
> Maybe this is something that's specific to GitHub's environment...

I think you're right that it's t5551/t5559. The runs Junio linked:

  osx-clang     cancelled  360min
  osx-gcc       cancelled  360min
  osx-reftable  success     35min
  osx-meson     success     61min

All four run the same t5551/t5559 under EXPENSIVE. The two that
finished differ in just two ways, which look like the levers:
osx-reftable generates the 100k-ref advertisement in ~24ms vs ~1.2s
for loose refs on macOS (so it spends much less time mid-response),
and osx-meson runs tests at nproc while the prove jobs hardcode
--jobs=10 on a 3-core runner. Over recent master/next the prove jobs
hang in ~40% of runs, meson in ~10%.
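
To make the second lever concrete, here is a minimal sketch of deriving
the job count from the detected core count instead of hardcoding it.
The detect_jobs helper is hypothetical (it is not the code in
ci/lib.sh), and the fallback order is an assumption about which tools
are available on a given runner:

```shell
# Hypothetical helper: report the number of online cores, trying the
# tools most likely to exist on Linux and macOS in turn.
detect_jobs () {
	if command -v nproc >/dev/null 2>&1
	then
		nproc
	elif sysctl -n hw.ncpu >/dev/null 2>&1
	then
		# macOS and the BSDs expose the core count via sysctl.
		sysctl -n hw.ncpu
	else
		# Last resort: ask getconf, or fall back to a single job.
		getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
	fi
}

JOBS=$(detect_jobs)
echo "would run: prove --jobs=$JOBS"
```

On a 3-core runner this would yield --jobs=3 rather than the
oversubscribed --jobs=10.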

When it is wedged, the whole chain sits at 0% CPU: upload-pack is
blocked in write() on the ls-refs advertisement, and curl is blocked
in select(). So it looks like an HTTP/2 flow-control stall on the
response side. The same stall clears itself after ~60-85s on my Linux
box and on a bare-metal Mac, but not on the GitHub runner; I haven't
pinned down why yet.

On the chance those two levers are the fix, here is a branch off master:

  https://github.com/mmontalbo/git/tree/mm/macos-ci-hang-fix

  - pack the refs in t5551's enormous-ref-negotiation test (doesn't
    change what it checks on the wire, just avoids re-reading 100k loose
    files to advertise them, like reftable already does)
  - use the core count for $JOBS on the GitHub macOS path, matching the
    GitLab branch in the same ci/lib.sh and what meson does
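
As a rough illustration of the first item (not the actual t5551
change), the sketch below creates many tags in a scratch repository,
packs them, and checks that the visible ref set stays the same while
the loose files go away. It uses 1000 refs rather than 100k to stay
quick; the repository path, tag names, and committer identity are made
up, and it assumes the "files" ref backend:

```shell
set -e
tmp=$(mktemp -d)
# Ask for the files backend explicitly where git supports the option;
# older versions only have that backend, so fall back to a plain init.
git init -q --ref-format=files "$tmp/repo" 2>/dev/null ||
	git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m base
oid=$(git rev-parse HEAD)

# Create all tags in one update-ref transaction; with the files
# backend each one lands as a separate loose file.
i=1
while test $i -le 1000
do
	echo "create refs/tags/tag-$i $oid"
	i=$((i + 1))
done | git update-ref --stdin

loose_before=$(find .git/refs/tags -type f 2>/dev/null | wc -l | tr -d ' ')
refs_before=$(git for-each-ref refs/tags | wc -l | tr -d ' ')

# Move the refs into packed-refs and prune the loose copies.
git pack-refs --all

loose_after=$(find .git/refs/tags -type f 2>/dev/null | wc -l | tr -d ' ')
refs_after=$(git for-each-ref refs/tags | wc -l | tr -d ' ')

echo "loose files: $loose_before -> $loose_after"
echo "visible refs: $refs_before -> $refs_after"
```

The advertised refs are identical before and after; only the on-disk
storage that upload-pack has to read changes.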

I ran the two macOS jobs under EXPENSIVE about eight times with these
changes, and all of them finished in ~30-44min instead of hanging.
Happy to send out a patch if it's helpful.



Thread overview: 20+ messages
2026-06-20 15:33 [RFH] Why do osx CI jobs so unreliable? Michael Montalbo
2026-06-21 21:34 ` Jeff King
2026-06-22  4:42   ` Patrick Steinhardt
2026-06-22  9:47     ` Patrick Steinhardt
2026-06-22  9:55       ` Patrick Steinhardt
2026-06-22 10:29         ` Patrick Steinhardt
2026-06-26  3:27           ` Michael Montalbo
2026-06-26  5:16             ` Jeff King
2026-06-26 10:50               ` Patrick Steinhardt
2026-06-26 13:45                 ` Junio C Hamano
2026-06-26 23:26                 ` Michael Montalbo
2026-06-28  7:57                   ` [PATCH 0/3] fixing expensive http test timeouts Jeff King
2026-06-28  8:00                     ` [PATCH 1/3] t/lib-httpd: bump apache timeout Jeff King
2026-06-28  8:03                     ` [PATCH 2/3] t5551: put many-tags case into its own repo Jeff King
2026-06-28 21:44                       ` Junio C Hamano
2026-06-29  0:34                         ` Jeff King
2026-06-28  8:07                     ` [PATCH 3/3] t5551: pack refs after creating many tags Jeff King
2026-06-28 21:25                       ` Junio C Hamano
2026-06-26 23:43                 ` [RFH] Why do osx CI jobs so unreliable? Jeff King
2026-06-22  5:05   ` Junio C Hamano
