From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 3/3] ci: stop installing "gcc-13" for osx-gcc
Date: Thu, 16 May 2024 14:36:19 +0200
Message-ID: <ZkX9w6etjDVAh-ln@tanuki>
In-Reply-To: <Zj8blb0QqC2zdOAC@framework>
On Sat, May 11, 2024 at 09:17:41AM +0200, Patrick Steinhardt wrote:
> On Fri, May 10, 2024 at 04:13:48PM -0400, Jeff King wrote:
> > On Fri, May 10, 2024 at 09:00:04AM +0200, Patrick Steinhardt wrote:
> >
> > > On Thu, May 09, 2024 at 12:25:44PM -0400, Jeff King wrote:
> > > [snip]
> > > > I'd like to report that this let me get a successful CI run, but I'm
> > > > running into the thing where osx jobs seem to randomly hang sometimes
> > > > and hit the 6-hour timeout. But I did confirm that this lets us get to
> > > > the actual build/test, and not barf while installing dependencies.
> > >
> > > Yeah, this one is puzzling to me. We see the same thing on GitLab CI,
> > > and until now I haven't yet figured out why that is.
> >
> > Drat. I was hoping maybe it was a problem in GitHub CI and somebody else
> > would eventually fix it. ;)
> >
> > It feels like a deadlock somewhere, though whether it is in our code, or
> > in our tests, or some system-ish issue with prove, perl, etc, I don't
> > know. It would be nice to catch it in the act and see what the process
> > tree looks like. I guess poking around in the test environment with
> > tmate might work, though I don't know if there's a way to get tmate
> > running simultaneously with the hung step (so you'd probably have to
> > connect, kick off the "make test" manually and hope it hangs).
>
> My hunch tells me that it's the Perforce tests -- after all, this is
> where the jobs get stuck, too. In "lib-git-p4.sh" we already document
> that p4d is known to crash at times, and overall the logic to spawn the
> server is quite convoluted.
>
> I did try to get more useful logs yesterday. But as usual, once you
> _want_ to reproduce a failure like this, it doesn't happen anymore.

I spent (or rather wasted?) some more time on this. With the diff below
I was able to get a list of the processes still running after ~50
minutes:

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 98dda42045..d5570b59d3 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -51,8 +51,15 @@ esac
 group Build make
 if test -n "$run_tests"
 then
-	group "Run tests" make test ||
+	(
+		sleep 3200 &&
+		mkdir -p t/failed-test-artifacts &&
+		ps -A >t/failed-test-artifacts/ps 2>&1
+	) &
+	pid=$!
+	group "Run tests" gtimeout 1h make test ||
 	handle_failed_tests
+	kill "$pid"
 fi
 check_unignored_build_artifacts
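
For anyone who wants to reuse the idea outside of this particular CI
script, the watchdog from the diff above can be sketched as a standalone
function. This is only an illustration under assumptions: the name
run_with_snapshot and its arguments are hypothetical and not part of
Git's ci/ scripts, and unlike the diff it always stops the watchdog,
even when the command fails.

```shell
#!/bin/sh
# Hypothetical helper, not part of Git's ci/ scripts.
# Usage: run_with_snapshot <delay-seconds> <output-file> <command> [<args>...]
# Runs <command>; if it is still running after <delay-seconds>, a process
# listing is written to <output-file>. Returns the command's exit status.
run_with_snapshot () {
	snap_delay=$1 snap_out=$2
	shift 2

	# Background watchdog: wait, then record all running processes.
	# Its own stdout/stderr go to /dev/null so that it never keeps the
	# caller's output streams open.
	(
		sleep "$snap_delay" &&
		mkdir -p "$(dirname "$snap_out")" &&
		ps -A >"$snap_out" 2>&1
	) >/dev/null 2>&1 &
	snap_pid=$!

	"$@"
	snap_status=$?

	# The command is done, so stop the watchdog. Its "sleep" child may
	# linger until the timer expires, but no snapshot is taken once the
	# subshell itself is gone.
	kill "$snap_pid" 2>/dev/null
	wait "$snap_pid" 2>/dev/null
	return $snap_status
}
```

With that in place, the call in the diff would look roughly like
`run_with_snapshot 3200 t/failed-test-artifacts/ps gtimeout 1h make test`.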
I trimmed that process list to the following set of relevant processes:
  PID TTY      TIME CMD
 5196 ??    0:00.01 /bin/sh t9211-scalar-clone.sh --verbose-log -x
 5242 ??    0:00.00 /bin/sh t9211-scalar-clone.sh --verbose-log -x
 5244 ??    0:00.00 tee -a /Volumes/RAMDisk/test-results/t9211-scalar-clone.out
 5245 ??    0:00.09 /bin/sh t9211-scalar-clone.sh --verbose-log -x
 7235 ??    0:00.02 /Users/gitlab/builds/gitlab-org/git/scalar clone file:///Volumes/RAMDisk/trash directory.t9211-scalar-clone/to-clone maint-fail
 7265 ??    0:00.01 /Users/gitlab/builds/gitlab-org/git/git fetch --quiet --no-progress origin
 7276 ??    0:00.01 /Users/gitlab/builds/gitlab-org/git/git fsmonitor--daemon run --detach --ipc-threads=8

So it seems like the issue is t9211, and the hang happens in "scalar
clone warns when background maintenance fails" specifically. What
exactly the root cause is I have no clue though. Maybe an fsmonitor
race, maybe something else entirely. Hard to say as I have never seen
this happen on any other platform than macOS, and I do not have access
to a Mac myself.

The issue also doesn't seem to occur when running t9211 on its own, but
only when running the full test suite. This may further indicate that
there is a race condition, where the additional load increases the
likelihood of hitting it. Or there is a bad interaction with another
test.

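
If the cause is a race, one way to hunt for it is to rerun a single test
many times under a time limit and stop at the first run that hangs. The
following is a minimal sketch of that, not something that was run as
part of this investigation. The function name repeat_until_hang is
hypothetical, and it assumes GNU timeout, which is available as
"gtimeout" on macOS when coreutils is installed via Homebrew.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until one run exceeds a time limit.
# Usage: repeat_until_hang <max-runs> <limit-seconds> <command> [<args>...]
# Prints the run number and returns 0 if a run timed out; returns 1 if
# all runs finished within the limit.
repeat_until_hang () {
	max_runs=$1 limit=$2
	shift 2

	# Assumption: GNU timeout, named "gtimeout" or "timeout".
	if command -v gtimeout >/dev/null 2>&1
	then
		timeout_cmd=gtimeout
	else
		timeout_cmd=timeout
	fi

	run=1
	while test "$run" -le "$max_runs"
	do
		"$timeout_cmd" "$limit" "$@" >/dev/null 2>&1
		# GNU timeout exits with status 124 when the limit was hit.
		if test $? -eq 124
		then
			echo "run $run exceeded ${limit}s"
			return 0
		fi
		run=$((run + 1))
	done
	return 1
}
```

From within the "t" directory this could be invoked as
`repeat_until_hang 50 600 ./t9211-scalar-clone.sh`. Given that the hang
has only been seen with the full suite, it would likely have to run
while something else keeps the machine busy.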
Patrick