* [PATCH] runqueue: Really fix sigchld handling
From: Richard Purdie @ 2014-03-18 22:58 UTC
To: bitbake-devel
There are several problems. Firstly, a return value of None (from
signal.getsignal() or signal.signal()) can mean there is a C signal
handler installed, so we need to handle that case better.
signal.SIG_DFL is 0, which evaluates to false, so we also need to
handle that by testing explicitly for None.
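The truthiness pitfall described above can be illustrated with a minimal Python 3 sketch. The helper names (`should_install_buggy`, `should_install_fixed`, `normalise_previous`) are hypothetical and are not part of runqueue.py; they only mirror the old `if not self.oldsigchld` test, the new `is None` test, and the fallback to `signal.SIG_DFL` used in the patch.

```python
import signal


def should_install_buggy(saved_handler):
    # Old-style check: treats every falsy value as "nothing saved yet".
    # signal.SIG_DFL has the integer value 0, so it is wrongly seen as unset.
    return not saved_handler


def should_install_fixed(saved_handler):
    # New-style check: only None means "nothing saved yet".
    return saved_handler is None


def normalise_previous(previous):
    # None from the signal module means the previous handler was not
    # installed from Python. Fall back to SIG_DFL so that None can keep
    # its role as the "nothing saved" marker, as the patch does.
    return signal.SIG_DFL if previous is None else previous
```

With these helpers, a saved `signal.SIG_DFL` makes the old check report "not saved" while the explicit `is None` check does not, which is the distinction the patch relies on.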
Finally, the signal handler *must* call waitpid on all child processes,
otherwise it will just get called repeatedly, leading to the hanging
behaviour we've been seeing. The solution is to raise an error only for
the worker children; for any other stray children we just warn, and we'll
have to figure out where they come from in due course.
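The "reap every child, but only treat known workers as fatal" idea can be sketched as follows. This is a standalone Python 3 illustration on a POSIX system, not the runqueue.py code: `reap_exited_children`, `spawn_child` and `demo` are hypothetical names, and the demo assumes the calling process has no other children.

```python
import os
import time


def reap_exited_children(known_pids):
    """Reap every child that has already exited, without blocking.

    known_pids maps pid -> label for the children we care about.
    Returns a list of (pid, exit_code, label) tuples.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code, known_pids.get(pid, "Unknown")))
    return reaped


def spawn_child(exit_code):
    # Fork a child that exits immediately with the given code.
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid


def demo():
    worker = spawn_child(3)   # stands in for a worker process
    stray = spawn_child(0)    # stands in for an unrelated stray child
    known = {worker: "Worker"}
    seen = []
    deadline = time.monotonic() + 5.0
    while len(seen) < 2 and time.monotonic() < deadline:
        seen.extend(reap_exited_children(known))
        time.sleep(0.01)
    return worker, stray, seen
```

A caller would then raise an error only for entries whose label is not "Unknown" and merely warn about the rest.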
Hopefully this patch gets things working again properly though.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 055db48..3ab5439 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -914,32 +914,32 @@ class RunQueue:
             workerpipe.close()
 
     def sigchild_exception(self, *args, **kwargs):
-        for w in [self.worker, self.fakeworker]:
-            if not w:
-                continue
+        pid = -1
+        while pid:
             try:
-                pid, status = os.waitpid(w.pid, os.WNOHANG)
+                pid, status = os.waitpid(-1, os.WNOHANG)
                 if pid != 0 and not self.teardown:
+                    name = None
                     if self.worker and pid == self.worker.pid:
                         name = "Worker"
                     elif self.fakeworker and pid == self.fakeworker.pid:
                         name = "Fakeroot"
                     else:
-                        name = "Unknown"
-                    bb.error("%s process (%s) exited unexpectedly (%s), shutting down..." % (name, pid, str(status)))
-                    self.finish_runqueue(True)
+                        bb.warn("Unknown process (%s) exited unexpectedly (%s), shutting down..." % (pid, str(status)))
+                    if name and not self.teardown:
+                        bb.error("%s process (%s) exited unexpectedly (%s), shutting down..." % (name, pid, str(status)))
+                        self.finish_runqueue(True)
             except OSError:
-                pid = False
-        if callable(self.oldsigchld):
-            self.oldsigchld(*args, **kwargs)
+                return
 
     def start_worker(self):
         if self.worker:
             self.teardown_workers()
         self.teardown = False
-        if not self.oldsigchld:
-            self.oldsigchld = signal.getsignal(signal.SIGCHLD)
-            signal.signal(signal.SIGCHLD, self.sigchild_exception)
+        if self.oldsigchld is None:
+            self.oldsigchld = signal.signal(signal.SIGCHLD, self.sigchild_exception)
+            if self.oldsigchld is None:
+                self.oldsigchld = signal.SIG_DFL
         self.worker, self.workerpipe = self._start_worker()
 
     def start_fakeworker(self, rqexec):
@@ -948,7 +948,7 @@ class RunQueue:
 
     def teardown_workers(self):
         self.teardown = True
-        if self.oldsigchld:
+        if self.oldsigchld is not None:
             signal.signal(signal.SIGCHLD, self.oldsigchld)
             self.oldsigchld = None
         self._teardown_worker(self.worker, self.workerpipe)
* Re: [PATCH] runqueue: Really fix sigchld handling
From: Chris Larson @ 2014-03-19 21:30 UTC
To: Richard Purdie; +Cc: bitbake-devel
On Tue, Mar 18, 2014 at 3:58 PM, Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> + bb.warn("Unknown process (%s) exited unexpectedly (%s), shutting down..." % (pid, str(status)))
>
This says it's shutting down, but the commit message and code imply that it
isn't for these. I'm guessing this message needs adjustment from a
copy/paste? :)
--
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics
* Re: [PATCH] runqueue: Really fix sigchld handling
From: Richard Purdie @ 2014-03-19 21:40 UTC
To: Chris Larson; +Cc: bitbake-devel
On Wed, 2014-03-19 at 14:30 -0700, Chris Larson wrote:
>
> On Tue, Mar 18, 2014 at 3:58 PM, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> + bb.warn("Unknown process (%s) exited unexpectedly (%s), shutting down..." % (pid, str(status)))
>
>
>
> This says it's shutting down, but the commit message and code imply
> that it isn't for these. I'm guessing this message needs adjustment
> from a copy/paste? :)
>
Yes, indeed.
This signal handler stuff has been a mess, and that patch and others have
just been making things worse.
Basically there are bugs in Python 2.7.3 which expose problems that are
addressed in 2.7.4 and onwards. The whole signal handler approach was
flawed anyway due to the toxic mix with subprocess.
I've pushed some further patches basically reverting the signal handler,
and we've ended up polling. Hopefully this stops things hanging and gets
us back to some kind of stability. I think I finally do understand all
the facets of the issues we've been hitting.
Cheers,
Richard
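For readers unfamiliar with the polling alternative mentioned in the reply, here is a minimal Python 3 sketch of checking a child's exit status periodically instead of relying on a SIGCHLD handler. It is an assumption-laden illustration and not the follow-up patches themselves: `wait_by_polling` and `demo` are hypothetical names, and the interval and timeout values are arbitrary.

```python
import subprocess
import sys
import time


def wait_by_polling(proc, interval=0.05, timeout=10.0):
    """Return the child's exit code, checking it periodically.

    No signal handler is involved: Popen.poll() reaps the child itself
    once it has exited and returns None while it is still running.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(interval)
    # Give up: stop the child and reap it so no zombie is left behind.
    proc.kill()
    proc.wait()
    raise TimeoutError("child did not exit in time")


def demo():
    # A stand-in "worker" that exits unexpectedly with status 3.
    proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    return wait_by_polling(proc)
```

In a main loop that already wakes up regularly, the `proc.poll()` call can simply be made once per iteration, so a dead worker is noticed without any interaction between signal handlers and subprocess.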