* [PATCH] process: Improve exit handling and hangs
@ 2013-08-24 12:07 Richard Purdie
2013-08-24 12:40 ` Richard Purdie
2013-08-27 20:11 ` Jason Wessel
0 siblings, 2 replies; 3+ messages in thread
From: Richard Purdie @ 2013-08-24 12:07 UTC (permalink / raw)
To: bitbake-devel
It turns out there are a number of different ways the process server termination can
hang. If we call cancel_join_thread() on the event queue, the queue can be left
containing partial data. Reading the event queue in the terminate() function can
then hang, and the timeout and block parameters to Queue.get() don't make any
difference.

Equally, if we don't call cancel_join_thread(), the join_thread in terminate()
will hang, giving a different deadlock.

The best solution I could find is to loop on the process's is_alive() after requesting
that it stop, trying to join it and, if that fails, flushing the event queue
again.

It wasn't clear what difference a force option should make in this case: we're
gracefully trying to empty queues and shut down regardless of whether it's a SIGTERM,
so I've simply removed the force option.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
Jason: Not sure if this or the other patch will help the hang you are
seeing or not but they seem like good changes regardless and fix real
world issues.
diff --git a/bitbake/lib/bb/server/process.py b/bitbake/lib/bb/server/process.py
index 0d4a26c..99a6bf5 100644
--- a/bitbake/lib/bb/server/process.py
+++ b/bitbake/lib/bb/server/process.py
@@ -105,7 +105,7 @@ class ProcessServer(Process, BaseImplServer):
             except Exception:
                 logger.exception('Running command %s', command)

-        self.event_queue.cancel_join_thread()
+        self.event_queue.close()
         bb.event.unregister_UIHhandler(self.event_handle)
         self.command_channel.close()
         self.cooker.stop()
@@ -150,27 +150,25 @@ class BitBakeProcessServerConnection(BitBakeBaseServerConnection):
         self.connection = ServerCommunicator(self.ui_channel)
         self.events = self.event_queue

-    def terminate(self, force = False):
+    def terminate(self):
+        def flushevents():
+            while True:
+                try:
+                    event = self.event_queue.get(block=False)
+                except (Empty, IOError):
+                    break
+                if isinstance(event, logging.LogRecord):
+                    logger.handle(event)
+
         signal.signal(signal.SIGINT, signal.SIG_IGN)
         self.procserver.stop()
-        if force:
-            self.procserver.join(0.5)
-            if self.procserver.is_alive():
-                self.procserver.terminate()
-                self.procserver.join()
-        else:
-            self.procserver.join()
-        while True:
-            try:
-                event = self.event_queue.get(block=False)
-            except (Empty, IOError):
-                break
-            if isinstance(event, logging.LogRecord):
-                logger.handle(event)
+
+        while self.procserver.is_alive():
+            flushevents()
+            self.procserver.join(0.1)
+
         self.ui_channel.close()
         self.event_queue.close()
-        if force:
-            sys.exit(1)

 # Wrap Queue to provide API which isn't server implementation specific
 class ProcessEventQueue(multiprocessing.queues.Queue):
@@ -203,5 +201,5 @@ class BitBakeServer(BitBakeBaseServer):

     def establishConnection(self):
         self.connection = BitBakeProcessServerConnection(self.serverImpl, self.ui_channel, self.event_queue)
-        signal.signal(signal.SIGTERM, lambda i, s: self.connection.terminate(force=True))
+        signal.signal(signal.SIGTERM, lambda i, s: self.connection.terminate())
         return self.connection
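
As a standalone illustration of the drain-while-joining loop in terminate() above, here is a minimal Python 3 sketch. It is not BitBake code: worker(), terminate(), run_demo() and the event count and payload size are hypothetical, and it assumes a POSIX system where the "fork" start method is available. The child queues more data than a pipe typically buffers, so a plain join() with no draining would be expected to block, which is the situation the loop avoids.

```python
import multiprocessing
import queue

# Assumption: POSIX "fork" start method, so no __main__ guard is needed here.
ctx = multiprocessing.get_context("fork")


def worker(event_queue, stop_event, n_events, payload_size):
    """Hypothetical stand-in for the server process."""
    for i in range(n_events):
        event_queue.put((i, "x" * payload_size))
    stop_event.wait()
    # close(), not cancel_join_thread(): at exit the process waits for its
    # feeder thread to push every queued item into the pipe.
    event_queue.close()


def terminate(proc, event_queue, stop_event):
    received = []

    def flushevents():
        while True:
            try:
                received.append(event_queue.get(block=False))
            except (queue.Empty, IOError):
                break

    stop_event.set()
    # Keep draining while the child is alive so its feeder thread can finish.
    while proc.is_alive():
        flushevents()
        proc.join(0.1)
    flushevents()  # addition in this sketch: collect what is left in the pipe
    return received


def run_demo(n_events=200, payload_size=4096):
    event_queue = ctx.Queue()
    stop_event = ctx.Event()
    proc = ctx.Process(target=worker,
                       args=(event_queue, stop_event, n_events, payload_size))
    proc.start()
    events = terminate(proc, event_queue, stop_event)
    event_queue.close()
    return events
```

Calling run_demo() should hand back every queued event in order. The final flushevents() call after the loop is not in the patch; it is there only so the sketch can account for events still sitting in the pipe once the child has exited.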
* Re: [PATCH] process: Improve exit handling and hangs
From: Richard Purdie @ 2013-08-24 12:40 UTC (permalink / raw)
To: bitbake-devel
> @@ -203,5 +201,5 @@ class BitBakeServer(BitBakeBaseServer):
>
>      def establishConnection(self):
>          self.connection = BitBakeProcessServerConnection(self.serverImpl, self.ui_channel, self.event_queue)
> -        signal.signal(signal.SIGTERM, lambda i, s: self.connection.terminate(force=True))
> +        signal.signal(signal.SIGTERM, lambda i, s: self.connection.terminate())
>          return self.connection
FWIW I think this piece of the change *may* make bitbake's handling of
Ctrl+C more robust. I know people have reported problems with that and
the function being called here was full of deadlocks. I'd be interested
in feedback on whether it helps.
Cheers,
Richard
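
To make the quoted hunk concrete, here is a small standalone sketch of routing SIGTERM to the same terminate() path as a normal shutdown. Connection, establish_connection() and run_demo() are hypothetical stand-ins, not BitBake code: the real terminate() stops the server and drains the event queue, whereas this one only counts calls. It assumes a POSIX system and that it runs in the main thread, since signal.signal() may only be called there.

```python
import os
import signal


class Connection:
    """Hypothetical stand-in for BitBakeProcessServerConnection."""

    def __init__(self):
        self.terminate_calls = 0

    def terminate(self):
        # As in the patch, ignore further Ctrl+C while shutting down.
        signal.signal(signal.SIGINT, signal.SIG_IGN)
        # The real method would stop the server and flush the event queue here.
        self.terminate_calls += 1


def establish_connection():
    connection = Connection()
    # No force argument any more: SIGTERM takes the same graceful path.
    signal.signal(signal.SIGTERM, lambda signum, frame: connection.terminate())
    return connection


def run_demo():
    # Remember the current handlers so the demo leaves the process as it found it.
    saved = {s: signal.getsignal(s) for s in (signal.SIGTERM, signal.SIGINT)}
    try:
        connection = establish_connection()
        os.kill(os.getpid(), signal.SIGTERM)  # deliver SIGTERM to ourselves
        return connection.terminate_calls
    finally:
        for sig, handler in saved.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
```

run_demo() is expected to report a single terminate() call. With the force flag gone, there is only one shutdown path to reason about, whichever way the request arrives.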
* Re: [PATCH] process: Improve exit handling and hangs
From: Jason Wessel @ 2013-08-27 20:11 UTC (permalink / raw)
To: Richard Purdie; +Cc: bitbake-devel
On 08/24/2013 07:07 AM, Richard Purdie wrote:
> It turns out there are a number of different ways the process server termination can
> hang. If we call cancel_join_thread() on the event queue, the queue can be left
> containing partial data. Reading the event queue in the terminate() function can
> then hang, and the timeout and block parameters to Queue.get() don't make any
> difference.
>
> Equally, if we don't call cancel_join_thread(), the join_thread in terminate()
> will hang, giving a different deadlock.
>
> The best solution I could find is to loop on the process's is_alive() after requesting
> that it stop, trying to join it and, if that fails, flushing the event queue
> again.
>
> It wasn't clear what difference a force option should make in this case: we're
> gracefully trying to empty queues and shut down regardless of whether it's a SIGTERM,
> so I've simply removed the force option.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>
> Jason: Not sure if this or the other patch will help the hang you are
> seeing or not but they seem like good changes regardless and fix real
> world issues.
The behavior is certainly a bit better with your additional patches, but they do not address the root of the problem. I have tested your patches in the heavy-load situations where we have observed all of the hangs.
I'll send a patch separately along with an explanation of the root cause of the hangs in the PR Server.
As for your patches, I have reviewed and tested them: Acked-by: Jason Wessel <jason.wessel@windriver.com>
Cheers,
Jason.