[RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock
@ 2011-12-16 17:46 Andy Whitcroft
  2011-12-17 18:15 ` [RESEND PATCH 1/1] udevd: process events with timeliness Kay Sievers
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Andy Whitcroft @ 2011-12-16 17:46 UTC (permalink / raw)
  To: linux-hotplug

When processing the incoming kernel event stream we typically enqueue
events, with those events being fed to workers as they become free.
The one exception is for events which indicate a timeliness requirement
(TIMEOUT=N).  Queueing these would violate this constraint and so they
are started immediatly regardless of the number of workers.

Typically events with timeliness requirements are firmware events.
The common case here is for a device add event to pop out of the kernel
during device discovery, udev responds by triggering a module load event.
The module initialisation may then require firmware to initialise the
card which triggers a firmware load event for the device.  This firmware
event pops out of the kernel into udevs queue.  As this is blocking the
modprobe the kernel indicates timeliness on the request, and this in turn
allows the event to be processed immediatly.   This convieniently avoids
a potential deadlock should all of the workers be busy running module
loads which required firmware.

In exit (udevadm control --exit) processing we currently dump the entire
incoming queue and ignore any further incoming kernel events.  We then wait
for the current workers to complete any events they were assigned, before
finally exiting.  However if any of the pending events should trigger a
nested event that new event would not be processed preventing completion
of the existing worker event.  We will eventually timeout both leading
to a long boot delay and in some cases to partially initialised cards.
These cards often will not be repaired even during coldplug and are lost
requiring manual intervention to recover.

Modify exit processing such that we handle events with timeliness
constraints on them, bringing them into line with normal processing.
This allows events which are triggered from our existing workers events
to run to completion and allow completion of those workers.  This allows
us to flush the queue and exit cleanly.

BugLink: http://bugs.launchpad.net/bugs/842560
Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 udev/udevd.c |   39 ++++++++++++++++++++++++++++-----------
 1 files changed, 28 insertions(+), 11 deletions(-)

This bug is triggered on some machines we have with two bnx2 network
interfaces each of which needs two firmware blobs loading.  If we find the
root disks in the middle we request udev exit so we can transition to the
version in the real root.  This request deadlocks resulting in a timeout.
The firmware never gets loaded for the second card and manual intervention
is required to recover the card.

It should also be noted that this change does slightly exacerbate what I
believe is an existing bug in the timeout handling.  We essentially reset
the pending timout to the maximum every time we handle any incoming event,
for example when a worker completes.  This does render the exact timeout
somewhat hard to predict, before and after this change.

diff --git a/udev/udevd.c b/udev/udevd.c
index 05d4b2d..19f7128 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -584,7 +584,7 @@ static void event_queue_start(struct udev *udev)
 	}
 }

-static void event_queue_cleanup(struct udev *udev, enum event_state match_type)
+static void __event_queue_cleanup(struct udev *udev, enum event_state match_type, bool keep_timely)
 {
 	struct udev_list_node *loop, *tmp;

@@ -594,10 +594,25 @@ static void event_queue_cleanup(struct udev *udev, enum event_state match_type)
 		if (match_type != EVENT_UNDEF && match_type != event->state)
 			continue;

+		/* Keep events which have timelyness requirements
+		 * we will skew these timeouts on coldplug. */
+		if (keep_timely && udev_device_get_timeout(event->dev) > 0)
+			continue;
+
 		event_queue_delete(event, false);
 	}
 }

+static void event_queue_cleanup(struct udev *udev, enum event_state match_type)
+{
+	__event_queue_cleanup(udev, match_type, false);
+}
+
+static void event_queue_cleanup_nontimely(struct udev *udev, enum event_state match_type)
+{
+	__event_queue_cleanup(udev, match_type, true);
+}
+
 static void worker_returned(int fd_worker)
 {
 	for (;;) {
@@ -1580,24 +1595,21 @@ int main(int argc, char *argv[])
 		int i;

 		if (udev_exit) {
-			/* close sources of new events and discard buffered events */
+			/* close sources of new events and discard buffered events,
+			 * keep the kernel channel so we can pick out events with a
+			 * timeliness requirement to avoid deadlock. */
 			if (fd_ctrl >= 0) {
 				epoll_ctl(fd_ep, EPOLL_CTL_DEL, fd_ctrl, NULL);
 				fd_ctrl = -1;
 			}
-			if (monitor != NULL) {
-				epoll_ctl(fd_ep, EPOLL_CTL_DEL, fd_netlink, NULL);
-				udev_monitor_unref(monitor);
-				monitor = NULL;
-			}
 			if (fd_inotify >= 0) {
 				epoll_ctl(fd_ep, EPOLL_CTL_DEL, fd_inotify, NULL);
 				close(fd_inotify);
 				fd_inotify = -1;
 			}

-			/* discard queued events and kill workers */
-			event_queue_cleanup(udev, EVENT_QUEUED);
+			/* discard queued non-timely events and kill workers */
+			event_queue_cleanup_nontimely(udev, EVENT_QUEUED);
 			worker_kill(udev, 0);

 			/* exit after all has cleaned up */
@@ -1649,8 +1661,13 @@ int main(int argc, char *argv[])

 			dev = udev_monitor_receive_device(monitor);
 			if (dev != NULL) {
-				udev_device_set_usec_initialized(dev, now_usec());
-				if (event_queue_insert(dev) < 0)
+				/* If we are exiting then only schedule events with
+				 * timeliness requirements. */
+				if (!udev_exit || udev_device_get_timeout(dev) > 0) {
+					udev_device_set_usec_initialized(dev, now_usec());
+					if (event_queue_insert(dev) < 0)
+						udev_device_unref(dev);
+				} else
 					udev_device_unref(dev);
 			}
 		}
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RESEND PATCH 1/1] udevd: process events with timeliness
  2011-12-16 17:46 [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock Andy Whitcroft
@ 2011-12-17 18:15 ` Kay Sievers
  2011-12-20 13:31 ` Andy Whitcroft
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Kay Sievers @ 2011-12-17 18:15 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Dec 16, 2011 at 18:46, Andy Whitcroft <apw@canonical.com> wrote:
> In exit (udevadm control --exit) processing we currently dump the entire
> incoming queue and ignore any further incoming kernel events.  We then wait
> for the current workers to complete any events they were assigned, before
> finally exiting.  However if any of the pending events should trigger a
> nested event that new event would not be processed preventing completion
> of the existing worker event.  We will eventually timeout both leading
> to a long boot delay and in some cases to partially initialised cards.
> These cards often will not be repaired even during coldplug and are lost
> requiring manual intervention to recover.
>
> Modify exit processing such that we handle events with timeliness
> constraints on them, bringing them into line with normal processing.
> This allows events which are triggered from our existing workers events
> to run to completion and allow completion of those workers.  This allows
> us to flush the queue and exit cleanly.

Is 'udevadm settle' called before doing --exit in the initramfs? If
not, I guess that's why others have not seen that.

In general, requesting firmware synchronously on module init sounds
pretty broken. The firmware request should be async. If the device
allow that, done as late as the first ifup of the netdev, and not at
module load time.

If udev is not running, modprobe will hang until it runs into the
timeout? Having module init depending on a 'userspace transaction'
sounds pretty weird. How does that work when the module is
compiled-in?

The TIMEOUT= is basically just a left-over from the times of the crazy
/sbin/hotplug era, where all the hooked-in shell scripts took ages to
handle events, and the kernel's firmware loaders default was 10
seconds, and it ran into that all the time.

The patch looks sensible, but I haven't really wrapped my head around
if we really should make the TIMEOUT= handling even more special here.
I rather see the firmware loading model fixed for the affected
drivers, as I think it is very wrong, for many other reasons too, what
they do here.

Thanks,
Kay

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RESEND PATCH 1/1] udevd: process events with timeliness
  2011-12-16 17:46 [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock Andy Whitcroft
  2011-12-17 18:15 ` [RESEND PATCH 1/1] udevd: process events with timeliness Kay Sievers
@ 2011-12-20 13:31 ` Andy Whitcroft
  2011-12-20 13:44 ` Kay Sievers
  2011-12-20 13:58 ` Kay Sievers
  3 siblings, 0 replies; 5+ messages in thread
From: Andy Whitcroft @ 2011-12-20 13:31 UTC (permalink / raw)
  To: linux-hotplug

On Sat, Dec 17, 2011 at 07:15:00PM +0100, Kay Sievers wrote:
> On Fri, Dec 16, 2011 at 18:46, Andy Whitcroft <apw@canonical.com> wrote:
> > In exit (udevadm control --exit) processing we currently dump the entire
> > incoming queue and ignore any further incoming kernel events.  We then wait
> > for the current workers to complete any events they were assigned, before
> > finally exiting.  However if any of the pending events should trigger a
> > nested event that new event would not be processed preventing completion
> > of the existing worker event.  We will eventually timeout both leading
> > to a long boot delay and in some cases to partially initialised cards.
> > These cards often will not be repaired even during coldplug and are lost
> > requiring manual intervention to recover.
> >
> > Modify exit processing such that we handle events with timeliness
> > constraints on them, bringing them into line with normal processing.
> > This allows events which are triggered from our existing workers events
> > to run to completion and allow completion of those workers.  This allows
> > us to flush the queue and exit cleanly.
> 
> Is 'udevadm settle' called before doing --exit in the initramfs? If
> not, I guess that's why others have not seen that.

No we wish to avoid 'settle'ing as that would force us to wait for the
majority of device probes to complete, and we do not have the majority
of the modules in the initramfs to service them.

> In general, requesting firmware synchronously on module init sounds
> pretty broken. The firmware request should be async. If the device
> allow that, done as late as the first ifup of the netdev, and not at
> module load time.

I will agree that the right thing is to fix the kernel drivers overall.
I have looked at a few and all of the ones I have looked at seem to follow
this model, so I suspect this is not going to be a quick fix.

> If udev is not running, modprobe will hang until it runs into the
> timeout? Having module init depending on a 'userspace transaction'
> sounds pretty weird. How does that work when the module is
> compiled-in?

If udev is not running we will not trigger any modprobes in response to
the device discovery events and so cannot trigger firmware loads during
that time.  If the module is built-in then we will trigger the firmware
load event, and either udev is there to service it or it is lost.
However the firmware object is coldplug-able so when udev is finally
(re)started we will replay the firmware event and expedite the firmware
load as it has timeliness set.  It is only the period when we udev is
exiting that we can be running a modprobe but not willing to handle the
event (without this patch).

> The TIMEOUT= is basically just a left-over from the times of the crazy
> /sbin/hotplug era, where all the hooked-in shell scripts took ages to
> handle events, and the kernel's firmware loaders default was 10
> seconds, and it ran into that all the time.
> 
> The patch looks sensible, but I haven't really wrapped my head around
> if we really should make the TIMEOUT= handling even more special here.
> I rather see the firmware loading model fixed for the affected
> drivers, as I think it is very wrong, for many other reasons too, what
> they do here.

I believe that the change makes TIMEOUT= handline consistantly special,
ie. it always triggers expedited handling regardless of whether we are
running normally or in the process of exiting.  Plus, by handling these
events we can avoid deadlocking the modprobe for these bad drivers.

-apw

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RESEND PATCH 1/1] udevd: process events with timeliness
  2011-12-16 17:46 [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock Andy Whitcroft
  2011-12-17 18:15 ` [RESEND PATCH 1/1] udevd: process events with timeliness Kay Sievers
  2011-12-20 13:31 ` Andy Whitcroft
@ 2011-12-20 13:44 ` Kay Sievers
  2011-12-20 13:58 ` Kay Sievers
  3 siblings, 0 replies; 5+ messages in thread
From: Kay Sievers @ 2011-12-20 13:44 UTC (permalink / raw)
  To: linux-hotplug

On Tue, Dec 20, 2011 at 14:31, Andy Whitcroft <apw@canonical.com> wrote:
> On Sat, Dec 17, 2011 at 07:15:00PM +0100, Kay Sievers wrote:
>> On Fri, Dec 16, 2011 at 18:46, Andy Whitcroft <apw@canonical.com> wrote:
>> > In exit (udevadm control --exit) processing we currently dump the entire
>> > incoming queue and ignore any further incoming kernel events.  We then wait
>> > for the current workers to complete any events they were assigned, before
>> > finally exiting.  However if any of the pending events should trigger a
>> > nested event that new event would not be processed preventing completion
>> > of the existing worker event.  We will eventually timeout both leading
>> > to a long boot delay and in some cases to partially initialised cards.
>> > These cards often will not be repaired even during coldplug and are lost
>> > requiring manual intervention to recover.
>> >
>> > Modify exit processing such that we handle events with timeliness
>> > constraints on them, bringing them into line with normal processing.
>> > This allows events which are triggered from our existing workers events
>> > to run to completion and allow completion of those workers.  This allows
>> > us to flush the queue and exit cleanly.
>>
>> Is 'udevadm settle' called before doing --exit in the initramfs? If
>> not, I guess that's why others have not seen that.
>
> No we wish to avoid 'settle'ing as that would force us to wait for the
> majority of device probes to complete, and we do not have the majority
> of the modules in the initramfs to service them.

What does that mean? You wait for exactly what you yourself have
triggered in the initramfs, not for anything unrelated or not loaded.
And that's very likely the right thing to do. There are usually no
other event pending at that point in time, unless you need to work
around some other kernel bug here. :)

Kay

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RESEND PATCH 1/1] udevd: process events with timeliness
  2011-12-16 17:46 [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock Andy Whitcroft
                   ` (2 preceding siblings ...)
  2011-12-20 13:44 ` Kay Sievers
@ 2011-12-20 13:58 ` Kay Sievers
  3 siblings, 0 replies; 5+ messages in thread
From: Kay Sievers @ 2011-12-20 13:58 UTC (permalink / raw)
  To: linux-hotplug

On Tue, Dec 20, 2011 at 14:31, Andy Whitcroft <apw@canonical.com> wrote:
> On Sat, Dec 17, 2011 at 07:15:00PM +0100, Kay Sievers wrote:

>> The patch looks sensible, but I haven't really wrapped my head around
>> if we really should make the TIMEOUT= handling even more special here.
>> I rather see the firmware loading model fixed for the affected
>> drivers, as I think it is very wrong, for many other reasons too, what
>> they do here.
>
> I believe that the change makes TIMEOUT= handline consistantly special,
> ie. it always triggers expedited handling regardless of whether we are
> running normally or in the process of exiting. Â Plus, by handling these
> events we can avoid deadlocking the modprobe for these bad drivers.

As mentioned earlier, at this point, I rather remove the entire
TIMEOUT handling from udev to reach consistency, than make it even
more special. It's not a proper facility, it's a left-over from the
time the entire hotplug stuff was not working at all.

There are no guarantees that asynchronous events will ever be handled
in a certain time frame. Everything should have a timeout, and we
should handle that as good as we can, but we need to avoid synchronous
kernel dependencies on userspace. And kernel drivers should not get
away these days with dirty hacks like this.

Since a while we refuse to work around such broken kernel behaviour in
userspace, for the reason to put the blame and focus where it belongs:
at the bug and not at the workaround. And what you analysed here is a
kernel bug, not a userspace one.

And I still think that depending on a userspace transaction in module
init is just terribly broken. If udev is not running properly,
modprobe will just hang until a timeout is reached, and that is just
not how things should work. I think, that supporting event
inter-dependencies that way just asks for trouble in the long run.

Kay

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-12-20 13:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-16 17:46 [RESEND PATCH 1/1] udevd: process events with timeliness requirements during exit to avoid deadlock Andy Whitcroft
2011-12-17 18:15 ` [RESEND PATCH 1/1] udevd: process events with timeliness Kay Sievers
2011-12-20 13:31 ` Andy Whitcroft
2011-12-20 13:44 ` Kay Sievers
2011-12-20 13:58 ` Kay Sievers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.