* Possible problem with device_move()
@ 2007-07-28 15:36 Alan Stern
2007-07-30 6:42 ` Cornelia Huck
2007-07-30 6:53 ` Marcel Holtmann
0 siblings, 2 replies; 30+ messages in thread
From: Alan Stern @ 2007-07-28 15:36 UTC (permalink / raw)
To: Cornelia Huck, Marcel Holtmann; +Cc: Linux-pm mailing list
Cornelia and Marcel:
The only places in the kernel that call device_move() are in
drivers/s390/cio/device.c and net/bluetooth/rfcomm/tty.c, so I want to
check with the two of you.
I'm in the midst of changing the Power Management core to make it
acquire the device semaphore of every device during a suspend and
resume. The order of acquisition is the order of device registration,
which normally agrees with the way locks are acquired when going
through the device tree (i.e., parents before children).
But when you call device_move() that might no longer be true. If a
device is moved so that its new parent was registered _after_ it was,
then the two orders will disagree. This raises the possibility of a
deadlock if any thread ever tries to lock the device while holding the
new parent's lock -- as might happen during an unregistration, for
example.
Can you tell whether this will ever cause a problem? Or is it known to
be safe because whenever you call device_move(), the new parent was
registered before the device being moved?
Thanks,
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-28 15:36 Possible problem with device_move() Alan Stern
@ 2007-07-30 6:42 ` Cornelia Huck
2007-07-30 17:34 ` Alan Stern
2007-07-30 6:53 ` Marcel Holtmann
1 sibling, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-07-30 6:42 UTC (permalink / raw)
To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list
On Sat, 28 Jul 2007 11:36:42 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> Can you tell whether this will ever cause a problem? Or is it known to
> be safe because whenever you call device_move(), the new parent was
> registered before the device being moved?
Unfortunately, we can't make any assumptions in cio whether the new
parent was registered before the moving device.
(A little background: We have each ccw_device as parent of a subchannel
as discovered during initial sense. In the case of z/VM (a hypervisor),
all subchannels are always consecutively numbered. Now the following
may happen: A dasd on subchannel 0 is detached and a new device, say a
ctc, is defined. As subchannel 0 was free from z/VMs perspective, the
ctc ccw_device now sits on subchannel 0 and the dasd ccw_device is
moved to the artificial 'defunct' subchannel. When the dasd is attached
again, z/VM will hand out the next free subchannel, which we allocate a
subchannel structure for and move the dasd ccw_device to. And here's
the case where the child is older than the parent...)
While we don't do suspend/resume yet on s390, I don't want to rule it
out for the future...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-28 15:36 Possible problem with device_move() Alan Stern
2007-07-30 6:42 ` Cornelia Huck
@ 2007-07-30 6:53 ` Marcel Holtmann
2007-07-30 17:37 ` Alan Stern
1 sibling, 1 reply; 30+ messages in thread
From: Marcel Holtmann @ 2007-07-30 6:53 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, Linux-pm mailing list
Hi Alan,
> I'm in the midst of changing the Power Management core to make it
> acquire the device semaphore of every device during a suspend and
> resume. The order of acquisition is the order of device registration,
> which normally agrees with the way locks are acquired when going
> through the device tree (i.e., parents before children).
>
> But when you call device_move() that might no longer be true. If a
> device is moved so that its new parent was registered _after_ it was,
> then the two orders will disagree. This raises the possibility of a
> deadlock if any thread ever tries to lock the device while holding the
> new parent's lock -- as might happen during an unregistration, for
> example.
the Bluetooth subsystem or the RFCOMM TTY code to be more precise uses
device_move() to attach an registered /dev/rfcommX device node (but not
connected) to the appropriate Bluetooth adapter and the low-level
piconet connection when you open (and thus connect) it.
So mainly we re-parent the TTY to the Bluetooth connection and after the
connection terminates we re-parent it to the virtual tree (NULL).
Regards
Marcel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-30 6:42 ` Cornelia Huck
@ 2007-07-30 17:34 ` Alan Stern
2007-07-31 8:33 ` Cornelia Huck
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-07-30 17:34 UTC (permalink / raw)
To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list
On Mon, 30 Jul 2007, Cornelia Huck wrote:
> On Sat, 28 Jul 2007 11:36:42 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > Can you tell whether this will ever cause a problem? Or is it known to
> > be safe because whenever you call device_move(), the new parent was
> > registered before the device being moved?
>
> Unfortunately, we can't make any assumptions in cio whether the new
> parent was registered before the moving device.
>
> (A little background: We have each ccw_device as parent of a subchannel
> as discovered during initial sense. In the case of z/VM (a hypervisor),
> all subchannels are always consecutively numbered. Now the following
> may happen: A dasd on subchannel 0 is detached and a new device, say a
> ctc, is defined. As subchannel 0 was free from z/VMs perspective, the
> ctc ccw_device now sits on subchannel 0 and the dasd ccw_device is
> moved to the artificial 'defunct' subchannel. When the dasd is attached
> again, z/VM will hand out the next free subchannel, which we allocate a
> subchannel structure for and move the dasd ccw_device to. And here's
> the case where the child is older than the parent...)
As long as you never hold the device semaphores of both the parent and
the child there shouldn't be any problem. Could this happen when a
subchannel structure is deallocated?
> While we don't do suspend/resume yet on s390, I don't want to rule it
> out for the future...
Thanks,
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-30 6:53 ` Marcel Holtmann
@ 2007-07-30 17:37 ` Alan Stern
0 siblings, 0 replies; 30+ messages in thread
From: Alan Stern @ 2007-07-30 17:37 UTC (permalink / raw)
To: Marcel Holtmann; +Cc: Cornelia Huck, Linux-pm mailing list
On Mon, 30 Jul 2007, Marcel Holtmann wrote:
> Hi Alan,
>
> > I'm in the midst of changing the Power Management core to make it
> > acquire the device semaphore of every device during a suspend and
> > resume. The order of acquisition is the order of device registration,
> > which normally agrees with the way locks are acquired when going
> > through the device tree (i.e., parents before children).
> >
> > But when you call device_move() that might no longer be true. If a
> > device is moved so that its new parent was registered _after_ it was,
> > then the two orders will disagree. This raises the possibility of a
> > deadlock if any thread ever tries to lock the device while holding the
> > new parent's lock -- as might happen during an unregistration, for
> > example.
>
> the Bluetooth subsystem or the RFCOMM TTY code to be more precise uses
> device_move() to attach an registered /dev/rfcommX device node (but not
> connected) to the appropriate Bluetooth adapter and the low-level
> piconet connection when you open (and thus connect) it.
>
> So mainly we re-parent the TTY to the Bluetooth connection and after the
> connection terminates we re-parent it to the virtual tree (NULL).
What would happen to the TTY if the Bluetooth adapter was unplugged
while it was in use? Would the adapter's release routine try to
acquire the TTY's device sem?
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-30 17:34 ` Alan Stern
@ 2007-07-31 8:33 ` Cornelia Huck
2007-07-31 15:11 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-07-31 8:33 UTC (permalink / raw)
To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list
On Mon, 30 Jul 2007 13:34:35 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> As long as you never hold the device semaphores of both the parent and
> the child there shouldn't be any problem. Could this happen when a
> subchannel structure is deallocated?
This is indeed what happens, since we unregister the child ccw_device
from the (io_)subchannel's remove function.
(We used to do unregistering of the ccw_device via a workqueue (in
order to avoid livelocks on a no longer existing bus semaphore), but
removed it since it seemed to be needlessly complicated code.)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-31 8:33 ` Cornelia Huck
@ 2007-07-31 15:11 ` Alan Stern
2007-07-31 15:27 ` Rafael J. Wysocki
2007-07-31 15:30 ` Cornelia Huck
0 siblings, 2 replies; 30+ messages in thread
From: Alan Stern @ 2007-07-31 15:11 UTC (permalink / raw)
To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list
On Tue, 31 Jul 2007, Cornelia Huck wrote:
> On Mon, 30 Jul 2007 13:34:35 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > As long as you never hold the device semaphores of both the parent and
> > the child there shouldn't be any problem. Could this happen when a
> > subchannel structure is deallocated?
>
> This is indeed what happens, since we unregister the child ccw_device
> from the (io_)subchannel's remove function.
>
> (We used to do unregistering of the ccw_device via a workqueue (in
> order to avoid livelocks on a no longer existing bus semaphore), but
> removed it since it seemed to be needlessly complicated code.)
It sounds like you might be in trouble with suspend anyway (if S390
implemented it). As things stand now, the PM core would try to suspend
the subchannel before suspending the ccw_device, because the subchannel
was registered later.
Marcel, you might be in a similar position. The PM core could try to
suspend a Bluetooth adapter before suspending its child TTY node.
I wonder if device_move() shouldn't change the order of entries in the
dpm_active list so that the new parent (and all its ancestors) get
moved up ahead of the device's position? But that might cause other
problems...
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-07-31 15:11 ` Alan Stern
@ 2007-07-31 15:27 ` Rafael J. Wysocki
2007-07-31 15:30 ` Cornelia Huck
1 sibling, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-07-31 15:27 UTC (permalink / raw)
To: linux-pm; +Cc: Cornelia Huck, Marcel Holtmann
On Tuesday, 31 July 2007 17:11, Alan Stern wrote:
> On Tue, 31 Jul 2007, Cornelia Huck wrote:
>
> > On Mon, 30 Jul 2007 13:34:35 -0400 (EDT),
> > Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > As long as you never hold the device semaphores of both the parent and
> > > the child there shouldn't be any problem. Could this happen when a
> > > subchannel structure is deallocated?
> >
> > This is indeed what happens, since we unregister the child ccw_device
> > from the (io_)subchannel's remove function.
> >
> > (We used to do unregistering of the ccw_device via a workqueue (in
> > order to avoid livelocks on a no longer existing bus semaphore), but
> > removed it since it seemed to be needlessly complicated code.)
>
> It sounds like you might be in trouble with suspend anyway (if S390
> implemented it). As things stand now, the PM core would try to suspend
> the subchannel before suspending the ccw_device, because the subchannel
> was registered later.
>
> Marcel, you might be in a similar position. The PM core could try to
> suspend a Bluetooth adapter before suspending its child TTY node.
>
> I wonder if device_move() shouldn't change the order of entries in the
> dpm_active list so that the new parent (and all its ancestors) get
> moved up ahead of the device's position? But that might cause other
> problems...
Still, if the ordering of dpm_active doesn't reflect the relationships between
devices, we're in trouble anyway.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-31 15:11 ` Alan Stern
2007-07-31 15:27 ` Rafael J. Wysocki
@ 2007-07-31 15:30 ` Cornelia Huck
2007-07-31 18:17 ` Alan Stern
1 sibling, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-07-31 15:30 UTC (permalink / raw)
To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list
On Tue, 31 Jul 2007 11:11:50 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> I wonder if device_move() shouldn't change the order of entries in the
> dpm_active list so that the new parent (and all its ancestors) get
> moved up ahead of the device's position? But that might cause other
> problems...
This may be necessary if walking the list is the single determining
factor to the order of suspend/resume. Are there any other dependencies
not covered by time of registration order? I would imagine those needed
moving devices on the dpm_active list as well...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move()
2007-07-31 15:30 ` Cornelia Huck
@ 2007-07-31 18:17 ` Alan Stern
2007-07-31 19:12 ` Rafael J. Wysocki
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-07-31 18:17 UTC (permalink / raw)
To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list
On Tue, 31 Jul 2007, Cornelia Huck wrote:
> On Tue, 31 Jul 2007 11:11:50 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > I wonder if device_move() shouldn't change the order of entries in the
> > dpm_active list so that the new parent (and all its ancestors) get
> > moved up ahead of the device's position? But that might cause other
> > problems...
>
> This may be necessary if walking the list is the single determining
> factor to the order of suspend/resume.
It is.
> Are there any other dependencies
> not covered by time of registration order? I would imagine those needed
> moving devices on the dpm_active list as well...
I'm not sure what you mean. Devices where A needs to be suspended
before B even though A was discovered first? I'm not aware of anything
like that, other than your case and Marcel's. Right now there's no
code in the PM core to handle such things.
There is a dependency in the USB subsystem, wherein I need an EHCI
controller to be _resumed_ after its companion UHCI or OHCI
controllers. This works out, thanks to the fact that manufacturers
tend to give the EHCI controller the largest PCI function number and
the Linux PCI core enumerates functions in numerical order. This is
just pure luck, however, and if anything changed I'd have to add an
explicit fix.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-07-31 18:17 ` Alan Stern
@ 2007-07-31 19:12 ` Rafael J. Wysocki
2007-07-31 20:37 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-07-31 19:12 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Tuesday, 31 July 2007 20:17, Alan Stern wrote:
> On Tue, 31 Jul 2007, Cornelia Huck wrote:
>
> > On Tue, 31 Jul 2007 11:11:50 -0400 (EDT),
> > Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > I wonder if device_move() shouldn't change the order of entries in the
> > > dpm_active list so that the new parent (and all its ancestors) get
> > > moved up ahead of the device's position? But that might cause other
> > > problems...
> >
> > This may be necessary if walking the list is the single determining
> > factor to the order of suspend/resume.
>
> It is.
>
> > Are there any other dependencies
> > not covered by time of registration order? I would imagine those needed
> > moving devices on the dpm_active list as well...
>
> I'm not sure what you mean. Devices where A needs to be suspended
> before B even though A was discovered first? I'm not aware of anything
> like that, other than your case and Marcel's. Right now there's no
> code in the PM core to handle such things.
>
> There is a dependency in the USB subsystem, wherein I need an EHCI
> controller to be _resumed_ after its companion UHCI or OHCI
> controllers. This works out, thanks to the fact that manufacturers
> tend to give the EHCI controller the largest PCI function number and
> the Linux PCI core enumerates functions in numerical order. This is
> just pure luck, however, and if anything changed I'd have to add an
> explicit fix.
That sounds really worrying to me.
The design appears to be very fragile if such things are possible, even in
theory.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-07-31 19:12 ` Rafael J. Wysocki
@ 2007-07-31 20:37 ` Alan Stern
2007-08-01 13:03 ` Cornelia Huck
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-07-31 20:37 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Tue, 31 Jul 2007, Rafael J. Wysocki wrote:
> > There is a dependency in the USB subsystem, wherein I need an EHCI
> > controller to be _resumed_ after its companion UHCI or OHCI
> > controllers. This works out, thanks to the fact that manufacturers
> > tend to give the EHCI controller the largest PCI function number and
> > the Linux PCI core enumerates functions in numerical order. This is
> > just pure luck, however, and if anything changed I'd have to add an
> > explicit fix.
>
> That sounds really worrying to me.
>
> The design appears to be very fragile if such things are possible, even in
> theory.
It wouldn't break the entire resume; it would only prevent the
USB-Persist facility from working with non-high-speed devices. So it's
not quite as bad as it may seem.
Still, I suppose it would be best to add a "move device A to the end of
the dpm_active list" routine for use in this sort of situation. This
should be easy to do.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-07-31 20:37 ` Alan Stern
@ 2007-08-01 13:03 ` Cornelia Huck
2007-08-01 15:22 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-08-01 13:03 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Marcel Holtmann
On Tue, 31 Jul 2007 16:37:03 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> Still, I suppose it would be best to add a "move device A to the end of
> the dpm_active list" routine for use in this sort of situation. This
> should be easy to do.
Would it make sense if device_move() moved the new parent (and its
parents) after the device in the dpm_active list automatically? Then we
would only have to worry about reordering manually in very special
cases.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 13:03 ` Cornelia Huck
@ 2007-08-01 15:22 ` Alan Stern
2007-08-01 17:04 ` Cornelia Huck
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-01 15:22 UTC (permalink / raw)
To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann
On Wed, 1 Aug 2007, Cornelia Huck wrote:
> On Tue, 31 Jul 2007 16:37:03 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > Still, I suppose it would be best to add a "move device A to the end of
> > the dpm_active list" routine for use in this sort of situation. This
> > should be easy to do.
>
> Would it make sense if device_move() moved the new parent (and its
> parents) after the device in the dpm_active list automatically? Then we
> would only have to worry about reordering manually in very special
> cases.
As it stands right now, every place device_move() gets called is
already special!
I'm afraid of reordering devices automatically; there's too much
potential for creating new problems. When a driver calls device_move()
or does something similar, it should know what sort of list
rearrangement is safe. But the PM core can't be expected to know.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 15:22 ` Alan Stern
@ 2007-08-01 17:04 ` Cornelia Huck
2007-08-01 17:35 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-08-01 17:04 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Marcel Holtmann
On Wed, 1 Aug 2007 11:22:12 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> As it stands right now, every place device_move() gets called is
> already special!
The "special cases" I was thinking about are those where the order to
suspend/resume is not covered by a parent/child relationship, but by
(possibly random) order of registration. I'd have thought the rule "the
child must be suspended before the parent" was pretty straightforward,
but...
>
> I'm afraid of reordering devices automatically; there's too much
> potential for creating new problems. When a driver calls device_move()
> or does something similar, it should know what sort of list
> rearrangement is safe. But the PM core can't be expected to know.
...you have a point here. Automatic reordering may destroy other
ordering, so we shouldn't do it.
(The whole list based on registration order thing seems a bit fragile
to me, but I don't know enough of the PM core and suspend/resume in
general to make a better suggestion :/)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 17:04 ` Cornelia Huck
@ 2007-08-01 17:35 ` Alan Stern
2007-08-01 19:25 ` Rafael J. Wysocki
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-01 17:35 UTC (permalink / raw)
To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann
On Wed, 1 Aug 2007, Cornelia Huck wrote:
> On Wed, 1 Aug 2007 11:22:12 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > As it stands right now, every place device_move() gets called is
> > already special!
>
> The "special cases" I was thinking about are those where the order to
> suspend/resume is not covered by a parent/child relationship, but by
> (possibly random) order of registration. I'd have thought the rule "the
> child must be suspended before the parent" was pretty straightforward,
> but...
So you mean additional requirements, like what I encountered with EHCI.
It's peculiar in that the controller contains a hardware switch which
can literally connect a USB device to a different (companion)
controller, and setting the switch before the companion controller has
been resumed won't work. The correct way to handle this would be to
set the switch later when the USB device is resumed, but that would be
much more awkward.
I haven't heard of any other cases.
> (The whole list based on registration order thing seems a bit fragile
> to me, but I don't know enough of the PM core and suspend/resume in
> general to make a better suggestion :/)
It hasn't been bad in the past. If A was discovered before B then ipso
facto it is safe to suspend B before suspending A. Likewise, in the
absence of device_move, if A was discovered before B then A cannot
appear below B in the device tree. Of course, this assumes devices are
registered as they are discovered.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 17:35 ` Alan Stern
@ 2007-08-01 19:25 ` Rafael J. Wysocki
2007-08-01 20:27 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-08-01 19:25 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Wednesday, 1 August 2007 19:35, Alan Stern wrote:
> On Wed, 1 Aug 2007, Cornelia Huck wrote:
>
> > On Wed, 1 Aug 2007 11:22:12 -0400 (EDT),
> > Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > As it stands right now, every place device_move() gets called is
> > > already special!
> >
> > The "special cases" I was thinking about are those where the order to
> > suspend/resume is not covered by a parent/child relationship, but by
> > (possibly random) order of registration. I'd have thought the rule "the
> > child must be suspended before the parent" was pretty straightforward,
> > but...
>
> So you mean additional requirements, like what I encountered with EHCI.
> It's peculiar in that the controller contains a hardware switch which
> can literally connect a USB device to a different (companion)
> controller, and setting the switch before the companion controller has
> been resumed won't work. The correct way to handle this would be to
> set the switch later when the USB device is resumed, but that would be
> much more awkward.
>
> I haven't heard of any other cases.
Well, that's really exceptional and I have no idea how to handle things like
that in a generic way.
> > (The whole list based on registration order thing seems a bit fragile
> > to me, but I don't know enough of the PM core and suspend/resume in
> > general to make a better suggestion :/)
>
> It hasn't been bad in the past. If A was discovered before B then ipso
> facto it is safe to suspend B before suspending A. Likewise, in the
> absence of device_move, if A was discovered before B then A cannot
> appear below B in the device tree. Of course, this assumes devices are
> registered as they are discovered.
Which is a weak assupmtion ...
Well, we seem to have some examples of possible situations in which the
design might not be adequate and that's alarming.
Perhaps we should create dpm_active right before the suspend, by really
traversing the device tree, when we own all device semaphores and no device
objects can be added/removed?
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 19:25 ` Rafael J. Wysocki
@ 2007-08-01 20:27 ` Alan Stern
2007-08-02 8:06 ` Cornelia Huck
2007-08-02 11:21 ` Rafael J. Wysocki
0 siblings, 2 replies; 30+ messages in thread
From: Alan Stern @ 2007-08-01 20:27 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Wed, 1 Aug 2007, Rafael J. Wysocki wrote:
> > > (The whole list based on registration order thing seems a bit fragile
> > > to me, but I don't know enough of the PM core and suspend/resume in
> > > general to make a better suggestion :/)
> >
> > It hasn't been bad in the past. If A was discovered before B then ipso
> > facto it is safe to suspend B before suspending A. Likewise, in the
> > absence of device_move, if A was discovered before B then A cannot
> > appear below B in the device tree. Of course, this assumes devices are
> > registered as they are discovered.
>
> Which is a weak assupmtion ...
>
> Well, we seem to have some examples of possible situations in which the
> design might not be adequate and that's alarming.
>
> Perhaps we should create dpm_active right before the suspend, by really
> traversing the device tree, when we own all device semaphores and no device
> objects can be added/removed?
We're doing okay the way things are. Changing the order is more likely
to cause new problems than to solve existing ones.
In fact, I would go even farther. Let's document the ordering given by
dpm_active (by default, in order of registration) so that drivers can
depend on it, and let's add special routines to change the order for
the few cases that require it -- under driver control.
A "move to the end" routine would solve the EHCI issue. It ought to
solve the problems associated with device_move() also, provided the
device being moved doesn't have any children.
Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY
is reparented under a new device, do any children get moved along with
it?
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 20:27 ` Alan Stern
@ 2007-08-02 8:06 ` Cornelia Huck
2007-08-02 14:19 ` Alan Stern
2007-08-02 11:21 ` Rafael J. Wysocki
1 sibling, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-08-02 8:06 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Marcel Holtmann
On Wed, 1 Aug 2007 16:27:48 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> In fact, I would go even farther. Let's document the ordering given by
> dpm_active (by default, in order of registration) so that drivers can
> depend on it, and let's add special routines to change the order for
> the few cases that require it -- under driver control.
Driver authors must definitely be made aware of this, and they can only
do something if they are given some control. Implicit assumptions mean
waiting for breakage to happen.
>
> A "move to the end" routine would solve the EHCI issue. It ought to
> solve the problems associated with device_move() also, provided the
> device being moved doesn't have any children.
The device drivers need to be given more control; they could just walk
the children list in this case then.
>
> Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY
> is reparented under a new device, do any children get moved along with
> it?
Hm, device_move() just drags around the children with it (and
ccw_devices may have children as added by the driver; zfcp does that,
for example).
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-01 20:27 ` Alan Stern
2007-08-02 8:06 ` Cornelia Huck
@ 2007-08-02 11:21 ` Rafael J. Wysocki
2007-08-02 14:24 ` Alan Stern
1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-08-02 11:21 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Wednesday, 1 August 2007 22:27, Alan Stern wrote:
> On Wed, 1 Aug 2007, Rafael J. Wysocki wrote:
>
> > > > (The whole list based on registration order thing seems a bit fragile
> > > > to me, but I don't know enough of the PM core and suspend/resume in
> > > > general to make a better suggestion :/)
> > >
> > > It hasn't been bad in the past. If A was discovered before B then ipso
> > > facto it is safe to suspend B before suspending A. Likewise, in the
> > > absence of device_move, if A was discovered before B then A cannot
> > > appear below B in the device tree. Of course, this assumes devices are
> > > registered as they are discovered.
> >
> > Which is a weak assupmtion ...
> >
> > Well, we seem to have some examples of possible situations in which the
> > design might not be adequate and that's alarming.
> >
> > Perhaps we should create dpm_active right before the suspend, by really
> > traversing the device tree, when we own all device semaphores and no device
> > objects can be added/removed?
>
> We're doing okay the way things are. Changing the order is more likely
> to cause new problems than to solve existing ones.
Well, my idea is not to change the order, but to create the list from scratch
when we need it and not in advance, because creating the list in advance
causes problems to appear.
In fact, if the system is never suspended, the list that we create is not even
useful for anything.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 8:06 ` Cornelia Huck
@ 2007-08-02 14:19 ` Alan Stern
2007-08-02 14:50 ` Cornelia Huck
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-02 14:19 UTC (permalink / raw)
To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann
On Thu, 2 Aug 2007, Cornelia Huck wrote:
> > A "move to the end" routine would solve the EHCI issue. It ought to
> > solve the problems associated with device_move() also, provided the
> > device being moved doesn't have any children.
>
> The device drivers need to be given more control; they could just walk
> the children list in this case then.
Come to think of it, "move to the end" won't solve the EHCI issue. I
would need a "move A up so that it comes before B" routine. For this
special case we wouldn't need to worry about ancestors of A because A
and B will have the same parent, but in general A's ancestors would
have to be moved also. That's a little worrisome because the ancestors
might have dependencies of their own.
> > Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY
> > is reparented under a new device, do any children get moved along with
> > it?
>
> Hm, device_move() just drags around the children with it (and
> ccw_devices may have children as added by the driver; zfcp does that,
> for example).
So you would want to move the sub-channel plus its ancestors in front
of the ccw_device, right? Can you think of a safe way to satisfy
everybody?
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 11:21 ` Rafael J. Wysocki
@ 2007-08-02 14:24 ` Alan Stern
2007-08-02 14:58 ` Rafael J. Wysocki
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-02 14:24 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Thu, 2 Aug 2007, Rafael J. Wysocki wrote:
> > > Perhaps we should create dpm_active right before the suspend, by really
> > > traversing the device tree, when we own all device semaphores and no device
> > > objects can be added/removed?
> >
> > We're doing okay the way things are. Changing the order is more likely
> > to cause new problems than to solve existing ones.
>
> Well, my idea is not to change the order, but to create the list from scratch
> when we need it and not in advance, because creating the list in advance
> causes problems to appear.
Doing it that way will almost certainly result in a different order
from the one we have now. And it won't help the EHCI issue, because
there the question is which of several siblings should come last.
> In fact, if the system is never suspended, the list that we create is not even
> useful for anything.
True. But it doesn't take up much space, especially if PM isn't
configured. And creating the list doesn't take much time.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 14:19 ` Alan Stern
@ 2007-08-02 14:50 ` Cornelia Huck
0 siblings, 0 replies; 30+ messages in thread
From: Cornelia Huck @ 2007-08-02 14:50 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Marcel Holtmann
On Thu, 2 Aug 2007 10:19:32 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:
> Come to think of it, "move to the end" won't solve the EHCI issue. I
> would need a "move A up so that it comes before B" routine. For this
> special case we wouldn't need to worry about ancestors of A because A
> and B will have the same parent, but in general A's ancestors would
> have to be moved also. That's a little worrisome because the ancestors
> might have dependencies of their own.
In bad cases, we would need to touch the whole tree. This sounds like
people are likely to mess the tree up if given the power :(
> So you would want to move the sub-channel plus its ancestors in front
> of the ccw_device, right? Can you think of a safe way to satisfy
> everybody?
Unless I'm confused, I would need to have the order
(children of ccw_device) -> ccw_device -> subchannel -> channel_subsystem,
no? (Note that currently all subchannels are children of the same
channel subsystem.)
Correct order for moved devices could be most easily obtained by
Rafael's suggestion of generating the list just before it is needed;
but as this doesn't help with other dependencies, we would additionally
need a way to specify those at device registration time (or when it is
already registered), which doesn't sound like a terribly good solution
either...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 14:24 ` Alan Stern
@ 2007-08-02 14:58 ` Rafael J. Wysocki
2007-08-02 15:23 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-08-02 14:58 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Thursday, 2 August 2007 16:24, Alan Stern wrote:
> On Thu, 2 Aug 2007, Rafael J. Wysocki wrote:
>
> > > > Perhaps we should create dpm_active right before the suspend, by really
> > > > traversing the device tree, when we own all device semaphores and no device
> > > > objects can be added/removed?
> > >
> > > We're doing okay the way things are. Changing the order is more likely
> > > to cause new problems than to solve existing ones.
> >
> > Well, my idea is not to change the order, but to create the list from scratch
> > when we need it and not in advance, because creating the list in advance
> > causes problems to appear.
>
> Doing it that way will almost certainly result in a different order
> from the one we have now.
Yes, but wouldn't that order be more accurate?
> And it won't help the EHCI issue, because there the question is which of
> several siblings should come last.
Why not? If the device can indicate something like "place me after that one"
to the code creating the list, we can handle that too.
> > In fact, if the system is never suspended, the list that we create is not even
> > useful for anything.
>
> True. But it doesn't take up much space, especially if PM isn't
> configured. And creating the list doesn't take much time.
Sure.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 14:58 ` Rafael J. Wysocki
@ 2007-08-02 15:23 ` Alan Stern
2007-08-02 22:39 ` Rafael J. Wysocki
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-02 15:23 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Thu, 2 Aug 2007, Rafael J. Wysocki wrote:
> > > Well, my idea is not to change the order, but to create the list from scratch
> > > when we need it and not in advance, because creating the list in advance
> > > causes problems to appear.
> >
> > Doing it that way will almost certainly result in a different order
> > from the one we have now.
>
> Yes, but wouldn't that order be more accurate?
I'm not so sure. There might be hidden dependencies, things we aren't
aware of because they are automatically satisfied by
order-of-discovery. With an arbitrary parent-first traversal of the
device tree those dependencies could be violated.
And then there are the weird possibilities created by device_move().
Isn't it possible that if an older child is moved under a younger
parent, we actually might _want_ to suspend the parent first? Only the
driver doing the move would know for sure.
> > And it won't help the EHCI issue, because there the question is which of
> > several siblings should come last.
>
> Why not? If the device can indicate something like "place me after that one"
> to the code creating the list, we can handle that too.
Yes, but currently we don't have any way of indicating that. And if we
did, it work work just as well with the current dpm_active list, right?
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 15:23 ` Alan Stern
@ 2007-08-02 22:39 ` Rafael J. Wysocki
2007-08-03 14:56 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-08-02 22:39 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Thursday, 2 August 2007 17:23, Alan Stern wrote:
> On Thu, 2 Aug 2007, Rafael J. Wysocki wrote:
>
> > > > Well, my idea is not to change the order, but to create the list from scratch
> > > > when we need it and not in advance, because creating the list in advance
> > > > causes problems to appear.
> > >
> > > Doing it that way will almost certainly result in a different order
> > > from the one we have now.
> >
> > Yes, but wouldn't that order be more accurate?
>
> I'm not so sure. There might be hidden dependencies, things we aren't
> aware of because they are automatically satisfied by
> order-of-discovery. With an arbitrary parent-first traversal of the
> device tree those dependencies could be violated.
>
> And then there are the weird possibilities created by device_move().
> Isn't it possible that if an older child is moved under a younger
> parent, we actually might _want_ to suspend the parent first? Only the
> driver doing the move would know for sure.
>
> > > And it won't help the EHCI issue, because there the question is which of
> > > several siblings should come last.
> >
> > Why not? If the device can indicate something like "place me after that one"
> > to the code creating the list, we can handle that too.
>
> Yes, but currently we don't have any way of indicating that. And if we
> did, it work work just as well with the current dpm_active list, right?
Well, I don't know. :-)
It seems the problem is _exactly_ that we have no means to represent the
dependencies between devices other than the order of registration and/or
the parent-child relationships.
Now, the simplest things that comes to mind is to do what we're doing (ie.
use the order of registration) with a modification allowing the exceptional
devices to have a "please move me to the end of list" flag set. The PM core
would then browse the list during suspend and move the devices marked like
this to the end of dpm_active along with their children, if need be.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-02 22:39 ` Rafael J. Wysocki
@ 2007-08-03 14:56 ` Alan Stern
2007-08-03 21:39 ` Rafael J. Wysocki
0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-08-03 14:56 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Fri, 3 Aug 2007, Rafael J. Wysocki wrote:
> > > Why not? If the device can indicate something like "place me after that one"
> > > to the code creating the list, we can handle that too.
> >
> > Yes, but currently we don't have any way of indicating that. And if we
> > did, it work work just as well with the current dpm_active list, right?
>
> Well, I don't know. :-)
>
> It seems the problem is _exactly_ that we have no means to represent the
> dependencies between devices other than the order of registration and/or
> the parent-child relationships.
Yes.
> Now, the simplest things that comes to mind is to do what we're doing (ie.
> use the order of registration) with a modification allowing the exceptional
> devices to have a "please move me to the end of list" flag set. The PM core
> would then browse the list during suspend and move the devices marked like
> this to the end of dpm_active along with their children, if need be.
Code is more flexible than flags. Why not export a function drivers
can call, or better yet, a handful of functions to do different things?
It ought to suffice for drivers to call them during probing or
registration, rather than every time the system is suspended. They
also could be called after device_move() -- and to prevent races we
should export pm_sleep_[un]lock().
Suggested functions:
1. Move A in front of B, together with all of A's ancestors which
aren't already ahead of B (or maybe require that A's parent be
ahead of B already).
2. Move device A to the end (A must not have any children).
Anything else? I think this will cover all the cases we currently are
concerned about.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-03 14:56 ` Alan Stern
@ 2007-08-03 21:39 ` Rafael J. Wysocki
2007-08-06 9:31 ` Cornelia Huck
0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-08-03 21:39 UTC (permalink / raw)
To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann
On Friday, 3 August 2007 16:56, Alan Stern wrote:
> On Fri, 3 Aug 2007, Rafael J. Wysocki wrote:
>
> > > > Why not? If the device can indicate something like "place me after that one"
> > > > to the code creating the list, we can handle that too.
> > >
> > > Yes, but currently we don't have any way of indicating that. And if we
> > > did, it work work just as well with the current dpm_active list, right?
> >
> > Well, I don't know. :-)
> >
> > It seems the problem is _exactly_ that we have no means to represent the
> > dependencies between devices other than the order of registration and/or
> > the parent-child relationships.
>
> Yes.
>
> > Now, the simplest things that comes to mind is to do what we're doing (ie.
> > use the order of registration) with a modification allowing the exceptional
> > devices to have a "please move me to the end of list" flag set. The PM core
> > would then browse the list during suspend and move the devices marked like
> > this to the end of dpm_active along with their children, if need be.
>
> Code is more flexible than flags. Why not export a function drivers
> can call, or better yet, a handful of functions to do different things?
Yes, we can do that.
> It ought to suffice for drivers to call them during probing or
> registration, rather than every time the system is suspended. They
> also could be called after device_move() -- and to prevent races we
> should export pm_sleep_[un]lock().
Yes, we should.
> Suggested functions:
>
> 1. Move A in front of B, together with all of A's ancestors which
> aren't already ahead of B (or maybe require that A's parent be
> ahead of B already).
>
> 2. Move device A to the end (A must not have any children).
>
> Anything else?
Nothing more comes to mind at the moment.
> I think this will cover all the cases we currently are concerned about.
I agree.
Greetings,
Rafael
--
"Premature optimization is the root of all evil." - Donald Knuth
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-03 21:39 ` Rafael J. Wysocki
@ 2007-08-06 9:31 ` Cornelia Huck
2007-08-06 13:53 ` Alan Stern
0 siblings, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-08-06 9:31 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm, Marcel Holtmann
On Fri, 3 Aug 2007 23:39:31 +0200,
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > Suggested functions:
> >
> > 1. Move A in front of B, together with all of A's ancestors which
> > aren't already ahead of B (or maybe require that A's parent be
> > ahead of B already).
I'd prefer "move includes ancestors" as not to introduce another
possibility where users can get it wrong.
> >
> > 2. Move device A to the end (A must not have any children).
> >
> > Anything else?
>
> Nothing more comes to mind at the moment.
>
> > I think this will cover all the cases we currently are concerned about.
>
> I agree.
Yes, I think this should be sufficient.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move()
2007-08-06 9:31 ` Cornelia Huck
@ 2007-08-06 13:53 ` Alan Stern
0 siblings, 0 replies; 30+ messages in thread
From: Alan Stern @ 2007-08-06 13:53 UTC (permalink / raw)
To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann
On Mon, 6 Aug 2007, Cornelia Huck wrote:
> On Fri, 3 Aug 2007 23:39:31 +0200,
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > Suggested functions:
> > >
> > > 1. Move A in front of B, together with all of A's ancestors which
> > > aren't already ahead of B (or maybe require that A's parent be
> > > ahead of B already).
>
> I'd prefer "move includes ancestors" as not to introduce another
> possibility where users can get it wrong.
We'll see how it ends up being used. If we check that A's parent is
already ahead of B then callers won't be able to get it wrong.
Alan Stern
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2007-08-06 13:53 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-28 15:36 Possible problem with device_move() Alan Stern
2007-07-30 6:42 ` Cornelia Huck
2007-07-30 17:34 ` Alan Stern
2007-07-31 8:33 ` Cornelia Huck
2007-07-31 15:11 ` Alan Stern
2007-07-31 15:27 ` Rafael J. Wysocki
2007-07-31 15:30 ` Cornelia Huck
2007-07-31 18:17 ` Alan Stern
2007-07-31 19:12 ` Rafael J. Wysocki
2007-07-31 20:37 ` Alan Stern
2007-08-01 13:03 ` Cornelia Huck
2007-08-01 15:22 ` Alan Stern
2007-08-01 17:04 ` Cornelia Huck
2007-08-01 17:35 ` Alan Stern
2007-08-01 19:25 ` Rafael J. Wysocki
2007-08-01 20:27 ` Alan Stern
2007-08-02 8:06 ` Cornelia Huck
2007-08-02 14:19 ` Alan Stern
2007-08-02 14:50 ` Cornelia Huck
2007-08-02 11:21 ` Rafael J. Wysocki
2007-08-02 14:24 ` Alan Stern
2007-08-02 14:58 ` Rafael J. Wysocki
2007-08-02 15:23 ` Alan Stern
2007-08-02 22:39 ` Rafael J. Wysocki
2007-08-03 14:56 ` Alan Stern
2007-08-03 21:39 ` Rafael J. Wysocki
2007-08-06 9:31 ` Cornelia Huck
2007-08-06 13:53 ` Alan Stern
2007-07-30 6:53 ` Marcel Holtmann
2007-07-30 17:37 ` Alan Stern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox