* Possible problem with device_move() @ 2007-07-28 15:36 Alan Stern 2007-07-30 6:42 ` Cornelia Huck 2007-07-30 6:53 ` Marcel Holtmann 0 siblings, 2 replies; 30+ messages in thread From: Alan Stern @ 2007-07-28 15:36 UTC (permalink / raw) To: Cornelia Huck, Marcel Holtmann; +Cc: Linux-pm mailing list Cornelia and Marcel: The only places in the kernel that call device_move() are in drivers/s390/cio/device.c and net/bluetooth/rfcomm/tty.c, so I want to check with the two of you. I'm in the midst of changing the Power Management core to make it acquire the device semaphore of every device during a suspend and resume. The order of acquisition is the order of device registration, which normally agrees with the way locks are acquired when going through the device tree (i.e., parents before children). But when you call device_move() that might no longer be true. If a device is moved so that its new parent was registered _after_ it was, then the two orders will disagree. This raises the possibility of a deadlock if any thread ever tries to lock the device while holding the new parent's lock -- as might happen during an unregistration, for example. Can you tell whether this will ever cause a problem? Or is it known to be safe because whenever you call device_move(), the new parent was registered before the device being moved? Thanks, Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-28 15:36 Possible problem with device_move() Alan Stern @ 2007-07-30 6:42 ` Cornelia Huck 2007-07-30 17:34 ` Alan Stern 2007-07-30 6:53 ` Marcel Holtmann 1 sibling, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-07-30 6:42 UTC (permalink / raw) To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list On Sat, 28 Jul 2007 11:36:42 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > Can you tell whether this will ever cause a problem? Or is it known to > be safe because whenever you call device_move(), the new parent was > registered before the device being moved? Unfortunately, we can't make any assumptions in cio whether the new parent was registered before the moving device. (A little background: We have each ccw_device as parent of a subchannel as discovered during initial sense. In the case of z/VM (a hypervisor), all subchannels are always consecutively numbered. Now the following may happen: A dasd on subchannel 0 is detached and a new device, say a ctc, is defined. As subchannel 0 was free from z/VMs perspective, the ctc ccw_device now sits on subchannel 0 and the dasd ccw_device is moved to the artificial 'defunct' subchannel. When the dasd is attached again, z/VM will hand out the next free subchannel, which we allocate a subchannel structure for and move the dasd ccw_device to. And here's the case where the child is older than the parent...) While we don't do suspend/resume yet on s390, I don't want to rule it out for the future... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-30 6:42 ` Cornelia Huck @ 2007-07-30 17:34 ` Alan Stern 2007-07-31 8:33 ` Cornelia Huck 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-07-30 17:34 UTC (permalink / raw) To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list On Mon, 30 Jul 2007, Cornelia Huck wrote: > On Sat, 28 Jul 2007 11:36:42 -0400 (EDT), > Alan Stern <stern@rowland.harvard.edu> wrote: > > > Can you tell whether this will ever cause a problem? Or is it known to > > be safe because whenever you call device_move(), the new parent was > > registered before the device being moved? > > Unfortunately, we can't make any assumptions in cio whether the new > parent was registered before the moving device. > > (A little background: We have each ccw_device as parent of a subchannel > as discovered during initial sense. In the case of z/VM (a hypervisor), > all subchannels are always consecutively numbered. Now the following > may happen: A dasd on subchannel 0 is detached and a new device, say a > ctc, is defined. As subchannel 0 was free from z/VMs perspective, the > ctc ccw_device now sits on subchannel 0 and the dasd ccw_device is > moved to the artificial 'defunct' subchannel. When the dasd is attached > again, z/VM will hand out the next free subchannel, which we allocate a > subchannel structure for and move the dasd ccw_device to. And here's > the case where the child is older than the parent...) As long as you never hold the device semaphores of both the parent and the child there shouldn't be any problem. Could this happen when a subchannel structure is deallocated? > While we don't do suspend/resume yet on s390, I don't want to rule it > out for the future... Thanks, Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-30 17:34 ` Alan Stern @ 2007-07-31 8:33 ` Cornelia Huck 2007-07-31 15:11 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-07-31 8:33 UTC (permalink / raw) To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list On Mon, 30 Jul 2007 13:34:35 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > As long as you never hold the device semaphores of both the parent and > the child there shouldn't be any problem. Could this happen when a > subchannel structure is deallocated? This is indeed what happens, since we unregister the child ccw_device from the (io_)subchannel's remove function. (We used to do unregistering of the ccw_device via a workqueue (in order to avoid livelocks on a no longer existing bus semaphore), but removed it since it seemed to be needlessly complicated code.) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-31 8:33 ` Cornelia Huck @ 2007-07-31 15:11 ` Alan Stern 2007-07-31 15:27 ` Rafael J. Wysocki 2007-07-31 15:30 ` Cornelia Huck 0 siblings, 2 replies; 30+ messages in thread From: Alan Stern @ 2007-07-31 15:11 UTC (permalink / raw) To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list On Tue, 31 Jul 2007, Cornelia Huck wrote: > On Mon, 30 Jul 2007 13:34:35 -0400 (EDT), > Alan Stern <stern@rowland.harvard.edu> wrote: > > > As long as you never hold the device semaphores of both the parent and > > the child there shouldn't be any problem. Could this happen when a > > subchannel structure is deallocated? > > This is indeed what happens, since we unregister the child ccw_device > from the (io_)subchannel's remove function. > > (We used to do unregistering of the ccw_device via a workqueue (in > order to avoid livelocks on a no longer existing bus semaphore), but > removed it since it seemed to be needlessly complicated code.) It sounds like you might be in trouble with suspend anyway (if S390 implemented it). As things stand now, the PM core would try to suspend the subchannel before suspending the ccw_device, because the subchannel was registered later. Marcel, you might be in a similar position. The PM core could try to suspend a Bluetooth adapter before suspending its child TTY node. I wonder if device_move() shouldn't change the order of entries in the dpm_active list so that the new parent (and all its ancestors) get moved up ahead of the device's position? But that might cause other problems... Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-07-31 15:11 ` Alan Stern @ 2007-07-31 15:27 ` Rafael J. Wysocki 2007-07-31 15:30 ` Cornelia Huck 1 sibling, 0 replies; 30+ messages in thread From: Rafael J. Wysocki @ 2007-07-31 15:27 UTC (permalink / raw) To: linux-pm; +Cc: Cornelia Huck, Marcel Holtmann On Tuesday, 31 July 2007 17:11, Alan Stern wrote: > On Tue, 31 Jul 2007, Cornelia Huck wrote: > > > On Mon, 30 Jul 2007 13:34:35 -0400 (EDT), > > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > As long as you never hold the device semaphores of both the parent and > > > the child there shouldn't be any problem. Could this happen when a > > > subchannel structure is deallocated? > > > > This is indeed what happens, since we unregister the child ccw_device > > from the (io_)subchannel's remove function. > > > > (We used to do unregistering of the ccw_device via a workqueue (in > > order to avoid livelocks on a no longer existing bus semaphore), but > > removed it since it seemed to be needlessly complicated code.) > > It sounds like you might be in trouble with suspend anyway (if S390 > implemented it). As things stand now, the PM core would try to suspend > the subchannel before suspending the ccw_device, because the subchannel > was registered later. > > Marcel, you might be in a similar position. The PM core could try to > suspend a Bluetooth adapter before suspending its child TTY node. > > I wonder if device_move() shouldn't change the order of entries in the > dpm_active list so that the new parent (and all its ancestors) get > moved up ahead of the device's position? But that might cause other > problems... Still, if the ordering of dpm_active doesn't reflect the relationships between devices, we're in trouble anyway. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-31 15:11 ` Alan Stern 2007-07-31 15:27 ` Rafael J. Wysocki @ 2007-07-31 15:30 ` Cornelia Huck 2007-07-31 18:17 ` Alan Stern 1 sibling, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-07-31 15:30 UTC (permalink / raw) To: Alan Stern; +Cc: Marcel Holtmann, Linux-pm mailing list On Tue, 31 Jul 2007 11:11:50 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > I wonder if device_move() shouldn't change the order of entries in the > dpm_active list so that the new parent (and all its ancestors) get > moved up ahead of the device's position? But that might cause other > problems... This may be necessary if walking the list is the single determining factor to the order of suspend/resume. Are there any other dependencies not covered by time of registration order? I would imagine those needed moving devices on the dpm_active list as well... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-31 15:30 ` Cornelia Huck @ 2007-07-31 18:17 ` Alan Stern 2007-07-31 19:12 ` Rafael J. Wysocki 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-07-31 18:17 UTC (permalink / raw) To: Cornelia Huck; +Cc: Marcel Holtmann, Linux-pm mailing list On Tue, 31 Jul 2007, Cornelia Huck wrote: > On Tue, 31 Jul 2007 11:11:50 -0400 (EDT), > Alan Stern <stern@rowland.harvard.edu> wrote: > > > I wonder if device_move() shouldn't change the order of entries in the > > dpm_active list so that the new parent (and all its ancestors) get > > moved up ahead of the device's position? But that might cause other > > problems... > > This may be necessary if walking the list is the single determining > factor to the order of suspend/resume. It is. > Are there any other dependencies > not covered by time of registration order? I would imagine those needed > moving devices on the dpm_active list as well... I'm not sure what you mean. Devices where A needs to be suspended before B even though A was discovered first? I'm not aware of anything like that, other than your case and Marcel's. Right now there's no code in the PM core to handle such things. There is a dependency in the USB subsystem, wherein I need an EHCI controller to be _resumed_ after its companion UHCI or OHCI controllers. This works out, thanks to the fact that manufacturers tend to give the EHCI controller the largest PCI function number and the Linux PCI core enumerates functions in numerical order. This is just pure luck, however, and if anything changed I'd have to add an explicit fix. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-07-31 18:17 ` Alan Stern @ 2007-07-31 19:12 ` Rafael J. Wysocki 2007-07-31 20:37 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-07-31 19:12 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Tuesday, 31 July 2007 20:17, Alan Stern wrote: > On Tue, 31 Jul 2007, Cornelia Huck wrote: > > > On Tue, 31 Jul 2007 11:11:50 -0400 (EDT), > > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > I wonder if device_move() shouldn't change the order of entries in the > > > dpm_active list so that the new parent (and all its ancestors) get > > > moved up ahead of the device's position? But that might cause other > > > problems... > > > > This may be necessary if walking the list is the single determining > > factor to the order of suspend/resume. > > It is. > > > Are there any other dependencies > > not covered by time of registration order? I would imagine those needed > > moving devices on the dpm_active list as well... > > I'm not sure what you mean. Devices where A needs to be suspended > before B even though A was discovered first? I'm not aware of anything > like that, other than your case and Marcel's. Right now there's no > code in the PM core to handle such things. > > There is a dependency in the USB subsystem, wherein I need an EHCI > controller to be _resumed_ after its companion UHCI or OHCI > controllers. This works out, thanks to the fact that manufacturers > tend to give the EHCI controller the largest PCI function number and > the Linux PCI core enumerates functions in numerical order. This is > just pure luck, however, and if anything changed I'd have to add an > explicit fix. That sounds really worrying to me. The design appears to be very fragile if such things are possible, even in theory. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-07-31 19:12 ` Rafael J. Wysocki @ 2007-07-31 20:37 ` Alan Stern 2007-08-01 13:03 ` Cornelia Huck 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-07-31 20:37 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Tue, 31 Jul 2007, Rafael J. Wysocki wrote: > > There is a dependency in the USB subsystem, wherein I need an EHCI > > controller to be _resumed_ after its companion UHCI or OHCI > > controllers. This works out, thanks to the fact that manufacturers > > tend to give the EHCI controller the largest PCI function number and > > the Linux PCI core enumerates functions in numerical order. This is > > just pure luck, however, and if anything changed I'd have to add an > > explicit fix. > > That sounds really worrying to me. > > The design appears to be very fragile if such things are possible, even in > theory. It wouldn't break the entire resume; it would only prevent the USB-Persist facility from working with non-high-speed devices. So it's not quite as bad as it may seem. Still, I suppose it would be best to add a "move device A to the end of the dpm_active list" routine for use in this sort of situation. This should be easy to do. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-07-31 20:37 ` Alan Stern @ 2007-08-01 13:03 ` Cornelia Huck 2007-08-01 15:22 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-08-01 13:03 UTC (permalink / raw) To: Alan Stern; +Cc: linux-pm, Marcel Holtmann On Tue, 31 Jul 2007 16:37:03 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > Still, I suppose it would be best to add a "move device A to the end of > the dpm_active list" routine for use in this sort of situation. This > should be easy to do. Would it make sense if device_move() moved the new parent (and its parents) after the device in the dpm_active list automatically? Then we would only have to worry about reordering manually in very special cases. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 13:03 ` Cornelia Huck @ 2007-08-01 15:22 ` Alan Stern 2007-08-01 17:04 ` Cornelia Huck 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-01 15:22 UTC (permalink / raw) To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann On Wed, 1 Aug 2007, Cornelia Huck wrote: > On Tue, 31 Jul 2007 16:37:03 -0400 (EDT), > Alan Stern <stern@rowland.harvard.edu> wrote: > > > Still, I suppose it would be best to add a "move device A to the end of > > the dpm_active list" routine for use in this sort of situation. This > > should be easy to do. > > Would it make sense if device_move() moved the new parent (and its > parents) after the device in the dpm_active list automatically? Then we > would only have to worry about reordering manually in very special > cases. As it stands right now, every place device_move() gets called is already special! I'm afraid of reordering devices automatically; there's too much potential for creating new problems. When a driver calls device_move() or does something similar, it should know what sort of list rearrangement is safe. But the PM core can't be expected to know. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 15:22 ` Alan Stern @ 2007-08-01 17:04 ` Cornelia Huck 2007-08-01 17:35 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-08-01 17:04 UTC (permalink / raw) To: Alan Stern; +Cc: linux-pm, Marcel Holtmann On Wed, 1 Aug 2007 11:22:12 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > As it stands right now, every place device_move() gets called is > already special! The "special cases" I was thinking about are those where the order to suspend/resume is not covered by a parent/child relationship, but by (possibly random) order of registration. I'd have thought the rule "the child must be suspended before the parent" was pretty straightforward, but... > > I'm afraid of reordering devices automatically; there's too much > potential for creating new problems. When a driver calls device_move() > or does something similar, it should know what sort of list > rearrangement is safe. But the PM core can't be expected to know. ...you have a point here. Automatic reordering may destroy other ordering, so we shouldn't do it. (The whole list based on registration order thing seems a bit fragile to me, but I don't know enough of the PM core and suspend/resume in general to make a better suggestion :/) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 17:04 ` Cornelia Huck @ 2007-08-01 17:35 ` Alan Stern 2007-08-01 19:25 ` Rafael J. Wysocki 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-01 17:35 UTC (permalink / raw) To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann On Wed, 1 Aug 2007, Cornelia Huck wrote: > On Wed, 1 Aug 2007 11:22:12 -0400 (EDT), > Alan Stern <stern@rowland.harvard.edu> wrote: > > > As it stands right now, every place device_move() gets called is > > already special! > > The "special cases" I was thinking about are those where the order to > suspend/resume is not covered by a parent/child relationship, but by > (possibly random) order of registration. I'd have thought the rule "the > child must be suspended before the parent" was pretty straightforward, > but... So you mean additional requirements, like what I encountered with EHCI. It's peculiar in that the controller contains a hardware switch which can literally connect a USB device to a different (companion) controller, and setting the switch before the companion controller has been resumed won't work. The correct way to handle this would be to set the switch later when the USB device is resumed, but that would be much more awkward. I haven't heard of any other cases. > (The whole list based on registration order thing seems a bit fragile > to me, but I don't know enough of the PM core and suspend/resume in > general to make a better suggestion :/) It hasn't been bad in the past. If A was discovered before B then ipso facto it is safe to suspend B before suspending A. Likewise, in the absence of device_move, if A was discovered before B then A cannot appear below B in the device tree. Of course, this assumes devices are registered as they are discovered. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 17:35 ` Alan Stern @ 2007-08-01 19:25 ` Rafael J. Wysocki 2007-08-01 20:27 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-08-01 19:25 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Wednesday, 1 August 2007 19:35, Alan Stern wrote: > On Wed, 1 Aug 2007, Cornelia Huck wrote: > > > On Wed, 1 Aug 2007 11:22:12 -0400 (EDT), > > Alan Stern <stern@rowland.harvard.edu> wrote: > > > > > As it stands right now, every place device_move() gets called is > > > already special! > > > > The "special cases" I was thinking about are those where the order to > > suspend/resume is not covered by a parent/child relationship, but by > > (possibly random) order of registration. I'd have thought the rule "the > > child must be suspended before the parent" was pretty straightforward, > > but... > > So you mean additional requirements, like what I encountered with EHCI. > It's peculiar in that the controller contains a hardware switch which > can literally connect a USB device to a different (companion) > controller, and setting the switch before the companion controller has > been resumed won't work. The correct way to handle this would be to > set the switch later when the USB device is resumed, but that would be > much more awkward. > > I haven't heard of any other cases. Well, that's really exceptional and I have no idea how to handle things like that in a generic way. > > (The whole list based on registration order thing seems a bit fragile > > to me, but I don't know enough of the PM core and suspend/resume in > > general to make a better suggestion :/) > > It hasn't been bad in the past. If A was discovered before B then ipso > facto it is safe to suspend B before suspending A. Likewise, in the > absence of device_move, if A was discovered before B then A cannot > appear below B in the device tree. Of course, this assumes devices are > registered as they are discovered. Which is a weak assupmtion ... Well, we seem to have some examples of possible situations in which the design might not be adequate and that's alarming. Perhaps we should create dpm_active right before the suspend, by really traversing the device tree, when we own all device semaphores and no device objects can be added/removed? Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 19:25 ` Rafael J. Wysocki @ 2007-08-01 20:27 ` Alan Stern 2007-08-02 8:06 ` Cornelia Huck 2007-08-02 11:21 ` Rafael J. Wysocki 0 siblings, 2 replies; 30+ messages in thread From: Alan Stern @ 2007-08-01 20:27 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Wed, 1 Aug 2007, Rafael J. Wysocki wrote: > > > (The whole list based on registration order thing seems a bit fragile > > > to me, but I don't know enough of the PM core and suspend/resume in > > > general to make a better suggestion :/) > > > > It hasn't been bad in the past. If A was discovered before B then ipso > > facto it is safe to suspend B before suspending A. Likewise, in the > > absence of device_move, if A was discovered before B then A cannot > > appear below B in the device tree. Of course, this assumes devices are > > registered as they are discovered. > > Which is a weak assupmtion ... > > Well, we seem to have some examples of possible situations in which the > design might not be adequate and that's alarming. > > Perhaps we should create dpm_active right before the suspend, by really > traversing the device tree, when we own all device semaphores and no device > objects can be added/removed? We're doing okay the way things are. Changing the order is more likely to cause new problems than to solve existing ones. In fact, I would go even farther. Let's document the ordering given by dpm_active (by default, in order of registration) so that drivers can depend on it, and let's add special routines to change the order for the few cases that require it -- under driver control. A "move to the end" routine would solve the EHCI issue. It ought to solve the problems associated with device_move() also, provided the device being moved doesn't have any children. Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY is reparented under a new device, do any children get moved along with it? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 20:27 ` Alan Stern @ 2007-08-02 8:06 ` Cornelia Huck 2007-08-02 14:19 ` Alan Stern 2007-08-02 11:21 ` Rafael J. Wysocki 1 sibling, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-08-02 8:06 UTC (permalink / raw) To: Alan Stern; +Cc: linux-pm, Marcel Holtmann On Wed, 1 Aug 2007 16:27:48 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > In fact, I would go even farther. Let's document the ordering given by > dpm_active (by default, in order of registration) so that drivers can > depend on it, and let's add special routines to change the order for > the few cases that require it -- under driver control. Driver authors must definitely be made aware of this, and they can only do something if they are given some control. Implicit assumptions mean waiting for breakage to happen. > > A "move to the end" routine would solve the EHCI issue. It ought to > solve the problems associated with device_move() also, provided the > device being moved doesn't have any children. The device drivers need to be given more control; they could just walk the children list in this case then. > > Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY > is reparented under a new device, do any children get moved along with > it? Hm, device_move() just drags around the children with it (and ccw_devices may have children as added by the driver; zfcp does that, for example). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 8:06 ` Cornelia Huck @ 2007-08-02 14:19 ` Alan Stern 2007-08-02 14:50 ` Cornelia Huck 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-02 14:19 UTC (permalink / raw) To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann On Thu, 2 Aug 2007, Cornelia Huck wrote: > > A "move to the end" routine would solve the EHCI issue. It ought to > > solve the problems associated with device_move() also, provided the > > device being moved doesn't have any children. > > The device drivers need to be given more control; they could just walk > the children list in this case then. Come to think of it, "move to the end" won't solve the EHCI issue. I would need a "move A up so that it comes before B" routine. For this special case we wouldn't need to worry about ancestors of A because A and B will have the same parent, but in general A's ancestors would have to be moved also. That's a little worrisome because the ancestors might have dependencies of their own. > > Cornelia and Marcel, is that the case? When a ccw_device or RFCOMM TTY > > is reparented under a new device, do any children get moved along with > > it? > > Hm, device_move() just drags around the children with it (and > ccw_devices may have children as added by the driver; zfcp does that, > for example). So you would want to move the sub-channel plus its ancestors in front of the ccw_device, right? Can you think of a safe way to satisfy everybody? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 14:19 ` Alan Stern @ 2007-08-02 14:50 ` Cornelia Huck 0 siblings, 0 replies; 30+ messages in thread From: Cornelia Huck @ 2007-08-02 14:50 UTC (permalink / raw) To: Alan Stern; +Cc: linux-pm, Marcel Holtmann On Thu, 2 Aug 2007 10:19:32 -0400 (EDT), Alan Stern <stern@rowland.harvard.edu> wrote: > Come to think of it, "move to the end" won't solve the EHCI issue. I > would need a "move A up so that it comes before B" routine. For this > special case we wouldn't need to worry about ancestors of A because A > and B will have the same parent, but in general A's ancestors would > have to be moved also. That's a little worrisome because the ancestors > might have dependencies of their own. In bad cases, we would need to touch the whole tree. This sounds like people are likely to mess the tree up if given the power :( > So you would want to move the sub-channel plus its ancestors in front > of the ccw_device, right? Can you think of a safe way to satisfy > everybody? Unless I'm confused, I would need to have the order (children of ccw_device) -> ccw_device -> subchannel -> channel_subsystem, no? (Note that currently all subchannels are children of the same channel subsystem.) Correct order for moved devices could be most easily obtained by Rafael's suggestion of generating the list just before it is needed; but as this doesn't help with other dependencies, we would additionally need a way to specify those at device registration time (or when it is already registered), which doesn't sound like a terribly good solution either... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-01 20:27 ` Alan Stern 2007-08-02 8:06 ` Cornelia Huck @ 2007-08-02 11:21 ` Rafael J. Wysocki 2007-08-02 14:24 ` Alan Stern 1 sibling, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-08-02 11:21 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Wednesday, 1 August 2007 22:27, Alan Stern wrote: > On Wed, 1 Aug 2007, Rafael J. Wysocki wrote: > > > > > (The whole list based on registration order thing seems a bit fragile > > > > to me, but I don't know enough of the PM core and suspend/resume in > > > > general to make a better suggestion :/) > > > > > > It hasn't been bad in the past. If A was discovered before B then ipso > > > facto it is safe to suspend B before suspending A. Likewise, in the > > > absence of device_move, if A was discovered before B then A cannot > > > appear below B in the device tree. Of course, this assumes devices are > > > registered as they are discovered. > > > > Which is a weak assupmtion ... > > > > Well, we seem to have some examples of possible situations in which the > > design might not be adequate and that's alarming. > > > > Perhaps we should create dpm_active right before the suspend, by really > > traversing the device tree, when we own all device semaphores and no device > > objects can be added/removed? > > We're doing okay the way things are. Changing the order is more likely > to cause new problems than to solve existing ones. Well, my idea is not to change the order, but to create the list from scratch when we need it and not in advance, because creating the list in advance causes problems to appear. In fact, if the system is never suspended, the list that we create is not even useful for anything. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 11:21 ` Rafael J. Wysocki @ 2007-08-02 14:24 ` Alan Stern 2007-08-02 14:58 ` Rafael J. Wysocki 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-02 14:24 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Thu, 2 Aug 2007, Rafael J. Wysocki wrote: > > > Perhaps we should create dpm_active right before the suspend, by really > > > traversing the device tree, when we own all device semaphores and no device > > > objects can be added/removed? > > > > We're doing okay the way things are. Changing the order is more likely > > to cause new problems than to solve existing ones. > > Well, my idea is not to change the order, but to create the list from scratch > when we need it and not in advance, because creating the list in advance > causes problems to appear. Doing it that way will almost certainly result in a different order from the one we have now. And it won't help the EHCI issue, because there the question is which of several siblings should come last. > In fact, if the system is never suspended, the list that we create is not even > useful for anything. True. But it doesn't take up much space, especially if PM isn't configured. And creating the list doesn't take much time. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 14:24 ` Alan Stern @ 2007-08-02 14:58 ` Rafael J. Wysocki 2007-08-02 15:23 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-08-02 14:58 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Thursday, 2 August 2007 16:24, Alan Stern wrote: > On Thu, 2 Aug 2007, Rafael J. Wysocki wrote: > > > > > Perhaps we should create dpm_active right before the suspend, by really > > > > traversing the device tree, when we own all device semaphores and no device > > > > objects can be added/removed? > > > > > > We're doing okay the way things are. Changing the order is more likely > > > to cause new problems than to solve existing ones. > > > > Well, my idea is not to change the order, but to create the list from scratch > > when we need it and not in advance, because creating the list in advance > > causes problems to appear. > > Doing it that way will almost certainly result in a different order > from the one we have now. Yes, but wouldn't that order be more accurate? > And it won't help the EHCI issue, because there the question is which of > several siblings should come last. Why not? If the device can indicate something like "place me after that one" to the code creating the list, we can handle that too. > > In fact, if the system is never suspended, the list that we create is not even > > useful for anything. > > True. But it doesn't take up much space, especially if PM isn't > configured. And creating the list doesn't take much time. Sure. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 14:58 ` Rafael J. Wysocki @ 2007-08-02 15:23 ` Alan Stern 2007-08-02 22:39 ` Rafael J. Wysocki 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-02 15:23 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Thu, 2 Aug 2007, Rafael J. Wysocki wrote: > > > Well, my idea is not to change the order, but to create the list from scratch > > > when we need it and not in advance, because creating the list in advance > > > causes problems to appear. > > > > Doing it that way will almost certainly result in a different order > > from the one we have now. > > Yes, but wouldn't that order be more accurate? I'm not so sure. There might be hidden dependencies, things we aren't aware of because they are automatically satisfied by order-of-discovery. With an arbitrary parent-first traversal of the device tree those dependencies could be violated. And then there are the weird possibilities created by device_move(). Isn't it possible that if an older child is moved under a younger parent, we actually might _want_ to suspend the parent first? Only the driver doing the move would know for sure. > > And it won't help the EHCI issue, because there the question is which of > > several siblings should come last. > > Why not? If the device can indicate something like "place me after that one" > to the code creating the list, we can handle that too. Yes, but currently we don't have any way of indicating that. And if we did, it work work just as well with the current dpm_active list, right? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 15:23 ` Alan Stern @ 2007-08-02 22:39 ` Rafael J. Wysocki 2007-08-03 14:56 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-08-02 22:39 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Thursday, 2 August 2007 17:23, Alan Stern wrote: > On Thu, 2 Aug 2007, Rafael J. Wysocki wrote: > > > > > Well, my idea is not to change the order, but to create the list from scratch > > > > when we need it and not in advance, because creating the list in advance > > > > causes problems to appear. > > > > > > Doing it that way will almost certainly result in a different order > > > from the one we have now. > > > > Yes, but wouldn't that order be more accurate? > > I'm not so sure. There might be hidden dependencies, things we aren't > aware of because they are automatically satisfied by > order-of-discovery. With an arbitrary parent-first traversal of the > device tree those dependencies could be violated. > > And then there are the weird possibilities created by device_move(). > Isn't it possible that if an older child is moved under a younger > parent, we actually might _want_ to suspend the parent first? Only the > driver doing the move would know for sure. > > > > And it won't help the EHCI issue, because there the question is which of > > > several siblings should come last. > > > > Why not? If the device can indicate something like "place me after that one" > > to the code creating the list, we can handle that too. > > Yes, but currently we don't have any way of indicating that. And if we > did, it work work just as well with the current dpm_active list, right? Well, I don't know. :-) It seems the problem is _exactly_ that we have no means to represent the dependencies between devices other than the order of registration and/or the parent-child relationships. Now, the simplest things that comes to mind is to do what we're doing (ie. use the order of registration) with a modification allowing the exceptional devices to have a "please move me to the end of list" flag set. The PM core would then browse the list during suspend and move the devices marked like this to the end of dpm_active along with their children, if need be. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-02 22:39 ` Rafael J. Wysocki @ 2007-08-03 14:56 ` Alan Stern 2007-08-03 21:39 ` Rafael J. Wysocki 0 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2007-08-03 14:56 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Fri, 3 Aug 2007, Rafael J. Wysocki wrote: > > > Why not? If the device can indicate something like "place me after that one" > > > to the code creating the list, we can handle that too. > > > > Yes, but currently we don't have any way of indicating that. And if we > > did, it work work just as well with the current dpm_active list, right? > > Well, I don't know. :-) > > It seems the problem is _exactly_ that we have no means to represent the > dependencies between devices other than the order of registration and/or > the parent-child relationships. Yes. > Now, the simplest things that comes to mind is to do what we're doing (ie. > use the order of registration) with a modification allowing the exceptional > devices to have a "please move me to the end of list" flag set. The PM core > would then browse the list during suspend and move the devices marked like > this to the end of dpm_active along with their children, if need be. Code is more flexible than flags. Why not export a function drivers can call, or better yet, a handful of functions to do different things? It ought to suffice for drivers to call them during probing or registration, rather than every time the system is suspended. They also could be called after device_move() -- and to prevent races we should export pm_sleep_[un]lock(). Suggested functions: 1. Move A in front of B, together with all of A's ancestors which aren't already ahead of B (or maybe require that A's parent be ahead of B already). 2. Move device A to the end (A must not have any children). Anything else? I think this will cover all the cases we currently are concerned about. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-03 14:56 ` Alan Stern @ 2007-08-03 21:39 ` Rafael J. Wysocki 2007-08-06 9:31 ` Cornelia Huck 0 siblings, 1 reply; 30+ messages in thread From: Rafael J. Wysocki @ 2007-08-03 21:39 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, linux-pm, Marcel Holtmann On Friday, 3 August 2007 16:56, Alan Stern wrote: > On Fri, 3 Aug 2007, Rafael J. Wysocki wrote: > > > > > Why not? If the device can indicate something like "place me after that one" > > > > to the code creating the list, we can handle that too. > > > > > > Yes, but currently we don't have any way of indicating that. And if we > > > did, it work work just as well with the current dpm_active list, right? > > > > Well, I don't know. :-) > > > > It seems the problem is _exactly_ that we have no means to represent the > > dependencies between devices other than the order of registration and/or > > the parent-child relationships. > > Yes. > > > Now, the simplest things that comes to mind is to do what we're doing (ie. > > use the order of registration) with a modification allowing the exceptional > > devices to have a "please move me to the end of list" flag set. The PM core > > would then browse the list during suspend and move the devices marked like > > this to the end of dpm_active along with their children, if need be. > > Code is more flexible than flags. Why not export a function drivers > can call, or better yet, a handful of functions to do different things? Yes, we can do that. > It ought to suffice for drivers to call them during probing or > registration, rather than every time the system is suspended. They > also could be called after device_move() -- and to prevent races we > should export pm_sleep_[un]lock(). Yes, we should. > Suggested functions: > > 1. Move A in front of B, together with all of A's ancestors which > aren't already ahead of B (or maybe require that A's parent be > ahead of B already). > > 2. Move device A to the end (A must not have any children). > > Anything else? Nothing more comes to mind at the moment. > I think this will cover all the cases we currently are concerned about. I agree. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-03 21:39 ` Rafael J. Wysocki @ 2007-08-06 9:31 ` Cornelia Huck 2007-08-06 13:53 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Cornelia Huck @ 2007-08-06 9:31 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: linux-pm, Marcel Holtmann On Fri, 3 Aug 2007 23:39:31 +0200, "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > Suggested functions: > > > > 1. Move A in front of B, together with all of A's ancestors which > > aren't already ahead of B (or maybe require that A's parent be > > ahead of B already). I'd prefer "move includes ancestors" as not to introduce another possibility where users can get it wrong. > > > > 2. Move device A to the end (A must not have any children). > > > > Anything else? > > Nothing more comes to mind at the moment. > > > I think this will cover all the cases we currently are concerned about. > > I agree. Yes, I think this should be sufficient. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Re: Possible problem with device_move() 2007-08-06 9:31 ` Cornelia Huck @ 2007-08-06 13:53 ` Alan Stern 0 siblings, 0 replies; 30+ messages in thread From: Alan Stern @ 2007-08-06 13:53 UTC (permalink / raw) To: Cornelia Huck; +Cc: linux-pm, Marcel Holtmann On Mon, 6 Aug 2007, Cornelia Huck wrote: > On Fri, 3 Aug 2007 23:39:31 +0200, > "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > > Suggested functions: > > > > > > 1. Move A in front of B, together with all of A's ancestors which > > > aren't already ahead of B (or maybe require that A's parent be > > > ahead of B already). > > I'd prefer "move includes ancestors" as not to introduce another > possibility where users can get it wrong. We'll see how it ends up being used. If we check that A's parent is already ahead of B then callers won't be able to get it wrong. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-28 15:36 Possible problem with device_move() Alan Stern 2007-07-30 6:42 ` Cornelia Huck @ 2007-07-30 6:53 ` Marcel Holtmann 2007-07-30 17:37 ` Alan Stern 1 sibling, 1 reply; 30+ messages in thread From: Marcel Holtmann @ 2007-07-30 6:53 UTC (permalink / raw) To: Alan Stern; +Cc: Cornelia Huck, Linux-pm mailing list Hi Alan, > I'm in the midst of changing the Power Management core to make it > acquire the device semaphore of every device during a suspend and > resume. The order of acquisition is the order of device registration, > which normally agrees with the way locks are acquired when going > through the device tree (i.e., parents before children). > > But when you call device_move() that might no longer be true. If a > device is moved so that its new parent was registered _after_ it was, > then the two orders will disagree. This raises the possibility of a > deadlock if any thread ever tries to lock the device while holding the > new parent's lock -- as might happen during an unregistration, for > example. the Bluetooth subsystem or the RFCOMM TTY code to be more precise uses device_move() to attach an registered /dev/rfcommX device node (but not connected) to the appropriate Bluetooth adapter and the low-level piconet connection when you open (and thus connect) it. So mainly we re-parent the TTY to the Bluetooth connection and after the connection terminates we re-parent it to the virtual tree (NULL). Regards Marcel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Possible problem with device_move() 2007-07-30 6:53 ` Marcel Holtmann @ 2007-07-30 17:37 ` Alan Stern 0 siblings, 0 replies; 30+ messages in thread From: Alan Stern @ 2007-07-30 17:37 UTC (permalink / raw) To: Marcel Holtmann; +Cc: Cornelia Huck, Linux-pm mailing list On Mon, 30 Jul 2007, Marcel Holtmann wrote: > Hi Alan, > > > I'm in the midst of changing the Power Management core to make it > > acquire the device semaphore of every device during a suspend and > > resume. The order of acquisition is the order of device registration, > > which normally agrees with the way locks are acquired when going > > through the device tree (i.e., parents before children). > > > > But when you call device_move() that might no longer be true. If a > > device is moved so that its new parent was registered _after_ it was, > > then the two orders will disagree. This raises the possibility of a > > deadlock if any thread ever tries to lock the device while holding the > > new parent's lock -- as might happen during an unregistration, for > > example. > > the Bluetooth subsystem or the RFCOMM TTY code to be more precise uses > device_move() to attach an registered /dev/rfcommX device node (but not > connected) to the appropriate Bluetooth adapter and the low-level > piconet connection when you open (and thus connect) it. > > So mainly we re-parent the TTY to the Bluetooth connection and after the > connection terminates we re-parent it to the virtual tree (NULL). What would happen to the TTY if the Bluetooth adapter was unplugged while it was in use? Would the adapter's release routine try to acquire the TTY's device sem? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2007-08-06 13:53 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-28 15:36 Possible problem with device_move() Alan Stern 2007-07-30 6:42 ` Cornelia Huck 2007-07-30 17:34 ` Alan Stern 2007-07-31 8:33 ` Cornelia Huck 2007-07-31 15:11 ` Alan Stern 2007-07-31 15:27 ` Rafael J. Wysocki 2007-07-31 15:30 ` Cornelia Huck 2007-07-31 18:17 ` Alan Stern 2007-07-31 19:12 ` Rafael J. Wysocki 2007-07-31 20:37 ` Alan Stern 2007-08-01 13:03 ` Cornelia Huck 2007-08-01 15:22 ` Alan Stern 2007-08-01 17:04 ` Cornelia Huck 2007-08-01 17:35 ` Alan Stern 2007-08-01 19:25 ` Rafael J. Wysocki 2007-08-01 20:27 ` Alan Stern 2007-08-02 8:06 ` Cornelia Huck 2007-08-02 14:19 ` Alan Stern 2007-08-02 14:50 ` Cornelia Huck 2007-08-02 11:21 ` Rafael J. Wysocki 2007-08-02 14:24 ` Alan Stern 2007-08-02 14:58 ` Rafael J. Wysocki 2007-08-02 15:23 ` Alan Stern 2007-08-02 22:39 ` Rafael J. Wysocki 2007-08-03 14:56 ` Alan Stern 2007-08-03 21:39 ` Rafael J. Wysocki 2007-08-06 9:31 ` Cornelia Huck 2007-08-06 13:53 ` Alan Stern 2007-07-30 6:53 ` Marcel Holtmann 2007-07-30 17:37 ` Alan Stern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox