From: Peter Zijlstra <peterz@infradead.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg KH <greg@kroah.com>, Oliver Neukum <oneukum@suse.de>,
Stephen Hemminger <shemminger@linux-foundation.org>,
linux-kernel@vger.kernel.org, apw <apw@shadowen.org>,
Ingo Molnar <mingo@elte.hu>,
linux-usb-devel@lists.sourceforge.net
Subject: Re: device struct bloat
Date: Tue, 06 Nov 2007 18:19:26 +0100 [thread overview]
Message-ID: <1194369566.6289.59.camel@twins> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0711061114010.3883-100000@iolanthe.rowland.org>
On Tue, 2007-11-06 at 11:32 -0500, Alan Stern wrote:
> On Tue, 6 Nov 2007, Peter Zijlstra wrote:
> > On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:
> > > You can't possibly solve the lockdep problems here with a simple-minded
> > > approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
> > > is arbitrarily deep & wide, and there is at least one routine that
> > > acquires the semaphores for _all_ the devices in the tree.
> >
> > *blink* someone needs to take all locks - why?
>
> It's for system suspend. All the devices are locked to prevent driver
> binding and unbinding during the suspend transition.
Just locking the tree root is not enough? (thereby avoiding all any new
modifying operation to descend into the tree).
> Even apart from that, there are places where the USB core needs to
> acquire multiple layers of locks (more than just two). For example, if
> somebody has hubs nested several deep and then unplugs the hub closest
> to the computer, this will happen.
Pin the sub-tree root with a reference, iterate the sub-tree and delete
the leafs one by one?
> Shucks, any time a driver's probe routine tries to register a child
> device you will run into problems. probe() is always called with the
> device locked, and registration of a child will acquire the child's
> lock in order to probe drivers for the child.
Right, so you say you have unbounded stack recursion? How about pushing
each probe into a stack (workqueue perhaps). Then each stack entry would
only do a single level probe, if it would recurse push a new entry on
the stack. Basic recursive -> stack transform.
> > > This fact
> > > alone seems to preclude using lockdep for device locks. (If there was
> > > a form of mutex_lock() that bypassed the lockdep checks, you could use
> > > it and avoid these issues.)
> >
> > I'm sceptical of this, since its a simple tree (as opposed to a balanced
> > tree) a simple lock-coupling approach should be enough to guarantee
> > consistency.
>
> I don't follow your reasoning. Let's say a task wants to lock all the
> direct children of a particular device. In what order should the locks
> be acquired?
I'd first start by asking if you want to lock all the children or the
sub-tree. The latter can be done by locking the root of said sub-tree.
> There's no natural tree-related ordering to follow.
Of course there is a natural deadlock free locking order in a tree: lock
the parent, lock all its children, repeat by noting that each child is a
parent again. If you only ever lock top down, there should not be any
lateral dependencies.
> In the simple case where locks are acquired along a path in the tree,
> you could solve the lockdep issues by passing mutex_lock_nested() a key
> equal to the device's depth in the tree. But that won't work with more
> complicated cases.
And only up to 8, as that's the max nesting depth.
> > > Deadlock is a serious consideration. For the most part, routines
> > > locking devices do so along a single path in the tree. For this simple
> > > case the rule is: Never acquire a parent's lock while holding the
> > > child's lock.
> >
> > Sure, but once you have a parent's lock, you could unlock your
> > grandparent, no? (it having a locked child, your parent, should be
> > enough to guarantee its continued existence)
>
> Of course. So what? In many cases the code needs to keep all three
> locks. The locks don't merely guarantee the device structs' continued
> existence (a kref could do that) -- they serialize a number of related
> operations.
Right, but does the grandparent really need serialisation? Normal simple
tree manipulations don't require more than 2 locks to guarantee tree
consistency. So you either don't have a simple tree, or you have more
requirements.
> > > The routine that locks all the devices acquires the locks in order of
> > > device registration. The idea here is that children are always
> > > registered _after_ their parents, so this should be compatible with the
> > > previous rule. But there is a potential problem: device_move() can
> > > move an older child under a younger parent!
> >
> > Seems like a weird rule, a typical tree locking rule would be to lock
> > them top-down - such a rule can easily cope with moves: lock old parent,
> > lock child, lock new parent, move child, unlock all in reverse order.
>
> Yep. But this isn't a typical tree-locking. System suspend sometimes
> requires that devices be powered down in a specific order, and there
> are constraints not captured by the positions in the tree. We get
> around this by powering devices down in reverse order of detection,
> which should always be safe. Although the locking need not necessarily
> follow these same constraints, it is certainly the easiest approach.
Ah, there are more constraints than tree order (which as I get it
resembles bus order)? Which confuses me, as you cannot detect a leaf
before you have a parent, so top-down should still be good, no?
> (And by the way, your example rule "lock old parent, lock child, lock
> new parent" is subject to deadlocks. What if another task tries to
> move a different device from under the new parent to the old parent at
> the same time?)
D'0h, right you are. How about a delete, insert sequence?
> > Like said, I think the tree locking model should be revisited. A
> > top-down locking model with lock-coupling should, from my ignorant
> > perspective, solve your problems.
>
> I don't understand what you mean by "lock-coupling".
A
B C
D E F G
In order to lock F we do:
lock A, lock C, unlock A, lock F, unlock C
Look at it this way, either you're more complex than the VFS or it can
be annotated :-)
next prev parent reply other threads:[~2007-11-06 17:19 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-03 19:48 device struct bloat Stephen Hemminger
2007-11-03 23:14 ` Greg KH
2007-11-04 20:29 ` Peter Zijlstra
2007-11-05 3:58 ` Greg KH
2007-11-05 10:46 ` Peter Zijlstra
2007-11-05 10:57 ` Peter Zijlstra
2007-11-05 22:33 ` Stefan Richter
2007-11-05 22:49 ` Greg KH
2007-11-06 1:38 ` [linux-usb-devel] " David Brownell
2007-11-06 9:43 ` Peter Zijlstra
2007-11-06 9:48 ` Peter Zijlstra
2007-11-06 15:36 ` Alan Stern
2007-11-06 15:58 ` Peter Zijlstra
2007-11-06 16:32 ` Alan Stern
2007-11-06 17:19 ` Peter Zijlstra [this message]
2007-11-06 18:05 ` Alan Stern
2007-11-06 18:57 ` Peter Zijlstra
2007-11-07 16:42 ` Alan Stern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1194369566.6289.59.camel@twins \
--to=peterz@infradead.org \
--cc=apw@shadowen.org \
--cc=greg@kroah.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb-devel@lists.sourceforge.net \
--cc=mingo@elte.hu \
--cc=oneukum@suse.de \
--cc=shemminger@linux-foundation.org \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.