* Question about driver ioctl, big kernel lock, mutexes
From: scameron @ 2011-03-16 15:50 UTC
To: linux-kernel; +Cc: arnd, axboe, scameron, dab, mike.miller

I just noticed the existence of this change (I guess I'm a little slow):

    commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a
    Author: Arnd Bergmann <arnd@arndb.de>
    Date:   Wed Jun 2 14:28:52 2010 +0200

        block: autoconvert trivial BKL users to private mutex

        The block device drivers have all gained new lock_kernel
        calls from a recent pushdown, and some of the drivers
        were already using the BKL before.

        This turns the BKL into a set of per-driver mutexes.
        Still need to check whether this is safe to do.

This changes the behavior of, for example, the CCISS_PASSTHRU and
other ioctls.  Previously, these were "protected" by the Big Kernel
Lock.  Now, from what I understand and what I've observed, the Big
Kernel Lock is kind of strange, in that it auto-releases whenever you
sleep.  This means that the ioctl path in cciss used to allow
concurrency (which is fine; they should all be thread safe.  I'm kind
of wondering why the BKL was there in the first place.  I guess I
assumed it was necessary for reasons having to do with the upper
layers.)

Now, if I understand things correctly, with this new driver-specific
mutex surrounding the ioctl path, only one thread is permitted in the
ioctl path at a time, and whether it sleeps or not, the mutex is not
released until it's done.

That's a pretty big change in behavior.  Some commands sent through
the passthrough ioctls may take a while, and they're going to block
any other ioctls from getting started while they're going.
Previously, it was possible to saturate the controller using multiple
threads or processes hammering on the ioctl path.  This would appear
to no longer be the case.

That being said, there was previously no real limit (other than
pci_alloc_consistent eventually failing) on how many commands could
be shoved through the cciss ioctl path at once, which was probably a
bug.  It has occurred to me to put a limit on this, and return -EBUSY
when the limit is hit, though I don't know if programs using the
ioctl are prepared to receive EBUSY... but maybe that's not my
problem.

Thoughts?

I'm thinking that if there's no reason the ioctl path can't allow
concurrency, then it should allow concurrency.

-- steve
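For illustration only, a minimal sketch of the kind of -EBUSY cap described above. The names used here (struct example_host, ioctl_cmds_outstanding, CCISS_MAX_IOCTL_CMDS) are hypothetical stand-ins, not fields or macros of the actual cciss driver:

```c
/*
 * Sketch: cap the number of passthrough commands in flight and
 * return -EBUSY once the cap is reached.  All names below are
 * illustrative, not taken from the real cciss code.
 */
#include <linux/atomic.h>
#include <linux/errno.h>

#define CCISS_MAX_IOCTL_CMDS 16		/* arbitrary illustrative limit */

struct example_host {
	atomic_t ioctl_cmds_outstanding;
	/* ... the rest of the per-controller state ... */
};

static int example_passthru(struct example_host *h)
{
	int rc;

	/* Refuse new commands once the cap is reached. */
	if (atomic_inc_return(&h->ioctl_cmds_outstanding) > CCISS_MAX_IOCTL_CMDS) {
		atomic_dec(&h->ioctl_cmds_outstanding);
		return -EBUSY;
	}

	rc = 0;	/* build and submit the command, wait for completion ... */

	atomic_dec(&h->ioctl_cmds_outstanding);
	return rc;
}
```

Whether existing users of the ioctl can cope with EBUSY is exactly the open question raised in the message above; the sketch only shows the mechanism.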
* Re: Question about driver ioctl, big kernel lock, mutexes
From: Arnd Bergmann @ 2011-03-16 19:06 UTC
To: scameron; +Cc: linux-kernel, axboe, dab, mike.miller

On Wednesday 16 March 2011 16:50:33 scameron@beardog.cce.hp.com wrote:
>
> I just noticed the existence of this change (I guess I'm a little
> slow):
>
>     commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a
>     Author: Arnd Bergmann <arnd@arndb.de>
>     Date:   Wed Jun 2 14:28:52 2010 +0200
>
>         block: autoconvert trivial BKL users to private mutex
>
>         The block device drivers have all gained new lock_kernel
>         calls from a recent pushdown, and some of the drivers
>         were already using the BKL before.
>
>         This turns the BKL into a set of per-driver mutexes.
>         Still need to check whether this is safe to do.
>
> This changes the behavior of, for example, the CCISS_PASSTHRU and
> other ioctls.  Previously, these were "protected" by the Big Kernel
> Lock.  Now, from what I understand and what I've observed, the Big
> Kernel Lock is kind of strange, in that it auto-releases whenever
> you sleep.  This means that the ioctl path in cciss used to allow
> concurrency (which is fine; they should all be thread safe.  I'm
> kind of wondering why the BKL was there in the first place.  I guess
> I assumed it was necessary for reasons having to do with the upper
> layers.)

The reason to have the BKL there is that the BKL used to protect
every syscall.  The block ioctl handling was simply one of the final
places to lose it, after it was gone almost everywhere else.

Originally (before Linux-1.3!), the only concurrency that was going
on in the kernel was when one task was sleeping, so it was
straightforward to add the semantics of the BKL that look so strange
in retrospect.

> Now, if I understand things correctly, with this new driver-specific
> mutex surrounding the ioctl path, only one thread is permitted in
> the ioctl path at a time, and whether it sleeps or not, the mutex is
> not released until it's done.

Yes.

> That's a pretty big change in behavior.  Some commands sent through
> the passthrough ioctls may take a while, and they're going to block
> any other ioctls from getting started while they're going.
> Previously, it was possible to saturate the controller using
> multiple threads or processes hammering on the ioctl path.  This
> would appear to no longer be the case.

Correct.  Almost all ioctls in the kernel are of the "instantly
return" kind; any ioctl that blocks for a potentially unlimited
amount of time under a mutex would be a bug that I introduced in the
conversion.

> That being said, there was previously no real limit (other than
> pci_alloc_consistent eventually failing) on how many commands could
> be shoved through the cciss ioctl path at once, which was probably a
> bug.  It has occurred to me to put a limit on this, and return
> -EBUSY when the limit is hit, though I don't know if programs using
> the ioctl are prepared to receive EBUSY... but maybe that's not my
> problem.
>
> Thoughts?
>
> I'm thinking that if there's no reason the ioctl path can't allow
> concurrency, then it should allow concurrency.

Absolutely.  The best way forward would be to prove that the mutex I
introduced is actually pointless, or alternatively change the code so
that the mutex is only held in the places where it is really
required, if any.
When I submitted the patches, I always tried to make it clear that
most of the new mutexes are not required at all, but that proving
this for every single driver takes a lot of hard work.

Having a mutex that should not be there causes a performance
regression or a deadlock, both of which are fairly easy to bisect and
fix, but missing a mutex because of a subtle BKL dependency would
have meant introducing a race condition that would be extremely hard
to analyse.

Limiting the number of concurrent I/Os submitted to a device might be
a good idea, but I think that's a separate discussion.

	Arnd
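For context, a minimal sketch of the before/after pattern that the autoconversion commit applied, assuming a placeholder handler (do_driver_ioctl is not the real cciss function). The point of the thread is visible in the difference: the BKL dropped itself whenever the task slept, while the new per-driver mutex is held for the entire call:

```c
#include <linux/blkdev.h>
#include <linux/mutex.h>
#include <linux/smp_lock.h>	/* lock_kernel(), still present in 2.6.3x kernels */

/* Placeholder for the driver's real ioctl work. */
static int do_driver_ioctl(struct block_device *bdev, unsigned int cmd,
			   unsigned long arg);

/* Before: the handler ran under the Big Kernel Lock, which was
 * released automatically whenever the task slept. */
static int old_style_ioctl(struct block_device *bdev, fmode_t mode,
			   unsigned int cmd, unsigned long arg)
{
	int ret;

	lock_kernel();
	ret = do_driver_ioctl(bdev, cmd, arg);
	unlock_kernel();
	return ret;
}

/* After: a private, per-driver mutex that serializes the whole ioctl
 * path, sleeping or not. */
static DEFINE_MUTEX(driver_mutex);

static int new_style_ioctl(struct block_device *bdev, fmode_t mode,
			   unsigned int cmd, unsigned long arg)
{
	int ret;

	mutex_lock(&driver_mutex);
	ret = do_driver_ioctl(bdev, cmd, arg);
	mutex_unlock(&driver_mutex);
	return ret;
}
```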
* Re: Question about driver ioctl, big kernel lock, mutexes
From: scameron @ 2011-03-16 19:47 UTC
To: Arnd Bergmann; +Cc: linux-kernel, axboe, dab, mike.miller, scameron

On Wed, Mar 16, 2011 at 08:06:14PM +0100, Arnd Bergmann wrote:
> On Wednesday 16 March 2011 16:50:33 scameron@beardog.cce.hp.com wrote:
> >
> > I just noticed the existence of this change (I guess I'm a little
> > slow):
> >
> >     commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a
> >     Author: Arnd Bergmann <arnd@arndb.de>
> >     Date:   Wed Jun 2 14:28:52 2010 +0200
> >
> >         block: autoconvert trivial BKL users to private mutex
> >
> >         The block device drivers have all gained new lock_kernel
> >         calls from a recent pushdown, and some of the drivers
> >         were already using the BKL before.
> >
> >         This turns the BKL into a set of per-driver mutexes.
> >         Still need to check whether this is safe to do.
> >
> > This changes the behavior of, for example, the CCISS_PASSTHRU and
> > other ioctls.  Previously, these were "protected" by the Big
> > Kernel Lock.  Now, from what I understand and what I've observed,
> > the Big Kernel Lock is kind of strange, in that it auto-releases
> > whenever you sleep.  This means that the ioctl path in cciss used
> > to allow concurrency (which is fine; they should all be thread
> > safe.  I'm kind of wondering why the BKL was there in the first
> > place.  I guess I assumed it was necessary for reasons having to
> > do with the upper layers.)
>
> The reason to have the BKL there is that the BKL used to protect
> every syscall.  The block ioctl handling was simply one of the final
> places to lose it, after it was gone almost everywhere else.
>
> Originally (before Linux-1.3!), the only concurrency that was going
> on in the kernel was when one task was sleeping, so it was
> straightforward to add the semantics of the BKL that look so strange
> in retrospect.
>
> > Now, if I understand things correctly, with this new
> > driver-specific mutex surrounding the ioctl path, only one thread
> > is permitted in the ioctl path at a time, and whether it sleeps or
> > not, the mutex is not released until it's done.
>
> Yes.
>
> > That's a pretty big change in behavior.  Some commands sent
> > through the passthrough ioctls may take a while, and they're going
> > to block any other ioctls from getting started while they're
> > going.  Previously, it was possible to saturate the controller
> > using multiple threads or processes hammering on the ioctl path.
> > This would appear to no longer be the case.
>
> Correct.  Almost all ioctls in the kernel are of the "instantly
> return" kind; any ioctl that blocks for a potentially unlimited
> amount of time under a mutex would be a bug that I introduced in the
> conversion.
>
> > That being said, there was previously no real limit (other than
> > pci_alloc_consistent eventually failing) on how many commands
> > could be shoved through the cciss ioctl path at once, which was
> > probably a bug.  It has occurred to me to put a limit on this, and
> > return -EBUSY when the limit is hit, though I don't know if
> > programs using the ioctl are prepared to receive EBUSY... but
> > maybe that's not my problem.
> >
> > Thoughts?
> >
> > I'm thinking that if there's no reason the ioctl path can't allow
> > concurrency, then it should allow concurrency.
>
> Absolutely.
> The best way forward would be to prove that the mutex I introduced
> is actually pointless, or alternatively change the code so that the
> mutex is only held in the places where it is really required, if
> any.

Pretty sure the ioctl stuff is already thread safe, but I will look
at it again.

There are a couple of other places the mutex is used (open, release,
and I think one other place) that I'm less sure about, only because I
haven't thought about them.  It should be a safe assumption that
nothing outside the driver depends on these mutexes (obviously, since
they're declared only in the driver), so that should make it pretty
easy to figure out.

> When I submitted the patches, I always tried to make it clear that
> most of the new mutexes are not required at all, but that proving
> this for every single driver takes a lot of hard work.
>
> Having a mutex that should not be there causes a performance
> regression or a deadlock, both of which are fairly easy to bisect
> and fix, but missing a mutex because of a subtle BKL dependency
> would have meant introducing a race condition that would be
> extremely hard to analyse.
>
> Limiting the number of concurrent I/Os submitted to a device might
> be a good idea, but I think that's a separate discussion.

Ok.  Thanks for the explanation and confirmation of what I was
thinking.  I'll start working on a fix for this.  It may take me a
little while, as there are a bunch of other things I have to work on,
but it's been this way since June of 2010 with no complaints (that I
know of), so I guess it won't hurt too much for it to stay this way
for a little while longer.

-- steve
* Re: Question about driver ioctl, big kernel lock, mutexes
From: Arnd Bergmann @ 2011-03-16 20:40 UTC
To: scameron; +Cc: linux-kernel, axboe, dab, mike.miller

On Wednesday 16 March 2011 20:47:26 scameron@beardog.cce.hp.com wrote:
> On Wed, Mar 16, 2011 at 08:06:14PM +0100, Arnd Bergmann wrote:
> > Absolutely.  The best way forward would be to prove that the mutex
> > I introduced is actually pointless, or alternatively change the
> > code so that the mutex is only held in the places where it is
> > really required, if any.
>
> Pretty sure the ioctl stuff is already thread safe, but I will look
> at it again.
>
> There are a couple of other places the mutex is used (open, release,
> and I think one other place) that I'm less sure about, only because
> I haven't thought about them.  It should be a safe assumption that
> nothing outside the driver depends on these mutexes (obviously,
> since they're declared only in the driver), so that should make it
> pretty easy to figure out.

Ah, right.  The open/release functions access the two usage_count
variables in an unsafe way.  The mutex serializes between the two,
but there are other functions that look at the same data without
taking the mutex.  It would be good to make this safer in some way.

I think turning it into an atomic_t would be enough, unless the two
usage counts need to be taken strictly atomically or in the context
of something else.

The rest of open/close looks fine to me without the mutex.

> > When I submitted the patches, I always tried to make it clear that
> > most of the new mutexes are not required at all, but that proving
> > this for every single driver takes a lot of hard work.
> >
> > Having a mutex that should not be there causes a performance
> > regression or a deadlock, both of which are fairly easy to bisect
> > and fix, but missing a mutex because of a subtle BKL dependency
> > would have meant introducing a race condition that would be
> > extremely hard to analyse.
> >
> > Limiting the number of concurrent I/Os submitted to a device might
> > be a good idea, but I think that's a separate discussion.
>
> Ok.  Thanks for the explanation and confirmation of what I was
> thinking.  I'll start working on a fix for this.  It may take me a
> little while, as there are a bunch of other things I have to work
> on, but it's been this way since June of 2010 with no complaints
> (that I know of), so I guess it won't hurt too much for it to stay
> this way for a little while longer.

I'm curious: what is even the intention behind the passthru ioctls?
Do you have applications that use them a lot?

	Arnd
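A rough sketch of the atomic_t idea mentioned above, with simplified stand-in names rather than the real cciss open/release handlers and host structure:

```c
/*
 * Sketch only: replace a usage counter that was protected by the
 * per-driver mutex with an atomic_t, so open/release and the readers
 * elsewhere no longer need the lock.  example_host and the function
 * names are hypothetical.
 */
#include <linux/atomic.h>
#include <linux/types.h>

struct example_host {
	atomic_t usage_count;		/* was a plain int under the mutex */
	/* ... */
};

static int example_open(struct example_host *h)
{
	/* No mutex needed just to bump the counter. */
	atomic_inc(&h->usage_count);
	return 0;
}

static void example_release(struct example_host *h)
{
	atomic_dec(&h->usage_count);
}

static bool example_in_use(struct example_host *h)
{
	/* Readers elsewhere see a consistent value without locking. */
	return atomic_read(&h->usage_count) > 0;
}
```

This only covers the single-counter case; as noted above, if the two usage counts in the driver have to change together atomically, an atomic_t per counter is not sufficient on its own.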
* Re: Question about driver ioctl, big kernel lock, mutexes
From: scameron @ 2011-03-17 13:26 UTC
To: Arnd Bergmann; +Cc: linux-kernel, axboe, dab, mike.miller, scameron

On Wed, Mar 16, 2011 at 09:40:34PM +0100, Arnd Bergmann wrote:
> On Wednesday 16 March 2011 20:47:26 scameron@beardog.cce.hp.com wrote:
> > On Wed, Mar 16, 2011 at 08:06:14PM +0100, Arnd Bergmann wrote:
> > > Absolutely.  The best way forward would be to prove that the
> > > mutex I introduced is actually pointless, or alternatively
> > > change the code so that the mutex is only held in the places
> > > where it is really required, if any.
> >
> > Pretty sure the ioctl stuff is already thread safe, but I will
> > look at it again.
> >
> > There are a couple of other places the mutex is used (open,
> > release, and I think one other place) that I'm less sure about,
> > only because I haven't thought about them.  It should be a safe
> > assumption that nothing outside the driver depends on these
> > mutexes (obviously, since they're declared only in the driver), so
> > that should make it pretty easy to figure out.
>
> Ah, right.  The open/release functions access the two usage_count
> variables in an unsafe way.  The mutex serializes between the two,
> but there are other functions that look at the same data without
> taking the mutex.  It would be good to make this safer in some way.
>
> I think turning it into an atomic_t would be enough, unless the two
> usage counts need to be taken strictly atomically or in the context
> of something else.
>
> The rest of open/close looks fine to me without the mutex.
>
> > > When I submitted the patches, I always tried to make it clear
> > > that most of the new mutexes are not required at all, but that
> > > proving this for every single driver takes a lot of hard work.
> > >
> > > Having a mutex that should not be there causes a performance
> > > regression or a deadlock, both of which are fairly easy to
> > > bisect and fix, but missing a mutex because of a subtle BKL
> > > dependency would have meant introducing a race condition that
> > > would be extremely hard to analyse.
> > >
> > > Limiting the number of concurrent I/Os submitted to a device
> > > might be a good idea, but I think that's a separate discussion.
> >
> > Ok.  Thanks for the explanation and confirmation of what I was
> > thinking.  I'll start working on a fix for this.  It may take me a
> > little while, as there are a bunch of other things I have to work
> > on, but it's been this way since June of 2010 with no complaints
> > (that I know of), so I guess it won't hurt too much for it to stay
> > this way for a little while longer.
>
> I'm curious: what is even the intention behind the passthru ioctls?
> Do you have applications that use them a lot?

Yes.  The main user would be the HP Array Configuration Utility.
Also SNMP storage agents, SMX providers for storage, and things like
cciss_vol_status and other utils here:

  http://cciss.sourceforge.net/#cciss_utils

The passthru ioctl is how new logical drives are configured, added,
removed, etc.  RAID levels can be changed, migrated, etc. on the fly,
and there is lots of other stuff.  Essentially the firmware on the
Smart Array boards has a ton of Smart Array specific functionality
which can be controlled through the passthrough ioctls.

-- steve
* Re: Question about driver ioctl, big kernel lock, mutexes
From: Arnd Bergmann @ 2011-03-16 19:36 UTC
To: scameron; +Cc: linux-kernel, axboe, dab, mike.miller

On Wednesday 16 March 2011 16:50:33 scameron@beardog.cce.hp.com wrote:
> That's a pretty big change in behavior.  Some commands sent through
> the passthrough ioctls may take a while, and they're going to block
> any other ioctls from getting started while they're going.
> Previously, it was possible to saturate the controller using
> multiple threads or processes hammering on the ioctl path.  This
> would appear to no longer be the case.

I just looked a bit closer at the driver, and this is what I noticed:

* The mutex currently protects CCISS_GETNODENAME against
  CCISS_SETNODENAME and CCISS_GETINTINFO against CCISS_SETINTINFO.
  You can easily change that by taking &h->lock in the CCISS_GET*
  functions.

* The scsi ioctls don't need the mutex; we've checked that elsewhere.

* I can see no reason why any of the CCISS-specific ioctls would
  require a lock.  There is no global data they touch, and very
  little host-specific data, most of which is accessed under a host
  spinlock.

* When you clean up the ioctl handling to remove the mutex, it would
  be a good idea to clean up the compat_ioctl handling at the same
  time, which is another artifact of a legacy implementation.  Just
  change cciss_ioctl_big_passthru to take a BIG_IOCTL_Command_struct
  kernel pointer argument, and then call it directly from
  cciss_ioctl32_big_passthru without the extra
  compat_alloc_user_space and copy_in_user.  Same for
  cciss_ioctl32_passthru.  This will simplify the compat code path
  and make it faster.

* Allocating consistent memory on behalf of the user indeed looks
  like a potential DoS attack, but it's only possible if you have
  read permissions on the block device.  To improve this, you should
  track the total amount of memory allocated to the device, since
  even a single ioctl apparently could otherwise be used to pin down
  all memory, independent of the number of commands.

	Arnd
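A hedged sketch of the compat_ioctl simplification described in the fourth point above. The structures are deliberately simplified stand-ins, not the real BIG_IOCTL_Command_struct definitions, and copying results back to user space is omitted; the point is only that the native handler takes a kernel-space structure, so the compat path can convert in kernel memory and call it directly, with no compat_alloc_user_space()/copy_in_user() round trip:

```c
#include <linux/compat.h>
#include <linux/uaccess.h>

struct example_host;			/* per-controller state, details irrelevant here */

struct big_cmd {			/* stand-in for the native ioctl struct */
	void __user *buf;
	unsigned int buf_size;
};

struct big_cmd32 {			/* stand-in for the 32-bit layout */
	compat_uptr_t buf;
	unsigned int buf_size;
};

/* Core handler now works on a kernel copy of the command. */
static int example_big_passthru(struct example_host *h, struct big_cmd *c);

static int example_ioctl32_big_passthru(struct example_host *h,
					void __user *argp)
{
	struct big_cmd32 c32;
	struct big_cmd c;

	if (copy_from_user(&c32, argp, sizeof(c32)))
		return -EFAULT;

	/* Convert the 32-bit layout in kernel memory. */
	c.buf = compat_ptr(c32.buf);
	c.buf_size = c32.buf_size;

	/* No compat_alloc_user_space(), no copy_in_user(). */
	return example_big_passthru(h, &c);
}
```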