* recursive spinlocks. Shoot.
@ 2003-05-18 9:21 Peter T. Breuer
2003-05-18 16:30 ` Martin J. Bligh
2003-05-18 18:13 ` Davide Libenzi
0 siblings, 2 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-18 9:21 UTC (permalink / raw)
To: linux-kernel
Here's a before-breakfast implementation of a recursive spinlock. That
is, the same thread can "take" the spinlock repeatedly. This is crude -
I just want to focus some attention on the issue (while I go out and
have breakfast :'E).
The idea is to implement trylock correctly, and then get lock and
unlock from that via the standard algebra. lock is a loop doing
a trylock until it succeeds. We emerge from a successful trylock
with the lock notionally held.
The "spinlock" is a register of the current pid, plus a recursion
counter, with atomic access. The pid is either -1 (unset, count is
zero) or some decent value (count is positive).
The trylock will succeed and set the pid if it is currently unset. It
will also succeed if the pid matches ours, incrementing the count of
holders.
Unlock just decrements the count. When we've unlocked enough times,
somebody else can take the lock.
The objections I have to this scheme are:
1) its notion of identity is limited to the pid, which is crude
2) it doesn't detect dead holders of the lock, which would be nice
3) it should probably be done in assembler, using whatever trick
rwlocks use
4) it doesn't actually use a real spinlock to "hold" the lock, which
makes me nervous.
typedef struct recursive_spinlock {
	spinlock_t lock;	// guards pid and count
	int pid;		// -1 while unset
	int count;		// recursion depth; 0 means free
} recursive_spinlock_t;

// returns 0 on success, 1 on failure
int recursive_spin_trylock(recursive_spinlock_t *lock)
{
	spin_lock(&lock->lock);
	if (lock->count <= 0) {
		// greenfield site
		lock->pid = current->pid;
		lock->count = 1;
		spin_unlock(&lock->lock);
		return 0;
	}
	// somebody has the lock
	if (lock->pid == current->pid) {
		// it was us! return ok
		lock->count++;
		spin_unlock(&lock->lock);
		return 0;
	}
	// somebody has the lock and it's not us! return fail
	spin_unlock(&lock->lock);
	return 1;
}

void recursive_spin_lock(recursive_spinlock_t *lock)
{
	while (recursive_spin_trylock(lock) != 0)
		;
}

void recursive_spin_unlock(recursive_spinlock_t *lock)
{
	spin_lock(&lock->lock);
	if (--lock->count <= 0) {
		lock->count = 0;
		lock->pid = -1;	// restore the "unset" invariant
	}
	spin_unlock(&lock->lock);
}

void recursive_spinlock_init(recursive_spinlock_t *lock)
{
	spin_lock_init(&lock->lock);
	lock->pid = -1;
	lock->count = 0;
}
Peter
* Re: recursive spinlocks. Shoot.
  2003-05-18  9:21 recursive spinlocks. Shoot Peter T. Breuer
@ 2003-05-18 16:30 ` Martin J. Bligh
  2003-05-18 16:35   ` William Lee Irwin III
  2003-05-18 18:13 ` Davide Libenzi
  1 sibling, 1 reply; 58+ messages in thread
From: Martin J. Bligh @ 2003-05-18 16:30 UTC (permalink / raw)
To: ptb, linux-kernel

> Here's a before-breakfast implementation of a recursive spinlock. That
> is, the same thread can "take" the spinlock repeatedly.

Why?

M.
* Re: recursive spinlocks. Shoot.
  2003-05-18 16:30 ` Martin J. Bligh
@ 2003-05-18 16:35 ` William Lee Irwin III
  2003-05-18 16:49   ` Arjan van de Ven
  2003-05-18 17:24   ` Peter T. Breuer
  0 siblings, 2 replies; 58+ messages in thread
From: William Lee Irwin III @ 2003-05-18 16:35 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: ptb, linux-kernel

At some point in the past, Peter Breuer's attribution was removed from:
>> Here's a before-breakfast implementation of a recursive spinlock. That
>> is, the same thread can "take" the spinlock repeatedly.

On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> Why?

netconsole.

-- wli
* Re: recursive spinlocks. Shoot.
  2003-05-18 16:35 ` William Lee Irwin III
@ 2003-05-18 16:49 ` Arjan van de Ven
  2003-05-18 16:54   ` William Lee Irwin III
  2003-05-18 17:24 ` Peter T. Breuer
  1 sibling, 1 reply; 58+ messages in thread
From: Arjan van de Ven @ 2003-05-18 16:49 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Martin J. Bligh, ptb, linux-kernel

On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
> At some point in the past, Peter Breuer's attribution was removed from:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
> >> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > Why?
>
> netconsole.

not really.
the netconsole issue is tricky and recursive, but recursive locks aren't
the solution; that would require a rewrite of the network drivers. It's
far easier to solve it by moving the debug printk's instead.
* Re: recursive spinlocks. Shoot.
  2003-05-18 16:49 ` Arjan van de Ven
@ 2003-05-18 16:54 ` William Lee Irwin III
  2003-05-18 17:14   ` Martin J. Bligh
  0 siblings, 1 reply; 58+ messages in thread
From: William Lee Irwin III @ 2003-05-18 16:54 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Martin J. Bligh, ptb, linux-kernel

At some point in the past, Peter Breuer's attribution was removed from:
>>>> Here's a before-breakfast implementation of a recursive spinlock. That
>>>> is, the same thread can "take" the spinlock repeatedly.

On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
>>> Why?

On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
>> netconsole.

On Sun, May 18, 2003 at 06:49:04PM +0200, Arjan van de Ven wrote:
> not really.
> the netconsole issue is tricky and recursive, but recursive locks aren't
> the solution; that would require a rewrite of the network drivers. It's
> far easier to solve it by moving the debug printk's instead.

Yes, there are better ways to fix it. But AIUI this is why some people
want it (the rest of us just don't want messy locking semantics around).

-- wli
* Re: recursive spinlocks. Shoot.
  2003-05-18 16:54 ` William Lee Irwin III
@ 2003-05-18 17:14 ` Martin J. Bligh
  0 siblings, 0 replies; 58+ messages in thread
From: Martin J. Bligh @ 2003-05-18 17:14 UTC (permalink / raw)
To: William Lee Irwin III, Arjan van de Ven; +Cc: ptb, linux-kernel

--William Lee Irwin III <wli@holomorphy.com> wrote (on Sunday, May 18, 2003 09:54:45 -0700):

> At some point in the past, Peter Breuer's attribution was removed from:
> >>>> Here's a before-breakfast implementation of a recursive spinlock. That
> >>>> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> >>> Why?
>
> On Sun, 2003-05-18 at 18:35, William Lee Irwin III wrote:
> >> netconsole.
>
> On Sun, May 18, 2003 at 06:49:04PM +0200, Arjan van de Ven wrote:
> > not really.
> > the netconsole issue is tricky and recursive, but recursive locks aren't
> > the solution; that would require a rewrite of the network drivers. It's
> > far easier to solve it by moving the debug printk's instead.
>
> Yes, there are better ways to fix it. But AIUI this is why some people
> want it (the rest of us just don't want messy locking semantics around).

Right ... to me this just seems to create confusing code for no really
good reason that I can see right now.

M.
* Re: recursive spinlocks. Shoot.
  2003-05-18 16:35 ` William Lee Irwin III
  2003-05-18 16:49   ` Arjan van de Ven
@ 2003-05-18 17:24 ` Peter T. Breuer
  2003-05-18 22:34   ` David Woodhouse
  ` (4 more replies)
  1 sibling, 5 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-18 17:24 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Martin J. Bligh, ptb, linux-kernel

"A month of sundays ago William Lee Irwin III wrote:"
> At some point in the past, Peter Breuer's attribution was removed from:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
> >> is, the same thread can "take" the spinlock repeatedly.
>
> On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > Why?
>
> netconsole.

That's a problem looking for a solution! No, the reason for wanting a
recursive spinlock is that nonrecursive locks make programming harder.
Though I've got quite good at finding and removing deadlocks in my old
age, there are still two popular ways that the rest of the world's
programmers often shoot themselves in the foot with a spinlock:

  a) sleeping while holding the spinlock
  b) taking the spinlock in a subroutine while you already have it

The first method leads to an early death if the spinlock is a popular
one, as the only thread that can release it doesn't seem to be running,
errr..

The second method is used by programmers who aren't aware that some
obscure subroutine takes a spinlock, and who recklessly take a lock
before calling a subroutine (the very thought sends shivers down my
spine ...). A popular scenario involves not /knowing/ that your routine
is called by the kernel with some obscure lock already held, and then
calling a subroutine that takes the same obscure lock. The request
function is one example, but that's hardly obscure (and in 2.5 the
situation has eased there!).

It's case (b) that a recursive spinlock makes go away. Hey, that's not
bad for a small change! 50% of potential programming errors sent to the
dustbin without ever being encountered.

Peter
* Re: recursive spinlocks. Shoot.
  2003-05-18 17:24 ` Peter T. Breuer
@ 2003-05-18 22:34 ` David Woodhouse
  2003-05-19 13:37   ` Peter T. Breuer
  2003-05-20  3:12   ` Robert White
  2003-05-19  2:05 ` Kevin O'Connor
  ` (3 subsequent siblings)
  4 siblings, 2 replies; 58+ messages in thread
From: David Woodhouse @ 2003-05-18 22:34 UTC (permalink / raw)
To: ptb; +Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel

On Sun, 2003-05-18 at 18:24, Peter T. Breuer wrote:
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).

To be honest, if any programmer is capable of committing this error and
not finding and fixing it for themselves, then they're also capable, and
arguably _likely_, to introduce subtle lock ordering discrepancies which
will cause deadlock once in a blue moon.

I don't _want_ you to make life easier for this hypothetical programmer.

I want them to either learn to comprehend locking _properly_, or take up
gardening instead.

-- dwmw2
* Re: recursive spinlocks. Shoot.
  2003-05-18 22:34 ` David Woodhouse
@ 2003-05-19 13:37 ` Peter T. Breuer
  2003-05-19 13:45   ` Jens Axboe
  ` (2 more replies)
  2003-05-20  3:12 ` Robert White
  1 sibling, 3 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-19 13:37 UTC (permalink / raw)
To: David Woodhouse; +Cc: ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel

"David Woodhouse wrote:"
> To be honest, if any programmer is capable of committing this error and
> not finding and fixing it for themselves, then they're also capable, and
> arguably _likely_, to introduce subtle lock ordering discrepancies which
> will cause deadlock once in a blue moon.
>
> I don't _want_ you to make life easier for this hypothetical programmer.
>
> I want them to either learn to comprehend locking _properly_, or take up
> gardening instead.

Let's quote the example from rubini & corbet of the sbull block device
driver. The request function ends like so:

	spin_unlock_irq (&io_request_lock);
	spin_lock(&device->lock);

	/* Process all of the buffers in this (possibly clustered) request. */
	do {
		status = sbull_transfer(device, req);
	} while (end_that_request_first(req, status, DEVICE_NAME));
	spin_unlock(&device->lock);
	spin_lock_irq (&io_request_lock);
	end_that_request_last(req);
	}
	device->busy = 0;
}

Notice that he runs end_that_request_first outside the io_request_lock
and end_that_request_last under the lock. Do you know which is right?
(if any :-).

And he takes a device lock before calling the "transfer" routine.
Yes, he's ok because his transfer function is trivial and doesn't
take the lock, but anyone following his recipe is heading for
trouble.

Peter
* Re: recursive spinlocks. Shoot.
  2003-05-19 13:37 ` Peter T. Breuer
@ 2003-05-19 13:45 ` Jens Axboe
  2003-05-19 13:47 ` Arjan van de Ven
  [not found] ` <mailman.1053352200.24653.linux-kernel2news@redhat.com>
  2 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2003-05-19 13:45 UTC (permalink / raw)
To: Peter T. Breuer
Cc: David Woodhouse, William Lee Irwin III, Martin J. Bligh, linux-kernel

On Mon, May 19 2003, Peter T. Breuer wrote:
> "David Woodhouse wrote:"
> > To be honest, if any programmer is capable of committing this error and
> > not finding and fixing it for themselves, then they're also capable, and
> > arguably _likely_, to introduce subtle lock ordering discrepancies which
> > will cause deadlock once in a blue moon.
> >
> > I don't _want_ you to make life easier for this hypothetical programmer.
> >
> > I want them to either learn to comprehend locking _properly_, or take up
> > gardening instead.
>
> Let's quote the example from rubini & corbet of the sbull block device
> driver. The request function ends like so:
>
>	spin_unlock_irq (&io_request_lock);
>	spin_lock(&device->lock);
>
>	/* Process all of the buffers in this (possibly clustered) request. */
>	do {
>		status = sbull_transfer(device, req);
>	} while (end_that_request_first(req, status, DEVICE_NAME));
>	spin_unlock(&device->lock);
>	spin_lock_irq (&io_request_lock);
>	end_that_request_last(req);
>	}
>	device->busy = 0;
> }
>
> Notice that he runs end_that_request_first outside the io_request_lock
> and end_that_request_last under the lock. Do you know which is right?
> (if any :-).

Both are right, as it so happens.

> And he takes a device lock before calling the "transfer" routine.
> Yes, he's ok because his transfer function is trivial and doesn't
> take the lock, but anyone following his recipe is heading for
> trouble.

In 2.5, the device lock most likely would be the queue lock as well, so
no confusion there. But what are you talking about? I'd assume that the
device lock must be held in the transfer function; why else would you
grab it in the first place in the function you quote?

Please, if you are going to find examples to support the recursive
locking idea, find some decent ones... I think you are trying to make up
problems that do not exist.

I'd hate to see recursive locks being used just because someone can't be
bothered to write the code correctly (recursive locks have their uses;
the one you are advocating definitely isn't one). Recursive locks are
_not_ a remedy for someone who doesn't understand locking or isn't
capable enough to design his locking correctly. Period.

-- Jens Axboe
* Re: recursive spinlocks. Shoot.
  2003-05-19 13:37 ` Peter T. Breuer
  2003-05-19 13:45 ` Jens Axboe
@ 2003-05-19 13:47 ` Arjan van de Ven
  [not found] ` <mailman.1053352200.24653.linux-kernel2news@redhat.com>
  2 siblings, 0 replies; 58+ messages in thread
From: Arjan van de Ven @ 2003-05-19 13:47 UTC (permalink / raw)
To: ptb; +Cc: David Woodhouse, William Lee Irwin III, Martin J. Bligh, linux-kernel

On Mon, 2003-05-19 at 15:37, Peter T. Breuer wrote:
> "David Woodhouse wrote:"
> > To be honest, if any programmer is capable of committing this error and
> > not finding and fixing it for themselves, then they're also capable, and
> > arguably _likely_, to introduce subtle lock ordering discrepancies which
> > will cause deadlock once in a blue moon.
> >
> > I don't _want_ you to make life easier for this hypothetical programmer.
> >
> > I want them to either learn to comprehend locking _properly_, or take up
> > gardening instead.
>
> Let's quote the example from rubini & corbet of the sbull block device
> driver. The request function ends like so:

defective locking in a driver is no excuse to paper over it with
recursive shite.
[parent not found: <mailman.1053352200.24653.linux-kernel2news@redhat.com>]
* Re: recursive spinlocks. Shoot.
  [not found] ` <mailman.1053352200.24653.linux-kernel2news@redhat.com>
@ 2003-05-19 23:54 ` Pete Zaitcev
  2003-05-20  0:03   ` viro
  2003-05-20  0:03   ` Johannes Erdfelt
  0 siblings, 2 replies; 58+ messages in thread
From: Pete Zaitcev @ 2003-05-19 23:54 UTC (permalink / raw)
To: linux-kernel

>> Let's quote the example from rubini & corbet of the sbull block device
>> driver. The request function ends like so:
>
> defective locking in a driver is no excuse to paper over it with
> recursive shite.

Arjan is a little too harsh here, but on the principle I happen
to agree, because I worked with systems which allow recursive locks.
They very often cover up for a programmer's lack of basic understanding.
Worse, sometimes even experienced programmers can do poorly.
I ran into the latter category of code when fixing the so-called
"presto" in Solaris (now replaced by Encore-originated code).

Normal spinlocks are not without problems; in particular, people
tend to write:

	void urb_rm_priv_locked(struct urb *u) {
		......
	}
	void urb_rm_priv(struct urb *u) {
		spin_lock_irqsave();
		urb_rm_priv_locked(u);
		spin_unlock_irqrestore();
	}

Which eats a stack frame. We make this tradeoff on purpose,
as a lesser evil.

BTW, I do not see Linus and his lieutenants rebuking the onslaught
of recursive ingenuity in this thread. Ignoring the hogwash,
or waiting and watching?

-- Pete
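The `_locked` convention Pete describes can be written out concretely. A userspace sketch with hypothetical names, a pthread mutex standing in for the irq-saving spinlock: callers that already hold the lock call the `_locked` variant directly, so recursion is never needed.

```c
#include <pthread.h>

static pthread_mutex_t priv_lock = PTHREAD_MUTEX_INITIALIZER;
static int priv_count = 3;  /* stand-in for some shared state */

/* Caller must already hold priv_lock; this never takes it itself. */
static int rm_priv_locked(void)
{
    return --priv_count;
}

/* Public entry point: takes the lock and delegates to the _locked
 * variant.  The lock/unlock pair lives in exactly one place. */
static int rm_priv(void)
{
    int left;
    pthread_mutex_lock(&priv_lock);
    left = rm_priv_locked();
    pthread_mutex_unlock(&priv_lock);
    return left;
}
```

The cost Pete mentions is the extra call frame for external callers; the benefit is that the locking discipline for each function is stated in its name rather than discovered at runtime.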
* Re: recursive spinlocks. Shoot.
  2003-05-19 23:54 ` Pete Zaitcev
@ 2003-05-20  0:03 ` viro
  2003-05-20  0:03 ` Johannes Erdfelt
  1 sibling, 0 replies; 58+ messages in thread
From: viro @ 2003-05-20  0:03 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: linux-kernel

On Mon, May 19, 2003 at 07:54:41PM -0400, Pete Zaitcev wrote:
> BTW, I do not see Linus and his lieutenants rebuking the onslaught
> of recursive ingenuity in this thread. Ignoring the hogwash,
> or waiting and watching?

Watching people flog the darkened spot where a dead horse used to be
5 years ago. Amusing sight, that...
* Re: recursive spinlocks. Shoot.
  2003-05-19 23:54 ` Pete Zaitcev
  2003-05-20  0:03 ` viro
@ 2003-05-20  0:03 ` Johannes Erdfelt
  1 sibling, 0 replies; 58+ messages in thread
From: Johannes Erdfelt @ 2003-05-20  0:03 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: linux-kernel

On Mon, May 19, 2003, Pete Zaitcev <zaitcev@redhat.com> wrote:
> >> Let's quote the example from rubini & corbet of the sbull block device
> >> driver. The request function ends like so:
> >
> > defective locking in a driver is no excuse to paper over it with
> > recursive shite.
>
> Arjan is a little too harsh here, but on the principle I happen
> to agree, because I worked with systems which allow recursive locks.
> They very often cover up for a programmer's lack of basic understanding.
> Worse, sometimes even experienced programmers can do poorly.
> I ran into the latter category of code when fixing the so-called
> "presto" in Solaris (now replaced by Encore-originated code).
>
> Normal spinlocks are not without problems; in particular, people
> tend to write:
>
>	void urb_rm_priv_locked(struct urb *u) {
>		......
>	}
>	void urb_rm_priv(struct urb *u) {
>		spin_lock_irqsave();
>		urb_rm_priv_locked(u);
>		spin_unlock_irqrestore();
>	}
>
> Which eats a stack frame. We make this tradeoff on purpose,
> as a lesser evil.
>
> BTW, I do not see Linus and his lieutenants rebuking the onslaught
> of recursive ingenuity in this thread. Ignoring the hogwash,
> or waiting and watching?

If past experience is any example, I don't think Linus is completely
against recursive spinlocks.

The uhci driver used a simple implementation at one point in time
because of a tricky locking situation. We eventually discovered a non
recursive method of handling it and ditched the code. Linus actually
helped with the code a little bit.

That being said, I'm happy we found an alternative solution and ditched
the recursive spinlock code. I agree with much of your sentiments about
them as well.

JE
* RE: recursive spinlocks. Shoot.
  2003-05-18 22:34 ` David Woodhouse
  2003-05-19 13:37   ` Peter T. Breuer
@ 2003-05-20  3:12 ` Robert White
  2003-05-20 11:59   ` Helge Hafting
  2003-05-22  1:00   ` Rik van Riel
  1 sibling, 2 replies; 58+ messages in thread
From: Robert White @ 2003-05-20  3:12 UTC (permalink / raw)
To: David Woodhouse, ptb; +Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel

Er..

The gnome wields a morality stick, the morality stick wields itself to the
gnome's hand...
---more---
The gnome hits you... you feel better...

Your "moral position"...

(quote) I want them to either learn to comprehend locking _properly_, or
take up gardening instead. (unquote)

...is critically flawed.

In point of fact, "proper" locking, when combined with "proper" definitions
of an interface, dictates that recursive locking is "better". Demanding that
a call_EE_ know what locks a call_ER_ (and all antecedents of the caller)
will have taken is not exactly good design.

Oh, don't get me wrong, I appreciate that in intimately interlocking code
you have to do some of this, but remember that "a lock" is "morally" a
declaration of need at the module interface level of abstraction.

(for instance) The current kernel design requires that (the editorial) we
need to know what locks are held when our file system hooks are called.
Statements about how some particular lock (let's call him "bob") will be
already held by the kernel when routine X is called are "bad" because they
lead to all sorts of problems when that statement becomes false later.

In the most "morally erect" interface design, the implementer of a module
would never have to worry about locking anything, or at least about the
current lock-state of anything. In the morally perfect universe, the
declarative statement "lock that thing there" would be a smart operator that
would know what all locks, in what dependency orders, needed to be acquired
to "lock that thing there", and the system could execute that pathway of
locking automagically.

Of course, in such a world, the domestic swine would be useful as a bulk
passenger carrier because of the easy way it traverses the sky...

In the current design "we" demand that each designer learn and know what is
locked and what is not. "We" have documents about how in version X the duty
to lock thing Y has been shifted from outside routine Z to inside.

Recursive locks (and a well defined resource allocation order, but who am I
kidding) would actually be a benefit to all on both sides of most _info and
_op structures; they would insulate design changes and improve portability.

A design where I lock things for only as long as I need them is optimal for
that contention. If my caller does the same, regardless of my actions, then
his code is optimal for those contentions too. If neither is concerned with
the other's needs then the lock intervals are demand based and will tend to
allow the bodies of code to be improved on a "purely local" basis.

To achieve that, you need recursive locking.

I don't think for an instant that we are going to get recursive locking, but
it does make the specification of interfaces better. Its only real down side
is that it lets the lazy and the sloppy get away with too much before
getting burned. Recursive locking plus a "lock everything I *might* touch
because my callees won't care" mentality leads to overlocking and
hard-to-find deadlocks. Everything looks and works fine for a while until
everybody, willing to pre-reserve facilities for all their callees, starts
building pool-straddling devices (USB Hard Drives, Network Extensible File
Systems, etc.) and only finds their deadlocks late in the game when things
get tangled.

So we won't get recursive locking because of the 2% of people who can't be
trusted with it. But that doesn't make the recursive locking facility
anything less than superior for defining and implementing superior
interfaces.

Rob.

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of David Woodhouse
Sent: Sunday, May 18, 2003 3:35 PM
To: ptb@it.uc3m.es
Cc: William Lee Irwin III; Martin J. Bligh; linux-kernel@vger.kernel.org
Subject: Re: recursive spinlocks. Shoot.

On Sun, 2003-05-18 at 18:24, Peter T. Breuer wrote:
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).

To be honest, if any programmer is capable of committing this error and
not finding and fixing it for themselves, then they're also capable, and
arguably _likely_, to introduce subtle lock ordering discrepancies which
will cause deadlock once in a blue moon.

I don't _want_ you to make life easier for this hypothetical programmer.

I want them to either learn to comprehend locking _properly_, or take up
gardening instead.

-- dwmw2
* Re: recursive spinlocks. Shoot.
  2003-05-20  3:12 ` Robert White
@ 2003-05-20 11:59 ` Helge Hafting
  2003-05-20 12:23   ` Richard B. Johnson
  2003-05-22  1:00 ` Rik van Riel
  1 sibling, 1 reply; 58+ messages in thread
From: Helge Hafting @ 2003-05-20 11:59 UTC (permalink / raw)
To: Robert White; +Cc: linux-kernel

Robert White wrote:
> Er..
>
> The gnome wields a morality stick, the morality stick wields itself to the
> gnome's hand...
> ---more---
> The gnome hits you... you feel better...
>
> Your "moral position"...
>
> (quote) I want them to either learn to comprehend locking _properly_, or
> take up gardening instead. (unquote)
>
> ...is critically flawed.
>
> In point of fact, "proper" locking, when combined with "proper" definitions
> of an interface, dictates that recursive locking is "better". Demanding that
> a call_EE_ know what locks a call_ER_ (and all antecedents of the caller)
> will have taken is not exactly good design.

That depends on how big the total system is. You can break things
down into independent modules and submodules that don't know each other,
but at some point people need to know a whole module to do it properly.

> Oh, don't get me wrong, I appreciate that in intimately interlocking code
> you have to do some of this, but remember that "a lock" is "morally" a
> declaration of need at the module interface level of abstraction.
>
> (for instance) The current kernel design requires that (the editorial) we
> need to know what locks are held when our file system hooks are called.
> Statements about how some particular lock (let's call him "bob") will be
> already held by the kernel when routine X is called are "bad" because they
> lead to all sorts of problems when that statement becomes false later.
>
> In the most "morally erect" interface design, the implementer of a module
> would never have to worry about locking anything, or at least about the
> current lock-state of anything.

How can you say that? Sure, an ideal module might not need to worry
about locks used by "others", but it might need a lock of its own - one
that didn't exist prior to the module. Surely the implementor must know
about this lock and when to use it. The module's interface may, after
all, be called simultaneously on quite a few cpus.

> In the morally perfect universe, the
> declarative statement "lock that thing there" would be a smart operator that
> would know what all locks, in what dependency orders, needed to be acquired
> to "lock that thing there", and the system could execute that pathway of
> locking automagically.

Such a smart operator is an interesting exercise, but can it be made
about as lightweight as the current locks? It is otherwise bad, because
understanding the kernel locking isn't black magic - only work.

> Of course, in such a world, the domestic swine would be useful as a bulk
> passenger carrier because of the easy way it traverses the sky...

:-)

> In the current design "we" demand that each designer learn and know what is
> locked and what is not. "We" have documents about how in version X the duty
> to lock thing Y has been shifted from outside routine Z to inside.
>
> Recursive locks (and a well defined resource allocation order, but who am I
> kidding) would actually be a benefit to all on both sides of most _info and
> _op structures; they would insulate design changes and improve portability.
>
> A design where I lock things for only as long as I need them is optimal for
> that contention. If my caller does the same, regardless of my actions, then
> his code is optimal for those contentions too. If neither is concerned with
> the other's needs then the lock intervals are demand based and will tend to
> allow the bodies of code to be improved on a "purely local" basis.
>
> To achieve that, you need recursive locking.

Or a wrapper function that takes the lock and calls the function that
assumes the lock is taken. You need to know which to use - but that
isn't so incredibly hard, and you avoid the overhead of taking a lock
twice. (bus-locked memory access is slow, conditional jumps...)

> I don't think for an instant that we are going to get recursive locking, but
> it does make the specification of interfaces better.
>
> Its only real down side is that it lets the lazy and the sloppy get away
> with too much before getting burned. Recursive locking plus a "lock
> everything I *might* touch because my callees won't care" mentality leads to
> overlocking and hard-to-find deadlocks. Everything looks and works fine for
> a while until everybody, willing to pre-reserve facilities for all their
> callees, starts building pool-straddling devices (USB Hard Drives, Network
> Extensible File Systems, etc.) and only finds their deadlocks late in the
> game when things get tangled.
>
> So we won't get recursive locking because of the 2% of people who can't be
> trusted with it.
>
> But that doesn't make the recursive locking facility anything less than
> superior for defining and implementing superior interfaces.

Too much abstraction isn't superior.

Helge Hafting
* Re: recursive spinlocks. Shoot.
  2003-05-20 11:59 ` Helge Hafting
@ 2003-05-20 12:23 ` Richard B. Johnson
  2003-05-20 21:05   ` Robert White
  0 siblings, 1 reply; 58+ messages in thread
From: Richard B. Johnson @ 2003-05-20 12:23 UTC (permalink / raw)
To: Helge Hafting; +Cc: Robert White, Linux kernel

On Tue, 20 May 2003, Helge Hafting wrote:
> Robert White wrote:
> > Er..
> >
> > The gnome wields a morality stick, the morality stick wields itself to the
> > gnome's hand...
> > ---more---
> > The gnome hits you... you feel better...
> >
> > Your "moral position"...
> >
> > (quote) I want them to either learn to comprehend locking _properly_, or
> > take up gardening instead. (unquote)
> >
> > ...is critically flawed.
> >
> > In point of fact, "proper" locking, when combined with "proper" definitions
> > of an interface, dictates that recursive locking is "better". Demanding that
> > a call_EE_ know what locks a call_ER_ (and all antecedents of the caller)
> > will have taken is not exactly good design.
>
> That depends on how big the total system is. You can break things
> down into independent modules and submodules that don't know each other,
> but at some point people need to know a whole module to do it properly.

[SNIPPED...]

Recursive locking is a misnomer. It does during run-time that which
should have been done during design-time. In fact, there cannot
be any recursion associated with locking. A locking mechanism that
allows reentry or recursion is defective, both in design and in
implementation.

The nature of a lock is required to be such that if the locked object
is in use, no access to that object is allowed. Recursive locking
implies that if the lock is in use by the same thread that locked it
before, then access to that object is allowed. In other words, if the
coder (as opposed to the designer) screwed up, the locking mechanism
will allow it.

If this is the way students are being taught to write code at the
present time, all software will be moved off-shore in the not too
distant future. There is absolutely no possible justification for
such garbage. Just because some idiot wrote an article and got it
published doesn't mean this diatribe has any value at all.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
* RE: recursive spinlocks. Shoot.
  2003-05-20 12:23     ` Richard B. Johnson
@ 2003-05-20 21:05       ` Robert White
  2003-05-20 21:42         ` Richard B. Johnson
  0 siblings, 1 reply; 58+ messages in thread
From: Robert White @ 2003-05-20 21:05 UTC (permalink / raw)
To: root, Helge Hafting; +Cc: Linux kernel

-----Original Message-----
From: Richard B. Johnson [mailto:root@chaos.analogic.com]
Sent: Tuesday, May 20, 2003 5:23 AM
To: Helge Hafting

> Recursive locking is a misnomer. It does during run-time that which
> should have been done during design-time. In fact, there cannot
> be any recursion associated with locking. A locking mechanism that
> allows reentry or recursion is defective, both in design, and
> implementation.

Amusing... but false...

A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts
and guarantees exclusive access to a domain (a group of data or
resources, etc.).

There is nothing inherently "one deep" about that assertion.
(Philosophically stated:) If I say "I own this car for the next hour"
and ten minutes later say "I am taking the car to the store for twenty
minutes", there is no "violation of truth" between the two assertions.
The outer hour and the inner twenty minutes are in no form of conflict.
Further, since the promise of the first assertion is _*NOT*_ that the
car will be free for use after an hour, if the twenty minutes of the
second assertion begins at the 58-minute mark of the sort-of-enclosing
hour, thus extending the hour, you are still ok.

More mathematically, if I write an operation "X" that, say, needs
exclusive access to the dircache for some portion of its execution, for
correctness I should lock that dircache. Say I write a second operation
"Y" that also needs the dircache, and locks it appropriately.

If someone wants/needs to create an operation "Alpha" that contains a
number of sub-operations including X and Y, but needs to ensure the
consistency of the dircache for a range (of time/operations) larger than
X, Y, and any number of operations "n", what is to be done?

With your artificial definition of locking, the implementer of Alpha
must do one of the following:

1) Separately reimplement (copy) X and Y without locking, so that the
   lock may be held by Alpha.

2) Restructure X and Y into X', X, Y', and Y such that all public uses
   of X and Y remain unchanged, but X and Y are now locking wrappers
   around X' and Y', so that X' and Y' may be used within Alpha.

3) (Multiply) Move the locks out of X and Y into all instances of
   invocations of X and Y, so that Alpha has equal and unimpeded access
   to X and Y that is "identical" to every (now revised) use of X and Y.

*ALL* of the above alternatives are wrong. At no time should a stable
operation "O" need to be recoded or restructured to be rationally used
in an enclosing operation Alpha.

In point of fact, if the locks used in X, Y, and Alpha are, by default,
recursive, Alpha can be coded as needed without having to revisit any of
the operations that Alpha invokes. The implementer of Alpha "probably
ought to" know what X, Y, and all instances of "n" do, and needs to
pre-claim locks to prevent internal deadlock, as that is more expedient
than making sure that X, Y, and "n" all have proper anti-deadlock
back-offs.

There is no rational argument against recursive locking that can justify
any system where the creation of an outer operation should view the
pathological restructuring of existent component operations as "the
right thing to do".

If you think this doesn't happen, I point you to the documentation on
the VFS and the various notations about locks having moved in and out of
operations.

The simple truth is that your statement:

> The nature of a lock is required to be such that if the locked object
> is in use, no access to that object is allowed.

is PURE FANTASY and logically junct. Correctly stated:

The nature of a lock is required to be such that, if the locked object
is in use, no COMPETING OR OUT-OF-THREAD/BAND access to that object is
allowed.

A recursive lock would "protect" accesses that are IN BAND and thus
completely allowed. Period.

> Recursive locking
> implies that if the lock is in use by the same thread that locked
> it before, then access to that object is allowed.

The statement above is logically useless to your argument; that is, if
the words "recursive locking" were replaced with the word "this" or
indeed "any lock", the statement remains tautological: "(This/Any lock)
implies that if the lock is in use by the same thread that locked it
(before/previously), then access to that object is allowed." See how
nicely true that is?

> In other words,
> if the coder (as opposed to designer) screwed up, the locking
> mechanism will allow it. If this is the way students are being
> taught to write code at the present time, all software will
> be moved off-shore in the not too distant future. There is
> absolutely no possible justification for such garbage. Just
> because some idiot wrote an article and got it published,
> doesn't mean this diatribe has any value at all.

Your assertions do nothing to address how the coder of "X" (the inner
lock) has "screwed up" by correctly coding X with the necessary lock.

Your assertions do nothing to address how the coder of "Alpha" can NOT
"screw up" if Alpha requires exclusive access to the facility locked by
"X" *AND* needs to invoke "X".

Your stance is naive and prone to induce errors in the name of an
unreasonable and logically fallacious notion of purity.

Rob.
* RE: recursive spinlocks. Shoot.
  2003-05-20 21:05       ` Robert White
@ 2003-05-20 21:42         ` Richard B. Johnson
  2003-05-20 23:06           ` Robert White
  2003-05-21  5:48           ` Nikita Danilov
  0 siblings, 2 replies; 58+ messages in thread
From: Richard B. Johnson @ 2003-05-20 21:42 UTC (permalink / raw)
To: Robert White; +Cc: Helge Hafting, Linux kernel

On Tue, 20 May 2003, Robert White wrote:

> -----Original Message-----
> From: Richard B. Johnson [mailto:root@chaos.analogic.com]
> Sent: Tuesday, May 20, 2003 5:23 AM
> To: Helge Hafting
>
> > Recursive locking is a misnomer. It does during run-time that which
> > should have been done during design-time. In fact, there cannot
> > be any recursion associated with locking. A locking mechanism that
> > allows reentry or recursion is defective, both in design, and
> > implementation.
>
> Amusing... but false...
>
> A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts
> and guarantees exclusive access to a domain (group of data or
> resources etc).

The "ONE" trait is incorrect. The lock must guarantee that recursion is
not allowed. Or, to put it more bluntly, the lock must result in a
single thread of execution through the locked object.

> There is nothing inherently "one deep" about that assertion.
> (Philosophically stated:) If I say "I own this car for the next hour"
> and ten minutes later say "I am taking the car to the store for twenty
> minutes" there is no "violation of truth" to the two assertions. The
> outer hour and the inner twenty minutes are in no form of conflict.
> Further, since the promise of the first assertion is _*NOT*_ that the
> car will be free for use after an hour, if the twenty minutes of the
> second assertion begins at the 58-minute mark of the sort-of-enclosing
> hour, thus extending the hour, you are still ok.

Yes there is. In your example above, after being granted access to the
"car", you start its trip to the store, but in the process you decide to
get another "permission" to use the car. The second permission is
invalid because somebody has already been granted exclusive use of the
car. If you don't already know that, then you will spend the weekend in
jail for a DUI.

> More mathematically, if I write an operation "X" that, say, needs
> exclusive access to the dircache for some portion of its execution,
> for correctness I should lock that dircache. Say I write a second
> operation "Y" that also needs the dircache, and locks it
> appropriately. If someone wants/needs to create an operation "Alpha"
> that contains a number of sub-operations including X and Y, but needs
> to ensure the consistency of the dircache for a range (of
> time/operations) larger than X, Y, and any number of operations "n",
> what is to be done?
>
> With your artificial definition of locking, the implementer of Alpha
> must do one of the following:
>
> 1) Separately reimplement (copy) X and Y without locking, so that the
> lock may be held by Alpha.
>
> 2) Restructure X and Y into X', X, Y', and Y such that all public uses
> of X and Y remain unchanged, but X and Y are now locking wrappers
> around X' and Y', so that X' and Y' may be used within Alpha.
>
> 3) (Multiply) Move the locks out of X and Y into all instances of
> invocations of X and Y, so that Alpha has equal and unimpeded access
> to X and Y that is "identical" to every (now revised) use of X and Y.
>
> *ALL* of the above alternatives are wrong. At no time should a stable
> operation "O" need to be recoded or restructured to be rationally used
> in an enclosing operation Alpha.

You are not making any sense at all. If X needs to see the identical
results of Y while the cache is locked, then only one lock need exist,
and that lock may provide one of the following:

  (1) access to data before modification, i.e., before the lock, or
  (2) access to data after modification, i.e., after the lock,

both for X and Y, respectively, for all of N.

> In point of fact, if the locks used in X, Y, and Alpha are, by
> default, recursive, Alpha can be coded as needed without having to
> revisit any of the operations that Alpha invokes. The implementer of
> Alpha "probably ought to" know what X, Y, and all instances of "n" do,
> and needs to pre-claim locks to prevent internal deadlock, as that is
> more expedient than making sure that X, Y, and "n" all have proper
> anti-deadlock back-offs.

It is not fact, so there is no point to the argument. If a lock cannot
retain the state of a locked object, and prevent it from being modified
out of order, then the lock isn't a lock at all. It may be an invention
that may have some use (somewhere), but it is not a lock.

> There is no rational argument against recursive locking that can
> justify any system where the creation of an outer operation should
> view the pathological restructuring of existent component operations
> as "the right thing to do".
>
> If you think this doesn't happen, I point you to the documentation on
> the VFS and the various notations about locks having moved in and out
> of operations.
>
> The simple truth is that your statement:
>
> > The nature of a lock is required to be such that if the locked object
> > is in use, no access to that object is allowed.
>
> is PURE FANTASY and logically junct. Correctly stated:

No. It is correct.

> The nature of a lock is required to be such that, if the locked object
> is in use, no COMPETING OR OUT-OF-THREAD/BAND access to that object is
> allowed.

You just "invented" something that is not a lock. It is some kind of
procedure that keeps records of access and allows or disallows access to
an object based upon some knowledge of the locked resource and policy.

> A recursive lock would "protect" accesses that are IN BAND and thus
> completely allowed. Period.

This policy will usually be unknown at compile-time and would require
some kind of knowledge base at run-time. For instance, one could not
just allow the same PID access to the same object because, even with
threads, there will be a different PID for every attempt at access
unless there is a coding error that lets the locker of a shared resource
lock the same resource again -- and this is a coding error for which
there can be no justification whatsoever.

> > Recursive locking
> > implies that if the lock is in use by the same thread that locked
> > it before, then access to that object is allowed.
>
> The statement above is logically useless to your argument; that is, if
> the words "recursive locking" were replaced with the word "this" or
> indeed "any lock", the statement remains tautological. "(This/Any
> lock) implies that if the lock is in use by the same thread that
> locked it (before/previously) then access to that object is allowed."
> See how nicely true that is?

It is obvious that you have some kind of agenda that attempts to justify
some poor coding methods with some illogical logic.

> > In other words,
> > if the coder (as opposed to designer) screwed up, the locking
> > mechanism will allow it. If this is the way students are being
> > taught to write code at the present time, all software will
> > be moved off-shore in the not too distant future. There is
> > absolutely no possible justification for such garbage. Just
> > because some idiot wrote an article and got it published,
> > doesn't mean this diatribe has any value at all.
>
> Your assertions do nothing to address how the coder of "X" (the inner
> lock) has "screwed up" by correctly coding X with the necessary lock.

It was already locked. An attempt by the resource-holder to lock the
resource again is an error, pure and simple.

> Your assertions do nothing to address how the coder of "Alpha" can NOT
> "screw up" if Alpha requires exclusive access to the facility locked
> by "X" *AND* needs to invoke "X".
>
> Your stance is naive and prone to induce errors in the name of an
> unreasonable and logically fallacious notion of purity.

The stance is not unreasonable. It also corresponds to what the QC folks
call "Good Standards of Engineering Practice", etc. To properly write
code so that, at compile-time, resources are known to be locked and
unlocked in the correct order goes a long way towards producing the
correct results.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
* RE: recursive spinlocks. Shoot.
  2003-05-20 21:42         ` Richard B. Johnson
@ 2003-05-20 23:06           ` Robert White
  2003-05-21 14:01             ` Richard B. Johnson
  2003-05-21  5:48           ` Nikita Danilov
  1 sibling, 1 reply; 58+ messages in thread
From: Robert White @ 2003-05-20 23:06 UTC (permalink / raw)
To: root; +Cc: Helge Hafting, Linux kernel

-----Original Message-----
From: Richard B. Johnson [mailto:root@chaos.analogic.com]

> The lock must guarantee that recursion is not allowed. Or to put
> it more bluntly, the lock must result in a single thread of execution
> through the locked object.

Where the HECK do you get that load of bull? The one requirement of a
MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
EXCLUSIVITY.

Nothing else. Let's look at the words:

Mutual - "directed by *each* toward the *other* or *others*" (e.g. not
self, duh) {all other definitions talk about group insurance, which
applies too 8-)

Exclusion -> exclude -> "To prevent or restrict entrance" (etc.), "to
bar from participation"

So, a mutex erects a "bar to/from participation" "directed by each
(holder) to other (would-be holder(s))".

There is no concept of "Self Exclusion" in a lock (mutex et al.), so
recursion, by definition, is (or should be) permitted.

All else is unfounded blither.

The fact is that it is easier to write locks that will self-deadlock,
and lazy people, acting in the name of expediency, decided that somehow
such locks were "better" because they didn't want to expend the effort
to make them correct. Still others then try to stand on lazy precedent
as if it is somehow canon.

The only place/way you can argue this is if the constituent operations
"X" within a larger body of code Alpha are not considered part of Alpha
(re, the previous "Alpha is composed of X and others" example). But
that is like yelling "I locked it, so my arm, which is not all of me,
should not be allowed to use it because my arm is not me..." From the
standpoint of purely logical analysis, this is a little esoteric... and
obviously specious tripe.

Rob.
* RE: recursive spinlocks. Shoot.
  2003-05-20 23:06           ` Robert White
@ 2003-05-21 14:01             ` Richard B. Johnson
  2003-05-21 21:56               ` Robert White
  0 siblings, 1 reply; 58+ messages in thread
From: Richard B. Johnson @ 2003-05-21 14:01 UTC (permalink / raw)
To: Robert White; +Cc: Helge Hafting, Linux kernel

I'll put this response in one final email.

First, I am not impressed with the attempt to use semantics to justify
some poor design. There are many words, especially in English, that do
not mean what they seem to say. Examples are terminate (kill), impart
force (make war), device (nuclear bomb), etc.

A lock is supposed to allow one to obtain exclusive use of a resource.
It is a resource for which one obtains a lock, not some lines of code.
Some lines of code may constitute a resource, but seldom do. There is
generally an underlying device that is involved. Since this device may
eventually be shared by many, it is essential that only one user be
allowed to use this device at a time, so that the operations upon this
device are atomic. Atomic is another word that does not mean what it
seems to mean. In the context of software operations, atomic means that
an operation must complete as intended or have no effect at all.

It is naive (defective) code that makes persons think they need
recursive locking of resources. It also forces code execution at runtime
that checks whether or not a lock has been obtained, by the very
execution thread that obtained the lock in the first place. This means
that whoever wrote the code didn't bother to check whether or not they
had already written some previous lines of code. It's left up to the CPU
to figure this out during runtime. Here is an example:

procedure()
{
        lock_resource();
        read_directory();
        unlock_resource();
}

Wonderful. Unfortunately read_directory() can be called by others who
don't have a lock on the resource. So the naive coder wants to have a
procedure like this for read_directory():

read_directory()
{
        lock_resource();
        do_stuff();
        unlock_resource();
}

The idea is that lock_resource() should succeed if the caller already
has obtained the lock. This is the 'technique' of so-called recursive
locks. But suppose some know-it-all manager states in no uncertain
terms that such stuff shall not be allowed in production code, that it
makes maintenance impossible, etc. So, our not-to-be-outdone coder
decides to beat the system by passing some kind of flag to each of the
procedures. This will tell the procedure if a lock has already been
obtained.

procedure()
{
        lock_resource();
        read_directory(TRUE);
        unlock_resource();
}

read_directory(flag)
{
        if (!flag)
                lock_resource();
        do_stuff();
        unlock_resource();
}

So, this prevents attempts at double-locking, but it does not convey the
information that the resource has been un-locked, so the flag needs to
be a pointer to something to which the first resource-locker has access,
like:

procedure()
{
        lock_flag = FALSE;
        lock_resource();
        lock_flag = TRUE;
        read_directory(&lock_flag);
        if (lock_flag)
                unlock_resource();
}

read_directory(*flag)
{
        if (*flag != TRUE)
                lock_resource();
        *flag = TRUE;
        do_stuff();
        unlock_resource();
        *flag = FALSE;
}

This works if the resource was only the procedure, read_directory(), and
fails miserably if the resource was some underlying device, because we
now return from read_directory() without a lock. So this is shown to be
'proof' that recursive locks are required.

The recursive lock simply moves the lock-flag into the locking
procedure, and the procedure keeps track of recursion so that the 'real'
unlocking only occurs after the Nth call if there were N calls to lock.
So the recursive lock comprises a counter and some additional
identification method. This seems to fix everything while, in fact, it
covers over the real problem: defective code.

In the example case, the problem would be fixed by inspection. You look
at the code and see what it is doing. In this case, the code could be
simplified as:

procedure()
{
        read_directory();
}

read_directory()
{
        lock_resource();
        do_stuff();
        unlock_resource();
}

Or simply reduced to calling read_directory() without the intervening
procedure. There is never any reason, ever, to attempt to obtain a lock
on a resource by the execution thread that has already locked the
resource. This is demonstrable proof of a bug.

Often 'impossible' locking situations can be resolved by using a relay
procedure. In the days of old-and-bold engineers who coded in octal
because assembly was too high a level, such procedures were commonplace.
A relay procedure is some procedure that assures a single path of
execution to and from some code that operates upon a shared resource.
Such a procedure lines all the "ducks up in a row" so that the code that
operates upon the shared resource operates atomically even though the
code itself contains no locks. Here is a trivial example:

relay()
{
        mem = obtain_all_memory();
        lock_all_resources();
        execute_procedure(mem);
        unlock_all_resources();
        free(mem);
}

Since the only path to execute_procedure() is through this relay code,
there cannot be any failures to unlock the device in certain execution
paths, or memory leaks, etc. Everything necessary to execute that common
procedure atomically is set up ahead of time and torn down after it
returns.

Of course such code in real life isn't very useful, because one
generally doesn't know how much memory or how many resources are
actually required until the procedure starts execution. It is meant to
demonstrate a method of executing a gigantic, hard-to-comprehend
procedure with guaranteed atomicity. You call it using a single
execution path that obtains all its resources first and releases them
after the procedure returns.

The easiest way to emulate this in an operating system is to have one
global lock. Anybody who needs to have guaranteed atomicity takes the
global lock. Unfortunately this is not efficient, so some
finer-granularity locks needed to be established in real operating
systems. Every time somebody decides that a section of code needs to be
locked, the possible execution paths that could result in a lack of
atomicity need to be reviewed. If there are none, then no lock is
needed. If there are some such paths, then locks need to be acquired in
those paths only.

In one email there was mention of the 'necessity' of recursive locks in
network file-systems. Network file-systems create a bad problem because
they are really 'state-less', with hooks to make them seem at least safe
to use. An 'advisory' lock on a shared resource is like getting almost
pregnant. A lock should be total and complete or it should not exist at
all. In the case of network file-system locking, one needs a lock
manager to watch over the whole mess so that if a client dies,
disconnects, or otherwise goes away, its locked shared resources are
unlocked after some bookkeeping that may allow for reconnection. The
lock manager carries out policy, so it isn't just the lock it's
concerned with, but the whole communications "plant". So this is not a
"locking" issue but a network file-system management issue, for which
there are at least partial solutions, with new problems and newer
solutions being discovered almost monthly.

On Tue, 20 May 2003, Robert White wrote:

> -----Original Message-----
> From: Richard B. Johnson [mailto:root@chaos.analogic.com]
>
> > The lock must guarantee that recursion is not allowed. Or to put
> > it more bluntly, the lock must result in a single thread of execution
> > through the locked object.
>
> Where the HECK do you get that load of bull? The one requirement of a
> MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
> EXCLUSIVITY.
>
> Nothing else. Let's look at the words:
>
> Mutual - "directed by *each* toward the *other* or *others*" (e.g. not
> self, duh) {all other definitions talk about group insurance, which
> applies too 8-)
>
> Exclusion -> exclude -> "To prevent or restrict entrance" (etc.) "to
> bar from participation"
>
> So, a mutex erects a "bar to/from participation" "directed by each
> (holder) to other (would-be holder(s))".
>
> There is no concept of "Self Exclusion" in a lock (mutex et al.), so
> recursion, by definition, is (or should be) permitted.
>
> All else is unfounded blither.
>
> The fact is that it is easier to write locks that will self-deadlock,
> and lazy people, acting in the name of expediency, decided that
> somehow such locks were "better" because they didn't want to expend
> the effort to make them correct. Still others then try to stand on
> lazy precedent as if it is somehow canon.
>
> The only place/way you can argue this is if the constituent operations
> "X" within a larger body of code Alpha are not considered part of
> Alpha (re, the previous "Alpha is composed of X and others" example).
> But that is like yelling "I locked it, so my arm, which is not all of
> me, should not be allowed to use it because my arm is not me..."
>
> From the standpoint of purely logical analysis, this is a little
> esoteric... and obviously specious tripe.
>
> Rob.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
* RE: recursive spinlocks. Shoot. 2003-05-21 14:01 ` Richard B. Johnson @ 2003-05-21 21:56 ` Robert White 2003-05-22 0:13 ` viro 2003-05-22 0:46 ` Carl-Daniel Hailfinger 0 siblings, 2 replies; 58+ messages in thread From: Robert White @ 2003-05-21 21:56 UTC (permalink / raw) To: root; +Cc: Helge Hafting, Linux kernel /Heavy Sigh You insist on missing the simplest of points and then dressing your response as cannon. Simple point 1: We have public interfaces within the kernel. These interfaces include all of the various whatever_ops structures (e.g. struct file_ops) and are characterized by a list of pointers-to-functions, each with a "well known" behavior. Simple point 2: We occasionally see changes to the afore mentioned, and required "well known" behaviors, in the form of a change of responsibility for locking. Not counting the move from the "big kernel lock" to all that came after, whenever a definition of a routine changes from "called holding some_lock" "no longer called with some_lock held" a whole lot of code gets stirred up. (not the key point though...) Simple Point 3: (The real point) In the existing system, it is impossible to aggregate the existing code calls under an umbrella call, if that umbrella needs the same lock as the constituent calls. Lets say I have a file system with a perfectly implemented unlink and a perfectly implemented rename. Both of these routines need to exist exactly as they are. Both of these routines need to lock the vfs dentry subsystem (look it up.) (slightly contrived example follows) But now lets say that I am building a virtualization layer for, say, an network file system, or a caching file system, or a logical volume concatenator or something. It doesn't really matter why exactly, but lets say that to implement some semantic requirement of the existing network file system I need to implement a "destructive rename", that must be atomic. The "natural solution" is to implement something that looks like destructive_rename(...) 
{ lock_dircache underlying_ops->unlink(target_name) underlying_ops->rename(source_name, target_name) unlock_dircache } But that natural synthesis of primitives wont work because both the unlink and the rename on the underlying file system are implemented with the absolutely correct lock_dircache they need internally. This "natural code" will self-deadlock. You assert that this can only happen if someone "did bad" in some form or another. You are wrong. Your simple-minded view of locking is, well, simple minded. In the stated example, for instance, there is no way to meet the requirements without manhandling the members pointed to by underlying_ops, violating the requirements of the outer layer, or hard coding the theoretical upper-layer so that it only works with a particular underlying file system. None of these are less-than-brain-dead options. But, you say, no such problem sets exist... Here is a problem statement that is virtually unimplementable in the current Linux kernel because of the simple-minded locking model. Create a meta file-system that takes two existing (backing) file system devices (or existing mounted directories) and aggregates them such that: (Ladies, and Gentlemen, The "deltafs" file system...) 1) It doesn't matter what types of file systems are used as the backing file systems. 2) The aggregate file system is fully read-write 3) The base (first existing) file system is read-only 4) The front (second existing) file system is read-write 5) All operations are available on the aggregate file system (unlink, rename, open for write, open for append, chown, set access time, etc.) 6) The aggregate file system engine will transcribe all the modified files from the base to the front file system if the file is modified. 7) The aggregate file system will (probably using a reserved file name and a journaling structure encoded therein) maintain a white-out list to hide unlinked and renamed files. 
8) After unmount, the front file system should be a minimal delta of the base file system 9) Remounting the same combination of file systems should consistently result in the same, consistent file system image. 10) (you get the point... 8-) Now, the natural, and only rational implementation, would be to build a module with a "Y" shape. The two short legs being the backing file systems and the long leg being the presentation side that links under "real" mount point. You'd probably need a kernel thread too. Regardless of the secondary concerns, the aggregate file system will need to... wait for it... aggregate. For instance, when you search the dircache for a file, the dircache routines for the aggregate file system will need to call into the dircache routines of the backing file systems, and not necessarily in a symmetric way. There are "interesting" permutations. A simple lookup within the aggregate system will have to (typically) do three things: Look for the file on the front component, or then check for the file in the white-out list, or then check for the file on the read-only base file system. Another interesting aggregate operation would be the various incarnations of open (and subsequent use). If I open, or god forbid, map a file for reading, and then someone else opens it for writing, I can't have just passed out the inode from the base file system out because the write access will not ever be reflected in the reading/mapping of the open-for-read case. You can still almost do this as it exists today, but if you go and allow the backing elements to be bound directories (a la mount --bind) instead of pure devices everything flies right out the window. (The difference being the shared visibility of the underlying dentry stuff to "third parties.") You could get close to a good aggregate file system if you just don't lock anything at the aggregate levels and then hope concurrent accesses don't fall into some k-hole while complex operations (e.g. 
rename) are in for a competitor. THAT, however, is really bad design. So answer one question: in the case of say "mount -t deltafs -o front=/dev/ram/0,back=/dev/hda3 aggregate /opt" where /dev/ram/0 is a ext2 file system, and /dev/hda3 is a riserfs, who was the "stupid programmer" that caused the self deadlock? The guy who wrote ext2? The guy who wrote riserfs? The guy who wrote deltafs? (probably the latter in your mind.) Would the guy who wrote deltafs have been "smart" if he didn't lock things and just hoped for the best? Would he have been smart to take the locks, but then drop them quickly around the calls to his constituent file systems and again hoe for the best? Simply put, the fault lies in defining a public interface (dentry_operations, and file_ops) that can and will self-deadlock if used in aggregate operations and lo-and-behold if the used in "public" operations were reentrant then the problem would be solved for everyone. Period. There are other examples of highly desirable aggregate or reentrant access requirements that will come up. Do date they haven't really because they are flat out impossible to do more than fake up, but as hot-pluggable devices and loopback based techniques come to the fore, the dependency list starts to become unbounded. The point you doubly miss is that I am not suggesting "all locks must be reentrant" as some alternate pronouncement from on high, as a new truth to replace the old in some quasi-religious coup. Many of the (spin)locks should be just as they are. The per-cpu locks are an example. The paging system locks are another. The should-be-self-exclusive locks are characterized by a particular "privacy". The counter-truth is that any lock that should/could/might be held by code immediately on either side of a public interface should use recursive locking. These are primarily locks with a direct relationship to the various _operations structures. That is dentry_operations, file_ops, etc. 
may each have one or more resources that naturally relate to them; said resources are usually obvious because they are documented as locks that will be taken before the call, might be taken before the call, should be taken during phase X of what the call is supposed to do, or "must never be taken". These locks should be recursive.

That way, when you want to have your file system on your (conditionally?) (partially?) encrypted block device which is dependent on your removable USB dongle key, the authors of each of the components don't have to go and crack open proven code. (E.g. the block device is encrypted [a la CCS, but not stupid] and is only accessible if the dongle key is inserted. Should that arrangement require the provider to also provide a mega-patch to ext3? I think not.)

Sure, all your super-simple examples support super-simple locking. The interesting problems in the real world occasionally demand something more than simple-mindedness. All of your counter-examples and complaints boil down to variations on one very simple and straightforward case:

> relay()
> {
>     mem = obtain_all_memory()
>     lock_all_resources();
>     execute_procedure(mem);
>     unlock_all_resources();
>     free(mem);
> }
>
> Since the only path to execute_procedure() is through this
> relay code, there cannot be any failures to unlock the device
> in certain execution paths or memory leaks, etc. Everything
> necessary to execute that common procedure atomically is
> set up ahead of time and torn down after it returns.

The logical outcome of your proposals is that there would (at any given level of abstraction) be some outer ring of relay functions that wrap the "real implementations" of all that lies within. It would have to be an outer ring so that, for my previous example, direct uses of ext3 and deltafs would hit the locks, but the internal uses, where deltafs calls ext3 directly, would call the routines behind the locks.
Such a ring could only be implemented by doubling the width of the public interface so that it had the ring-surface calls and the intra-ring calls (like read_op and read_by_peer_op). Yech.

So who is in charge of making sure that everything on both sides of any possible combination of, say, dentry_operations is going to be re-dressed with these relay functions? Nobody, because that is naive and unworkable for aggregate code; you would only be guessing who might want to aggregate what combinations of things. It is a bad case of "magic happens here", and while such magic can be made in the specific cases, making that magic happen in the general case is the road back to the "BIG KERNEL LOCK".

Rob.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-21 21:56 ` Robert White @ 2003-05-22 0:13 ` viro 2003-05-22 0:32 ` Robert White 2003-05-22 0:46 ` Carl-Daniel Hailfinger 1 sibling, 1 reply; 58+ messages in thread From: viro @ 2003-05-22 0:13 UTC (permalink / raw) To: Robert White; +Cc: root, Helge Hafting, Linux kernel On Wed, May 21, 2003 at 02:56:12PM -0700, Robert White wrote: > Lets say I have a file system with a perfectly implemented unlink and a > perfectly implemented rename. Both of these routines need to exist exactly > as they are. Both of these routines need to lock the vfs dentry subsystem > (look it up.) _Do_ look it up. Neither ->unlink() nor ->rename() need to do anything with any sort of dentry locking or modifications. Illustrates the point rather nicely, doesn't it? What was that about taking locks out of laziness and ignorance, again? 2%? You really like to feel yourself a member of select group... Unfortunately, that group is nowhere near that select - look up the Sturgeon's Law somewhere. 90% of anything and all such... ^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot. 2003-05-22 0:13 ` viro @ 2003-05-22 0:32 ` Robert White 0 siblings, 0 replies; 58+ messages in thread From: Robert White @ 2003-05-22 0:32 UTC (permalink / raw) To: viro; +Cc: root, Helge Hafting, Linux kernel yep, I kind of figured that was going to happen... (doh! 8-) (I was groping for the example without my references at hand. My bad for crediting someone with willingness to reason when I know they deliberately do not want to get the point... My apologies to the rest of the list here... 8-) Did my error in routine selection render you so fixated on counting coup that you TOTALLY missed the point about aggregation of operations? I bet it did... /sigh -----Original Message----- From: viro@www.linux.org.uk [mailto:viro@www.linux.org.uk]On Behalf Of viro@parcelfarce.linux.theplanet.co.uk Sent: Wednesday, May 21, 2003 5:14 PM To: Robert White Cc: root@chaos.analogic.com; Helge Hafting; Linux kernel Subject: Re: recursive spinlocks. Shoot. On Wed, May 21, 2003 at 02:56:12PM -0700, Robert White wrote: > Lets say I have a file system with a perfectly implemented unlink and a > perfectly implemented rename. Both of these routines need to exist exactly > as they are. Both of these routines need to lock the vfs dentry subsystem > (look it up.) _Do_ look it up. Neither ->unlink() nor ->rename() need to do anything with any sort of dentry locking or modifications. Illustrates the point rather nicely, doesn't it? What was that about taking locks out of laziness and ignorance, again? 2%? You really like to feel yourself a member of select group... Unfortunately, that group is nowhere near that select - look up the Sturgeon's Law somewhere. 90% of anything and all such... ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-21 21:56 ` Robert White 2003-05-22 0:13 ` viro @ 2003-05-22 0:46 ` Carl-Daniel Hailfinger 1 sibling, 0 replies; 58+ messages in thread From: Carl-Daniel Hailfinger @ 2003-05-22 0:46 UTC (permalink / raw) To: Robert White; +Cc: root, Helge Hafting, Linux kernel, viro Robert White wrote: > > Here is a problem statement that is virtually unimplementable in the current > Linux kernel because of the simple-minded locking model. > > Create a meta file-system that takes two existing (backing) file system > devices (or existing mounted directories) and aggregates them such that: > > (Ladies, and Gentlemen, The "deltafs" file system...) Call it unionfs. > 1) It doesn't matter what types of file systems are used as the backing file > systems. > 2) The aggregate file system is fully read-write > 3) The base (first existing) file system is read-only > 4) The front (second existing) file system is read-write > 5) All operations are available on the aggregate file system (unlink, > rename, open for write, open for append, chown, set access time, etc.) > 6) The aggregate file system engine will transcribe all the modified files > from the base to the front file system if the file is modified. > 7) The aggregate file system will (probably using a reserved file name and a > journaling structure encoded therein) maintain a white-out list to hide > unlinked and renamed files. > 8) After unmount, the front file system should be a minimal delta of the > base file system > 9) Remounting the same combination of file systems should consistently > result in the same, consistent file system image. > 10) (you get the point... 8-) IIRC, Al Viro was working on this and we might have it in 2.6 http://www.ussg.iu.edu/hypermail/linux/kernel/0201.0/0745.html Al? Regards, Carl-Daniel ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-20 21:42 ` Richard B. Johnson 2003-05-20 23:06 ` Robert White @ 2003-05-21 5:48 ` Nikita Danilov 1 sibling, 0 replies; 58+ messages in thread From: Nikita Danilov @ 2003-05-21 5:48 UTC (permalink / raw) To: root; +Cc: Robert White, Helge Hafting, Linux kernel

Richard B. Johnson writes:
> On Tue, 20 May 2003, Robert White wrote:
> >
> >
> > -----Original Message-----
> > From: Richard B. Johnson [mailto:root@chaos.analogic.com]
> > Sent: Tuesday, May 20, 2003 5:23 AM
> > To: Helge Hafting
> >
> > > Recursive locking is a misnomer. It does during run-time that which
> > > should have been done during design-time. In fact, there cannot
> > > be any recursion associated with locking. A locking mechanism that
> > > allows reentry or recursion is defective, both in design, and
> > > implementation.
> >
> > Amusing... but false...
> >
> > A lock serves, and is defined by, exactly _ONE_ trait. A lock asserts and
> > guarantees exclusive access to a domain (group of data or resources etc).
> >
> The "ONE" trait is incorrect.
> The lock must guarantee that recursion is not allowed. Or to put
> it more bluntly, the lock must result in a single thread of execution
> through the locked object.
>

Suppose you have a set of objects X, each containing a pointer to an object from set Y, each y in Y being protected by a lock. To update the pointer in x from y to y' one has to take the lock on y, and the global lock G, in this order. To read the pointer, it is enough to take G. Several x's may point to the same y. How to lock all y's pointed to by elements of X? If you have recursive locking one may just:

1. take each x from X in turn
2. lock G
3. take y pointed to by x
4. unlock G
5. lock y
6. lock G
7. re-check that y is still pointed to by x, and if not unlock y and go to 3
8. unlock G

Without recursive locking what would one do, but emulate it?

Nikita.

^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot. 2003-05-20 3:12 ` Robert White 2003-05-20 11:59 ` Helge Hafting @ 2003-05-22 1:00 ` Rik van Riel 2003-05-22 3:11 ` Robert White 1 sibling, 1 reply; 58+ messages in thread From: Rik van Riel @ 2003-05-22 1:00 UTC (permalink / raw) To: Robert White Cc: David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel On Mon, 19 May 2003, Robert White wrote: > In point of fact, "proper" locking, when combined with "proper" > definitions of an interface dictate that recursive locking is "better". > Demanding that a call_EE_ know what locks a call_ER_ (and all > antecedents of caller) will have taken is not exactly good design. So call_EE_ messes with the data structure which call_ER_ has locked, unexpectedly because the recursive locking doesn't show up as an error. Looks like recursive locking would just make debugging harder. Rik -- Engineers don't grow up, they grow sideways. http://www.surriel.com/ http://kernelnewbies.org/ ^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot. 2003-05-22 1:00 ` Rik van Riel @ 2003-05-22 3:11 ` Robert White 2003-05-22 4:04 ` Nick Piggin 0 siblings, 1 reply; 58+ messages in thread From: Robert White @ 2003-05-22 3:11 UTC (permalink / raw) To: Rik van Riel Cc: David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel

> From: Rik van Riel [mailto:riel@surriel.com]On Behalf Of Rik van Riel
> So call_EE_ messes with the data structure which call_ER_
> has locked, unexpectedly because the recursive locking
> doesn't show up as an error.

Yes and no. It all hinges on that nonsensical use of "unexpectedly".

(I'll be using fully abstract examples from here on out to prevent the function call police from busting me again on a technicality... 8-)

call_EE_ is an operator invoked by call_ER_; call_ER_ is changing the protected structure/data/state it locked by invoking call_EE_ against it. Nothing "unexpected" has happened, because call_ER_ invoked call_EE_ for a reason and with the expectation that call_EE_ would do whatever it is designed to do. Invoking call_EE_ is an imperative command to "mess with the structure", if that is what call_EE_ is supposed to be doing.

When you call atomic_inc(something) you expect something to be "messed with" in a particular way. Is the fact that "something" got bigger by one "an error"? Nope. How about if it precision-wrapped to zero? Probably not, but if it is, that is a larger semantic problem, not a problem with atomic_inc() or your use thereof.

Whenever anybody holds a lock they are expected to understand what the operations they invoke will do to the protected resource. The fact that the operator asserts nested ownership for the duration of its nested operation is mathematically provable as a non-issue. If you don't know what an operator is going to do, you shouldn't be invoking it in the kernel in the first place. (or anywhere else for that matter.
8-)

If an operator does something it isn't supposed to do, the operator is broken by any and every measure, regardless of locking. How hard is this to understand? (Looking back at "my original message" in this thread, I exactly cede the point that what recursive locking gives you on the well-behaved interface end it can take back from you on the lazy and/or stupid programmer end.)

We already demand that the programmer who uses any operation inside the kernel understand the requirements and dangers of that operation. It is implicit that if you play with some pointer protected by a lock you are going to lock it, and unlock it, and not trash it, etc. That is already a covenant inside the walls of the kernel. Holding a lock nets you a guarantee that "only the operations you invoke will change the protected data". The fact that some of those operations might be function calls is a given.

The recursive lock concept adds exactly one "new feature": with recursive locks, you can call functions that themselves contain sections that want to make that same assertion. Without recursive locks you are instantly barred from calling an existing function (that presumably exists for its own rights and purposes) that does exactly what you need to do, if that function itself wants to make sure that only it is changing the protected data. How much sense does that make as a restriction? "You cannot call that function, it demands exclusive access to the data you have already claimed for the express purpose of modifying via function call(s)." That is like getting your car towed away because you didn't move it after the police put a boot on the wheel.

That is actually the whole thing.
As it is today, if a move (of a file) were implemented as a link and an unlink (and if those two routines needed to lock something, which I am informed they do not 8-], and you were implementing the move after-the-fact for some reason), then the move function would have to be a copy of the link and unlink function text, now living in a new third function with the locking rearranged. But now we are maintaining two essentially identical copies of the code for links and unlinks. (Or you would move the locks out of the "real" unlink and link and wrapper them with ... etc, ad nauseam, amen 8-)

When you take the steps to allow aggregation across/around an interface you (just as you do today) define what any given routine is supposed to do and what it will/should dirty. If the implementer of the "inside" of the interface (e.g. the guy who filled in the file_operations structure in his driver) does his job, then the code has a well defined behavior. It is expected that "whangle_inode_list" will "whangle" the inode list, whatever "whangle" means in this usage.

The value of the recursive lock is that it lets the guy who comes along later work completely within the clearly defined set of sub-operations. That is, that programmer can know, if he has claimed the inode list lock, that the only entity who "whangled" it was the entity he called; and if there was another operator on the list, "bojum", and he never called "bojum_inode_list", then bojum didn't happen, or if it did, it happened in the course of the explicitly invoked whangle. This knowledge exists independently of whangle's need to make sure that nobody bojum(s) while the whangle is taking place.

At the aggregation level it would be "dumb" to do something like

boo()
{
    lock_thingy;
    void* X = fetch_internal_thingy_pointer();
    some_thingy_operations->whangle();
    thingy_use_without_revalidation(X);
    unlock_thingy;
}

without the sure and simple knowledge that whangle() will not invalidate X.
But this is no more difficult a mental leap than knowing whether or not free(X) will invalidate X. If you "know" X cannot be invalidated by any "legal" whangle() then this code is okay, and can be used with every possible some_thingy_operations structure. If you don't know this, then this is bad code.

The thing is, by defining the thingy_operations standard, and including the facts that lock_thingy protects certain data and states and that "whangle" never "bojums" (or whatever assertions we can or can't make for "any thingy"), we have clearly defined an interface. Another programmer, before or after the creation of any given thingy driver, who needs an "atomic" operation "snark" (a whangle, a bojum, and another whangle, in that specific order, with no exposure to others trying to whangle, bojum, or snark) could easily write snark. Without recursive locking "snark" would have to look like

snark()
{
    some_thingy->whangle()
    some_thingy->bojum()
    some_thingy->whangle()
}

but this snark would not be atomic; it doesn't lock anything. If each operation is done inside a "hopefully minimum" lock/unlock, the danger is still very real.

snark()
{
    // meaningless snark
    lock_thingy; unlock_thingy;
    thingy->whangle();
    lock_thingy; unlock_thingy;
    thingy->bojum();
    lock_thingy; unlock_thingy;
    thingy->whangle();
    lock_thingy; unlock_thingy;
}

Clearly if snark is at a higher level of abstraction from whangle and bojum, and snarks are mutually exclusive, then we can just put in a snark_lock. But really, if we "want to" lock thingy(s) individually instead of blocking all snarks, what really are the alternatives to recursive locking?

In fact, placing the entire onus on the aggregators to properly use their parts (operators and function calls) and be wary of side effects places that onus exactly where it belongs. Someone capable of composing the abstraction is not likely to be tripped up by it. We know that "reading a file moves the read pointer" and so forth.
If someone implements a set of operations in a thingy_operations struct that pisses all over things it shouldn't be touching, that person is to blame for their bad code. If someone else using a thingy_operations struct does so without a care as to what the operations do, then they are to blame. This is actually no more complex than knowing that if I atomic_inc(some_value), it's going to "get larger", with the caveat of zero-wrap hanging over my head.

There will always be cases where some particular aggregation is not now safe and never will be. An example from before: if the deltafs (unionfs?) were to try to mount the same device as both the front and base elementary file systems, it would be doomed to failure. Aggregate use is invaluable, but it also requires as much understanding of programming in this kind of environment as anything else. It also allows for the construction of some incredible dependencies "at runtime", and a proper aggregate entity should be ready to deal with that. These are not insurmountable, but they are real costs. You can't protect everyone from themselves.

The win is that, for the deltafs example, if you never violate your constituent components' interfaces (e.g. the ext3 front-side file system semantics and data), you can co-own their elements and make "bigger promises" "for free" by leveraging the mature code. A well defined interface with recursive locks is no more difficult to maintain, and generally exist within, than a well designed interface with self-deadlocking locks; it just has more "depth" (if you will forgive the quasi-pun) because of the opportunities for aggregation. Both beat the hell out of an interface definition that has to have all its code "fixed" because a lock needs to move from "caller's responsibility" to "callee's responsibility" or vice-versa in the next version.

This stuff is much clearer at the application level. Make an MT-Safe Deque template that can work "over" both an MT-Safe List.
The locking needs to be promoted to the Deque. If you want to view the contents as both the Deque and the List, then the locking can't be in both separately, nor can it be in just one (this gets obscure, but has to do with the "wait for object" nature of a Deque not being part of a List). If the locks are recursive, the Deque can be implemented without private access to the list's naughty bits. User calls Deque->Push, which calls List->PutOnTail; the lock is set twice; the end. View as list, view as deque, it all works. When you add the "wait for object" version of Pop, List isn't suddenly considered broken, but it is completely up to the writer of Deque to do what is necessary. There is some magic to wrapping a recursive structure around a posix_mutex to yield a recursive entity that is useful with a posix_condition, but that is a totally different post. 8-)

The point is, you *can* and indeed *must* apportion responsibility with any public interface anyway, so using recursive locking in a public interface adds far more than it costs.

Rob.

PS: and again, I am not advocating that every lock "must be" recursive, just that the ones tied to public interfaces (say, those expected to be used by functions named in any legal module _operations struct) need to be considered for recursion. Clearly the per-CPU locks and such are not candidates for recursion. There is a distinction to be made, and refusing the value of recursive locking is not making a distinction; it is clapping one's ears to one's head and dancing about under the assumption that since it isn't right in some instances it is wrong in all instances.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-22 3:11 ` Robert White @ 2003-05-22 4:04 ` Nick Piggin 2003-05-22 4:42 ` Peter T. Breuer 2003-05-23 0:19 ` Robert White 0 siblings, 2 replies; 58+ messages in thread From: Nick Piggin @ 2003-05-22 4:04 UTC (permalink / raw) To: Robert White Cc: Rik van Riel, David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel Robert White wrote: >>From: Rik van Riel [mailto:riel@surriel.com]On Behalf Of Rik van Riel >> > >>So call_EE_ messes with the data structure which call_ER_ >>has locked, unexpectedly because the recursive locking >>doesn't show up as an error. >> > >Yes and no. It all hinges on that nonsensical use of "unexpectedly". > >(I'll be using fully abstract examples from here on out to prevent the >function call police from busting me again on a technicality... 8-) > Oh come on! Now I won't get into this argument, but you can't win by saying "if this were implemented in such a way that recursive locking is required, then recursive locking is required here"!! Look: I could easily reimplement your snark so it doesn't have to call the last "whangle" - now see how it can be done completely lockless with a memory barrier between a and b? Locking is an implementation issue, and as such I think you'll have to come up with a real problem or some real code. You must have some target problem in mind? Best regards, Nick ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-22 4:04 ` Nick Piggin @ 2003-05-22 4:42 ` Peter T. Breuer 2003-05-22 5:09 ` Nick Piggin 2003-05-23 0:19 ` Robert White 1 sibling, 1 reply; 58+ messages in thread From: Peter T. Breuer @ 2003-05-22 4:42 UTC (permalink / raw) To: Nick Piggin Cc: Robert White, Rik van Riel, David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel "Nick Piggin wrote:" > Locking is an implementation issue, and as such I think you'll have > to come up with a real problem or some real code. You must have some > target problem in mind? I'll butt back in here. I found a problem when I made two drivers talk to each other. Concretely a modified raid MD driver and a block device driver. Each driver had ioctls that walked or modified a linear list, and locked the list against other threads while running their subroutine in the ioctl. To be concrete again, the lists were respectively the set of raid arrays in which the block device found itself currently, and the md drivers internal lists of arrays, component devices, etc. The ioctls worked fine when used from user space. Then I had the bright idea of getting the block driver to call the md driver's ioctl automatically when a certain condition arose. Concretely again, I had the block device driver tell the md driver "setfaulty" when the block device noticed an internal problem, and "hotadd" when it felt cured. Unfortunately, I had already gotten the md driver to tell the block driver when it was booted out from or newly included in an array (so that it could know if it should tell md when it felt ill or well). The result was a call from the block driver to the md driver with a lock held, and a rather unexpected call back from the md driver that impotently tried to take the same lock. Same thread. Peter ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot. 2003-05-22 4:42 ` Peter T. Breuer @ 2003-05-22 5:09 ` Nick Piggin 0 siblings, 0 replies; 58+ messages in thread From: Nick Piggin @ 2003-05-22 5:09 UTC (permalink / raw) To: Peter T. Breuer Cc: Robert White, Rik van Riel, David Woodhouse, William Lee Irwin III, Martin J. Bligh, linux-kernel

On Thu, May 22, 2003 at 06:42:15AM +0200, Peter T. Breuer wrote:
> "Nick Piggin wrote:"
> > Locking is an implementation issue, and as such I think you'll have
> > to come up with a real problem or some real code. You must have some
> > target problem in mind?
>
> I'll butt back in here.
>
snip
>
> The result was a call from the block driver to the md driver with a
> lock held, and a rather unexpected call back from the md driver that
> impotently tried to take the same lock.
>
> Same thread.
>

OK, in this case, it didn't sound like your block driver expected to be re-entered. Lucky the problem immediately caused a deadlock ;)

More seriously, let's say you get around the above with recursive locks:

Thread 1                       Thread 2 (on another cpu)
enter the block driver
take the bd lock
                               issue an md ioctl
                               take the md lock
                               enter the block driver
                               spin on bd lock
automatically call md ioctl
spin on md lock

So the problem needs a rethink.

^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot. 2003-05-22 4:04 ` Nick Piggin 2003-05-22 4:42 ` Peter T. Breuer @ 2003-05-23 0:19 ` Robert White 2003-05-23 7:22 ` Nikita Danilov 1 sibling, 1 reply; 58+ messages in thread From: Robert White @ 2003-05-23 0:19 UTC (permalink / raw) To: Nick Piggin, elladan Cc: Rik van Riel, David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel, root

This will, hopefully, be my out-comment on this thread. Various persons in different branches of this thread are making and repeating various inferences they have taken from my statements as if those were my position. I will state my position and then let most of this drop.

But first: at some point I dropped the spinlock focus and started talking about locking in general. I'll take the onus for that completely. My bad. Sorry. Now that it is in the mix, however, I will have to make it part of the position, else there will be all sorts of "but you said here that..." kinds of possible responses.

My position:

1) Recursive locking has value; it is not absolutely inherently broken, bad, or evil as some seem to believe. There is no a-priori "somebody messed up" absolute that can be deduced simply because a logical structure reaches code that wants to claim a resource that the same thread, in an outer layer of abstraction, has already claimed. The only time it is a-priori wrong is when the locking mechanism is designed to self-deadlock, and it is only that self-deadlock that is "wrong by absolute definition".

2) Any application of locking (spin or otherwise) that can be written with recursive locking can be re-written without it. This is nearly a tautology, even if it would be NP-complete to prove. This is also _exactly_ the iterative vs. recursive design principle. Anything that can be written iteratively can be written recursively and vice versa. It is also true that many problems exhibit an inherent bias towards one or the other of these approaches.
Any stance, in the absence of overriding pressure, that one of these approaches should be ruled out completely for a program or design is detrimental to that design. A real-world example of an overriding pressure is the absence of a real stack on the 8051 micro-controller causes the compiler to turn recursive calls into expensive iteration anyway. In the kernel design there are almost no such overriding pressures. 2a) Any application that "is correct" with self-deadlocking locks will be semantically identical using recursive locks. 2b) Persons relying on the self-deadlock behavior of a lock (spin or otherwise) to "detect semantic errors" in a production environment are doing a bad thing. This is not intended, in any way, to minimize the value of catching errors this way during development, especially in an open source project where all sorts of stuff is being joined together. I think, however, that we are all disappointed when our production systems lock up, and would prefer to have them notice the impending problem and at least print a diagnostic. Switching a self-deadlock spinlock for a recursive spinlock with a ceiling of one and a linked printk or stack-trace-and-oops() would be better than a silent lockup in a production system. The atomic spinlock system, as it exists today, has no sense of lock owner, and so can not tell a valid wait from a self-deadlock in an SMP setting. All but the most critical and private locks could do with detection and reporting in the wild because the kernel is subject to post-development vagaries caused by people adding modules. 3) Any lock (spin or otherwise) that is directly related to a public interface should be considered for recursive implementation. This point does not even come close to an assertion that all such locks "must" or even "should" be recursive. 
It is the simple statement that the locking policy and model for any public interface should be examined for possible value that could be developed because of rational aggregation or reentrancy. That decision, having been made, should be made an express part of the interface.

4) All locks (spin or otherwise) should obviously be held for the shortest amount of time reasonably possible which still produces the correct result. If this needs explaining... 8-)

5) The entity who takes a lock (programmer or code, spin or otherwise) is responsible for knowing what he is doing when he operates on a protected resource, and even on all the non-protected resources. This is another corollary of the fact that any entity that uses a common anything (in life or in code) is responsible for its actions. Locking does not ensure that proper or meaningful operations will be applied to the protected object; it only ensures that nobody else will be getting in the way. Any argument that contains "what if the programmer takes a lock and then calls/invokes something that..." is specious. The programmer who takes a lock and/or invokes an action bears the responsibility for his operations whether he wrote the code locally or not. That's just life.

=====

I will now provide one concrete and specific example of a 2.5 kernel facility (a public interface, though not one with an _operations structure) that should be protected by a recursive lock. This is NOT because I have a specific application in mind, but because I can conceive of several possible scenarios and a related body of "overriding concern". I don't have the code right here to say whether this is, or isn't, a spinlock per se. Analysis of the interface documentation and the obvious presence of "some kind of lock, probably in spinlock kinds of time quanta" is enough.

The new workqueue system, and its associated work_struct, "really ought to be" recursively protected.
That is, there should be a lock_workqueue() and an unlock_workqueue() added to the public interface, and calling queue_work(), queue_delayed_work(), and cancel_delayed_work() (etc) should be legal exactly as they exist today (e.g. atomically), and also legal between calls of the proposed lock_workqueue() and unlock_workqueue() functions. Clearly using lock_workqueue() and unlock_workqueue() against "events" (the predefined system workqueue) should be "frowned upon", but even that should still be legal. The reasoning is simple. It is almost certain, or at least far from inconceivable, that a high-performance interface with obvious/high user visibility (In one email I used an AGP rendering pipeline), or subsystem significance, could find itself needing to atomically perform more than one insert/remove/modify action on a (hopefully private) work queue. I make no statement that this would be a "good design" but I can see cases where it could come up. I also hope that the reader understands that there could be other reasonable combinations/aggregations of operations other than the two adjacent cancel operations I cite below; the dual cancel is just a good simple but concrete possibility. So anyway, in the current public self-deadlocking interface, if some module writer needed to do, say, an atomic cancel of several particular work_struct(s), but he didn't want to flush the entire workqueue, he would be obliged to read the implementation, take the locks himself and then hack-apart and reassemble the workqueue's linked list in local code. This is bad, and God help the kernel if this module is loaded into a later kernel with a tweaked workqueue semantic. In keeping with principle 2, it is tautological that we could turn queue_work(), queue_delayed_work(), cancel_delayed_work() et. al. 
into a two-layer affair with each calling their respective queue_work_not_locked() versions internally after having claimed the lock, then publicize the lock/unlock_workqueue() calls and all the *_not_locked() calls for people who need aggregation. This, however, would then more than double the size of the interface, damage clarity, and otherwise make things ugly and un-fun.

Another principle 2 workaround would be to have a "technically not recursive" (spin)lock and build tests into the various routines that detect the outer lock_workqueue() activity and branch around the actual lock attempts. That's just recursive locking in sheep's clothing.

It is much better to let this module writer construct a sequence that looks like: lock, cancel, cancel, (whatever), unlock; one that uses the regular and well-defined top-level calls like cancel_delayed_work(). The fact is, if the lock built into the workqueue system is recursive, persons who need to make aggregate changes to a queue in an atomic way are provided for in an optimally useful and safe way.

It is, of course, possible that an idiot could clog up an SMP machine by misusing this outer lock facility. But an idiot can clog up an SMP machine by misusing almost any (spin)lock he cares to take, so that is a push.

So here is one public interface that could stand the scrutiny and has a fairly obvious case where recursive locking is a big win, or at least a much bigger lose for non-recursive locking, combined with a near certainty of desirable aggregate but still short-duration activities. The workqueue example came to mind almost instantly, and I would be stunned if a simultaneously rigorous but open-minded survey didn't yield at least a few more "fairly obvious" places where a recursive locking strategy would pay off big-time.
==== for those who need some flesh on the example, that is all that follows ====

Disclaimer: this is given as an informed lay opinion of what happens in graphics subsystems, garnered mostly from sampling manual pages and reading magazines. It does, however, map to other possible problem sets, so it is good enough for this discussion.

Consider a graphics rendering module that has a "draw mesh" ioctl/interface. A mesh is a vector of points (a list of x, y coordinates and supporting data like color etc.) that combine together to make a bunch of triangles that share vertices with their neighbors and get drawn and filled in by the graphics card to create a possibly-non-flat object. Consider that each graphics card has different features and optimal and maximum sizes for things like meshes.

Now imagine you are designing the gaming driver interface (think DirectDraw for Windows etc.): you don't want to limit the largest mesh acceptable in the interface definition to the smallest "largest mesh" you are likely to encounter in the cheapest video card a user might buy. So you build in a feature that, when a mesh is submitted to the rendering system, compares it to the current card, and if the mesh is "too big" it is cut into three parts (the left part, the right part, and a meta-mesh to "zipper over" the seam between the two). The system is also smart enough to detect an SMP machine and launch several high-priority kernel threads to share the rendering. This is, after all, for a game, where display speed is the most important thing... 8-)

So now, the game sends in a mesh, which just got carved into three (or maybe more, but let's stick with three) pieces. Then the game discovers that that mesh really needs to be canceled, or the rendering engine decides that the whole thing is behind a wall and should just be discarded to save time, or the entire frame is going to be canceled to make up for a network delay (etc.).
If you cancel the mesh fragments (take them out of their work queue) piece-at-a-time, you might make an ugly mess (say, if only the small "zipper" segment made it onto the display surface) and disappoint your fans and users (both players and game programmers).

So there you have it: some programmer focused entirely on his pretty picture simply _must_ *selectively* cull his workqueue to get rid of all of "mesh/frame X". He could, and probably will, walk the workqueue himself, which is highly undesirable from a system stability standpoint. He doesn't really care if he globs up the SMP CPUs to do it, because they would otherwise be wasting their cycles in the more intensive task of processing data that will never see phosphor, and he has no choice but to ignore the published interface "just this once" because that interface would leave him with public mud on his face as bits leaked onto CPUs between the individual cancel operations.

I can see it happening... pretty easily in fact... and in all sorts of other kinds of deferred tasking models... I can see recursive locking making it useful for the programmer to stick to the public interface and therefore make the system more stable... I could also see the manual page for lock_workqueue() having a big bold disclaimer reading "USING THIS ROUTINE IS ALMOST ALWAYS THE WRONG THING TO DO but..."

Rob.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-23 0:19 ` Robert White
@ 2003-05-23 7:22 ` Nikita Danilov
2003-05-23 9:07 ` Helge Hafting
2003-05-23 12:18 ` William Lee Irwin III
0 siblings, 2 replies; 58+ messages in thread
From: Nikita Danilov @ 2003-05-23 7:22 UTC (permalink / raw)
To: Robert White
Cc: Nick Piggin, elladan, Rik van Riel, David Woodhouse, ptb, William Lee Irwin III, Martin J. Bligh, linux-kernel, root

Robert White writes:
> This will, hopefully, be my out-comment on this thread.
> [...]
>
> 4) All locks (spin or otherwise) should obviously be held for the shortest
> amount of time reasonably possible which still produces the correct result.
>
> If this needs explaining... 8-)

It surely does. Consider two loops:

(1)

	spin_lock(&lock);
	list_for_each_entry(item, ...) {
		do something with item;
	}
	spin_unlock(&lock);

versus

(2)

	list_for_each_entry(item, ...) {
		spin_lock(&lock);
		do something with item;
		spin_unlock(&lock);
	}

and suppose they both are equally correct. Now, in (2) total amount of time &lock is held is smaller than in (1), but (2) will usually perform worse on SMP, because:

 . spin_lock() is an optimization barrier

 . taking even un-contended spin lock is an expensive operation, because of the cache coherency issues.

> [...]
>
> Rob.

Nikita.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-23 7:22 ` Nikita Danilov
@ 2003-05-23 9:07 ` Helge Hafting
2003-05-23 12:18 ` William Lee Irwin III
1 sibling, 0 replies; 58+ messages in thread
From: Helge Hafting @ 2003-05-23 9:07 UTC (permalink / raw)
To: Nikita Danilov; +Cc: linux-kernel

Nikita Danilov wrote:
> Consider two loops:
>
> (1)
>
>	spin_lock(&lock);
>	list_for_each_entry(item, ...) {
>		do something with item;
>	}
>	spin_unlock(&lock);
>
> versus
>
> (2)
>
>	list_for_each_entry(item, ...) {
>		spin_lock(&lock);
>		do something with item;
>		spin_unlock(&lock);
>	}
>
> and suppose they both are equally correct. Now, in (2) total amount of
> time &lock is held is smaller than in (1), but (2) will usually perform
> worse on SMP, because:
>
> . spin_lock() is an optimization barrier
>
> . taking even un-contended spin lock is an expensive operation, because
> of the cache coherency issues.

This is a tradeoff. If the total running time is "short", use (1) for performance. If the running time is "long", use (2) to avoid lock contention. A "long" time happens when the time wasted by other processors spinning typically exceeds the time wasted by repeated lock+unlock, or there is excessive latency on some irq-blocking lock.

You can get the best of both worlds (low latency and few lock operations) like this:

	while (more work to do) {
		spin_lock(&lock);
		process one suitably sized batch of items
		spin_unlock(&lock);
	}

This sort of thing certainly helped the VM system.

Helge Hafting

^ permalink raw reply [flat|nested] 58+ messages in thread
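Helge's batching pattern can be made concrete with a small userspace sketch, a plain pthread mutex standing in for the kernel spinlock. The item count and batch size here are invented for illustration; the point is only that the number of lock round-trips sits between "one per item" and "one for the whole list".

```c
#include <pthread.h>

/* Sketch of the middle ground: take the lock once per batch rather
 * than once per item (Nikita's case 2) or once for the whole list
 * (case 1). */

#define NITEMS 10
#define BATCH  4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int items[NITEMS];

int process_batched(void)
{
    int done = 0, lock_ops = 0;

    while (done < NITEMS) {
        int end = done + BATCH;
        if (end > NITEMS)
            end = NITEMS;

        pthread_mutex_lock(&lock);   /* one lock per batch */
        lock_ops++;
        for (; done < end; done++)
            items[done] = done * 2;  /* "process one item" */
        pthread_mutex_unlock(&lock); /* let waiters in between batches */
    }
    return lock_ops;   /* batches of 4, 4, 2 -> 3 lock round-trips */
}
```

Tuning BATCH trades lock traffic (the cache-coherency cost Nikita mentions) against the latency seen by other CPUs spinning on the lock.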
* Re: recursive spinlocks. Shoot.
2003-05-23 7:22 ` Nikita Danilov
2003-05-23 9:07 ` Helge Hafting
@ 2003-05-23 12:18 ` William Lee Irwin III
2003-05-24 2:39 ` Robert White
1 sibling, 1 reply; 58+ messages in thread
From: William Lee Irwin III @ 2003-05-23 12:18 UTC (permalink / raw)
To: Nikita Danilov
Cc: Robert White, Nick Piggin, elladan, Rik van Riel, David Woodhouse, ptb, Martin J. Bligh, linux-kernel, root

On Fri, May 23, 2003 at 11:22:11AM +0400, Nikita Danilov wrote:
> and suppose they both are equally correct. Now, in (2) total amount of
> time &lock is held is smaller than in (1), but (2) will usually perform
> worse on SMP, because:
> . spin_lock() is an optimization barrier
> . taking even un-contended spin lock is an expensive operation, because
> of the cache coherency issues.

All good. Also, the arrival rate (i.e. frequency of lock acquisition) is more important to lock contention than hold time, so they're actually not being as friendly to big SMP as the comment from Robert White would suggest. The arrival rate tends to be O(cpus), since whatever codepath pounds on a lock on one cpu can be executed on all simultaneously.

-- wli

^ permalink raw reply [flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot.
2003-05-23 12:18 ` William Lee Irwin III
@ 2003-05-24 2:39 ` Robert White
2003-05-28 16:50 ` Timothy Miller
0 siblings, 1 reply; 58+ messages in thread
From: Robert White @ 2003-05-24 2:39 UTC (permalink / raw)
To: William Lee Irwin III, Nikita Danilov
Cc: Nick Piggin, elladan, Rik van Riel, David Woodhouse, ptb, Martin J. Bligh, linux-kernel, root

William's comment below, and its direct ancestor from Nikita, seem to support the recursive locking approach to the locking of (some) resources. I didn't go into number 4 myself because I was (rightly) sure that the terms "shortest amount of time reasonably possible" and "correct result" would be points of contention.

If Nikita's case 1 is indeed "better" on an SMP system (I take no official position, even though one could be easily deduced from the sum of my previous arguments... 8-)

>	spin_lock(&lock);
>	list_for_each_entry(item, ...) {
>		do something with item;
>	}
>	spin_unlock(&lock);

and case 2 is "worse"

>	list_for_each_entry(item, ...) {
>		spin_lock(&lock);
>		do something with item;
>		spin_unlock(&lock);
>	}

and since each workqueue primitive op is essentially:

	workqueue_op() {
		lock_workqueue()
		perform_actual_primitive_op()
		unlock_workqueue()
	}

then the existing solution of calling several workqueue operations as they exist today (provable by induction) always decomposes to case 2. Calling just one workqueue_op is nominally case 1 (and indistinguishable from case 2 when the list length is 1).

So the existing model, for successive (aggregate) workqueue ops greater than one, nets the SMP-unfriendly and non-optimal result described in case 2. (Ignoring, for now, the ratio of locked to unlocked instructions in each particular primitive.)

IF, however, an aggregator knows he is going to do several ops in succession (and wants the aggregation to be atomic) and takes the recursive lock in an outer context manually, then the event stream:

	lock_workqueue()
	list_for_each_entry(item...)
	{
		some_workqueue_op(current_item)
	}
	unlock_workqueue()

by induction actually becomes:

	lock_workqueue()
	list_for_each_entry(item...)
	{
		lock_workqueue();	// Factoring point
		perform_actual_primitive_op();
		unlock_workqueue();	// Factoring point
	}
	unlock_workqueue();

At the marked "factoring points" the code is effectively reduced from taking a lock to something like "if owner == me then increment/decrement lock_depth" because the ownership is tautological at all factoring points. (As is the presence of the lock data in the cache etc.(?) But even if everything is inline, the compiler probably still can't get rid of the test because the parts of the lock are almost certainly declared volatile.)

This leads to a couple of questions (not rhetorical, I genuinely don't know, and the answers probably vary by platform):

1) Do the barrier() and test_and_set() operators (e.g. xchgb on an x86), when executed on the non-lock-owner CPU(s), invalidate the relevant cache element(s) of the CPU that owns the lock? If not:

2) Do we care that (in a recursive lock) the alterations of the "depth" value will invalidate the cache elements of the non-owner CPU(s) that are waiting for the lock to free up (given that the final unlock will commit at least one such invalidate anyway)?

If the answer to #1 is that the owning CPU's cache is not invalidated by a competitor's active competition, then recursive locking is extremely cheap even if the compiler can't get rid of the conditional.

Regardless (and this one is a little rhetorical 8-):

3) Does the always-evaluated conditional (owner == me) and its associated low-level jump/branch impose sufficient unacceptable cost, compared to the complexity of the actual protected operations on the proposed public interface(s), to justify always forcing the case-2 model? (e.g. most manhandle_workqueue operations in the example are way more complex than the cost of the ever-evaluated conditional.)
and of course, for any considered public interface:

4) Do the primitive operations of each/any considered interface perform most of their task (over the domain of time) with the lock held or released?

5) For a particular interface, does the flexibility of a recursive lock outweigh the probability of someone coding each/any stupid error and/or the costs of adding the code to prevent same?

That is, for any public interface that is selected for recursive locking there will always be some set of primitive operations which can't *ever* make sense when aggregated (invoked with the lock held in the aggregating context). In the case of the workqueue example, if you take the lock and then call the wait-for-workqueue-to-empty-itself call, you will wait forever, because no CPU can take a task off of the queue 'cause it's locked... And as obvious as that may be to most people, there are going to be people out there making those kinds of mistakes. The follow-on decision to add check-primitives to various attractive nuisances is not free.

Rob.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-24 2:39 ` Robert White
@ 2003-05-28 16:50 ` Timothy Miller
0 siblings, 0 replies; 58+ messages in thread
From: Timothy Miller @ 2003-05-28 16:50 UTC (permalink / raw)
To: Robert White
Cc: William Lee Irwin III, Nikita Danilov, Nick Piggin, elladan, Rik van Riel, David Woodhouse, ptb, Martin J. Bligh, linux-kernel, root

Someone's probably already suggested this, but here goes...

If you REALLY have places where multiple levels may try to grab the same spinlock, and you would therefore have lock contention with yourself, why not just use multiple spinlocks? So if you often call B(), which grabs lock X, but sometimes you call A() which also needs lock X, but A() calls B() which therefore causes self-contention, why not have A() use lock Y instead? Treat what A() uses as a separate resource?

Or here's another idea: why not have A() unlock before calling B()?

Oh, and there's the obvious idea of passing parameters around which indicate whether or not a lock has been taken, although that's certainly a pain.

I do see some value in tracking lock ownership. But what do you do about the unlocks? Do you keep a refcount and unlock when it reaches zero? Now, not only do you have to deal with locks across function calls, but you have to make sure everything unravels properly. Sounds difficult.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-18 17:24 ` Peter T. Breuer
2003-05-18 22:34 ` David Woodhouse
@ 2003-05-19 2:05 ` Kevin O'Connor
2003-05-19 6:19 ` Jan Hudec
` (2 subsequent siblings)
4 siblings, 0 replies; 58+ messages in thread
From: Kevin O'Connor @ 2003-05-19 2:05 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: linux-kernel

On Sun, May 18, 2003 at 07:24:17PM +0200, Peter T. Breuer wrote:
> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> programmers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it
[...]
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...).

Recursive spinlocks only hide the problem - consider programmers who take lock B and recklessly call a subroutine that takes lock A followed by lock B. The resulting code will appear to work fine, but may have introduced a subtle AB-BA deadlock. I'd rather have a coding defect that reliably and consistently causes a deadlock than one that causes deadlocks in rare timing-related cases.

> A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock.

If the kernel invokes a callback with an obscure lock held (one that is promiscuous enough to be grabbed in other helper sub-routines), then it's probably a bug - why not just fix it?

-Kevin

--
------------------------------------------------------------------------
| Kevin O'Connor                     "BTW, IMHO we need a FAQ for      |
| kevin@koconnor.net                  'IMHO', 'FAQ', 'BTW', etc. !"    |
------------------------------------------------------------------------

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-18 17:24 ` Peter T. Breuer
2003-05-18 22:34 ` David Woodhouse
2003-05-19 2:05 ` Kevin O'Connor
@ 2003-05-19 6:19 ` Jan Hudec
2003-05-19 10:29 ` Helge Hafting
2003-05-19 14:28 ` Martin J. Bligh
4 siblings, 0 replies; 58+ messages in thread
From: Jan Hudec @ 2003-05-19 6:19 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel

On Sun, May 18, 2003 at 07:24:17PM +0200, Peter T. Breuer wrote:
> "A month of sundays ago William Lee Irwin III wrote:"
> > At some point in the past, Peter Breuer's attribution was removed from:
> > >> Here's a before-breakfast implementation of a recursive spinlock. That
> > >> is, the same thread can "take" the spinlock repeatedly.
> >
> > On Sun, May 18, 2003 at 09:30:17AM -0700, Martin J. Bligh wrote:
> > > Why?
> >
> > netconsole.
>
> That's a problem looking for a solution! No, the reason for wanting a
> recursive spinlock is that nonrecursive locks make programming harder.
>
> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> programmers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it
>
> The first method leads to an early death if the spinlock is a popular
> one, as the only thread that can release it doesn't seem to be running,
> errr..
>
> The second method is used by programmers who aren't aware that some
> obscure subroutine takes a spinlock, and who recklessly take a lock
> before calling a subroutine (the very thought sends shivers down my
> spine ...). A popular scenario involves not /knowing/ that your routine
> is called by the kernel with some obscure lock already held, and then
> calling a subroutine that calls the same obscure lock. The request
> function is one example, but that's hardly obscure (and in 2.5 the
> situation has eased there!).
>
> It's the case (b) that a recursive spinlock makes go away.

I thought we still have the spinlock debugging level 2, which DOES CHECK THIS. (It seems no one knows about it - I had to fix a trivial error when I used it in the user-mode arch, but it worked well then.) [Check means it gives a backtrace.]

> Hey, that's not bad for a small change! 50% of potential programming
> errors sent to the dustbin without ever being encountered.

They are errors. The appropriate response to errors is an OOPS.

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-18 17:24 ` Peter T. Breuer
` (2 preceding siblings ...)
2003-05-19 6:19 ` Jan Hudec
@ 2003-05-19 10:29 ` Helge Hafting
2003-05-19 11:37 ` Nikita Danilov
2003-05-19 14:28 ` Martin J. Bligh
4 siblings, 1 reply; 58+ messages in thread
From: Helge Hafting @ 2003-05-19 10:29 UTC (permalink / raw)
To: ptb; +Cc: linux-kernel

Peter T. Breuer wrote:
> Hey, that's not bad for a small change! 50% of potential programming
> errors sent to the dustbin without ever being encountered.

Then you replace errors with inefficiency - nobody discovers that you needlessly take a lock twice. They notice OOPSes though, and the lock gurus can then debug it.

Trading performance for simplicity is ok in some cases, but I have a strong feeling this isn't one of them. Consider how people optimize locking by shaving off a single cycle when they can, and try to avoid locking as much as possible for that big smp scalability.

This is something better done right - people should just take the trouble.

Helge Hafting

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-19 10:29 ` Helge Hafting
@ 2003-05-19 11:37 ` Nikita Danilov
2003-05-22 1:21 ` Daniel Phillips
0 siblings, 1 reply; 58+ messages in thread
From: Nikita Danilov @ 2003-05-19 11:37 UTC (permalink / raw)
To: Helge Hafting; +Cc: ptb, linux-kernel

Helge Hafting writes:
> Peter T. Breuer wrote:
>
> > Hey, that's not bad for a small change! 50% of potential programming
> > errors sent to the dustbin without ever being encountered.
>
> Then you replace errors with inefficiency - nobody discovers that
> you needlessly take a lock twice. They notice OOPSes though, the
> lock gurus can then debug it.
>
> Trading performance for simplicity is ok in some cases, but I have a strong
> feeling this isn't one of them. Consider how people optimize locking
> by shaving off a single cycle when they can, and try to avoid
> locking as much as possible for that big smp scalability.
>
> This is something better done right - people should just take the
> trouble.

There are, however, cases when recursive locking is needed. Take, for example, top-to-bottom insertion into a balanced tree with per-node locking. Once modifications are done at the "leaf" level, parents should be locked and modified, but one cannot tell in advance whether different leaves have the same or different parents. The simplest (and, sometimes, the only) solution here is to lock the parents of all children in turn, even if this may lock the same parent node several times---recursively.

> Helge Hafting

Nikita.

^ permalink raw reply [flat|nested] 58+ messages in thread
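The shared-parent scenario Nikita describes can be sketched in userspace C: two leaves whose parents may coincide, and a per-node owner/depth pair that absorbs the double acquisition. A pthread mutex stands in for the per-node spinlock, the thread id for the pid; the unprotected owner/depth check is simplified for a single-threaded demo and would need exactly the atomicity discussed elsewhere in this thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Model of per-node recursive locking: locking "the parent of each
 * modified leaf in turn" may take the same lock twice when leaves
 * share a parent. */

struct node {
    struct node *parent;
    pthread_t owner;
    int depth;                 /* recursion depth; 0 == unlocked */
    pthread_mutex_t m;
};

static void node_lock(struct node *n)
{
    /* Simplified check: fine single-threaded, racy in general. */
    if (n->depth > 0 && pthread_equal(n->owner, pthread_self())) {
        n->depth++;            /* already ours: just recurse */
        return;
    }
    pthread_mutex_lock(&n->m);
    n->owner = pthread_self();
    n->depth = 1;
}

static void node_unlock(struct node *n)
{
    if (--n->depth == 0)
        pthread_mutex_unlock(&n->m);
}

int tree_demo(void)
{
    struct node parent = { NULL, 0, 0, PTHREAD_MUTEX_INITIALIZER };
    struct node leaf_a = { &parent, 0, 0, PTHREAD_MUTEX_INITIALIZER };
    struct node leaf_b = { &parent, 0, 0, PTHREAD_MUTEX_INITIALIZER };
    int depth;

    /* Lock the parent of each modified leaf in turn; here both
     * leaves share one parent, so it gets locked twice.  With a
     * non-recursive lock the second call would deadlock. */
    node_lock(leaf_a.parent);
    node_lock(leaf_b.parent);
    depth = parent.depth;      /* 2: one acquisition per leaf */
    node_unlock(leaf_b.parent);
    node_unlock(leaf_a.parent);
    return depth;
}
```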
* Re: recursive spinlocks. Shoot.
2003-05-19 11:37 ` Nikita Danilov
@ 2003-05-22 1:21 ` Daniel Phillips
0 siblings, 0 replies; 58+ messages in thread
From: Daniel Phillips @ 2003-05-22 1:21 UTC (permalink / raw)
To: Nikita Danilov, Helge Hafting; +Cc: ptb, linux-kernel

On Monday 19 May 2003 13:37, Nikita Danilov wrote:
> There, however, are cases when recursive locking is needed. Take, for
> example, top-to-bottom insertion into balanced tree with per-node
> locking. Once modifications are done at the "leaf" level, parents should
> be locked and modified, but one cannot tell in advance whether different
> leaves have the same or different parents. Simplest (and, sometimes, the
> only) solution here is to lock parents of all children in turn, even if
> this may lock the same parent node several times---recursively.

Having locked the parent of one child, you then check whether the parent of the second child is in fact the same, in which case you already hold the lock and must remember to unlock it only once; otherwise lock the second parent. Do you see a hole in that?

Regards,

Daniel

^ permalink raw reply [flat|nested] 58+ messages in thread
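Daniel's alternative, sketched below in userspace C: instead of a recursive lock, the caller explicitly checks whether the second parent is the one already held, and locks and unlocks it only when it is genuinely different. Names are invented for illustration, and a real version locking two distinct parents would also need a fixed lock order (e.g. by address) to avoid the AB-BA deadlock Kevin warns about earlier in the thread.

```c
#include <pthread.h>

/* Explicit dedup instead of recursion: lock the second parent only
 * if it differs from the first, and remember to unlock it only once. */

struct pnode {
    pthread_mutex_t m;
};

int update_parents(struct pnode *p1, struct pnode *p2)
{
    int locks_taken = 1;

    pthread_mutex_lock(&p1->m);
    if (p2 != p1) {                 /* same parent? then already held */
        pthread_mutex_lock(&p2->m);
        locks_taken = 2;
    }

    /* ... modify both parents under the lock(s) ... */

    if (p2 != p1)
        pthread_mutex_unlock(&p2->m);
    pthread_mutex_unlock(&p1->m);
    return locks_taken;
}

int daniel_demo(void)
{
    struct pnode a = { PTHREAD_MUTEX_INITIALIZER };
    struct pnode b = { PTHREAD_MUTEX_INITIALIZER };

    /* Shared parent: one lock taken.  Distinct parents: two. */
    return update_parents(&a, &a) + update_parents(&a, &b);
}
```

The cost of this scheme is exactly the bookkeeping Daniel names: every unlock path must repeat the "was it the same parent?" test.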
* Re: recursive spinlocks. Shoot.
2003-05-18 17:24 ` Peter T. Breuer
` (3 preceding siblings ...)
2003-05-19 10:29 ` Helge Hafting
@ 2003-05-19 14:28 ` Martin J. Bligh
4 siblings, 0 replies; 58+ messages in thread
From: Martin J. Bligh @ 2003-05-19 14:28 UTC (permalink / raw)
To: ptb, William Lee Irwin III; +Cc: linux-kernel

> That's a problem looking for a solution! No, the reason for wanting a
> recursive spinlock is that nonrecursive locks make programming harder.

And more correct.

> Though I've got quite good at finding and removing deadlocks in my old
> age, there are still two popular ways that the rest of the world's
> prgrammers often shoot themselves in the foot with a spinlock:
>
> a) sleeping while holding the spinlock
> b) taking the spinlock in a subroutine while you already have it

So ... we should BUG() on both.

M.

^ permalink raw reply [flat|nested] 58+ messages in thread
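The "BUG() on it" position (also Jan's spinlock-debugging point above) amounts to an owner-tracking wrapper that aborts instead of recursing. A userspace sketch, with abort() standing in for BUG(); the names are invented and this is not the kernel's actual spinlock-debug code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Debug spinlock wrapper: record the owner and abort (the userspace
 * stand-in for BUG()) if the same thread tries to take it twice,
 * instead of silently recursing or silently deadlocking. */

struct dbg_lock {
    pthread_mutex_t m;
    pthread_t owner;
    int held;
};

static void dbg_lock_take(struct dbg_lock *l)
{
    if (l->held && pthread_equal(l->owner, pthread_self()))
        abort();                 /* recursive take: BUG() here */
    pthread_mutex_lock(&l->m);
    l->owner = pthread_self();
    l->held = 1;
}

static void dbg_lock_drop(struct dbg_lock *l)
{
    l->held = 0;
    pthread_mutex_unlock(&l->m);
}

int dbg_demo(void)
{
    struct dbg_lock l = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

    dbg_lock_take(&l);           /* fine                       */
    /* dbg_lock_take(&l);           would abort() immediately  */
    dbg_lock_drop(&l);
    return 1;
}
```

The appeal is the one Kevin states: the defect fails reliably at the first recursive take, rather than surfacing as a rare timing-dependent deadlock.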
* Re: recursive spinlocks. Shoot.
2003-05-18 9:21 recursive spinlocks. Shoot Peter T. Breuer
2003-05-18 16:30 ` Martin J. Bligh
@ 2003-05-18 18:13 ` Davide Libenzi
1 sibling, 0 replies; 58+ messages in thread
From: Davide Libenzi @ 2003-05-18 18:13 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: linux-kernel

On Sun, 18 May 2003, Peter T. Breuer wrote:
>
> Here's a before-breakfast implementation of a recursive spinlock. That
> is, the same thread can "take" the spinlock repeatedly. This is crude -
> I just want to focus some attention on the issue (while I go out and
> have breakfast :'E).
>
> The idea is to implement trylock correctly, and then get lock and
> unlock from that via the standard algebra. lock is a loop doing
> a trylock until it succeeds. We emerge from a successful trylock
> with the lock notionally held.
>
> The "spinlock" is a register of the current pid, plus a recursion
> counter, with atomic access. The pid is either -1 (unset, count is
> zero) or some decent value (count is positive).
>
> The trylock will succeed and set the pid if it is currently unset. It
> will succeed if the pid matches ours, and increment the count of
> holders.
>
> Unlock just decrements the count. When we've unlocked enough times,
> somebody else can take the lock.

A looong time ago I gave to someone a recursive spinlock implementation that they integrated in the USB code. I don't see it in the latest kernels, so I have to guess that they found a better solution to do their things. I'm biased to say that it must not be necessary to have the thing if you structure your code correctly.

- Davide

^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <20030518182010$0541@gated-at.bofh.it>]
* Re: recursive spinlocks. Shoot.
[not found] <20030518182010$0541@gated-at.bofh.it>
@ 2003-05-18 19:09 ` Peter T. Breuer
2003-05-18 19:31 ` Davide Libenzi
2003-05-19 20:47 ` Jan Hudec
0 siblings, 2 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-18 19:09 UTC (permalink / raw)
To: Davide Libenzi; +Cc: linux-kernel

In article <20030518182010$0541@gated-at.bofh.it> you wrote:
> On Sun, 18 May 2003, Peter T. Breuer wrote:
>> Here's a before-breakfast implementation of a recursive spinlock. That

> A looong time ago I gave to someone a recursive spinlock implementation
> that they integrated in the USB code. I don't see it in the latest
> kernels, so I have to guess that they found a better solution to do their
> things. I'm biased to say that it must not be necessary to have the thing
> if you structure your code correctly.

Well, you can get rid of anything that way. The question is whether the interface is an appropriate one to use or not - i.e. whether it makes for better code in general, or whether it makes errors of programming less likely.

I would argue that the latter is undoubtedly true - merely that userspace flock/fcntl works that way would argue for it, but there are a couple of other reasons too.

Going against is the point that it may be slower. Can you dig out your implementation and show me it? I wasn't going for assembler in my hasty example. I just wanted to establish that it's easy, so that it becomes known that it's easy, and folks therefore aren't afraid of it. That both you and I have had to write it implies that it's not obvious code to everyone.

Peter

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
2003-05-18 19:09 ` Peter T. Breuer
@ 2003-05-18 19:31 ` Davide Libenzi
2003-05-18 19:49 ` Peter T. Breuer
2003-05-19 20:47 ` Jan Hudec
1 sibling, 1 reply; 58+ messages in thread
From: Davide Libenzi @ 2003-05-18 19:31 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: Linux Kernel Mailing List

On Sun, 18 May 2003, Peter T. Breuer wrote:
> In article <20030518182010$0541@gated-at.bofh.it> you wrote:
> > On Sun, 18 May 2003, Peter T. Breuer wrote:
> > >> Here's a before-breakfast implementation of a recursive spinlock. That
> >
> > A looong time ago I gave to someone a recursive spinlock implementation
> > that they integrated in the USB code. I don't see it in the latest
> > kernels, so I have to guess that they found a better solution to do their
> > things. I'm biased to say that it must not be necessary to have the thing
> > if you structure your code correctly.
>
> Well, you can get rid of anything that way. The question is if the
> interface is an appropriate one to use or not - i.e. if it makes for
> better code in general, or if it make errors of programming less
> likely.
>
> I would argue that the latter is undoubtedly true - merely that
> userspace flock/fcntl works that way would argue for it, but there
> are a couple of other reasons too.
>
> Going against is the point that it may be slower. Can you dig out your
> implementation and show me it? I wasn't going for assembler in my hasty
> example. I just wanted to establish that it's easy, so that it becomes
> known that its easy, and folks therefore aren't afraid of it. That both
> you and I have had to write it implies that it's not obvious code to
> everyone.

It was something like this. The real code should be in 2.2.x I believe. The problem at the time was nested ( recursive ) spinlocks.
typedef struct s_nestlock {
	spinlock_t lock;
	void *uniq;
	atomic_t count;
} nestlock_t;

#define nestlock_init(snl) \
	do { \
		spin_lock_init(&(snl)->lock); \
		(snl)->uniq = NULL; \
		atomic_set(&(snl)->count, 0); \
	} while (0)

#define nestlock_irqlock(snl, flags) \
	do { \
		if ((snl)->uniq == current) { \
			atomic_inc(&(snl)->count); \
			flags = 0; /* No warnings */ \
		} else { \
			spin_lock_irqsave(&(snl)->lock, flags); \
			atomic_inc(&(snl)->count); \
			(snl)->uniq = current; \
		} \
	} while (0)

#define nestlock_irqunlock(snl, flags) \
	do { \
		if (atomic_dec_and_test(&(snl)->count)) { \
			(snl)->uniq = NULL; \
			spin_unlock_irqrestore(&(snl)->lock, flags); \
		} \
	} while (0)

#define nestlock_lock(snl) \
	do { \
		if ((snl)->uniq == current) { \
			atomic_inc(&(snl)->count); \
		} else { \
			spin_lock(&(snl)->lock); \
			atomic_inc(&(snl)->count); \
			(snl)->uniq = current; \
		} \
	} while (0)

#define nestlock_unlock(snl) \
	do { \
		if (atomic_dec_and_test(&(snl)->count)) { \
			(snl)->uniq = NULL; \
			spin_unlock(&(snl)->lock); \
		} \
	} while (0)

- Davide

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-18 19:31               ` Davide Libenzi
@ 2003-05-18 19:49                 ` Peter T. Breuer
  2003-05-18 20:13                   ` Davide Libenzi
  0 siblings, 1 reply; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-18 19:49 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Peter T. Breuer, Linux Kernel Mailing List

"Davide Libenzi wrote:"
> On Sun, 18 May 2003, Peter T. Breuer wrote:
> > Going against is the point that it may be slower. Can you dig out your
> > implementation and show me it? I wasn't going for assembler in my hasty
> > example. I just wanted to establish that it's easy, so that it becomes
>
> It was something like this. The real code should be in 2.2.x, I believe.
> The problem at the time was nested (recursive) spinlocks.
>
> typedef struct s_nestlock {
> 	spinlock_t lock;
> 	void *uniq;
> 	atomic_t count;
> } nestlock_t;

This is essentially the same struct as mine. I used the pid of the task,
where you use the address of the task. You use an atomic count, whereas
I used an ordinary int, guarded by the embedded spinlock.

> #define nestlock_lock(snl) \
> 	do { \
> 		if ((snl)->uniq == current) { \

That would be able to read uniq while it is being written by something
else (which it can, according to the code below). It needs protection.
Probably you want the (< 24 bit) pid instead, and to use atomic ops on
it.

> 			atomic_inc(&(snl)->count); \

OK, that's the same.

> 		} else { \
> 			spin_lock(&(snl)->lock); \
> 			atomic_inc(&(snl)->count); \
> 			(snl)->uniq = current; \

Hmm .. else we wait for the lock, and then set count and uniq, while
somebody else may have entered and be reading it :-). You exit with
the embedded lock held, while I used the lock only as a guard to make
the ops atomic.

> 		} \
> 	} while (0)
>
> #define nestlock_unlock(snl) \
> 	do { \
> 		if (atomic_dec_and_test(&(snl)->count)) { \
> 			(snl)->uniq = NULL; \
> 			spin_unlock(&(snl)->lock); \

That's OK.

> 		} \
> 	} while (0)

Well, it's not assembler either, but at least it's easily comparable
with the nonrecursive version. It's essentially got an extra if and
an inc in the lock. That's all. I suspect that the rwlock assembler
can be coerced to do the job.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-18 19:49                 ` Peter T. Breuer
@ 2003-05-18 20:13                   ` Davide Libenzi
  0 siblings, 0 replies; 58+ messages in thread
From: Davide Libenzi @ 2003-05-18 20:13 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: Linux Kernel Mailing List

On Sun, 18 May 2003, Peter T. Breuer wrote:

> This is essentially the same struct as mine. I used the pid of the task,
> where you use the address of the task. You use an atomic count, whereas
> I used an ordinary int, guarded by the embedded spinlock.
>
> > #define nestlock_lock(snl) \
> > 	do { \
> > 		if ((snl)->uniq == current) { \
>
> That would be able to read uniq while it is being written by something
> else (which it can, according to the code below). It needs protection.

No, it does not - look closer.

> > 			atomic_inc(&(snl)->count); \
>
> OK, that's the same.
>
> > 		} else { \
> > 			spin_lock(&(snl)->lock); \
> > 			atomic_inc(&(snl)->count); \
> > 			(snl)->uniq = current; \
>
> Hmm .. else we wait for the lock, and then set count and uniq, while
> somebody else may have entered and be reading it :-). You exit with

Nope - try to think of a case where it breaks. False negatives are not
possible, because uniq is set by the same task that tests it, and
neither are false positives.

> Well, it's not assembler either, but at least it's easily comparable
> with the nonrecursive version. It's essentially got an extra if and
> an inc in the lock. That's all.

Well, there's a little difference. In case of contention, you loop with
your custom trylock, while I used the optimized asm code inside
spin_lock. But again, I believe we didn't lose anything with the removal
of this code (nested/recursive locks) from the kernel.

- Davide

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-18 19:09             ` Peter T. Breuer
  2003-05-18 19:31               ` Davide Libenzi
@ 2003-05-19 20:47               ` Jan Hudec
  1 sibling, 0 replies; 58+ messages in thread
From: Jan Hudec @ 2003-05-19 20:47 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: Davide Libenzi, linux-kernel

On Sun, May 18, 2003 at 09:09:54PM +0200, Peter T. Breuer wrote:
> In article <20030518182010$0541@gated-at.bofh.it> you wrote:
> > On Sun, 18 May 2003, Peter T. Breuer wrote:
> >> Here's a before-breakfast implementation of a recursive spinlock. That
> >
> > A looong time ago I gave to someone a recursive spinlock implementation
> > that they integrated in the USB code. I don't see it in the latest
> > kernels, so I have to guess that they found a better solution to do their
> > things. I'm biased to say that it must not be necessary to have the thing
> > if you structure your code correctly.
>
> Well, you can get rid of anything that way. The question is if the
> interface is an appropriate one to use or not - i.e. if it makes for
> better code in general, or if it makes errors of programming less
> likely.

I dare to disagree. It makes for more messy code in general, and might
result in the obvious bugs being replaced by subtle ones that are far
harder to debug.

> I would argue that the latter is undoubtedly true - merely that
> userspace flock/fcntl works that way would argue for it, but there
> are a couple of other reasons too.

No ;-) Only fcntl does.

> Going against is the point that it may be slower. Can you dig out your
> implementation and show me it? I wasn't going for assembler in my hasty

It's also a waste of memory. There are many structures that have a lock
per instance, and four extra bytes (for the owner) would be noticeable.
Not that memory is so precious a resource, but cachelines may be.

> example. I just wanted to establish that it's easy, so that it becomes
> known that it's easy, and folks therefore aren't afraid of it. That both
> you and I have had to write it implies that it's not obvious code to
> everyone.

It's not about whether it's easy. It's about whether it's useful.

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 58+ messages in thread
[parent not found: <20030518202013$5297@gated-at.bofh.it>]
* Re: recursive spinlocks. Shoot.
  [not found] <20030518202013$5297@gated-at.bofh.it>
@ 2003-05-18 23:15 ` Peter T. Breuer
  2003-05-18 23:26   ` Davide Libenzi
  2003-05-19 20:22   ` Robert White
  0 siblings, 2 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-18 23:15 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel

In article <20030518202013$5297@gated-at.bofh.it> you wrote:
>>
>> > #define nestlock_lock(snl) \
>> > 	do { \
>> > 		if ((snl)->uniq == current) { \
>>
>> That would be able to read uniq while it is being written by something
>> else (which it can, according to the code below). It needs protection.

> No it does not, look better.

I'm afraid I only see that it does!

>> > 			atomic_inc(&(snl)->count); \
>> > 		} else { \
>> > 			spin_lock(&(snl)->lock); \
>> > 			atomic_inc(&(snl)->count); \
>> > 			(snl)->uniq = current; \
>>
>> Hmm .. else we wait for the lock, and then set count and uniq, while
>> somebody else may have entered and be reading it :-). You exit with

> Nope, think about a case where it breaks. False negatives are not possible
> because it is set by the same task, and false positives either.

No. This is not true. Imagine two threads, timed as follows ...

                                     .
                                     .
                                     .
                                     .
    if ((snl)->uniq == current) {
        atomic_inc(&(snl)->count);   .
    } else {                         .
        spin_lock(&(snl)->lock);     .
        atomic_inc(&(snl)->count);   .
        (snl)->uniq = current;  <->  if ((snl)->uniq == current) {
                                         atomic_inc(&(snl)->count);
                                     } else {
                                         spin_lock(&(snl)->lock);
                                         atomic_inc(&(snl)->count);
                                         (snl)->uniq = current;

There you are. One hits the read exactly at the time the other does a
write. Bang.

>> Well, it's not assembler either, but at least it's easily comparable
>> with the nonrecursive version. It's essentially got an extra if and
>> an inc in the lock. That's all.

> Well, there's a little difference. In case of contention, you loop with
> your custom try lock while I used the optimized asm code inside spin_lock.

> But again, I believe we didn't lose anything with the removal of this code
> (nested/recursive locks) from the kernel.

We lose hours of programmers' time, looking for deadlocks caused by
accidentally taking the same spinlock twice and not knowing it.

A question in my mind is whether a fault in a third thread, like
sleeping with a spinlock held, can make a recursive spinlock into a
fault causer ... no, I don't see any way.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-18 23:15 ` Peter T. Breuer
@ 2003-05-18 23:26   ` Davide Libenzi
  2003-05-19 12:48     ` Peter T. Breuer
  2003-05-19 20:22   ` Robert White
  1 sibling, 1 reply; 58+ messages in thread
From: Davide Libenzi @ 2003-05-18 23:26 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: Linux Kernel Mailing List

On Mon, 19 May 2003, Peter T. Breuer wrote:

> No. This is not true. Imagine two threads, timed as follows ...
>
>                                      .
>                                      .
>                                      .
>                                      .
>     if ((snl)->uniq == current) {
>         atomic_inc(&(snl)->count);   .
>     } else {                         .
>         spin_lock(&(snl)->lock);     .
>         atomic_inc(&(snl)->count);   .
>         (snl)->uniq = current;  <->  if ((snl)->uniq == current) {
>                                          atomic_inc(&(snl)->count);
>                                      } else {
>                                          spin_lock(&(snl)->lock);
>                                          atomic_inc(&(snl)->count);
>                                          (snl)->uniq = current;
>
> There you are. One hits the read exactly at the time the other does a
> write. Bang.

So, what's "bang" for you? The second task (the one that reads "uniq")
will either see "uniq" as NULL or as (task1)->current. And it'll go
acquiring the lock, as expected. Check it again ...

- Davide

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-18 23:26   ` Davide Libenzi
@ 2003-05-19 12:48     ` Peter T. Breuer
  2003-05-19 17:15       ` Davide Libenzi
  0 siblings, 1 reply; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-19 12:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Peter T. Breuer, Linux Kernel Mailing List

"A month of sundays ago Davide Libenzi wrote:"
> On Mon, 19 May 2003, Peter T. Breuer wrote:
>
> > No. This is not true. Imagine two threads, timed as follows ...
> >
> >                                      .
> >                                      .
> >                                      .
> >                                      .
> >     if ((snl)->uniq == current) {
> >         atomic_inc(&(snl)->count);   .
> >     } else {                         .
> >         spin_lock(&(snl)->lock);     .
> >         atomic_inc(&(snl)->count);   .
> >         (snl)->uniq = current;  <->  if ((snl)->uniq == current) {
> >                                          atomic_inc(&(snl)->count);
> >                                      } else {
> >                                          spin_lock(&(snl)->lock);
> >                                          atomic_inc(&(snl)->count);
> >                                          (snl)->uniq = current;
> >
> > There you are. One hits the read exactly at the time the other does a
> > write. Bang.
>
> So, what's "bang" for you? The second task (the one that reads "uniq")
> will either see "uniq" as NULL or as (task1)->current. And it'll go
> acquiring the lock, as expected. Check it again ...

Perhaps I should expand on my earlier answer ...

(1) While, with some luck, writing may be atomic on ia32 (and I'm not
sure it is - I'm only prepared to guarantee it for the lower bits, and I
really don't know about zeroing the carry and so on), I actually doubt
that reading is atomic, or we wouldn't need the atomic_read
construction!

(2) I'm not prepared to bet that one either sees the answer from before
the write was done, or the answer after it is done. I would suspect
that one can get anything, including an explosion or a reading of the
works of Tolstoy ...

(3) Even if one gets either one or the other answer, one of them would
be the wrong answer, and you clearly intend the atomic_inc of the
counter to be done in the same atomic region as the setting to current,
which would be a programming hypothesis that is broken when the wrong
answer comes up.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-19 12:48     ` Peter T. Breuer
@ 2003-05-19 17:15       ` Davide Libenzi
  2003-05-19 17:27         ` Peter T. Breuer
  2003-05-19 19:51         ` Peter T. Breuer
  0 siblings, 2 replies; 58+ messages in thread
From: Davide Libenzi @ 2003-05-19 17:15 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: Linux Kernel Mailing List

On Mon, 19 May 2003, Peter T. Breuer wrote:

> Perhaps I should expand on my earlier answer ...
>
> (1) While, with some luck, writing may be atomic on ia32 (and I'm not
> sure it is - I'm only prepared to guarantee it for the lower bits, and I
> really don't know about zeroing the carry and so on), I actually doubt
> that reading is atomic, or we wouldn't need the atomic_read
> construction!

Look at atomic_read:

	$ emacs `find /usr/src/linux/include -name atomic.h | xargs`

> (3) Even if one gets either one or the other answer, one of them would
> be the wrong answer, and you clearly intend the atomic_inc of the
> counter to be done in the same atomic region as the setting to current,
> which would be a programming hypothesis that is broken when the wrong
> answer comes up.

Atomic inc/dec/add/sub are different from reading/writing an aligned
sizeof(int) memory location. An aligned sizeof(int) read/write must be
atomic by the bare-bone CPU memory coherency. While inc/dec/add/sub are
(or at least can be) split into MEMOP(load) -> ALUOP(?) -> MEMOP(store),
whose full cycle is not guaranteed to be atomic, loads/stores of an
aligned sizeof(int) memory location are not split on any CPU whose
designer was not drunk during the architectural phase. Or better: if
they are split due to some sort of HW limitation, the HW itself has to
guarantee the atomicity of the operation. Intel also tries to guarantee
the atomicity of non-aligned loads/stores by doing split locks on the
memory bus. If you have:

	int a;

	thread1:
		for (;;)
			a = 1;

	thread2:
		for (;;)
			a = -1;

with "a" an aligned memory location, a third thread reading "a" reads
either 1 or -1. Going back to the (doubtfully useful) code, you still
have to point out where it does bang ...

- Davide

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-19 17:15       ` Davide Libenzi
@ 2003-05-19 17:27         ` Peter T. Breuer
  2003-05-19 17:57           ` Alan Cox
  2003-05-19 19:51         ` Peter T. Breuer
  1 sibling, 1 reply; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-19 17:27 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Peter T. Breuer, Linux Kernel Mailing List

> > (1) While, with some luck, writing may be atomic on ia32 (and I'm not
> > sure it is - I'm only prepared to guarantee it for the lower bits, and I
> > really don't know about zeroing the carry and so on), I actually doubt
> > that reading is atomic, or we wouldn't need the atomic_read
> > construction!
>
> Look at atomic_read:
>
> 	$ emacs `find /usr/src/linux/include -name atomic.h | xargs`

:-). Well, whaddya know. Both read and write of an int (declared
volatile) are atomic on ia32.

> > (3) Even if one gets either one or the other answer, one of them would
> > be the wrong answer, and you clearly intend the atomic_inc of the
> > counter to be done in the same atomic region as the setting to current,
> > which would be a programming hypothesis that is broken when the wrong
> > answer comes up.

[snip atomicity of read/write]

> with "a" an aligned memory location, a third thread reading "a" reads
> either 1 or -1. Going back to the (doubtfully useful) code, you still
> have to point out where it does bang ...

OK. I'll have a look later.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-19 17:27         ` Peter T. Breuer
@ 2003-05-19 17:57           ` Alan Cox
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Cox @ 2003-05-19 17:57 UTC (permalink / raw)
  To: ptb; +Cc: Davide Libenzi, Linux Kernel Mailing List

On Llu, 2003-05-19 at 18:27, Peter T. Breuer wrote:
> :-). Well, whaddya know. Both read and write of an int (declared
> volatile) are atomic on ia32.

Except when they aren't. There are PPro SMP issues about the ordering
of unlocked stores in some situations. That's why the kernel generates

	lock; movb $0, foo

for the unlock on PPro. Fixable, however, by PPro-specific lock ops.

^ permalink raw reply	[flat|nested] 58+ messages in thread
* Re: recursive spinlocks. Shoot.
  2003-05-19 17:15       ` Davide Libenzi
  2003-05-19 17:27         ` Peter T. Breuer
@ 2003-05-19 19:51         ` Peter T. Breuer
  1 sibling, 0 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-19 19:51 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Peter T. Breuer, Linux Kernel Mailing List

"Davide Libenzi wrote:"
> On Mon, 19 May 2003, Peter T. Breuer wrote:
> > > > No. This is not true. Imagine two threads, timed as follows ...
> > > >
> > > >     if ((snl)->uniq == current) {
> > > >         atomic_inc(&(snl)->count);   .
> > > >     } else {                         .
> > > >         spin_lock(&(snl)->lock);     .
> > > >         atomic_inc(&(snl)->count);   .
> > > >         (snl)->uniq = current;  <->  if ((snl)->uniq == current) {
> > > >                                          atomic_inc(&(snl)->count);
> > > >                                      } else {
> > > >                                          spin_lock(&(snl)->lock);
> > > >                                          atomic_inc(&(snl)->count);
> > > >                                          (snl)->uniq = current;
>
> > > So, what's "bang" for you? The second task (the one that reads "uniq")
> > > will either see "uniq" as NULL or as (task1)->current. And it'll go
> > > acquiring the lock, as expected. Check it again ...

Let's take that as hypothetically true.

> > (3) Even if one gets either one or the other answer, one of them would
> > be the wrong answer, and you clearly intend the atomic_inc of the
> > counter to be done in the same atomic region as the setting to current,
> > which would be a programming hypothesis that is broken when the wrong
> > answer comes up.

> with "a" an aligned memory location, a third thread reading "a" reads
> either 1 or -1. Going back to the (doubtfully useful) code, you still
> have to point out where it does bang ...

The "unexpected" situation is when the RH process reads the old value
of uniq, just before the LH process sets it. Let's suppose that it
erroneously reads NULL, since that's what was set last, at the unlock
that lets the LH spinlock open up. That's OK, because it's not equal to
its own task identifier either, and that's the only special thing it's
looking for.

Umm ... actually, the RH process could read uniq before the LH process
entered the spinlock. If it were at that time equal to its own task
identifier, then chaos would ensue. But if it were so equal, then the
RH process would already have the spinlock, since it's taken before
setting uniq.

OK, I agree - there's no problem if the read doesn't blow up.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
* RE: recursive spinlocks. Shoot.
  2003-05-18 23:15 ` Peter T. Breuer
  2003-05-18 23:26   ` Davide Libenzi
@ 2003-05-19 20:22   ` Robert White
  1 sibling, 0 replies; 58+ messages in thread
From: Robert White @ 2003-05-19 20:22 UTC (permalink / raw)
  To: Peter T. Breuer, Davide Libenzi; +Cc: linux-kernel

Actually, it is safe to have the concurrent/race condition between the
read of a lock owner (outside the lock) and the write of a lock owner
(inside the lock), as long as the "current owner" value stored inside
the lock is "sent away" before the recursive lock is unlocked for the
last time. The owner value doesn't even have to be atomic (though it
ought to be).

The only "hard" requirement is that there is no mid-update value that
can ever be somebody else. This requirement can be violated in some odd
segmenting systems if "me" is a pointer. If "me" is a machine-primitive
scalar, the probability drops to nearly null that a partially written
value can match anybody else (on an x86 platform, you toss a bus-lock
around the assignment, which is what, I believe, the atomic type
assignments do).

Since the logic reads "if not me then wait for lock, else increment
depth", you only have to guard against false positives. That guard is
accomplished by making sure you don't ever leave "yourself" in the
owner slot when you free up the lock.

So generically:

	get_lock():
		if owner == me then
			++depth
		else
			wait on actual lock primitive
			possibly_atomic_assign(owner = me)
			depth = 1
		fi

	free_lock():
		/* optional debug test */
		if owner != me then oops() fi
		if depth > 1 then
			--depth
		else
			depth = 0
			possibly_atomic_assign(owner = nobody)
			release actual lock primitive
		fi

There must be a "nobody" (a well-defined quasi-legal value that cannot
occur in the legitimate set of owners, like -1 or NULL) assigned to the
owner item, so that you cannot encounter your own id in the owner slot
when you don't actually have the lock primitive.

At several glances this structure seems odd and somehow vulnerable, but
when you consider that a thread is never actually competing with
itself, you get closure on the system. That is, once you really
internalize the fact that the same thread cannot be multiply entrant in
both of these routines at the same time, you see that all other threads
will do a proper wait on the "real" primitive lock, because of
conditional exclusion: they are never the owner at the time of the
test, no matter how you update owner in the sequence, if they are, in
fact, not the owner of the lock.

Hitting the read and write at the same time is not a "bang", because
(if the rest of the code is correct) the value being read, even if it
is "trash" because of a partial update, will not let the reader into
the "is me" part of the logic, and the not-me branch is a wait for the
"real" spinlock. Again, you might want to use atomic_read() and
atomic_set(); then again, according to the headers you are only
guaranteed 24 bits of atomic protection anyway, so if the owner is a
pointer, it probably won't help that much.

The finer points include the fact that the depth and owner values are
actually protected values inside the domain of the real lock primitive.
They are just as safe as any other value protected by the lock. The
inequality test "outside" the lock is actually a microcosm of the
readers/writer kind of behavior.

"Atomicizing" the owner value armors against the possibility of a CPU
reading a half-written value that "happens" to match the enquiring
thread ID. But since the only transition that can occur is from some
existent owner to "nobody", or from "nobody" to the new owner, a
well-chosen "nobody" can make the atomic assignment moot anyway.

I use this recursion layer all the time with posix mutex(es) in
application layer code, where I don't actually have an atomic_t to
begin with. It is actually provable.

Rob.
^ permalink raw reply	[flat|nested] 58+ messages in thread
[parent not found: <20030520231013$3d77@gated-at.bofh.it>]
* Re: recursive spinlocks. Shoot.
  [not found] <20030520231013$3d77@gated-at.bofh.it>
@ 2003-05-21 14:16 ` Peter T. Breuer
  0 siblings, 0 replies; 58+ messages in thread
From: Peter T. Breuer @ 2003-05-21 14:16 UTC (permalink / raw)
  To: Robert White; +Cc: linux-kernel

In article <20030520231013$3d77@gated-at.bofh.it> you wrote:
> From: Richard B. Johnson [mailto:root@chaos.analogic.com]
>> The lock must guarantee that recursion is not allowed. Or to put
>> it more bluntly, the lock must result in a single thread of execution
>> through the locked object.

> Where the HECK do you get that load of bull? The one requirement of a
> MUTUAL EXCLUSION PRIMITIVE (a.k.a. mutex, a.k.a. lock) is *MUTUAL*
> EXCLUSIVITY.

:-) this reminds me of the post I didn't make.

> All else is unfounded blither.

Hey, that was the word I used in the post I didn't make! I'm sort of
glad I didn't say it out loud - it ruins the argument.

Anyhow, I have now written a tool that can detect various improper uses
of locks. I've started out by detecting sleep under spinlock, but I'll
go on to other deadlocks.

	% ./c ../dbr/1/sbull..c
	<function name=sbull_ioctl file=sbull.c line=171 attr=S_SLEEP>
	<function name=sbull_request file=sbull.c line=361 attr=S_SLEEP>
	%

It detects two functions in the target file that sleep. For that it had
to analyse 28K lines of C. Put a spinlock in the wrong place and ...

	% ./c test16
	sleep under spinlock at file="test16" line=30
	<function name=sbull_open file=test16 line=1 attr=S_SLEEP>

(I excised a function to fiddle with, and put a call to one of the
other two in under a lock.)

I'll make this available as soon as I figure out some more of how to
define C contexts. It uses a database, and I'm having trouble deciding
where something is first defined in C.

Peter

^ permalink raw reply	[flat|nested] 58+ messages in thread
end of thread, other threads:[~2003-05-28 16:29 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-18 9:21 recursive spinlocks. Shoot Peter T. Breuer
2003-05-18 16:30 ` Martin J. Bligh
2003-05-18 16:35 ` William Lee Irwin III
2003-05-18 16:49 ` Arjan van de Ven
2003-05-18 16:54 ` William Lee Irwin III
2003-05-18 17:14 ` Martin J. Bligh
2003-05-18 17:24 ` Peter T. Breuer
2003-05-18 22:34 ` David Woodhouse
2003-05-19 13:37 ` Peter T. Breuer
2003-05-19 13:45 ` Jens Axboe
2003-05-19 13:47 ` Arjan van de Ven
[not found] ` <mailman.1053352200.24653.linux-kernel2news@redhat.com>
2003-05-19 23:54 ` Pete Zaitcev
2003-05-20 0:03 ` viro
2003-05-20 0:03 ` Johannes Erdfelt
2003-05-20 3:12 ` Robert White
2003-05-20 11:59 ` Helge Hafting
2003-05-20 12:23 ` Richard B. Johnson
2003-05-20 21:05 ` Robert White
2003-05-20 21:42 ` Richard B. Johnson
2003-05-20 23:06 ` Robert White
2003-05-21 14:01 ` Richard B. Johnson
2003-05-21 21:56 ` Robert White
2003-05-22 0:13 ` viro
2003-05-22 0:32 ` Robert White
2003-05-22 0:46 ` Carl-Daniel Hailfinger
2003-05-21 5:48 ` Nikita Danilov
2003-05-22 1:00 ` Rik van Riel
2003-05-22 3:11 ` Robert White
2003-05-22 4:04 ` Nick Piggin
2003-05-22 4:42 ` Peter T. Breuer
2003-05-22 5:09 ` Nick Piggin
2003-05-23 0:19 ` Robert White
2003-05-23 7:22 ` Nikita Danilov
2003-05-23 9:07 ` Helge Hafting
2003-05-23 12:18 ` William Lee Irwin III
2003-05-24 2:39 ` Robert White
2003-05-28 16:50 ` Timothy Miller
2003-05-19 2:05 ` Kevin O'Connor
2003-05-19 6:19 ` Jan Hudec
2003-05-19 10:29 ` Helge Hafting
2003-05-19 11:37 ` Nikita Danilov
2003-05-22 1:21 ` Daniel Phillips
2003-05-19 14:28 ` Martin J. Bligh
2003-05-18 18:13 ` Davide Libenzi
[not found] <20030518182010$0541@gated-at.bofh.it>
2003-05-18 19:09 ` Peter T. Breuer
2003-05-18 19:31 ` Davide Libenzi
2003-05-18 19:49 ` Peter T. Breuer
2003-05-18 20:13 ` Davide Libenzi
2003-05-19 20:47 ` Jan Hudec
[not found] <20030518202013$5297@gated-at.bofh.it>
2003-05-18 23:15 ` Peter T. Breuer
2003-05-18 23:26 ` Davide Libenzi
2003-05-19 12:48 ` Peter T. Breuer
2003-05-19 17:15 ` Davide Libenzi
2003-05-19 17:27 ` Peter T. Breuer
2003-05-19 17:57 ` Alan Cox
2003-05-19 19:51 ` Peter T. Breuer
2003-05-19 20:22 ` Robert White
[not found] <20030520231013$3d77@gated-at.bofh.it>
2003-05-21 14:16 ` Peter T. Breuer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox