* rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
@ 2006-07-30 4:36 Oleg Nesterov
2006-07-30 22:23 ` Steven Rostedt
2006-08-01 7:58 ` Thomas Gleixner
0 siblings, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2006-07-30 4:36 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, Steven Rostedt, Esben Nielsen; +Cc: linux-kernel
I am trying to get some understanding of rt_mutex, but I'm afraid it's
not possible without your help...
	// runs on CPU 0
	rt_mutex_slowlock:

		// main loop
		for (;;) {
			...
			if (timeout && !timeout->task) {
				ret = -ETIMEDOUT;
				break;
			}
			...
			schedule();
			...
			set_current_state(state);
		}
What if timeout->timer fires on CPU 1 right before set_current_state()?
hrtimer_wakeup() does:
	timeout->task = NULL;		<----- [1]

	spin_lock(runqueues->lock);

	task->state = TASK_RUNNING;	<----- [2]

	(task->array != NULL, so try_to_wake_up() just goes to out_running)
If my understanding is correct, [1] may slip into the critical section (because
spin_lock() is not a wmb), so that CPU 0 will see [2] but not [1]. In that
case rt_mutex_slowlock() can miss the timeout (the mb() in set_current_state()
doesn't help).
Of course, this race (even _if_ I am right) is purely theoretical, but we
probably need an smp_wmb() after [1] in hrtimer_wakeup().
Note that do_nanosleep() is ok: hrtimer_base->lock provides the necessary
serialization. In fact, I think it can use __set_current_state(), because
both hrtimer_start() and run_hrtimer_queue() lock/unlock base->lock.
Another question: task_blocks_on_rt_mutex() does get_task_struct() and checks
owner->pi_blocked_on != NULL under owner->pi_lock. Why? The RT_MUTEX_HAS_WAITERS
bit is set and we are holding ->wait_lock, so the 'owner' can't go away until
we drop ->wait_lock. I think we can drop owner->pi_lock right after
__rt_mutex_adjust_prio(owner); we can't miss owner->pi_blocked_on != NULL
if it was true before we took owner->pi_lock, and that is the case we should
worry about, yes?
In other words (because I myself can't parse the paragraph above :), could
you explain to me why this patch is not correct:
--- rtmutex.c~	2006-07-30 05:15:38.000000000 +0400
+++ rtmutex.c	2006-07-30 05:41:44.000000000 +0400
@@ -407,7 +407,7 @@ static int task_blocks_on_rt_mutex(struc
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *top_waiter = waiter;
 	unsigned long flags;
-	int boost = 0, res;
+	int res;

 	spin_lock_irqsave(&current->pi_lock, flags);
 	__rt_mutex_adjust_prio(current);
@@ -431,24 +431,20 @@ static int task_blocks_on_rt_mutex(struc
 		plist_add(&waiter->pi_list_entry, &owner->pi_waiters);

 		__rt_mutex_adjust_prio(owner);
-		if (owner->pi_blocked_on) {
-			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
 		spin_unlock_irqrestore(&owner->pi_lock, flags);
+
+		if (owner->pi_blocked_on)
+			goto boost;
 	}
 	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
-		spin_lock_irqsave(&owner->pi_lock, flags);
-		if (owner->pi_blocked_on) {
-			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
-		spin_unlock_irqrestore(&owner->pi_lock, flags);
+		if (owner->pi_blocked_on)
+			goto boost;
 	}
-	if (!boost)
-		return 0;
+
+	return 0;
+boost:
+	/* gets dropped in rt_mutex_adjust_prio_chain()! */
+	get_task_struct(owner);

 	spin_unlock(&lock->wait_lock);
----------------------------------------------------------
The same question for remove_waiter()/rt_mutex_adjust_pi().
The last (stupid) one,
	wake_up_new_task:

		if (unlikely(!current->array))
			__activate_task(p, rq);
(offtopic) Is it really possible to have current->array == NULL here?
		else {
			p->prio = current->prio;
What if current was pi-boosted so that rt_prio(current->prio) == 1?
Who will de-boost the child?
			p->normal_prio = current->normal_prio;
Why? p->normal_prio was already calculated by effective_prio() above; could
you explain why that value is not ok?
Thanks,
Oleg.
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Steven Rostedt @ 2006-07-30 22:23 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Thomas Gleixner, Esben Nielsen, linux-kernel

On Sun, 2006-07-30 at 08:36 +0400, Oleg Nesterov wrote:
> I am trying to get some understanding of rt_mutex, but I'm afraid it's
> not possible without your help...

Well, I'll give it a try :)

> 	// runs on CPU 0
> 	rt_mutex_slowlock:
>
> 		// main loop
> 		for (;;) {
> 			...
> 			if (timeout && !timeout->task) {
> 				ret = -ETIMEDOUT;
> 				break;
> 			}
> 			...
> 			schedule();
> 			...
> 			set_current_state(state);
> 		}
>
> What if timeout->timer fires on CPU 1 right before set_current_state()?
>
> hrtimer_wakeup() does:
>
> 	timeout->task = NULL;		<----- [1]
>
> 	spin_lock(runqueues->lock);
>
> 	task->state = TASK_RUNNING;	<----- [2]
>
> 	(task->array != NULL, so try_to_wake_up() just goes to out_running)
>
> If my understanding is correct, [1] may slip into the critical section
> (because spin_lock() is not a wmb), so that CPU 0 will see [2] but not
> [1]. In that case rt_mutex_slowlock() can miss the timeout (the mb() in
> set_current_state() doesn't help).

Hmm, that spin_lock and task->state = TASK_RUNNING is in try_to_wake_up,
and at the bottom too.  I don't think there's a CPU that will slip a
write after a call to a function and several branches.  I guess it could
happen.  I'm not an expert here.  We're only dealing with SMP issues
here, and not worried about PCI bus write outs or anything else.

Does anyone know of an SMP machine that could have a CPU do a write to
memory and postpone it till after several branches?

> Of course, this race (even _if_ I am right) is purely theoretical, but
> we probably need an smp_wmb() after [1] in hrtimer_wakeup().

Theoretically, you may be right, and I'm not sure how much a smp_wmb
would hurt to prevent a theoretical problem.

> Note that do_nanosleep() is ok: hrtimer_base->lock provides the
> necessary serialization. In fact, I think it can use
> __set_current_state(), because both hrtimer_start() and
> run_hrtimer_queue() lock/unlock base->lock.

Hmm, that's probably true too.  But this really is a micro-optimization,
since do_nanosleep isn't called that often (it's a syscall and not
called at every schedule or interrupt).

> Another question: task_blocks_on_rt_mutex() does get_task_struct() and
> checks owner->pi_blocked_on != NULL under owner->pi_lock. Why? The
> RT_MUTEX_HAS_WAITERS bit is set and we are holding ->wait_lock, so the
> 'owner' can't go away until we drop ->wait_lock.

That's probably true that the owner can't disappear before we let go of
the wait_lock, since the owner should not be disappearing while holding
locks.  But you are missing the point of why we are grabbing the
pi_lock.  We are preventing a race that can have us do unneeded work
(see below).

> I think we can drop owner->pi_lock right after
> __rt_mutex_adjust_prio(owner); we can't miss owner->pi_blocked_on !=
> NULL if it was true before we took owner->pi_lock, and that is the case
> we should worry about, yes?
>
> In other words (because I myself can't parse the paragraph above :),
> could you explain to me why this patch is not correct:
>
> --- rtmutex.c~	2006-07-30 05:15:38.000000000 +0400
> +++ rtmutex.c	2006-07-30 05:41:44.000000000 +0400
> @@ -407,7 +407,7 @@ static int task_blocks_on_rt_mutex(struc
>  	struct task_struct *owner = rt_mutex_owner(lock);
>  	struct rt_mutex_waiter *top_waiter = waiter;
>  	unsigned long flags;
> -	int boost = 0, res;
> +	int res;
>
>  	spin_lock_irqsave(&current->pi_lock, flags);
>  	__rt_mutex_adjust_prio(current);
> @@ -431,24 +431,20 @@ static int task_blocks_on_rt_mutex(struc
>  		plist_add(&waiter->pi_list_entry, &owner->pi_waiters);
>
>  		__rt_mutex_adjust_prio(owner);
> -		if (owner->pi_blocked_on) {
> -			boost = 1;
> -			/* gets dropped in rt_mutex_adjust_prio_chain()! */
> -			get_task_struct(owner);
> -		}
>  		spin_unlock_irqrestore(&owner->pi_lock, flags);
> +
> +		if (owner->pi_blocked_on)
> +			goto boost;

OK, this is a little complicated.  We might have just upped the owner's
prio with the __rt_mutex_adjust_prio(owner).  Now we want to know if the
owner is blocked on a lock or not.  If it isn't at this moment, we don't
want to add the extra time of calling rt_mutex_adjust_prio_chain.

If we do the owner->pi_blocked_on test after letting go of the
owner->pi_lock, the owner could then become blocked on a lock.  This
isn't a bug in logic, that is, it doesn't break the code, but since the
owner has already been boosted, when it blocks it will do the work of
rt_mutex_adjust_prio_chain too.  So we run the risk of calling
rt_mutex_adjust_prio_chain when we really didn't need to, because the
owner already had its prio boosted when it blocked.

Make sense?

> 	}
> 	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
> -		spin_lock_irqsave(&owner->pi_lock, flags);
> -		if (owner->pi_blocked_on) {
> -			boost = 1;
> -			/* gets dropped in rt_mutex_adjust_prio_chain()! */
> -			get_task_struct(owner);
> -		}
> -		spin_unlock_irqrestore(&owner->pi_lock, flags);
> +		if (owner->pi_blocked_on)
> +			goto boost;
> 	}
> -	if (!boost)
> -		return 0;
> +
> +	return 0;
> +boost:
> +	/* gets dropped in rt_mutex_adjust_prio_chain()! */
> +	get_task_struct(owner);

So the protection was probably more on that boost = 1, and if we _are_
going to boost then we should grab the lock.  But I think you are right
that the get_task_struct is already protected by the wait_lock, so
perhaps this patch would work:

Index: linux-2.6.18-rc2/kernel/rtmutex.c
===================================================================
--- linux-2.6.18-rc2.orig/kernel/rtmutex.c	2006-07-30 18:04:12.000000000 -0400
+++ linux-2.6.18-rc2/kernel/rtmutex.c	2006-07-30 18:07:08.000000000 -0400
@@ -433,25 +433,26 @@ static int task_blocks_on_rt_mutex(struc
 		plist_add(&waiter->pi_list_entry, &owner->pi_waiters);

 		__rt_mutex_adjust_prio(owner);
-		if (owner->pi_blocked_on) {
+		if (owner->pi_blocked_on)
 			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
 		spin_unlock_irqrestore(&owner->pi_lock, flags);
 	}
 	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
 		spin_lock_irqsave(&owner->pi_lock, flags);
-		if (owner->pi_blocked_on) {
+		if (owner->pi_blocked_on)
 			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
 		spin_unlock_irqrestore(&owner->pi_lock, flags);
 	}
 	if (!boost)
 		return 0;

+	/*
+	 * The owner can't disappear while holding a lock,
+	 * so the owner struct is protected by wait_lock.
+	 * Gets dropped in rt_mutex_adjust_prio_chain()!
+	 */
+	get_task_struct(owner);
+
 	spin_unlock(&lock->wait_lock);

 	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter,

> 	spin_unlock(&lock->wait_lock);
>
> ----------------------------------------------------------
> The same question for remove_waiter()/rt_mutex_adjust_pi().

Same answer. :)

> The last (stupid) one,

Well, the other questions were definitely not stupid!

> 	wake_up_new_task:
>
> 		if (unlikely(!current->array))
> 			__activate_task(p, rq);
>
> (offtopic) Is it really possible to have current->array == NULL here?

You know, I think you're right.  I don't think current->array can ever
be NULL outside of schedule.  And here we are even holding the rq locks!
Perhaps this can happen on startup.  Is the init thread on any run queue
on system boot up?

> 		else {
> 			p->prio = current->prio;
>
> What if current was pi-boosted so that rt_prio(current->prio) == 1?
> Who will de-boost the child?

Good point!  Perhaps this can cause a prio leak.

> 			p->normal_prio = current->normal_prio;
>
> Why? p->normal_prio was already calculated by effective_prio() above;
> could you explain why that value is not ok?

I don't know this answer.  So I can't help you here.

Seems that these were not "stupid" questions after all. ;)

-- Steve
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Oleg Nesterov @ 2006-08-01 0:12 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Ingo Molnar, Thomas Gleixner, Esben Nielsen, linux-kernel

On 07/30, Steven Rostedt wrote:
>
> On Sun, 2006-07-30 at 08:36 +0400, Oleg Nesterov wrote:
> >
> > Another question: task_blocks_on_rt_mutex() does get_task_struct() and
> > checks owner->pi_blocked_on != NULL under owner->pi_lock. Why? The
> > RT_MUTEX_HAS_WAITERS bit is set and we are holding ->wait_lock, so the
> > 'owner' can't go away until we drop ->wait_lock.
>
> That's probably true that the owner can't disappear before we let go of
> the wait_lock, since the owner should not be disappearing while holding
> locks.  But you are missing the point of why we are grabbing the
> pi_lock.  We are preventing a race that can have us do unneeded work
> (see below).

Yes, I see. But ...

> ===================================================================
> --- linux-2.6.18-rc2.orig/kernel/rtmutex.c	2006-07-30 18:04:12.000000000 -0400
> +++ linux-2.6.18-rc2/kernel/rtmutex.c	2006-07-30 18:07:08.000000000 -0400
> @@ -433,25 +433,26 @@ static int task_blocks_on_rt_mutex(struc
> ...
>  	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
>  		spin_lock_irqsave(&owner->pi_lock, flags);
> -		if (owner->pi_blocked_on) {
> +		if (owner->pi_blocked_on)
>  			boost = 1;
> -			/* gets dropped in rt_mutex_adjust_prio_chain()! */
> -			get_task_struct(owner);
> -		}
>  		spin_unlock_irqrestore(&owner->pi_lock, flags);

In that case ->pi_lock can't buy anything. With or without ->pi_lock this
check is equally racy, so spin_lock() only adds unneeded work?

Thanks!

Oleg.
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Steven Rostedt @ 2006-07-31 20:47 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Thomas Gleixner, Esben Nielsen, linux-kernel

On Tue, 2006-08-01 at 04:12 +0400, Oleg Nesterov wrote:
> On 07/30, Steven Rostedt wrote:
> >
> > That's probably true that the owner can't disappear before we let go
> > of the wait_lock, since the owner should not be disappearing while
> > holding locks.  But you are missing the point of why we are grabbing
> > the pi_lock.  We are preventing a race that can have us do unneeded
> > work (see below).
>
> Yes, I see. But ...
>
> > ===================================================================
> > --- linux-2.6.18-rc2.orig/kernel/rtmutex.c	2006-07-30 18:04:12.000000000 -0400
> > +++ linux-2.6.18-rc2/kernel/rtmutex.c	2006-07-30 18:07:08.000000000 -0400
> > @@ -433,25 +433,26 @@ static int task_blocks_on_rt_mutex(struc
> > ...
> >  	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
> >  		spin_lock_irqsave(&owner->pi_lock, flags);
> > -		if (owner->pi_blocked_on) {
> > +		if (owner->pi_blocked_on)
> >  			boost = 1;
> > -			/* gets dropped in rt_mutex_adjust_prio_chain()! */
> > -			get_task_struct(owner);
> > -		}
> >  		spin_unlock_irqrestore(&owner->pi_lock, flags);
>
> In that case ->pi_lock can't buy anything. With or without ->pi_lock
> this check is equally racy, so spin_lock() only adds unneeded work?

Crap!  I just did a blind change there.  The first one does matter, but
this is for debugging.

Hmm, actually I would just remove the owner-blocked check altogether and
do the boost = 1 to force the chain walk.  It's for debugging anyway.
So that probably could just look something like this:

	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock))
		boost = 1;

The "boost" here is a misnomer.  It probably would be better to call it
"walk" or "chain_walk".

-- Steve
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Thomas Gleixner @ 2006-08-01 7:58 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Steven Rostedt, Esben Nielsen, linux-kernel

On Sun, 2006-07-30 at 08:36 +0400, Oleg Nesterov wrote:
> Another question: task_blocks_on_rt_mutex() does get_task_struct() and
> checks owner->pi_blocked_on != NULL under owner->pi_lock. Why? The
> RT_MUTEX_HAS_WAITERS bit is set and we are holding ->wait_lock, so the
> 'owner' can't go away until we drop ->wait_lock. I think we can drop
> owner->pi_lock right after __rt_mutex_adjust_prio(owner); we can't miss
> owner->pi_blocked_on != NULL if it was true before we took
> owner->pi_lock, and that is the case we should worry about, yes?

No.

We hold lock->wait_lock. The owner of this lock can be blocked itself,
which makes it necessary to do the chain walk. The indicator is
owner->pi_blocked_on. This field is only protected by owner->pi_lock.

If we look at this field outside of owner->pi_lock, then we might miss a
chain walk:

	CPU 0					CPU 1

	lock lock->wait_lock

	block_on_rt_mutex()

	lock current->pi_lock
	current->pi_blocked_on = waiter(lock)
	unlock current->pi_lock

	boost = owner->pi_blocked_on
						owner blocks on lock2
						lock owner->pi_lock
						owner->pi_blocked_on = waiter(lock2)
						unlock owner->pi_lock
	lock owner->pi_lock
	enqueue waiter
	adjust prio
	unlock owner->pi_lock
	unlock lock->wait_lock

	if boost
		do_chain_walk()

->pi_blocked_on is protected by ->pi_lock and has to be read/modified
under this lock. That way we can not miss a chain walk:

	CPU 0					CPU 1

	lock lock->wait_lock

	block_on_rt_mutex()

	lock current->pi_lock
	current->pi_blocked_on = waiter(lock)
	unlock current->pi_lock

						owner blocks on lock2
						lock owner->pi_lock
						owner->pi_blocked_on = waiter(lock2)
						unlock owner->pi_lock
	lock owner->pi_lock
	enqueue waiter
	adjust prio
	boost = owner->pi_blocked_on
	unlock owner->pi_lock
	unlock lock->wait_lock

	if boost
		do_chain_walk()

	CPU 0					CPU 1

	lock lock->wait_lock

	block_on_rt_mutex()

	lock current->pi_lock
	current->pi_blocked_on = waiter(lock)
	unlock current->pi_lock

	lock owner->pi_lock
	enqueue waiter
	adjust prio of owner
	boost = owner->pi_blocked_on
	unlock owner->pi_lock
	unlock lock->wait_lock

						owner blocks on lock2
						lock owner->pi_lock
						owner->pi_blocked_on = waiter(lock2)
						unlock owner->pi_lock
	if boost
		do_chain_walk()

						owner propagates the priority to owner(lock2)

get_task_struct() is there for a different reason. We have to protect
the gap where we hold _no_ lock in rt_mutex_adjust_prio_chain:

retry:
	lock task->pi_lock
	block = task->pi_blocked_on->lock
	if (!trylock block->wait_lock) {
		unlock task->pi_lock

		-> task could go away right here, if we do not hold a reference

		cpu_relax()
		goto retry
	}

	tglx
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Steven Rostedt @ 2006-08-01 12:07 UTC (permalink / raw)
To: tglx; +Cc: Oleg Nesterov, Ingo Molnar, Esben Nielsen, linux-kernel

On Tue, 2006-08-01 at 09:58 +0200, Thomas Gleixner wrote:
> We hold lock->wait_lock. The owner of this lock can be blocked itself,
> which makes it necessary to do the chain walk. The indicator is
> owner->pi_blocked_on. This field is only protected by owner->pi_lock.
>
> If we look at this field outside of owner->pi_lock, then we might miss a
> chain walk.

Actually Thomas, not counting the debug case, his patch won't miss a
chain walk.  That is because the boost is read _after_ the owner's prio
is adjusted.  So the only thing the lock is doing for us is to prevent
us from walking the chain twice for the same lock grab.  (btw, I'm
looking at 2.6.18-rc2, and not the -rt patch, just to make things
clear.)

> 	CPU 0					CPU 1
>
> 	lock lock->wait_lock
>
> 	block_on_rt_mutex()
>
> 	lock current->pi_lock
> 	current->pi_blocked_on = waiter(lock)
> 	unlock current->pi_lock
>
> 	boost = owner->pi_blocked_on
> 						owner blocks on lock2
> 						lock owner->pi_lock
> 						owner->pi_blocked_on = waiter(lock2)
> 						unlock owner->pi_lock
> 	lock owner->pi_lock
> 	enqueue waiter
> 	adjust prio
> 	unlock owner->pi_lock
> 	unlock lock->wait_lock
>
> 	if boost
> 		do_chain_walk()
>
> ->pi_blocked_on is protected by ->pi_lock and has to be read/modified

Something does not have to be read under a lock unless we are actually
doing something with it while still under the lock.  But this code
doesn't do that.  It reads boost = owner->pi_blocked_on and then
releases the lock.  The only other thing that this read is protected
with is the adjusting of the prio.

So as I said, we are protecting against doing more work, and that is
all.  We won't miss any chain walk by moving the boost check outside the
lock, as long as we do the boost check _after_ we adjusted the owner's
prio and the blocking task is on the owner's/lock's wait queue.

> under this lock. That way we can not miss a chain walk:
>
> 	CPU 0					CPU 1
>
> 	lock lock->wait_lock
>
> 	block_on_rt_mutex()
>
> 	lock current->pi_lock
> 	current->pi_blocked_on = waiter(lock)
> 	unlock current->pi_lock
>
> 						owner blocks on lock2
> 						lock owner->pi_lock
> 						owner->pi_blocked_on = waiter(lock2)
> 						unlock owner->pi_lock
> 	lock owner->pi_lock
> 	enqueue waiter
> 	adjust prio
> 	boost = owner->pi_blocked_on
> 	unlock owner->pi_lock
> 	unlock lock->wait_lock
>
> 	if boost
> 		do_chain_walk()

BTW, you are showing a lot of CPU0 vs CPU1 code here without explaining
what you are showing.  Please point out where exactly a chain walk is
missed, and how the locking of the boost variable helps.  All I see is
that it prevents the chain walk from happening twice.

Reading a variable under a lock and then releasing the lock does
nothing, unless that variable might be a NULL dereference, and that
would mean that the owner struct disappeared.  Which can't happen,
because we still have the wait_lock (see below).

> 	CPU 0					CPU 1
>
> 	lock lock->wait_lock
>
> 	block_on_rt_mutex()
>
> 	lock current->pi_lock
> 	current->pi_blocked_on = waiter(lock)
> 	unlock current->pi_lock
>
> 	lock owner->pi_lock
> 	enqueue waiter
> 	adjust prio of owner
> 	boost = owner->pi_blocked_on
> 	unlock owner->pi_lock
> 	unlock lock->wait_lock
>
> 						owner blocks on lock2
> 						lock owner->pi_lock
> 						owner->pi_blocked_on = waiter(lock2)
> 						unlock owner->pi_lock
> 	if boost
> 		do_chain_walk()
>
> 						owner propagates the priority to owner(lock2)
>
> get_task_struct() is there for a different reason. We have to protect
> the gap where we hold _no_ lock in rt_mutex_adjust_prio_chain:

I think Oleg realized this.  But the question was why it is done with
the pi_lock held.  It is just as good to do it with the wait_lock held,
because the owner can't disappear while holding a lock, or that in
itself is a bug.  So we can push that get_task_struct to just before the
wait_lock is released.

-- Steve

> retry:
> 	lock task->pi_lock
> 	block = task->pi_blocked_on->lock
> 	if (!trylock block->wait_lock) {
> 		unlock task->pi_lock
>
> 		-> task could go away right here, if we do not hold a reference
>
> 		cpu_relax()
> 		goto retry
> 	}
>
> 	tglx
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Thomas Gleixner @ 2006-08-01 12:52 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Oleg Nesterov, Ingo Molnar, Esben Nielsen, linux-kernel

On Tue, 2006-08-01 at 08:07 -0400, Steven Rostedt wrote:
> > We hold lock->wait_lock. The owner of this lock can be blocked itself,
> > which makes it necessary to do the chain walk. The indicator is
> > owner->pi_blocked_on. This field is only protected by owner->pi_lock.
> >
> > If we look at this field outside of owner->pi_lock, then we might miss
> > a chain walk.
>
> Actually Thomas, not counting the debug case, his patch won't miss a
> chain walk.  That is because the boost is read _after_ the owner's prio
> is adjusted.  So the only thing the lock is doing for us is to prevent
> us from walking the chain twice for the same lock grab.  (btw, I'm
> looking at 2.6.18-rc2, and not the -rt patch, just to make things
> clear.)

So what do we win when we drop the lock before we check for boosting?
In the worst case we do a redundant chain walk.

	tglx
* Re: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
From: Steven Rostedt @ 2006-08-01 13:21 UTC (permalink / raw)
To: tglx; +Cc: Oleg Nesterov, Ingo Molnar, Esben Nielsen, linux-kernel

On Tue, 2006-08-01 at 14:52 +0200, Thomas Gleixner wrote:
> On Tue, 2006-08-01 at 08:07 -0400, Steven Rostedt wrote:
> > Actually Thomas, not counting the debug case, his patch won't miss a
> > chain walk.  That is because the boost is read _after_ the owner's
> > prio is adjusted.  So the only thing the lock is doing for us is to
> > prevent us from walking the chain twice for the same lock grab.
> > (btw, I'm looking at 2.6.18-rc2, and not the -rt patch, just to make
> > things clear.)
>
> So what do we win when we drop the lock before we check for boosting?
> In the worst case we do a redundant chain walk.

I'm not saying that it was correct to drop the lock before checking for
boosting.  I'm just saying that it won't miss a chain walk.  But we
might do two of them.

I think I'll send my patch again, which fixes this up a little.  I'll
make two patches: one for mainline, and one for -rt.  In the debug
case, we really don't even need to grab the lock.  (see patch -- coming
soon)

-- Steve
end of thread, other threads: [~2006-08-01 13:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2006-07-30  4:36 rt_mutex_timed_lock() vs hrtimer_wakeup() race ? Oleg Nesterov
2006-07-30 22:23 ` Steven Rostedt
2006-08-01  0:12   ` Oleg Nesterov
2006-07-31 20:47     ` Steven Rostedt
2006-08-01  7:58 ` Thomas Gleixner
2006-08-01 12:07   ` Steven Rostedt
2006-08-01 12:52     ` Thomas Gleixner
2006-08-01 13:21       ` Steven Rostedt