* kernel BUG at kernel/rtmutex_common.h:75
@ 2015-11-04 14:35 Yimin Deng
2015-11-06 14:41 ` Thomas Gleixner
0 siblings, 1 reply; 4+ messages in thread
From: Yimin Deng @ 2015-11-04 14:35 UTC (permalink / raw)
To: linux-rt-users
I encountered a “kernel BUG” reported in rt_mutex_top_waiter() at
kernel/rtmutex_common.h:75.
Linux version: 3.12.37-rt51; CONFIG_PREEMPT_RT_FULL is disabled.
Architecture: PowerPC
We ported an application from the pSOS RTOS to Linux using
Xenomai-Mercury (a library that maps pSOS tasks to POSIX threads), and we
have several threads running in the real-time priority domain.
ThreadA: running at prio -59. pthread_mutex_lock() +
pthread_cond_timedwait() + pthread_mutex_unlock()
ThreadB: running at prio -84. pthread_mutex_lock() +
pthread_cond_signal() + pthread_mutex_unlock()
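For reference, the user-space pattern of the two threads can be sketched roughly as follows. This is a minimal sketch, not the application's code: the names `run_pattern`, `waiter_thread`, `signaler_thread`, and `ready` are hypothetical, the 10 ms timeout is arbitrary, and the use of PTHREAD_PRIO_INHERIT is an assumption inferred from the requeue-PI path in the trace. Real-time priorities are left out because setting them needs privileges, and whether this reaches futex_wait_requeue_pi() depends on the C library in use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready;                       /* predicate protected by lock */

/* ThreadA pattern: lock, timed wait on the condition, unlock. */
static void *waiter_thread(void *arg)
{
    struct timespec deadline;
    int rc = 0;

    (void)arg;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 10 * 1000 * 1000;   /* arbitrary 10 ms timeout */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    while (!ready && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* ThreadB pattern: lock, signal the condition, unlock. */
static void *signaler_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs both threads once; returns 0 on success, -1 on setup failure. */
int run_pattern(void)
{
    pthread_mutexattr_t attr;
    pthread_t a, b;

    if (pthread_mutexattr_init(&attr))
        return -1;
    /* Assumption: the mutex is priority-inheriting (PI). */
    if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) ||
        pthread_mutex_init(&lock, &attr)) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    ready = 0;
    if (pthread_create(&a, NULL, waiter_thread, NULL))
        return -1;
    if (pthread_create(&b, NULL, signaler_thread, NULL)) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&lock);
    return 0;
}
```

The race described in this report needs the timeout in the waiter to expire at about the same moment the signaler requeues it, so a single run of this sketch is not expected to reproduce it.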
ThreadA:
------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>
    raw_spin_lock_irq(&current->pi_lock);
    if (current->pi_blocked_on) {
        raw_spin_unlock_irq(&current->pi_lock);
    } else {
        current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
        raw_spin_unlock_irq(&current->pi_lock);
        <-- ThreadA was interrupted and preempted!
        spin_lock(&hb->lock);
ThreadB:
------
rt_mutex_start_proxy_lock();
    task_blocks_on_rt_mutex(); <-- return "-EAGAIN" due to
                                   "task->pi_blocked_on == PI_WAKEUP_INPROGRESS"
    ...
    if (unlikely(ret))
        remove_waiter(lock, waiter);
            int first = (waiter == rt_mutex_top_waiter(lock)); <--
                BUG_ON(w->lock != lock);
It seems that the purpose of calling remove_waiter() is to remove the
waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
task_blocks_on_rt_mutex(). But in the scenario above there is no
waiter on the lock yet: task_blocks_on_rt_mutex() returned “-EAGAIN”
before adding the waiter to the lock's wait list. So rt_mutex_top_waiter()
was called on an empty wait list and reported the kernel BUG.
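The failure mode can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: `model_lock`, `model_waiter`, and the `model_*` functions are hypothetical stand-ins, the wait list is a plain singly linked list rather than a plist, and a return value of -1 stands in for the point where the kernel would hit BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rt_mutex and rt_mutex_waiter. */
struct model_waiter {
    struct model_waiter *next;
};

struct model_lock {
    struct model_waiter *wait_list;   /* head is the top waiter; NULL if empty */
};

/* Queue the waiter unless we bail out early, as in the -EAGAIN path. */
int model_block_on(struct model_lock *lock, struct model_waiter *w,
                   bool wakeup_in_progress)
{
    assert(lock != NULL && w != NULL);
    if (wakeup_in_progress)
        return -EAGAIN;               /* early return: w is NOT queued */
    w->next = lock->wait_list;
    lock->wait_list = w;
    return 0;
}

/* Unlink w if it is on the list; a no-op for an unqueued waiter. */
void model_dequeue(struct model_lock *lock, struct model_waiter *w)
{
    struct model_waiter **pp = &lock->wait_list;

    while (*pp != NULL && *pp != w)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = w->next;
}

/*
 * Unguarded cleanup: looks up the top waiter unconditionally.
 * Returns -1 where the real lookup would hit BUG_ON() (empty list).
 */
int model_remove_waiter_unguarded(struct model_lock *lock,
                                  struct model_waiter *w)
{
    if (lock->wait_list == NULL)
        return -1;                    /* stands in for the kernel BUG */
    model_dequeue(lock, w);
    return 0;
}

/* Guarded cleanup: consult the top waiter only if the list is non-empty. */
int model_remove_waiter_guarded(struct model_lock *lock,
                                struct model_waiter *w)
{
    bool is_top_waiter = false;

    if (lock->wait_list != NULL)
        is_top_waiter = (w == lock->wait_list);
    model_dequeue(lock, w);
    (void)is_top_waiter;              /* the kernel uses this for PI adjustment */
    return 0;
}
```

In this model, calling `model_block_on()` with `wakeup_in_progress` set and then `model_remove_waiter_unguarded()` returns -1, while the guarded variant returns 0 for the same state.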
I modified it as below and the issue seems to disappear.
- if (unlikely(ret))
+ if (unlikely(ret && (-EAGAIN != ret)))
remove_waiter(lock, waiter);
Is the scenario above possible? If so, how should this issue be resolved?
Thanks!
B.R.
Yimin Deng

* Re: kernel BUG at kernel/rtmutex_common.h:75
From: Thomas Gleixner @ 2015-11-06 14:41 UTC
To: Yimin Deng; +Cc: linux-rt-users

On Wed, 4 Nov 2015, Yimin Deng wrote:
> It seems that the purpose to call the remove_waiter() is to remove the
> waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> the task_blocks_on_rt_mutex(). But in the scenario above there's no
> waiter on the lock yet and
> the waiter has not been added into the wait list of the lock in the
> task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> kernel BUG in the rt_mutex_top_waiter().
>
> I modified it as below and the issue seems disappear.
> - if (unlikely(ret))
> + if (unlikely(ret && (-EAGAIN != ret)))
> remove_waiter(lock, waiter);
>
> Could the scenario above be possible? If so, how to resolve this issue?
> Thanks!

Yes it is possible. Nice detective work!

Your solution is correct, but actually it's not sufficient, because we
have another possibility to return early without being queued
(-EDEADLOCK). Find the full solution below.

Thanks for tracking that down!

	tglx
---
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7601c1332a88..0e6505d5ce4a 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 static void remove_waiter(struct rt_mutex *lock,
 			  struct rt_mutex_waiter *waiter)
 {
-	bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock = NULL;
+	bool is_top_waiter = false;
 	unsigned long flags;
 
+	/*
+	 * @waiter might be not queued when task_blocks_on_rt_mutex()
+	 * returned early so @lock might not have any waiters.
+	 */
+	if (rt_mutex_has_waiters(lock))
+		is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
 	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	rt_mutex_dequeue(lock, waiter);
 	current->pi_blocked_on = NULL;
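
The point about a second early-return path can be made concrete with a small user-space model. This is a sketch under stated assumptions, not kernel code: `block_result`, `cleanup_by_errno`, `lookup_by_state`, and `touches_empty_list` are hypothetical names, the lock is assumed to have no other waiters, and the code uses the POSIX spelling EDEADLK for the deadlock error. It compares filtering on the error code with checking whether anything is actually queued.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of a model "block on lock" step. */
struct block_result {
    int  ret;      /* 0 or a negative errno */
    bool queued;   /* was the waiter added to the wait list? */
};

/* Policy 1: decide from the error code, skipping only -EAGAIN. */
bool cleanup_by_errno(struct block_result r)
{
    return r.ret != 0 && r.ret != -EAGAIN;
}

/* Policy 2: decide from the lock state, as a guarded lookup does. */
bool lookup_by_state(struct block_result r)
{
    return r.ret != 0 && r.queued;
}

/*
 * A policy is unsafe for r if it inspects the top waiter of a list
 * that is empty (assumption: no other waiters are on the lock).
 */
bool touches_empty_list(bool does_lookup, struct block_result r)
{
    return does_lookup && !r.queued;
}
```

With an unqueued `-EDEADLK` result, `cleanup_by_errno()` still requests the lookup and so touches the empty list, whereas `lookup_by_state()` does not. The state check also copes with an error code that can be returned both before and after the waiter was queued.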

* Re: kernel BUG at kernel/rtmutex_common.h:75
From: Thomas Gleixner @ 2015-11-07 18:09 UTC
To: Yimin Deng; +Cc: linux-rt-users

On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> On Wed, 4 Nov 2015, Yimin Deng wrote:
> > It seems that the purpose to call the remove_waiter() is to remove the
> > waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> > the task_blocks_on_rt_mutex(). But in the scenario above there's no
> > waiter on the lock yet and
> > the waiter has not been added into the wait list of the lock in the
> > task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> > kernel BUG in the rt_mutex_top_waiter().
> >
> > I modified it as below and the issue seems disappear.
> > - if (unlikely(ret))
> > + if (unlikely(ret && (-EAGAIN != ret)))
> > remove_waiter(lock, waiter);
> >
> > Could the scenario above be possible? If so, how to resolve this issue?
> > Thanks!
>
> Yes it is possible. Nice detective work!
>
> Your solution is correct, but actually it's not sufficient, because we
> have another possibility to return early without being queued
> (-EDEADLOCK). Find the full solution below.
>
> Thanks for tracking that down!

Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
the area of futex/rtmutex.

Thanks,

	tglx

* Re: kernel BUG at kernel/rtmutex_common.h:75
From: Yimin Deng @ 2015-11-08 3:31 UTC
To: Thomas Gleixner; +Cc: linux-rt-users

2015-11-08 2:09 GMT+08:00 Thomas Gleixner <tglx@linutronix.de>:
> On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
> the area of futex/rtmutex.
>
> Thanks,
>
> tglx

On Fri, 6 Nov 2015, Thomas Gleixner wrote:
>> diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
>> index 7601c1332a88..0e6505d5ce4a 100644
>> --- a/kernel/rtmutex.c
>> +++ b/kernel/rtmutex.c
>> @@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>>  static void remove_waiter(struct rt_mutex *lock,
>>  			  struct rt_mutex_waiter *waiter)
>>  {
>> -	bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>>  	struct task_struct *owner = rt_mutex_owner(lock);
>>  	struct rt_mutex *next_lock = NULL;
>> +	bool is_top_waiter = false;
>>  	unsigned long flags;
>>
>> +	/*
>> +	 * @waiter might be not queued when task_blocks_on_rt_mutex()
>> +	 * returned early so @lock might not have any waiters.
>> +	 */
>> +	if (rt_mutex_has_waiters(lock))
>> +		is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>> +
>>  	raw_spin_lock_irqsave(&current->pi_lock, flags);
>>  	rt_mutex_dequeue(lock, waiter);
>>  	current->pi_blocked_on = NULL;

I sincerely appreciate your answers!
Could it be modified as below? It seems unnecessary to call
rt_mutex_dequeue() and clear current->pi_blocked_on if there is no
waiter on the lock. Also, waiter->task is not 'current' when
remove_waiter() is called from rt_mutex_start_proxy_lock().

 static void remove_waiter(struct rt_mutex *lock,
 			  struct rt_mutex_waiter *waiter)
 {
-	bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock = NULL;
+	bool is_top_waiter;
+	struct task_struct *task = waiter->task;
 	unsigned long flags;

+	/*
+	 * @waiter might be not queued when task_blocks_on_rt_mutex()
+	 * returned early so @lock might not have any waiters.
+	 */
+	if (unlikely(!rt_mutex_has_waiters(lock)))
+		return;
+
+	is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
-	raw_spin_lock_irqsave(&current->pi_lock, flags);
-	rt_mutex_dequeue(lock, waiter);
-	current->pi_blocked_on = NULL;
-	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
+	raw_spin_lock_irqsave(&task->pi_lock, flags);
+	rt_mutex_dequeue(lock, waiter);
+	task->pi_blocked_on = NULL;
+	__rt_mutex_adjust_prio(task); <-- I'm not sure if it is necessary.
+	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
 	......
-	rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, current);
+	rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, task);
 	......
 }

Sorry for so many questions. Thanks in advance!
B.R.
Yimin Deng
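
The concern about 'current' versus waiter->task in the proxy case can be illustrated with a tiny user-space model. This is only a sketch of the question being asked, not a statement about what the kernel does or needs: `model_task`, `model_waiter`, `cleanup_via_current`, and `cleanup_via_waiter_task` are hypothetical names, and locking is omitted entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_lock {
    int unused;
};

struct model_task {
    struct model_lock *pi_blocked_on;   /* lock the task is blocked on, or NULL */
};

struct model_waiter {
    struct model_task *task;            /* task on whose behalf we wait */
};

/* Cleanup that clears the state of the calling task. */
void cleanup_via_current(struct model_task *current_task,
                         struct model_waiter *w)
{
    (void)w;
    current_task->pi_blocked_on = NULL;
}

/* Cleanup that clears the state of the task recorded in the waiter. */
void cleanup_via_waiter_task(struct model_waiter *w)
{
    w->task->pi_blocked_on = NULL;
}
```

If task B runs the cleanup for a waiter that belongs to task A, `cleanup_via_current()` leaves A's `pi_blocked_on` untouched in this model, while `cleanup_via_waiter_task()` clears it.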