* kernel BUG at kernel/rtmutex_common.h:75
@ 2015-11-04 14:35 Yimin Deng
2015-11-06 14:41 ` Thomas Gleixner
0 siblings, 1 reply; 4+ messages in thread
From: Yimin Deng @ 2015-11-04 14:35 UTC (permalink / raw)
To: linux-rt-users
I encountered “kernel BUG” which was reported in the
rt_mutex_top_waiter() at kernel/rtmutex_common.h:75.
Linux version: 3.12.37-rt51; CONFIG_PREEMPT_RT_FULL is disabled.
Architecture: PowerPC
We ported an application from pSOS RTOS to Linux using the
Xenomai-Mercury (=library to map pSOS task to POSIX threads). And We
have several threads running in the real-time priority domain.
ThreadA: running at prio -59. pthread_mutex_lock() +
pthread_cond_timedwait() + pthread_mutex_unlock()
ThreadB: running at prio -84. pthread_mutex_lock() +
pthread_cond_signal() + pthread_mutex_unlock()
ThreadA:
------ ------
futex_wait_requeue_pi()
futex_wait_queue_me()
<timed out>
raw_spin_lock_irq(¤t->pi_lock);
if (current->pi_blocked_on) {
raw_spin_unlock_irq(¤t->pi_lock);
} else {
current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
raw_spin_unlock_irq(¤t->pi_lock);
<-- ThreadA was interrupted and preempted!
spin_lock(&hb->lock);
ThreadB:
------ ------
rt_mutex_start_proxy_lock();
task_blocks_on_rt_mutex(); <-- return "-EAGAIN" due to
"task->pi_blocked_on == PI_WAKEUP_INPROGRESS"
...
if (unlikely(ret))
remove_waiter(lock, waiter);
int first = (waiter == rt_mutex_top_waiter(lock)); <--
BUG_ON(w->lock != lock);
It seems that the purpose to call the remove_waiter() is to remove the
waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
the task_blocks_on_rt_mutex(). But in the scenario above there's no
waiter on the lock yet and
the waiter has not been added into the wait list of the lock in the
task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
kernel BUG in the rt_mutex_top_waiter().
I modified it as below and the issue seems disappear.
- if (unlikely(ret))
+ if (unlikely(ret && (-EAGAIN != ret)))
remove_waiter(lock, waiter);
Could the scenario above be possible? If so, how to resolve this issue?
Thanks!
B.R.
Yimin Deng
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: kernel BUG at kernel/rtmutex_common.h:75
2015-11-04 14:35 kernel BUG at kernel/rtmutex_common.h:75 Yimin Deng
@ 2015-11-06 14:41 ` Thomas Gleixner
2015-11-07 18:09 ` Thomas Gleixner
0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2015-11-06 14:41 UTC (permalink / raw)
To: Yimin Deng; +Cc: linux-rt-users
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1857 bytes --]
B1;2802;0cOn Wed, 4 Nov 2015, Yimin Deng wrote:
> It seems that the purpose to call the remove_waiter() is to remove the
> waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> the task_blocks_on_rt_mutex(). But in the scenario above there's no
> waiter on the lock yet and
> the waiter has not been added into the wait list of the lock in the
> task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> kernel BUG in the rt_mutex_top_waiter().
>
> I modified it as below and the issue seems disappear.
> - if (unlikely(ret))
> + if (unlikely(ret && (-EAGAIN != ret)))
> remove_waiter(lock, waiter);
>
> Could the scenario above be possible? If so, how to resolve this issue?
> Thanks!
Yes it is possible. Nice detective work!
Your solution is correct, but actually it's not sufficient, because we
have another possibility to return early without being queued
(-EDEADLOCK). Find the full solution below.
Thanks for tracking that down!
tglx
---
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7601c1332a88..0e6505d5ce4a 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
static void remove_waiter(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter)
{
- bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex *next_lock = NULL;
+ bool is_top_waiter = false;
unsigned long flags;
+ /*
+ * @waiter might be not queued when task_blocks_on_rt_mutex()
+ * returned early so @lock might not have any waiters.
+ */
+ if (rt_mutex_has_waiters())
+ is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
raw_spin_lock_irqsave(¤t->pi_lock, flags);
rt_mutex_dequeue(lock, waiter);
current->pi_blocked_on = NULL;
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: kernel BUG at kernel/rtmutex_common.h:75
2015-11-06 14:41 ` Thomas Gleixner
@ 2015-11-07 18:09 ` Thomas Gleixner
2015-11-08 3:31 ` Yimin Deng
0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2015-11-07 18:09 UTC (permalink / raw)
To: Yimin Deng; +Cc: linux-rt-users
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1184 bytes --]
On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> On Wed, 4 Nov 2015, Yimin Deng wrote:
> > It seems that the purpose to call the remove_waiter() is to remove the
> > waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> > the task_blocks_on_rt_mutex(). But in the scenario above there's no
> > waiter on the lock yet and
> > the waiter has not been added into the wait list of the lock in the
> > task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> > kernel BUG in the rt_mutex_top_waiter().
> >
> > I modified it as below and the issue seems disappear.
> > - if (unlikely(ret))
> > + if (unlikely(ret && (-EAGAIN != ret)))
> > remove_waiter(lock, waiter);
> >
> > Could the scenario above be possible? If so, how to resolve this issue?
> > Thanks!
>
> Yes it is possible. Nice detective work!
>
> Your solution is correct, but actually it's not sufficient, because we
> have another possibility to return early without being queued
> (-EDEADLOCK). Find the full solution below.
>
> Thanks for tracking that down!
Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
the area of futex/rtmutex.
Thanks,
tglx
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: kernel BUG at kernel/rtmutex_common.h:75
2015-11-07 18:09 ` Thomas Gleixner
@ 2015-11-08 3:31 ` Yimin Deng
0 siblings, 0 replies; 4+ messages in thread
From: Yimin Deng @ 2015-11-08 3:31 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-rt-users
2015-11-08 2:09 GMT+08:00 Thomas Gleixner <tglx@linutronix.de>:
> On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
> the area of futex/rtmutex.
>
> Thanks,
>
> tglx
On Fri, 6 Nov 2015, Thomas Gleixner wrote:
>> diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
>> index 7601c1332a88..0e6505d5ce4a 100644
>> --- a/kernel/rtmutex.c
>> +++ b/kernel/rtmutex.c
>> @@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>> static void remove_waiter(struct rt_mutex *lock,
>> struct rt_mutex_waiter *waiter)
>> {
>> - bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>> struct task_struct *owner = rt_mutex_owner(lock);
>> struct rt_mutex *next_lock = NULL;
>> + bool is_top_waiter = false;
>> unsigned long flags;
>>
>> + /*
>> + * @waiter might be not queued when task_blocks_on_rt_mutex()
>> + * returned early so @lock might not have any waiters.
>> + */
>> + if (rt_mutex_has_waiters())
>> + is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>> +
>> raw_spin_lock_irqsave(¤t->pi_lock, flags);
>> rt_mutex_dequeue(lock, waiter);
>> current->pi_blocked_on = NULL;
Sincerely appreciate for your answers!
Could it be modified as below? It seems not necessary to call
rt_mutex_dequeue() and clear the current->pi_blocked_on if there's no
waiter on the lock. And the waiter->task is not the 'current' when the
remove_waiter() is called in the function rt_mutex_start_proxy_lock().
static void remove_waiter(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter)
{
- bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex *next_lock = NULL;
+ bool is_top_waiter;
+ struct task_struct *task = waiter->task;
unsigned long flags;
+ /*
+ * @waiter might be not queued when task_blocks_on_rt_mutex()
+ * returned early so @lock might not have any waiters.
+ */
+ if (unlikely(!rt_mutex_has_waiters()))
+ return;
+
+ is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
- raw_spin_lock_irqsave(¤t->pi_lock, flags);
- rt_mutex_dequeue(lock, waiter);
- current->pi_blocked_on = NULL;
- raw_spin_unlock_irqrestore(¤t->pi_lock, flags);
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+ rt_mutex_dequeue(lock, waiter);
+ task->pi_blocked_on = NULL;
+ __rt_mutex_adjust_prio(task); <-- I'm not sure if it is necessary.
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
......
- rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, current);
+ rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, task);
......
}
I'm sorry for so many questions.
Thanks ahead!
B.R.
Yimin Deng
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-11-08 3:31 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-04 14:35 kernel BUG at kernel/rtmutex_common.h:75 Yimin Deng
2015-11-06 14:41 ` Thomas Gleixner
2015-11-07 18:09 ` Thomas Gleixner
2015-11-08 3:31 ` Yimin Deng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox