public inbox for linux-rt-users@vger.kernel.org
 help / color / mirror / Atom feed
* kernel BUG at kernel/rtmutex_common.h:75
@ 2015-11-04 14:35 Yimin Deng
  2015-11-06 14:41 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Yimin Deng @ 2015-11-04 14:35 UTC (permalink / raw)
  To: linux-rt-users

I encountered “kernel BUG” which was reported in the
rt_mutex_top_waiter() at kernel/rtmutex_common.h:75.

Linux version: 3.12.37-rt51; CONFIG_PREEMPT_RT_FULL is disabled.
Architecture: PowerPC

We ported an application from pSOS RTOS to Linux using the
Xenomai-Mercury (=library to map pSOS task to POSIX threads). And We
have several threads running in the real-time priority domain.
ThreadA: running at prio -59. pthread_mutex_lock() +
pthread_cond_timedwait() + pthread_mutex_unlock()
ThreadB: running at prio -84. pthread_mutex_lock() +
pthread_cond_signal() + pthread_mutex_unlock()

ThreadA:
------ ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>

    raw_spin_lock_irq(&current->pi_lock);
    if (current->pi_blocked_on) {
        raw_spin_unlock_irq(&current->pi_lock);
    } else {
        current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
        raw_spin_unlock_irq(&current->pi_lock);
            <-- ThreadA was interrupted and preempted!
        spin_lock(&hb->lock);


ThreadB:
------ ------
rt_mutex_start_proxy_lock();
    task_blocks_on_rt_mutex();  <-- return "-EAGAIN" due to
"task->pi_blocked_on == PI_WAKEUP_INPROGRESS"
    ...
    if (unlikely(ret))
        remove_waiter(lock, waiter);
            int first = (waiter == rt_mutex_top_waiter(lock));  <--
BUG_ON(w->lock != lock);

It seems that the purpose to call the remove_waiter() is to remove the
waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
the task_blocks_on_rt_mutex(). But in the scenario above there's no
waiter on the lock yet and
the waiter has not been added into the wait list of the lock in the
task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
kernel BUG in the rt_mutex_top_waiter().

I modified it as below and the issue seems disappear.
- if (unlikely(ret))
+ if (unlikely(ret && (-EAGAIN != ret)))
       remove_waiter(lock, waiter);

Could the scenario above be possible? If so, how to resolve this issue?
Thanks!

B.R.
Yimin Deng
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel BUG at kernel/rtmutex_common.h:75
  2015-11-04 14:35 kernel BUG at kernel/rtmutex_common.h:75 Yimin Deng
@ 2015-11-06 14:41 ` Thomas Gleixner
  2015-11-07 18:09   ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2015-11-06 14:41 UTC (permalink / raw)
  To: Yimin Deng; +Cc: linux-rt-users

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1857 bytes --]

B1;2802;0cOn Wed, 4 Nov 2015, Yimin Deng wrote:
> It seems that the purpose to call the remove_waiter() is to remove the
> waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> the task_blocks_on_rt_mutex(). But in the scenario above there's no
> waiter on the lock yet and
> the waiter has not been added into the wait list of the lock in the
> task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> kernel BUG in the rt_mutex_top_waiter().
> 
> I modified it as below and the issue seems disappear.
> - if (unlikely(ret))
> + if (unlikely(ret && (-EAGAIN != ret)))
>        remove_waiter(lock, waiter);
> 
> Could the scenario above be possible? If so, how to resolve this issue?
> Thanks!

Yes it is possible. Nice detective work!

Your solution is correct, but actually it's not sufficient, because we
have another possibility to return early without being queued
(-EDEADLOCK). Find the full solution below.

Thanks for tracking that down!

       tglx
---
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7601c1332a88..0e6505d5ce4a 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
 static void remove_waiter(struct rt_mutex *lock,
 			  struct rt_mutex_waiter *waiter)
 {
-	bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock = NULL;
+	bool is_top_waiter = false;
 	unsigned long flags;
 
+	/*
+	 * @waiter might be not queued when task_blocks_on_rt_mutex()
+	 * returned early so @lock might not have any waiters.
+	 */
+	if (rt_mutex_has_waiters())
+		is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
 	raw_spin_lock_irqsave(&current->pi_lock, flags);
 	rt_mutex_dequeue(lock, waiter);
 	current->pi_blocked_on = NULL;


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: kernel BUG at kernel/rtmutex_common.h:75
  2015-11-06 14:41 ` Thomas Gleixner
@ 2015-11-07 18:09   ` Thomas Gleixner
  2015-11-08  3:31     ` Yimin Deng
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2015-11-07 18:09 UTC (permalink / raw)
  To: Yimin Deng; +Cc: linux-rt-users

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1184 bytes --]

On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> On Wed, 4 Nov 2015, Yimin Deng wrote:
> > It seems that the purpose to call the remove_waiter() is to remove the
> > waiter added by “plist_add(&waiter->list_entry, &lock->wait_list);” in
> > the task_blocks_on_rt_mutex(). But in the scenario above there's no
> > waiter on the lock yet and
> > the waiter has not been added into the wait list of the lock in the
> > task_blocks_on_rt_mutex() due to the failure “-EAGAIN”. So it reported
> > kernel BUG in the rt_mutex_top_waiter().
> > 
> > I modified it as below and the issue seems disappear.
> > - if (unlikely(ret))
> > + if (unlikely(ret && (-EAGAIN != ret)))
> >        remove_waiter(lock, waiter);
> > 
> > Could the scenario above be possible? If so, how to resolve this issue?
> > Thanks!
> 
> Yes it is possible. Nice detective work!
> 
> Your solution is correct, but actually it's not sufficient, because we
> have another possibility to return early without being queued
> (-EDEADLOCK). Find the full solution below.
> 
> Thanks for tracking that down!

Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
the area of futex/rtmutex.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel BUG at kernel/rtmutex_common.h:75
  2015-11-07 18:09   ` Thomas Gleixner
@ 2015-11-08  3:31     ` Yimin Deng
  0 siblings, 0 replies; 4+ messages in thread
From: Yimin Deng @ 2015-11-08  3:31 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users

2015-11-08 2:09 GMT+08:00 Thomas Gleixner <tglx@linutronix.de>:
> On Fri, 6 Nov 2015, Thomas Gleixner wrote:
> Btw, please update to 3.12.48-rt66. It contains quite some bugfixes in
> the area of futex/rtmutex.
>
> Thanks,
>
>         tglx
On Fri, 6 Nov 2015, Thomas Gleixner wrote:
>> diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
>> index 7601c1332a88..0e6505d5ce4a 100644
>> --- a/kernel/rtmutex.c
>> +++ b/kernel/rtmutex.c
>> @@ -1003,11 +1003,18 @@ static void wakeup_next_waiter(struct rt_mutex *lock)
>>  static void remove_waiter(struct rt_mutex *lock,
>>                           struct rt_mutex_waiter *waiter)
>>  {
>> -       bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>>         struct task_struct *owner = rt_mutex_owner(lock);
>>         struct rt_mutex *next_lock = NULL;
>> +       bool is_top_waiter = false;
>>         unsigned long flags;
>>
>> +       /*
>> +        * @waiter might be not queued when task_blocks_on_rt_mutex()
>> +        * returned early so @lock might not have any waiters.
>> +        */
>> +       if (rt_mutex_has_waiters())
>> +               is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
>> +
>>         raw_spin_lock_irqsave(&current->pi_lock, flags);
>>         rt_mutex_dequeue(lock, waiter);
>>         current->pi_blocked_on = NULL;

Sincerely appreciate for your answers!

Could it be modified as below? It seems not necessary to call
rt_mutex_dequeue() and clear the current->pi_blocked_on if there's no
waiter on the lock. And the waiter->task is not the 'current' when the
remove_waiter() is called in the function rt_mutex_start_proxy_lock().

 static void remove_waiter(struct rt_mutex *lock,
                          struct rt_mutex_waiter *waiter)
 {
-       bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
        struct task_struct *owner = rt_mutex_owner(lock);
        struct rt_mutex *next_lock = NULL;
+       bool is_top_waiter;
+ struct task_struct *task = waiter->task;
        unsigned long flags;

+       /*
+        * @waiter might be not queued when task_blocks_on_rt_mutex()
+        * returned early so @lock might not have any waiters.
+        */
+       if (unlikely(!rt_mutex_has_waiters()))
+ return;
+
+       is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+
-       raw_spin_lock_irqsave(&current->pi_lock, flags);
-       rt_mutex_dequeue(lock, waiter);
-       current->pi_blocked_on = NULL;
- raw_spin_unlock_irqrestore(&current->pi_lock, flags);
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+       rt_mutex_dequeue(lock, waiter);
+       task->pi_blocked_on = NULL;
+ __rt_mutex_adjust_prio(task);    <-- I'm not sure if it is necessary.
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);

......
- rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, current);
+ rt_mutex_adjust_prio_chain(owner, 0, lock, next_lock, NULL, task);
......
 }

I'm sorry for so many questions.
Thanks ahead!

B.R.
Yimin Deng

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-11-08  3:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-04 14:35 kernel BUG at kernel/rtmutex_common.h:75 Yimin Deng
2015-11-06 14:41 ` Thomas Gleixner
2015-11-07 18:09   ` Thomas Gleixner
2015-11-08  3:31     ` Yimin Deng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox