Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Xunlei Pang <xpang@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks
Date: Tue, 12 Apr 2016 11:08:04 +0800	[thread overview]
Message-ID: <570C6694.50803@redhat.com> (raw)
In-Reply-To: <570A0D37.8080400@redhat.com>

On 2016／04／10 at 16:22, Xunlei Pang wrote:
> On 2016/04/09 at 21:29, Peter Zijlstra wrote:
>> On Sat, Apr 09, 2016 at 11:25:39AM +0800, Xunlei Pang wrote:
>>
>>>> In any case, I just realized we do not in fact provide this guarantee
>>>> (of pointing to a blocked task) that needs a bit more work.
>>> Current patch calls rt_mutex_adjust_prio() before wake_up_q() the
>>> wakee, at that moment the wakee has been removed by
>>> rt_mutex_slowunlock()->mark_wakeup_next_waiter(), from current's
>>> pi_waiters, rt_mutex_adjust_prio() won't see this wakee, so I think
>>> this should not be problem.
>> No, any wakeup after mark_wakeup_next_waiter() will make the task run.
>> And since we must always consider spurious wakeups, we cannot rely on us
>> (eg. our wake_up_q call) being the one and only.
>>
>> Therefore it is possible and the only thing that stands between us and
>> doom is the fact that the wake_q stuff holds a task reference.
>>
>> But we cannot guarantee that the task we have a pointer to is in fact
>> blocked.
> Does that really matters? the pointer is accessed on behalf of current,  and current
> will call rt_mutex_adjust_prio() very soon to update the right pointer.
>
> Also the pointer is used to update current's deadline/runtime, we can restore these
> params in rt_mutex_setprio() for deboost cases. I just checked current code,  it did
> nothing to restore current's deadline/runtime when deboosting, maybe we can leave
> this job to future deadline bandwidth inheritance?
>
> Regards,
> Xunlei
>

I spotted another issue, we access pi_task without any lock in enqueue_task_dl(),
though we have "dl_prio(pi_task->normal_prio)" condition, that's not enough, "dl_period"
and "dl_runtime" of pi_task can change, if it changed to !deadline class, dl_runtime
was cleared to 0, we will hit a forever loop in replenish_dl_entity() below:
    while (dl_se->runtime <= 0) {
        dl_se->deadline += pi_se->dl_period;
        dl_se->runtime += pi_se->dl_runtime;
    }

or hit "BUG_ON(pi_se->dl_runtime <= 0);".

That's all because without any lock of that task, there is no guarantee.

So I'm thinking of adding more members in rt_mutex_waiter(we don't lose memory,
it's defined on stack) and use this structure instead of task_struct as the top waiter
(i.e. using get_pi_top_waiter() instead of get_pi_top_task()), like:

Step1:
struct rt_mutex_waiter {
         int prio;
+       /* updated under waiter's pi_lock and rt_mutex lock */
+       u64 dl_runtime, dl_period;
+       /*
+        * under owner's pi_lock, rq lock, and rt_mutex lock, copied
+        * directly from dl_runtime, dl_period(under same rt_mutex lock).
+        */
+       u64 dl_runtime_copy, dl_period_copy;

Similarly, adding the memember in task_struct:
#ifdef CONFIG_RT_MUTEXES
    /* PI waiters blocked on a rt_mutex held by this task */
    struct rb_root pi_waiters;
    struct rb_node *pi_waiters_leftmost;
+  struct rb_node *pi_waiters_leftmost_copy; /* updated unlock pi_lock and rq lock */
    /* Deadlock detection and priority inheritance handling */
    struct rt_mutex_waiter *pi_blocked_on;
#endif

Then, we can update "pi_waiters_leftmost_copy" together with "dl_runtime_copy"
and "dl_period_copy" under rq lock, then enqueue_task_dl() can access them
without any problem.

Step2:
We must update "dl_runtime_copy" and "dl_period_copy" under rt_mutex lock,
because it is copied from "dl_runtime" and "dl_period" of rt_mutex_waiter, so we
add a new update function as long as we held rq lock and rt_mutex lock, mainly
can be implemented in rt_mutex_setprio.

Step3:
Note that rt_mutex_setprio() can be called without rtmutex lock by rt_mutex_adjust_prio(),
we can add a parameter to indicate not doing the copy-updating work at that place,
the same applies to rt_mutex_setprio(add a new waiter parameter and keep the original
"prio" parameter). Then we can do the copy-updating work in mark_wakeup_next_waiter()
before unlocking current's pi_lock, as long as we hold rq lock, because rtmutex lock and
owner's pi_lock was already held.

This can also solve the issue you mentioned with only a little overhead involved.

Regards,
Xunlei

next prev parent reply	other threads:[~2016-04-12  3:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-01 11:00 [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks Xunlei Pang
2016-04-01 11:38 ` Peter Zijlstra
2016-04-01 12:23   ` Xunlei Pang
2016-04-01 13:12     ` Peter Zijlstra
2016-04-01 13:34       ` Xunlei Pang
2016-04-01 21:51         ` Peter Zijlstra
2016-04-02 10:19           ` Xunlei Pang
2016-04-05  8:38           ` Xunlei Pang
2016-04-05  9:19             ` Peter Zijlstra
2016-04-05  9:29               ` Peter Zijlstra
2016-04-05 10:48                 ` Xunlei Pang
2016-04-05 11:32                   ` Peter Zijlstra
2016-04-08 16:25                 ` Steven Rostedt
2016-04-08 17:38                   ` Peter Zijlstra
2016-04-08 18:50                     ` Steven Rostedt
2016-04-08 18:59                       ` Peter Zijlstra
2016-04-08 19:15                         ` Steven Rostedt
2016-04-08 19:28                           ` Steven Rostedt
2016-04-09  3:27                             ` Xunlei Pang
2016-04-09  3:25                         ` Xunlei Pang
2016-04-09 13:29                           ` Peter Zijlstra
2016-04-10  8:22                             ` Xunlei Pang
2016-04-12  3:08                               ` Xunlei Pang [this message]
2016-04-12 15:51                                 ` Peter Zijlstra
2016-04-13  2:13                                   ` Xunlei Pang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570C6694.50803@redhat.com \
    --to=xpang@redhat.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.