Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Xunlei Pang <xpang@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks
Date: Tue, 12 Apr 2016 11:08:04 +0800	[thread overview]
Message-ID: <570C6694.50803@redhat.com> (raw)
In-Reply-To: <570A0D37.8080400@redhat.com>

On 2016／04／10 at 16:22, Xunlei Pang wrote:
> On 2016/04/09 at 21:29, Peter Zijlstra wrote:
>> On Sat, Apr 09, 2016 at 11:25:39AM +0800, Xunlei Pang wrote:
>>
>>>> In any case, I just realized we do not in fact provide this guarantee
>>>> (of pointing to a blocked task) that needs a bit more work.
>>> Current patch calls rt_mutex_adjust_prio() before wake_up_q() the
>>> wakee, at that moment the wakee has been removed by
>>> rt_mutex_slowunlock()->mark_wakeup_next_waiter(), from current's
>>> pi_waiters, rt_mutex_adjust_prio() won't see this wakee, so I think
>>> this should not be problem.
>> No, any wakeup after mark_wakeup_next_waiter() will make the task run.
>> And since we must always consider spurious wakeups, we cannot rely on us
>> (eg. our wake_up_q call) being the one and only.
>>
>> Therefore it is possible and the only thing that stands between us and
>> doom is the fact that the wake_q stuff holds a task reference.
>>
>> But we cannot guarantee that the task we have a pointer to is in fact
>> blocked.
> Does that really matters? the pointer is accessed on behalf of current,  and current
> will call rt_mutex_adjust_prio() very soon to update the right pointer.
>
> Also the pointer is used to update current's deadline/runtime, we can restore these
> params in rt_mutex_setprio() for deboost cases. I just checked current code,  it did
> nothing to restore current's deadline/runtime when deboosting, maybe we can leave
> this job to future deadline bandwidth inheritance?
>
> Regards,
> Xunlei
>

I spotted another issue, we access pi_task without any lock in enqueue_task_dl(),
though we have "dl_prio(pi_task->normal_prio)" condition, that's not enough, "dl_period"
and "dl_runtime" of pi_task can change, if it changed to !deadline class, dl_runtime
was cleared to 0, we will hit a forever loop in replenish_dl_entity() below:
    while (dl_se->runtime <= 0) {
        dl_se->deadline += pi_se->dl_period;
        dl_se->runtime += pi_se->dl_runtime;
    }

or hit "BUG_ON(pi_se->dl_runtime <= 0);".

That's all because without any lock of that task, there is no guarantee.

So I'm thinking of adding more members in rt_mutex_waiter(we don't lose memory,
it's defined on stack) and use this structure instead of task_struct as the top waiter
(i.e. using get_pi_top_waiter() instead of get_pi_top_task()), like:

Step1:
struct rt_mutex_waiter {
         int prio;
+       /* updated under waiter's pi_lock and rt_mutex lock */
+       u64 dl_runtime, dl_period;
+       /*
+        * under owner's pi_lock, rq lock, and rt_mutex lock, copied
+        * directly from dl_runtime, dl_period(under same rt_mutex lock).
+        */
+       u64 dl_runtime_copy, dl_period_copy;

Similarly, adding the memember in task_struct:
#ifdef CONFIG_RT_MUTEXES
    /* PI waiters blocked on a rt_mutex held by this task */
    struct rb_root pi_waiters;
    struct rb_node *pi_waiters_leftmost;
+  struct rb_node *pi_waiters_leftmost_copy; /* updated unlock pi_lock and rq lock */
    /* Deadlock detection and priority inheritance handling */
    struct rt_mutex_waiter *pi_blocked_on;
#endif

Then, we can update "pi_waiters_leftmost_copy" together with "dl_runtime_copy"
and "dl_period_copy" under rq lock, then enqueue_task_dl() can access them
without any problem.

Step2:
We must update "dl_runtime_copy" and "dl_period_copy" under rt_mutex lock,
because it is copied from "dl_runtime" and "dl_period" of rt_mutex_waiter, so we
add a new update function as long as we held rq lock and rt_mutex lock, mainly
can be implemented in rt_mutex_setprio.

Step3:
Note that rt_mutex_setprio() can be called without rtmutex lock by rt_mutex_adjust_prio(),
we can add a parameter to indicate not doing the copy-updating work at that place,
the same applies to rt_mutex_setprio(add a new waiter parameter and keep the original
"prio" parameter). Then we can do the copy-updating work in mark_wakeup_next_waiter()
before unlocking current's pi_lock, as long as we hold rq lock, because rtmutex lock and
owner's pi_lock was already held.

This can also solve the issue you mentioned with only a little overhead involved.

Regards,
Xunlei

next prev parent reply	other threads:[~2016-04-12  3:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-01 11:00 [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks Xunlei Pang
2016-04-01 11:38 ` Peter Zijlstra
2016-04-01 12:23   ` Xunlei Pang
2016-04-01 13:12     ` Peter Zijlstra
2016-04-01 13:34       ` Xunlei Pang
2016-04-01 21:51         ` Peter Zijlstra
2016-04-02 10:19           ` Xunlei Pang
2016-04-05  8:38           ` Xunlei Pang
2016-04-05  9:19             ` Peter Zijlstra
2016-04-05  9:29               ` Peter Zijlstra
2016-04-05 10:48                 ` Xunlei Pang
2016-04-05 11:32                   ` Peter Zijlstra
2016-04-08 16:25                 ` Steven Rostedt
2016-04-08 17:38                   ` Peter Zijlstra
2016-04-08 18:50                     ` Steven Rostedt
2016-04-08 18:59                       ` Peter Zijlstra
2016-04-08 19:15                         ` Steven Rostedt
2016-04-08 19:28                           ` Steven Rostedt
2016-04-09  3:27                             ` Xunlei Pang
2016-04-09  3:25                         ` Xunlei Pang
2016-04-09 13:29                           ` Peter Zijlstra
2016-04-10  8:22                             ` Xunlei Pang
2016-04-12  3:08                               ` Xunlei Pang [this message]
2016-04-12 15:51                                 ` Peter Zijlstra
2016-04-13  2:13                                   ` Xunlei Pang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570C6694.50803@redhat.com \
    --to=xpang@redhat.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).