From: Peter Zijlstra <peterz@infradead.org>
To: Harshit Agarwal <harshit@nutanix.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
linux-kernel@vger.kernel.org, Jon Kohler <jon@nutanix.com>,
Gauri Patwardhan <gauri.patwardhan@nutanix.com>,
Rahul Chunduru <rahul.chunduru@nutanix.com>,
Will Ton <william.ton@nutanix.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v3] sched/rt: Fix race in push_rt_task
Date: Wed, 2 Apr 2025 14:47:12 +0200
Message-ID: <20250402124712.GN25239@noisy.programming.kicks-ass.net>
In-Reply-To: <20250225180553.167995-1-harshit@nutanix.com>
On Tue, Feb 25, 2025 at 06:05:53PM +0000, Harshit Agarwal wrote:
> Details
> =======
> Let's look at the following scenario to understand this race.
>
> 1) CPU A enters push_rt_task
> a) CPU A has chosen next_task = task p.
> b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).
> c) CPU A identifies CPU X as a destination CPU (X < Z).
> d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
> e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
> locked CPU X’s rq, and thus, CPU A must wait.
>
> 2) At CPU Z
> a) Previous task has completed execution and thus, CPU Z enters
> schedule, locks its own rq after CPU A releases it.
> b) CPU Z dequeues previous task and begins executing task p.
> c) CPU Z unlocks its rq.
> d) Task p yields the CPU (e.g. by doing I/O or waiting to acquire a
> lock), which triggers the schedule function on CPU Z.
> e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
> f) As part of dequeue, it sets p.on_rq = 0 and unlocks its rq.
>
> 3) At CPU B
> a) CPU B enters try_to_wake_up with input task p.
> b) Since CPU Z dequeued task p, p.on_rq = 0, and CPU B updates
> p.state = WAKING.
> c) CPU B via select_task_rq determines CPU Y as the target CPU.
>
> 4) The race
> a) CPU A acquires CPU X’s lock and relocks CPU Z.
> b) CPU A reads task p.cpu = Z and incorrectly concludes task p is
> still on CPU Z.
> c) CPU A fails to notice that task p was dequeued from CPU Z while
> CPU A was waiting for locks in double_lock_balance. If CPU A knew
> that task p had been dequeued, it would return NULL, forcing
> push_rt_task to give up on migrating task p.
> d) CPU B updates task p.cpu = Y and calls ttwu_queue.
> e) CPU B locks CPU Y's rq. CPU B enqueues task p onto Y and sets task
> p.on_rq = 1.
> f) CPU B unlocks CPU Y, triggering memory synchronization.
> g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
> has not migrated.
> h) CPU A decides to migrate p to CPU X.
>
> This leads to A dequeuing p from Y's queue and various crashes down the
> line.
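
The window opened in steps 1d and 1e can be illustrated with a small single-threaded model of the lock-ordering decision. This is only a sketch: struct rq_lock_model, model_trylock() and must_drop_this_rq_lock() are hypothetical names, the lock is a plain flag rather than a spinlock, and ordering by CPU number is assumed from the "X is lower than Z" wording above.

```c
#include <stdbool.h>

/* Hypothetical, simplified runqueue: the lock is a flag, not a real spinlock. */
struct rq_lock_model {
	int cpu;
	bool locked;
};

/* Take the modelled lock if it is free; never blocks. */
static bool model_trylock(struct rq_lock_model *rq)
{
	if (rq->locked)
		return false;
	rq->locked = true;
	return true;
}

/*
 * The caller holds this_rq's lock and also wants target's lock.
 * Returns true when this_rq's lock has to be released first, which is
 * the case that lets other CPUs change this_rq behind the caller's back.
 * The blocking acquisition that would follow is not modelled.
 */
static bool must_drop_this_rq_lock(struct rq_lock_model *this_rq,
				   struct rq_lock_model *target)
{
	/* Fast path: target is free, take it without releasing anything. */
	if (model_trylock(target))
		return false;

	/* Contended, but this_rq orders first: keep it and wait for target. */
	if (this_rq->cpu < target->cpu)
		return false;

	/* Contended and target orders first: release this_rq (step 1e). */
	this_rq->locked = false;
	return true;
}
```

Whenever the function returns true, anything the caller learned about this_rq before the call may be stale once both locks are held again, so it has to be revalidated.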
>
> Solution
> ========
> The solution here is fairly simple. After obtaining the lock (at 4a),
> the check is enhanced to make sure that the task is still at the head of
> the pushable tasks list. If it is not, it is not suitable for being
> pushed out anyway.
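
A rough user-space sketch of the difference between the two checks, not the actual patch: struct task_model, struct rq_push_model and both helper functions are hypothetical stand-ins, and the pushable list is reduced to a single head pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task: only the fields from the scenario. */
struct task_model {
	int cpu;	/* CPU number as observed by the pushing CPU */
	int on_rq;	/* non-zero while queued on some runqueue */
};

/* Hypothetical, simplified runqueue with just the head of its pushable list. */
struct rq_push_model {
	int cpu;
	struct task_model *pushable_head;	/* NULL when the list is empty */
};

/* Looks only at the task's own fields, which another CPU can rewrite. */
static bool fields_only_check(const struct rq_push_model *rq,
			      const struct task_model *p)
{
	return p->cpu == rq->cpu && p->on_rq;
}

/* Asks the locked runqueue whether p is still the task it would push. */
static bool head_of_pushable_check(const struct rq_push_model *rq,
				   const struct task_model *p)
{
	return rq->pushable_head == p;
}
```

In the scenario above, CPU A observes p.cpu = Z (read at 4b) and p.on_rq = 1 (read at 4g), so fields_only_check() passes. Assuming p left CPU Z's pushable list when it started running there, head_of_pushable_check() fails because it consults state protected by the rq lock that CPU A holds.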
>
> Testing
> =======
> The fix was tested on a cluster of 3 nodes, where panics due to this
> race are hit every couple of days. A fix similar to this was deployed on
> such a cluster and was stable for more than 30 days.
>
> Co-developed-by: Jon Kohler <jon@nutanix.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Co-developed-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
> Signed-off-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
> Co-developed-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
> Signed-off-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
> Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
> Tested-by: Will Ton <william.ton@nutanix.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: stable@vger.kernel.org
> ---
Thanks, I've picked this up to land after -rc1.
Thread overview: 13+ messages
2025-02-25 18:05 [PATCH v3] sched/rt: Fix race in push_rt_task Harshit Agarwal
2025-03-04 9:15 ` Juri Lelli
2025-03-04 15:30 ` Steven Rostedt
2025-03-04 16:18 ` Juri Lelli
2025-03-04 18:31 ` Harshit Agarwal
2025-03-04 18:31 ` Harshit Agarwal
2025-03-04 18:37 ` Harshit Agarwal
2025-03-05 10:43 ` Juri Lelli
2025-03-07 20:54 ` Harshit Agarwal
2025-03-04 18:57 ` Harshit Agarwal
2025-03-26 18:31 ` Phil Auld
2025-04-02 12:47 ` Peter Zijlstra [this message]
2025-04-08 19:05 ` [tip: sched/core] " tip-bot2 for Harshit Agarwal