public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Harshit Agarwal <harshit@nutanix.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org, Jon Kohler <jon@nutanix.com>,
	Gauri Patwardhan <gauri.patwardhan@nutanix.com>,
	Rahul Chunduru <rahul.chunduru@nutanix.com>,
	Will Ton <william.ton@nutanix.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3] sched/rt: Fix race in push_rt_task
Date: Wed, 2 Apr 2025 14:47:12 +0200	[thread overview]
Message-ID: <20250402124712.GN25239@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20250225180553.167995-1-harshit@nutanix.com>

On Tue, Feb 25, 2025 at 06:05:53PM +0000, Harshit Agarwal wrote:

> Details
> =======
> Let's look at the following scenario to understand this race.
> 
> 1) CPU A enters push_rt_task
>   a) CPU A has chosen next_task = task p.
>   b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).
>   c) CPU A identifies CPU X as a destination CPU (X < Z).
>   d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
>   e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
>      locked CPU X’s rq, and thus, CPU A must wait.
> 
> 2) At CPU Z
>   a) Previous task has completed execution and thus, CPU Z enters
>      schedule, locks its own rq after CPU A releases it.
>   b) CPU Z dequeues previous task and begins executing task p.
>   c) CPU Z unlocks its rq.
>   d) Task p yields the CPU (ex. by doing IO or waiting to acquire a
>      lock) which triggers the schedule function on CPU Z.
>   e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
>   f) As part of dequeue, it sets p.on_rq = 0 and unlocks its rq.
> 
> 3) At CPU B
>   a) CPU B enters try_to_wake_up with input task p.
>   b) Since CPU Z dequeued task p, p.on_rq = 0, and CPU B updates
>      B.state = WAKING.
>   c) CPU B via select_task_rq determines CPU Y as the target CPU.
> 
> 4) The race
>   a) CPU A acquires CPU X’s lock and relocks CPU Z.
>   b) CPU A reads task p.cpu = Z and incorrectly concludes task p is
>      still on CPU Z.
>   c) CPU A failed to notice task p had been dequeued from CPU Z while
>      CPU A was waiting for locks in double_lock_balance. If CPU A knew
>      that task p had been dequeued, it would return NULL forcing
>      push_rt_task to give up the task p's migration.
>   d) CPU B updates task p.cpu = Y and calls ttwu_queue.
>   e) CPU B locks Ys rq. CPU B enqueues task p onto Y and sets task
>      p.on_rq = 1.
>   f) CPU B unlocks CPU Y, triggering memory synchronization.
>   g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
>      has not migrated.
>   h) CPU A decides to migrate p to CPU X.
> 
> This leads to A dequeuing p from Y's queue and various crashes down the
> line.
> 
> Solution
> ========
> The solution here is fairly simple. After obtaining the lock (at 4a),
> the check is enhanced to make sure that the task is still at the head of
> the pushable tasks list. If not, then it is anyway not suitable for
> being pushed out.
> 
> Testing
> =======
> The fix is tested on a cluster of 3 nodes, where the panics due to this
> are hit every couple of days. A fix similar to this was deployed on such
> cluster and was stable for more than 30 days.
> 
> Co-developed-by: Jon Kohler <jon@nutanix.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Co-developed-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
> Signed-off-by: Gauri Patwardhan <gauri.patwardhan@nutanix.com>
> Co-developed-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
> Signed-off-by: Rahul Chunduru <rahul.chunduru@nutanix.com>
> Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
> Tested-by: Will Ton <william.ton@nutanix.com>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: stable@vger.kernel.org
> ---

Thanks, I've picked this up to land after -rc1.



  parent reply	other threads:[~2025-04-02 12:47 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-25 18:05 [PATCH v3] sched/rt: Fix race in push_rt_task Harshit Agarwal
2025-03-04  9:15 ` Juri Lelli
2025-03-04 15:30   ` Steven Rostedt
2025-03-04 16:18     ` Juri Lelli
2025-03-04 18:31       ` Harshit Agarwal
2025-03-04 18:31         ` Harshit Agarwal
2025-03-04 18:37       ` Harshit Agarwal
2025-03-05 10:43         ` Juri Lelli
2025-03-07 20:54           ` Harshit Agarwal
2025-03-04 18:57       ` Harshit Agarwal
2025-03-26 18:31 ` Phil Auld
2025-04-02 12:47 ` Peter Zijlstra [this message]
2025-04-08 19:05 ` [tip: sched/core] " tip-bot2 for Harshit Agarwal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250402124712.GN25239@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gauri.patwardhan@nutanix.com \
    --cc=harshit@nutanix.com \
    --cc=jon@nutanix.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=rahul.chunduru@nutanix.com \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=william.ton@nutanix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox