public inbox for linux-kernel@vger.kernel.org
From: chenjinghuang <chenjinghuang2@huawei.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"vschneid@redhat.com" <vschneid@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched/rt: rto_next_cpu: Skip CPUs with NEED_RESCHED
Date: Tue, 25 Nov 2025 07:26:36 +0000
Message-ID: <4b60e303c2ac4fa0b6dc51e629427492@huawei.com>
In-Reply-To: <20251121123811.3d34b10b@gandalf.local.home>



-----Original Message-----
From: Steven Rostedt <rostedt@goodmis.org>
Sent: 22 November 2025 1:38
To: chenjinghuang <chenjinghuang2@huawei.com>
Cc: mingo@redhat.com; peterz@infradead.org; juri.lelli@redhat.com; vincent.guittot@linaro.org; dietmar.eggemann@arm.com; bsegall@google.com; mgorman@suse.de; vschneid@redhat.com; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/rt: rto_next_cpu: Skip CPUs with NEED_RESCHED

On Fri, 21 Nov 2025 01:40:04 +0000
Chen Jinghuang <chenjinghuang2@huawei.com> wrote:

> CPU0 becomes overloaded when hosting a CPU-bound RT task, a 
> non-CPU-bound RT task, and a CFS task stuck in kernel space. When 
> other CPUs switch from RT to non-RT tasks, RT load balancing (LB) is 
> triggered; with HAVE_RT_PUSH_IPI enabled, they send IPIs to CPU0 to 
> drive the execution of rto_push_irq_work_func. During push_rt_task on 
> CPU0, if next_task->prio < rq->donor->prio, resched_curr() sets 
> NEED_RESCHED and after the push operation completes, CPU0 calls rto_next_cpu().
> Since only CPU0 is overloaded in this scenario, rto_next_cpu() should 
> ideally return -1 (no further IPI needed).
> 
> However, multiple CPUs invoking tell_cpu_to_push() during LB increments
> rd->rto_loop_next. Even when rd->rto_cpu is set to -1, the mismatch
> between rd->rto_loop and rd->rto_loop_next forces rto_next_cpu() to
> restart its search from -1. With CPU0 remaining overloaded (satisfying
> rt_nr_migratory && rt_nr_total > 1), it gets reselected, causing CPU0
> to queue irq_work to itself and send self-IPIs repeatedly. As long as 
> CPU0 stays overloaded and other CPUs run pull_rt_tasks(), it falls 
> into an infinite self-IPI loop, wasting CPU cycles on unnecessary interrupt handling.

Is it truly "infinite", or just wasted due to other CPUs requesting a pull?

Also, it appears the issue here is that it's sending to itself.

It is not strictly infinite. The IPI explosion in this scenario is caused by two
combined factors: cross-CPU IPIs triggered by other CPUs repeatedly calling
pull_rt_task(), and self-IPIs sent by CPU0 after it reselects itself in
rto_next_cpu(). Together these form a chain reaction that produces a de facto
infinite stream of redundant IPIs for as long as CPU0 remains overloaded.
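
To make the reselection concrete, here is a minimal userspace sketch of the
unpatched selection logic. It is a model under stated assumptions, not kernel
code: struct model_rd, model_cpumask_next(), model_rto_next_cpu() and
MODEL_NR_CPUS are hypothetical names, the mask is a plain bitmask, and the
atomics and memory barriers of the real code are left out.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for the root_domain fields rto_next_cpu() uses. */
struct model_rd {
	unsigned int rto_mask;	/* bit N set: CPU N is RT-overloaded */
	int rto_cpu;		/* last CPU handed out, or -1 */
	int rto_loop;		/* loop generation already handled */
	int rto_loop_next;	/* bumped by every tell_cpu_to_push() */
};

/* Like cpumask_next(): first set bit above prev, else MODEL_NR_CPUS. */
static int model_cpumask_next(int prev, unsigned int mask)
{
	assert(prev >= -1 && prev < MODEL_NR_CPUS);

	for (int cpu = prev + 1; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;

	return MODEL_NR_CPUS;
}

/* Unpatched logic: nothing prevents it from returning the calling CPU. */
static int model_rto_next_cpu(struct model_rd *rd)
{
	for (;;) {
		int cpu = model_cpumask_next(rd->rto_cpu, rd->rto_mask);

		rd->rto_cpu = cpu;
		if (cpu < MODEL_NR_CPUS)
			return cpu;

		rd->rto_cpu = -1;

		/* No new push request arrived: end the IPI chain. */
		if (rd->rto_loop == rd->rto_loop_next)
			break;

		/* A new request arrived: rescan the mask from -1. */
		rd->rto_loop = rd->rto_loop_next;
	}

	return -1;
}
```

With only CPU0 in the mask, rto_cpu at 0, and rto_loop_next one ahead of
rto_loop, the model returns 0 again, which corresponds to CPU0 queueing
irq_work on itself. If no other CPU bumps rto_loop_next, the next call
returns -1 and the chain stops.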

> 
> The triggering scenario is as follows:
> 
>          cpu0	        	   cpu1               	      cpu2
>                    	        pull_rt_task
> 	                      tell_cpu_to_push
>                  <------------irq_work_queue_on rto_push_irq_work_func
>        push_rt_task
>     resched_curr(rq)                                      pull_rt_task
>     rto_next_cpu                                        tell_cpu_to_push
>      			 <-------------------------- atomic_inc(rto_loop_next)
> rd->rto_loop != next
>      rto_next_cpu
>    irq_work_queue_on
> rto_push_irq_work_func
> 
> Fix redundant self-IPI/cross-CPU IPI when target CPU already has a 
> pending reschedule, making the IPI unnecessary.
> 
> Signed-off-by: Chen Jinghuang <chenjinghuang2@huawei.com>
> ---
>  kernel/sched/rt.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7936d4333731..29ce1af9f121 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2123,8 +2123,20 @@ static int rto_next_cpu(struct root_domain *rd)
>  
>  		rd->rto_cpu = cpu;
>  
> -		if (cpu < nr_cpu_ids)
> +		if (cpu < nr_cpu_ids) {
> +			struct task_struct *t;
> +			struct rq *rq = cpu_rq(cpu);
> +
> +			rcu_read_lock();
> +			t = rcu_dereference(rq->curr);
> +			if (test_tsk_need_resched(t)) {
> +				rcu_read_unlock();
> +				continue;
> +			}
> +			rcu_read_unlock();
> +
>  			return cpu;
> +		}
>  
>  		rd->rto_cpu = -1;
>  

Instead of skipping need resched, would skipping the current CPU work too?

Agreed: sending an IPI to itself is the direct trigger for the loop. The
original approach of checking NEED_RESCHED was an indirect optimization
that did not address the core issue.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7936d4333731..cacd8912cd31 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2100,6 +2100,7 @@ static void push_rt_tasks(struct rq *rq)
  */
 static int rto_next_cpu(struct root_domain *rd)
 {
+	int this_cpu = smp_processor_id();
 	int next;
 	int cpu;
 
@@ -2118,10 +2119,13 @@ static int rto_next_cpu(struct root_domain *rd)
 	 */
 	for (;;) {
 
-		/* When rto_cpu is -1 this acts like cpumask_first() */
-		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
+		do {
+			/* When rto_cpu is -1 this acts like cpumask_first() */
+			cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
+			rd->rto_cpu = cpu;
 
-		rd->rto_cpu = cpu;
+			/* Do not send IPI to self */
+		} while (cpu == this_cpu);
 
 		if (cpu < nr_cpu_ids)
 			return cpu;

-- Steve
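
The skip-self loop from the diff above can be exercised outside the kernel
with a small model. This is only a sketch under stated assumptions: struct
sketch_rd, sketch_cpumask_next(), sketch_rto_next_cpu() and SKETCH_NR_CPUS
are hypothetical names, this_cpu is passed as a parameter instead of coming
from smp_processor_id(), and the atomic read of rto_loop_next is simplified
to a plain load.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for the root_domain fields rto_next_cpu() uses. */
struct sketch_rd {
	unsigned int rto_mask;	/* bit N set: CPU N is RT-overloaded */
	int rto_cpu;		/* last CPU handed out, or -1 */
	int rto_loop;		/* loop generation already handled */
	int rto_loop_next;	/* bumped by every tell_cpu_to_push() */
};

/* Like cpumask_next(): first set bit above prev, else SKETCH_NR_CPUS. */
static int sketch_cpumask_next(int prev, unsigned int mask)
{
	assert(prev >= -1 && prev < SKETCH_NR_CPUS);

	for (int cpu = prev + 1; cpu < SKETCH_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;

	return SKETCH_NR_CPUS;
}

/* Selection with the proposed change: never hand back the calling CPU. */
static int sketch_rto_next_cpu(struct sketch_rd *rd, int this_cpu)
{
	for (;;) {
		int cpu;

		do {
			/* When rto_cpu is -1 this acts like a first-bit search. */
			cpu = sketch_cpumask_next(rd->rto_cpu, rd->rto_mask);
			rd->rto_cpu = cpu;

			/* Do not send an IPI to self. */
		} while (cpu == this_cpu);

		if (cpu < SKETCH_NR_CPUS)
			return cpu;

		rd->rto_cpu = -1;

		/* No new push request arrived: end the IPI chain. */
		if (rd->rto_loop == rd->rto_loop_next)
			break;

		/* A new request arrived: rescan the mask from -1. */
		rd->rto_loop = rd->rto_loop_next;
	}

	return -1;
}
```

In this model, when CPU0 is the only overloaded CPU and also the caller, the
function returns -1 even if rto_loop_next was bumped in the meantime. When
another CPU is also in the mask, that CPU is returned instead of the caller.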



Thread overview: 8+ messages
2025-11-21  1:40 [PATCH] sched/rt: rto_next_cpu: Skip CPUs with NEED_RESCHED Chen Jinghuang
2025-11-21 17:38 ` Steven Rostedt
2025-11-25  7:26   ` chenjinghuang [this message]
2025-11-25  8:36   ` [PATCH v2] sched/rt: Skip currently executing CPU in rto_next_cpu() Chen Jinghuang
2025-11-25 16:26     ` Steven Rostedt
2025-11-26  5:54       ` [PATCH v3] " Chen Jinghuang
2025-12-04  7:35       ` [PATCH v3] sched/rt: Skip currently executing CPU in rto_next_cpu() - Request for merge Chen Jinghuang
2025-12-04 15:44         ` Steven Rostedt
