All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask
Date: Tue, 15 Aug 2023 16:21:21 +0200	[thread overview]
Message-ID: <20230815142121.MoZplZUr@linutronix.de> (raw)
In-Reply-To: <20230811112044.3302588-1-vschneid@redhat.com>

On 2023-08-11 12:20:44 [+0100], Valentin Schneider wrote:
> Sebastian noted that the rto_push_work IRQ work can be queued for a CPU
> that has an empty pushable_tasks list, which means nothing useful will be
> done in the IPI other than queue the work for the next CPU on the rto_mask.
> 
> rto_push_irq_work_func() only operates on tasks in the pushable_tasks list,
> but the conditions for that irq_work to be queued (and for a CPU to be
> added to the rto_mask) rely on rq_rt->nr_migratory instead.
> 
> nr_migratory is increased whenever an RT task entity is enqueued and it has
> nr_cpus_allowed > 1. Unlike the pushable_tasks list, nr_migratory includes a
> rt_rq's current task. This means a rt_rq can have a migratible current, N
> non-migratible queued tasks, and be flagged as overloaded / have its CPU
> set in the rto_mask, despite having an empty pushable_tasks list.
> 
> Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task().
> Since rt_rq->{rt_nr_migratory,rt_nr_total} become unused, remove them.
> 
> Note that the case where the current task is pushed away to make way for a
> migration-disabled task remains unchanged: the migration-disabled task has
> to be in the pushable_tasks list in the first place, which means it has
> nr_cpus_allowed > 1.
> 
> Link: http://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> This is lightly tested, this looks to be working OK but I don't have nor am
> I aware of a test case for RT balancing, I suppose we want something that
> asserts we always run the N highest prio tasks for N CPUs, with a small
> margin for migrations?

I don't see the storm of IPIs I saw before. So as far that goes:
   Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

What I still observe is:
- CPU0 is idle. CPU0 gets a task assigned from CPU1. That task receives
  a wakeup. CPU0 returns from idle and schedules the task.
  pull_rt_task() on CPU1 and sometimes on other CPU observe this, too.
  CPU1 sends irq_work to CPU0 while at the time rto_next_cpu() sees that
  has_pushable_tasks() return 0. That bit was cleared earlier (as per
  tracing).

- CPU0 is idle. CPU0 gets a task assigned from CPU1. The task on CPU0 is
  woken up without an IPI (yay). But then pull_rt_task() decides that
  send irq_work and has_pushable_tasks() said that is has tasks left
  so….
  Now: rto_push_irq_work_func() run once once on CPU0, does nothing,
  rto_next_cpu() return CPU0 again and enqueues itself again on CPU0.
  Usually after the second or third round the scheduler on CPU0 makes
  enough progress to remove the task/ clear the CPU from mask.

I understand that there is a race and the CPU is cleared from rto_mask
shortly after checking. Therefore I would suggest to look at
has_pushable_tasks() before returning a CPU in rto_next_cpu() as I did
just to avoid the interruption which does nothing.

For the second case the irq_work seems to make no progress. I don't see
any trace_events in hardirq, the mask is cleared outside hardirq (idle
code). The NEED_RESCHED bit is set for current therefore it doesn't make
sense to send irq_work to reschedule if the current already has this on
its agenda.

So what about something like:

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 00e0e50741153..d963408855e25 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2247,8 +2247,23 @@ static int rto_next_cpu(struct root_domain *rd)
 
 		rd->rto_cpu = cpu;
 
-		if (cpu < nr_cpu_ids)
+		if (cpu < nr_cpu_ids) {
+			struct task_struct *t;
+
+			if (!has_pushable_tasks(cpu_rq(cpu)))
+				continue;
+
+			rcu_read_lock();
+			t = rcu_dereference(rq->curr);
+			/* if (test_preempt_need_resched_cpu(cpu_rq(cpu))) */
+			if (test_tsk_need_resched(t)) {
+				rcu_read_unlock();
+				continue;
+			}
+			rcu_read_unlock();
+
 			return cpu;
+		}
 
 		rd->rto_cpu = -1;
 
Sebastian

  reply	other threads:[~2023-08-15 14:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-11 11:20 [PATCH] sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask Valentin Schneider
2023-08-15 14:21 ` Sebastian Andrzej Siewior [this message]
2023-09-11 10:54   ` Valentin Schneider
2023-09-20 13:38     ` Sebastian Andrzej Siewior
2023-09-25  8:27   ` Ingo Molnar
2023-09-25 12:09     ` Valentin Schneider
2023-09-25  8:55 ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2023-09-25 10:11   ` Peter Zijlstra
2023-09-25 12:21     ` Valentin Schneider
2023-09-28 21:21       ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230815142121.MoZplZUr@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=bristot@redhat.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.