From: Valentin Schneider <valentin.schneider@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: tglx@linutronix.de, mingo@kernel.org,
linux-kernel@vger.kernel.org, bigeasy@linutronix.de,
qais.yousef@arm.com, swood@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vincent.donnefort@arm.com, tj@kernel.org
Subject: Re: [PATCH -v2 15/17] sched: Fix migrate_disable() vs rt/dl balancing
Date: Wed, 07 Oct 2020 20:22:44 +0100 [thread overview]
Message-ID: <jhjpn5tn8mz.mognet@arm.com> (raw)
In-Reply-To: <20201005150922.458081448@infradead.org>
On 05/10/20 15:57, Peter Zijlstra wrote:
> In order to minimize the interference of migrate_disable() on lower
> priority tasks, which can be deprived of runtime due to being stuck
> below a higher priority task, teach the RT/DL balancers to push away
> these higher priority tasks when a lower priority task gets selected
> to run on a freshly demoted CPU (pull).
>
> This adds migration interference to the higher priority task, but
> restores bandwidth to the system that would otherwise be irrevocably lost.
> Without this it would be possible to have all tasks on the system
> stuck on a single CPU, each task preempted in a migrate_disable()
> section with a single high priority task running.
>
> This way we can still approximate running the M highest priority tasks
> on the system.
>
> Migrating the top task away is (of course) still subject to
> migrate_disable() too, which means the lower task is subject to an
> interference equivalent to the worst case migrate_disable() section.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[...]
> @@ -1917,6 +1912,49 @@ static int migration_cpu_stop(void *data
> return 0;
> }
>
> +int push_cpu_stop(void *arg)
> +{
> + struct rq *lowest_rq = NULL, *rq = this_rq();
> + struct task_struct *p = arg;
> +
> + raw_spin_lock_irq(&p->pi_lock);
> + raw_spin_lock(&rq->lock);
> +
> + if (task_rq(p) != rq)
> + goto out_unlock;
> +
> + if (is_migration_disabled(p)) {
> + p->migration_flags |= MDF_PUSH;
> + goto out_unlock;
> + }
> +
> + p->migration_flags &= ~MDF_PUSH;
> +
> + if (p->sched_class->find_lock_rq)
> + lowest_rq = p->sched_class->find_lock_rq(p, rq);
> +
> + if (!lowest_rq)
> + goto out_unlock;
> +
> + // XXX validate p is still the highest prio task
So we want to move some !migrate_disable(), highest priority task to make
room for a migrate_disable(), lower priority task. We're working with the
task that was highest prio at the time of kicking the CPU stopper
(i.e. back when we invoked get_push_task()), but as you point out here it
might no longer be of highest prio.
I was thinking that since this is done in stopper context we could peek at
what actually is the current (!stopper) highest prio task, but that implies
first grabbing the rq lock and *then* some p->pi_lock, which is a no-no :(
Regarding the check, I've cobbled the below. I'm not fond of the implicit
expectation that p will always be > CFS, but there's no CFS .find_lock_rq()
(... for now).
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 69b1173eaf45..3ed339980f87 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1919,6 +1919,7 @@ static int migration_cpu_stop(void *data)
int push_cpu_stop(void *arg)
{
struct rq *lowest_rq = NULL, *rq = this_rq();
+ const struct sched_class *class;
struct task_struct *p = arg;
raw_spin_lock_irq(&p->pi_lock);
@@ -1940,14 +1941,23 @@ int push_cpu_stop(void *arg)
if (!lowest_rq)
goto out_unlock;
- // XXX validate p is still the highest prio task
- if (task_rq(p) == rq) {
- deactivate_task(rq, p, 0);
- set_task_cpu(p, lowest_rq->cpu);
- activate_task(lowest_rq, p, 0);
- resched_curr(lowest_rq);
+ // Validate p is still the highest prio task
+ for_class_range(class, &stop_sched_class - 1, p->sched_class - 1) {
+ struct task_struct *curr = class->peek_next_task(rq);
+
+ if (!curr)
+ continue;
+ if (curr != p)
+ goto out_unlock;
+ else
+ break;
}
+ deactivate_task(rq, p, 0);
+ set_task_cpu(p, lowest_rq->cpu);
+ activate_task(lowest_rq, p, 0);
+ resched_curr(lowest_rq);
+
double_unlock_balance(rq, lowest_rq);
out_unlock:
@@ -7312,7 +7322,7 @@ int sched_cpu_deactivate(unsigned int cpu)
rq_lock_irqsave(rq, &rf);
if (rq->rd) {
- update_rq_clock();
+ update_rq_clock(rq);
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
set_rq_offline(rq);
}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 15320ede2f45..7964c42b8604 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1840,6 +1840,19 @@ static struct task_struct *pick_next_task_dl(struct rq *rq)
return p;
}
+static struct task_struct *peek_next_task_dl(struct rq *rq)
+{
+ struct sched_dl_entity *dl_se;
+ struct dl_rq *dl_rq = &rq->dl;
+
+ if (!sched_dl_runnable(rq))
+ return NULL;
+
+ dl_se = pick_next_dl_entity(rq, dl_rq);
+ BUG_ON(!dl_se);
+ return dl_task_of(dl_se);
+}
+
static void put_prev_task_dl(struct rq *rq, struct task_struct *p)
{
update_curr_dl(rq);
@@ -2498,6 +2511,8 @@ const struct sched_class dl_sched_class
.check_preempt_curr = check_preempt_curr_dl,
.pick_next_task = pick_next_task_dl,
+ .peek_next_task = peek_next_task_dl,
+
.put_prev_task = put_prev_task_dl,
.set_next_task = set_next_task_dl,
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e90a69b3e85c..83949e9018a3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1636,6 +1636,14 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)
return p;
}
+static struct task_struct *peek_next_task_rt(struct rq *rq)
+{
+ if (!sched_rt_runnable(rq))
+ return NULL;
+
+ return _pick_next_task_rt(rq);
+}
+
static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
{
update_curr_rt(rq);
@@ -2479,6 +2487,8 @@ const struct sched_class rt_sched_class
.check_preempt_curr = check_preempt_curr_rt,
.pick_next_task = pick_next_task_rt,
+ .peek_next_task = peek_next_task_rt,
+
.put_prev_task = put_prev_task_rt,
.set_next_task = set_next_task_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2621155393c..7cd3b8682375 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1811,6 +1811,7 @@ struct sched_class {
void (*check_preempt_curr)(struct rq *rq, struct task_struct *p, int flags);
struct task_struct *(*pick_next_task)(struct rq *rq);
+ struct task_struct *(*peek_next_task)(struct rq *rq);
void (*put_prev_task)(struct rq *rq, struct task_struct *p);
void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first);