From: Joel Fernandes <joel@joelfernandes.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Connor O'Brien <connoro@google.com>,
linux-kernel@vger.kernel.org, kernel-team@android.com,
John Stultz <jstultz@google.com>,
Qais Yousef <qais.yousef@arm.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [RFC PATCH 07/11] sched: Add proxy execution
Date: Tue, 22 Nov 2022 18:45:08 +0000 [thread overview]
Message-ID: <Y30YtPo/RSzTi4q5@google.com> (raw)
In-Reply-To: <Y3r3vvzfONR2sY9V@google.com>
On Mon, Nov 21, 2022 at 03:59:58AM +0000, Joel Fernandes wrote:
> On Sun, Nov 20, 2022 at 08:49:22PM -0500, Joel Fernandes wrote:
> > On Sun, Nov 20, 2022 at 7:22 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > Hello Dietmar,
> > >
> > > On Fri, Nov 04, 2022 at 06:09:26PM +0100, Dietmar Eggemann wrote:
> > > > On 31/10/2022 19:00, Joel Fernandes wrote:
> > > > > On Mon, Oct 31, 2022 at 05:39:45PM +0100, Dietmar Eggemann wrote:
> > > > >> On 29/10/2022 05:31, Joel Fernandes wrote:
> > > > >>> Hello Dietmar,
> > > > >>>
> > > > >>>> On Oct 24, 2022, at 6:13 AM, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> > > > >>>>
> > > > >>>> On 03/10/2022 23:44, Connor O'Brien wrote:
> > > > >>>>> From: Peter Zijlstra <peterz@infradead.org>
> > > >
> > > > [...]
> > > >
> > > > >>>>> + rq_unpin_lock(rq, rf);
> > > > >>>>> + raw_spin_rq_unlock(rq);
> > > > >>>>
> > > > >>>> Don't we run into rq_pin_lock()'s:
> > > > >>>>
> > > > >>>> SCHED_WARN_ON(rq->balance_callback && rq->balance_callback !=
> > > > >>>> &balance_push_callback)
> > > > >>>>
> > > > >>>> by releasing rq lock between queue_balance_callback(, push_rt/dl_tasks)
> > > > >>>> and __balance_callbacks()?
> > > > >>>
> > > > >>> Apologies, I’m a bit lost here. The code you are responding to
> > > > >>> inline does not call rq_pin_lock, it calls rq_unpin_lock. So
> > > > >>> what scenario does the warning trigger according to you?
> > > > >>
> > > > >> True, but the code which sneaks in between proxy()'s
> > > > >> raw_spin_rq_unlock(rq) and raw_spin_rq_lock(rq) does.
> > > > >>
> > > > >
> > > > > Got it now, thanks a lot for clarifying. Can this be fixed by do a
> > > > > __balance_callbacks() at:
> > > >
> > > > I tried the:
> > > >
> > > > head = splice_balance_callbacks(rq)
> > > > task_rq_unlock(rq, p, &rf);
> > > > ...
> > > > balance_callbacks(rq, head);
> > > >
> > > > separation known from __sched_setscheduler() in __schedule() (right
> > > > after pick_next_task()) but it doesn't work. Lot of `BUG: scheduling
> > > > while atomic:`
> > >
> > > How about something like the following? This should exclude concurrent
> > > balance callback queues from other CPUs and let us release the rq lock early
> > > in proxy(). I ran locktorture with your diff to make writer threads RT, and I
> > > cannot reproduce any crash with it:
> > >
> > > ---8<-----------------------
> > >
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > Subject: [PATCH] Exclude balance callback queuing during proxy's migrate
> > >
> > > In commit 565790d28b1e ("sched: Fix balance_callback()"), it is clear that rq
> > > lock needs to be held when __balance_callbacks() in schedule() is called.
> > > However, it is possible that because rq lock is dropped in proxy(), another
> > > CPU, say in __sched_setscheduler() can queue balancing callbacks and cause
> > > issues.
> > >
> > > To remedy this, exclude balance callback queuing on other CPUs, during the
> > > proxy().
> > >
> > > Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > > kernel/sched/core.c | 15 +++++++++++++++
> > > kernel/sched/sched.h | 3 +++
> > > 2 files changed, 18 insertions(+)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 88a5fa34dc06..f1dac21fcd90 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -6739,6 +6739,10 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> > > p->wake_cpu = wake_cpu;
> > > }
> > >
> > > + // Prevent other CPUs from queuing balance callbacks while we migrate
> > > + // tasks in the migrate_list with the rq lock released.
> > > + raw_spin_lock(&rq->balance_lock);
> > > +
> > > rq_unpin_lock(rq, rf);
> > > raw_spin_rq_unlock(rq);
> > > raw_spin_rq_lock(that_rq);
> > > @@ -6758,7 +6762,18 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> > > }
> > >
> > > raw_spin_rq_unlock(that_rq);
> > > +
> > > + // This may make lockdep unhappy as we acquire rq->lock with balance_lock
> > > + // held. But that should be a false positive, as the following pattern
> > > + // happens only on the current CPU with interrupts disabled:
> > > + // rq_lock()
> > > + // balance_lock();
> > > + // rq_unlock();
> > > + // rq_lock();
> > > raw_spin_rq_lock(rq);
> >
> > Hmm, I think there's still a chance of deadlock here. I need to
> > rethink it a bit, but that's the idea I was going for.
>
> Took care of that, and came up with the below. Tested with locktorture and it
> survives. Thoughts?
Found a new bug in my patch during locktorture + hotplug. I was missing an
unlock() during early return of queue_balance_callback(). Please try the
following version of the patch for testing. To trigger the bug in the v2, I
gave the following options to locktorture:
locktorture.stutter=5 locktorture.torture_type=mutex_lock
locktorture.onoff_interval=1 locktorture.onoff_holdoff=10
locktorture.nwriters_stress=10
Now it is fixed.
---8<-----------------------
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v3] Exclude balance callback queuing during proxy's migrate
In commit 565790d28b1e ("sched: Fix balance_callback()"), it is clear that rq
lock needs to be held when __balance_callbacks() in schedule() is called.
However, it is possible that because rq lock is dropped in proxy(), another
CPU, say in __sched_setscheduler() can queue balancing callbacks and cause
issues.
To remedy this, exclude balance callback queuing on other CPUs, during the
proxy().
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/sched/core.c | 72 ++++++++++++++++++++++++++++++++++++++++++--
kernel/sched/sched.h | 7 ++++-
2 files changed, 76 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 88a5fa34dc06..aba90b3dc3ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -633,6 +633,29 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
}
}
+/*
+ * Helper to call __task_rq_lock safely, in scenarios where we might be about to
+ * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
+ * could have released its rq lock while holding balance_lock. So release rq
+ * lock in such a situation to avoid deadlock, and retry.
+ */
+struct rq *__task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
+{
+ struct rq *rq;
+ bool locked = false;
+
+ do {
+ if (locked) {
+ __task_rq_unlock(rq, rf);
+ cpu_relax();
+ }
+ rq = __task_rq_lock(p, rf);
+ locked = true;
+ } while (raw_spin_is_locked(&rq->balance_lock));
+
+ return rq;
+}
+
/*
* task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
*/
@@ -675,6 +698,29 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
}
}
+/*
+ * Helper to call task_rq_lock safely, in scenarios where we might be about to
+ * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
+ * could have released its rq lock while holding balance_lock. So release rq
+ * lock in such a situation to avoid deadlock, and retry.
+ */
+struct rq *task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
+{
+ struct rq *rq;
+ bool locked = false;
+
+ do {
+ if (locked) {
+ task_rq_unlock(rq, p, rf);
+ cpu_relax();
+ }
+ rq = task_rq_lock(p, rf);
+ locked = true;
+ } while (raw_spin_is_locked(&rq->balance_lock));
+
+ return rq;
+}
+
/*
* RQ-clock updating methods:
*/
@@ -6739,6 +6785,12 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
p->wake_cpu = wake_cpu;
}
+ /*
+ * Prevent other CPUs from queuing balance callbacks while we migrate
+ * tasks in the migrate_list with the rq lock released.
+ */
+ raw_spin_lock(&rq->balance_lock);
+
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
raw_spin_rq_lock(that_rq);
@@ -6758,7 +6810,21 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
}
raw_spin_rq_unlock(that_rq);
+
+ /*
+ * This may make lockdep unhappy as we acquire rq->lock with
+ * balance_lock held. But that should be a false positive, as the
+ * following pattern happens only on the current CPU with interrupts
+ * disabled:
+ * rq_lock()
+ * balance_lock();
+ * rq_unlock();
+ * rq_lock();
+ */
raw_spin_rq_lock(rq);
+
+ raw_spin_unlock(&rq->balance_lock);
+
rq_repin_lock(rq, rf);
return NULL; /* Retry task selection on _this_ CPU. */
@@ -7489,7 +7555,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
if (p->pi_top_task == pi_task && prio == p->prio && !dl_prio(prio))
return;
- rq = __task_rq_lock(p, &rf);
+ rq = __task_rq_lock_balance(p, &rf);
update_rq_clock(rq);
/*
* Set under pi_lock && rq->lock, such that the value can be used under
@@ -8093,7 +8159,8 @@ static int __sched_setscheduler(struct task_struct *p,
* To be able to change p->policy safely, the appropriate
* runqueue lock must be held.
*/
- rq = task_rq_lock(p, &rf);
+ rq = task_rq_lock_balance(p, &rf);
+
update_rq_clock(rq);
/*
@@ -10312,6 +10379,7 @@ void __init sched_init(void)
rq = cpu_rq(i);
raw_spin_lock_init(&rq->__lock);
+ raw_spin_lock_init(&rq->balance_lock);
rq->nr_running = 0;
rq->calc_load_active = 0;
rq->calc_load_update = jiffies + LOAD_FREQ;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 354e75587fed..17947a4009de 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1057,6 +1057,7 @@ struct rq {
unsigned long cpu_capacity_orig;
struct callback_head *balance_callback;
+ raw_spinlock_t balance_lock;
unsigned char nohz_idle_balance;
unsigned char idle_balance;
@@ -1748,18 +1749,22 @@ queue_balance_callback(struct rq *rq,
void (*func)(struct rq *rq))
{
lockdep_assert_rq_held(rq);
+ raw_spin_lock(&rq->balance_lock);
/*
* Don't (re)queue an already queued item; nor queue anything when
* balance_push() is active, see the comment with
* balance_push_callback.
*/
- if (unlikely(head->next || rq->balance_callback == &balance_push_callback))
+ if (unlikely(head->next || rq->balance_callback == &balance_push_callback)) {
+ raw_spin_unlock(&rq->balance_lock);
return;
+ }
head->func = (void (*)(struct callback_head *))func;
head->next = rq->balance_callback;
rq->balance_callback = head;
+ raw_spin_unlock(&rq->balance_lock);
}
#define rcu_dereference_check_sched_domain(p) \
--
2.38.1.584.g0f3c55d4c2-goog
next prev parent reply other threads:[~2022-11-22 18:45 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-03 21:44 [RFC PATCH 00/11] Reviving the Proxy Execution Series Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 01/11] locking/ww_mutex: Remove wakeups from under mutex::wait_lock Connor O'Brien
2022-10-04 16:01 ` Waiman Long
2022-10-12 23:54 ` Joel Fernandes
2022-10-20 18:43 ` Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 02/11] locking/mutex: Rework task_struct::blocked_on Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 03/11] kernel/locking: Add p->blocked_on wrapper Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 04/11] locking/mutex: make mutex::wait_lock irq safe Connor O'Brien
2022-10-13 4:30 ` Joel Fernandes
2022-10-03 21:44 ` [RFC PATCH 05/11] sched: Split scheduler execution context Connor O'Brien
2022-10-14 17:01 ` Joel Fernandes
2022-10-19 17:17 ` Valentin Schneider
2022-10-20 18:43 ` Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 06/11] kernel/locking: Expose mutex_owner() Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 07/11] sched: Add proxy execution Connor O'Brien
2022-10-12 1:54 ` Joel Fernandes
2022-10-12 9:46 ` Juri Lelli
2022-10-14 17:07 ` Joel Fernandes
2022-10-15 13:53 ` Peter Zijlstra
2022-10-16 20:48 ` Steven Rostedt
2022-10-17 4:03 ` Joel Fernandes
2022-10-17 7:26 ` Peter Zijlstra
2022-10-24 22:33 ` Qais Yousef
2022-10-25 11:19 ` Joel Fernandes
2022-10-25 22:10 ` Qais Yousef
2022-10-15 15:28 ` Peter Zijlstra
2022-10-15 15:08 ` Peter Zijlstra
2022-10-15 15:10 ` Peter Zijlstra
2022-10-15 15:47 ` Peter Zijlstra
2022-10-24 10:13 ` Dietmar Eggemann
2022-10-29 3:31 ` Joel Fernandes
2022-10-31 16:39 ` Dietmar Eggemann
2022-10-31 18:00 ` Joel Fernandes
2022-11-04 17:09 ` Dietmar Eggemann
2022-11-21 0:22 ` Joel Fernandes
2022-11-21 1:49 ` Joel Fernandes
2022-11-21 3:59 ` Joel Fernandes
2022-11-22 18:45 ` Joel Fernandes [this message]
2023-01-09 8:51 ` Chen Yu
2022-10-03 21:44 ` [RFC PATCH 08/11] sched: Fixup task CPUs for potential proxies Connor O'Brien
2022-10-03 21:44 ` [RFC PATCH 09/11] sched/rt: Fix proxy/current (push,pull)ability Connor O'Brien
2022-10-10 11:40 ` Valentin Schneider
2022-10-14 22:32 ` Connor O'Brien
2022-10-19 17:05 ` Valentin Schneider
2022-10-20 13:30 ` Juri Lelli
2022-10-20 16:14 ` Valentin Schneider
2022-10-21 2:22 ` Connor O'Brien
2022-10-03 21:45 ` [RFC PATCH 10/11] torture: support randomized shuffling for proxy exec testing Connor O'Brien
2022-11-12 16:54 ` Joel Fernandes
2022-11-14 20:44 ` Connor O'Brien
2022-11-15 16:02 ` Joel Fernandes
2022-10-03 21:45 ` [RFC PATCH 11/11] locktorture: support nested mutexes Connor O'Brien
2022-10-06 9:59 ` [RFC PATCH 00/11] Reviving the Proxy Execution Series Juri Lelli
2022-10-06 10:07 ` Peter Zijlstra
2022-10-06 12:14 ` Juri Lelli
2022-10-15 15:44 ` Peter Zijlstra
2022-10-17 2:23 ` Joel Fernandes
2022-10-19 11:43 ` Qais Yousef
2022-10-19 12:23 ` Joel Fernandes
2022-10-19 13:41 ` Juri Lelli
2022-10-19 13:51 ` Joel Fernandes
2022-10-19 19:30 ` Qais Yousef
2022-10-20 8:51 ` Joel Fernandes
2022-10-17 3:25 ` Chengming Zhou
2022-10-17 3:56 ` Joel Fernandes
2022-10-17 4:26 ` Chengming Zhou
2022-10-17 12:27 ` Joel Fernandes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y30YtPo/RSzTi4q5@google.com \
--to=joel@joelfernandes.org \
--cc=boqun.feng@gmail.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=connoro@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jstultz@google.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qais.yousef@arm.com \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.