From: Peter Zijlstra <peterz@infradead.org>
To: Marco Elver <elver@google.com>
Cc: paulmck@kernel.org, Alexander Potapenko <glider@google.com>,
syzbot <syzbot+0ec1e96c2cdf5c0e512a@syzkaller.appspotmail.com>,
audit@vger.kernel.org, eparis@redhat.com,
linux-kernel@vger.kernel.org, paul@paul-moore.com,
syzkaller-bugs@googlegroups.com, kent.overstreet@linux.dev
Subject: Re: [syzbot] [kernel?] KCSAN: assert: race in dequeue_entities
Date: Wed, 23 Oct 2024 11:36:41 +0200 [thread overview]
Message-ID: <20241023093641.GE16066@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CANpmjNNx6QM67jSaAtkYdxA+A5_FGPLBkBxdVXQ_XOLB8pgzNw@mail.gmail.com>
On Wed, Oct 23, 2024 at 11:03:11AM +0200, Marco Elver wrote:
> On Wed, 23 Oct 2024 at 10:54, Marco Elver <elver@google.com> wrote:
> >
> > On Tue, Oct 22, 2024 at 09:57PM +0200, Marco Elver wrote:
> > > On Tue, 22 Oct 2024 at 21:12, Peter Zijlstra <peterz@infradead.org> wrote:
> > [...]
> > > > So KCSAn is trying to tell me these two paths run concurrently on the
> > > > same 'p' ?!? That would be a horrible bug -- both these call chains
> > > > should be holding rq->__lock (for task_rq(p)).
> > >
> > > Yes correct.
> > >
> > > And just to confirm this is no false positive, the way KCSAN works
> > > _requires_ the race to actually happen before it reports anything;
> > > this can also be seen in Alexander's report with just 1 stack trace
> > > where it saw the value transition from 0 to 1 (TASK_ON_RQ_QUEUED) but
> > > didn't know who did the write because kernel/sched was uninstrumented.
> >
> > Got another version of the splat with CONFIG_KCSAN_VERBOSE=y. Lockdep seems to
> > think that both threads here are holding rq->__lock.
>
> Gotta read more carefully, one instance is ffffa2e57dc2f398 another is
> ffffa2e57dd2f398. If I read it right, then they're not actually the
> same lock.
Yeah, as explained in the diagram below, the moment the ->on_rq = 0
store goes through, we no longer own the task. And since
ASSERT_EXCLUSIVE_WRITER is after that, we go splat.
The below patch changes this order and switches to using
smp_store_release() and ensures to not reference the task after it.
I've boot tested it, but not much else.
Could you please give this a go (on top of -rc3)?
This also explains the SCHED_WARN_ON() Kent saw, that is subject to the
same race.
---
kernel/sched/fair.c | 21 ++++++++++++++-------
kernel/sched/sched.h | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6512258dc71f..8edac978edb2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5625,8 +5625,9 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
struct sched_entity *se = pick_eevdf(cfs_rq);
if (se->sched_delayed) {
dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
- SCHED_WARN_ON(se->sched_delayed);
- SCHED_WARN_ON(se->on_rq);
+ /*
+ * Must not reference @se again, see __block_task().
+ */
return NULL;
}
return se;
@@ -7170,7 +7171,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
/* Fix-up what dequeue_task_fair() skipped */
hrtick_update(rq);
- /* Fix-up what block_task() skipped. */
+ /*
+ * Fix-up what block_task() skipped.
+ *
+ * Must be last, @p might not be valid after this.
+ */
__block_task(rq, p);
}
@@ -7187,12 +7192,14 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
util_est_dequeue(&rq->cfs, p);
- if (dequeue_entities(rq, &p->se, flags) < 0) {
- util_est_update(&rq->cfs, p, DEQUEUE_SLEEP);
+ util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
+ if (dequeue_entities(rq, &p->se, flags) < 0)
return false;
- }
- util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
+ /*
+ * Must not reference @p after dequeue_entities(DEQUEUE_DELAYED).
+ */
+
hrtick_update(rq);
return true;
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7b139016cbd9..32e9c41b7ec0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2791,8 +2791,6 @@ static inline void sub_nr_running(struct rq *rq, unsigned count)
static inline void __block_task(struct rq *rq, struct task_struct *p)
{
- WRITE_ONCE(p->on_rq, 0);
- ASSERT_EXCLUSIVE_WRITER(p->on_rq);
if (p->sched_contributes_to_load)
rq->nr_uninterruptible++;
@@ -2800,6 +2798,38 @@ static inline void __block_task(struct rq *rq, struct task_struct *p)
atomic_inc(&rq->nr_iowait);
delayacct_blkio_start();
}
+
+ ASSERT_EXCLUSIVE_WRITER(p->on_rq);
+
+ /*
+ * The moment this write goes through, ttwu() can swoop in and migrate
+ * this task, rendering our rq->__lock ineffective.
+ *
+ * __schedule() try_to_wake_up()
+ * LOCK rq->__lock LOCK p->pi_lock
+ * pick_next_task()
+ * pick_next_task_fair()
+ * pick_next_entity()
+ * dequeue_entities()
+ * __block_task()
+ * RELEASE p->on_rq = 0; if (p->on_rq && ...)
+ * break;
+ *
+ * ACQUIRE (after ctrl-dep)
+ *
+ * cpu = select_task_rq();
+ * set_task_cpu(p, cpu);
+ * ttwu_queue()
+ * ttwu_do_activate()
+ * LOCK rq->__lock
+ * activate_task()
+ * STORE p->on_rq = 1
+ * UNLOCK rq->__lock
+ *
+ * Callers must ensure to not reference @p after this -- we no longer
+ * own it.
+ */
+ smp_store_release(&p->on_rq, 0);
}
extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
next prev parent reply other threads:[~2024-10-23 9:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-27 14:57 [syzbot] [kernel?] KCSAN: assert: race in dequeue_entities syzbot
2024-10-22 8:06 ` Alexander Potapenko
2024-10-22 11:31 ` Peter Zijlstra
2024-10-22 13:40 ` Marco Elver
2024-10-22 14:31 ` Paul E. McKenney
2024-10-23 14:13 ` Paul E. McKenney
2024-10-22 19:12 ` Peter Zijlstra
2024-10-22 19:57 ` Marco Elver
2024-10-23 8:54 ` Marco Elver
2024-10-23 9:03 ` Marco Elver
2024-10-23 9:36 ` Peter Zijlstra [this message]
2024-10-23 13:18 ` Marco Elver
2024-10-24 10:22 ` [tip: sched/urgent] sched: Fix pick_next_task_fair() vs try_to_wake_up() race tip-bot2 for Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241023093641.GE16066@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=audit@vger.kernel.org \
--cc=elver@google.com \
--cc=eparis@redhat.com \
--cc=glider@google.com \
--cc=kent.overstreet@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=paul@paul-moore.com \
--cc=paulmck@kernel.org \
--cc=syzbot+0ec1e96c2cdf5c0e512a@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.