Re: [PATCH] sched,workqueue: Use READ_ONCE()/WRITE_ONCE() for wake_cpu accesses

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Yu Peng <pengyu@kylinos.cn>,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, jiangshanlai@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched,workqueue: Use READ_ONCE()/WRITE_ONCE() for wake_cpu accesses
Date: Mon, 30 Mar 2026 16:36:29 +0200	[thread overview]
Message-ID: <20260330143629.GP3738010@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <aca4lM-AsEdstYCf@slm.duckdns.org>

On Fri, Mar 27, 2026 at 07:04:20AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Mar 27, 2026 at 10:27:39AM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 27, 2026 at 03:30:07PM +0800, Yu Peng wrote:
> > > task_struct->wake_cpu is used as a wake placement hint by scheduler code
> > > and workqueue's non-strict affinity repatriation path.
> > > 
> > > These accesses are intentionally lockless and stale values are tolerated,
> > > affecting only wakeup placement. 
> > 
> > The scheduler usage isn't lockless. And I'm not at all sure why and how
> > workqueue is poking at this stuff :/ It has no business poking at it.
> 
> Oh, you were involved in the discussion. It's workqueue using ->wake_cpu as
> a way to implement opportunistic affinity at work boundaries as past history
> doesn't mean anything at these boundaries.
> 
>  https://lore.kernel.org/all/20230519001709.2563-1-tj@kernel.org/

Bah, clearly I didn't remember ...

> Hmmm... it would trigger KCSAN. Maybe a better way to do it is exposing
> sched interface that does WRITE_ONCE wrapping? How do you want it resolved?

Perhaps a try_to_wake_up() variant that takes a cpumask (see the
completely untested code below). But I'm not entirely sure. The current
thing very much relies on a bunch of implementation details that aren't
guaranteed.

The below will 'fake' the task cpumask for a while; and note that the
caller has to be careful to provide a mask that only includes tasks that
were set in the current task, otherwise 'funny' things will happen.
Also, it will only use this mask if it ends up doing CPU selection at
all.

Ooh, I suppose I should make sure wake_cpu is inside the provided map
too...

That said, I don't think there is any urgency here, this code has been
like this for a long time, it can stay this way a little longer.


---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 496dff740dca..81006b5c881d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4089,7 +4089,8 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
  * Return: %true if @p->state changes (an actual wakeup was done),
  *	   %false otherwise.
  */
-int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
+int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags,
+		   const struct cpumask *wake_mask)
 {
 	guard(preempt)();
 	int cpu, success = 0;
@@ -4128,6 +4129,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 * in set_current_state() that the waiting thread does.
 	 */
 	scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
+		const struct cpumask *prev_mask = NULL;
+
 		smp_mb__after_spinlock();
 		if (!ttwu_state_match(p, state, &success))
 			break;
@@ -4227,7 +4230,14 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 */
 		smp_cond_load_acquire(&p->on_cpu, !VAL);
 
+		if (wake_mask && !is_migration_disabled(p)) {
+			prev_mask = p->cpus_ptr;
+			p->cpus_ptr = wake_mask;
+		}
 		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
+		if (prev_mask)
+			p->cpus_ptr = prev_mask;
+
 		if (task_cpu(p) != cpu) {
 			if (p->in_iowait) {
 				delayacct_blkio_end(p);
@@ -4371,13 +4381,13 @@ struct task_struct *cpu_curr_snapshot(int cpu)
  */
 int wake_up_process(struct task_struct *p)
 {
-	return try_to_wake_up(p, TASK_NORMAL, 0);
+	return try_to_wake_up(p, TASK_NORMAL, 0, NULL);
 }
 EXPORT_SYMBOL(wake_up_process);
 
 int wake_up_state(struct task_struct *p, unsigned int state)
 {
-	return try_to_wake_up(p, state, 0);
+	return try_to_wake_up(p, state, 0, NULL);
 }
 
 /*
@@ -7247,7 +7257,7 @@ int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flag
 			  void *key)
 {
 	WARN_ON_ONCE(wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
-	return try_to_wake_up(curr->private, mode, wake_flags);
+	return try_to_wake_up(curr->private, mode, wake_flags, NULL);
 }
 EXPORT_SYMBOL(default_wake_function);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 43bbf0693cca..4c9d475da072 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3760,7 +3760,8 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 extern void swake_up_all_locked(struct swait_queue_head *q);
 extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 
-extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags);
+extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags,
+			  const struct cpumask *wake_mask);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 extern int preempt_dynamic_mode;
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index 0fef6496c4c8..3eb2002fc847 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -27,7 +27,7 @@ void swake_up_locked(struct swait_queue_head *q, int wake_flags)
 		return;
 
 	curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
-	try_to_wake_up(curr->task, TASK_NORMAL, wake_flags);
+	try_to_wake_up(curr->task, TASK_NORMAL, wake_flags, NULL);
 	list_del_init(&curr->task_list);
 }
 EXPORT_SYMBOL(swake_up_locked);

next prev parent reply	other threads:[~2026-03-30 14:36 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-27  7:30 [PATCH] sched,workqueue: Use READ_ONCE()/WRITE_ONCE() for wake_cpu accesses Yu Peng
2026-03-27  9:27 ` Peter Zijlstra
2026-03-27 17:04   ` Tejun Heo
2026-03-30 14:36     ` Peter Zijlstra [this message]
2026-03-30 20:09       ` Tejun Heo
2026-03-30 20:26         ` Tejun Heo

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:496dff740dc dfblob:81006b5c881 dfblob:43bbf0693cc
dfblob:4c9d475da07 dfblob:0fef6496c4c dfblob:3eb2002fc84 )
 OR (
bs:"Re: [PATCH] sched,workqueue: Use READ_ONCE()/WRITE_ONCE() for wake_cpu accesses" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260330143629.GP3738010@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=jiangshanlai@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pengyu@kylinos.cn \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox