From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18C392EA171 for ; Mon, 30 Mar 2026 14:36:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774881401; cv=none; b=Tdrvj4BK1WAUqHovS5VrSeRyYa0f3pmg9Np2Hk506/AhrN8afB6EHbgTbYbuEgabfkKATW+FHYQkKVJeueI4WsorDd5dVnK3Z3L4/4muPfEVZ5OHDIkYQO17wYydBWinz5Ns5jAraJzMFh6aX03lZpAaWeJY6Kg2VUp6rsBBYI0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774881401; c=relaxed/simple; bh=kvw+Pp+oehkyfqDf7MXPnXR5cvMsy9lrty4mLDSzQnA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=GaSI7Dr8ZA+3yONPs/BfFLzcfIxQv8NuYywVYx1pDLSx6wZtli6s9rtvU4xknlP60FKNgnwQ9nUObGaBU7+FWMgPqNvsa/8riKDL3kHzIN0ccudZmO0Js7+oMNAjrmoCr0saJOeHGkrfbJ1U+gf+UXx5+ZMBT6NDuORKDrQcLz0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=U9+Uvnw2; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="U9+Uvnw2" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=zPx474JkUc2Mszk0L2GNQqmNTVqWYF9+GnYCDPeFux8=; b=U9+Uvnw2IULiJZKXRqmJoqKQEJ hdapem5GdE9W7g3qaEhkwk+CoJlvCi9gtoqjReKVRzb7+avvl7hH13kTG110Eya92/sIAVm92q5Dh ZzlAvPGFSnzdsCVSG0jCd/KqDItnoJJzbaDku/wFnD177xoge8nzON9IKMkRYizJv/f7HGZarDOaa svgctJc9TJrTvoi6okbzOhFNK+c0DPyquioz4xXicoxA47HhCrN3ojDFQY2ntFG3N49mmBOIuOSgd aA1IPi2oFkckgKWDnG/PigBFn2wM7vK5A3Tkg+r5Xu4oLVDQgurxVVy00LQa/O40gGIoNxy+qKWQ8 fJ/nZbsQ==; Received: from 2001-1c00-8d85-4b00-266e-96ff-fe07-7dcc.cable.dynamic.v6.ziggo.nl ([2001:1c00:8d85:4b00:266e:96ff:fe07:7dcc] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1w7DjD-0000000E5KV-1zYr; Mon, 30 Mar 2026 14:36:31 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id C37B9300302; Mon, 30 Mar 2026 16:36:29 +0200 (CEST) Date: Mon, 30 Mar 2026 16:36:29 +0200 From: Peter Zijlstra To: Tejun Heo Cc: Yu Peng , mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, jiangshanlai@gmail.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] sched,workqueue: Use READ_ONCE()/WRITE_ONCE() for wake_cpu accesses Message-ID: <20260330143629.GP3738010@noisy.programming.kicks-ass.net> References: <20260327073007.254673-1-pengyu@kylinos.cn> <20260327092739.GB3739027@noisy.programming.kicks-ass.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Fri, Mar 27, 2026 at 07:04:20AM -1000, Tejun Heo wrote: > Hello, > > On Fri, Mar 27, 2026 at 10:27:39AM +0100, Peter Zijlstra wrote: > > On Fri, Mar 27, 2026 at 03:30:07PM +0800, Yu Peng wrote: > > > task_struct->wake_cpu is used as a wake placement hint by scheduler code > > > and workqueue's non-strict affinity repatriation path. > > > > > > These accesses are intentionally lockless and stale values are tolerated, > > > affecting only wakeup placement. > > > > The scheduler usage isn't lockless. And I'm not at all sure why and how > > workqueue is poking at this stuff :/ It has no business poking at it. > > Oh, you were involved in the discussion. It's workqueue using ->wake_cpu as > a way to implement opportunistic affinity at work boundaries as past history > doesn't mean anything at these boundaries. > > https://lore.kernel.org/all/20230519001709.2563-1-tj@kernel.org/ Bah, clearly I didn't remember ... > Hmmm... it would trigger KCSAN. Maybe a better way to do it is exposing > sched interface that does WRITE_ONCE wrapping? How do you want it resolved? Perhaps a try_to_wake_up() variant that takes a cpumask (see the completely untested code below). But I'm not entirely sure. The current thing very much relies on a bunch of implementation details that aren't guaranteed. The below will 'fake' the task cpumask for a while; and note that the caller has to be careful to provide a mask that only includes tasks that were set in the current task, otherwise 'funny' things will happen. Also, it will only use this mask if it ends up doing CPU selection at all. Ooh, I suppose I should make sure wake_cpu is inside the provided map too... That said, I don't think there is any urgency here, this code has been like this for a long time, it can stay this way a little longer. --- diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 496dff740dca..81006b5c881d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4089,7 +4089,8 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) * Return: %true if @p->state changes (an actual wakeup was done), * %false otherwise. */ -int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) +int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags, + const struct cpumask *wake_mask) { guard(preempt)(); int cpu, success = 0; @@ -4128,6 +4129,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) * in set_current_state() that the waiting thread does. */ scoped_guard (raw_spinlock_irqsave, &p->pi_lock) { + const struct cpumask *prev_mask = NULL; + smp_mb__after_spinlock(); if (!ttwu_state_match(p, state, &success)) break; @@ -4227,7 +4230,14 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) */ smp_cond_load_acquire(&p->on_cpu, !VAL); + if (wake_mask && !is_migration_disabled(p)) { + prev_mask = p->cpus_ptr; + p->cpus_ptr = wake_mask; + } cpu = select_task_rq(p, p->wake_cpu, &wake_flags); + if (prev_mask) + p->cpus_ptr = prev_mask; + if (task_cpu(p) != cpu) { if (p->in_iowait) { delayacct_blkio_end(p); @@ -4371,13 +4381,13 @@ struct task_struct *cpu_curr_snapshot(int cpu) */ int wake_up_process(struct task_struct *p) { - return try_to_wake_up(p, TASK_NORMAL, 0); + return try_to_wake_up(p, TASK_NORMAL, 0, NULL); } EXPORT_SYMBOL(wake_up_process); int wake_up_state(struct task_struct *p, unsigned int state) { - return try_to_wake_up(p, state, 0); + return try_to_wake_up(p, state, 0, NULL); } /* @@ -7247,7 +7257,7 @@ int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flag void *key) { WARN_ON_ONCE(wake_flags & ~(WF_SYNC|WF_CURRENT_CPU)); - return try_to_wake_up(curr->private, mode, wake_flags); + return try_to_wake_up(curr->private, mode, wake_flags, NULL); } EXPORT_SYMBOL(default_wake_function); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 43bbf0693cca..4c9d475da072 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3760,7 +3760,8 @@ static inline bool is_per_cpu_kthread(struct task_struct *p) extern void swake_up_all_locked(struct swait_queue_head *q); extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); -extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags); +extern int try_to_wake_up(struct task_struct *tsk, unsigned int state, int wake_flags, + const struct cpumask *wake_mask); #ifdef CONFIG_PREEMPT_DYNAMIC extern int preempt_dynamic_mode; diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c index 0fef6496c4c8..3eb2002fc847 100644 --- a/kernel/sched/swait.c +++ b/kernel/sched/swait.c @@ -27,7 +27,7 @@ void swake_up_locked(struct swait_queue_head *q, int wake_flags) return; curr = list_first_entry(&q->task_list, typeof(*curr), task_list); - try_to_wake_up(curr->task, TASK_NORMAL, wake_flags); + try_to_wake_up(curr->task, TASK_NORMAL, wake_flags, NULL); list_del_init(&curr->task_list); } EXPORT_SYMBOL(swake_up_locked);