From: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>
To: stable@vger.kernel.org
Cc: frederic@kernel.org, tglx@linutronix.de,
linux-kernel@vger.kernel.org, rdunlap@infradead.org,
ptesarik@suse.com, Ionut Nechita <ionut.nechita@windriver.com>,
kernel test robot <oliver.sang@intel.com>
Subject: [PATCH v2 6.12.y 2/7] timers/migration: Annotate accesses to ignore flag
Date: Sat, 21 Mar 2026 12:24:35 +0200 [thread overview]
Message-ID: <20260321102440.27782-3-ionut.nechita@windriver.com> (raw)
In-Reply-To: <20260321102440.27782-1-ionut.nechita@windriver.com>
From: Ionut Nechita <ionut.nechita@windriver.com>
From: Frederic Weisbecker <frederic@kernel.org>
commit 922efd298bb2636880408c00942dbd54d8bf6e0d upstream.
The group's ignore flag is:
_ read under the group's lock (idle entry, remote expiry)
_ turned on/off under the group's lock (idle entry, remote expiry)
_ turned on locklessly on idle exit
When idle entry or remote expiry clear the "ignore" flag of a group, the
operation must be synchronized against other concurrent idle entry or
remote expiry to make sure the related group timer is never missed. To
enforce this synchronization, both "ignore" clear and read are
performed under the group lock.
On the contrary, whether idle entry or remote expiry manage to observe
the "ignore" flag turned on by a CPU exiting idle is a matter of
optimization. If that flag set is missed or cleared concurrently, the
worst outcome is a migrator wasting time remotely handling a "ghost"
timer. This is why the ignore flag can be set locklessly.
Unfortunately, the related lockless accesses are bare and miss
appropriate annotations. KCSAN rightfully complains:
BUG: KCSAN: data-race in __tmigr_cpu_activate / print_report
write to 0xffff88842fc28004 of 1 bytes by task 0 on cpu 0:
__tmigr_cpu_activate
tmigr_cpu_activate
timer_clear_idle
tick_nohz_restart_sched_tick
tick_nohz_idle_exit
do_idle
cpu_startup_entry
kernel_init
do_initcalls
clear_bss
reserve_bios_regions
common_startup_64
read to 0xffff88842fc28004 of 1 bytes by task 0 on cpu 1:
print_report
kcsan_report_known_origin
kcsan_setup_watchpoint
tmigr_next_groupevt
tmigr_update_events
tmigr_inactive_up
__walk_groups+0x50/0x77
walk_groups
__tmigr_cpu_deactivate
tmigr_cpu_deactivate
__get_next_timer_interrupt
timer_base_try_to_set_idle
tick_nohz_stop_tick
tick_nohz_idle_stop_tick
cpuidle_idle_call
do_idle
Although the relevant accesses could be marked as data_race(), the
"ignore" flag being read several times within the same
tmigr_update_events() function is confusing and error prone. Prefer
reading it once in that function and make use of similar/paired accesses
elsewhere with appropriate comments when necessary.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250114231507.21672-4-frederic@kernel.org
Closes: https://lore.kernel.org/oe-lkp/202501031612.62e0c498-lkp@intel.com
---
kernel/time/timer_migration.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 72538baa7a1fb..0707f1ef05f7e 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -569,7 +569,7 @@ static struct tmigr_event *tmigr_next_groupevt(struct tmigr_group *group)
while ((node = timerqueue_getnext(&group->events))) {
evt = container_of(node, struct tmigr_event, nextevt);
- if (!evt->ignore) {
+ if (!READ_ONCE(evt->ignore)) {
WRITE_ONCE(group->next_expiry, evt->nextevt.expires);
return evt;
}
@@ -665,7 +665,7 @@ static bool tmigr_active_up(struct tmigr_group *group,
* lock is held while updating the ignore flag in idle path. So this
* state change will not be lost.
*/
- group->groupevt.ignore = true;
+ WRITE_ONCE(group->groupevt.ignore, true);
return walk_done;
}
@@ -726,6 +726,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
union tmigr_state childstate, groupstate;
bool remote = data->remote;
bool walk_done = false;
+ bool ignore;
u64 nextexp;
if (child) {
@@ -744,11 +745,19 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
nextexp = child->next_expiry;
evt = &child->groupevt;
- evt->ignore = (nextexp == KTIME_MAX) ? true : false;
+ /*
+ * This can race with concurrent idle exit (activate).
+ * If the current writer wins, a useless remote expiration may
+ * be scheduled. If the activate wins, the event is properly
+ * ignored.
+ */
+ ignore = (nextexp == KTIME_MAX) ? true : false;
+ WRITE_ONCE(evt->ignore, ignore);
} else {
nextexp = data->nextexp;
first_childevt = evt = data->evt;
+ ignore = evt->ignore;
/*
* Walking the hierarchy is required in any case when a
@@ -774,7 +783,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
* first event information of the group is updated properly and
* also handled properly, so skip this fast return path.
*/
- if (evt->ignore && !remote && group->parent)
+ if (ignore && !remote && group->parent)
return true;
raw_spin_lock(&group->lock);
@@ -788,7 +797,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
* queue when the expiry time changed only or when it could be ignored.
*/
if (timerqueue_node_queued(&evt->nextevt)) {
- if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+ if ((evt->nextevt.expires == nextexp) && !ignore) {
/* Make sure not to miss a new CPU event with the same expiry */
evt->cpu = first_childevt->cpu;
goto check_toplvl;
@@ -798,7 +807,7 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
WRITE_ONCE(group->next_expiry, KTIME_MAX);
}
- if (evt->ignore) {
+ if (ignore) {
/*
* When the next child event could be ignored (nextexp is
* KTIME_MAX) and there was no remote timer handling before or
--
2.53.0
next prev parent reply other threads:[~2026-03-21 10:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-21 10:24 [PATCH v2 6.12.y 0/7] timers/migration: Backport fixes and cleanups from mainline Ionut Nechita (Wind River)
2026-03-21 10:24 ` [PATCH v2 6.12.y 1/7] timer/migration: Fix kernel-doc warnings for union tmigr_state Ionut Nechita (Wind River)
2026-03-21 10:24 ` Ionut Nechita (Wind River) [this message]
2026-03-21 11:03 ` [PATCH v2 6.12.y 2/7] timers/migration: Annotate accesses to ignore flag Greg KH
2026-03-24 7:40 ` Ionut Nechita (Sunlight Linux)
2026-03-21 10:24 ` [PATCH v2 6.12.y 3/7] timers/migration: Simplify top level detection on group setup Ionut Nechita (Wind River)
2026-03-21 10:24 ` [PATCH v2 6.12.y 4/7] timers/migration: Clean up the loop in tmigr_quick_check() Ionut Nechita (Wind River)
2026-03-21 10:24 ` [PATCH v2 6.12.y 5/7] timers/migration: Convert "while" loops to use "for" Ionut Nechita (Wind River)
2026-03-21 10:24 ` [PATCH v2 6.12.y 6/7] timers/migration: Remove locking on group connection Ionut Nechita (Wind River)
2026-03-21 10:24 ` [PATCH v2 6.12.y 7/7] timers/migration: Fix imbalanced NUMA trees Ionut Nechita (Wind River)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260321102440.27782-3-ionut.nechita@windriver.com \
--to=ionut.nechita@windriver.com \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oliver.sang@intel.com \
--cc=ptesarik@suse.com \
--cc=rdunlap@infradead.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox