From: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Sebastian Siewior <bigeasy@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
John Stultz <jstultz@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Eric Dumazet <edumazet@google.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Arjan van de Ven <arjan@infradead.org>,
"Paul E . McKenney" <paulmck@kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Rik van Riel <riel@surriel.com>,
Steven Rostedt <rostedt@goodmis.org>,
Giovanni Gherdovich <ggherdovich@suse.cz>,
Lukasz Luba <lukasz.luba@arm.com>,
"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
Srinivas Pandruvada <srinivas.pandruvada@intel.com>,
K Prateek Nayak <kprateek.nayak@amd.com>
Subject: Re: [PATCH v9 30/32] timers: Implement the hierarchical pull model
Date: Fri, 08 Dec 2023 11:31:13 +0100
Message-ID: <875y19ouj2.fsf@somnus>
In-Reply-To: <20231207180928.FZB319OJ@linutronix.de>
Sebastian Siewior <bigeasy@linutronix.de> writes:
> On 2023-12-01 10:26:52 [+0100], Anna-Maria Behnsen wrote:
>> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
>> new file mode 100644
>> index 000000000000..05cd8f1bc45d
>> --- /dev/null
>> +++ b/kernel/time/timer_migration.c
>> @@ -0,0 +1,1636 @@
> …
>> + * Protection of the tmigr group state information:
>> + * ------------------------------------------------
>> + *
>> + * The state information with the list of active children and migrator needs to
>> + * be protected by a sequence counter. It prevents a race when updates in a
>
> s/a$//
>
>> + * child groups are propagated in changed order. The following scenario
>> + * describes what happens without updating the sequence counter:
>> + *
>> + * Therefore, let's take three groups and four CPUs (CPU2 and CPU3 as well
>> + * as GRP0:1 will not change during the scenario):
>> + *
>> + *    LVL 1            [GRP1:0]
>> + *                     migrator = GRP0:1
>> + *                     active   = GRP0:0, GRP0:1
>> + *                   /                \
>> + *    LVL 0  [GRP0:0]                  [GRP0:1]
>> + *           migrator = CPU0           migrator = CPU2
>> + *           active   = CPU0           active   = CPU2
>> + *              /         \                /         \
>> + *    CPUs     0           1              2           3
>> + *             active      idle           active      idle
>> + *
>> + *
>> + * 1. CPU0 goes idle (changes are updated in GRP0:0; afterwards the current
>> + * states of GRP0:0 and GRP1:0 are stored in the data for walking the
>> + * hierarchy):
>
> CPU0 goes idle. The state update is performed lockless and group
> wise. In the first step only GRP0:0 has been updated. The update of
> GRP1:0 is pending while the CPU walks through the hierarchy.
>
>> + *
>> + *    LVL 1            [GRP1:0]
>> + *                     migrator = GRP0:1
>> + *                     active   = GRP0:0, GRP0:1
>> + *                   /                \
>> + *    LVL 0  [GRP0:0]                  [GRP0:1]
>> + *       --> migrator = TMIGR_NONE     migrator = CPU2
>> + *       --> active   =                active   = CPU2
>> + *              /         \                /         \
>> + *    CPUs     0           1              2           3
>> + *         --> idle        idle           active      idle
>
>> + * 2. CPU1 comes out of idle (changes are updated in GRP0:0; afterwards the
>> + * current states of GRP0:0 and GRP1:0 are stored in the data for walking the
>> + * hierarchy):
>
> While CPU0 goes idle and continues to update the state, CPU1 comes
> out of idle. CPU1 updates GRP0:0. The update for GRP1:0 is pending,
> the CPU walks through the hierarchy. Both CPUs now walk the hierarchy
> to perform the needed update from their point of view.
> The currently visible state:
>
>> + *
>> + *    LVL 1            [GRP1:0]
>> + *                     migrator = GRP0:1
>> + *                     active   = GRP0:0, GRP0:1
>> + *                   /                \
>> + *    LVL 0  [GRP0:0]                  [GRP0:1]
>> + *       --> migrator = CPU1           migrator = CPU2
>> + *       --> active   = CPU1           active   = CPU2
>> + *              /         \                /         \
>> + *    CPUs     0           1              2           3
>> + *             idle    --> active         active      idle
>> + *
>> + * 3. Here comes the change of the order: Propagating the changes of step 2
>> + * through the hierarchy to GRP1:0 - nothing to be done, because GRP0:0
>> + * is already up to date.
>
> Here is the race condition: CPU1 managed to propagate its changes
> through the hierarchy to GRP1:0 before CPU0 did. The active members
> of GRP1:0 remain unchanged after the update since they are still valid
> from CPU1's current point of view:
>
>    LVL 1            [GRP1:0]
>                 --> migrator = GRP0:1
>                 --> active   = GRP0:0, GRP0:1
>                   /                \
>    LVL 0  [GRP0:0]                  [GRP0:1]
>           migrator = CPU1           migrator = CPU2
>           active   = CPU1           active   = CPU2
>              /         \                /         \
>    CPUs     0           1              2           3
>             idle        active         active      idle
>
> [ I take it as the migrator remains set to GRP0:1 by CPU1 but it could
> be changed to GRP0:0. I assume that both fields (migrator+active) are
> changed there via the propagation and the arrow in both fields denotes
> this. ]
>
>> + * 4. Propagating the changes of step 1 through the hierarchy to GRP1:0
>
> Now CPU0 finally propagates its changes to GRP1:0.
>
>> + *
>> + *    LVL 1            [GRP1:0]
>> + *                 --> migrator = GRP0:1
>> + *                 --> active   = GRP0:1
>> + *                   /                \
>> + *    LVL 0  [GRP0:0]                  [GRP0:1]
>> + *           migrator = CPU1           migrator = CPU2
>> + *           active   = CPU1           active   = CPU2
>> + *              /         \                /         \
>> + *    CPUs     0           1              2           3
>> + *             idle        active         active      idle
>> + *
>> + * Now there is an inconsistent overall state because GRP0:0 is active, but
>> + * it is marked as idle in GRP1:0. This is prevented by incrementing the
>> + * sequence counter whenever changing the state.
>
> The race of CPU0 vs CPU1 led to an inconsistent state in GRP1:0.
> CPU1 is active and is correctly listed as active in GRP0:0. However
> GRP1:0 does not have GRP0:0 listed as active which is wrong.
> The sequence counter has been added to avoid inconsistent states
> during updates. The state is updated atomically only if all members,
> including the sequence counter, match the expected value
> (compare-and-exchange).
> Looking back at the previous example with the addition of the
> sequence number: The update as performed by CPU0 in step 4 will fail.
> CPU1 changed the sequence number during the update in step 3 so the
> expected old value (as seen by CPU0 before starting the walk) does
> not match.
>
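To make the effect of the sequence counter concrete, here is a minimal
user-space sketch in C11 that replays steps 1 to 4 for GRP1:0 in a fixed
order. It is an illustration under assumptions, not the code from the
patch: the names union grp_state, try_update(), replay() and the
GRP0_0/GRP0_1 bit values are hypothetical, and the layout (8 bit active
mask, 8 bit migrator, 16 bit sequence, packed into one 32 bit word) is only
assumed so that a single compare-and-exchange covers all members.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GRP0_0 0x1u	/* hypothetical bit for child GRP0:0 in GRP1:0 */
#define GRP0_1 0x2u	/* hypothetical bit for child GRP0:1 in GRP1:0 */

/* Assumed layout: all members fit into one word that is updated atomically. */
union grp_state {
	uint32_t raw;
	struct {
		uint8_t  active;	/* bitmask of active children */
		uint8_t  migrator;	/* bit of the migrator child */
		uint16_t seq;		/* bumped on every update if enabled */
	};
};

/*
 * Install 'next' only if the group still holds the value in 'snap'.
 * With use_seq, every successful update also changes the sequence number.
 */
static bool try_update(_Atomic uint32_t *grp, union grp_state snap,
		       union grp_state next, bool use_seq)
{
	if (use_seq)
		next.seq = (uint16_t)(snap.seq + 1);
	return atomic_compare_exchange_strong(grp, &snap.raw, next.raw);
}

/* Replay steps 1-4 in order; return the final active mask of GRP1:0. */
static uint8_t replay(bool use_seq)
{
	union grp_state init = { .raw = 0 };
	_Atomic uint32_t grp1_0;

	init.active = GRP0_0 | GRP0_1;
	init.migrator = GRP0_1;
	atomic_init(&grp1_0, init.raw);

	/* Step 1: CPU0 went idle; it snapshots GRP1:0 and wants GRP0:0 cleared. */
	union grp_state cpu0_snap = { .raw = atomic_load(&grp1_0) };
	union grp_state cpu0_next = cpu0_snap;
	cpu0_next.active &= (uint8_t)~GRP0_0;

	/* Step 2: CPU1 came out of idle; it snapshots GRP1:0 and wants GRP0:0 set. */
	union grp_state cpu1_snap = { .raw = atomic_load(&grp1_0) };
	union grp_state cpu1_next = cpu1_snap;
	cpu1_next.active |= GRP0_0;

	/* Step 3: CPU1 reaches GRP1:0 first; active and migrator stay the same. */
	bool ok = try_update(&grp1_0, cpu1_snap, cpu1_next, use_seq);
	assert(ok);
	(void)ok;

	/* Step 4: CPU0 arrives late and uses the snapshot taken in step 1. */
	(void)try_update(&grp1_0, cpu0_snap, cpu0_next, use_seq);

	union grp_state fin = { .raw = atomic_load(&grp1_0) };
	return fin.active;
}
```

With use_seq set to false, replay() ends with only GRP0:1 marked active in
GRP1:0, which is the inconsistent state shown in step 4. With use_seq set to
true, the compare-and-exchange of CPU0 fails because CPU1 changed the
sequence number in step 3, so GRP0:0 stays marked active. Presumably the
failing CPU then re-reads the child state and retries; that part is not
modelled here.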
Thanks a lot for rephrasing the documentation to make it clearer for the
reader! I will use your proposal with some minor changes.
Thanks,
Anna-Maria