public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Siewior <bigeasy@linutronix.de>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Arjan van de Ven <arjan@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Rik van Riel <riel@surriel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Giovanni Gherdovich <ggherdovich@suse.cz>,
	Lukasz Luba <lukasz.luba@arm.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Srinivas Pandruvada <srinivas.pandruvada@intel.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>
Subject: Re: [PATCH v9 30/32] timers: Implement the hierarchical pull model
Date: Fri, 8 Dec 2023 19:18:05 +0100	[thread overview]
Message-ID: <20231208181805.BbFDsoJe@linutronix.de> (raw)
In-Reply-To: <20231201092654.34614-31-anna-maria@linutronix.de>

On 2023-12-01 10:26:52 [+0100], Anna-Maria Behnsen wrote:
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> new file mode 100644
> index 000000000000..05cd8f1bc45d
> --- /dev/null
> +++ b/kernel/time/timer_migration.c
> @@ -0,0 +1,1636 @@
> + * Required event and timerqueue update after a remote expiry:
> + * -----------------------------------------------------------
> + *
> + * After a remote expiry of a CPU, a walk through the hierarchy updating the
> + * events and timerqueues has to be done when there is a 'new' global timer of
> + * the remote CPU (which is obvious) but also if there is no new global timer,
> + * but the remote CPU is still idle:

After expiring timers of a remote CPU, a walk through the hierarchy and
updating events timerqueues is required. It is obviously needed if there
is a 'new' global timer but also if there no new global timer but the
remote CPU is still idle.

> + * 1. CPU2 is the migrator and does the remote expiry in GRP1:0; expiry of
> + *    evt-CPU0 and evt-CPU1 are equal:

       CPU0 and CPU1 have both a timer expiring at the same time so both
       have an event enqueued in the timerqueue. CPU2 and CPU3 have no
       global timer pending and CPU2 is the only active CPU and also the
       migrator.

> + *
> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue = evt-GRP0:0
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *           groupevt.cpu = CPU0       groupevt.cpu =
> + *           timerqueue = evt-CPU0,    timerqueue =
> + *                        evt-CPU1
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + * 2. Remove the first event of the timerqueue in GRP1:0 and expire the timers
> + *    of CPU0 (see evt-GRP0:0->cpu value):

      CPU2 begins to expire remote timers. It starts with own group
      GRP0:1. GRP0:1 has nothing in ts timerqueue and continues with its
      parent, GRP1:0. In GRP1:0 it dequeues the first event. It looks at
      CPU member expires the pending timer of CPU0.

> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue =
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *       --> groupevt.cpu = CPU0       groupevt.cpu =
> + *           timerqueue = evt-CPU0,    timerqueue =
> + *                        evt-CPU1
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + * 3. After the remote expiry CPU0 has no global timer that needs to be
> + *    enqueued. When skipping the walk, the global timer of CPU1 is not handled,
> + *    as the group event of GRP0:0 is not updated and not enqueued into GRP1:0. The
> + *    walk has to be done to update the group events and timerqueues:

     The work isn't over after expiring timers of CPU0. If we stop
     here, then CPU1's timer have not been expired and the timerqueue of
     GRP0:0 has still an event for CPU0 enqueued which has just been
     processed. So it is required to walk the hierarchy from CPU0's
     point of view and update it accordingly.
     CPU0 will be removed from the timerqueue because it has no pending
     timer. If CPU0 would have a timer pending then it has to expire
     after CPU1's first timer because all timer from this period have
     just been expired.
     Either way CPU1 will be first in GRP0:0's timerqueue and therefore
     set in the CPU field of the group event which is enqueued in
     GRP1:0's timerqueue.

> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue = evt-GRP0:0
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *       --> groupevt.cpu = CPU1       groupevt.cpu =
> + *       --> timerqueue = evt-CPU1     timerqueue =
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + * Now CPU2 (migrator) is able to handle the timer of CPU1 as CPU2 only scans
> + * the timerqueues of GRP0:1 and GRP1:0.

    Now CPU2 continues step 2 at GRP1:0 and will expire the timer of
    CPU1.

> + * The update of step 3 is valid to be skipped, when the remote CPU went offline
> + * in the meantime because an update was already done during inactive path. When
> + * CPU became active in the meantime, update isn't required as well, because
> + * GRP0:0 is now longer idle.

   The hierarchy walk in step 3 can be skipped if the migrator notices
   that a CPU of GRP0:0 is active. The CPU will mark GRP0:0 active and
   take care of the group and any needed updates within the hierarchy.

I skipped the "offline" part because it is not needed. Before the CPU
can go offline it has first to come out of idle. While going offline it
won't (probably) participate here and the remaining timer will be
migrated to another CPU.

> + */
> +
> +typedef bool (*up_f)(struct tmigr_group *, struct tmigr_group *, void *);
> +
> +static void __walk_groups(up_f up, void *data,
> +			  struct tmigr_cpu *tmc)
> +{
> +	struct tmigr_group *child = NULL, *group = tmc->tmgroup;
> +
> +	do {
> +		WARN_ON_ONCE(group->level >= tmigr_hierarchy_levels);
> +
> +		if (up(group, child, data))
> +			break;
> +
> +		child = group;
> +		group = group->parent;
> +	} while (group);
> +}
> +
> +static void walk_groups(up_f up, void *data, struct tmigr_cpu *tmc)
> +{
> +	lockdep_assert_held(&tmc->lock);
> +
> +	__walk_groups(up, data, tmc);
> +}

So these two. walk_groups() uses all have tmigr_cpu::lock acquired and
__walk_groups() don't. Also the `up' function passed walk_groups() has
always the same data type while the data argument passed to
__walk_groups() has also the same type but different compared to the
former.

Given the locking situation and the type of the data argument looks like
walk_groups() is used for thing#1 and __walk_groups() for thing#2.
Therefore it could make sense have two separate functions (instead of
walk_groups() and __walk_groups()) to distinguish this.
Now it is too late but maybe later I figure out why the one type
requires locking and the other doesn't.

…
> +/*
> + * Return the next event which is already expired of the group timerqueue
> + *
> + * Event, which is returned, is also removed from the queue.
> + */
> +static struct tmigr_event *tmigr_next_expired_groupevt(struct tmigr_group *group,
> +						     u64 now)
> +{
> +	struct tmigr_event *evt = tmigr_next_groupevt(group);
> +
> +	if (!evt || now < evt->nextevt.expires)
> +		return NULL;
> +
> +	/*
> +	 * The event is already expired. Remove it. If it's not the last event,
> +	 * then update all group event related information.
> +	 */

  The event expired, remove it. Update group's next expire time.

> +	if (timerqueue_del(&group->events, &evt->nextevt))
> +		tmigr_next_groupevt(group);
> +	else
> +		WRITE_ONCE(group->next_expiry, KTIME_MAX);

And then you can invoke tmigr_next_groupevt() unconditionally.

> +	return evt;
> +}
> +

Sebastian

  parent reply	other threads:[~2023-12-08 18:18 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-01  9:26 [PATCH v9 00/32] timers: Move from a push remote at enqueue to a pull at expiry model Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 01/32] tick-sched: Fix function names in comments Anna-Maria Behnsen
2023-12-20 13:09   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 02/32] tick/sched: Cleanup confusing variables Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 03/32] tick-sched: Warn when next tick seems to be in the past Anna-Maria Behnsen
2023-12-20 13:27   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 04/32] tracing/timers: Enhance timer_start tracepoint Anna-Maria Behnsen
2023-12-20 13:35   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 05/32] tracing/timers: Add tracepoint for tracking timer base is_idle flag Anna-Maria Behnsen
2023-12-20 13:43   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 06/32] timers: Do not IPI for deferrable timers Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 07/32] timers: Move store of next event into __next_timer_interrupt() Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 08/32] timers: Clarify check in forward_timer_base() Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 09/32] timers: Split out forward timer base functionality Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 10/32] timers: Use already existing function for forwarding timer base Anna-Maria Behnsen
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 11/32] timers: Rework idle logic Anna-Maria Behnsen
2023-12-20 14:00   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2023-12-01  9:26 ` [PATCH v9 12/32] timers: Fix nextevt calculation when no timers are pending Anna-Maria Behnsen
2023-12-04 16:03   ` Sebastian Siewior
2023-12-05 11:53     ` Anna-Maria Behnsen
2023-12-10  0:35       ` Frederic Weisbecker
2023-12-12 13:21         ` Anna-Maria Behnsen
2023-12-12 13:37           ` Frederic Weisbecker
2023-12-20 14:49   ` Frederic Weisbecker
2023-12-20 15:59   ` [tip: timers/core] " tip-bot2 for Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 13/32] timers: Restructure get_next_timer_interrupt() Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 14/32] timers: Split out get next timer interrupt Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 15/32] timers: Move marking timer bases idle into tick_nohz_stop_tick() Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 16/32] timers: Optimization for timer_base_try_to_set_idle() Anna-Maria Behnsen
2023-12-04 17:52   ` Sebastian Siewior
2023-12-05 12:05     ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 17/32] timers: Introduce add_timer() variants which modify timer flags Anna-Maria Behnsen
2023-12-05 18:28   ` Sebastian Siewior
2023-12-06  9:24     ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 18/32] workqueue: Use global variant for add_timer() Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 19/32] timers: add_timer_on(): Make sure TIMER_PINNED flag is set Anna-Maria Behnsen
2023-12-05 18:29   ` Sebastian Siewior
2023-12-06  9:57     ` Anna-Maria Behnsen
2023-12-06 10:26       ` Sebastian Siewior
2023-12-06 10:46         ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 20/32] timers: Ease code in run_local_timers() Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 21/32] timers: Split next timer interrupt logic Anna-Maria Behnsen
2023-12-05 18:29   ` Sebastian Siewior
2023-12-01  9:26 ` [PATCH v9 22/32] timers: Keep the pinned timers separate from the others Anna-Maria Behnsen
2023-12-05 21:11   ` Sebastian Siewior
2023-12-06 10:23     ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 23/32] timers: Retrieve next expiry of pinned/non-pinned timers separately Anna-Maria Behnsen
2023-12-06  9:47   ` Sebastian Siewior
2023-12-07 10:12     ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 24/32] timers: Split out "get next timer interrupt" functionality Anna-Maria Behnsen
2023-12-06 10:20   ` Sebastian Siewior
2023-12-01  9:26 ` [PATCH v9 25/32] timers: Add get next timer interrupt functionality for remote CPUs Anna-Maria Behnsen
2023-12-06 10:44   ` Sebastian Siewior
2023-12-07 10:27     ` Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 26/32] timers: Restructure internal locking Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 27/32] timers: Check if timers base is handled already Anna-Maria Behnsen
2023-12-06 10:58   ` Sebastian Siewior
2023-12-01  9:26 ` [PATCH v9 28/32] tick/sched: Split out jiffies update helper function Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 29/32] timers: Introduce function to check timer base is_idle flag Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 30/32] timers: Implement the hierarchical pull model Anna-Maria Behnsen
2023-12-06 16:35   ` Sebastian Siewior
2023-12-08  9:01     ` Anna-Maria Behnsen
2023-12-07 18:09   ` Sebastian Siewior
2023-12-08 10:31     ` Anna-Maria Behnsen
2023-12-08 18:18   ` Sebastian Siewior [this message]
2023-12-11 18:04   ` Sebastian Siewior
2023-12-12 11:31     ` Anna-Maria Behnsen
2023-12-12 11:43       ` Anna-Maria Behnsen
2023-12-12 15:59       ` Sebastian Siewior
2023-12-12 12:14   ` Sebastian Siewior
2023-12-12 14:52     ` Anna-Maria Behnsen
2023-12-12 17:08       ` Sebastian Siewior
2023-12-01  9:26 ` [PATCH v9 31/32] timer_migration: Add tracepoints Anna-Maria Behnsen
2023-12-01  9:26 ` [PATCH v9 32/32] timers: Always queue timers on the local CPU Anna-Maria Behnsen
2023-12-07 12:11 ` [PATCH v9 00/32] timers: Move from a push remote at enqueue to a pull at expiry model Anna-Maria Behnsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231208181805.BbFDsoJe@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=gautham.shenoy@amd.com \
    --cc=ggherdovich@suse.cz \
    --cc=jstultz@google.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=srinivas.pandruvada@intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox