Re: [PATCH v6 19/21] timer: Implement the hierarchical pull model

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sebastian Siewior <bigeasy@linutronix.de>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Arjan van de Ven <arjan@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Rik van Riel <riel@surriel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Giovanni Gherdovich <ggherdovich@suse.cz>,
	Lukasz Luba <lukasz.luba@arm.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>
Subject: Re: [PATCH v6 19/21] timer: Implement the hierarchical pull model
Date: Mon, 15 May 2023 13:06:46 +0200	[thread overview]
Message-ID: <20230515110646.zQCEzB8f@linutronix.de> (raw)
In-Reply-To: <20230510072817.116056-20-anna-maria@linutronix.de>

On 2023-05-10 09:28:15 [+0200], Anna-Maria Behnsen wrote:
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> new file mode 100644
> index 000000000000..accce2b368f4
> --- /dev/null
> +++ b/kernel/time/timer_migration.c
> @@ -0,0 +1,1322 @@
…
> + * 4. Propagating the changes of step 1 through the hierarchy to GRP1:0
> + *
> + *    LVL 1            [GRP1:0]
> + *                  -> migrator = GRP0:1
> + *                  -> active   = GRP0:1
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = CPU1           migrator = CPU2
> + *           active   = CPU1           active   = CPU2
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        active         active      idle
> + *
> + * Now there is a inconsistent overall state because GRP0:0 is active, but
                   an

> + * it is marked as idle in the GRP1:0. This is prevented by incrementing
> + * sequence counter whenever changing the state.
> + */
> +
> +#ifdef DEBUG
> +# define DBG_BUG_ON(x) BUG_ON(x)
> +#else
> +# define DBG_BUG_ON(x)
> +#endif

I would make them WARN_ON_ONCE() in !DEBUG case or remove it. There is
nothing that sets DEBUG here, right?

.
> +static bool tmigr_update_events(struct tmigr_group *group,
…
> +	if (nextexp == KTIME_MAX) {
> +		evt->ignore = 1;
> +
> +		/*
> +		 * When next event could be ignored (nextexp is KTIME MAX)
> +		 * and there was no remote timer handling before or the
> +		 * group is already active, there is no need to walk the
> +		 * hierarchy even if there is a parent group.
> +		 *
> +		 * The other way round: even if the event could be ignored,
> +		 * but if a remote timer handling was executed before and
> +		 * the group is not active, walking the hierarchy is
> +		 * required to not miss a enqueued timer in the non active
                                        an

> +		 * group. The enqueued timer needs to be propagated to a
> +		 * higher level to ensure it is handeld.
…
> +static struct tmigr_group *tmigr_get_group(unsigned int cpu, unsigned int node,
> +					   unsigned int lvl)
> +{
> +	struct tmigr_group *tmp, *group = NULL;
> +	bool first_loop = true;
> +
> +	lockdep_assert_held(&tmigr_mutex);
> +
> +reloop:
> +	/* Try to attach to an exisiting group first */
> +	list_for_each_entry(tmp, &tmigr_level_list[lvl], list) {
> +		/*
> +		 * If @lvl is below the cross numa node level, check whether
> +		 * this group belongs to the same numa node.
> +		 */
> +		if (lvl < tmigr_crossnode_level && tmp->numa_node != node)
> +			continue;
> +
> +		/* Capacity left? */
> +		if (tmp->num_children >= TMIGR_CHILDS_PER_GROUP)
> +			continue;
> +
> +		/*
> +		 * TODO: A possible further improvement: Make sure that all
> +		 * CPU siblings end up in the same group of lowest level of
> +		 * the hierarchy. Rely on topology sibling mask would be a
> +		 * reasonable solution.
> +		 */
> +
> +		group = tmp;
> +		break;
> +	}
> +
> +	if (group) {
> +		return group;
> +	} else if (first_loop == true) {
> +		first_loop = false;

Why do have this? There is no lock drop or anything that could alter the
outcome of the loop iteration or do I miss something that obvious?

> +		goto reloop;
> +	}
> +
> +	/* Allocate and	set up a new group */
> +	group = kzalloc_node(sizeof(*group), GFP_KERNEL, node);
> +	if (!group)
> +		return ERR_PTR(-ENOMEM);
> +
> +	tmigr_init_group(group, lvl, node);
> +
> +	/* Setup successful. Add it to the hierarchy */
> +	list_add(&group->list, &tmigr_level_list[lvl]);

Not sure if it makes any difference if the item is added to the front of
the list or the end but be aware that this adds it to the front.

> +	return group;
> +}
…
> +static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
> +{
> +	struct tmigr_group *group, *child, **stack;
> +	int top = 0, err = 0, i = 0;
> +	struct list_head *lvllist;
> +	size_t sz;
> +
> +	sz = sizeof(struct tmigr_group *) * tmigr_hierarchy_levels;
> +	stack = kzalloc(sz, GFP_KERNEL);

kcalloc().

> +	if (!stack)
> +		return -ENOMEM;
> +
> +	do {
> +		group = tmigr_get_group(cpu, node, i);
> +		if (IS_ERR(group)) {
> +			err = IS_ERR(group);

This needs to be
	err = PTR_ERR(group);

since now group is either 1 or 0.
_OR_ you err a bool an replace the check later from
	if (err < 0)
with
	if (err)

since you don't actually care about the error code but need to know if
it failed.

> +			break;
> +		}
> +
> +		top = i;
> +		stack[i++] = group;
> +
> +		/*
> +		 * When booting only less CPUs of a system than CPUs are
> +		 * available, not all calculated hierarchy levels are required.
> +		 *
> +		 * The loop is aborted as soon as the highest level, which might
> +		 * be different from tmigr_hierarchy_levels, contains only a
> +		 * single group.
> +		 */
> +		if (group->parent || i == tmigr_hierarchy_levels ||
> +		    (list_empty(&tmigr_level_list[i]) &&
> +		     list_is_singular(&tmigr_level_list[i - 1])))
> +			break;
> +
> +	} while (i < tmigr_hierarchy_levels);
> +
> +	do {
> +		group = stack[--i];
> +
> +		if (err < 0) {
> +			list_del(&group->list);
> +			kfree(group);
> +			continue;
> +		}
…
> +static int __init tmigr_init(void)
> +{
> +	unsigned int cpulvl, nodelvl, cpus_per_node, i;
> +	unsigned int nnodes = num_possible_nodes();
> +	unsigned int ncpus = num_possible_cpus();
> +	int ret = -ENOMEM;
> +	size_t sz;
…
> +	sz = sizeof(struct list_head) * tmigr_hierarchy_levels;
> +	tmigr_level_list = kzalloc(sz, GFP_KERNEL);
kcalloc().

…
> --- /dev/null
> +++ b/kernel/time/timer_migration.h
> @@ -0,0 +1,138 @@
…
> +struct tmigr_group {
> +	raw_spinlock_t		lock;
> +	atomic_t		migr_state;
> +	struct tmigr_group	*parent;
> +	struct tmigr_event	groupevt;
> +	u64			next_expiry;
> +	struct timerqueue_head	events;
> +	u8			childmask;
> +	unsigned int		level;
> +	struct list_head	list;
> +	unsigned int		numa_node;

numa_node is usually an int (not unsigned).

> +	unsigned int		num_children;
> +};

Sebastian

next prev parent reply	other threads:[~2023-05-15 11:06 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-10  7:27 [PATCH v6 00/21] timer: Move from a push remote at enqueue to a pull at expiry model Anna-Maria Behnsen
2023-05-10  7:27 ` [PATCH v6 01/21] tick-sched: Warn when next tick seems to be in the past Anna-Maria Behnsen
2023-05-10  7:27 ` [PATCH v6 02/21] timer: Do not IPI for deferrable timers Anna-Maria Behnsen
2023-05-10  7:27 ` [PATCH v6 03/21] timer: Add comment to get_next_timer_interrupt() description Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 04/21] timer: Move store of next event into __next_timer_interrupt() Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 05/21] timer: Split next timer interrupt logic Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 06/21] timer: Rework idle logic Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 07/21] timers: Introduce add_timer() variants which modify timer flags Anna-Maria Behnsen
2023-06-05 21:43   ` Frederic Weisbecker
2023-05-10  7:28 ` [PATCH v6 08/21] workqueue: Use global variant for add_timer() Anna-Maria Behnsen
2023-05-10 19:30   ` Tejun Heo
2023-06-05 22:16   ` Frederic Weisbecker
2023-05-10  7:28 ` [PATCH v6 09/21] timer: add_timer_on(): Make sure TIMER_PINNED flag is set Anna-Maria Behnsen
2023-06-05 22:12   ` Frederic Weisbecker
2023-05-10  7:28 ` [PATCH v6 10/21] timers: Ease code in run_local_timers() Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 11/21] timers: Create helper function to forward timer base clk Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 12/21] timer: Keep the pinned timers separate from the others Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 13/21] timer: Retrieve next expiry of pinned/non-pinned timers separately Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 14/21] timer: Split out "get next timer interrupt" functionality Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 15/21] timer: Add get next timer interrupt functionality for remote CPUs Anna-Maria Behnsen
2023-05-10 10:16   ` Frederic Weisbecker
2023-05-11 13:06     ` Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 16/21] timer: Restructure internal locking Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 17/21] timer: Check if timers base is handled already Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 18/21] tick/sched: Split out jiffies update helper function Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 19/21] timer: Implement the hierarchical pull model Anna-Maria Behnsen
2023-05-10 10:32   ` Frederic Weisbecker
2023-05-15 10:19     ` Sebastian Siewior
2023-05-15 10:50       ` Anna-Maria Behnsen
2023-05-15 12:41         ` Sebastian Siewior
2023-05-16  9:24           ` Frederic Weisbecker
2023-05-16  9:37             ` Sebastian Siewior
2023-05-16 12:49           ` Anna-Maria Behnsen
2023-05-16  9:15       ` Frederic Weisbecker
2023-05-11 16:56   ` Sebastian Siewior
2023-05-15 11:06   ` Sebastian Siewior [this message]
2023-05-19  9:32   ` kernel test robot
2023-05-10  7:28 ` [PATCH v6 20/21] timer_migration: Add tracepoints Anna-Maria Behnsen
2023-05-10  7:28 ` [PATCH v6 21/21] timer: Always queue timers on the local CPU Anna-Maria Behnsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230515110646.zQCEzB8f@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=gautham.shenoy@amd.com \
    --cc=ggherdovich@suse.cz \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox