From: Frederic Weisbecker <frederic@kernel.org>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
Narasimhan V <Narasimhan.V@amd.com>
Subject: Re: [PATCH 0/3] timer_migration: Fix a possible race and improvements
Date: Mon, 24 Jun 2024 13:04:53 +0200 [thread overview]
Message-ID: <ZnlS1QcFgHvJGm7J@lothringen> (raw)
In-Reply-To: <87zfrag1jh.fsf@somnus>
On Mon, Jun 24, 2024 at 10:58:26AM +0200, Anna-Maria Behnsen wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> > 0) Hierarchy has only 8 CPUs (single node for now with only CPU 0
> > as active.
> >
> >
> > [GRP1:0]
> > migrator = TMIGR_NONE
> > active = NONE
> > nextevt = KTIME_MAX
> > \
> > [GRP0:0] [GRP0:1]
> > migrator = 0 migrator = TMIGR_NONE
> > active = 0 active = NONE
> > nextevt = KTIME_MAX nextevt = KTIME_MAX
> > / \ |
> > 0 1 .. 7 8
> > active idle !online
> >
> > 1) CPU 8 is booting and creates a new node and a new top. For now it's
> > only connected to GRP0:1, not yet to GRP0:0. Also CPU 8 hasn't called
> > __tmigr_cpu_activate() on itself yet.
>
> NIT: In step 1) CPU8 is not yet connected to GRP0:1 as the second while
> loop is not yet finished, but nevertheless...
Yes, I was in the second loop in my mind, but that didn't transcribe well :-)
>
> >
> > [GRP1:0]
> > migrator = TMIGR_NONE
> > active = NONE
> > nextevt = KTIME_MAX
> > / \
> > [GRP0:0] [GRP0:1]
> > migrator = 0 migrator = TMIGR_NONE
> > active = 0 active = NONE
> > nextevt = KTIME_MAX nextevt = KTIME_MAX
> > / \ |
> > 0 1 .. 7 8
> > active idle active
> >
> > 2) CPU 8 connects GRP0:0 to GRP1:0 and observes while in
> > tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it
> > prepares to call tmigr_active_up() on it. It hasn't done it yet.
>
> NIT: CPU8 keeps its state !online until step 5.
Oops, copy paste hazards.
> Holding the lock will not help as the state is not protected by the
> lock.
No but any access happening before an UNLOCK is guaranteed to be visible
after the next LOCK. This makes sure that either:
1) If the remote CPU going inactive has passed tmigr_update_events() with
its unlock of group->lock then the onlining CPU will see the TMIGR_NONE
or:
1) If the onlining CPU locks before the remote CPU calls tmigr_update_events(),
then the subsequent LOCK on group->lock in tmigr_update_events() will acquire
the new parent link and propagate the up the inactive state.
And yes there is an optimization case where we don't lock:
if (evt->ignore && !remote && group->parent)
return true;
And that would fall through the cracks. So, forget about that lock idea.
> +static void tmigr_setup_active_up(struct tmigr_group *group, struct tmigr_group *child)
> +{
> + union tmigr_state curstate, childstate;
> + bool walk_done;
> +
> + /*
> + * FIXME: Memory barrier is required here as the child state
> + * could have changed in the meantime
> + */
> + curstate.state = atomic_read_acquire(&group->migr_state);
> +
> + for (;;) {
> + childstate.state = atomic_read(&child->migr_state);
> + if (!childstate.active)
> + return;
Ok there could have been a risk that we miss the remote CPU going active. But
again thanks to the lock this makes sure that either we observe the childstate
as active or the remote CPU sees the link and propagates its active state. And
the unlocked optimization tmigr_update_events() still works because either it
sees the new parent and proceeds or it sees an intermediate parent and the next
ones will be locked.
Phew!
I'll do a deeper review this evening but it _looks_ ok.
Thanks.
next prev parent reply other threads:[~2024-06-24 11:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-21 9:37 [PATCH 0/3] timer_migration: Fix a possible race and improvements Anna-Maria Behnsen
2024-06-21 9:37 ` [PATCH 1/3] timer_migration: Do not rely always on group->parent Anna-Maria Behnsen
2024-06-21 9:37 ` [PATCH 2/3] timer_migration: Spare write when nothing changed Anna-Maria Behnsen
2024-06-21 9:37 ` [PATCH 3/3] timer_migration: Improve tracing Anna-Maria Behnsen
2024-06-21 14:31 ` [PATCH 0/3] timer_migration: Fix a possible race and improvements Frederic Weisbecker
2024-06-24 8:58 ` Anna-Maria Behnsen
2024-06-24 11:04 ` Frederic Weisbecker [this message]
2024-06-24 14:48 ` Anna-Maria Behnsen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnlS1QcFgHvJGm7J@lothringen \
--to=frederic@kernel.org \
--cc=Narasimhan.V@amd.com \
--cc=anna-maria@linutronix.de \
--cc=bp@alien8.de \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox