From: Frederic Weisbecker <frederic@kernel.org>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Narasimhan V <Narasimhan.V@amd.com>
Subject: Re: [PATCH 0/3] timer_migration: Fix a possible race and improvements
Date: Fri, 21 Jun 2024 16:31:15 +0200
Message-ID: <ZnWOswTMML6ShzYO@localhost.localdomain>
In-Reply-To: <20240621-tmigr-fixes-v1-0-8c8a2d8e8d77@linutronix.de>

On Fri, Jun 21, 2024 at 11:37:05AM +0200, Anna-Maria Behnsen wrote:
> Borislav reported a warning in the timer migration deactivate path
> 
>   https://lore.kernel.org/r/20240612090347.GBZmlkc5PwlVpOG6vT@fat_crate.local
> 
> Sadly it doesn't reproduce directly. But with a change of timing (by
> adding a trace printk before the warning), it is possible to trigger the
> warning reliably, at least in my test setup. The problem here is a racy
> check against the group->parent pointer. This is also used in other places
> in the code and fixing this racy usage is addressed by the first patch.
> 
> While working with the code, I saw two things which could be improved
> (tracing and update of per cpu group wakeup value). These improvements are
> addressed by the other two patches.
> 
> Patches are available here:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/anna-maria/linux-devel.git timers/misc
> 
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-kernel@vger.kernel.org
> 
> Thanks,
> 
> Anna-Maria
> 
> ---

This made me stare at the group creation again and I might have found
something. Does the following race look plausible to you?


                  [GRP0:0]
               migrator = 0
               active   = 0
               nextevt  = KTIME_MAX
               /         \
              0         1 .. 7
          active         idle

0) Hierarchy has only 8 CPUs (single node for now), with only CPU 0
   active.

   
                             [GRP1:0]
                        migrator = TMIGR_NONE
                        active   = NONE
                        nextevt  = KTIME_MAX
                                         \
                 [GRP0:0]                  [GRP0:1]
              migrator = 0              migrator = TMIGR_NONE
              active   = 0              active   = NONE
              nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
                /         \                    |
              0          1 .. 7                8
          active         idle                !online

1) CPU 8 is booting and creates a new node and a new top. For now it's
   only connected to GRP0:1, not yet to GRP0:0. Also CPU 8 hasn't called
   __tmigr_cpu_activate() on itself yet.


                             [GRP1:0]
                        migrator = TMIGR_NONE
                        active   = NONE
                        nextevt  = KTIME_MAX
                       /                  \
                 [GRP0:0]                  [GRP0:1]
              migrator = 0              migrator = TMIGR_NONE
              active   = 0              active   = NONE
              nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
                /         \                    |
              0          1 .. 7                8
          active         idle                active

2) CPU 8 connects GRP0:0 to GRP1:0 and observes while in
   tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it
   prepares to call tmigr_active_up() on it. It hasn't done it yet.


                             [GRP1:0]
                        migrator = TMIGR_NONE
                        active   = NONE
                        nextevt  = KTIME_MAX
                       /                  \
                 [GRP0:0]                  [GRP0:1]
              migrator = TMIGR_NONE        migrator = TMIGR_NONE
              active   = NONE              active   = NONE
              nextevt  = KTIME_MAX         nextevt  = KTIME_MAX
                /         \                    |
              0          1 .. 7                8
            idle         idle                active

3) CPU 0 goes idle. Since GRP0:0->parent has been updated by CPU 8 with
   GRP0:0->lock held, CPU 0 observes GRP1:0 after calling tmigr_update_events()
   and it propagates the change to the top (no change there and no wakeup
   programmed since there is no timer).


                             [GRP1:0]
                        migrator = GRP0:0
                        active   = GRP0:0
                        nextevt  = KTIME_MAX
                       /                  \
                 [GRP0:0]                  [GRP0:1]
              migrator = TMIGR_NONE       migrator = TMIGR_NONE
              active   = NONE             active   = NONE
              nextevt  = KTIME_MAX        nextevt  = KTIME_MAX
                /         \                    |
              0          1 .. 7                8
            idle         idle                active

4) Now CPU 8 finally calls tmigr_active_up() on GRP0:0.

                             [GRP1:0]
                        migrator = GRP0:0
                        active   = GRP0:0, GRP0:1
                        nextevt  = KTIME_MAX
                       /                  \
                 [GRP0:0]                  [GRP0:1]
              migrator = TMIGR_NONE       migrator = 8
              active   = NONE             active   = 8
              nextevt  = KTIME_MAX        nextevt  = KTIME_MAX
                /         \                    |
              0          1 .. 7                8
            idle         idle                active

5) And out of tmigr_cpu_online(), CPU 8 calls tmigr_active_up() on itself.

                             [GRP1:0]
                        migrator = GRP0:0
                        active   = GRP0:0
                        nextevt  = T8
                       /                  \
                 [GRP0:0]                  [GRP0:1]
              migrator = TMIGR_NONE         migrator = TMIGR_NONE
              active   = NONE               active   = NONE
              nextevt  = KTIME_MAX          nextevt  = T8
                /         \                    |
              0          1 .. 7                8
            idle         idle                  idle

6) CPU 8 goes idle with a timer T8 and relies on GRP0:0 as the migrator.
   But GRP0:0 is not really active, so T8 gets ignored.
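
For illustration only, the interleaving above can be replayed sequentially
in a small userspace toy model. This is a sketch, not kernel code:
struct toy_group, the toy_* helpers and the bit assignments are all
hypothetical, locking is not modelled at all, and the propagation rule
(walk up only when a group flips between idle and active) is a
simplification of what tmigr_active_up() and the idle path really do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tmigr_group, reduced to three fields. */
struct toy_group {
	struct toy_group *parent;
	uint8_t childmask;	/* this group's bit in parent->active */
	uint8_t active;		/* bitmask of active children */
};

/* Set a child's bit; walk up only if the group was idle before. */
static void toy_active_up(struct toy_group *grp, uint8_t childmask)
{
	bool was_idle = (grp->active == 0);

	assert(childmask != 0);
	grp->active |= childmask;
	if (was_idle && grp->parent)
		toy_active_up(grp->parent, grp->childmask);
}

/* Clear a child's bit; walk up only if the group became idle. */
static void toy_inactive_up(struct toy_group *grp, uint8_t childmask)
{
	assert(childmask != 0);
	grp->active &= (uint8_t)~childmask;
	if (grp->active == 0 && grp->parent)
		toy_inactive_up(grp->parent, grp->childmask);
}

/*
 * Replays steps 0 to 6 in the order described above. Returns true when
 * GRP1:0 still lists GRP0:0 as active although GRP0:0 itself is idle.
 */
bool toy_replay_race(void)
{
	struct toy_group grp1_0 = { .parent = NULL };
	/* Step 0: only CPU 0 (bit 0x01 in GRP0:0) is active. */
	struct toy_group grp0_0 = { .parent = NULL, .active = 0x01 };
	/* Step 1: CPU 8 created GRP0:1 and GRP1:0, GRP0:1 is connected. */
	struct toy_group grp0_1 = { .parent = &grp1_0, .childmask = 0x01 };
	bool saw_active;

	/* Step 2: CPU 8 connects GRP0:0 and samples its state. */
	grp0_0.parent = &grp1_0;
	grp0_0.childmask = 0x02;
	saw_active = (grp0_0.active != 0);

	/* Step 3: CPU 0 goes idle and propagates up to GRP1:0. */
	toy_inactive_up(&grp0_0, 0x01);

	/* Step 4: CPU 8 acts on its stale observation. */
	if (saw_active)
		toy_active_up(&grp1_0, grp0_0.childmask);

	/* Step 5: CPU 8 activates itself in GRP0:1. */
	toy_active_up(&grp0_1, 0x01);

	/* Step 6: CPU 8 goes idle; GRP1:0 still holds the GRP0:0 bit. */
	toy_inactive_up(&grp0_1, 0x01);

	return grp0_0.active == 0 && (grp1_0.active & grp0_0.childmask) != 0;
}
```

In this model toy_replay_race() returns true: the top level keeps a stale
active bit for GRP0:0, which is the state in which T8 would have nobody
to handle it.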


And if that race looks plausible, does the following fix look good?

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 84413114db5c..0609cb8c770e 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1525,7 +1525,6 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
 	child->childmask = BIT(parent->num_children++);
 
 	raw_spin_unlock(&parent->lock);
-	raw_spin_unlock_irq(&child->lock);
 
 	trace_tmigr_connect_child_parent(child);
 
@@ -1559,6 +1558,14 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
 		 */
 		WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent);
 	}
+	/*
+	 * Keep the lock up to that point so that if the child goes idle
+	 * concurrently, either it sees the new parent with its active state
+	 * after locking on tmigr_update_events() and propagates afterwards
+	 * its idle state up, or the current booting CPU will observe TMIGR_NONE
+	 * on the remote child and it won't propagate a spurious active state.
+	 */
+	raw_spin_unlock_irq(&child->lock);
 }
 
 static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
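
As a rough sanity check of the reasoning in the comment added by the
patch, here is a userspace sketch of the two orderings that remain
possible once the connect, the state check and the activation all happen
within one critical section. Everything in it is an assumption made for
illustration: struct fix_group and the fix_* helpers are hypothetical, the
lock is represented only by the fact that CPU 0's idle transition runs
entirely before or entirely after the connect, and the propagation rule is
a simplification of the real hierarchy walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tmigr_group, reduced to three fields. */
struct fix_group {
	struct fix_group *parent;
	uint8_t childmask;	/* this group's bit in parent->active */
	uint8_t active;		/* bitmask of active children */
};

/* Set or clear a child's bit; walk up only on an idle/active flip. */
static void fix_set_active(struct fix_group *grp, uint8_t childmask, bool on)
{
	bool was_idle = (grp->active == 0);

	assert(childmask != 0);
	if (on)
		grp->active |= childmask;
	else
		grp->active &= (uint8_t)~childmask;

	if (grp->parent && was_idle != (grp->active == 0))
		fix_set_active(grp->parent, grp->childmask, on);
}

/*
 * Returns true if the new top still lists the old group as active while
 * the old group is idle. idle_first selects on which side of the
 * critical section CPU 0's idle transition lands.
 */
bool fix_stale_after(bool idle_first)
{
	struct fix_group top = { .parent = NULL };
	struct fix_group old = { .parent = NULL, .active = 0x01 };

	if (idle_first)
		fix_set_active(&old, 0x01, false);	/* no parent visible yet */

	/* Critical section: would run with child->lock held throughout. */
	old.parent = &top;
	old.childmask = 0x02;
	if (old.active != 0)
		fix_set_active(&top, old.childmask, true);
	/* child->lock would be released here. */

	if (!idle_first)
		fix_set_active(&old, 0x01, false);	/* sees parent, walks up */

	return old.active == 0 && (top.active & old.childmask) != 0;
}
```

Both fix_stale_after(true) and fix_stale_after(false) return false in this
model: either the booting CPU observes an idle child and skips the
activation, or the child's later idle transition clears the bit that the
activation set.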
