stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: gregkh@linuxfoundation.org
Cc: balasubramani_vivekanandan@mentor.com, tglx@linutronix.de,
	stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] tick: broadcast-hrtimer: Fix a race in bc_set_next" failed to apply to 5.3-stable tree
Date: Tue, 8 Oct 2019 19:15:54 -0400	[thread overview]
Message-ID: <20191008231554.GK1396@sasha-vm> (raw)
In-Reply-To: <1570520554227151@kroah.com>

On Tue, Oct 08, 2019 at 09:42:34AM +0200, gregkh@linuxfoundation.org wrote:
>
>The patch below does not apply to the 5.3-stable tree.
>If someone wants it applied there, or to any other stable or longterm
>tree, then please email the backport, including the original git commit
>id to <stable@vger.kernel.org>.
>
>thanks,
>
>greg k-h
>
>------------------ original commit in Linus's tree ------------------
>
>From b9023b91dd020ad7e093baa5122b6968c48cc9e0 Mon Sep 17 00:00:00 2001
>From: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
>Date: Thu, 26 Sep 2019 15:51:01 +0200
>Subject: [PATCH] tick: broadcast-hrtimer: Fix a race in bc_set_next
>
>When a cpu requests broadcasting, before starting the tick broadcast
>hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
>using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
>the required synchronization when the callback is active on other core.
>
>The callback could have already executed tick_handle_oneshot_broadcast()
>and could have also returned. But still there is a small time window where
>the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
>without doing anything, but the next_event of the tick broadcast clock
>device is already set to a timeout value.
>
>In the race condition diagram below, CPU #1 is running the timer callback
>and CPU #2 is entering idle state and so calls bc_set_next().
>
>In the worst case, the next_event will contain an expiry time, but the
>hrtimer will not be started which happens when the racing callback returns
>HRTIMER_NORESTART. The hrtimer might never recover if all further requests
>from the CPUs to subscribe to tick broadcast have timeout greater than the
>next_event of tick broadcast clock device. This leads to cascading of
>failures and finally noticed as rcu stall warnings
>
>Here is a depiction of the race condition
>
>CPU #1 (Running timer callback)                   CPU #2 (Enter idle
>                                                  and subscribe to
>                                                  tick broadcast)
>---------------------                             ---------------------
>
>__run_hrtimer()                                   tick_broadcast_enter()
>
>  bc_handler()                                      __tick_broadcast_oneshot_control()
>
>    tick_handle_oneshot_broadcast()
>
>      raw_spin_lock(&tick_broadcast_lock);
>
>      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
>      //next_event for tick broadcast clock
>      set to KTIME_MAX since no other cores
>      subscribed to tick broadcasting
>
>      raw_spin_unlock(&tick_broadcast_lock);
>
>    if (dev->next_event == KTIME_MAX)
>      return HRTIMER_NORESTART
>    // callback function exits without
>       restarting the hrtimer                      //tick_broadcast_lock acquired
>                                                   raw_spin_lock(&tick_broadcast_lock);
>
>                                                   tick_broadcast_set_event()
>
>                                                     clockevents_program_event()
>
>                                                       dev->next_event = expires;
>
>                                                       bc_set_next()
>
>                                                         hrtimer_try_to_cancel()
>                                                         //returns -1 since the timer
>                                                         callback is active. Exits without
>                                                         restarting the timer
>  cpu_base->running = NULL;
>
>The comment that hrtimer cannot be armed from within the callback is
>wrong. It is fine to start the hrtimer from within the callback. Also it is
>safe to start the hrtimer from the enter/exit idle code while the broadcast
>handler is active. The enter/exit idle code and the broadcast handler are
>synchronized using tick_broadcast_lock. So there is no need for the
>existing try to cancel logic. All this can be removed which will eliminate
>the race condition as well.
>
>Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
>Originally-by: Thomas Gleixner <tglx@linutronix.de>
>Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
>Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>Cc: stable@vger.kernel.org
>Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com

I've fixed up a conflict due to 902a9f9c50905 ("tick: Mark tick related
hrtimers to expiry in hard interrupt context") and queued it for
5.3-4.14. It needs something more complex for older kernels.

-- 
Thanks,
Sasha

      reply	other threads:[~2019-10-08 23:15 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  7:42 FAILED: patch "[PATCH] tick: broadcast-hrtimer: Fix a race in bc_set_next" failed to apply to 5.3-stable tree gregkh
2019-10-08 23:15 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191008231554.GK1396@sasha-vm \
    --to=sashal@kernel.org \
    --cc=balasubramani_vivekanandan@mentor.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).