From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	linux-kernel@vger.kernel.org,
	Christof Schmitt <christof.schmitt@de.ibm.com>,
	Frank Blaschka <frank.blaschka@de.ibm.com>,
	Horst Hartmann <horsth@linux.vnet.ibm.com>,
	stable@kernel.org
Subject: Re: [patch 3/3] nohz/s390: fix arch_needs_cpu() return value on offline cpus
Date: Wed, 01 Dec 2010 13:19:09 +0100
Message-ID: <1291205949.32004.1398.camel@laptop>
In-Reply-To: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com>

On Wed, 2010-12-01 at 10:11 +0100, Heiko Carstens wrote:
> 
> Subject: [PATCH] nohz: fix get_next_timer_interrupt() vs cpu hotplug
> 
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> This fixes a bug as seen on 2.6.32 based kernels where timers got enqueued
> on offline cpus.
> 
> If a cpu goes offline it might still have pending timers. These will be
> migrated during CPU_DEAD handling after the cpu is offline.
> However while the cpu is going offline it will schedule the idle task
> which will then call tick_nohz_stop_sched_tick().
> That function in turn will call get_next_timer_interrupt() to figure out
> if the tick of the cpu can be stopped or not. If it turns out that the
> next tick is just one jiffy off (delta_jiffies == 1)
> tick_nohz_stop_sched_tick() incorrectly assumes that the tick should not
> stop and takes an early exit and thus it won't update the load balancer
> cpu.
> Just afterwards the cpu will be killed and the load balancer cpu could
> be the offline cpu.
> On 2.6.32 based kernels get_nohz_load_balancer() gets called to decide on
> which cpu a timer should be enqueued (see __mod_timer()). This leads
> to the possibility that timers get enqueued on an offline cpu. These will
> never expire and can cause a system hang.
> 
> This has been observed on 2.6.32 kernels. On current kernels __mod_timer() uses
> get_nohz_timer_target() which doesn't have that problem. However there might
> be other problems because of the too early exit from
> tick_nohz_stop_sched_tick() in case a cpu goes offline.
> 
> The easiest and probably safest fix seems to be to let
> get_next_timer_interrupt() just lie and let it say there isn't any pending
> timer if the current cpu is offline.
> I also thought of moving migrate_[hr]timers() from CPU_DEAD to CPU_DYING,
> but seeing that there already have been fixes at least in the hrtimer code
> in this area I'm afraid that this could add new subtle bugs.
> 
> Cc: stable@kernel.org
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> --- 

Thanks Heiko, I queued this one as well.
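
For readers following the changelog, the failure and the proposed fix can be
sketched as a small userspace model. This is not the patch itself and not
kernel code: every model_* name, the MODEL_* constants and the apply_fix flag
are hypothetical, and the load balancer bookkeeping is reduced to a single
integer. It only illustrates why reporting "no pending timer" on an offline
cpu avoids the delta_jiffies == 1 early exit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS        4                  /* hypothetical model size */
#define MODEL_NO_TIMER_DELTA ((1UL << 30) - 1)  /* model value for "nothing pending" */

struct model_cpu {
	bool online;
	bool has_timer;
	unsigned long next_expiry;  /* absolute time in jiffies */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];
static int model_load_balancer = -1;  /* -1 means no cpu holds the role */

/* Put a cpu into the state described above: going offline, still holding
 * a pending timer, and still registered as the load balancer cpu. */
static void model_setup_dying_cpu(int cpu, unsigned long expiry)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpus[cpu].online = false;
	model_cpus[cpu].has_timer = true;
	model_cpus[cpu].next_expiry = expiry;
	model_load_balancer = cpu;
}

/* Model of the "next timer" query. With apply_fix set, an offline cpu
 * pretends that nothing is pending, which is the idea of the fix. */
static unsigned long model_next_timer_interrupt(int cpu, unsigned long now,
						bool apply_fix)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (apply_fix && !model_cpus[cpu].online)
		return now + MODEL_NO_TIMER_DELTA;
	if (!model_cpus[cpu].has_timer)
		return now + MODEL_NO_TIMER_DELTA;
	return model_cpus[cpu].next_expiry;
}

/* Model of the tick-stop path. Returns true if the tick was stopped. */
static bool model_stop_sched_tick(int cpu, unsigned long now, bool apply_fix)
{
	unsigned long delta_jiffies =
		model_next_timer_interrupt(cpu, now, apply_fix) - now;

	/* Early exit: the bookkeeping below is skipped entirely. */
	if (delta_jiffies == 1)
		return false;

	/* An offline cpu gives up the load balancer role. */
	if (!model_cpus[cpu].online && model_load_balancer == cpu)
		model_load_balancer = -1;
	return true;
}
```

With a timer one jiffy away and apply_fix false, model_stop_sched_tick()
returns early and the offline cpu stays registered as load balancer. With
apply_fix true the early exit is not taken and the role is released.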

Thread overview: 32+ messages
2010-11-26 12:00 [patch 0/3] three cpu hotplug fixes Heiko Carstens
2010-11-26 12:00 ` [patch 1/3] printk: fix wake_up_klogd() vs cpu hotplug Heiko Carstens
2010-11-26 12:10   ` Peter Zijlstra
2010-11-26 12:13     ` Heiko Carstens
2010-11-26 12:15       ` Peter Zijlstra
2010-11-26 12:35   ` Eric Dumazet
2010-11-26 12:42     ` Heiko Carstens
2010-11-26 13:01       ` Eric Dumazet
2010-11-26 15:02       ` [tip:sched/urgent] printk: Fix " tip-bot for Heiko Carstens
2010-11-26 12:00 ` [patch 2/3] nohz: fix printk_needs_cpu() return value on offline cpus Heiko Carstens
2010-11-26 12:11   ` Peter Zijlstra
2010-12-07 21:32     ` [stable] " Greg KH
2010-12-08  8:07       ` Heiko Carstens
2010-12-08 11:13         ` Peter Zijlstra
2010-11-26 15:02   ` [tip:sched/urgent] nohz: Fix " tip-bot for Heiko Carstens
2010-11-26 16:22     ` [PATCH] printk: use this_cpu_{read|write} api on printk_pending Eric Dumazet
2010-11-26 16:29       ` Peter Zijlstra
2010-11-26 16:40       ` Christoph Lameter
2010-11-26 16:59         ` Eric Dumazet
2010-11-26 17:11           ` Christoph Lameter
2010-11-26 17:20             ` Eric Dumazet
2010-11-26 17:27               ` Christoph Lameter
2010-12-08 20:41       ` [tip:sched/core] printk: Use " tip-bot for Eric Dumazet
2010-12-08 21:47         ` Christoph Lameter
2010-12-09  1:43           ` Eric Dumazet
2010-12-09 23:38             ` Christoph Lameter
2010-11-26 12:01 ` [patch 3/3] nohz/s390: fix arch_needs_cpu() return value on offline cpus Heiko Carstens
2010-11-26 12:14   ` Peter Zijlstra
2010-11-26 12:17     ` Heiko Carstens
2010-12-01  9:11     ` Heiko Carstens
2010-12-01 12:19       ` Peter Zijlstra [this message]
2010-12-08 20:41       ` [tip:sched/urgent] nohz: Fix get_next_timer_interrupt() vs cpu hotplug tip-bot for Heiko Carstens
