From: Rusty Russell <rusty@rustcorp.com.au>
To: ego@in.ibm.com
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
	Mike Travis <travis@sgi.com>,
	Vegard Nossum <vegard.nossum@gmail.com>,
	Adrian Bunk <bunk@kernel.org>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	linux-kernel@vger.kernel.org, "Rafael J. Wysocki" <rjw@sisk.pl>,
	"Zhang, Yanmin" <yanmin.zhang@intel.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference
Date: Fri, 27 Jun 2008 13:16:42 +1000
Message-ID: <200806271316.42771.rusty@rustcorp.com.au>
In-Reply-To: <20080626125820.GA30144@in.ibm.com>

On Thursday 26 June 2008 22:58:20 Gautham R Shenoy wrote:
> On Tue, Jun 24, 2008 at 11:14:51PM +1000, Rusty Russell wrote:
> > On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:
> > > On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > > > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > > > Rusty Russell wrote:
> > > > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > > > >> offlined (or failed to initialize, but it's the same thing),
> > > > > >> while NR_CPUS is the value that was compiled in as
> > > > > >> CONFIG_NR_CPUS (so the former check will always be true).
> > > > > >>
> > > > > >> I don't think it is valid to ask for a per_cpu() variable on a
> > > > > >> CPU which does not exist, though
> > > > > >
> > > > > > Yes it is.  As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > > > >
> > > > > > The number check should be removed: checking cpu_possible() is
> > > > > > sufficient.
> > > > > >
> > > > > > Hope that helps,
> > > > > > Rusty.
> > > > >
> > > > > I don't see a check for index being out of range in cpu_possible().
> > > >
> > > > You're right.  It assumes cpu is < NR_CPUS.  Hmm, I have no idea
> > > > what's going on.  nr_cpu_ids (ignore that it's a horrible name for a
> > > > bad idea) should be fine to test against.
> > > >
> > > > Vegard's analysis is flawed: just because cpu is offline, it still
> > > > must be < nr_cpu_ids, which is based on possible cpus.  Unless
> > > > something crazy is happening, but a quick grep doesn't reveal anyone
> > > > manipulating nr_cpu_ids.
> > > >
> > > > If changing this fixes the bug, something else is badly wrong...
> > > > Rusty.
> > >
> > > In function _cpu_up, the panic happens when calling
> > > __raw_notifier_call_chain at the second time. Kernel doesn't panic when
> > > calling it at the first time. If just say because of nr_cpu_ids,
> > > that's not right.
> > >
> > > By checking source codes, I find function do_boot_cpu is the culprit.
> > > Consider below call chain:
> > >  _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
> > >
> > > So do_boot_cpu is called in the end. In do_boot_cpu, if
> > > boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So
> > > later on, when _cpu_up calls __raw_notifier_call_chain at the second
> > > time to report CPU_UP_CANCELED, because this cpu is already cleared
> > > from cpu_possible_map, get_cpu_sysdev returns NULL.
> > >
> > > Many resources are related to cpu_possible_map, so it's better not to
> > > change it.
> > >
> > > Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
> > > cpu_possible_map.
> > >
> > > Vegard, would you like to help test it?
> > >
> > > Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> > >
> > > ---
> > >
> > > diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c
> > > --- linux-2.6.26-rc7/arch/x86/kernel/smpboot.c	2008-06-24 09:03:54.000000000 +0800
> > > +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c	2008-06-24 09:04:45.000000000 +0800
> > > @@ -996,7 +996,6 @@ do_rest:
> > >  #endif
> > >  		cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
> > >  		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> > > -		cpu_clear(cpu, cpu_possible_map);
> > >  		cpu_clear(cpu, cpu_present_map);
>
> Nice catch.
>
> While we're at it, is the clearing of cpu from the cpu_present_map
> necessary if cpu_up failed for 'cpu' ?

It's never necessary, but there are not many places where cpu_present is
examined.  It just prevents it from being hot added again, AFAICT.

Rusty.
