From: Rusty Russell <rusty@rustcorp.com.au>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Mike Travis <travis@sgi.com>,
Vegard Nossum <vegard.nossum@gmail.com>,
Adrian Bunk <bunk@kernel.org>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
linux-kernel@vger.kernel.org, Gautham R Shenoy <ego@in.ibm.com>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
"Zhang, Yanmin" <yanmin.zhang@intel.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference
Date: Tue, 24 Jun 2008 23:14:51 +1000
Message-ID: <200806242314.51656.rusty@rustcorp.com.au>
In-Reply-To: <1214294783.25608.75.camel@ymzhang>
On Tuesday 24 June 2008 18:06:23 Zhang, Yanmin wrote:
> On Tue, 2008-06-24 at 11:36 +1000, Rusty Russell wrote:
> > On Tuesday 24 June 2008 02:58:44 Mike Travis wrote:
> > > Rusty Russell wrote:
> > > > On Monday 23 June 2008 02:29:07 Vegard Nossum wrote:
> > > >> And the (cpu < nr_cpu_ids) fails because the CPU has just been
> > > >> offlined (or failed to initialize, but it's the same thing), while
> > > >> NR_CPUS is the value that was compiled in as CONFIG_NR_CPUS (so the
> > > >> former check will always be true).
> > > >>
> > > >> I don't think it is valid to ask for a per_cpu() variable on a CPU
> > > >> which does not exist, though
> > > >
> > > > Yes it is. As long as cpu_possible(cpu), per_cpu(cpu) is valid.
> > > >
> > > > The number check should be removed: checking cpu_possible() is
> > > > sufficient.
> > > >
> > > > Hope that helps,
> > > > Rusty.
> > >
> > > I don't see a check for index being out of range in cpu_possible().
> >
> > You're right. It assumes cpu is < NR_CPUS. Hmm, I have no idea what's
> > going on. nr_cpu_ids (ignore that it's a horrible name for a bad idea)
> > should be fine to test against.
> >
> > Vegard's analysis is flawed: just because cpu is offline, it still must
> > be < nr_cpu_ids, which is based on possible cpus. Unless something crazy
> > is happening, but a quick grep doesn't reveal anyone manipulating
> > nr_cpu_ids.
> >
> > If changing this fixes the bug, something else is badly wrong...
> > Rusty.
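A minimal user-space sketch in C makes the argument above concrete. It uses a toy model with these assumptions:

- Each CPU map is a plain bitmask.
- nr_cpu_ids is computed once from the possible map, as "highest possible CPU + 1".
- The helpers (set_cpu, test_cpu, compute_nr_cpu_ids, cpu_is_online, lookup_per_cpu) are hypothetical and are not the kernel's cpumask API.

The sketch shows that a CPU which is possible but offline still passes a `cpu < nr_cpu_ids` check. Offlining alone therefore cannot make such a lookup fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NR_CPUS and the map names mirror the kernel's,
 * but these are plain bitmasks, not the real cpumask API. */
#define NR_CPUS 8

static unsigned long possible_map;   /* CPUs that may ever exist */
static unsigned long online_map;     /* CPUs currently running */
static unsigned int nr_cpu_ids;      /* highest possible CPU + 1 */
static int per_cpu_counter[NR_CPUS]; /* stands in for a per_cpu() variable */

static void set_cpu(unsigned long *map, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	*map |= 1UL << cpu;
}

static bool test_cpu(unsigned long map, unsigned int cpu)
{
	return cpu < NR_CPUS && ((map >> cpu) & 1UL);
}

static bool cpu_is_online(unsigned int cpu)
{
	return test_cpu(online_map, cpu);
}

/* Hypothetical: derive nr_cpu_ids from the possible map once, at "boot". */
static void compute_nr_cpu_ids(void)
{
	nr_cpu_ids = 0;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (test_cpu(possible_map, cpu))
			nr_cpu_ids = cpu + 1;
}

/* Return the per-CPU slot, or NULL if cpu is not a possible CPU.
 * The online map is deliberately not consulted. */
static int *lookup_per_cpu(unsigned int cpu)
{
	if (cpu >= nr_cpu_ids || !test_cpu(possible_map, cpu))
		return NULL;
	return &per_cpu_counter[cpu];
}
```

For example, suppose CPUs 0-2 are possible and only 0-1 are online. Then nr_cpu_ids is 3, and lookup_per_cpu(2) still returns a valid slot while lookup_per_cpu(3) returns NULL. In this model, a lookup can fail for a CPU below nr_cpu_ids only if someone clears that CPU from the possible map after boot.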
>
> In function _cpu_up, the panic happens on the second call to
> __raw_notifier_call_chain; the kernel doesn't panic on the first call. So
> blaming nr_cpu_ids alone isn't right.
>
> By checking the source code, I found that do_boot_cpu is the culprit.
> Consider the call chain below:
> _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
>
> So do_boot_cpu is eventually called. In do_boot_cpu, if boot_error is true,
> cpu_clear(cpu, cpu_possible_map) is executed. Later, when _cpu_up calls
> __raw_notifier_call_chain a second time to report CPU_UP_CANCELED,
> get_cpu_sysdev returns NULL because this cpu has already been cleared from
> cpu_possible_map.
>
> Many resources are related to cpu_possible_map, so it's better not to
> change it.
>
> The patch below, against 2.6.26-rc7, fixes it by no longer clearing the
> cpu's bit in cpu_possible_map.
>
> Vegard, would you like to help test it?
>
> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
>
> ---
>
> diff -Nraup linux-2.6.26-rc7/arch/x86/kernel/smpboot.c linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c
> --- linux-2.6.26-rc7/arch/x86/kernel/smpboot.c	2008-06-24 09:03:54.000000000 +0800
> +++ linux-2.6.26-rc7_cpuhotplug/arch/x86/kernel/smpboot.c	2008-06-24 09:04:45.000000000 +0800
> @@ -996,7 +996,6 @@ do_rest:
>  #endif
>  		cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
>  		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
> -		cpu_clear(cpu, cpu_possible_map);
>  		cpu_clear(cpu, cpu_present_map);
>  		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
>  	}
Nice catch. Basically, cpu_possible_map should only be cleared at boot, and
probably not even then.
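The failure described above can be reproduced in a small user-space model, sketched here under these assumptions:

- The maps are plain bitmasks.
- get_sysdev() is a hypothetical stand-in for get_cpu_sysdev(). It returns NULL for CPUs outside the possible map.
- up_canceled_notifier() is hypothetical. It reports that NULL as -1 where a caller that uses the result unchecked would oops.

The clear_possible flag switches between the pre-patch error path and the patched one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 8

/* Hypothetical stand-in for the per-CPU sysdev object. */
struct sysdev { unsigned int id; };

static unsigned long possible_map, present_map;
static struct sysdev sysdevs[NR_CPUS];

static bool test_cpu(unsigned long map, unsigned int cpu)
{
	return cpu < NR_CPUS && ((map >> cpu) & 1UL);
}

/* Modelled on get_cpu_sysdev(): NULL unless cpu is possible. */
static struct sysdev *get_sysdev(unsigned int cpu)
{
	return test_cpu(possible_map, cpu) ? &sysdevs[cpu] : NULL;
}

/* Error path after a failed boot; clear_possible = true is the
 * pre-patch behaviour, false is the patched one. */
static void boot_failed(unsigned int cpu, bool clear_possible)
{
	assert(cpu < NR_CPUS);
	if (clear_possible)
		possible_map &= ~(1UL << cpu);
	present_map &= ~(1UL << cpu);
}

/* Models the CPU_UP_CANCELED notification: -1 marks the spot where an
 * unchecked dereference of the NULL result would crash. */
static int up_canceled_notifier(unsigned int cpu)
{
	struct sysdev *dev = get_sysdev(cpu);

	if (dev == NULL)
		return -1;
	return 0;
}

/* Hypothetical failing bring-up: CPUs 0-3 start possible and present,
 * the boot of cpu fails, then CPU_UP_CANCELED is reported. */
static int cpu_up_failing(unsigned int cpu, bool clear_possible)
{
	possible_map = present_map = 0xFUL;
	boot_failed(cpu, clear_possible);
	return up_canceled_notifier(cpu);
}
```

In this model, cpu_up_failing(2, true) returns -1, which is the case that oopsed. cpu_up_failing(2, false) returns 0: with the patch applied, CPU 2 leaves the present map but stays possible, so the CANCELED notification still finds its device.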
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Thanks,
Rusty.