* [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr
@ 2008-11-04 20:28 Doug Chapman
2008-11-04 20:35 ` [REGRESSION] Minimize per_cpu reservations patch causes NULL Doug Chapman
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Doug Chapman @ 2008-11-04 20:28 UTC (permalink / raw)
To: linux-ia64
At HP we ran into a regression that appears to be caused by:
commit 2c6e6db41f01b6b4eb98809350827c9678996698
Author: holt@sgi.com <holt@sgi.com>
Date: Thu Apr 3 15:17:13 2008 -0500
[IA64] Minimize per_cpu reservations.
This is seen when we have one or more CPUs deconfigured via "cpuconfig"
at the EFI shell or on systems that have an unpopulated CPU socket. We
hit this traceback (edited for clarity)
Unable to handle kernel NULL pointer dereference (address 0000000000000040)
[<a000000100618bc0>] _spin_lock+0x20/0x60
[<a0000001000811b0>] __enable_runtime+0x70/0x160
[<a000000100086a50>] rq_online_rt+0x110/0x180
[<a00000010007e070>] set_rq_online+0x110/0x160
[<a00000010060db90>] migration_call+0x290/0xc00
[<a000000100807060>] migration_init+0xc0/0x100
[<a00000010000a960>] do_one_initcall+0xa0/0x2a0
[<a0000001007f0310>] kernel_init+0xd0/0x5a0
The problem is the per-cpu datastructures are used but not set up for
the offline CPUs.
in setup_arch() we call this bit of code which should add the offline CPUs to
early_cpu_possible_mask:
per_cpu_scan_finalize((cpus_weight(early_cpu_possible_map) = 0 ?
32 : cpus_weight(early_cpu_possible_map)),
additional_cpus > 0 ? additional_cpus : 0);
however, additional_cpus has not been set yet. It _can_ be set by the user as
a command line parameter however it is normally set by prefill_possible_map()
which is called later. At the point where per_cpu_scan_finalize is called we
_cannot_ know what additional_cpus is because we have not yet gotten that info
from acpi. That happens when we call acpi_boot_init() which hapens later in
setup_arch().
It appears the intent of calling per_cpu_scan_finalize() at the current
location is that it needs to be called prior to find_memory(). The "Minimize
per_cpu reservations." patch changes code in arch/ia64/mm/discontig.c which
sets up the per-cpu datastructures to only set those up for CPUs in
early_cpu_possible_map. The problem is we don't have the offline CPUs in that
map yet. Later code tries to use these datastructures and we hit the NULL ptr.
So, it seems the options to fix this that I can think of are:
1) try to get the info regarding offline CPUs earlier either by calling
acpi_boot_init() earlier (probably not possible prior to calling find_memory())
or walk a subset of the ACPI tables to get the number of offline CPUs sooner
(which sounds ugly).
2) after we call acpi_boot_init() go back and setup the per-cpu datastructures
for the offline CPUs then. Seems like this might be cleaner but I have not
investigated the specifics.
I will start looking at option #2 but certainly welcome suggestions, perhaps
there is an easy fix?
thanks,
- Doug
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [REGRESSION] Minimize per_cpu reservations patch causes NULL
2008-11-04 20:28 [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr Doug Chapman
@ 2008-11-04 20:35 ` Doug Chapman
2008-11-04 21:12 ` Luck, Tony
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Doug Chapman @ 2008-11-04 20:35 UTC (permalink / raw)
To: linux-ia64
On Tue, 2008-11-04 at 15:28 -0500, Doug Chapman wrote:
> It appears the intent of calling per_cpu_scan_finalize() at the
> current
> location is that it needs to be called prior to find_memory(). The
> "Minimize
> per_cpu reservations." patch changes code in arch/ia64/mm/discontig.c
> which
> sets up the per-cpu datastructures to only set those up for CPUs in
> early_cpu_possible_map.
Correction on this bit, it isn't find_memory() it is cpu_init() which
calls per_cpu_init() however the general problem is the same.
- Doug
^ permalink raw reply [flat|nested] 5+ messages in thread* RE: [REGRESSION] Minimize per_cpu reservations patch causes NULL
2008-11-04 20:28 [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr Doug Chapman
2008-11-04 20:35 ` [REGRESSION] Minimize per_cpu reservations patch causes NULL Doug Chapman
@ 2008-11-04 21:12 ` Luck, Tony
2008-11-04 21:21 ` Doug Chapman
2008-11-05 23:07 ` Doug Chapman
3 siblings, 0 replies; 5+ messages in thread
From: Luck, Tony @ 2008-11-04 21:12 UTC (permalink / raw)
To: linux-ia64
> 1) try to get the info regarding offline CPUs earlier either by calling
> acpi_boot_init() earlier (probably not possible prior to calling find_memory())
> or walk a subset of the ACPI tables to get the number of offline CPUs sooner
> (which sounds ugly).
>
>
> 2) after we call acpi_boot_init() go back and setup the per-cpu datastructures
> for the offline CPUs then. Seems like this might be cleaner but I have not
> investigated the specifics.
What about:
3) Perform the needed allocations when we try to bring the new cpu online?
It is possible that this will fail ... but if you are so low on memory
that you can't allocate a couple of pages for the new cpu, you are in
a world of hurt already and adding a new cpu is liklely to make things
worse.
-Tony
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [REGRESSION] Minimize per_cpu reservations patch causes NULL
2008-11-04 20:28 [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr Doug Chapman
2008-11-04 20:35 ` [REGRESSION] Minimize per_cpu reservations patch causes NULL Doug Chapman
2008-11-04 21:12 ` Luck, Tony
@ 2008-11-04 21:21 ` Doug Chapman
2008-11-05 23:07 ` Doug Chapman
3 siblings, 0 replies; 5+ messages in thread
From: Doug Chapman @ 2008-11-04 21:21 UTC (permalink / raw)
To: linux-ia64
On Tue, 2008-11-04 at 13:12 -0800, Luck, Tony wrote:
> > 1) try to get the info regarding offline CPUs earlier either by calling
> > acpi_boot_init() earlier (probably not possible prior to calling find_memory())
> > or walk a subset of the ACPI tables to get the number of offline CPUs sooner
> > (which sounds ugly).
> >
> >
> > 2) after we call acpi_boot_init() go back and setup the per-cpu datastructures
> > for the offline CPUs then. Seems like this might be cleaner but I have not
> > investigated the specifics.
>
> What about:
>
> 3) Perform the needed allocations when we try to bring the new cpu online?
Seems logical, we would need to fix the code that is trying to use the
datastructures for offline CPUs however. The panic we current hit is in
migration_thread and that might be the only case.
>
> It is possible that this will fail ... but if you are so low on memory
> that you can't allocate a couple of pages for the new cpu, you are in
> a world of hurt already and adding a new cpu is liklely to make things
> worse.
The patch that caused this doesn't just prevent allocating structures
for the offline CPUs but prevents allocating NR_CPUS (4096 in the case
of SLES11) per-cpu structures on a small system that has perhaps only 4
CPUs.
- Doug
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION] Minimize per_cpu reservations patch causes NULL
2008-11-04 20:28 [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr Doug Chapman
` (2 preceding siblings ...)
2008-11-04 21:21 ` Doug Chapman
@ 2008-11-05 23:07 ` Doug Chapman
3 siblings, 0 replies; 5+ messages in thread
From: Doug Chapman @ 2008-11-05 23:07 UTC (permalink / raw)
To: linux-ia64
On Tue, 2008-11-04 at 15:28 -0500, Doug Chapman wrote:
> So, it seems the options to fix this that I can think of are:
>
> 1) try to get the info regarding offline CPUs earlier either by calling
> acpi_boot_init() earlier (probably not possible prior to calling find_memory())
> or walk a subset of the ACPI tables to get the number of offline CPUs sooner
> (which sounds ugly).
This ended up being a clean way to resolve this. I just posted the patch, see
the "[PATCH][IA64] fix boot panic caused by offline CPUs" posting.
- Doug
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-11-05 23:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-04 20:28 [REGRESSION] Minimize per_cpu reservations patch causes NULL ptr Doug Chapman
2008-11-04 20:35 ` [REGRESSION] Minimize per_cpu reservations patch causes NULL Doug Chapman
2008-11-04 21:12 ` Luck, Tony
2008-11-04 21:21 ` Doug Chapman
2008-11-05 23:07 ` Doug Chapman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox