* FW: [BISECT] Boot failure on ia64.
@ 2008-06-24 14:39 Robin Holt
2008-06-24 15:30 ` Jes Sorensen
2008-06-24 15:41 ` Robin Holt
0 siblings, 2 replies; 4+ messages in thread
From: Robin Holt @ 2008-06-24 14:39 UTC (permalink / raw)
To: linux-ia64
Oops, missed sending this to the ia64 mailing list.
Robin
----- Forwarded message from Robin Holt <holt@sgi.com> -----
Date: Tue, 24 Jun 2008 07:30:14 -0500
From: Robin Holt <holt@sgi.com>
To: tony.luck@intel.com, Alex Chiang <achiang@hp.com>
Cc: linux-kernel@vger.kernel.org
Subject: [BISECT] Boot failure on ia64.
I bisected to this commit 3463a93def55c309f3c0d0a8aaf216be3be42d64
3463a93def55c309f3c0d0a8aaf216be3be42d64 is first bad commit
commit 3463a93def55c309f3c0d0a8aaf216be3be42d64
Author: Alex Chiang <achiang@hp.com>
Date: Wed Jun 11 17:29:27 2008 -0600
[IA64] Update check_sal_cache_flush to use platform_send_ipi()
...
This fails to boot on any sn2 ia64 with the sn2_defconfig.
Here is the output from that boot.
fs0:\efi\SuSE> elilo net0:holt/v1 root=/dev/sda7 console=ttySG0
ELILO
Uncompressing Linux... done
Linux version 2.6.26-rc5-00223-g3463a93 (holt@attica) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #14 SMP Tue Jun 24 07:27:34 CDT 2008
EFI v1.10 by INTEL: SALsystab=0x6002c25f10 ACPI 2.0=0x6002c26000
console [sn_sal0] enabled
ACPI: RSDP 6002C26000, 0024 (r2 SGI)
ACPI: XSDT 6002C297F0, 0044 (r1 SGI XSDTSN2 10001 7C)
ACPI: APIC 6002C26870, 032C (r1 SGI APICSN2 10001 1)
ACPI: SRAT 6002C26BB0, 06B0 (r1 SGI SRATSN2 10001 1)
ACPI: SLIT 6002C27270, 012C (r1 SGI SLITSN2 10001 1)
ACPI: FACP 6002C27400, 00F4 (r3 SGI FACPSN2 30001 1)
ACPI: DSDT 6002C2AAF0, 0024 (r2 SGI DSDTSN2 20001 AAC)
ACPI: FACS 6002C273B0, 0040
Number of logical nodes in system = 16
Number of memory chunks in system = 16
SAL 3.2: SGI SN2 version 1.50
SAL Platform features: ITC_Drift
SAL: AP wakeup using external interrupt vector 0x12
Unable to handle kernel NULL pointer dereference (address 00000000000044b8)
swapper[0]: Oops 8813272891392 [1]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 00001010084a2010 ifs : 8000000000000491 ip : [<a000000100087020>] Not tainted (2.6.26-rc5-00223-g3463a93)
ip is at sn2_send_IPI+0x80/0x240
unat: 0000000000000000 pfs : 0000000000000491 rsc : 0000000000000003
rnat: 000000000000afc8 bsps: 000000000001003e pr : 65691ba55aa68599
ldrs: 0000000000000000 ccv : 0000000000ff03ff fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100942870 b6 : 00000000ff5423b0 b7 : e000000001fffc00
f6 : 1003e0000000000000000 f7 : 1003e0000000000000001
f8 : 1003e0000000000000000 f9 : 1003e0000000000000000
f10 : 100068fffffffff700000 f11 : 1003e0000000000000090
r1 : a000000100e8bd10 r2 : 00000000000044b8 r3 : 0000000000000000
r8 : 0000000000000000 r9 : 0000000000000000 r10 : ffffffffffff6298
r11 : 0000000000000000 r12 : a000000100adfc30 r13 : a000000100ad0000
r14 : 0000000000000000 r15 : e000006003106298 r16 : e000006003110000
r17 : a000000100d0dce8 r18 : a000000100d0dce8 r19 : a000000100d0dce8
r20 : 0000000000000000 r21 : ffffffffffff0420 r22 : 0000000000000800
r23 : 0000000000000007 r24 : e0000060030b0000 r25 : 000000000004ffff
r26 : a00000010097c460 r27 : e0000060030b0010 r28 : e0000060030b0000
r29 : e0000060030b0020 r30 : 0000000000000000 r31 : 00000000000007ff
Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 8813272891392 [2]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 0000101008022018 ifs : 800000000000038c ip : [<a000000100175b30>] Not tainted (2.6.26-rc5-00223-g3463a93)
ip is at kmem_cache_alloc+0x70/0x180
unat: 0000000000000000 pfs : 0000000000000610 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 65691ba55aa69aa5
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100040bc0 b6 : a000000100040e00 b7 : a00000010000b730
f6 : 1003e45b3373c16c02344 f7 : 1003e9e3779b97f4a7c16
f8 : 1003e0a00000010001426 f9 : 10006c7fffffffd73ea5c
f10 : 100068fffffffff700000 f11 : 1003e0000000000000090
r1 : a000000100e8bd10 r2 : a000000100bae950 r3 : a000000100bac860
r8 : 0000000000000000 r9 : 0000000000000000 r10 : a000000100ad0c54
r11 : 0000000000000000 r12 : a000000100adf100 r13 : a000000100ad0000
r14 : 0000000000000014 r15 : a000000100adf190 r16 : a000000100adf198
r17 : a000000100ca1480 r18 : a000000100adf17c r19 : a000000100adf170
r20 : 0000000000000000 r21 : 0000000000000000 r22 : a000000100adf170
r23 : a000000100adf174 r24 : 000000000000000c r25 : a000000100adf180
r26 : a000000100adf174 r27 : 0000000000000000 r28 : 0000000000000000
r29 : a000000100adf178 r30 : 000000007fffffff r31 : 000000000000000c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
----- End forwarded message -----
* Re: FW: [BISECT] Boot failure on ia64.
2008-06-24 14:39 FW: [BISECT] Boot failure on ia64 Robin Holt
@ 2008-06-24 15:30 ` Jes Sorensen
2008-06-24 16:35 ` Alex Chiang
2008-06-24 15:41 ` Robin Holt
1 sibling, 1 reply; 4+ messages in thread
From: Jes Sorensen @ 2008-06-24 15:30 UTC (permalink / raw)
To: linux-ia64
>>>>> "Robin" = Robin Holt <holt@sgi.com> writes:
Robin> Oops, missed sending this to the ia64 mailing list. Robin
Hi Robin,
Just hit the same problem and did a little digging. It's because
platform_send_ipi() ends up doing a cpuid_to_nasid() on sn2, which
relies on NUMA information etc. being set up.
In fact, check_sal_cache_flush() is called a fair bit before
platform_setup() in arch/ia64/kernel/setup.c, which I would claim is
completely broken.
Either check_sal_cache_flush() needs to be moved to after
platform_setup() or Alex's patch should be reverted until a better
solution is found. I am attaching a patch that does the former, but I
don't know if this is safe on HP's systems.
This boots on SN2.
Cheers,
Jes
Call check_sal_cache_flush() after platform_setup() as
check_sal_cache_flush() now relies on being able to call platform
vector code.
Signed-off-by: Jes Sorensen <jes@sgi.com>
---
arch/ia64/kernel/setup.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Index: linux-2.6.git/arch/ia64/kernel/setup.c
=================================
--- linux-2.6.git.orig/arch/ia64/kernel/setup.c
+++ linux-2.6.git/arch/ia64/kernel/setup.c
@@ -578,8 +578,6 @@ setup_arch (char **cmdline_p)
 	cpu_init();	/* initialize the bootstrap CPU */
 	mmu_context_init();	/* initialize context_id bitmap */
 
-	check_sal_cache_flush();
-
 #ifdef CONFIG_ACPI
 	acpi_boot_init();
 #endif
@@ -607,6 +605,7 @@ setup_arch (char **cmdline_p)
 	ia64_mca_init();
 
 	platform_setup(cmdline_p);
+	check_sal_cache_flush();
 	paging_init();
 }
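The ordering problem this patch addresses can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the kernel code: every demo_ name and the cpu_to_nasid_table pointer are hypothetical stand-ins, and the NULL guard exists only so the sketch can report the failure instead of faulting the way the sn2 boot in this thread did.

```c
#include <stddef.h>

/* Hypothetical stand-in for the NUMA data that platform_setup() publishes. */
static const int demo_nasids[] = { 0, 0, 2, 2 };
static const int *cpu_to_nasid_table = NULL;

/* Stand-in for platform_setup(): makes the CPU-to-nasid table available. */
static void demo_platform_setup(void)
{
	cpu_to_nasid_table = demo_nasids;
}

/*
 * Stand-in for the platform IPI hook. It needs the table to translate a
 * CPU id. Returns the nasid, or -1 if the table has not been published yet.
 */
static int demo_send_ipi(int cpu)
{
	if (cpu_to_nasid_table == NULL)
		return -1;
	return cpu_to_nasid_table[cpu];
}

/* Stand-in for check_sal_cache_flush(), which ends up in the IPI hook. */
static int demo_check_sal_cache_flush(void)
{
	return demo_send_ipi(2);
}

/*
 * Run the two steps in either order and return what the check observed.
 * setup_first == 0 mimics the pre-patch order, 1 mimics the patched order.
 */
int demo_boot(int setup_first)
{
	int result;

	cpu_to_nasid_table = NULL;	/* start from the early-boot state */
	if (setup_first) {
		demo_platform_setup();
		result = demo_check_sal_cache_flush();
	} else {
		result = demo_check_sal_cache_flush();
		demo_platform_setup();
	}
	return result;
}
```

Calling demo_boot(0) mimics the order before the patch and returns -1, because the check runs before the table exists. Calling demo_boot(1) mimics the patched order and returns the table entry for CPU 2.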
* Re: FW: [BISECT] Boot failure on ia64.
2008-06-24 14:39 FW: [BISECT] Boot failure on ia64 Robin Holt
2008-06-24 15:30 ` Jes Sorensen
@ 2008-06-24 15:41 ` Robin Holt
1 sibling, 0 replies; 4+ messages in thread
From: Robin Holt @ 2008-06-24 15:41 UTC (permalink / raw)
To: linux-ia64
Yep, that gets it.
Thanks,
Robin
On Tue, Jun 24, 2008 at 11:30:09AM -0400, Jes Sorensen wrote:
> >>>>> "Robin" = Robin Holt <holt@sgi.com> writes:
>
> Robin> Oops, missed sending this to the ia64 mailing list. Robin
>
> Hi Robin,
>
> Just hit the same problem and did a little digging. It's because
> platform_send_ipi() ends up doing a cpuid_to_nasid() on sn2, which
> relies on NUMA information etc. being set up.
>
> In fact, check_sal_cache_flush() is called a fair bit before
> platform_setup() in arch/ia64/kernel/setup.c, which I would claim is
> completely broken.
>
> Either check_sal_cache_flush() needs to be moved to after
> platform_setup() or Alex's patch should be reverted until a better
> solution is found. I am attaching a patch that does the former, but I
> don't know if this is safe on HP's systems.
>
> This boots on SN2.
>
> Cheers,
> Jes
>
> Call check_sal_cache_flush() after platform_setup() as
> check_sal_cache_flush() now relies on being able to call platform
> vector code.
>
> Signed-off-by: Jes Sorensen <jes@sgi.com>
> ---
> arch/ia64/kernel/setup.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> Index: linux-2.6.git/arch/ia64/kernel/setup.c
> =================================
> --- linux-2.6.git.orig/arch/ia64/kernel/setup.c
> +++ linux-2.6.git/arch/ia64/kernel/setup.c
> @@ -578,8 +578,6 @@ setup_arch (char **cmdline_p)
> cpu_init(); /* initialize the bootstrap CPU */
> mmu_context_init(); /* initialize context_id bitmap */
>
> - check_sal_cache_flush();
> -
> #ifdef CONFIG_ACPI
> acpi_boot_init();
> #endif
> @@ -607,6 +605,7 @@ setup_arch (char **cmdline_p)
> ia64_mca_init();
>
> platform_setup(cmdline_p);
> + check_sal_cache_flush();
> paging_init();
> }
>
* Re: FW: [BISECT] Boot failure on ia64.
2008-06-24 15:30 ` Jes Sorensen
@ 2008-06-24 16:35 ` Alex Chiang
0 siblings, 0 replies; 4+ messages in thread
From: Alex Chiang @ 2008-06-24 16:35 UTC (permalink / raw)
To: Jes Sorensen; +Cc: Robin Holt, linux-ia64, tony.luck, linux-kernel
* Jes Sorensen <jes@sgi.com>:
> >>>>> "Robin" = Robin Holt <holt@sgi.com> writes:
>
> Robin> Oops, missed sending this to the ia64 mailing list. Robin
>
> Hi Robin,
>
> Just hit the same problem and did a little digging. It's because
> platform_send_ipi() ends up doing a cpuid_to_nasid() on sn2, which
> relies on NUMA information etc. being set up.
>
> In fact, check_sal_cache_flush() is called a fair bit before
> platform_setup() in arch/ia64/kernel/setup.c, which I would claim is
> completely broken.
Yeah, using platform_* before platform_setup() completes is
probably a bad idea. :-/
> Either check_sal_cache_flush() needs to be moved to after
> platform_setup() or Alex's patch should be reverted until a better
> solution is found. I am attaching a patch that does the former, but I
> don't know if this is safe on HP's systems.
>
> This boots on SN2.
This patch works on
- rx5670 (the original buggy HP platform)
- rx6600 (low-end HP platform)
- rx7620 (mid-range HP platform)
Thanks for fixing this.
Tested-by: Alex Chiang <achiang@hp.com>
/ac
>
> Cheers,
> Jes
>
> Call check_sal_cache_flush() after platform_setup() as
> check_sal_cache_flush() now relies on being able to call platform
> vector code.
>
> Signed-off-by: Jes Sorensen <jes@sgi.com>
> ---
> arch/ia64/kernel/setup.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> Index: linux-2.6.git/arch/ia64/kernel/setup.c
> =================================
> --- linux-2.6.git.orig/arch/ia64/kernel/setup.c
> +++ linux-2.6.git/arch/ia64/kernel/setup.c
> @@ -578,8 +578,6 @@ setup_arch (char **cmdline_p)
> cpu_init(); /* initialize the bootstrap CPU */
> mmu_context_init(); /* initialize context_id bitmap */
>
> - check_sal_cache_flush();
> -
> #ifdef CONFIG_ACPI
> acpi_boot_init();
> #endif
> @@ -607,6 +605,7 @@ setup_arch (char **cmdline_p)
> ia64_mca_init();
>
> platform_setup(cmdline_p);
> + check_sal_cache_flush();
> paging_init();
> }
>