* [PATCH]send slave cpus to SAL slave loop on crash (IA64)
@ 2006-10-30 20:36 Jay Lan
2006-10-31 2:02 ` Zou, Nanhai
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: Jay Lan @ 2006-10-30 20:36 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 319 bytes --]
This patch fixes a problem where interrupts are sent to cpus
that cannot respond.
At crash time, the patch returns slave cpus to the SAL slave loop,
except for cpu0. cpu0 is a special case because there is no way
to return it to SAL, so cpu0 is better handled in firmware.
Signed-off-by: Jay Lan <jlan@sgi.com>
[-- Attachment #2: send-cpus-to-slave-loop --]
[-- Type: text/plain, Size: 898 bytes --]
Index: linux/arch/ia64/kernel/crash.c
===================================================================
--- linux.orig/arch/ia64/kernel/crash.c 2006-10-17 17:09:45.662734380 -0700
+++ linux/arch/ia64/kernel/crash.c 2006-10-30 12:27:12.080526026 -0800
@@ -146,14 +146,21 @@ machine_kdump_on_init(void)
void
kdump_cpu_freeze(struct unw_frame_info *info, void *arg)
{
+ int cpuid = smp_processor_id();
+
local_irq_disable();
crash_save_this_cpu();
current->thread.ksp = (__u64)info->sw - 16;
atomic_inc(&kdump_cpu_freezed);
- kdump_status[smp_processor_id()] = 1;
+ kdump_status[cpuid] = 1;
mb();
- for (;;)
- cpu_relax();
+ /* return cpus (except cpu0) to SAL slave loop */
+ if (cpuid == 0) {
+ for (;;)
+ cpu_relax();
+ } else {
+ ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]);
+ }
}
static int
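The decision made at the end of kdump_cpu_freeze() in the patch above can be modeled in user space. This is only a sketch: the enum and the function names (kdump_freeze_action, freeze_action_name) are hypothetical and are not part of the kernel; the real code calls cpu_relax() or ia64_jump_to_sal() directly.

```c
#include <stdio.h>

/* Hypothetical model of the two outcomes in the patched kdump_cpu_freeze(). */
enum freeze_action {
    FREEZE_SPIN,        /* cpu0: stay in the kernel, for (;;) cpu_relax(); */
    FREEZE_JUMP_TO_SAL  /* others: ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]) */
};

/* Only cpu0 keeps spinning; every other cpu goes back to the SAL slave loop. */
enum freeze_action kdump_freeze_action(int cpuid)
{
    return cpuid == 0 ? FREEZE_SPIN : FREEZE_JUMP_TO_SAL;
}

const char *freeze_action_name(enum freeze_action action)
{
    return action == FREEZE_SPIN ? "spin in cpu_relax()"
                                 : "return to SAL slave loop";
}

/* Print what each cpu would do at crash time on a machine with ncpus cpus. */
void print_freeze_plan(int ncpus)
{
    int cpu;

    for (cpu = 0; cpu < ncpus; cpu++)
        printf("cpu%d: %s\n", cpu, freeze_action_name(kdump_freeze_action(cpu)));
}
```

Calling print_freeze_plan(4) would list cpu0 as spinning and cpu1 to cpu3 as returning to SAL, which is the split the patch description gives.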
* RE: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
@ 2006-10-31 2:02 ` Zou, Nanhai
2006-10-31 4:33 ` Zou, Nanhai
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Zou, Nanhai @ 2006-10-31 2:02 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jay Lan [mailto:jlan@engr.sgi.com]
> Sent: 2006-10-31 4:36
> To: fastboot
> Cc: Linux-IA64; Zou, Nanhai; Jack Steiner; Luck, Tony
> Subject: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>
> This patch is to fix a problem of interrupts being sent to cpus
> that can not respond.
>
> This patch would return slave cpus to SAL slave loop, at time of
> crash, except cpu0. The cpu0 is a special case as there is no way
> to return it to SAL, so cpu0 is better handled in firmware.
>
> Signed-off-by: Jay Lan <jlan@sgi.com>
>
Does this fix the I/O interrupt redirect issue on SN?
However, this patch makes Kdump depend on the cpu hotplug code, so you may want to add the dependency in Kconfig.
Thanks
Zou Nan hai
* RE: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
2006-10-31 2:02 ` Zou, Nanhai
@ 2006-10-31 4:33 ` Zou, Nanhai
2006-10-31 8:59 ` Jay Lan
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Zou, Nanhai @ 2006-10-31 4:33 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: 2006-10-31 10:53
> To: Zou, Nanhai
> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>
> Zou, Nanhai wrote:
> >> -----Original Message-----
> >> From: Jay Lan [mailto:jlan@engr.sgi.com]
> >> Sent: 2006-10-31 4:36
> >> To: fastboot
> >> Cc: Linux-IA64; Zou, Nanhai; Jack Steiner; Luck, Tony
> >> Subject: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
> >>
> >> This patch is to fix a problem of interrupts being sent to cpus
> >> that can not respond.
> >>
> >> This patch would return slave cpus to SAL slave loop, at time of
> >> crash, except cpu0. The cpu0 is a special case as there is no way
> >> to return it to SAL, so cpu0 is better handled in firmware.
> >>
> >> Signed-off-by: Jay Lan <jlan@sgi.com>
> >>
> >
> >
> > Does this fix the I/O interrupt redirect issue on SN?
>
> This fixes the interrupts being sent to cpus not in the
> slave loop that caused hang on SN. When one boots up the
> kexec'ed kernel with 'maxcpus=1', all idle cpus needs to
> be sent back. If they are not returned to the SAL slave
> loop and just looping in cpu_relax(), they are considered
> alive, but interrupts would be lost and system hang.
>
But doesn't this rely on the machine crashing on CPU 0?
Currently, Kdump boots the second kernel on the crashing CPU.
So if the machine crashes and boots on CPU N, CPU 0 will still not be able to redirect interrupts, right?
> This is different from the kexec '--noio' option you added
> to kexec-tools. We still need that fix.
>
Does the --noio patch work on SN? I remember you mentioned there was still some issue when you tested the --noio option on an SN system?
> > However this patch will make Kdump depends on cpu hotplug code, so you may
> add the dependency in Kconfig.
>
> I thought Khalid Aziz's patch covered this?
> http://lists.osdl.org/mailman/htdig/fastboot/2006-October/004548.html
>
> Thanks,
> - jay
>
> >
> > Thanks
> > Zou Nan hai
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
2006-10-31 2:02 ` Zou, Nanhai
2006-10-31 4:33 ` Zou, Nanhai
@ 2006-10-31 8:59 ` Jay Lan
2006-10-31 9:11 ` Zou, Nanhai
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Jay Lan @ 2006-10-31 8:59 UTC (permalink / raw)
To: linux-ia64
Zou, Nanhai wrote:
>> -----Original Message-----
>> From: Jay Lan [mailto:jlan@sgi.com]
>> Sent: 2006-10-31 10:53
>> To: Zou, Nanhai
>> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
>> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>>
>> Zou, Nanhai wrote:
>>>> -----Original Message-----
>>>> From: Jay Lan [mailto:jlan@engr.sgi.com]
>>>> Sent: 2006-10-31 4:36
>>>> To: fastboot
>>>> Cc: Linux-IA64; Zou, Nanhai; Jack Steiner; Luck, Tony
>>>> Subject: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>>>>
>>>> This patch is to fix a problem of interrupts being sent to cpus
>>>> that can not respond.
>>>>
>>>> This patch would return slave cpus to SAL slave loop, at time of
>>>> crash, except cpu0. The cpu0 is a special case as there is no way
>>>> to return it to SAL, so cpu0 is better handled in firmware.
>>>>
>>>> Signed-off-by: Jay Lan <jlan@sgi.com>
>>>>
>>>
>>> Does this fix the I/O interrupt redirect issue on SN?
>> This fixes the interrupts being sent to cpus not in the
>> slave loop that caused hang on SN. When one boots up the
>> kexec'ed kernel with 'maxcpus=1', all idle cpus needs to
>> be sent back. If they are not returned to the SAL slave
>> loop and just looping in cpu_relax(), they are considered
>> alive, but interrupts would be lost and system hang.
>>
>
> But this will rely on machine crash on CPU 0?
We no longer rely on the machine crashing on CPU 0. If the
crashing CPU is not cpu 0 and cpu 0 is not returned to
the slave loop, this case is now handled by our PROM.
However, if somebody tries to boot a production kernel using the '-le'
option _after_ the kexec'ed kernel is up and running, the third kernel
will not boot unless we boot the second kernel on cpu 0. I
posted a question earlier on whether running 'kexec -le' on a kexec'ed
kdump kernel is legal, and Vivek responded that the scenario
is not guaranteed to work. So, I think we are fine here.
> Current Kdump will boot to second kernel on the crashing CPU.
> So if machine crash and boot on CPU N, CPU 0 will still not be able to redirect interrupt, right?
Yes, and this case is handled in our PROM.
>
>> This is different from the kexec '--noio' option you added
>> to kexec-tools. We still need that fix.
>>
>
>
> Does --noio patch works on SN? I remember you have mentioned there is still some issue when you testing --noio option on SN system?
We need the --noio option to have kexec-kdump working on SN. The problem
was the patch you posted. It was different from the suggestion you
gave me when we first encountered the problem. If we, as you first
suggested, no-op all inline functions defined in purgatory/arch/ia64/io.h,
then it works.
Is there any issue if the noio patch is changed to your original
suggestion?
Thanks,
- jay
>
>>> However this patch will make Kdump depends on cpu hotplug code, so you may
>> add the dependency in Kconfig.
>>
>> I thought Khalid Aziz's patch covered this?
>> http://lists.osdl.org/mailman/htdig/fastboot/2006-October/004548.html
>>
>
>> Thanks,
>> - jay
>>
>>> Thanks
>>> Zou Nan hai
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (2 preceding siblings ...)
2006-10-31 8:59 ` Jay Lan
@ 2006-10-31 9:11 ` Zou, Nanhai
2006-10-31 18:08 ` Jay Lan
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Zou, Nanhai @ 2006-10-31 9:11 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: 2006-10-31 17:00
> To: Zou, Nanhai
> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>
> Zou, Nanhai wrote:
> >> -----Original Message-----
> >> From: Jay Lan [mailto:jlan@sgi.com]
> >> Sent: 2006-10-31 10:53
> >> To: Zou, Nanhai
> >> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
> >> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
> >>
> >> Zou, Nanhai wrote:
> >>>> -----Original Message-----
> >>>> From: Jay Lan [mailto:jlan@engr.sgi.com]
> >>>> Sent: 2006-10-31 4:36
> >>>> To: fastboot
> >>>> Cc: Linux-IA64; Zou, Nanhai; Jack Steiner; Luck, Tony
> >>>> Subject: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
> >>>>
> >>>> This patch is to fix a problem of interrupts being sent to cpus
> >>>> that can not respond.
> >>>>
> >>>> This patch would return slave cpus to SAL slave loop, at time of
> >>>> crash, except cpu0. The cpu0 is a special case as there is no way
> >>>> to return it to SAL, so cpu0 is better handled in firmware.
> >>>>
> >>>> Signed-off-by: Jay Lan <jlan@sgi.com>
> >>>>
> >>>
> >>> Does this fix the I/O interrupt redirect issue on SN?
> >> This fixes the interrupts being sent to cpus not in the
> >> slave loop that caused hang on SN. When one boots up the
> >> kexec'ed kernel with 'maxcpus=1', all idle cpus needs to
> >> be sent back. If they are not returned to the SAL slave
> >> loop and just looping in cpu_relax(), they are considered
> >> alive, but interrupts would be lost and system hang.
> >>
> >
> > But this will rely on machine crash on CPU 0?
>
> We do not rely on machine crash on CPU 0 any more. If the
> crashing CPU is not cpu 0 and the cpu 0 not being returned to
> the slave loop, this case is handled by our PROM now.
>
> However, if somebody tries to boot up a production kernel using '-le'
> option _after_ the kexec'ed kernel is up running, the third kernel
> would not boot unless we boot up the second kernel with cpu 0. I
> posted a question on "if running 'kexec -le' on a kexec'ed kdump
> kernel is legal" earlier and Vivek responded saying the scenario
> is not guaranteed to work. So, I think we are fine here.
OK, so with this patch and the PROM fix, on an SN system:
1. Kdump -> 2nd kernel works.
2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
I think if scenarios 1, 3 and 4 work, it will be OK. Scenario 2 is not so useful, I guess.
>
>
> > Current Kdump will boot to second kernel on the crashing CPU.
> > So if machine crash and boot on CPU N, CPU 0 will still not be able to redirect
> interrupt, right?
>
> Yes, and this case is handled in our PROM.
>
> >
> >> This is different from the kexec '--noio' option you added
> >> to kexec-tools. We still need that fix.
> >>
> >
> >
> > Does --noio patch works on SN? I remember you have mentioned there is still
> some issue when you testing --noio option on SN system?
>
> We need the --noio option to have kexec-kdump working on SN. The problem
> was the patch you posted. It was different from the suggestion you
> gave me when we first encountered the problem. If we, as you first
> suggested, noop all inline function defined in purgatory/arch/ia64/io.h,
> then it works.
>
> Is there any issue if the noio patch is changed to your original
> suggestion?
>
The --noio patch should be the same as my original suggestion: it bypasses all PIO and MMIO in purgatory when the --noio option is given. I need to check, though.
Thanks
Zou Nan hai
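The idea of bypassing all PIO and MMIO in purgatory under --noio can be sketched as accessors that check a flag before touching hardware. Everything here is an assumption made for illustration: the names noio, io_accesses, sketch_outb and sketch_inb are hypothetical, and a counter stands in for the real port access so the sketch runs in user space. It is not the code in purgatory/arch/ia64/io.h.

```c
#include <stdint.h>

/* Hypothetical flag; per the thread, kexec-tools sets the real one at load time. */
int noio = 0;

/* Counts accesses that would have reached the hardware (stand-in for real I/O). */
unsigned long io_accesses = 0;

/* Port write that becomes a no-op when noio is set. */
void sketch_outb(uint8_t value, uint16_t port)
{
    (void)value;
    (void)port;
    if (noio)
        return;         /* --noio: skip the port write entirely */
    io_accesses++;      /* real code would issue the port write here */
}

/* Port read that returns a dummy value when noio is set. */
uint8_t sketch_inb(uint16_t port)
{
    (void)port;
    if (noio)
        return 0xff;    /* --noio: report a dummy value, touch nothing */
    io_accesses++;      /* real code would issue the port read here */
    return 0;
}
```

With noio left at 0 each call bumps io_accesses; after setting noio to 1 the counter stops moving, which is the behavior the option is meant to give on machines where such accesses are unsafe.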
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (3 preceding siblings ...)
2006-10-31 9:11 ` Zou, Nanhai
@ 2006-10-31 18:08 ` Jay Lan
2006-11-03 17:42 ` Jay Lan
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Jay Lan @ 2006-10-31 18:08 UTC (permalink / raw)
To: linux-ia64
Zou, Nanhai wrote:
>> We do not rely on machine crash on CPU 0 any more. If the
>> crashing CPU is not cpu 0 and the cpu 0 not being returned to
>> the slave loop, this case is handled by our PROM now.
>>
>> However, if somebody tries to boot up a production kernel using '-le'
>> option _after_ the kexec'ed kernel is up running, the third kernel
>> would not boot unless we boot up the second kernel with cpu 0. I
>> posted a question on "if running 'kexec -le' on a kexec'ed kdump
>> kernel is legal" earlier and Vivek responded saying the scenario
>> is not guaranteed to work. So, I think we are fine here.
>
> Ok, so with this patch and the PROM fix, on a SN system,
> 1. Kdump -> 2nd kernel works.
> 2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
> 3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
> 4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
>
> I think if scenario 1, 3 and 4 works it will be ok. Scenario 2 is not so useful I guess.
>
The '-l' option caused kexec to segfault:
[root@pogo1 boot]# /home/jlan/kexec -l
/boot/efi/efi/redhat/vmlinuz-2.6.18-kdump
--initrd=/boot/efi/efi/redhat/initrd-2.6.18-kdump
--append="root=/dev/sdb6 rhgb irqpoll ro quite"
Done with process_options
kernel: 0x2000000000328010 kernel_size: 3503fec
memory_range: crashk, idx=5, start=3018000000, end=3028000000
memory_range: Boot, idx=7, start=307a280010, end=307a280061
memory_range: MemoryMap, idx=9, start=307a3f0010, end=307a3f0611
build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54509848
build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014120
ready to load. type=0,
build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54509848
build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014f30
elf_exec_load
Invalid memory segment 0x4000000 - 0x498bfff
Segmentation fault
[root@pogo1 boot]#
It is on my list, but my priority is to make sure kdump works (on
sysrq-c, INIT, MCA) and to ensure /proc/vmcore contains the correct
and needed information for SN.
Thanks,
- jay
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (4 preceding siblings ...)
2006-10-31 18:08 ` Jay Lan
@ 2006-11-03 17:42 ` Jay Lan
2006-11-08 2:01 ` Zou, Nanhai
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Jay Lan @ 2006-11-03 17:42 UTC (permalink / raw)
To: linux-ia64
Zou, Nanhai wrote:
> The --noio patch should be the same as my original suggestion: it bypasses all PIO and MMIO in purgatory when the --noio option is given. I need to check, though.
Hi Nanhai,
I finally got a chance to look at this further.
Your patch touched four files:
kexec/arch/ia64/kexec-elf-ia64.c
purgatory/arch/ia64/entry.S
purgatory/arch/ia64/include/arch/io.h
purgatory/arch/ia64/io.h
I replaced both io.h files with my version and it still
failed. So it must be the way the --noio option is handled
that makes the difference.
I did see debug messages showing that the "noio" option was
processed in elf_rel_set_symbol() in my testing. Does the
kernel code need to know about this option? Somehow
this change makes the kernel fail in init().
Thanks,
- jay
>
> Thanks
> Zou Nan hai
* RE: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (5 preceding siblings ...)
2006-11-03 17:42 ` Jay Lan
@ 2006-11-08 2:01 ` Zou, Nanhai
2006-11-10 19:23 ` Jay Lan
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Zou, Nanhai @ 2006-11-08 2:01 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: 2006-11-04 1:42
> To: Zou, Nanhai
> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>
> Zou, Nanhai wrote:
> > The --noio patch should be the same as my original suggestion: it bypasses
> all PIO and MMIO in purgatory when the --noio option is given. I need to check, though.
>
> Hi Nanhai,
>
> I finally got a chance to look at this further.
>
> Your patch touched four files:
> kexec/arch/ia64/kexec-elf-ia64.c
> purgatory/arch/ia64/entry.S
> purgatory/arch/ia64/include/arch/io.h
> purgatory/arch/ia64/io.h
>
> I replaced both of the io.h with my version and it still
> failed. So, it should be the way --noio option being handled
> that made the difference.
>
> I did have debug messages showing that "noio" option was
> processed in elf_rel_set_symbol() in my testing. Does
> kernel code need to know about this option? Somehow
> this change makes kernel fail in init().
>
No, the kernel does not need to know about this. Kexec-tools passes this option to the purgatory code via relocation at kexec load time.
Zou Nan hai
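Passing an option to purgatory at load time amounts to writing a value into the loaded image at the address of a named symbol, before the image ever runs. The following is a minimal user-space sketch of that idea under stated assumptions: struct sketch_symbol and sketch_set_symbol() are hypothetical, and the symbol table is a plain array. It does not reproduce the signature or behavior of elf_rel_set_symbol() in kexec-tools.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record: where a named variable lives inside the image. */
struct sketch_symbol {
    const char *name;
    size_t offset;   /* byte offset of the variable within the image */
    size_t size;     /* size of the variable in bytes */
};

/*
 * Copy `size` bytes from `value` into the image at the location of `name`.
 * Returns 0 on success, -1 if the symbol is unknown or the sizes disagree.
 */
int sketch_set_symbol(unsigned char *image, size_t image_len,
                      const struct sketch_symbol *syms, size_t nsyms,
                      const char *name, const void *value, size_t size)
{
    size_t i;

    for (i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) != 0)
            continue;
        /* Refuse writes that do not match the symbol or run past the image. */
        if (size != syms[i].size || syms[i].offset > image_len ||
            size > image_len - syms[i].offset)
            return -1;
        memcpy(image + syms[i].offset, value, size);
        return 0;
    }
    return -1;
}
```

Because the value is stored in the image itself, the code that later runs from that image simply reads its own variable, which is consistent with the statement that the kernel does not need to know about the option.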
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (6 preceding siblings ...)
2006-11-08 2:01 ` Zou, Nanhai
@ 2006-11-10 19:23 ` Jay Lan
2006-11-14 1:25 ` Zou, Nanhai
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Jay Lan @ 2006-11-10 19:23 UTC (permalink / raw)
To: linux-ia64
Zou, Nanhai wrote:
>>> But this will rely on machine crash on CPU 0?
>> We do not rely on machine crash on CPU 0 any more. If the
>> crashing CPU is not cpu 0 and the cpu 0 not being returned to
>> the slave loop, this case is handled by our PROM now.
>>
>> However, if somebody tries to boot up a production kernel using '-le'
>> option _after_ the kexec'ed kernel is up running, the third kernel
>> would not boot unless we boot up the second kernel with cpu 0. I
>> posted a question on "if running 'kexec -le' on a kexec'ed kdump
>> kernel is legal" earlier and Vivek responded saying the scenario
>> is not guaranteed to work. So, I think we are fine here.
>
> Ok, so with this patch and the PROM fix, on a SN system,
> 1. Kdump -> 2nd kernel works.
> 2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
> 3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
> 4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
>
> I think if scenario 1, 3 and 4 works it will be ok. Scenario 2 is not so useful I guess.
Hi Nanhai,
Where do we stand with this patch? Have you included it yet?
As to scenarios 3 and 4, 'kexec -l' failed with "Invalid memory segment"
on SN Altix systems, and I have not had time to dig into it. This patch
pretty much does what you suggested, "calling ia64_jump_to_sal",
to send the cpus to the slave loop.
We can include cpu 0 as well by calling fix_b0_for_bsp() to set up b0
for cpu 0 in ia64_mca_init(), if so desired. What do you think?
Regards,
- jay
[root@pogo1 boot]# /home/jlan/kexec-noio -l /boot/vmlinuz-2.6.18-kdump --noio
--initrd=/boot/initrd-2.6.18-kdump --append="root=/dev/sdb6 irqpoll ro console=ttySG0"
Done with process_options
kernel: 0x2000000000328010 kernel_size: 3502601
memory_range: crashk, idx=5, start=3018000000, end=3028000000
memory_range: Boot, idx=7, start=307a280010, end=307a280061
memory_range: MemoryMap, idx=9, start=307a3f0010, end=307a3f0611
build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54506776
build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014120
ready to load. type=0,
build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54506776
build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014f30
elf_exec_load
Invalid memory segment 0x4000000 - 0x4997fff
Segmentation fault
[root@pogo1 boot]#
>
* RE: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (7 preceding siblings ...)
2006-11-10 19:23 ` Jay Lan
@ 2006-11-14 1:25 ` Zou, Nanhai
2006-11-20 23:13 ` Zou Nan hai
2006-11-20 23:33 ` Jay Lan
10 siblings, 0 replies; 12+ messages in thread
From: Zou, Nanhai @ 2006-11-14 1:25 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: 2006-11-11 3:23
> To: Zou, Nanhai
> Cc: fastboot; Linux-IA64; Jack Steiner; Luck, Tony
> Subject: Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
>
> Zou, Nanhai wrote:
> >>> But this will rely on machine crash on CPU 0?
> >> We do not rely on machine crash on CPU 0 any more. If the
> >> crashing CPU is not cpu 0 and the cpu 0 not being returned to
> >> the slave loop, this case is handled by our PROM now.
> >>
> >> However, if somebody tries to boot up a production kernel using '-le'
> >> option _after_ the kexec'ed kernel is up running, the third kernel
> >> would not boot unless we boot up the second kernel with cpu 0. I
> >> posted a question on "if running 'kexec -le' on a kexec'ed kdump
> >> kernel is legal" earlier and Vivek responded saying the scenario
> >> is not guaranteed to work. So, I think we are fine here.
> >
> > Ok, so with this patch and the PROM fix, on a SN system,
> > 1. Kdump -> 2nd kernel works.
> > 2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
> > 3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
> > 4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
> >
> > I think if scenario 1, 3 and 4 works it will be ok. Scenario 2 is not so
> useful I guess.
>
> Hi Nanhai,
>
> Where do we stand as to this patch's concern? Did you include this yet?
>
> As to scenarios 3 and 4, 'kexec -l' failed with "Invalid memory segment"
> on SN Altix systems, and i have not had time to dig into it. This patch
> is pretty much doing what you suggested "calling ia64_jump_to_sal"
> to send the cpus to slave loop.
>
> We can include cpu 0 as well by calling fix_b0_for_bsp() to set up b0
> for cpu 0 in ia64_mca_init(), if so desired. What do you think?
During kdump_cpu_freeze, putting every cpu except cpu0 into the rendezvous state looks like a hack. However, fix_b0_for_bsp is another hack.
Since I think the first one looks less hacky, I will include your patch in my kdump patch.
Thanks
Zou Nan hai
>
> Regards,
> - jay
>
>
> [root@pogo1 boot]# /home/jlan/kexec-noio -l /boot/vmlinuz-2.6.18-kdump --noio
> --initrd=/boot/initrd-2.6.18-kdump --append="root=/dev/sdb6 irqpoll ro console=ttySG0"
> Done with process_options
> kernel: 0x2000000000328010 kernel_size: 3502601
> memory_range: crashk, idx=5, start=3018000000, end=3028000000
> memory_range: Boot, idx=7, start=307a280010, end=307a280061
> memory_range: MemoryMap, idx=9, start=307a3f0010, end=307a3f0611
> build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54506776
> build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014120
> ready to load. type=0,
> build_mem_shdrs: ei_class=2, e_shnum=46, e_shoff=54506776
> build_mem_shdrs: sizeof(e_shdr)=72, e_shdr=0x6000000000014f30
> elf_exec_load
> Invalid memory segment 0x4000000 - 0x4997fff
> Segmentation fault
> [root@pogo1 boot]#
>
> >
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (8 preceding siblings ...)
2006-11-14 1:25 ` Zou, Nanhai
@ 2006-11-20 23:13 ` Zou Nan hai
2006-11-20 23:33 ` Jay Lan
10 siblings, 0 replies; 12+ messages in thread
From: Zou Nan hai @ 2006-11-20 23:13 UTC (permalink / raw)
To: linux-ia64
On Tue, 2006-11-21 at 07:33, Jay Lan wrote:
> Zou, Nanhai wrote:
> >> We do not rely on machine crash on CPU 0 any more. If the
> >> crashing CPU is not cpu 0 and the cpu 0 not being returned to
> >> the slave loop, this case is handled by our PROM now.
> >>
> >> However, if somebody tries to boot up a production kernel using
> '-le'
> >> option _after_ the kexec'ed kernel is up running, the third kernel
> >> would not boot unless we boot up the second kernel with cpu 0. I
> >> posted a question on "if running 'kexec -le' on a kexec'ed kdump
> >> kernel is legal" earlier and Vivek responded saying the scenario
> >> is not guaranteed to work. So, I think we are fine here.
> >>
> >
> > Ok, so with this patch and the PROM fix, on a SN system,
> > 1. Kdump -> 2nd kernel works.
> > 2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
> > 3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
> > 4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
> >
> > I think if scenario 1, 3 and 4 works it will be ok. Scenario 2 is
> not so useful I guess.
> >
>
> With the patch Nanhai sent to me to fix '-l' option on SN system,
> now scenario 1, 3 and 4 all works. Of course, you need to include
> 'crashkernel' parameter in "append" option when you do 'kexec -l'
> in order for scenario #4 to work. You do not need crashkernel
> parameter for #3 though.
>
> Thanks,
> - jay
>
>
This is the patch.
This patch makes a normal "kexec -l" first try the physical address suggested
by vmlinux.
If there is not enough memory there, kexec-tools will search /proc/iomem and
find a place to put the new kernel.
This is necessary for "kexec -l" to work on the SN platform.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
diff -Nraup a/kexec/arch/ia64/kexec-elf-ia64.c b/kexec/arch/ia64/kexec-elf-ia64.c
--- a/kexec/arch/ia64/kexec-elf-ia64.c 2006-11-20 00:56:54.000000000 -0500
+++ b/kexec/arch/ia64/kexec-elf-ia64.c 2006-11-20 20:49:21.000000000 -0500
@@ -84,7 +84,8 @@ void elf_ia64_usage(void)
/* Move the crash kerenl physical offset to reserved region
*/
-static void move_loaded_segments(struct kexec_info *info, struct mem_ehdr *ehdr)
+void move_loaded_segments(struct kexec_info *info, struct mem_ehdr *ehdr,
+ unsigned long addr)
{
int i;
long offset;
@@ -92,7 +93,7 @@ static void move_loaded_segments(struct
for(i = 0; i < ehdr->e_phnum; i++) {
phdr = &ehdr->e_phdr[i];
if (phdr->p_type == PT_LOAD) {
- offset = mem_min - phdr->p_paddr;
+ offset = addr - phdr->p_paddr;
break;
}
}
@@ -168,7 +169,12 @@ int elf_ia64_load(int argc, char **argv,
fprintf(stderr, "Failed to find crash kernel region in /proc/iomem\n");
return -1;
}
- move_loaded_segments(info, &ehdr);
+ move_loaded_segments(info, &ehdr, mem_min);
+ } else {
+ if (update_loaded_segments(info, &ehdr)) {
+ fprintf(stderr, "Failed to place kernel\n");
+ return -1;
+ }
}
entry = ehdr.e_entry;
diff -Nraup a/kexec/arch/ia64/kexec-ia64.c b/kexec/arch/ia64/kexec-ia64.c
--- a/kexec/arch/ia64/kexec-ia64.c 2006-11-20 00:54:38.000000000 -0500
+++ b/kexec/arch/ia64/kexec-ia64.c 2006-11-20 20:49:21.000000000 -0500
@@ -29,13 +29,15 @@
#include <getopt.h>
#include <sched.h>
#include <sys/utsname.h>
+#include <limits.h>
#include "../../kexec.h"
#include "../../kexec-syscall.h"
+#include "elf.h"
#include "kexec-ia64.h"
#include <arch/options.h>
static struct memory_range memory_range[MAX_MEMORY_RANGES];
-
+static int memory_ranges;
/* Reserve range for EFI memmap and Boot parameter */
static int split_range(int range, unsigned long start, unsigned long end)
{
@@ -73,7 +75,6 @@ int get_memory_ranges(struct memory_rang
unsigned long kexec_flags)
{
const char iomem[]= "/proc/iomem";
- int memory_ranges = 0;
char line[MAX_LINE];
FILE *fp;
fp = fopen(iomem, "r");
@@ -209,6 +210,45 @@ int arch_compat_trampoline(struct kexec_
return 0;
}
+int update_loaded_segments(struct kexec_info *info, struct mem_ehdr *ehdr)
+{
+ int i;
+ struct mem_phdr *phdr;
+ unsigned long start_addr = ULONG_MAX, end_addr = 0;
+ unsigned long align = 1UL<<26; // 64M
+ for(i = 0; i < ehdr->e_phnum; i++) {
+ phdr = &ehdr->e_phdr[i];
+ if (phdr->p_type == PT_LOAD) {
+ if (phdr->p_paddr < start_addr)
+ start_addr = phdr->p_paddr;
+ if ((phdr->p_paddr + phdr->p_memsz) > end_addr)
+ end_addr = phdr->p_paddr + phdr->p_memsz;
+ }
+
+ }
+
+ for (i = 0; i < memory_ranges
+ && memory_range[i].start <= start_addr; i++) {
+ if (memory_range[i].type == RANGE_RAM &&
+ memory_range[i].end > end_addr)
+ return 0;
+ }
+
+ for (i = 0; i < memory_ranges; i++) {
+ if (memory_range[i].type == RANGE_RAM) {
+ unsigned long start =
+ (memory_range[i].start + align - 1)&~(align - 1);
+ unsigned long end = memory_range[i].end;
+ if (end > start &&
+ (end - start) > (end_addr - start_addr)) {
+ move_loaded_segments(info, ehdr, start);
+ return 0;
+ }
+ }
+ }
+ return 1;
+}
+
void arch_update_purgatory(struct kexec_info *info)
{
}
diff -Nraup a/kexec/arch/ia64/kexec-ia64.h b/kexec/arch/ia64/kexec-ia64.h
--- a/kexec/arch/ia64/kexec-ia64.h 2006-10-24 21:51:49.000000000 -0400
+++ b/kexec/arch/ia64/kexec-ia64.h 2006-11-20 20:49:43.000000000 -0500
@@ -7,6 +7,10 @@ int elf_ia64_probe(const char *buf, off_
int elf_ia64_load(int argc, char **argv, const char *buf, off_t len,
struct kexec_info *info);
void elf_ia64_usage(void);
+int update_loaded_segments(struct kexec_info *info, struct mem_ehdr *ehdr);
+void move_loaded_segments(struct kexec_info *info, struct mem_ehdr *ehdr,
+ unsigned long addr);
+
#define MAX_MEMORY_RANGES 1024
#define EFI_PAGE_SIZE (1UL<<12)
#define ELF_PAGE_SIZE (1UL<<16)
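The placement logic described for this patch (keep the address the image asks for when RAM covers it, otherwise take the first RAM range with enough room above a 64M-aligned start) can be sketched outside kexec-tools. The sketch below is an assumption-laden model: struct sketch_range, SKETCH_RANGE_RAM, sketch_align_up() and sketch_place_kernel() are hypothetical names, it returns the chosen address instead of moving segments, and like the patch it assumes the ranges are sorted by start address.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RANGE_RAM 0   /* hypothetical type tag, stands in for RANGE_RAM */

struct sketch_range {
    uint64_t start;
    uint64_t end;
    int type;
};

/* Round addr up to the next multiple of align (align must be a power of two). */
uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Pick a load address for an image that wants [want_start, want_end).
 * Returns 0 and stores the address in *out on success, 1 if nothing fits.
 */
int sketch_place_kernel(const struct sketch_range *r, size_t n,
                        uint64_t want_start, uint64_t want_end,
                        uint64_t align, uint64_t *out)
{
    size_t i;

    /* First choice: keep the address the image asks for, if RAM covers it. */
    for (i = 0; i < n && r[i].start <= want_start; i++) {
        if (r[i].type == SKETCH_RANGE_RAM && r[i].end > want_end) {
            *out = want_start;
            return 0;
        }
    }

    /* Fallback: first RAM range with enough room above an aligned start. */
    for (i = 0; i < n; i++) {
        uint64_t start;

        if (r[i].type != SKETCH_RANGE_RAM)
            continue;
        start = sketch_align_up(r[i].start, align);
        if (r[i].end > start && r[i].end - start > want_end - want_start) {
            *out = start;
            return 0;
        }
    }
    return 1;
}
```

As a worked case with illustrative numbers: given a single RAM range starting at 0x3003000000 and an image linked at 0x4000000, the first loop finds nothing, and the fallback rounds 0x3003000000 up to the next 64M boundary, 0x3004000000, and picks that.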
* Re: [PATCH]send slave cpus to SAL slave loop on crash (IA64)
2006-10-30 20:36 [PATCH]send slave cpus to SAL slave loop on crash (IA64) Jay Lan
` (9 preceding siblings ...)
2006-11-20 23:13 ` Zou Nan hai
@ 2006-11-20 23:33 ` Jay Lan
10 siblings, 0 replies; 12+ messages in thread
From: Jay Lan @ 2006-11-20 23:33 UTC (permalink / raw)
To: linux-ia64
Zou, Nanhai wrote:
>> We do not rely on machine crash on CPU 0 any more. If the
>> crashing CPU is not cpu 0 and the cpu 0 not being returned to
>> the slave loop, this case is handled by our PROM now.
>>
>> However, if somebody tries to boot up a production kernel using '-le'
>> option _after_ the kexec'ed kernel is up running, the third kernel
>> would not boot unless we boot up the second kernel with cpu 0. I
>> posted a question on "if running 'kexec -le' on a kexec'ed kdump
>> kernel is legal" earlier and Vivek responded saying the scenario
>> is not guaranteed to work. So, I think we are fine here.
>>
>
> Ok, so with this patch and the PROM fix, on a SN system,
> 1. Kdump -> 2nd kernel works.
> 2. Kdump -> 2nd kernel -> Kexec to third kernel will not work.
> 3. Kexec -> 2nd Kernel -> Kexec -> 3rd kernel works?
> 4. Kexec -> 2nd Kernel -> Kdump -> 3rd kernel works?
>
> I think if scenario 1, 3 and 4 works it will be ok. Scenario 2 is not so useful I guess.
>
With the patch Nanhai sent me to fix the '-l' option on SN systems,
scenarios 1, 3 and 4 now all work. Of course, you need to include the
'crashkernel' parameter in the "append" option when you do 'kexec -l'
in order for scenario #4 to work. You do not need the crashkernel
parameter for #3, though.
Thanks,
- jay
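Since scenario #4 depends on the 'crashkernel' parameter being present in the string given to --append, a wrapper script or tool could check for it before loading. The helper below is a hypothetical sketch, not part of kexec-tools; it only looks for a space-separated token that starts with "crashkernel=" and does not validate the value.

```c
#include <string.h>

/*
 * Return 1 if the space-separated command line contains a token that
 * starts with "crashkernel=", 0 otherwise.
 */
int sketch_has_crashkernel(const char *append)
{
    static const char key[] = "crashkernel=";
    const char *p = append;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only a match at the start of a token. */
        if (p == append || p[-1] == ' ')
            return 1;
        p += sizeof key - 1;
    }
    return 0;
}
```

For example, a string such as "root=/dev/sdb6 crashkernel=256M@256M ro" (the size and offset are illustrative values) would pass the check, while the append strings shown earlier in this thread would not.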