[PATCH] IA64: kexec allocates too few memory for kdump kernel itself

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
@ 2008-09-03 21:01 Jay Lan
  2008-09-04  0:01 ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Jay Lan @ 2008-09-03 21:01 UTC (permalink / raw)
  To: kexec

[-- Attachment #1: Type: text/plain, Size: 802 bytes --]

Sometimes the kexec would allocate not enough memory for kdump kernel
itself on IA64 and caused kdump kernel to panic at boot.

When it happens, the /proc/iomem would show a kernel RAM segment like
this:
3014000000-3015294fff : System RAM
  3014000000-3014823ccf : Kernel code
  3014823cd0-3014dee8ef : Kernel data
  3014dee8f0-301529448f : Kernel bss
3015295000-307bffdfff : System RAM
  3018000000-3037ffffff : Crash kernel

But kexec would allocate memory 3018000000-3019290000 for the kernel,
which is 0x5000 smaller than the regular kernel. In my cases, the
physical_node_map and kern_memmap of the kdump kernel overlaped and
caused data corruption.

This patch fixes the problem. The patch was generated against
kexec-tools 2.0.0 and tested in 2.6.27-rc4.

Signed-off-by: Jay Lan <jlan@sgi.com>


[-- Attachment #2: bootmem-fix --]
[-- Type: text/plain, Size: 1030 bytes --]

---
 kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
===================================================================
--- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:24:14.289758063 -0700
+++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:29:34.095833316 -0700
@@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
 			phdr = &ehdr->e_phdr[i];
 	                if (phdr->p_type != PT_LOAD)
 	                        break;
-			if (loaded_segments[loaded_segments_num].end !=
-				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
-				break;
+			if (loaded_segments[loaded_segments_num].end <
+			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
+				loaded_segments[loaded_segments_num].end
+				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
 			loaded_segments[loaded_segments_num].end +=
 				(phdr->p_memsz + ELF_PAGE_SIZE - 1) &
 				~(ELF_PAGE_SIZE - 1);
 			i++;
 		}
-
 		loaded_segments_num++;
 	}
 }

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-03 21:01 [PATCH] IA64: kexec allocates too few memory for kdump kernel itself Jay Lan
@ 2008-09-04  0:01 ` Simon Horman
  2008-09-04  1:37   ` Jay Lan
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Horman @ 2008-09-04  0:01 UTC (permalink / raw)
  To: Jay Lan; +Cc: kexec

On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
> Sometimes the kexec would allocate not enough memory for kdump kernel
> itself on IA64 and caused kdump kernel to panic at boot.
> 
> When it happens, the /proc/iomem would show a kernel RAM segment like
> this:
> 3014000000-3015294fff : System RAM
>   3014000000-3014823ccf : Kernel code
>   3014823cd0-3014dee8ef : Kernel data
>   3014dee8f0-301529448f : Kernel bss
> 3015295000-307bffdfff : System RAM
>   3018000000-3037ffffff : Crash kernel
> 
> But kexec would allocate memory 3018000000-3019290000 for the kernel,
> which is 0x5000 smaller than the regular kernel. In my cases, the
> physical_node_map and kern_memmap of the kdump kernel overlaped and
> caused data corruption.
> 
> This patch fixes the problem. The patch was generated against
> kexec-tools 2.0.0 and tested in 2.6.27-rc4.

Hi Jay,

I am unclear about why this underallocation occurs.

> 
> Signed-off-by: Jay Lan <jlan@sgi.com>
> 

> ---
>  kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
> ===================================================================
> --- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:24:14.289758063 -0700
> +++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:29:34.095833316 -0700
> @@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
>  			phdr = &ehdr->e_phdr[i];
>  	                if (phdr->p_type != PT_LOAD)
>  	                        break;
> -			if (loaded_segments[loaded_segments_num].end !=
> -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> -				break;
> +			if (loaded_segments[loaded_segments_num].end <
> +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
> +				loaded_segments[loaded_segments_num].end
> +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>  			loaded_segments[loaded_segments_num].end +=
>  				(phdr->p_memsz + ELF_PAGE_SIZE - 1) &
>  				~(ELF_PAGE_SIZE - 1);
>  			i++;
>  		}
> -
>  		loaded_segments_num++;
>  	}
>  }

> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-04  0:01 ` Simon Horman
@ 2008-09-04  1:37   ` Jay Lan
  2008-09-04 18:28     ` Jay Lan
  0 siblings, 1 reply; 8+ messages in thread
From: Jay Lan @ 2008-09-04  1:37 UTC (permalink / raw)
  To: Simon Horman; +Cc: kexec

Simon Horman wrote:
> On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
>> Sometimes the kexec would allocate not enough memory for kdump kernel
>> itself on IA64 and caused kdump kernel to panic at boot.
>>
>> When it happens, the /proc/iomem would show a kernel RAM segment like
>> this:
>> 3014000000-3015294fff : System RAM
>>   3014000000-3014823ccf : Kernel code
>>   3014823cd0-3014dee8ef : Kernel data
>>   3014dee8f0-301529448f : Kernel bss
>> 3015295000-307bffdfff : System RAM
>>   3018000000-3037ffffff : Crash kernel
>>
>> But kexec would allocate memory 3018000000-3019290000 for the kernel,
>> which is 0x5000 smaller than the regular kernel. In my cases, the
>> physical_node_map and kern_memmap of the kdump kernel overlaped and
>> caused data corruption.
>>
>> This patch fixes the problem. The patch was generated against
>> kexec-tools 2.0.0 and tested in 2.6.27-rc4.
> 
> Hi Jay,
> 
> I am unclear about why this underallocation occurs.

Hi Simon,

The routine add_loaded_segments_info() set up "loaded_segment" array
that is needed by purgatory code, based on data stored in the
mem_ehdr array passed in as the second parameter.

Upon entrance of the routine, the crash_memory_range[] contains
information about the regular kernel:
crash_memory_range[ 0]: start=  3000080000, end=  30003fffff
crash_memory_range[ 1]: start=  3003000000, end=  3005ffffff
crash_memory_range[ 2]: start=  3006000000, end=  3013ffffff
crash_memory_range[ 3]: start=  3014000000, end=  3015294fff

The #3 entry is the kernel memory segment.

And the mem_ehdr array would contain data as such:
i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4

The code wants the new loaded_segments contain starting address
all aligned at page boundary, which is 0x10000 in IA64.

Note that the p_memsz of mem_ehdr does not match to entries in
/proc/iomem:
3014000000-3015294fff : System RAM
  3014000000-3014823ccf : Kernel code
  3014823cd0-3014dee8ef : Kernel data
  3014dee8f0-301529448f : Kernel bss

The original code of add_loaded_segments_info() would go through
the mem_ehdr array and use the p_paddr of the first entry (the
beginning of the reserved memory) as the start address, add
the p_memsz of three entries to calculate the end address of
the kernel segment.

But the p_paddr of i=0 plus p_memsz of i=0 should result in
3018d10000 as the p_paddr of i=1 entry, but actually the
p_paddr of i=1 is 3018d20000. The logic of that routine
can not explain the discrepency.

So, where the data of mem_ehdr array come from?

add_loaded_segments_info
<-  load_crashdump_segments
  <-  elf_ia64_load
    <- file_type[i].load
      <- my_load

The elf_ia64_load set up mem_ehdr, probabaly based on data
pointed by *buf, which i think comes from vmlinuz.

So, i failed to find out how the p_memsz were set up initially.
But, i think we did it the way too complicated, IMHO.

The crash_memory_range[] array showed the kernel segment consumed
0x1295000 bytes of memory and we only need to tell the purgatory
code to reserve that amount of memory. The logic in
add_loaded_segments_info() came out with 0x1290000 and caused the
crashkernel to panic on boot.

Hmmm, as i types now, i may not consider the situation where
the crashkernel is not the same as the first kernel...

Note that the underallocation does not _ALWAYS_ happen! It depends
on the vmlinux we build. Honestly i do not understand some part of
the kexec-tools code well enough to make major surgery to the code.
So, i just compare the end address after calculation of i=0 entry
of mem_ehdr array with the start address of the second entry. If it
is too small, i just bring it up to align with the start address of
the second entry. I am happy to allocate one extra page, may not be
needed in some cases, of memory than to panic. Yes, my patch is
a work-around.

If you can find the true cause of the problem and fix it, it
would be great and appreciated!

Regards,
 - jay

> 
>> Signed-off-by: Jay Lan <jlan@sgi.com>
>>
> 
>> ---
>>  kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
>> ===================================================================
>> --- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:24:14.289758063 -0700
>> +++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:29:34.095833316 -0700
>> @@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
>>  			phdr = &ehdr->e_phdr[i];
>>  	                if (phdr->p_type != PT_LOAD)
>>  	                        break;
>> -			if (loaded_segments[loaded_segments_num].end !=
>> -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>> -				break;
>> +			if (loaded_segments[loaded_segments_num].end <
>> +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>> +				loaded_segments[loaded_segments_num].end
>> +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>>  			loaded_segments[loaded_segments_num].end +=
>>  				(phdr->p_memsz + ELF_PAGE_SIZE - 1) &
>>  				~(ELF_PAGE_SIZE - 1);
>>  			i++;
>>  		}
>> -
>>  		loaded_segments_num++;
>>  	}
>>  }
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-04  1:37   ` Jay Lan
@ 2008-09-04 18:28     ` Jay Lan
  2008-09-12 23:27       ` Simon Horman
  2008-09-15  6:58       ` Simon Horman
  0 siblings, 2 replies; 8+ messages in thread
From: Jay Lan @ 2008-09-04 18:28 UTC (permalink / raw)
  To: Simon Horman; +Cc: kexec

Jay Lan wrote:
> Simon Horman wrote:
>> On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
>>> Sometimes the kexec would allocate not enough memory for kdump kernel
>>> itself on IA64 and caused kdump kernel to panic at boot.
>>>
>>> When it happens, the /proc/iomem would show a kernel RAM segment like
>>> this:
>>> 3014000000-3015294fff : System RAM
>>>   3014000000-3014823ccf : Kernel code
>>>   3014823cd0-3014dee8ef : Kernel data
>>>   3014dee8f0-301529448f : Kernel bss
>>> 3015295000-307bffdfff : System RAM
>>>   3018000000-3037ffffff : Crash kernel
>>>
>>> But kexec would allocate memory 3018000000-3019290000 for the kernel,
>>> which is 0x5000 smaller than the regular kernel. In my cases, the
>>> physical_node_map and kern_memmap of the kdump kernel overlaped and
>>> caused data corruption.
>>>
>>> This patch fixes the problem. The patch was generated against
>>> kexec-tools 2.0.0 and tested in 2.6.27-rc4.
>> Hi Jay,
>>
>> I am unclear about why this underallocation occurs.
> 
> Hi Simon,
> 
> The routine add_loaded_segments_info() set up "loaded_segment" array
> that is needed by purgatory code, based on data stored in the
> mem_ehdr array passed in as the second parameter.
> 
> Upon entrance of the routine, the crash_memory_range[] contains
> information about the regular kernel:
> crash_memory_range[ 0]: start=  3000080000, end=  30003fffff
> crash_memory_range[ 1]: start=  3003000000, end=  3005ffffff
> crash_memory_range[ 2]: start=  3006000000, end=  3013ffffff
> crash_memory_range[ 3]: start=  3014000000, end=  3015294fff
> 
> The #3 entry is the kernel memory segment.
> 
> And the mem_ehdr array would contain data as such:

Hi,

It should be mem_phdr, got it from mem_ehdr->e_phdr.

> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4

Does anyone understand how the array were created and why there
was a gap between i=0 and i=1 entries? I think this is the problem
but i do not know how to fix it, so tried to work around it.

The statement my patch replaced was totally broken:
 -			if (loaded_segments[loaded_segments_num].end !=
 -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
 -				break;
 +			if (loaded_segments[loaded_segments_num].end <
 +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
 +				loaded_segments[loaded_segments_num].end
 +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);

My debugging showed that when "loaded_segments[loaded_segments_num].end"
!= "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
and continue to next statement.  However, if i assign both expression
to local variables and do comparison, the 'break' statement is
executed correctly when two values are not the same. Unfortunately,
consequently the kdump kernel would _alawys_ hang.

I believe the intent of the original statement is to ensure there is
no gap between entries of mem_phdr array. But if there is a gap,
kexec should simply exit with failure. The 'break' statement just
created a loaded_segment[] array that broke the kernel memory segment
into multiple entries and resulted in the kdump kernel hang in
find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
some cases today are simply out of luck.

I believe the real fix is to fix the contents of the mem_phdr array.
Since i do not know how to fix it, my patch would close up the
gap where there is the a gap between entries of the mem_phdr array.

Does it make more sense to you now, Simon?

Regards,
 - jay



> 
> The code wants the new loaded_segments contain starting address
> all aligned at page boundary, which is 0x10000 in IA64.
> 
> Note that the p_memsz of mem_ehdr does not match to entries in
> /proc/iomem:
> 3014000000-3015294fff : System RAM
>   3014000000-3014823ccf : Kernel code
>   3014823cd0-3014dee8ef : Kernel data
>   3014dee8f0-301529448f : Kernel bss
> 
> The original code of add_loaded_segments_info() would go through
> the mem_ehdr array and use the p_paddr of the first entry (the
> beginning of the reserved memory) as the start address, add
> the p_memsz of three entries to calculate the end address of
> the kernel segment.
> 
> But the p_paddr of i=0 plus p_memsz of i=0 should result in
> 3018d10000 as the p_paddr of i=1 entry, but actually the
> p_paddr of i=1 is 3018d20000. The logic of that routine
> can not explain the discrepency.
> 
> So, where the data of mem_ehdr array come from?
> 
> add_loaded_segments_info
> <-  load_crashdump_segments
>   <-  elf_ia64_load
>     <- file_type[i].load
>       <- my_load
> 
> The elf_ia64_load set up mem_ehdr, probabaly based on data
> pointed by *buf, which i think comes from vmlinuz.
> 
> So, i failed to find out how the p_memsz were set up initially.
> But, i think we did it the way too complicated, IMHO.
> 
> The crash_memory_range[] array showed the kernel segment consumed
> 0x1295000 bytes of memory and we only need to tell the purgatory
> code to reserve that amount of memory. The logic in
> add_loaded_segments_info() came out with 0x1290000 and caused the
> crashkernel to panic on boot.
> 
> Hmmm, as i types now, i may not consider the situation where
> the crashkernel is not the same as the first kernel...
> 
> Note that the underallocation does not _ALWAYS_ happen! It depends
> on the vmlinux we build. Honestly i do not understand some part of
> the kexec-tools code well enough to make major surgery to the code.
> So, i just compare the end address after calculation of i=0 entry
> of mem_ehdr array with the start address of the second entry. If it
> is too small, i just bring it up to align with the start address of
> the second entry. I am happy to allocate one extra page, may not be
> needed in some cases, of memory than to panic. Yes, my patch is
> a work-around.
> 
> If you can find the true cause of the problem and fix it, it
> would be great and appreciated!
> 
> Regards,
>  - jay
> 
> 
>>> Signed-off-by: Jay Lan <jlan@sgi.com>
>>>
>>> ---
>>>  kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
>>> ===================================================================
>>> --- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:24:14.289758063 -0700
>>> +++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:29:34.095833316 -0700
>>> @@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
>>>  			phdr = &ehdr->e_phdr[i];
>>>  	                if (phdr->p_type != PT_LOAD)
>>>  	                        break;
>>> -			if (loaded_segments[loaded_segments_num].end !=
>>> -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>>> -				break;
>>> +			if (loaded_segments[loaded_segments_num].end <
>>> +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>>> +				loaded_segments[loaded_segments_num].end
>>> +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>>>  			loaded_segments[loaded_segments_num].end +=
>>>  				(phdr->p_memsz + ELF_PAGE_SIZE - 1) &
>>>  				~(ELF_PAGE_SIZE - 1);
>>>  			i++;
>>>  		}
>>> -
>>>  		loaded_segments_num++;
>>>  	}
>>>  }
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-04 18:28     ` Jay Lan
@ 2008-09-12 23:27       ` Simon Horman
  2008-09-13  0:38         ` Jay Lan
  2008-09-15  6:58       ` Simon Horman
  1 sibling, 1 reply; 8+ messages in thread
From: Simon Horman @ 2008-09-12 23:27 UTC (permalink / raw)
  To: Jay Lan; +Cc: kexec

On Thu, Sep 04, 2008 at 11:28:32AM -0700, Jay Lan wrote:
> Jay Lan wrote:
> > Simon Horman wrote:
> >> On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
> >>> Sometimes the kexec would allocate not enough memory for kdump kernel
> >>> itself on IA64 and caused kdump kernel to panic at boot.
> >>>
> >>> When it happens, the /proc/iomem would show a kernel RAM segment like
> >>> this:
> >>> 3014000000-3015294fff : System RAM
> >>>   3014000000-3014823ccf : Kernel code
> >>>   3014823cd0-3014dee8ef : Kernel data
> >>>   3014dee8f0-301529448f : Kernel bss
> >>> 3015295000-307bffdfff : System RAM
> >>>   3018000000-3037ffffff : Crash kernel
> >>>
> >>> But kexec would allocate memory 3018000000-3019290000 for the kernel,
> >>> which is 0x5000 smaller than the regular kernel. In my cases, the
> >>> physical_node_map and kern_memmap of the kdump kernel overlaped and
> >>> caused data corruption.
> >>>
> >>> This patch fixes the problem. The patch was generated against
> >>> kexec-tools 2.0.0 and tested in 2.6.27-rc4.
> >> Hi Jay,
> >>
> >> I am unclear about why this underallocation occurs.
> > 
> > Hi Simon,
> > 
> > The routine add_loaded_segments_info() set up "loaded_segment" array
> > that is needed by purgatory code, based on data stored in the
> > mem_ehdr array passed in as the second parameter.
> > 
> > Upon entrance of the routine, the crash_memory_range[] contains
> > information about the regular kernel:
> > crash_memory_range[ 0]: start=  3000080000, end=  30003fffff
> > crash_memory_range[ 1]: start=  3003000000, end=  3005ffffff
> > crash_memory_range[ 2]: start=  3006000000, end=  3013ffffff
> > crash_memory_range[ 3]: start=  3014000000, end=  3015294fff
> > 
> > The #3 entry is the kernel memory segment.
> > 
> > And the mem_ehdr array would contain data as such:
> 
> Hi,
> 
> It should be mem_phdr, got it from mem_ehdr->e_phdr.
> 
> > i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> > i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> > i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> > i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
> 
> Does anyone understand how the array were created and why there
> was a gap between i=0 and i=1 entries? I think this is the problem
> but i do not know how to fix it, so tried to work around it.
> 
> The statement my patch replaced was totally broken:
>  -			if (loaded_segments[loaded_segments_num].end !=
>  -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>  -				break;
>  +			if (loaded_segments[loaded_segments_num].end <
>  +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>  +				loaded_segments[loaded_segments_num].end
>  +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
> 
> My debugging showed that when "loaded_segments[loaded_segments_num].end"
> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
> and continue to next statement.  However, if i assign both expression
> to local variables and do comparison, the 'break' statement is
> executed correctly when two values are not the same. Unfortunately,
> consequently the kdump kernel would _alawys_ hang.
> 
> I believe the intent of the original statement is to ensure there is
> no gap between entries of mem_phdr array. But if there is a gap,
> kexec should simply exit with failure. The 'break' statement just
> created a loaded_segment[] array that broke the kernel memory segment
> into multiple entries and resulted in the kdump kernel hang in
> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
> some cases today are simply out of luck.
> 
> I believe the real fix is to fix the contents of the mem_phdr array.
> Since i do not know how to fix it, my patch would close up the
> gap where there is the a gap between entries of the mem_phdr array.
> 
> Does it make more sense to you now, Simon?

Hi Jay,

yes that does make sense. I'd like to poke around and see
if mem_phdr can be fixed.

> 
> Regards,
>  - jay
> 
> 
> 
> > 
> > The code wants the new loaded_segments contain starting address
> > all aligned at page boundary, which is 0x10000 in IA64.
> > 
> > Note that the p_memsz of mem_ehdr does not match to entries in
> > /proc/iomem:
> > 3014000000-3015294fff : System RAM
> >   3014000000-3014823ccf : Kernel code
> >   3014823cd0-3014dee8ef : Kernel data
> >   3014dee8f0-301529448f : Kernel bss
> > 
> > The original code of add_loaded_segments_info() would go through
> > the mem_ehdr array and use the p_paddr of the first entry (the
> > beginning of the reserved memory) as the start address, add
> > the p_memsz of three entries to calculate the end address of
> > the kernel segment.
> > 
> > But the p_paddr of i=0 plus p_memsz of i=0 should result in
> > 3018d10000 as the p_paddr of i=1 entry, but actually the
> > p_paddr of i=1 is 3018d20000. The logic of that routine
> > can not explain the discrepency.
> > 
> > So, where the data of mem_ehdr array come from?
> > 
> > add_loaded_segments_info
> > <-  load_crashdump_segments
> >   <-  elf_ia64_load
> >     <- file_type[i].load
> >       <- my_load
> > 
> > The elf_ia64_load set up mem_ehdr, probabaly based on data
> > pointed by *buf, which i think comes from vmlinuz.
> > 
> > So, i failed to find out how the p_memsz were set up initially.
> > But, i think we did it the way too complicated, IMHO.
> > 
> > The crash_memory_range[] array showed the kernel segment consumed
> > 0x1295000 bytes of memory and we only need to tell the purgatory
> > code to reserve that amount of memory. The logic in
> > add_loaded_segments_info() came out with 0x1290000 and caused the
> > crashkernel to panic on boot.
> > 
> > Hmmm, as i types now, i may not consider the situation where
> > the crashkernel is not the same as the first kernel...
> > 
> > Note that the underallocation does not _ALWAYS_ happen! It depends
> > on the vmlinux we build. Honestly i do not understand some part of
> > the kexec-tools code well enough to make major surgery to the code.
> > So, i just compare the end address after calculation of i=0 entry
> > of mem_ehdr array with the start address of the second entry. If it
> > is too small, i just bring it up to align with the start address of
> > the second entry. I am happy to allocate one extra page, may not be
> > needed in some cases, of memory than to panic. Yes, my patch is
> > a work-around.
> > 
> > If you can find the true cause of the problem and fix it, it
> > would be great and appreciated!
> > 
> > Regards,
> >  - jay
> > 
> > 
> >>> Signed-off-by: Jay Lan <jlan@sgi.com>
> >>>
> >>> ---
> >>>  kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
> >>> ===================================================================
> >>> --- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:24:14.289758063 -0700
> >>> +++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c	2008-09-03 11:29:34.095833316 -0700
> >>> @@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
> >>>  			phdr = &ehdr->e_phdr[i];
> >>>  	                if (phdr->p_type != PT_LOAD)
> >>>  	                        break;
> >>> -			if (loaded_segments[loaded_segments_num].end !=
> >>> -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> >>> -				break;
> >>> +			if (loaded_segments[loaded_segments_num].end <
> >>> +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
> >>> +				loaded_segments[loaded_segments_num].end
> >>> +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
> >>>  			loaded_segments[loaded_segments_num].end +=
> >>>  				(phdr->p_memsz + ELF_PAGE_SIZE - 1) &
> >>>  				~(ELF_PAGE_SIZE - 1);
> >>>  			i++;
> >>>  		}
> >>> -
> >>>  		loaded_segments_num++;
> >>>  	}
> >>>  }
> > 
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-12 23:27       ` Simon Horman
@ 2008-09-13  0:38         ` Jay Lan
  2008-09-15  5:47           ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Jay Lan @ 2008-09-13  0:38 UTC (permalink / raw)
  To: Simon Horman; +Cc: kexec

Simon Horman wrote:
>> Hi,
>>
>> It should be mem_phdr, got it from mem_ehdr->e_phdr.
>>
>>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
>>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
>>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
>>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
>> Does anyone understand how the array were created and why there
>> was a gap between i=0 and i=1 entries? I think this is the problem
>> but i do not know how to fix it, so tried to work around it.
>>
>> The statement my patch replaced was totally broken:
>>  -			if (loaded_segments[loaded_segments_num].end !=
>>  -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>>  -				break;
>>  +			if (loaded_segments[loaded_segments_num].end <
>>  +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>>  +				loaded_segments[loaded_segments_num].end
>>  +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>>
>> My debugging showed that when "loaded_segments[loaded_segments_num].end"
>> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
>> and continue to next statement.  However, if i assign both expression
>> to local variables and do comparison, the 'break' statement is
>> executed correctly when two values are not the same. Unfortunately,
>> consequently the kdump kernel would _alawys_ hang.
>>
>> I believe the intent of the original statement is to ensure there is
>> no gap between entries of mem_phdr array. But if there is a gap,
>> kexec should simply exit with failure. The 'break' statement just
>> created a loaded_segment[] array that broke the kernel memory segment
>> into multiple entries and resulted in the kdump kernel hang in
>> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
>> some cases today are simply out of luck.
>>
>> I believe the real fix is to fix the contents of the mem_phdr array.
>> Since i do not know how to fix it, my patch would close up the
>> gap where there is the a gap between entries of the mem_phdr array.
>>
>> Does it make more sense to you now, Simon?
> 
> Hi Jay,
> 
> yes that does make sense. I'd like to poke around and see
> if mem_phdr can be fixed.

I think the whole ehdr is read from the kernel binary in
slurp_decompress_file.

Bernhard reported a kdump kernel boot problem caused by a patch
regarding per-cpu variables access in early boot code:
http://article.gmane.org/gmane.linux.ports.ia64/19380

I backed out the offending patch and i was no longer able to
reproduce this problem.

So, it is safe to say the problem was due to how we process
data from the vmlinuz.

The code i tried to change:
-             if (loaded_segments[loaded_segments_num].end !=
-                     phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
-                     break;
has two problems:
1) '!=' operation takes precedence over '&'. If the code is to
   do what it intends to do, the statement should be:
              if (loaded_segments[loaded_segments_num].end !=
                     (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
                     break;
2) When the 'break' is really executed, you breaks the kernel
   segment into multiple segments.

The code needs fix even if the problem i saw was a result of
a bug in the kernel.

Thanks,
jay



> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-13  0:38         ` Jay Lan
@ 2008-09-15  5:47           ` Simon Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2008-09-15  5:47 UTC (permalink / raw)
  To: Jay Lan; +Cc: kexec

On Fri, Sep 12, 2008 at 05:38:09PM -0700, Jay Lan wrote:
> Simon Horman wrote:
> >> Hi,
> >>
> >> It should be mem_phdr, got it from mem_ehdr->e_phdr.
> >>
> >>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> >>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> >>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> >>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
> >> Does anyone understand how the array were created and why there
> >> was a gap between i=0 and i=1 entries? I think this is the problem
> >> but i do not know how to fix it, so tried to work around it.
> >>
> >> The statement my patch replaced was totally broken:
> >>  -			if (loaded_segments[loaded_segments_num].end !=
> >>  -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> >>  -				break;
> >>  +			if (loaded_segments[loaded_segments_num].end <
> >>  +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
> >>  +				loaded_segments[loaded_segments_num].end
> >>  +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
> >>
> >> My debugging showed that when "loaded_segments[loaded_segments_num].end"
> >> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
> >> and continue to next statement.  However, if i assign both expression
> >> to local variables and do comparison, the 'break' statement is
> >> executed correctly when two values are not the same. Unfortunately,
> >> consequently the kdump kernel would _alawys_ hang.
> >>
> >> I believe the intent of the original statement is to ensure there is
> >> no gap between entries of mem_phdr array. But if there is a gap,
> >> kexec should simply exit with failure. The 'break' statement just
> >> created a loaded_segment[] array that broke the kernel memory segment
> >> into multiple entries and resulted in the kdump kernel hang in
> >> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
> >> some cases today are simply out of luck.
> >>
> >> I believe the real fix is to fix the contents of the mem_phdr array.
> >> Since i do not know how to fix it, my patch would close up the
> >> gap where there is the a gap between entries of the mem_phdr array.
> >>
> >> Does it make more sense to you now, Simon?
> > 
> > Hi Jay,
> > 
> > yes that does make sense. I'd like to poke around and see
> > if mem_phdr can be fixed.
> 
> I think the whole ehdr is read from the kernel binary in
> slurp_decompress_file.
> 
> Bernhard reported a kdump kernel boot problem caused by a patch
> regarding per-cpu variables access in early boot code:
> http://article.gmane.org/gmane.linux.ports.ia64/19380
> 
> I backed out the offending patch and i was no longer able to
> reproduce this problem.
> 
> So, it is safe to say the problem was due to how we process
> data from the vmlinuz.
> 
> The code i tried to change:
> -             if (loaded_segments[loaded_segments_num].end !=
> -                     phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> -                     break;
> has two problems:
> 1) '!=' operation takes precedence over '&'. If the code is to
>    do what it intends to do, the statement should be:
>               if (loaded_segments[loaded_segments_num].end !=
>                      (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>                      break;

Good point. That is definately a problem.

> 2) When the 'break' is really executed, you breaks the kernel
>    segment into multiple segments.

I also agree that is a problem. I just wonder what is the
best thing to do about it.

> The code needs fix even if the problem i saw was a result of
> a bug in the kernel.

Agreed.

For now can you make a patch that just fixes precedence logic bug?
Lets merge that and then tackle the more complex broken segment issue.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] IA64: kexec allocates too few memory for kdump kernel itself
  2008-09-04 18:28     ` Jay Lan
  2008-09-12 23:27       ` Simon Horman
@ 2008-09-15  6:58       ` Simon Horman
  1 sibling, 0 replies; 8+ messages in thread
From: Simon Horman @ 2008-09-15  6:58 UTC (permalink / raw)
  To: Jay Lan; +Cc: kexec

On Thu, Sep 04, 2008 at 11:28:32AM -0700, Jay Lan wrote:
> Jay Lan wrote:
> > Simon Horman wrote:
> >> On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
> >>> Sometimes the kexec would allocate not enough memory for kdump kernel
> >>> itself on IA64 and caused kdump kernel to panic at boot.
> >>>
> >>> When it happens, the /proc/iomem would show a kernel RAM segment like
> >>> this:
> >>> 3014000000-3015294fff : System RAM
> >>>   3014000000-3014823ccf : Kernel code
> >>>   3014823cd0-3014dee8ef : Kernel data
> >>>   3014dee8f0-301529448f : Kernel bss
> >>> 3015295000-307bffdfff : System RAM
> >>>   3018000000-3037ffffff : Crash kernel
> >>>
> >>> But kexec would allocate memory 3018000000-3019290000 for the kernel,
> >>> which is 0x5000 smaller than the regular kernel. In my cases, the
> >>> physical_node_map and kern_memmap of the kdump kernel overlaped and
> >>> caused data corruption.
> >>>
> >>> This patch fixes the problem. The patch was generated against
> >>> kexec-tools 2.0.0 and tested in 2.6.27-rc4.
> >> Hi Jay,
> >>
> >> I am unclear about why this underallocation occurs.
> > 
> > Hi Simon,
> > 
> > The routine add_loaded_segments_info() set up "loaded_segment" array
> > that is needed by purgatory code, based on data stored in the
> > mem_ehdr array passed in as the second parameter.
> > 
> > Upon entrance of the routine, the crash_memory_range[] contains
> > information about the regular kernel:
> > crash_memory_range[ 0]: start=  3000080000, end=  30003fffff
> > crash_memory_range[ 1]: start=  3003000000, end=  3005ffffff
> > crash_memory_range[ 2]: start=  3006000000, end=  3013ffffff
> > crash_memory_range[ 3]: start=  3014000000, end=  3015294fff
> > 
> > The #3 entry is the kernel memory segment.
> > 
> > And the mem_ehdr array would contain data as such:
> 
> Hi,
> 
> It should be mem_phdr, got it from mem_ehdr->e_phdr.
> 
> > i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> > i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> > i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> > i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
> 
> Does anyone understand how the array were created and why there
> was a gap between i=0 and i=1 entries? I think this is the problem
> but i do not know how to fix it, so tried to work around it.
> 
> The statement my patch replaced was totally broken:
>  -			if (loaded_segments[loaded_segments_num].end !=
>  -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>  -				break;
>  +			if (loaded_segments[loaded_segments_num].end <
>  +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>  +				loaded_segments[loaded_segments_num].end
>  +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
> 
> My debugging showed that when "loaded_segments[loaded_segments_num].end"
> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
> and continue to next statement.  However, if i assign both expression
> to local variables and do comparison, the 'break' statement is
> executed correctly when two values are not the same. Unfortunately,
> consequently the kdump kernel would _alawys_ hang.
> 
> I believe the intent of the original statement is to ensure there is
> no gap between entries of mem_phdr array. But if there is a gap,
> kexec should simply exit with failure. The 'break' statement just
> created a loaded_segment[] array that broke the kernel memory segment
> into multiple entries and resulted in the kdump kernel hang in
> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
> some cases today are simply out of luck.
> 
> I believe the real fix is to fix the contents of the mem_phdr array.
> Since i do not know how to fix it, my patch would close up the
> gap where there is the a gap between entries of the mem_phdr array.
> 
> Does it make more sense to you now, Simon?
> 
> Regards,
>  - jay
> 
> 
> 
> > 
> > The code wants the new loaded_segments contain starting address
> > all aligned at page boundary, which is 0x10000 in IA64.
> > 
> > Note that the p_memsz of mem_ehdr does not match to entries in
> > /proc/iomem:
> > 3014000000-3015294fff : System RAM
> >   3014000000-3014823ccf : Kernel code
> >   3014823cd0-3014dee8ef : Kernel data
> >   3014dee8f0-301529448f : Kernel bss
> > 
> > The original code of add_loaded_segments_info() would go through
> > the mem_ehdr array and use the p_paddr of the first entry (the
> > beginning of the reserved memory) as the start address, add
> > the p_memsz of three entries to calculate the end address of
> > the kernel segment.
> > 
> > But the p_paddr of i=0 plus p_memsz of i=0 should result in
> > 3018d10000 as the p_paddr of i=1 entry, but actually the
> > p_paddr of i=1 is 3018d20000. The logic of that routine
> > can not explain the discrepency.
> > 
> > So, where the data of mem_ehdr array come from?
> > 
> > add_loaded_segments_info
> > <-  load_crashdump_segments
> >   <-  elf_ia64_load
> >     <- file_type[i].load
> >       <- my_load
> > 
> > The elf_ia64_load set up mem_ehdr, probabaly based on data
> > pointed by *buf, which i think comes from vmlinuz.

Yes, I believe that is the case too.

> > So, i failed to find out how the p_memsz were set up initially.
> > But, i think we did it the way too complicated, IMHO.


I think that the relevant code path is:

build_mem_elf64_phdr()
called by: build_mem_phdrs()
called by: build_elf_info()
called by: build_elf_exec_info()
called by: elf_ia64_load(), before calling load_crashdump_segments()

As the PT_LOAD segments in mem_ehdr should correlate with data
read from vmlinux, perhaps you can see something interesting by running

readelf -l vmlinux

> > The crash_memory_range[] array showed the kernel segment consumed
> > 0x1295000 bytes of memory and we only need to tell the purgatory
> > code to reserve that amount of memory. The logic in
> > add_loaded_segments_info() came out with 0x1290000 and caused the
> > crashkernel to panic on boot.
> > 
> > Hmmm, as i types now, i may not consider the situation where
> > the crashkernel is not the same as the first kernel...
> > 
> > Note that the underallocation does not _ALWAYS_ happen! It depends
> > on the vmlinux we build. Honestly i do not understand some part of
> > the kexec-tools code well enough to make major surgery to the code.
> > So, i just compare the end address after calculation of i=0 entry
> > of mem_ehdr array with the start address of the second entry. If it
> > is too small, i just bring it up to align with the start address of
> > the second entry. I am happy to allocate one extra page, may not be
> > needed in some cases, of memory than to panic. Yes, my patch is
> > a work-around.
> > 
> > If you can find the true cause of the problem and fix it, it
> > would be great and appreciated!

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-09-15  6:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-03 21:01 [PATCH] IA64: kexec allocates too few memory for kdump kernel itself Jay Lan
2008-09-04  0:01 ` Simon Horman
2008-09-04  1:37   ` Jay Lan
2008-09-04 18:28     ` Jay Lan
2008-09-12 23:27       ` Simon Horman
2008-09-13  0:38         ` Jay Lan
2008-09-15  5:47           ` Simon Horman
2008-09-15  6:58       ` Simon Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.