public inbox for linux-arm-kernel@lists.infradead.org
* Decompression failure in an inexplicable case
@ 2010-04-29 15:27 Michael Cashwell
  2010-04-29 15:40 ` Catalin Marinas
  2010-04-29 16:34 ` Catalin Marinas
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Cashwell @ 2010-04-29 15:27 UTC (permalink / raw)
  To: linux-arm-kernel

Greetings,

I'm working on a custom 2.6.33.2 port to Gumstix Verdex XL6P (PXA270) that is exhibiting odd behavior.

If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel builds and runs seemingly fine.

But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support, then the kernel builds OK (including full mrproper cleanings in between):
...
  LD      vmlinux
  SYSMAP  System.map
  SYSMAP  .tmp_System.map
  OBJCOPY arch/arm/boot/Image
  Kernel: arch/arm/boot/Image is ready
  AS      arch/arm/boot/compressed/head.o
  GZIP    arch/arm/boot/compressed/piggy.gzip
  CC      arch/arm/boot/compressed/misc.o
  AS      arch/arm/boot/compressed/head-xscale.o
  SHIPPED arch/arm/boot/compressed/lib1funcs.S
  AS      arch/arm/boot/compressed/lib1funcs.o
  AS      arch/arm/boot/compressed/piggy.gzip.o
  LD      arch/arm/boot/compressed/vmlinux
  OBJCOPY arch/arm/boot/zImage
  Kernel: arch/arm/boot/zImage is ready
#### Exporting linux-2.6.33.2-arm-gum to netboot.
Image Name:   2.6.33.2-gum1
Created:      Wed Apr 28 16:12:14 2010
Image Type:   ARM Linux Kernel Image (uncompressed)
Data Size:    1736780 Bytes = 1696.07 kB = 1.66 MB
Load Address: 0xA1000000
Entry Point:  0xA1000000

but fails to run:

TFTP of 'GUM1/boot/uImage' from server 10.18.1.11; our IP address is 10.18.17.1 to address 0xa2000000
Loading: #################################################################
         ######################################################
Bytes transferred = 1736844 (1a808c hex)
## Booting image at a2000000 ...
   Image Name:   2.6.33.2-gum1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1736780 Bytes =  1.7 MB
   Load Address: a1000000
   Entry Point:  a1000000
   Verifying Checksum ... OK

Starting kernel at address a1000000 ...

Uncompressing Linux...

uncompression error

 -- System halted

As noted on another thread, I'm working on cpufreq-pxa2xx.c but I can't see how anything done there could directly cause the decompression to fail. Surely decompression is way too early for any code in cpu-freq to be called. (Yes?)

My hunch is that some subtle image/map linker issue is causing the non-cpufreq-debug case to lay out the image in just such a way that the decompressor gets confused, but the cause and effect seem so random that I'm not sure where to look.
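One concrete layout failure worth ruling out is the decompressed kernel overlapping the zImage (or the decompressor's stack and heap) while it is still being read. Below is a minimal sketch of that address arithmetic. It assumes the usual PXA decompression address of 0xA0008000 and uses a made-up uncompressed Image size. Only the zImage load address (0xA1000000) and size (1736780 bytes) come from the U-Boot output above. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Half-open physical address range [start, start + len). */
struct addr_range {
    uint32_t start;
    uint32_t len;
};

/* Nonzero if the two ranges share at least one byte. End addresses are
 * computed in 64 bits so a range near the top of 32-bit space cannot wrap. */
int ranges_overlap(struct addr_range a, struct addr_range b)
{
    uint64_t a_end = (uint64_t)a.start + a.len;
    uint64_t b_end = (uint64_t)b.start + b.len;

    if (a.len == 0 || b.len == 0)
        return 0;
    return a.start < b_end && b.start < a_end;
}
```

With these addresses, the output would have to grow to just under 16 MiB (0xA1000000 - 0xA0008000 bytes) before it reached the zImage. That makes this particular overlap look unlikely here. The same check applies to any other region the decompressor uses.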

I've seen this sort of thing before on boards with flaky SDRAM, but again, this is an unmodified commercial Gumstix board (in fact, several of them), so that seems unlikely.

I can leave cpufreq debugging compiled in and not pass a "cpufreq.debug=" kernel arg but I don't want to sweep this under the rug.

Anyone have any ideas? Is there any way to get more info from the decompressor or tests I can conduct against the zImage to determine what's wrong?
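One offline test needs nothing from the target. A gzip member ends with an 8-byte trailer: the CRC-32 of the uncompressed data, then its length mod 2^32, both little-endian. That trailer in arch/arm/boot/compressed/piggy.gzip can therefore be checked against arch/arm/boot/Image. piggy.gzip can also be searched for, byte for byte, inside zImage. Below is a sketch of the helpers, assuming piggy.gzip is a single gzip member embedded unmodified in zImage (as the piggy.gzip.o build step suggests). The function names are hypothetical, and file reading is left out. If both checks pass, the payload itself is intact, which would point at the decompressor or its runtime environment instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip
 * stores in its trailer. Table-free and slow, but fine for a one-off. */
uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read a little-endian 32-bit value (gzip trailer byte order). */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the gzip data starts with the gzip magic and its trailer
 * (CRC-32, then ISIZE) matches the given uncompressed bytes. */
int gzip_trailer_matches(const unsigned char *gz, size_t gz_len,
                         const unsigned char *raw, size_t raw_len)
{
    if (gz_len < 18 || gz[0] != 0x1f || gz[1] != 0x8b)
        return 0;   /* 18 bytes = smallest possible gzip member */
    const unsigned char *trailer = gz + gz_len - 8;
    return read_le32(trailer) == crc32_ieee(raw, raw_len) &&
           read_le32(trailer + 4) == (uint32_t)raw_len;
}

/* Offset of needle within hay, or -1 if absent (naive search, adequate
 * for a ~2 MB zImage). */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}
```

In use, Image would be passed as `raw` and piggy.gzip as `gz`. Then piggy.gzip is located inside zImage with `find_bytes`.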

-Mike


* Decompression failure in an inexplicable case
  2010-04-29 15:27 Decompression failure in an inexplicable case Michael Cashwell
@ 2010-04-29 15:40 ` Catalin Marinas
  2010-04-29 16:09   ` Albin Tonnerre
  2010-04-29 17:16   ` Michael Cashwell
  2010-04-29 16:34 ` Catalin Marinas
  1 sibling, 2 replies; 8+ messages in thread
From: Catalin Marinas @ 2010-04-29 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both
> it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel
> builds and runs seemingly fine.
> 
> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support
> then kernel builds OK (including full mrproper cleanings in between):

I had some issues a few weeks ago with the decompressing:

http://thread.gmane.org/gmane.linux.ports.arm.kernel/73476

That seemed to have to do with the size of the Image file and randomly removing
parts of it made it work. Unfortunately, I couldn't reproduce it so that
others can try.

-- 
Catalin


* Decompression failure in an inexplicable case
  2010-04-29 15:40 ` Catalin Marinas
@ 2010-04-29 16:09   ` Albin Tonnerre
  2010-04-29 17:16   ` Michael Cashwell
  1 sibling, 0 replies; 8+ messages in thread
From: Albin Tonnerre @ 2010-04-29 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 29, 2010 at 5:40 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
>> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both
>> it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel
>> builds and runs seemingly fine.
>>
>> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support
>> then kernel builds OK (including full mrproper cleanings in between):
>
> I had some issues a few weeks ago with the decompressing:
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/73476
>
> That seemed to do with the size of the Image file and randomly removing
> parts of it made it work. Unfortunately, I couldn't reproduce it so that
> others can try.

You might want to try using LZO instead of gzip, that one should work correctly.

Cheers,
Albin


* Decompression failure in an inexplicable case
  2010-04-29 15:27 Decompression failure in an inexplicable case Michael Cashwell
  2010-04-29 15:40 ` Catalin Marinas
@ 2010-04-29 16:34 ` Catalin Marinas
  2010-04-29 16:39   ` Russell King - ARM Linux
  2010-04-29 18:59   ` Michael Cashwell
  1 sibling, 2 replies; 8+ messages in thread
From: Catalin Marinas @ 2010-04-29 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both
> it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel
> builds and runs seemingly fine.
> 
> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support
> then kernel builds OK (including full mrproper cleanings in between):

BTW, one of the suggestions at the time was to try this:

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 3dc2cf8..e12e382 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -316,7 +316,7 @@ LC0:		.word	LC0			@ r1
 		.word	_start			@ r5
 		.word	_got_start		@ r6
 		.word	_got_end		@ ip
-		.word	user_stack+4096		@ sp
+		.word	user_stack+8192		@ sp
 LC1:		.word	reloc_end - reloc_start
 		.size	LC0, . - LC0
 
@@ -1086,4 +1086,4 @@ reloc_end:
 
 		.align
 		.section ".stack", "w"
-user_stack:	.space	4096
+user_stack:	.space	8192
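Rather than only doubling the stack, one way to see whether 4096 bytes is really too small is "stack painting". The region is filled with a known byte before the decompressor runs, and afterwards one counts how much of the fill survives. Because sp starts at user_stack + size and the ARM stack grows downward, untouched paint remains at the low end. The host-side sketch below models only the idea. The fill value and function names are hypothetical. Doing this on the board would mean filling user_stack early in head.S and finding a way to read it back afterwards.

```c
#include <stddef.h>
#include <string.h>

#define STACK_PAINT 0xA5u   /* hypothetical fill byte */

/* Fill the whole stack region with the known pattern before use. */
void paint_stack(unsigned char *stack, size_t size)
{
    memset(stack, STACK_PAINT, size);
}

/* High-water mark: bytes ever used = size - untouched paint at the low end.
 * A used byte that happens to equal STACK_PAINT right at the boundary makes
 * this slightly underestimate, which is acceptable for a rough check. */
size_t stack_high_water(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_PAINT)
        untouched++;
    return size - untouched;
}
```

A high-water mark equal to the full 4096 bytes would suggest the stack overflowed. A value well below it would argue against the stack-size theory.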


-- 
Catalin


* Decompression failure in an inexplicable case
  2010-04-29 16:34 ` Catalin Marinas
@ 2010-04-29 16:39   ` Russell King - ARM Linux
  2010-04-29 17:29     ` Michael Cashwell
  2010-04-29 18:59   ` Michael Cashwell
  1 sibling, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2010-04-29 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 29, 2010 at 05:34:46PM +0100, Catalin Marinas wrote:
> On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
> > If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both
> > it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel
> > builds and runs seemingly fine.
> > 
> > But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support
> > then kernel builds OK (including full mrproper cleanings in between):
> 
> BTW, one of the suggestions at the time was to try this:

Another thing is to ask which kernel version.  There are a number of fixes
which recently went into both -stable and -rc to fix problems with
the decompressor.


* Decompression failure in an inexplicable case
  2010-04-29 15:40 ` Catalin Marinas
  2010-04-29 16:09   ` Albin Tonnerre
@ 2010-04-29 17:16   ` Michael Cashwell
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Cashwell @ 2010-04-29 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Apr 29, 2010, at 11:40 AM, Catalin Marinas wrote:

> On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
>> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel builds and runs seemingly fine.
>> 
>> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support then kernel builds OK (including full mrproper cleanings in between):
> 
> I had some issues a few weeks ago with the decompressing:
> 
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/73476
> 
> That seemed to do with the size of the Image file and randomly removing parts of it made it work. Unfortunately, I couldn't reproduce it so that others can try.

OK, thank you!  (I was wondering what happened to the ... dots during decompression.)

It was the sense that random size/layout changes were involved that worried me. Contrary to that thread, I'm seeing this with a kernel of about 1.66MB.

-rwxr-xr-x 1 cashwell cashwell  1813956 2010-04-29 12:45 arch/arm/boot/compressed/vmlinux
-rw-rw-r-- 1 cashwell cashwell  1736932 2010-04-29 12:45 arch/arm/boot/uImage
-rwxr-xr-x 1 cashwell cashwell  1736868 2010-04-29 12:45 arch/arm/boot/zImage
-rwxr-xr-x 1 cashwell cashwell 36840120 2010-04-29 12:31 vmlinux

And for me the *smaller* kernel (without cpufreq debugging) fails.

I tried applying Uwe's patches from 2010-02-03 09:41:07 GMT and got this during the build:
arch/arm/boot/compressed/misc.c: In function 'decompress_kernel':
arch/arm/boot/compressed/misc.c:308: warning: passing argument 4 of 'gunzip' from incompatible pointer type
arch/arm/boot/compressed/misc.c:297: warning: unused variable 'tmp'

I fixed those (deleted the tmp variable, changed the first argument of the flush function from char * to void *, and defined buff by casting that argument).
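For reference, as far as I can tell from the 2.6.33 sources, gunzip() in lib/decompress_inflate.c takes its flush callback (argument 4) as `int (*)(void *, unsigned int)`. A callback declared with a `char *` first parameter therefore triggers exactly the warning above. Below is a minimal sketch of a callback with the matching type. `output_data`, `output_pos` and the bounds check are hypothetical stand-ins, not the code from Uwe's patch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of gunzip()'s flush argument. */
typedef int (*flush_fn)(void *, unsigned int);

/* Hypothetical output buffer standing in for the real decompression
 * destination, which the boot code supplies. */
static unsigned char output_data[256];
static size_t output_pos;

/* Declared with void * so its type matches flush_fn exactly (this is what
 * silences "incompatible pointer type"); the byte pointer is derived inside. */
static int decompress_flush(void *buf, unsigned int len)
{
    const unsigned char *buff = buf;

    if (len > sizeof(output_data) - output_pos)
        return 0;           /* short write; assumed to be treated as an error */
    memcpy(output_data + output_pos, buff, len);
    output_pos += len;
    return (int)len;        /* number of bytes accepted */
}
```

Returning the byte count assumes the caller compares it with `len` to detect write errors.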

That built but I still get:

Bytes transferred = 1736932 (1a80e4 hex)
## Booting image at a2000000 ...
   Image Name:   2.6.33.2-gum1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1736868 Bytes =  1.7 MB
   Load Address: a1000000
   Entry Point:  a1000000
   Verifying Checksum ... OK

Starting kernel at address a1000000 ...

Uncompressing Linux..............................................................................................................

uncompression error

 -- System halted

I've also defined DEBUG in arch/arm/boot/compressed/head.S, as Uwe also advised. The output and result were unchanged, but the dots progressed much more slowly.

Lastly, I tried LZO compression instead, as the thread also suggested. After a quick "sudo yum install lzop" it built. Interestingly, the image is about 8% larger than with GZIP (1883256 vs. 1736868 bytes), but it seems to work better:

TFTP of 'cashwell/netboot/GUM1/boot/uImage' from server 10.18.1.11; our IP address is 10.18.17.1 to address 0xa2000000
Loading: #################################################################
         ################################################################
Bytes transferred = 1883320 (1cbcb8 hex)
## Booting image at a2000000 ...
   Image Name:   2.6.33.2-gum1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1883256 Bytes =  1.8 MB
   Load Address: a1000000
   Entry Point:  a1000000
   Verifying Checksum ... OK

Starting kernel at address a1000000 ...

Uncompressing Linux................. done, booting the kernel.
Linux version 2.6.33.2 (cashwell at mec-fedora12.argon.local) (gcc version 4.3.4 (GCC) ) #1 PREEMPT Thu Apr 29 12:31:38 EDT 2010
CPU: XScale-PXA270 [69054117] revision 7 (ARMv5TE), cr=0000397f
CPU: VIVT data cache, VIVT instruction cache
Machine: Gumstix Verdex PRO
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 32768
free_area_init_node: node 0, pgdat c0358ff0, node_mem_map c0376000
  Normal zone: 256 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 32512 pages, LIFO batch:7
CCSR: 30000310, MDREFR: 201fc031
...

Fewer dots is interesting, but at least this lets me chalk the problem up to the GZIP code being broken somehow.

LZO seems to be the winner.

Thanks!
-Mike


* Decompression failure in an inexplicable case
  2010-04-29 16:39   ` Russell King - ARM Linux
@ 2010-04-29 17:29     ` Michael Cashwell
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Cashwell @ 2010-04-29 17:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Apr 29, 2010, at 12:39 PM, Russell King - ARM Linux wrote:

> On Thu, Apr 29, 2010 at 05:34:46PM +0100, Catalin Marinas wrote:
>> On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
>>> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both
>>> it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel
>>> builds and runs seemingly fine.
>>> 
>>> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support
>>> then kernel builds OK (including full mrproper cleanings in between):
>> 
>> BTW, one of the suggestions at the time was to try this:
> 
> Another thing is to ask which kernel version.  There's a number of fixes which recently went in to both -stable and -rc to fix problems with the decompressor.

From first post (sorry to hear about your Internet connectivity woes, Russell!):

> I'm working on a custom 2.6.33.2 port to Gumstix Verdex XL6P (PXA270) ...

The custom part is vanilla board support (missing stuff we don't care about like frame buffers and Bluetooth).

That's why I was so rattled by the decompression problems. I haven't been working anywhere near that.

I've not tried GZIP with the 4K -> 8K stack space Catalin mentioned.

Since I seem to have this Heisenbug somewhat cornered and reproducible, I'm happy to try that and report the results if that would be helpful.

-Mike


* Decompression failure in an inexplicable case
  2010-04-29 16:34 ` Catalin Marinas
  2010-04-29 16:39   ` Russell King - ARM Linux
@ 2010-04-29 18:59   ` Michael Cashwell
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Cashwell @ 2010-04-29 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Apr 29, 2010, at 12:34 PM, Catalin Marinas wrote:

> On Thu, 2010-04-29 at 16:27 +0100, Michael Cashwell wrote:
>> If I omit CPUFreq support [CONFIG_CPU_FREQ is not set], or enable both it and its debugging support [CONFIG_CPU_FREQ_DEBUG=y], the kernel builds and runs seemingly fine.
>> 
>> But if I have CONFIG_CPU_FREQ enabled but NOT its debugging support then kernel builds OK (including full mrproper cleanings in between):
> 
> BTW, one of the suggestions at the time was to try this:
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index 3dc2cf8..e12e382 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -316,7 +316,7 @@ LC0:		.word	LC0			@ r1
> 		.word	_start			@ r5
> 		.word	_got_start		@ r6
> 		.word	_got_end		@ ip
> -		.word	user_stack+4096		@ sp
> +		.word	user_stack+8192		@ sp
> LC1:		.word	reloc_end - reloc_start
> 		.size	LC0, . - LC0
> 
> @@ -1086,4 +1086,4 @@ reloc_end:
> 
> 		.align
> 		.section ".stack", "w"
> -user_stack:	.space	4096
> +user_stack:	.space	8192

OK, I tried this (with Uwe's other malloc heap size and decompress_flush() callback patches) and sadly my results were unchanged:

Bytes transferred = 1736932 (1a80e4 hex)
## Booting image at a2000000 ...
   Image Name:   2.6.33.2-gum1
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1736868 Bytes =  1.7 MB
   Load Address: a1000000
   Entry Point:  a1000000
   Verifying Checksum ... OK
No initrd

Starting kernel at address a1000000 ...

Uncompressing Linux..............................................................................................................

uncompression error

 -- System halted

So there still seems to be some data-dependent issue in the kernel's gunzip decompressor.

-Mike

