kernel-testers.vger.kernel.org archive mirror
* wrong final bzImage build (regading #14270)
@ 2009-10-09 14:17 Michael Tokarev
  2009-10-09 14:26 ` Michael Tokarev
  2009-10-09 14:58 ` Cyrill Gorcunov
  0 siblings, 2 replies; 11+ messages in thread
From: Michael Tokarev @ 2009-10-09 14:17 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: Rafael J. Wysocki, Cyrill Gorcunov, Kernel Testers List

OK, the mystery is finally solved, after a week of
digging.

The original problem was titled "Cannot boot on
a PIII Celeron", and Rafael filed bug #14270
for it.

In short, what I observed was that a new kernel
(2.6.31) fails to boot on a PIII Celeron machine.
But swap just the CPU for a plain PIII and voila,
it works.  I don't know why it behaves this
way, but I have finally found where the problem is.

The problem is in the last stage of the build, when
the bzImage is built.

make -f scripts/Makefile.build obj=arch/x86/boot/compressed arch/x86/boot/compressed/vmlinux
...
   (cat arch/x86/boot/compressed/vmlinux.bin | lzma -9 && echo -ne \\x38\\xd6\\x37\\x00) > arch/x86/boot/compressed/vmlinux.bin.lzma
...

Note the echo command.

Now, Debian has switched to dash as /bin/sh.  And dash's
built-in echo does not understand the -e option, so it
prints "-ne" and the \x escapes literally:

$ dash -c 'echo -ne \\x38\\xd6\\x37\\x00' | od -x
0000000 6e2d 2065 785c 3833 785c 3664 785c 3733
0000020 785c 3030 000a

$ bash -c 'echo -ne \\x38\\xd6\\x37\\x00' | od -x
0000000 d638 0037

So the size appended to the compressed file (it's the size
of the uncompressed file) is wrong.  Here's what mkpiggy
emits for it (in arch/x86/boot/compressed/piggy.S):

  z_output_len = 170930296

while it should be

  z_output_len = 3659320

And with the former (wrong, larger) size, the whole
thing just reboots on a PIII Celeron.  I have no idea
why, but this is the root of the original problem.
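
The two z_output_len values can be reproduced by hand. This is a minimal
sketch, assuming mkpiggy reads the last four bytes of the compressed file
as a little-endian 32-bit integer (an assumption that is consistent with
the numbers above); le32 is a hypothetical helper, not something from the
kernel tree.

```sh
# le32 B0 B1 B2 B3: hypothetical helper, not from the kernel tree.
# Combines four hex byte values into one integer, first byte least significant.
le32() {
    printf '%s\n' "$(( 0x$1 | (0x$2 << 8) | (0x$3 << 16) | (0x$4 << 24) ))"
}

le32 38 d6 37 00   # bytes written by bash's echo -ne           -> 3659320
le32 78 30 30 0a   # last four bytes of dash's output: x 0 0 \n -> 170930296
```

Note that od -x prints 16-bit words with the bytes swapped on x86, so
"d638 0037" in the dump corresponds to the byte sequence 38 d6 37 00.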

The same thing happens with the bzip2 algorithm (which is
not new), not only with lzma.

The whole thing looks quite hackish to me: mkpiggy
could get the size from the original image just fine,
instead of reading it from the end of the already-compressed
file.

For now, the quick fix is to change echo to printf there.
The correct fix is to rewrite mkpiggy to take the size
from the original file (IMHO anyway).
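
One catch with the printf route: POSIX only guarantees octal escapes
(\ddd) in the printf format string, not \x hex escapes, so a portable
version has to emit octal. A minimal sketch under that assumption follows.
size_le32 is a hypothetical helper name, and this is not presented as the
change that was made to the kernel build.

```sh
# size_le32 N: hypothetical helper; writes N to stdout as four
# little-endian bytes using only octal escapes in the printf format.
size_le32() {
    n=$1
    i=0
    while [ "$i" -lt 4 ]; do
        # The inner printf builds an escape such as \070 for the low byte;
        # the outer printf interprets that escape and writes the raw byte.
        printf "$(printf '\\%03o' "$(( n & 255 ))")"
        n=$(( n >> 8 ))
        i=$(( i + 1 ))
    done
}

# Dump byte by byte; for 3659320 the bytes are 38 d6 37 00.
size_le32 3659320 | od -An -tx1
```

Because printf is specified the same way for every POSIX shell, this should
behave identically whether /bin/sh is bash, dash or pdksh.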

This is also a very good candidate for -stable.
The bug is very difficult to find, and as more and
more Debian users switch to dash, it will become
more common.

Thanks!

/mjt


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 14:17 wrong final bzImage build (regading #14270) Michael Tokarev
@ 2009-10-09 14:26 ` Michael Tokarev
  2009-10-09 14:58 ` Cyrill Gorcunov
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Tokarev @ 2009-10-09 14:26 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: Rafael J. Wysocki, Cyrill Gorcunov, Kernel Testers List

And I forgot to mention: this IS a regression in 2.6.31.



* Re: wrong final bzImage build (regading #14270)
  2009-10-09 14:17 wrong final bzImage build (regading #14270) Michael Tokarev
  2009-10-09 14:26 ` Michael Tokarev
@ 2009-10-09 14:58 ` Cyrill Gorcunov
  2009-10-09 17:03   ` H. Peter Anvin
  2009-10-09 19:39   ` Michael Tokarev
  1 sibling, 2 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2009-10-09 14:58 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Kernel Mailing List, Rafael J. Wysocki, Kernel Testers List,
	Sam Ravnborg, H. Peter Anvin

Peter and Sam CC'ed


	-- Cyrill


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 14:58 ` Cyrill Gorcunov
@ 2009-10-09 17:03   ` H. Peter Anvin
       [not found]     ` <4ACF6CF8.4060204-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2009-10-09 19:39   ` Michael Tokarev
  1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2009-10-09 17:03 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michael Tokarev, Kernel Mailing List, Rafael J. Wysocki,
	Kernel Testers List, Sam Ravnborg


We should switch to printf here.  Hexadecimal constants in echo aren't
guaranteed by POSIX.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: wrong final bzImage build (regading #14270)
       [not found]     ` <4ACF6CF8.4060204-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-10-09 17:14       ` Michael Tokarev
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Tokarev @ 2009-10-09 17:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Cyrill Gorcunov, Kernel Mailing List, Rafael J. Wysocki,
	Kernel Testers List, Sam Ravnborg

H. Peter Anvin wrote:
> We should switch to printf here.  Hexadecimal constants in echo aren't
> guaranteed by POSIX.

That's what I initially proposed.  However, as Scott Olson pointed
out, there's already a fix for this:

http://lkml.org/lkml/2009/8/19/84
http://patchwork.kernel.org/patch/42564/

which uses the still non-portable /bin/echo.

(I wish I had known about it a week ago; it wasn't a pleasant week for me.)

Still, an interesting result.  I could understand it failing
on systems with small amounts of memory, but no: it fails
with a Celeron on a 64 MB system, yet works on the same system
if I replace the CPU with a real PIII...  Fun.

/mjt


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 14:58 ` Cyrill Gorcunov
  2009-10-09 17:03   ` H. Peter Anvin
@ 2009-10-09 19:39   ` Michael Tokarev
       [not found]     ` <4ACF9184.9040104-Gdu+ltImwkhes2APU0mLOQ@public.gmane.org>
  2009-10-09 20:02     ` Arkadiusz Miskiewicz
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Tokarev @ 2009-10-09 19:39 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Kernel Mailing List, Rafael J. Wysocki, Kernel Testers List,
	Sam Ravnborg, H. Peter Anvin

Ok, some more to this.

It turns out dash's built-in echo command interprets \nnn octal
sequences by default, and there's no way to turn that off.  So,
for example, the sed-zoffset command from arch/x86/boot/Makefile
(which includes \1, \2, etc. substitutions for sed), when echoed
in verbose mode (V=1), produces... interesting characters (with
ASCII codes 1 and 2).
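
For printing a command verbatim, printf with a '%s' format avoids this,
because only the format string has its escapes interpreted and the
argument is copied through unchanged. A small sketch; the sed expression
is a made-up stand-in, not the real sed-zoffset:

```sh
# Made-up sed expression with back-references, standing in for sed-zoffset.
cmd='s/\(a\)\(b\)/\2\1/'

# '%s' copies its argument as is; backslash sequences in it are not expanded.
printf '%s\n' "$cmd"
```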

It's not practical to replace V=1's echo with /bin/echo, I think.

So I'd say it's not a bug in the build system after all, but
a bug in dash.  Well, at least this expanding-by-default didn't
trigger another very-difficult-to-find bug (hopefully), but it
has good potential.

I'll file a bug report against dash.

/mjt



* Re: wrong final bzImage build (regading #14270)
       [not found]     ` <4ACF9184.9040104-Gdu+ltImwkhes2APU0mLOQ@public.gmane.org>
@ 2009-10-09 19:59       ` Cyrill Gorcunov
  0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2009-10-09 19:59 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Kernel Mailing List, Rafael J. Wysocki, Kernel Testers List,
	Sam Ravnborg, H. Peter Anvin

[Michael Tokarev - Fri, Oct 09, 2009 at 11:39:48PM +0400]
> Ok, some more to this.
>
> It turns out dash's built-in echo command interprets \nnn octal
> sequences by default, and there's no way to turn that off.  So,
> for example, sed-zoffset command from arch/x86/boot/Makefile
> (which includes \1 \2 etc substitutions for sed), when echoed
> in verbose mode (V=1), produces.. interesting characters (with
> ascii code 1 and 2).
>
> It's not practical to replace V=1's echo with /bin/echo, I think.
>
> So I'd say it's not a bug in the build system after all, but
> a bug in dash.  Well, at least this expanding-by-default didn't
> trigger another very-difficult-to-find bug (hopefully), but it
> has good potential.
>
> I'll file a bug report against dash.
>
> /mjt
>

OK, thanks Michael!

	-- Cyrill


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 19:39   ` Michael Tokarev
       [not found]     ` <4ACF9184.9040104-Gdu+ltImwkhes2APU0mLOQ@public.gmane.org>
@ 2009-10-09 20:02     ` Arkadiusz Miskiewicz
  2009-10-09 20:56       ` H. Peter Anvin
  1 sibling, 1 reply; 11+ messages in thread
From: Arkadiusz Miskiewicz @ 2009-10-09 20:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Michael Tokarev, Cyrill Gorcunov, Rafael J. Wysocki,
	Kernel Testers List, Sam Ravnborg, H. Peter Anvin

On Friday 09 of October 2009, Michael Tokarev wrote:
> Ok, some more to this.
> 
> It turns out dash's built-in echo command interprets \nnn octal
> sequences by default, and there's no way to turn that off.  So,
> for example, sed-zoffset command from arch/x86/boot/Makefile
> (which includes \1 \2 etc substitutions for sed), when echoed
> in verbose mode (V=1), produces.. interesting characters (with
> ascii code 1 and 2).
> 
> It's not practical to replace V=1's echo with /bin/echo, I think.
> 
> So I'd say it's not a bug in the build system after all, but
> a bug in dash.  

It's still a bug in the build system if you consider /bin/sh to be a POSIX
shell. POSIX shells are not required to support \hex notation (see the
Single UNIX Specification).

I had exactly this problem a few weeks ago with pdksh as /bin/sh (and
reported the bug to the author of that change). As a workaround I used
/bin/echo, but using printf is more sane/portable.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 20:02     ` Arkadiusz Miskiewicz
@ 2009-10-09 20:56       ` H. Peter Anvin
       [not found]         ` <4ACFA36F.6000105-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2009-10-09 20:56 UTC (permalink / raw)
  To: Arkadiusz Miskiewicz
  Cc: linux-kernel, Michael Tokarev, Cyrill Gorcunov, Rafael J. Wysocki,
	Kernel Testers List, Sam Ravnborg

On 10/09/2009 01:02 PM, Arkadiusz Miskiewicz wrote:
> 
> I had exactly this problem a few weeks ago with pdksh as /bin/sh (and
> reported the bug to the author of that change). As a workaround I used
> /bin/echo, but using printf is more sane/portable.
> 

Yes, using printf is the right thing to do.

A patch would be appreciated.

	-hpa


* Re: wrong final bzImage build (regading #14270)
       [not found]         ` <4ACFA36F.6000105-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-10-09 21:27           ` Michael Tokarev
  2009-10-09 21:29             ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2009-10-09 21:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arkadiusz Miskiewicz, linux-kernel,
	Cyrill Gorcunov, Rafael J. Wysocki, Kernel Testers List,
	Sam Ravnborg

H. Peter Anvin wrote:
> On 10/09/2009 01:02 PM, Arkadiusz Miskiewicz wrote:
>> I had exactly this problem a few weeks ago with pdksh as /bin/sh (and
>> reported the bug to the author of that change). As a workaround I used
>> /bin/echo, but using printf is more sane/portable.
>>
> 
> Yes, using printf is the right thing to do.
> 
> A patch would be appreciated.

Come on, it's just a one-word change (s/echo/printf/ in
scripts/Makefile.lib).

But it should go to Sam's tree first, I guess, which already
has s|echo|/bin/echo|, so it'll conflict.
It's easier to just make the change directly in whichever
tree it lands in, without a complete patch.

/mjt


* Re: wrong final bzImage build (regading #14270)
  2009-10-09 21:27           ` Michael Tokarev
@ 2009-10-09 21:29             ` H. Peter Anvin
  0 siblings, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2009-10-09 21:29 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Arkadiusz Miskiewicz, linux-kernel, Cyrill Gorcunov,
	Rafael J. Wysocki, Kernel Testers List, Sam Ravnborg

On 10/09/2009 02:27 PM, Michael Tokarev wrote:
> H. Peter Anvin wrote:
>> On 10/09/2009 01:02 PM, Arkadiusz Miskiewicz wrote:
>>> I had exactly this problem a few weeks ago with pdksh as /bin/sh (and
>>> reported the bug to the author of that change). As a workaround I used
>>> /bin/echo, but using printf is more sane/portable.
>>>
>>
>> Yes, using printf is the right thing to do.
>>
>> A patch would be appreciated.
> 
> Come on, it's just a one-word change (s/echo/printf/ in
> scripts/Makefile.lib).
>
> But it should go to Sam's tree first, I guess, which already
> has s|echo|/bin/echo|, so it'll conflict.
> It's easier to just make the change directly in whichever
> tree it lands in, without a complete patch.

So send a patch against Sam's tree.

	-hpa

