* [PATCH] FIX IT
@ 2009-11-16 19:40 Andreas Mohr
2009-11-16 20:35 ` Nick Bowler
0 siblings, 1 reply; 7+ messages in thread
From: Andreas Mohr @ 2009-11-16 19:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
ChangeLog: Partially fix B0RKEN kernel usability
checkpatch.pl'd, tested, applies cleanly to 2.6.32-rc7.
Seemingly best to go via trusted mmotm.
Thanks as always,
Signed-off-by: Andreas Mohr <andi@lisas.de>
--- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
+++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
@@ -846,7 +846,8 @@ static noinline int init_post(void)
 	run_init_process("/bin/init");
 	run_init_process("/bin/sh");
 
-	panic("No init found. Try passing init= option to kernel.");
+	panic("No init found. Try passing init= option to kernel. "
+	      "See Linux Documentation/init.txt for guidance.");
 }
 
 static int __init kernel_init(void * unused)
--- /dev/null 2009-11-10 08:07:33.390012116 +0100
+++ linux-2.6/Documentation/init.txt 2009-11-16 20:17:57.000000000 +0100
@@ -0,0 +1,44 @@
+Explaining the dreaded "No init found." boot hang message
+=========================================================
+
+OK, so you've got this pretty unintuitive message (currently located
+in init/main.c) and are wondering what the H*** went wrong.
+Some high-level reasons for failure (listed roughly in order of execution)
+to load the init binary are:
+A) Unable to mount root FS
+B) init binary doesn't exist on rootfs
+C) other requirements not met
+D) binary exists but dependencies not available
+E) binary cannot be loaded
+
+Detailed explanations:
+0) Set "debug" kernel parameter (in bootloader or CONFIG_CMDLINE)
+to get more detailed kernel messages.
+A) Please make sure you have the correct root FS type
+(and root= kernel parameter points to the correct partition),
+required drivers such as storage hardware (such as SCSI or USB!)
+and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules by
+using initrd)
+C) Possibly a conflict in console= setup --> initial console unavailable.
+E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
+interrupt-based configuration).
+Try using a different console= device or e.g. netconsole=.
+D) e.g. crucial library dependencies of the init binary such as
+/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
+to find out which libraries are required.
+E) make sure the binary's architecture matches your hardware.
+E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
+Or did you try loading a non-binary file here!?! (shell script?)
+To find out more, add code patch to display kernel_execve()s return values.
+
+Please extend this explanation whenever you find new failure causes
+(after all loading the init binary is a CRITICAL and hard transition step
+which needs to be made as painless as possible), then submit patch to LKML.
+Further TODOs:
+- Implement the various run_init_process() invocations via a struct array
+ which can then store the kernel_execve() result value and on failure
+ log it all by iterating over _all_ results (very important usability fix).
+- try to make the implementation itself more helpful in general,
+ e.g. by providing additional error messages at affected places.
+
+Andreas Mohr <andi at lisas period de>
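
The first item under "Further TODOs" in the patch above (drive the
run_init_process() attempts from a struct array and keep every result) could
look roughly like the following userspace sketch. It is an illustration only,
not kernel code: struct init_attempt, probe_exec(), try_init_candidates() and
report_failures() are hypothetical names, and probe_exec() merely asks
access(2) whether a path is executable as a stand-in for kernel_execve(),
which a real implementation would call instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One candidate init binary plus the outcome of trying it (hypothetical). */
struct init_attempt {
	const char *path;
	int result;	/* 0 on success, negative errno on failure */
};

/*
 * Userspace stand-in for kernel_execve(): it does not execute anything,
 * it only asks access(2) whether the path is executable.
 */
static int probe_exec(const char *path)
{
	return access(path, X_OK) == 0 ? 0 : -errno;
}

/*
 * Try every candidate in order and record each result.
 * Returns the index of the first usable candidate, or -1 if all failed.
 */
static int try_init_candidates(struct init_attempt *tab, size_t n)
{
	size_t i;

	assert(tab != NULL);
	for (i = 0; i < n; i++) {
		tab[i].result = probe_exec(tab[i].path);
		if (tab[i].result == 0)
			return (int)i;
	}
	return -1;
}

/* Log all recorded failures instead of a single generic message. */
static void report_failures(const struct init_attempt *tab, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].result < 0)
			fprintf(stderr, "init %s failed: error %d (%s)\n",
				tab[i].path, tab[i].result,
				strerror(-tab[i].result));
}
```

A caller would fill the table with the init= value followed by the built-in
fallbacks such as "/bin/init" and "/bin/sh", call try_init_candidates(), and
call report_failures() only when it returns -1, so that the final message
names every path together with its individual error code.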
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH] FIX IT
2009-11-16 19:40 [PATCH] FIX IT Andreas Mohr
@ 2009-11-16 20:35 ` Nick Bowler
2009-11-17 20:40 ` Andreas Mohr
0 siblings, 1 reply; 7+ messages in thread
From: Nick Bowler @ 2009-11-16 20:35 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Andrew Morton, linux-kernel

On 20:40 Mon 16 Nov , Andreas Mohr wrote:
> ChangeLog: Partially fix B0RKEN kernel usability

Improving error messages is a good idea, but I'm not sure how much this
patch actually helps.

> --- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
> +++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
> @@ -846,7 +846,8 @@ static noinline int init_post(void)
> 	run_init_process("/bin/init");
> 	run_init_process("/bin/sh");
>
> -	panic("No init found. Try passing init= option to kernel.");
> +	panic("No init found. Try passing init= option to kernel. "
> +	      "See Linux Documentation/init.txt for guidance.");

I think that the people who know where to look after reading this are
mainly the people who don't need to read that file, with one exception -
point (C) later on.

> +OK, so you've got this pretty unintuitive message (currently located
> +in init/main.c) and are wondering what the H*** went wrong.
> +Some high-level reasons for failure (listed roughly in order of execution)
> +to load the init binary are:
> +A) Unable to mount root FS

Whenever the root FS has been unable to mount, I've always received an
error message that included the string "VFS: Unable to mount root fs".
Has this changed recently? What sort of setup causes one to receive "No
init found" instead?

> +B) init binary doesn't exist on rootfs
> +C) other requirements not met

The introduction to this list already stated that it is not exhaustive,
so this entry adds no new information. After reading the detailed
explanation, "broken console device" seems more appropriate here.

> +D) binary exists but dependencies not available
> +E) binary cannot be loaded

To me, (B), (D) and (E) are the same thing, and could just be "binary
cannot be loaded". The details can be expanded upon in the next
section.

> +Detailed explanations:
<snip>
> +C) Possibly a conflict in console= setup --> initial console unavailable.
> +E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
> +interrupt-based configuration).
> +Try using a different console= device or e.g. netconsole=.

This appears to be by far the most interesting point in this file, since
it clarifies that "No init found." might be caused by a configuration
problem which seems completely unrelated to loading init.

> +D) e.g. crucial library dependencies of the init binary such as
> +/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> +to find out which libraries are required.
> +E) make sure the binary's architecture matches your hardware.
> +E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> +Or did you try loading a non-binary file here!?! (shell script?)

Linux is perfectly happy to load a shell script as init, so this comment
is very misleading.

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH] FIX IT
2009-11-16 20:35 ` Nick Bowler
@ 2009-11-17 20:40 ` Andreas Mohr
2009-12-27 15:03 ` [PATCH] Improve usability in case of init binary failure Andreas Mohr
0 siblings, 1 reply; 7+ messages in thread
From: Andreas Mohr @ 2009-11-17 20:40 UTC (permalink / raw)
To: Andreas Mohr, Andrew Morton, linux-kernel

On Mon, Nov 16, 2009 at 03:35:45PM -0500, Nick Bowler wrote:
> On 20:40 Mon 16 Nov , Andreas Mohr wrote:
> > --- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
> > +++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
> > @@ -846,7 +846,8 @@ static noinline int init_post(void)
> > 	run_init_process("/bin/init");
> > 	run_init_process("/bin/sh");
> >
> > -	panic("No init found. Try passing init= option to kernel.");
> > +	panic("No init found. Try passing init= option to kernel. "
> > +	      "See Linux Documentation/init.txt for guidance.");
>
> I think that the people who know where to look after reading this are
> mainly the people who don't need to read that file, with one exception -
> point (C) later on.

I'm afraid I have to disagree with some parts in this mail.
As an LKML regular I've certainly had a rather higher share of problems
in all this than I'd ever have expected.
As for less-involved people, they will just raise eyebrows on
"Documentation/init.txt", Google the term (as long as they've got a
second working computer, that is ;) and be happy.

> > +OK, so you've got this pretty unintuitive message (currently located
> > +in init/main.c) and are wondering what the H*** went wrong.
> > +Some high-level reasons for failure (listed roughly in order of execution)
> > +to load the init binary are:
> > +A) Unable to mount root FS
>
> Whenever the root FS has been unable to mount, I've always received an
> error message that included the string "VFS: Unable to mount root fs".
> Has this changed recently? What sort of setup causes one to receive "No
> init found" instead?

This _might_ be the case (I think it happens often indeed), but you never
know whether it's correctly output in 100% of these cases (e.g. possibly
depending on whether "debug" is specified or not, as one factor only!).
And given the avalanchy multitude of problems in this area my staunch
opinion is that this guidance should be committed NOW regardless of
whether it's got a "perfect" appearance (i.e. 100% of the content is fully
accurate, lists all required hints and doesn't contain false positives).
So far we've provided almost NOTHING, so let's at least add something, soon.

I'll just give further examples:
a) [same day] saw http://lkml.org/lkml/2009/11/10/526 during some light
   LKML reading
b) [same day] _first_ pastebin plea for help that I encountered on #openwrt -
   guess what it was about?
c) [next day] wasting half a day at work due to Red Hat's sheer inability to
   make a system work with more than 7MB/s on SATA hardware.
   Even worse, trying to fix this up by going the way of building a custom
   2.6.31.5 (something I'm doing all the time elsewhere), I even managed to
   hit SEVERE Red Hat initrd root device issues (culminating in
   "Init not found."), with about a hundred UNSOLVED Google results in trying
   to make a buggy initrd / nash setup accept a different root device.
   Talk about double fault, for crying out loud.
d) [second next day] private thankful reply of another power user to my
   patch mail citing Debian initrd issues due to ldd issues causing .so's to
   get lost and thus a "No init found." message produced.

> > +B) init binary doesn't exist on rootfs
> > +C) other requirements not met
>
> The introduction to this list already stated that it is not exhaustive,
> so this entry adds no new information. After reading the detailed
> explanation, "broken console device" seems more appropriate here.

Indeed, it's better to have one-liners with specific issues and then
multi-liners elaborating on these issues, I'll update it.

> > +D) binary exists but dependencies not available
> > +E) binary cannot be loaded
>
> To me, (B), (D) and (E) are the same thing, and could just be "binary
> cannot be loaded". The details can be expanded upon in the next
> section.
>
> > +Detailed explanations:
> <snip>
> > +C) Possibly a conflict in console= setup --> initial console unavailable.
> > +E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
> > +interrupt-based configuration).
> > +Try using a different console= device or e.g. netconsole=.
>
> This appears to be by far the most interesting point in this file, since
> it clarifies that "No init found." might be caused by a configuration
> problem which seems completely unrelated to loading init.

Users don't care much whether the message is "Init not found." or
"console broken." or whatever, all they know is that their system doesn't
work and that they want immediate help and earnest attempts in getting
this thing resolved.
Of course it would be nice to have individual areas of problems output
their fair share of log messages (e.g. console setup), but as long as we
don't have that entirely and I'm not fully ready to figure out myself all
places that are lacking certain messages (as opposed to e.g. core
developers), we need (certainly imperfect) helper documentation NOW.

> > +D) e.g. crucial library dependencies of the init binary such as
> > +/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> > +to find out which libraries are required.
> > +E) make sure the binary's architecture matches your hardware.
> > +E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> > +Or did you try loading a non-binary file here!?! (shell script?)
>
> Linux is perfectly happy to load a shell script as init, so this comment
> is very misleading.

Oh, interesting.
I've seen a warning about this in a forum, thus I added it here, but
I don't have experience with this myself, so I guess it's ok after all,
thanks!
(and there are several reports that seem to confirm that a shell script
is possible, probably since the shebang mechanism likely is ld.so-related)
This part should thus be altered to mention that a script needs to have
its fully working interpreter binary plus dependencies available.

I'll submit a new version of this patch very soon.

Thanks,

Andreas Mohr

^ permalink raw reply [flat|nested] 7+ messages in thread
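
The point about a script needing a working interpreter can be checked
mechanically. On Linux the "#!" line is evaluated by the kernel's own script
loader (binfmt_script) at exec time rather than by ld.so, so the interpreter
named there is what must exist and be loadable. The following is a minimal
userspace sketch of extracting that path, under simplified assumptions
(interpreter path ends at the first whitespace, first line shorter than 256
bytes); shebang_interpreter() and script_interpreter() are hypothetical
helper names, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the interpreter path named on a "#!" line into out.
 * Returns 0 on success, -1 if the line is not a usable shebang line.
 */
static int shebang_interpreter(const char *line, char *out, size_t outlen)
{
	size_t len;

	assert(line != NULL && out != NULL && outlen > 0);
	if (strncmp(line, "#!", 2) != 0)
		return -1;
	line += 2;
	line += strspn(line, " \t");		/* skip blanks after "#!" */
	len = strcspn(line, " \t\r\n");		/* path ends at whitespace */
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, line, len);
	out[len] = '\0';
	return 0;
}

/* Read the first line of a script file and extract its interpreter. */
static int script_interpreter(const char *script, char *out, size_t outlen)
{
	char line[256];
	FILE *f = fopen(script, "r");
	int ret = -1;

	if (f == NULL)
		return -1;
	if (fgets(line, sizeof(line), f) != NULL)
		ret = shebang_interpreter(line, out, outlen);
	fclose(f);
	return ret;
}
```

The path returned for the init script can then be examined like any other
binary, e.g. with the readelf -d <INIT>|grep NEEDED command from the patch,
since the interpreter's own library dependencies have to be present on the
root filesystem as well.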

* [PATCH] Improve usability in case of init binary failure
2009-11-17 20:40 ` Andreas Mohr
@ 2009-12-27 15:03 ` Andreas Mohr
2010-02-02 7:10 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Andreas Mohr @ 2009-12-27 15:03 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Andrew Morton, Nick Bowler, linux-kernel

On Tue, Nov 17, 2009 at 09:40:16PM +0100, Andreas Mohr wrote:
> I'll submit a new version of this patch very soon.

Well, took quite a while longer, partly due to broken Broadcom USB host
(OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.

Took most of the comments into account (thanks!), improved some wording.

Patch against current git, compile- and runtime-tested,
checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
git diff output and manual /dev/null diffing).

Thanks!

Signed-off-by: Andreas Mohr <andi@lisas.de>

diff --git a/init/main.c b/init/main.c
index dac44a9..33748c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -836,7 +836,8 @@ static noinline int init_post(void)
 	run_init_process("/bin/init");
 	run_init_process("/bin/sh");
 
-	panic("No init found. Try passing init= option to kernel.");
+	panic("No init found. Try passing init= option to kernel. "
+	      "See Linux Documentation/init.txt for guidance.");
 }
 
 static int __init kernel_init(void * unused)
--- /dev/null 2009-12-27 16:25:29.521258205 +0100
+++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
@@ -0,0 +1,49 @@
+Explaining the dreaded "No init found." boot hang message
+=========================================================
+
+OK, so you've got this pretty unintuitive message (currently located
+in init/main.c) and are wondering what the H*** went wrong.
+Some high-level reasons for failure (listed roughly in order of execution)
+to load the init binary are:
+A) Unable to mount root FS
+B) init binary doesn't exist on rootfs
+C) broken console device
+D) binary exists but dependencies not available
+E) binary cannot be loaded
+
+Detailed explanations:
+0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
+   to get more detailed kernel messages.
+A) make sure you have the correct root FS type
+   (and root= kernel parameter points to the correct partition),
+   required drivers such as storage hardware (such as SCSI or USB!)
+   and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
+   to be pre-loaded by an initrd)
+C) Possibly a conflict in console= setup --> initial console unavailable.
+   E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
+   missing interrupt-based configuration).
+   Try using a different console= device or e.g. netconsole= .
+D) e.g. required library dependencies of the init binary such as
+   /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
+   to find out which libraries are required.
+E) make sure the binary's architecture matches your hardware.
+   E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
+   In case you tried loading a non-binary file here (shell script?),
+   you should make sure that the script specifies an interpreter in its shebang
+   header line (#!/...) that is fully working (including its library
+   dependencies). And before tackling scripts, better first test a simple
+   non-script binary such as /bin/sh and confirm its successful execution.
+   To find out more, add code to init/main.c to display kernel_execve()s
+   return values.
+
+Please extend this explanation whenever you find new failure causes
+(after all loading the init binary is a CRITICAL and hard transition step
+which needs to be made as painless as possible), then submit patch to LKML.
+Further TODOs:
+- Implement the various run_init_process() invocations via a struct array
+  which can then store the kernel_execve() result value and on failure
+  log it all by iterating over _all_ results (very important usability fix).
+- try to make the implementation itself more helpful in general,
+  e.g. by providing additional error messages at affected places.
+
+Andreas Mohr <andi at lisas period de>

^ permalink raw reply related [flat|nested] 7+ messages in thread
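
Item E) of the documentation above (binary architecture must match the
hardware) can be verified from the first bytes of the init binary. The
sketch below decodes the e_machine field of an ELF header using the constants
from <elf.h>; elf_machine() and elf_machine_name() are hypothetical helper
names chosen for this illustration, and the name table deliberately covers
only the architectures mentioned in the text.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Decode the e_machine field from the first bytes of a file.
 * Returns the EM_* value, or -1 if the buffer is not an ELF header.
 * e_ident, e_type and e_machine sit at the same offsets in 32-bit and
 * 64-bit ELF headers, so EI_NIDENT + 4 bytes are enough.
 */
static int elf_machine(const unsigned char *hdr, size_t len)
{
	assert(hdr != NULL);
	if (len < (size_t)EI_NIDENT + 4 || memcmp(hdr, ELFMAG, SELFMAG) != 0)
		return -1;
	switch (hdr[EI_DATA]) {
	case ELFDATA2LSB:	/* little-endian file */
		return hdr[EI_NIDENT + 2] | (hdr[EI_NIDENT + 3] << 8);
	case ELFDATA2MSB:	/* big-endian file */
		return (hdr[EI_NIDENT + 2] << 8) | hdr[EI_NIDENT + 3];
	default:
		return -1;
	}
}

/* Map a few EM_* values to readable names; everything else is "unknown". */
static const char *elf_machine_name(int machine)
{
	switch (machine) {
	case EM_386:	return "i386";
	case EM_X86_64:	return "x86_64";
	case EM_ARM:	return "arm";
	default:	return "unknown";
	}
}
```

Reading the first 20 bytes of the init binary from the target root
filesystem and passing them to elf_machine() shows at a glance whether, say,
an x86_64 binary ended up on an i386 or ARM system. A return value of -1
indicates a non-ELF file, which for init would typically be a script.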

* Re: [PATCH] Improve usability in case of init binary failure
2009-12-27 15:03 ` [PATCH] Improve usability in case of init binary failure Andreas Mohr
@ 2010-02-02 7:10 ` David Rientjes
2010-02-02 17:20 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2010-02-02 7:10 UTC (permalink / raw)
To: Andreas Mohr; +Cc: Andrew Morton, Nick Bowler, linux-kernel

On Sun, 27 Dec 2009, Andreas Mohr wrote:

> Well, took quite a while longer, partly due to broken Broadcom USB host
> (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
>
> Took most of the comments into account (thanks!), improved some wording.
>
> Patch against current git, compile- and runtime-tested,
> checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> git diff output and manual /dev/null diffing).
>
> Thanks!
>

It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
it added init.txt to the root directory instead of Documentation,
even though the patch below is correct.

> Signed-off-by: Andreas Mohr <andi@lisas.de>
>
> diff --git a/init/main.c b/init/main.c
> index dac44a9..33748c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -836,7 +836,8 @@ static noinline int init_post(void)
> 	run_init_process("/bin/init");
> 	run_init_process("/bin/sh");
>
> -	panic("No init found. Try passing init= option to kernel.");
> +	panic("No init found. Try passing init= option to kernel. "
> +	      "See Linux Documentation/init.txt for guidance.");
> }
>
> static int __init kernel_init(void * unused)
> --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
> @@ -0,0 +1,49 @@
> +Explaining the dreaded "No init found." boot hang message
> +=========================================================
> +
> +OK, so you've got this pretty unintuitive message (currently located
> +in init/main.c) and are wondering what the H*** went wrong.
> +Some high-level reasons for failure (listed roughly in order of execution)
> +to load the init binary are:
> +A) Unable to mount root FS
> +B) init binary doesn't exist on rootfs
> +C) broken console device
> +D) binary exists but dependencies not available
> +E) binary cannot be loaded
> +
> +Detailed explanations:
> +0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
> +   to get more detailed kernel messages.
> +A) make sure you have the correct root FS type
> +   (and root= kernel parameter points to the correct partition),
> +   required drivers such as storage hardware (such as SCSI or USB!)
> +   and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
> +   to be pre-loaded by an initrd)
> +C) Possibly a conflict in console= setup --> initial console unavailable.
> +   E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
> +   missing interrupt-based configuration).
> +   Try using a different console= device or e.g. netconsole= .
> +D) e.g. required library dependencies of the init binary such as
> +   /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> +   to find out which libraries are required.
> +E) make sure the binary's architecture matches your hardware.
> +   E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> +   In case you tried loading a non-binary file here (shell script?),
> +   you should make sure that the script specifies an interpreter in its shebang
> +   header line (#!/...) that is fully working (including its library
> +   dependencies). And before tackling scripts, better first test a simple
> +   non-script binary such as /bin/sh and confirm its successful execution.
> +   To find out more, add code to init/main.c to display kernel_execve()s
> +   return values.
> +
> +Please extend this explanation whenever you find new failure causes
> +(after all loading the init binary is a CRITICAL and hard transition step
> +which needs to be made as painless as possible), then submit patch to LKML.
> +Further TODOs:
> +- Implement the various run_init_process() invocations via a struct array
> +  which can then store the kernel_execve() result value and on failure
> +  log it all by iterating over _all_ results (very important usability fix).
> +- try to make the implementation itself more helpful in general,
> +  e.g. by providing additional error messages at affected places.
> +
> +Andreas Mohr <andi at lisas period de>

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH] Improve usability in case of init binary failure
2010-02-02 7:10 ` David Rientjes
@ 2010-02-02 17:20 ` Andrew Morton
2010-02-02 17:48 ` Andreas Mohr
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-02-02 17:20 UTC (permalink / raw)
To: David Rientjes; +Cc: Andreas Mohr, Nick Bowler, linux-kernel

On Mon, 1 Feb 2010 23:10:51 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> On Sun, 27 Dec 2009, Andreas Mohr wrote:
>
> > Well, took quite a while longer, partly due to broken Broadcom USB host
> > (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
> >
> > Took most of the comments into account (thanks!), improved some wording.
> >
> > Patch against current git, compile- and runtime-tested,
> > checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> > git diff output and manual /dev/null diffing).
> >
> > Thanks!
> >
>
> It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
> in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
> it added init.txt to the root directory instead of Documentation,

ah, thanks.

> even though the patch below is correct.

Nope, the patch was wrong:

> > --- a/init/main.c
> > +++ b/init/main.c
> ...
> > --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> > +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100

Should've been a/Documentation/init.txt

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH] Improve usability in case of init binary failure
2010-02-02 17:20 ` Andrew Morton
@ 2010-02-02 17:48 ` Andreas Mohr
0 siblings, 0 replies; 7+ messages in thread
From: Andreas Mohr @ 2010-02-02 17:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Rientjes, Andreas Mohr, Nick Bowler, linux-kernel

On Tue, Feb 02, 2010 at 09:20:41AM -0800, Andrew Morton wrote:
> On Mon, 1 Feb 2010 23:10:51 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
> > On Sun, 27 Dec 2009, Andreas Mohr wrote:
> >
> > > Well, took quite a while longer, partly due to broken Broadcom USB host
> > > (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
> > >
> > > Took most of the comments into account (thanks!), improved some wording.
> > >
> > > Patch against current git, compile- and runtime-tested,
> > > checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> > > git diff output and manual /dev/null diffing).
> > >
> > > Thanks!
> > >
> >
> > It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
> > in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
> > it added init.txt to the root directory instead of Documentation,
>
> ah, thanks.
>
> > even though the patch below is correct.
>
> Nope, the patch was wrong:
>
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > ...
> > > --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> > > +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
>
> Should've been a/Documentation/init.txt

Indeed, which is why I had mentioned it in the submission (above),
but [fatally, as it turned out] did not bother to fix this ""minor"" issue.

Lots of sorries,

Andreas Mohr

^ permalink raw reply [flat|nested] 7+ messages in thread
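
The misplaced init.txt discussed above is consistent with how a strip level
such as patch -p1 treats file names: the first path component is removed,
which is why git-style diffs carry an a/ or b/ prefix. The sketch below
mimics that rule in C, assuming the patch was applied with a strip level of 1
(the thread does not state the exact command used). strip_components() is a
hypothetical helper and not part of patch(1); returning NULL when there are
too few components is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Drop the first n slash-separated components of path, the way a
 * -pN strip level does. Returns a pointer into path, or NULL if the
 * path has fewer than n components to strip.
 */
static const char *strip_components(const char *path, int n)
{
	assert(path != NULL && n >= 0);
	while (n-- > 0) {
		const char *slash = strchr(path, '/');

		if (slash == NULL)
			return NULL;
		path = slash + 1;
		while (*path == '/')	/* adjacent slashes count as one */
			path++;
	}
	return path;
}
```

With n = 1, "b/init/main.c" becomes "init/main.c" as intended, whereas the
unprefixed "Documentation/init.txt" loses its directory and becomes
"init.txt", i.e. a file in the top-level directory of the tree.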