All of lore.kernel.org
 help / color / mirror / Atom feed
* runaway loop modprobe binfmt-0000
@ 2007-01-05 23:13 Tom Lanyon
  2007-01-06  0:42 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Lanyon @ 2007-01-05 23:13 UTC (permalink / raw)
  To: linux-kernel

Greetings,

I'm encountering a rather annoying error on one of our AMD Opteron
boxes. I recompiled the working 2.6.17 kernel to add some extra SCSI
support and booted to find something similar to:

usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f0e:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
device-mapper: ioctl: 4.10.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com
device-mapper: multipath: version 1.0.5 loaded
device-mapper: multipath round-robin: version 1.0.0 loaded
device-mapper: multipath emc: version 0.0.3 loaded
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Freeing unused kernel memory: 324k freed
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000

... and then a hung kernel.

I figured I'd enabled/disabled something I shouldn't have and went
through the kernel config many times, recompiling with different
options (including ELF and misc emulation support), but to no avail.

As time ticked closer to 5pm on a Friday my interest decreased and so
I tried upgrading to 2.6.18 and then 2.6.19, with a tonne of different
configurations. Still nothing. The best I could achieve was on a fresh
2.6.19.1 I managed to pass the modprobe and load some input drivers,
but then it hung directly after that.

...
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000
request_module: runaway loop modprobe binfmt-0000
input: AT Translated Set 2 keyboard as /class/input/input0
input: PS/2 Generic Mouse as /class/input/input1


Any advice on debugging this would be appreciated. How can I discover
what is trying to load binfmt-0000 and why is it looping?

Regards

-- 
Tom Lanyon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: runaway loop modprobe binfmt-0000
  2007-01-05 23:13 Tom Lanyon
@ 2007-01-06  0:42 ` Andrew Morton
  2007-01-07 22:19   ` Tom Lanyon
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2007-01-06  0:42 UTC (permalink / raw)
  To: Tom Lanyon; +Cc: linux-kernel

On Sat, 6 Jan 2007 09:43:14 +1030
"Tom Lanyon" <tomlanyon@gmail.com> wrote:

> How can I discover
> what is trying to load binfmt-0000 and why is it looping?

Start with this, I guess..

--- a/kernel/kmod.c~a
+++ a/kernel/kmod.c
@@ -98,10 +98,12 @@ int request_module(const char *fmt, ...)
 	atomic_inc(&kmod_concurrent);
 	if (atomic_read(&kmod_concurrent) > max_modprobes) {
 		/* We may be blaming an innocent here, but unlikely */
-		if (kmod_loop_msg++ < 5)
+		if (kmod_loop_msg++ < 5) {
 			printk(KERN_ERR
 			       "request_module: runaway loop modprobe %s\n",
 			       module_name);
+			dump_stack();
+		}
 		atomic_dec(&kmod_concurrent);
 		return -ENOMEM;
 	}
_


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: runaway loop modprobe binfmt-0000
  2007-01-06  0:42 ` Andrew Morton
@ 2007-01-07 22:19   ` Tom Lanyon
  0 siblings, 0 replies; 4+ messages in thread
From: Tom Lanyon @ 2007-01-07 22:19 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

On 1/6/07, Andrew Morton <akpm@osdl.org> wrote:
> On Sat, 6 Jan 2007 09:43:14 +1030
> "Tom Lanyon" <tomlanyon@gmail.com> wrote:
>
> > How can I discover
> > what is trying to load binfmt-0000 and why is it looping?
>
> Start with this, I guess..
>
> --- a/kernel/kmod.c~a
> +++ a/kernel/kmod.c
> @@ -98,10 +98,12 @@ int request_module(const char *fmt, ...)
>         atomic_inc(&kmod_concurrent);
>         if (atomic_read(&kmod_concurrent) > max_modprobes) {
>                 /* We may be blaming an innocent here, but unlikely */
> -               if (kmod_loop_msg++ < 5)
> +               if (kmod_loop_msg++ < 5) {
>                         printk(KERN_ERR
>                                "request_module: runaway loop modprobe %s\n",
>                                module_name);
> +                       dump_stack();
> +               }
>                 atomic_dec(&kmod_concurrent);
>                 return -ENOMEM;
>         }
> _
>
>

Thanks for the reply, Andrew.

How interesting... added that to kmod.c, rebuilt without change to
config, reboot.... machine booted perfectly!

I'm going to leave it for now, but I'll leave the dump_stack() call in
there in case further issues arise.

Regards

-- 
Tom Lanyon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: runaway loop modprobe binfmt-0000
       [not found] ` <20080105013023.5c872c79.akpm@linux-foundation.org>
@ 2008-01-06 20:18   ` Tom Gall
  0 siblings, 0 replies; 4+ messages in thread
From: Tom Gall @ 2008-01-06 20:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tomlanyon, linux-kernel

Hi Andrew,

Thanks for the email back.

Andrew Morton wrote:

>On Thu, 03 Jan 2008 12:18:19 -0600 Tom Gall <tom_gall@vnet.ibm.com> wrote:
>  
>
>>This is an interesting one as I just hit this with a 2.6.23.12 kernel 
>>...  granted running on systemsim (the powerpc simulator) ... but as I'm 
>>hitting a wall I dropped in the dump_stack call and here is the output:
>>    
>>
>
>I don't remember what any of this is about and there is no record on the
>mailing list - presumably because you're ccing lkml@vger.kernel.org and not
>linux-kernel@...
>  
>
Ever written the previous year on your checks? Well when I was googling 
about for folks with similar woes, the posts I had thought were from 
this January were actually from Jan 2007.  O Well! Not to mention it's 
been awhile since I've posted to lkml ....

>I suggest that we start again with this bug report from scratch.  
>  
>
In this case, not needed. I spent friday taking a tour of the codepath 
and basically narrowed it down to a device driver passing bum data.

So if this ever happens in the future and someone somewhere seems the 
message about trying to load binfmt-xxxx (where xxxx is 0000 or some 
other "random" number * (hold that thought I'll come back to this) ) 
here are some things to consider as you're debugging your problem.

In my case, the device driver was just good enough that it could access 
the device, but was poor enough that the bits it was loading were 
complete crap ...   So the kernel was happily running along booting, 
gets to the point where it's time to exec init, loads the first bits of 
the binary, does a compare to make sure it's an elf binary, fails the 
compare (because the data is junk) ...  the kernel figures that ooo .. 
this data must be good it's just in a different format ... I'll run 
through the list of known binary file formats.. hmm not that one .. or 
that one ....   then running out of options trys to load a module for 
what must be this amazing new file format, can't find the module and 
ker-blew-ee.

* So this number that is part of the module name binfmt-xxxx is from 
what I see the id of the file format ... thankfully no one has choosen 
0000 as their id ...

>Unless it's a potential denial-of-service attack for unprivileged users,
>in which case a cc to security@vger.kernel.org migt be more appropriate.
>  
>
As this is a boot time issue it's certainly not a DOS.  Now there might 
be value is thinking about this error path in the initial boot up of the 
kernel. The message is a bit obscure for what the real issue is. Still 
.. should one go back to the device and comfirm that the data received 
was actually reasonable? No.

So I think we have two cases here:

Bootup:
I could see a change where the system should probably panic with some 
sort of message like : "Not able to exec %s, unrecognized file format" 
if this were to happen at boot up.  I could put together a patch.

Running System:
If this happens after the system is up and running, I'd think one is 
going to have some data in dmeg probably to the tune of "media error" 
... so in my mind, who cares about that case.

Regards,

Tom



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-01-06 20:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <477D26EB.6070008@vnet.ibm.com>
     [not found] ` <20080105013023.5c872c79.akpm@linux-foundation.org>
2008-01-06 20:18   ` runaway loop modprobe binfmt-0000 Tom Gall
2007-01-05 23:13 Tom Lanyon
2007-01-06  0:42 ` Andrew Morton
2007-01-07 22:19   ` Tom Lanyon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.