All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frank Sorenson <frank@tuxrocks.com>
To: Matt Domsch <Matt_Domsch@dell.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	kay.sievers@vrfy.org, LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, "Rafael J. Wysocki" <rjw@sisk.pl>,
	jcm@redhat.com
Subject: Re: 2.6.25-rc4 OOMs itself dead on bootup (modprobe bug?)
Date: Sat, 08 Mar 2008 08:03:23 -0600	[thread overview]
Message-ID: <47D29CAB.50301@tuxrocks.com> (raw)
In-Reply-To: <20080308135318.GA8036@auslistsprd01.us.dell.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matt Domsch wrote:
> On Sat, Mar 08, 2008 at 01:08:46AM -0600, Frank Sorenson wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Frank Sorenson wrote:
>>> I did some additional debugging, and I believe you're correct about it
>>> being specific to my system.  The system seems to run fine until some
>>> time during the boot.  I booted with "init=/bin/sh" (that's how the
>>> system stayed up for 9 minutes), then it died when I tried starting
>>> things up.  I've further narrowed the OOM down to udev (though it's not
>>> entirely udev's fault, since 2.6.24 runs fine).
>>>
>>> I ran your debug info tool before killing the box by running
>>> /sbin/start_udev.  The output of the tool is at
>>> http://tuxrocks.com/tmp/cfs-debug-info-2008.03.06-14.11.24
>>>
>>> Something is apparently happening between 2.6.24 and 2.6.25-rc[34] which
>>> causes udev (or something it calls) to behave very badly.
>> Found it.  The culprit is 8f47f0b688bba7642dac4e979896e4692177670b
>>     dcdbas: add DMI-based module autloading
>>
>>     DMI autoload dcdbas on all Dell systems.
>>
>>     This looks for BIOS Vendor or System Vendor == Dell, so this should
>>     work for systems both Dell-branded and those Dell builds but brands
>>     for others.  It causes udev to load the dcdbas module at startup,
>>     which is used by tools called by HAL for wireless control and
>>     backlight control, among other uses.
>>
>> What actually happens is that when udev loads the dcdbas module at
>> startup, modprobe apparently calls "modprobe dcdbas" itself, repeating
>> until the system runs out of resources (in this case, it OOMs).
>>
>> # ps axf
>> ...
>>   506 ?        S      0:00 /bin/bash /sbin/start_udev
>>   590 ?        S      0:00  \_ /sbin/udevsettle
>>   533 ?        S<s    0:00 /sbin/udevd -d
>>   629 ?        S<     0:00  \_ /sbin/udevd -d
>>   630 ?        S<     0:00  |   \_ /sbin/modprobe
>> dmi:bvnDellInc.:bvrA08:bd04/02/2007:svnDellInc.:pnMP061:pvr:rvnDellInc.:rn0YD479:rvr:cvnDellInc.:ct8:cvr:
>>   949 ?        S<     0:00  |       \_ /sbin/modprobe dcdbas
>>   950 ?        S<     0:00  |           \_ /sbin/modprobe dcdbas
>>   951 ?        S<     0:00  |               \_ /sbin/modprobe dcdbas
>>   953 ?        S<     0:00  |                   \_ /sbin/modprobe dcdbas
>>   955 ?        S<     0:00  |                       \_ /sbin/modprobe dcdbas
>>   958 ?        S<     0:00  |                           \_
>> /sbin/modprobe dcdbas
>> ...repeat...
>>
>> When the system crashed, there were at least 11,600 instances of
>> "/sbin/modprobe dcdbas", each calling the next.
>>
>> Reverting 8f47f0b lets the system boot up just fine again.  Note that a
>> manual "modprobe dcdbas" also causes this recursive behavior, it's just
>> not forced on the system by udev.
>>
>> So dcdbas is a regression from 2.6.24, as well as being broken in other
>> ways.
>>
>> Frank
>> - --
>> Frank Sorenson - KD7TZK
>> Linux Systems Engineer, DSS Engineering, UBS AG
>> frank@tuxrocks.com
> 
> 
> Frank, what version of module-init-tools do you have?  This has been
> in use in Fedora 8 for a few months, and this is the first failure
> report I've seen.
> 
> I'm fine with reverting the patch for now, but really do want to get
> to root cause, because module autoloading is a really good idea, and
> it would be a shame if we couldn't keep that feature enabled because
> some systems have incompatible module-init-tools, and the kernel can't
> know that...  (Perhaps udev could know and not invoke modprobe in
> those instances?)
> 
> -Matt

It's module-init-tools-3.4-2.fc8.x86_64 (most recent Fedora rpm available).

Frank
- --
Frank Sorenson - KD7TZK
Linux Systems Engineer, DSS Engineering, UBS AG
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFH0pyoaI0dwg4A47wRAq9rAKCVbg5ngSyHVORpLAcD4WY4vNMQlQCdGtr1
9CiHmom5Vopsqukc8e+D1RU=
=GPMU
-----END PGP SIGNATURE-----

WARNING: multiple messages have this Message-ID (diff)
From: Frank Sorenson <frank@tuxrocks.com>
To: Matt Domsch <Matt_Domsch@dell.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	kay.sievers@vrfy.org, LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, "Rafael J. Wysocki" <rjw@sisk.pl>,
	jcm@redhat.com
Subject: Re: 2.6.25-rc4 OOMs itself dead on bootup (modprobe bug?)
Date: Sat, 08 Mar 2008 08:03:23 -0600	[thread overview]
Message-ID: <47D29CAB.50301@tuxrocks.com> (raw)
In-Reply-To: <20080308135318.GA8036@auslistsprd01.us.dell.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matt Domsch wrote:
> On Sat, Mar 08, 2008 at 01:08:46AM -0600, Frank Sorenson wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Frank Sorenson wrote:
>>> I did some additional debugging, and I believe you're correct about it
>>> being specific to my system.  The system seems to run fine until some
>>> time during the boot.  I booted with "init=/bin/sh" (that's how the
>>> system stayed up for 9 minutes), then it died when I tried starting
>>> things up.  I've further narrowed the OOM down to udev (though it's not
>>> entirely udev's fault, since 2.6.24 runs fine).
>>>
>>> I ran your debug info tool before killing the box by running
>>> /sbin/start_udev.  The output of the tool is at
>>> http://tuxrocks.com/tmp/cfs-debug-info-2008.03.06-14.11.24
>>>
>>> Something is apparently happening between 2.6.24 and 2.6.25-rc[34] which
>>> causes udev (or something it calls) to behave very badly.
>> Found it.  The culprit is 8f47f0b688bba7642dac4e979896e4692177670b
>>     dcdbas: add DMI-based module autloading
>>
>>     DMI autoload dcdbas on all Dell systems.
>>
>>     This looks for BIOS Vendor or System Vendor == Dell, so this should
>>     work for systems both Dell-branded and those Dell builds but brands
>>     for others.  It causes udev to load the dcdbas module at startup,
>>     which is used by tools called by HAL for wireless control and
>>     backlight control, among other uses.
>>
>> What actually happens is that when udev loads the dcdbas module at
>> startup, modprobe apparently calls "modprobe dcdbas" itself, repeating
>> until the system runs out of resources (in this case, it OOMs).
>>
>> # ps axf
>> ...
>>   506 ?        S      0:00 /bin/bash /sbin/start_udev
>>   590 ?        S      0:00  \_ /sbin/udevsettle
>>   533 ?        S<s    0:00 /sbin/udevd -d
>>   629 ?        S<     0:00  \_ /sbin/udevd -d
>>   630 ?        S<     0:00  |   \_ /sbin/modprobe
>> dmi:bvnDellInc.:bvrA08:bd04/02/2007:svnDellInc.:pnMP061:pvr:rvnDellInc.:rn0YD479:rvr:cvnDellInc.:ct8:cvr:
>>   949 ?        S<     0:00  |       \_ /sbin/modprobe dcdbas
>>   950 ?        S<     0:00  |           \_ /sbin/modprobe dcdbas
>>   951 ?        S<     0:00  |               \_ /sbin/modprobe dcdbas
>>   953 ?        S<     0:00  |                   \_ /sbin/modprobe dcdbas
>>   955 ?        S<     0:00  |                       \_ /sbin/modprobe dcdbas
>>   958 ?        S<     0:00  |                           \_
>> /sbin/modprobe dcdbas
>> ...repeat...
>>
>> When the system crashed, there were at least 11,600 instances of
>> "/sbin/modprobe dcdbas", each calling the next.
>>
>> Reverting 8f47f0b lets the system boot up just fine again.  Note that a
>> manual "modprobe dcdbas" also causes this recursive behavior, it's just
>> not forced on the system by udev.
>>
>> So dcdbas is a regression from 2.6.24, as well as being broken in other
>> ways.
>>
>> Frank
>> - --
>> Frank Sorenson - KD7TZK
>> Linux Systems Engineer, DSS Engineering, UBS AG
>> frank@tuxrocks.com
> 
> 
> Frank, what version of module-init-tools do you have?  This has been
> in use in Fedora 8 for a few months, and this is the first failure
> report I've seen.
> 
> I'm fine with reverting the patch for now, but really do want to get
> to root cause, because module autoloading is a really good idea, and
> it would be a shame if we couldn't keep that feature enabled because
> some systems have incompatible module-init-tools, and the kernel can't
> know that...  (Perhaps udev could know and not invoke modprobe in
> those instances?)
> 
> -Matt

It's module-init-tools-3.4-2.fc8.x86_64 (most recent Fedora rpm available).

Frank
- --
Frank Sorenson - KD7TZK
Linux Systems Engineer, DSS Engineering, UBS AG
frank@tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFH0pyoaI0dwg4A47wRAq9rAKCVbg5ngSyHVORpLAcD4WY4vNMQlQCdGtr1
9CiHmom5Vopsqukc8e+D1RU=
=GPMU
-----END PGP SIGNATURE-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-03-08 14:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-06 17:26 2.6.25-rc4 OOMs itself dead on bootup Frank Sorenson
2008-03-06 17:26 ` Frank Sorenson
2008-03-06 18:49 ` Ingo Molnar
2008-03-07 19:27   ` Frank Sorenson
2008-03-07 19:27     ` Frank Sorenson
2008-03-08  7:08     ` Frank Sorenson
2008-03-08  7:08       ` Frank Sorenson
2008-03-08  8:22       ` [patch] revert "dcdbas: add DMI-based module autloading" Ingo Molnar
2008-03-08  8:22         ` Ingo Molnar
2008-03-08 18:16         ` Kay Sievers
2008-03-08 18:16           ` Kay Sievers
2008-03-08 18:57           ` Frank Sorenson
2008-03-08 18:57             ` Frank Sorenson
2008-03-08 13:53       ` 2.6.25-rc4 OOMs itself dead on bootup (modprobe bug?) Matt Domsch
2008-03-08 13:53         ` Matt Domsch
2008-03-08 14:03         ` Frank Sorenson [this message]
2008-03-08 14:03           ` Frank Sorenson
2008-03-08 21:53           ` Jon Masters
2008-03-08 21:53             ` Jon Masters
2008-03-08 22:54             ` Kay Sievers
2008-03-08 22:54               ` Kay Sievers
2008-03-09  3:41               ` Jon Masters
2008-03-09  3:41                 ` Jon Masters

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47D29CAB.50301@tuxrocks.com \
    --to=frank@tuxrocks.com \
    --cc=Matt_Domsch@dell.com \
    --cc=jcm@redhat.com \
    --cc=kay.sievers@vrfy.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.