public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Analyzed/Solved: Booting 2.6.30-rc2-git7 very slow
@ 2009-04-24 12:45 Martin Knoblauch
  2009-04-29  1:28 ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Martin Knoblauch @ 2009-04-24 12:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, efault, tigran aivazian


----- Original Message ----

> From: Martin Knoblauch <knobi@knobisoft.de>
> To: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-kernel@vger.kernel.org; efault@gmx.de
> Sent: Wednesday, April 22, 2009 6:55:27 PM
> Subject: Re: Booting 2.6.30-rc2-git7 very slow
> 
> ----- Original Message ----
> 
> > From: Martin Knoblauch 
> > To: Rafael J. Wysocki 
> > Cc: linux-kernel@vger.kernel.org; efault@gmx.de
> > Sent: Wednesday, April 22, 2009 1:32:25 PM
> > Subject: Re: Booting 2.6.30-rc2-git7 very slow
> > 
> > ----- Original Message ----
> > 
> > > From: Martin Knoblauch 
> > > To: Rafael J. Wysocki 
> > > Cc: linux-kernel@vger.kernel.org
> > > Sent: Wednesday, April 22, 2009 11:56:27 AM
> > > Subject: Re: Booting 2.6.30-rc2-git7 very slow
> > > 
> > > > From: Rafael J. Wysocki 
> > > 
> > > > To: Martin Knoblauch 
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Sent: Tuesday, April 21, 2009 9:12:20 PM
> > > > Subject: Re: Booting 2.6.30-rc2-git7 very slow
> > > > 
> > > > On Tuesday 21 April 2009, Martin Knoblauch wrote:
> > > > > 
> > > > > Hi, [please CC me on replies, as I am not subscribed]
> > > > > 
> > > > >  booting  2.6.30-rc2-git7 on a HP/DL380G3 (x86_64, RHEL4.3, 64 bit 
> > > userspace) 
> > > > is much slower that it used to be. It seems I run into timeouts when 
> [trying 
> > 
> > > to] 
> > > > load intel and tg3 microcodes:
> > > > > 
> > > > > [   14.478892] platform microcode: firmware: requesting 
> > intel-ucode/0f-04-0a
> > > > > [   74.476741] platform microcode: firmware: requesting 
> > intel-ucode/0f-04-0a
> > > > > [  134.476638] platform microcode: firmware: requesting 
> > intel-ucode/0f-04-0a
> > > > > [  194.476637] platform microcode: firmware: requesting 
> > intel-ucode/0f-04-0a
> > > > > [  254.476493] Microcode Update Driver: v2.00 , 
> > > > Peter Oruba
> > > > > [  254.718489] Microcode Update Driver: v2.00 removed.
> > > > > 
> > > > >  So, we see 60 seconds for eaoch of the CPUs
> > > > > 
> > > > > [  255.273426] tg3 0000:03:01.0: PME# disabled
> > > > > [  257.833769] tg3: eth0: Link is up at 1000 Mbps, full duplex.
> > > > > [  257.833775] tg3: eth0: Flow control is off for TX and off for RX.
> > > > > [  269.643973] tg3 0000:03:01.1: firmware: requesting tigon/tg3_tso.bin
> > > > > [  329.640456] eth1: Failed to load firmware "tigon/tg3_tso.bin"
> > > > > [  329.640463] eth1: TSO capability disabled.
> > > > > [  329.640487] tg3 0000:03:01.1: PME# disabled
> > > > > [  333.081753] tg3: eth1: Link is up at 1000 Mbps, full duplex.
> > > > > [  333.081759] tg3: eth1: Flow control is off for TX and off for RX.
> > > > > 
> > > > >  We see 60 seconds for eth1, complaining about a failed firmware load.
> > > > > /lib/firmware/tigon/tg3_tso.bin does exist and is from the current
> > > > > build.
> > > > 
> > > > Do I assume correctly that 2.6.29 did not have this problem?
> > > > 
> > > 
> > > Just checked. 2.6.29 has exactely the same problem. 2.6.28.2 was OK. This is 
> 
> > > from the 2.6.29 boot:
> > > 
> > > [   14.308340] platform microcode: firmware: requesting intel-ucode/0f-04-0a
> > > [   74.304612] platform microcode: firmware: requesting intel-ucode/0f-04-0a
> > > [  134.304651] platform microcode: firmware: requesting intel-ucode/0f-04-0a
> > > [  194.304638] platform microcode: firmware: requesting intel-ucode/0f-04-0a
> > > [  254.304597] Microcode Update Driver: v2.00 ,  Peter Oruba
> > > [  254.546200] Microcode Update Driver: v2.00 removed.
> > 
> > In case of the platform microcode, the delays happen when doing:
> > 
> > /sbin/modprobe microcode
> > 
> > from the init script. I have a "microcode.dat" File in both /etc/ and 
> > /etc/firmware.
> > 
> > > [  255.088405] tg3 0000:03:01.0: PME# disabled
> > > [  257.669617] tg3: eth0: Link is up at 1000 Mbps, full duplex.
> > > [  257.669622] tg3: eth0: Flow control is off for TX and off for RX.
> > > [  269.456132] tg3 0000:03:01.1: firmware: requesting tigon/tg3_tso.bin
> > > [  329.456495] eth1: Failed to load firmware "tigon/tg3_tso.bin"
> > > [  329.456500] eth1: TSO capability disabled.
> > > [  329.456524] tg3 0000:03:01.1: PME# disabled
> > > [  332.921832] tg3: eth1: Link is up at 1000 Mbps, full duplex.
> > > [  332.921837] tg3: eth1: Flow control is off for TX and off for RX.
> > > 
> > > 
> > 
> 
> OK, I was able to solve the tg3 failure by adding "tigon/tg3_tso.bin" to 
> CONFIG_EXTRA_FIRMWARE. I tried the same with the CPU microcode (after copying 
> /etc/firmware/microcode.dat to $BUILDDIR/firmware/), but no success.
> 
> Searching the archives also found some mentioning that I might need special 
> udev-rules to make firmware loading work again transparently. But no explicit 
> solution.
> 
> I find this new (since 2.6.29) behaviour:
> 
> - very unexpected
> - not well documented
> - the EXTRA_FIRMWARE solution very unscalable
> 

 OK, I just found the reason for both intel-ucode and tg3 failures. Apparently between 2.6.28 and 2.6.29 the mount of sysfs has subtely changed from:

/sys /sys sysfs rw 0 0

to:

none /sys sysfs rw,relatime 0 0

 The "none" breaks the RHEL-4 provided hotplug script "firmware.agent" when it tries to parse the mount point for "/sys". As a result, the firmware loading is never properly finished and the driver(s) just timeout on the value  in /sys/class/firmware/timeout. Bingo. Simple fix in user-pace possible - cool down Martin :-)

Questions remains: was this intentional? It breaks existing userspace and should therefore be considered a regression - right? On the other hand, it will never be a problem for RHEL-4/5 kernels, unless the change in 2.6.29 gets backported. Any ideas?

Cheers
Martin


^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
@ 2009-05-20 11:01 Martin Knoblauch
  0 siblings, 0 replies; 56+ messages in thread
From: Martin Knoblauch @ 2009-05-20 11:01 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Andrew Morton, viro, rjw, linux-kernel, tigran, Kay Sievers,
	Rafael J. Wysocki, shemminger


----- Original Message ----

> From: Martin Knoblauch <knobi@knobisoft.de>
> To: Mike Galbraith <efault@gmx.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>; viro@ZenIV.linux.org.uk; rjw@sisk.pl; linux-kernel@vger.kernel.org; tigran@aivazian.fsnet.co.uks; Kay Sievers <kay.sievers@vrfy.org>; Rafael J. Wysocki <rjw@sisk.pl>; shemminger@vyatta.com
> Sent: Wednesday, May 20, 2009 12:22:28 PM
> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
> 
> ----- Original Message ----
> 
> > From: Mike Galbraith 
> > To: Martin Knoblauch 
> > Cc: Andrew Morton ; viro@ZenIV.linux.org.uk; 
> rjw@sisk.pl; linux-kernel@vger.kernel.org; tigran@aivazian.fsnet.co.uk
> > Sent: Wednesday, May 6, 2009 10:37:45 AM
> > Subject: Re: Analyzed/Solved: Booting 2.6.30-rc2-git7 very slow
> > 
> > On Wed, 2009-05-06 at 00:55 -0700, Martin Knoblauch wrote:
> > 
> > >  just to bring this back to my problem :-)
> > 
> > Good idea :-)
> > 
> > >  Last week I reported that the "new" sysfs entry in /proc/mounts already 
> comes 
> > out of initrd. Does this ring a bell?
> > > 
> > > http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/03048.html
> > 
> > Nope, no bells.
> > 
> > The only thing I can suggest is that you try a bisection.
> > 
> >     -Mike
> 
> OK, so I finally managed to bisect the issue down to the following commit. Not 
> much that I can say about it. Someone else suggested that it might all be a 
> question of timing. Might very well be. I will try it out on a system with a 
> different SCSI/RAID controller. The failing system has an "Smart Array 6i" 
> (cciss). "cciss", "ext3" and "jbd" are all modules coming from initrd.
> 
> |commit 1120f8b8169fb2cb51219d326892d963e762edb6
> |Author: Stephen Hemminger 
> |Date:   Thu Dec 18 09:17:16 2008 -0800
> |
> |    PCI: handle long delays in VPD access
> |
> |    Accessing the VPD area can take a long time.  The existing
> |    VPD access code fails consistently on my hardware. There are comments
> |
> |    Change the access routines to:
> |      * use a mutex rather than spinning with IRQ's disabled and lock held
> |      * have a much longer timeout
> |      * call cond_resched while spinning
> |
> |    Signed-off-by: Stephen Hemminger 
> |    Reviewed-by: Matthew Wilcox 
> |    Signed-off-by: Jesse Barnes 
> 


 yup. Different hardware (IBM x3650 with aacraid) does not show the problem. So it seems to be timing related. I more and more tend to view this as "so what" and go along.

Cheers
Martin


^ permalink raw reply	[flat|nested] 56+ messages in thread
* Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
@ 2009-05-27 11:25 Martin Knoblauch
  2009-05-27 20:31 ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Martin Knoblauch @ 2009-05-27 11:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Galbraith, viro, rjw, linux-kernel, Kay Sievers, shemminger,
	Jesse Barnes, Matthew Wilcox, mike miller

[-- Attachment #1: Type: text/plain, Size: 5358 bytes --]

----- Original Message ----

> From: Martin Knoblauch <knobi@knobisoft.de>
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>; viro@ZenIV.linux.org.uk; rjw@sisk.pl; linux-kernel@vger.kernel.org; tigran@aivazian.fsnet.co.uks; Kay Sievers <kay.sievers@vrfy.org>; shemminger@vyatta.com; Jesse Barnes <jbarnes@virtuousgeek.org>; Matthew Wilcox <matthew@wil.cx>
> Sent: Wednesday, May 27, 2009 11:14:06 AM
> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
> 
> ----- Original Message ----
> 
> > From: Andrew Morton 
> > To: Martin Knoblauch 
> > Cc: Mike Galbraith ; viro@ZenIV.linux.org.uk; rjw@sisk.pl; 
> linux-kernel@vger.kernel.org; tigran@aivazian.fsnet.co.uks; Kay Sievers 
> ; shemminger@vyatta.com; Jesse Barnes 
> ; Matthew Wilcox 
> > Sent: Wednesday, May 27, 2009 8:31:02 AM
> > Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
> > 
> > On Wed, 20 May 2009 03:22:28 -0700 (PDT) Martin Knoblauch 
> > wrote:
> > 
> > > 
> > > ----- Original Message ----
> > > 
> > > > From: Mike Galbraith 
> > > > To: Martin Knoblauch 
> > > > Cc: Andrew Morton ; viro@ZenIV.linux.org.uk; 
> > rjw@sisk.pl; linux-kernel@vger.kernel.org; tigran@aivazian.fsnet.co.uk
> > > > Sent: Wednesday, May 6, 2009 10:37:45 AM
> > > > Subject: Re: Analyzed/Solved: Booting 2.6.30-rc2-git7 very slow
> > > > 
> > > > On Wed, 2009-05-06 at 00:55 -0700, Martin Knoblauch wrote:
> > > > 
> > > > >  just to bring this back to my problem :-)
> > > > 
> > > > Good idea :-)
> > > > 
> > > > >  Last week I reported that the "new" sysfs entry in /proc/mounts already 
> 
> > comes 
> > > > out of initrd. Does this ring a bell?
> > > > > 
> > > > > http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/03048.html
> > > > 
> > > > Nope, no bells.
> > > > 
> > > > The only thing I can suggest is that you try a bisection.
> > > > 
> > > >     -Mike
> > > 
> > >  OK, so I finally managed to bisect the issue down to the following commit. 
> > Not much that I can say about it. Someone else suggested that it might all be 
> a 
> > question of timing. Might very well be. I will try it out on a system with a 
> > different SCSI/RAID controller. The failing system has an "Smart Array 6i" 
> > (cciss). "cciss", "ext3" and "jbd" are all modules coming from initrd.
> > > 
> > > |commit 1120f8b8169fb2cb51219d326892d963e762edb6
> > > |Author: Stephen Hemminger 
> > > |Date:   Thu Dec 18 09:17:16 2008 -0800
> > > |
> > > |    PCI: handle long delays in VPD access
> > > |
> > > |    Accessing the VPD area can take a long time.  The existing
> > > |    VPD access code fails consistently on my hardware. There are comments
> > > |
> > > |    Change the access routines to:
> > > |      * use a mutex rather than spinning with IRQ's disabled and lock held
> > > |      * have a much longer timeout
> > > |      * call cond_resched while spinning
> > > |
> > > |    Signed-off-by: Stephen Hemminger 
> > > |    Reviewed-by: Matthew Wilcox 
> > > |    Signed-off-by: Jesse Barnes 
> > > 
> > 
> > 
> > 
> > So afacit what's happening is that the above change caused one of your
> > PCI devices to take a very long time to initialise, yes?  Was it the
> > CCISS driver?
> >
> 
> the whole thing is not understood. I mentioned CCISS only because it is the most 
> visible difference difference between my two test platforms.
> 
> > If you add "printk.time=y" to the kernel boot command line then you'll
> > get timestamped boot messages which will make it easier to determine
> > where the time was consumed.  Adding `initcall_debug' to the boot line
> > will help us delve further into the delay, assuming that the offending
> > driver is build into vmlinux (which it might not be).
> >
> 
> added both options. "dmesg" output from both is appended. The initcall timings 
> vary in both directions. For CCISS, they are actually faster for the "bad" case.
> 
> > Either way, it would be useful to know which driver the above change
> > broke.
> >
> 
> agreed.
> 
> > Once we know that, the questions is: doe sthe driver still work?  If
> > so, then presumably the hardware if behaving unexpectedly, or in a way
> > which we're failing to cope with.
> >
> 
> if it is CCISS, I can definitely say that it does work OK. As far as I can see, 
> the whole system works OK, besides the duplicate sysfs line coming out of 
> initrd.
> 
> > Or perhaps that patch was simply buggy.
> > 
> > btw, I don't agree that this report should be closed for "fuzziness"! 
> > AFACIT the regression clearly and reproducibly occurs on one of your
> > machines, yes?  That ain't fuzzy!
> 
> frankly, I will stop caring about the DL380s before 2.6.31 gets released. My 
> production kernels are not affected and the hotplug scripts can easily be fixed 
> for testting. So, my interest is more curiosity. And the day-job does not really 
> justify spending much more time on it. Aren't day-jobs annoying ...
> 
> Cheers
> Martin

 FWIW, I compiled the CCISS driver into the kernel. This makes the second "/sys" line in /proc/mounts go away, dmesg attached. But does it prove anything? The initialization of the CCISS hardware now happens about 2 seconds earlier in the bootup sequence. Does this hint to a problem with CCISS, or just confirms that the whole issue is really timing dependent? Anyway, I add Mike to CC.

Cheers
Martin

[-- Attachment #2: dmesg-g904d6a3-dirty-modscsi-nomodcciss-good --]
[-- Type: application/octet-stream, Size: 89578 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2009-06-21 13:50 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-24 12:45 Analyzed/Solved: Booting 2.6.30-rc2-git7 very slow Martin Knoblauch
2009-04-29  1:28 ` Andrew Morton
2009-04-29  3:51   ` Mike Galbraith
2009-04-29  8:17     ` Andrew Morton
2009-04-29  9:36       ` Martin Knoblauch
2009-04-29  9:45       ` Martin Knoblauch
2009-04-29 12:08       ` Al Viro
2009-04-29 14:18         ` Mike Galbraith
2009-04-29 14:34           ` Al Viro
2009-04-29 17:28             ` Martin Knoblauch
2009-04-29 19:11             ` Mike Galbraith
2009-05-05 22:49           ` Andrew Morton
2009-05-06  4:45             ` Mike Galbraith
2009-05-06  7:55               ` Martin Knoblauch
2009-05-06  8:37                 ` Mike Galbraith
2009-05-06 10:57                   ` Martin Knoblauch
2009-05-20 10:22                   ` Analyzed/Solved/Bisected: " Martin Knoblauch
2009-05-27  6:31                     ` Andrew Morton
2009-05-27  9:14                       ` Martin Knoblauch
2009-05-27 11:21                       ` Matthew Wilcox
2009-05-27 11:53                         ` Martin Knoblauch
2009-05-27 18:07                           ` jim owens
2009-05-27 18:18                             ` Miller, Mike (OS Dev)
2009-05-27 20:12                               ` jim owens
2009-05-27 21:18                                 ` Miller, Mike (OS Dev)
2009-05-28  8:59                               ` Martin Knoblauch
2009-05-28 19:01                                 ` Miller, Mike (OS Dev)
2009-05-28 20:48                                   ` Martin Knoblauch
2009-04-29 17:24         ` Analyzed/Solved: " Martin Knoblauch
2009-04-29 17:35           ` Valdis.Kletnieks
2009-04-29 17:43             ` Al Viro
2009-04-30 13:02               ` Olivier Galibert
2009-04-29 17:45             ` Martin Knoblauch
2009-04-29 17:41           ` Al Viro
2009-04-29 17:51             ` Martin Knoblauch
2009-04-29 18:10               ` Al Viro
2009-04-30  9:12                 ` Martin Knoblauch
2009-05-27  6:22                   ` Andrew Morton
2009-04-29  9:34   ` Martin Knoblauch
  -- strict thread matches above, loose matches on Subject: below --
2009-05-20 11:01 Analyzed/Solved/Bisected: " Martin Knoblauch
2009-05-27 11:25 Martin Knoblauch
2009-05-27 20:31 ` Andrew Morton
2009-05-27 20:56   ` Kay Sievers
2009-05-28  9:14     ` Martin Knoblauch
2009-06-16 19:25       ` Jesse Barnes
2009-06-17  8:35         ` Martin Knoblauch
2009-06-20 16:37           ` jim owens
2009-06-20 16:58             ` Matthew Wilcox
2009-06-20 18:19               ` Kay Sievers
2009-06-20 18:26                 ` Matthew Wilcox
2009-06-20 18:36                   ` Kay Sievers
2009-06-20 19:06                     ` Matthew Wilcox
2009-06-20 21:17                       ` jim owens
2009-06-21 10:57                         ` Martin Knoblauch
2009-06-21 13:50                           ` jim owens
2009-06-21 10:54             ` Martin Knoblauch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox