* long boot delays caused by 070ad7e7 floppy change
@ 2012-07-02 22:46 Andi Kleen
2012-07-02 23:01 ` Linus Torvalds
0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 22:46 UTC (permalink / raw)
To: Jiri Kosina; +Cc: linux-kernel, torvalds, shemminger
Jiri,
One of my test systems has a new 20s+ boot delay in 3.5rc*
It takes the floppy driver that long to determine I don't
have a floppy
[ 2.386322] calling floppy_init+0x0/0xe46 @ 1
[ 2.448977] Refined TSC clocksource calibration: 2594.106 MHz.
[ 2.449021] Switching to clocksource tsc
[ 22.371560] floppy0: no floppy controllers found
[ 22.371676] initcall floppy_init+0x0/0xe46 returned -19 after 19553621 usecs
In older kernels the delay is only about 3s.
I tracked it down to
commit 070ad7e793dc6ff753ee682ef7790b3373b471f6
Author: Jiri Kosina <jkosina@suse.cz>
Date: Fri May 18 13:50:25 2012 +0200
floppy: convert to delayed work and single-thread wq
The problem is not completely stable, a bisect actually doesn't
find it and reverting the patch on current HEAD also doesn't help.
However when setting the tree manually to one commit before
yours the problem is gone and adding your patch adds it.
I think the floppy init should run asynchronous, but it
would be still good to make this faster again. Since block
devices need to be synchronized at end of boot there is no
way to hide a 20s delay completely.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-02 22:46 long boot delays caused by 070ad7e7 floppy change Andi Kleen @ 2012-07-02 23:01 ` Linus Torvalds 2012-07-03 13:47 ` Calvin Walton 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-07-02 23:01 UTC (permalink / raw) To: Andi Kleen; +Cc: Jiri Kosina, linux-kernel, shemminger On Mon, Jul 2, 2012 at 3:46 PM, Andi Kleen <andi@firstfloor.org> wrote: > > One of my test systems has a new 20s+ boot delay in 3.5rc* > It takes the floppy driver that long to determine I don't > have a floppy Can you enable debug output ("floppy=debug" iirc) to give more details of what happens leading up to the delay? Maybe the timing of the debug messages would give some hints too.. The fact that reverting the patch on current -git doesn't actually help makes me curious. And means that we can't fix it by reverting, so we really do have to figure out what's going on. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-02 23:01 ` Linus Torvalds @ 2012-07-03 13:47 ` Calvin Walton 2012-07-03 17:07 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Calvin Walton @ 2012-07-03 13:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Mon, 2012-07-02 at 16:01 -0700, Linus Torvalds wrote: > On Mon, Jul 2, 2012 at 3:46 PM, Andi Kleen <andi@firstfloor.org> wrote: > > > > One of my test systems has a new 20s+ boot delay in 3.5rc* > > It takes the floppy driver that long to determine I don't > > have a floppy > > Can you enable debug output ("floppy=debug" iirc) to give more details > of what happens leading up to the delay? Maybe the timing of the debug > messages would give some hints too.. I'm seeing the same issue, so I've given this a try. Here are the messages printed when floppy debugging is enabled: [ 0.000000] floppy0: Setting flag 0x1 ... [ 0.703201] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 0.703506] Non-volatile memory driver v1.3 [ 0.714318] floppy0: reschedule timeout lock fdc [ 1.651396] Refined TSC clocksource calibration: 2698.759 MHz. [ 1.651548] Switching to clocksource tsc [ 20.729798] floppy0: reschedule timeout do wakeup [ 20.732654] floppy0: no floppy controllers found [ 20.733220] loop: module loaded [ 20.733394] ahci 0000:00:1f.2: version 3.0 [ 20.733430] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X The 'reschedule timeout' messages are only printed when debugging is enabled. Hopefully that can give someone a hint as to what's going on? -- Calvin Walton <calvin.walton@kepstin.ca> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 13:47 ` Calvin Walton @ 2012-07-03 17:07 ` Linus Torvalds 2012-07-03 17:40 ` Calvin Walton 2012-07-03 17:45 ` Andi Kleen 0 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 17:07 UTC (permalink / raw) To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, Jul 3, 2012 at 6:47 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote: > > I'm seeing the same issue, so I've given this a try. Here are the > messages printed when floppy debugging is enabled: > > [ 0.714318] floppy0: reschedule timeout lock fdc > [ 20.729798] floppy0: reschedule timeout do wakeup Ok, that 20 s timeout is a hint. 20s is the floppy "max timeout" value, and it's used in various places. HOWEVER. The "lock fdc" case absolutely should not be using that timeout. It should use the floppy "UDP->timeout" value, which should be just 3s. Which is still a long time, but not nearly as long as that. Can you add a DPRINT() to the lock_fdc() function just before calling reschedule_timeout() to print out what the "drive" is? Something like yhhis (whitespace-damaged due to cut-and-paste): diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cce7df367b79..00b52cef9d22 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -876,6 +876,7 @@ static int lock_fdc(int drive, bool interruptible) command_status = FD_COMMAND_NONE; + DPRINT("lock_fdc: %d (out of %d)\n", drive, N_DRIVE); reschedule_timeout(drive, "lock fdc"); set_fdc(drive); return 0; because maybe the 'drive' variable is crap there. In fact, add a WARN_ON_ONCE(dive < 0 || drive >= N_DRIVE); in there too. Because if it is out of range, I'd want to see the call chain. (And you still need floppy=debug to see that debug message, of course). Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:07 ` Linus Torvalds @ 2012-07-03 17:40 ` Calvin Walton 2012-07-03 17:50 ` Linus Torvalds 2012-07-03 17:45 ` Andi Kleen 1 sibling, 1 reply; 20+ messages in thread From: Calvin Walton @ 2012-07-03 17:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, 2012-07-03 at 10:07 -0700, Linus Torvalds wrote: > On Tue, Jul 3, 2012 at 6:47 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote: > > > > I'm seeing the same issue, so I've given this a try. Here are the > > messages printed when floppy debugging is enabled: > > > > [ 0.714318] floppy0: reschedule timeout lock fdc > > [ 20.729798] floppy0: reschedule timeout do wakeup > > Ok, that 20 s timeout is a hint. > > 20s is the floppy "max timeout" value, and it's used in various > places. HOWEVER. The "lock fdc" case absolutely should not be using > that timeout. It should use the floppy "UDP->timeout" value, which > should be just 3s. Which is still a long time, but not nearly as long > as that. > > Can you add a DPRINT() to the lock_fdc() function just before calling > reschedule_timeout() to print out what the "drive" is? Ok, here's the dmesg output that I get: [ 0.661966] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled [ 0.682589] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 0.703303] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 0.703599] Non-volatile memory driver v1.3 [ 0.714316] floppy0: lock_fdc: -1 (out of 8) [ 0.714462] ------------[ cut here ]------------ [ 0.714606] WARNING: at drivers/block/floppy.c:880 lock_fdc.isra.30+0x156/0x170() [ 0.717372] Hardware name: EX58-DS4 [ 0.717511] Modules linked in: [ 0.717649] Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc4-00230-g15114c7-dirty #21 [ 0.717872] Call Trace: [ 0.718010] [<ffffffff810686aa>] warn_slowpath_common+0x7a/0xb0 [ 0.718154] [<ffffffff810686f5>] warn_slowpath_null+0x15/0x20 [ 0.718306] [<ffffffff813e3d96>] lock_fdc.isra.30+0x156/0x170 [ 0.718448] [<ffffffff813e46f1>] user_reset_fdc+0x11/0x80 [ 0.718592] [<ffffffff81cbb854>] floppy_init+0x7d7/0xe15 [ 0.718737] [<ffffffff81cbb07d>] ? set_cmos+0x6e/0x6e [ 0.718879] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160 [ 0.719024] [<ffffffff81c95c4f>] kernel_init+0x101/0x185 [ 0.719169] [<ffffffff81c955c5>] ? do_early_param+0x87/0x87 [ 0.719317] [<ffffffff81760d94>] kernel_thread_helper+0x4/0x10 [ 0.719460] [<ffffffff81c95b4e>] ? start_kernel+0x32f/0x32f [ 0.719604] [<ffffffff81760d90>] ? gs_change+0xb/0xb [ 0.719748] ---[ end trace 16b38434e9defc9e ]--- [ 0.719888] floppy0: reschedule timeout lock fdc [ 1.651398] Refined TSC clocksource calibration: 2698.759 MHz. [ 1.651551] Switching to clocksource tsc [ 20.729857] floppy0: reschedule timeout do wakeup [ 20.730009] floppy0: no floppy controllers found [ 20.730578] loop: module loaded [ 20.730746] ahci 0000:00:1f.2: version 3.0 [ 20.730782] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X when using this debugging patch: diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cce7df3..8a41259 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -876,6 +876,9 @@ static int lock_fdc(int drive, bool interruptible) command_status = FD_COMMAND_NONE; + DPRINT("lock_fdc: %d (out of %d)\n", drive, N_DRIVE); + WARN_ON_ONCE(drive < 0 || drive >= N_DRIVE); + reschedule_timeout(drive, "lock fdc"); set_fdc(drive); return 0; -- Calvin Walton <calvin.walton@kepstin.ca> ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:40 ` Calvin Walton @ 2012-07-03 17:50 ` Linus Torvalds 2012-07-03 17:57 ` Jiri Kosina 2012-07-03 17:58 ` Linus Torvalds 0 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 17:50 UTC (permalink / raw) To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, Jul 3, 2012 at 10:40 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote: > > Ok, here's the dmesg output that I get: > > [ 0.714316] floppy0: lock_fdc: -1 (out of 8) Ok, thanks. That -1 definitely explains the long delay. It's using "drive=-1" because it has no actual drive number, just the *controller* number. And the backtrace shows what's going on too: it's the user_reset_fdc(-1, FD_RESET_ALWAYS, false); that passes in -1 as the floppy controller number. Ugh. It looks like the code meant for -1 to mean "regular delay" (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1" logic has gotten screwed. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:50 ` Linus Torvalds @ 2012-07-03 17:57 ` Jiri Kosina 2012-07-03 18:45 ` Andi Kleen 2012-07-03 17:58 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2012-07-03 17:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger On Tue, 3 Jul 2012, Linus Torvalds wrote: > > Ok, here's the dmesg output that I get: > > > > [ 0.714316] floppy0: lock_fdc: -1 (out of 8) > > Ok, thanks. That -1 definitely explains the long delay. It's using > "drive=-1" because it has no actual drive number, just the > *controller* number. > > And the backtrace shows what's going on too: it's the > > user_reset_fdc(-1, FD_RESET_ALWAYS, false); > > that passes in -1 as the floppy controller number. > > Ugh. > > It looks like the code meant for -1 to mean "regular delay" > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1" > logic has gotten screwed. I am still trying to understand the exact situation. Andi, do I understand you correctly that even if you revert 070ad7e793dc on top of current Linus' tree, you are still seeing the 20 second delay inside the initcall? -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:57 ` Jiri Kosina @ 2012-07-03 18:45 ` Andi Kleen 0 siblings, 0 replies; 20+ messages in thread From: Andi Kleen @ 2012-07-03 18:45 UTC (permalink / raw) To: Jiri Kosina Cc: Linus Torvalds, Calvin Walton, Andi Kleen, linux-kernel, shemminger On Tue, Jul 03, 2012 at 07:57:15PM +0200, Jiri Kosina wrote: > On Tue, 3 Jul 2012, Linus Torvalds wrote: > > > > Ok, here's the dmesg output that I get: > > > > > > [ 0.714316] floppy0: lock_fdc: -1 (out of 8) > > > > Ok, thanks. That -1 definitely explains the long delay. It's using > > "drive=-1" because it has no actual drive number, just the > > *controller* number. > > > > And the backtrace shows what's going on too: it's the > > > > user_reset_fdc(-1, FD_RESET_ALWAYS, false); > > > > that passes in -1 as the floppy controller number. > > > > Ugh. > > > > It looks like the code meant for -1 to mean "regular delay" > > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1" > > logic has gotten screwed. > > I am still trying to understand the exact situation. > > Andi, do I understand you correctly that even if you revert 070ad7e793dc > on top of current Linus' tree, you are still seeing the 20 second delay > inside the initcall? Correct. Here's the debug output with just reverted (without Linus patch): [ 2.489559] calling floppy_init+0x0/0xe46 @ 1 [ 2.504363] floppy0: lock_fdc: -1 (out of 8) [ 2.504410] floppy0: reschedule timeout lock fdc [ 22.498784] floppy0: reschedule timeout do wakeup [ 22.498811] floppy0: no floppy controllers found [ 22.498944] initcall floppy_init+0x0/0xe46 returned -19 after 19577158 usecs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:50 ` Linus Torvalds 2012-07-03 17:57 ` Jiri Kosina @ 2012-07-03 17:58 ` Linus Torvalds 2012-07-03 18:15 ` Jiri Kosina 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 17:58 UTC (permalink / raw) To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, Jul 3, 2012 at 10:50 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > It looks like the code meant for -1 to mean "regular delay" > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1" > logic has gotten screwed. So it looks like current_reqD still exists, and sets the drive to "current_drive". Which *should* be 0, as far as I can tell. In fact, since "current_drive" is printed out by DPRINTK(), we *know* it's 0 (it's the "0" in the "floppy0:" message). But how the heck does it pick that bogus 20 second delay then? The -1 should have been caught by if (drive == current_reqD) drive = current_drive; and then we should have used the regular delay for drive 0. Jiri, any ideas? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:58 ` Linus Torvalds @ 2012-07-03 18:15 ` Jiri Kosina 2012-07-03 18:58 ` Andi Kleen 0 siblings, 1 reply; 20+ messages in thread From: Jiri Kosina @ 2012-07-03 18:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger On Tue, 3 Jul 2012, Linus Torvalds wrote: > > It looks like the code meant for -1 to mean "regular delay" > > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1" > > logic has gotten screwed. > > So it looks like current_reqD still exists, and sets the drive to > "current_drive". Which *should* be 0, as far as I can tell. In fact, > since "current_drive" is printed out by DPRINTK(), we *know* it's 0 > (it's the "0" in the "floppy0:" message). > > But how the heck does it pick that bogus 20 second delay then? The -1 > should have been caught by > > if (drive == current_reqD) > drive = current_drive; > > and then we should have used the regular delay for drive 0. > > Jiri, any ideas? I am not seeing how this could be happening at the moment. I am also a bit puzzled by Andi's statement that reverting the conversion to the single wq is not fixing it. Have we by any chance changed anything in wq processing during early bootup lately? Calvin, Andi, could you please dump the value of UDP->timeout in __reschedule_timeout(), please? diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cce7df3..9850120 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -675,8 +675,11 @@ static void __reschedule_timeout(int drive, const char *message) if (drive < 0 || drive >= N_DRIVE) { delay = 20UL * HZ; drive = 0; - } else + } else { + DPRINT("__reschedule_timeout: setting timeout to %ld\n", + UDP->timeout); delay = UDP->timeout; + } queue_delayed_work(floppy_wq, &fd_timeout, delay); if (UDP->flags & FD_DEBUG) -- Jiri Kosina SUSE Labs ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 18:15 ` Jiri Kosina @ 2012-07-03 18:58 ` Andi Kleen 2012-07-03 19:12 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Andi Kleen @ 2012-07-03 18:58 UTC (permalink / raw) To: Jiri Kosina Cc: Linus Torvalds, Calvin Walton, Andi Kleen, linux-kernel, shemminger > Calvin, Andi, could you please dump the value of UDP->timeout in > __reschedule_timeout(), please? [ 2.496493] calling floppy_init+0x0/0xe46 @ 1 [ 2.511303] floppy0: lock_fdc: -1 (out of 8) [ 2.511346] floppy0: __reschedule_timeout: setting timeout to 750 [ 2.511372] floppy0: reschedule timeout lock fdc [ 22.501757] floppy0: reschedule timeout do wakeup [ 22.501784] floppy0: no floppy controllers found [ 22.501919] initcall floppy_init+0x0/0xe46 returned -19 after 19573258 usecs -Andi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 18:58 ` Andi Kleen @ 2012-07-03 19:12 ` Linus Torvalds 2012-07-03 19:36 ` Calvin Walton 2012-07-03 19:40 ` Andi Kleen 0 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 19:12 UTC (permalink / raw) To: Andi Kleen; +Cc: Jiri Kosina, Calvin Walton, linux-kernel, shemminger On Tue, Jul 3, 2012 at 11:58 AM, Andi Kleen <andi@firstfloor.org> wrote: > > [ 2.496493] calling floppy_init+0x0/0xe46 @ 1 > [ 2.511303] floppy0: lock_fdc: -1 (out of 8) > [ 2.511346] floppy0: __reschedule_timeout: setting timeout to 750 Ahhah. And you have CONFIG_HZ250 set, don't you? So it's actually asking for a 3-second timeout like it should. > [ 2.511372] floppy0: reschedule timeout lock fdc > [ 22.501757] floppy0: reschedule timeout do wakeup But that actual timeout is too damn reliably pretty much exactly 20s for it to be anything else. I wonder if 'fd_timeout' had been previously queued already with the 20s timeout, and not canceled. So then the new queuing of a 3-second timeout doesn't actually do anything, because the WORK_STRUCT_PENDING_BIT is still set. So maybe we're missing a cancel somewhere. What happens if you add a cancel_delayed_work(&fd_timeout); to before the queue_delayed_work() in __reschedule_timeout()? Does that possibly make the delay really be 3 seconds? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 19:12 ` Linus Torvalds @ 2012-07-03 19:36 ` Calvin Walton 2012-07-03 19:42 ` Andi Kleen 2012-07-03 20:05 ` Linus Torvalds 2012-07-03 19:40 ` Andi Kleen 1 sibling, 2 replies; 20+ messages in thread From: Calvin Walton @ 2012-07-03 19:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote: > What happens if you add a > > cancel_delayed_work(&fd_timeout); > > to before the queue_delayed_work() in __reschedule_timeout()? Does > that possibly make the delay really be 3 seconds? Yes, it does... [ 0.718571] floppy0: reschedule timeout lock fdc [ 1.650956] Refined TSC clocksource calibration: 2698.760 MHz. [ 1.651109] Switching to clocksource tsc [ 3.724664] floppy0: reschedule timeout do wakeup [ 3.724815] floppy0: no floppy controllers found diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cce7df3..c8064e4 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -678,6 +678,8 @@ static void __reschedule_timeout(int drive, const char *message) } else delay = UDP->timeout; + cancel_delayed_work(&fd_timeout); + queue_delayed_work(floppy_wq, &fd_timeout, delay); if (UDP->flags & FD_DEBUG) DPRINT("reschedule timeout %s\n", message); Of course, a 3 second delay at this point in boot is still a fairly big bit of waiting, given how fast everything else is nowadays. -- Calvin Walton <calvin.walton@kepstin.ca> ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 19:36 ` Calvin Walton @ 2012-07-03 19:42 ` Andi Kleen 2012-07-03 20:05 ` Linus Torvalds 1 sibling, 0 replies; 20+ messages in thread From: Andi Kleen @ 2012-07-03 19:42 UTC (permalink / raw) To: Calvin Walton Cc: Linus Torvalds, Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, Jul 03, 2012 at 03:36:40PM -0400, Calvin Walton wrote: > On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote: > > What happens if you add a > > > > cancel_delayed_work(&fd_timeout); > > > > to before the queue_delayed_work() in __reschedule_timeout()? Does > > that possibly make the delay really be 3 seconds? > > Yes, it does... > [ 0.718571] floppy0: reschedule timeout lock fdc > [ 1.650956] Refined TSC clocksource calibration: 2698.760 MHz. > [ 1.651109] Switching to clocksource tsc > [ 3.724664] floppy0: reschedule timeout do wakeup > [ 3.724815] floppy0: no floppy controllers found > > diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c > index cce7df3..c8064e4 100644 > --- a/drivers/block/floppy.c > +++ b/drivers/block/floppy.c > @@ -678,6 +678,8 @@ static void __reschedule_timeout(int drive, const char *message) > } else > delay = UDP->timeout; > > + cancel_delayed_work(&fd_timeout); > + > queue_delayed_work(floppy_wq, &fd_timeout, delay); > if (UDP->flags & FD_DEBUG) > DPRINT("reschedule timeout %s\n", message); > > Of course, a 3 second delay at this point in boot is still a fairly big > bit of waiting, given how fast everything else is nowadays. Here's the patch I did for that yesterday >From 04a4d851d1c96b6685352d9cef41b77966cc7a1e Mon Sep 17 00:00:00 2001 From: Andi Kleen <ak@linux.intel.com> Date: Mon, 2 Jul 2012 17:07:08 -0700 Subject: [PATCH] floppy: Run floppy initialization asynchronous floppy_init is quite slow, 3s on my test system to determine that there is no floppy. Run it asynchronous to the other init calls to improve boot time. Cuts down boot time by 1s on a system of mine. Signed-off-by: Andi Kleen <ak@linux.intel.com> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index cce7df3..8b5769b 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -191,6 +191,7 @@ static int print_unex = 1; #include <linux/mutex.h> #include <linux/io.h> #include <linux/uaccess.h> +#include <linux/async.h> /* * PS/2 floppies have much slower step rates than regular floppies. @@ -4122,7 +4123,7 @@ static struct kobject *floppy_find(dev_t dev, int *part, void *data) return get_disk(disks[drive]); } -static int __init floppy_init(void) +static int __init do_floppy_init(void) { int i, unit, drive; int err, dr; @@ -4337,6 +4338,24 @@ out_put_disk: return err; } +#ifndef MODULE +static __init void floppy_async_init(void *data, async_cookie_t cookie) +{ + do_floppy_init(); +} +#endif + +static int __init floppy_init(void) +{ +#ifdef MODULE + return do_floppy_init(void); +#else + /* Don't hold up the bootup by the floppy initialization */ + async_schedule(floppy_async_init, NULL); + return 0; +#endif +} + static const struct io_region { int offset; int size; -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 19:36 ` Calvin Walton 2012-07-03 19:42 ` Andi Kleen @ 2012-07-03 20:05 ` Linus Torvalds 2012-07-03 20:11 ` Andi Kleen 2012-07-03 23:06 ` Jiri Kosina 1 sibling, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 20:05 UTC (permalink / raw) To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger On Tue, Jul 3, 2012 at 12:36 PM, Calvin Walton <calvin.walton@kepstin.ca> wrote: > On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote: >> What happens if you add a >> >> cancel_delayed_work(&fd_timeout); >> >> to before the queue_delayed_work() in __reschedule_timeout()? Does >> that possibly make the delay really be 3 seconds? > > Yes, it does... Ok. And I see the previous timeout - it's this: reschedule_timeout(MAXTIMEOUT, "floppy init"); that sets the fd_timeout to 20 seconds initially. That whole fd_timeout code is broken, though. I'm not sure what the floppy init timeout is supposed to do, when the floppy reset code wants to re-use it. And yes, the 3-second timeout is still too long. Doing it asynchronously like Andi does helps a bit, but I suspect we could make the reset timeout shorter still just to make the initial "you don't have a floppy drive" code go faster. That said, why even compile in the floppy driver any more? Even if you have the hardware, it probably doesn't work after ten years of gathering dust. Those floppy drives weren't exactly reliable even back in the days.. Anyway, I looked up the 82078 docs for reset. Holding the reset low *does* cause an interrupt, but I'm not finding how long the reset might take. But for the controller it really should be milliseconds, not seconds, I suspect. Can anybody find the appropriate reset timeout? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 20:05 ` Linus Torvalds @ 2012-07-03 20:11 ` Andi Kleen 2012-07-03 23:06 ` Jiri Kosina 1 sibling, 0 replies; 20+ messages in thread From: Andi Kleen @ 2012-07-03 20:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Calvin Walton, Jiri Kosina, linux-kernel, shemminger Linus Torvalds <torvalds@linux-foundation.org> writes: > > Doing it asynchronously like Andi does helps a bit, I think it would be more effective if the floppy driver initialized earlier, I think I have some slowdown because it starts too late. > but I suspect we > could make the reset timeout shorter still just to make the initial > "you don't have a floppy drive" code go faster. > > That said, why even compile in the floppy driver any more? I used to boot simulators from make fdimage some time ago. Until the kernels got too large :-) But yes make isoimage is the way to go and I'll remove it from my configs. Thanks for the debugging. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 20:05 ` Linus Torvalds 2012-07-03 20:11 ` Andi Kleen @ 2012-07-03 23:06 ` Jiri Kosina 1 sibling, 0 replies; 20+ messages in thread From: Jiri Kosina @ 2012-07-03 23:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger On Tue, 3 Jul 2012, Linus Torvalds wrote: > >> What happens if you add a > >> > >> cancel_delayed_work(&fd_timeout); > >> > >> to before the queue_delayed_work() in __reschedule_timeout()? Does > >> that possibly make the delay really be 3 seconds? > > > > Yes, it does... > > Ok. And I see the previous timeout - it's this: > > reschedule_timeout(MAXTIMEOUT, "floppy init"); > > that sets the fd_timeout to 20 seconds initially. Still doesn't explain why reverting 070ad7e793dc6 (which made the whole thing completely serialized) doesn't fix it though; that's quite a mystery to me. > That whole fd_timeout code is broken, though. I'm not sure what the > floppy init timeout is supposed to do, when the floppy reset code wants > to re-use it. > > And yes, the 3-second timeout is still too long. > > Doing it asynchronously like Andi does helps a bit, but I suspect we > could make the reset timeout shorter still just to make the initial > "you don't have a floppy drive" code go faster. > > That said, why even compile in the floppy driver any more? Even if you > have the hardware, it probably doesn't work after ten years of > gathering dust. Those floppy drives weren't exactly reliable even back > in the days.. > > Anyway, I looked up the 82078 docs for reset. Holding the reset low > *does* cause an interrupt, but I'm not finding how long the reset > might take. But for the controller it really should be milliseconds, > not seconds, I suspect. Can anybody find the appropriate reset > timeout? I will look into it once I am back again from vacation next week. In the meantime, I'd propose to simply add the cancel_delayed_work() for 3.5; it's an obvious bug I made during the timer -> workqueue conversion, as we used to do del_timer() before. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 19:12 ` Linus Torvalds 2012-07-03 19:36 ` Calvin Walton @ 2012-07-03 19:40 ` Andi Kleen 2012-07-03 23:01 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Andi Kleen @ 2012-07-03 19:40 UTC (permalink / raw) To: Linus Torvalds Cc: Andi Kleen, Jiri Kosina, Calvin Walton, linux-kernel, shemminger On Tue, Jul 03, 2012 at 12:12:53PM -0700, Linus Torvalds wrote: > On Tue, Jul 3, 2012 at 11:58 AM, Andi Kleen <andi@firstfloor.org> wrote: > > > > [ 2.496493] calling floppy_init+0x0/0xe46 @ 1 > > [ 2.511303] floppy0: lock_fdc: -1 (out of 8) > > [ 2.511346] floppy0: __reschedule_timeout: setting timeout to 750 > > Ahhah. And you have CONFIG_HZ250 set, don't you? Yes I do. > > So it's actually asking for a 3-second timeout like it should. > > > [ 2.511372] floppy0: reschedule timeout lock fdc > > [ 22.501757] floppy0: reschedule timeout do wakeup > > But that actual timeout is too damn reliably pretty much exactly 20s > for it to be anything else. > > I wonder if 'fd_timeout' had been previously queued already with the > 20s timeout, and not canceled. So then the new queuing of a 3-second > timeout doesn't actually do anything, because the > WORK_STRUCT_PENDING_BIT is still set. > > So maybe we're missing a cancel somewhere. > > What happens if you add a > > cancel_delayed_work(&fd_timeout); > > to before the queue_delayed_work() in __reschedule_timeout()? Does > that possibly make the delay really be 3 seconds? That works. Patch cuts it down to 3s as before Tested-by: Andi Kleen <ak@linux.intel.com> (and with the async probe patch I posted earlier the kernel boot time goes down by ~1s from 14 to 13ish) -Andi With Linus patch: [ 2.492449] calling floppy_init+0x0/0xe46 @ 1 [ 2.507259] floppy0: lock_fdc: -1 (out of 8) [ 2.507304] floppy0: __reschedule_timeout: setting timeout to 750 [ 2.507330] floppy0: reschedule timeout lock fdc [ 5.509642] floppy0: reschedule timeout do wakeup [ 5.509683] floppy0: no floppy controllers found [ 5.509796] initcall floppy_init+0x0/0xe46 returned -19 after 2952148 usecs [ 5.509826] calling brd_init+0x0/0x1ce @ 1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 19:40 ` Andi Kleen @ 2012-07-03 23:01 ` Linus Torvalds 0 siblings, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2012-07-03 23:01 UTC (permalink / raw) To: Andi Kleen; +Cc: Jiri Kosina, Calvin Walton, linux-kernel, shemminger On Tue, Jul 3, 2012 at 12:40 PM, Andi Kleen <andi@firstfloor.org> wrote: > > That works. Patch cuts it down to 3s as before Ok, I committed it and pushed out. That said, I'm surprised that you report that the revert doesn't work, though. Before commit 070ad7e793dc, there was a "del_timer()" there, and re-introducing that del_timer() (and canceling the work is the equivalent of that) should have taken care of the issue. Odd, but I can't find it in myself to care too much. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: long boot delays caused by 070ad7e7 floppy change 2012-07-03 17:07 ` Linus Torvalds 2012-07-03 17:40 ` Calvin Walton @ 2012-07-03 17:45 ` Andi Kleen 1 sibling, 0 replies; 20+ messages in thread From: Andi Kleen @ 2012-07-03 17:45 UTC (permalink / raw) To: Linus Torvalds Cc: Calvin Walton, Andi Kleen, Jiri Kosina, linux-kernel, shemminger Sorry for the delay. Here's the output on my system with floppy=debug and Linus' debug patch: [ 2.480521] calling floppy_init+0x0/0xe46 @ 1 [ 2.495325] floppy0: lock_fdc: -1 (out of 8) [ 2.495369] floppy0: reschedule timeout lock fdc [ 22.501736] floppy0: reschedule timeout do wakeup [ 22.501762] floppy0: no floppy controllers found [ 22.501896] initcall floppy_init+0x0/0xe46 returned -19 after 19588875 usecs -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-07-03 23:06 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-07-02 22:46 long boot delays caused by 070ad7e7 floppy change Andi Kleen 2012-07-02 23:01 ` Linus Torvalds 2012-07-03 13:47 ` Calvin Walton 2012-07-03 17:07 ` Linus Torvalds 2012-07-03 17:40 ` Calvin Walton 2012-07-03 17:50 ` Linus Torvalds 2012-07-03 17:57 ` Jiri Kosina 2012-07-03 18:45 ` Andi Kleen 2012-07-03 17:58 ` Linus Torvalds 2012-07-03 18:15 ` Jiri Kosina 2012-07-03 18:58 ` Andi Kleen 2012-07-03 19:12 ` Linus Torvalds 2012-07-03 19:36 ` Calvin Walton 2012-07-03 19:42 ` Andi Kleen 2012-07-03 20:05 ` Linus Torvalds 2012-07-03 20:11 ` Andi Kleen 2012-07-03 23:06 ` Jiri Kosina 2012-07-03 19:40 ` Andi Kleen 2012-07-03 23:01 ` Linus Torvalds 2012-07-03 17:45 ` Andi Kleen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.