All of lore.kernel.org
 help / color / mirror / Atom feed
* long boot delays caused by 070ad7e7 floppy change
@ 2012-07-02 22:46 Andi Kleen
  2012-07-02 23:01 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 22:46 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: linux-kernel, torvalds, shemminger

Jiri,

One of my test systems has a new 20s+ boot delay in 3.5rc*
It takes the floppy driver that long to determine I don't 
have a floppy

[    2.386322] calling  floppy_init+0x0/0xe46 @ 1
[    2.448977] Refined TSC clocksource calibration: 2594.106 MHz.
[    2.449021] Switching to clocksource tsc
[   22.371560] floppy0: no floppy controllers found
[   22.371676] initcall floppy_init+0x0/0xe46 returned -19 after 19553621 usecs

In older kernels the delay is only about 3s.

I tracked it down to 

commit 070ad7e793dc6ff753ee682ef7790b3373b471f6
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Fri May 18 13:50:25 2012 +0200

    floppy: convert to delayed work and single-thread wq

The problem is not completely stable, a bisect actually doesn't
find it and reverting the patch on current HEAD also doesn't help.

However when setting the tree manually to one commit before
yours the problem is gone and adding your patch adds it.

I think the floppy init should run asynchronous, but it 
would be still good to make this faster again. Since block
devices need to be synchronized at end of boot there is no
way to hide a 20s delay completely.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-02 22:46 long boot delays caused by 070ad7e7 floppy change Andi Kleen
@ 2012-07-02 23:01 ` Linus Torvalds
  2012-07-03 13:47   ` Calvin Walton
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-07-02 23:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Kosina, linux-kernel, shemminger

On Mon, Jul 2, 2012 at 3:46 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> One of my test systems has a new 20s+ boot delay in 3.5rc*
> It takes the floppy driver that long to determine I don't
> have a floppy

Can you enable debug output ("floppy=debug" iirc) to give more details
of what happens leading up to the delay? Maybe the timing of the debug
messages would give some hints too..

The fact that reverting the patch on current -git doesn't actually
help makes me curious. And means that we can't fix it by reverting, so
we really do have to figure out what's going on.

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-02 23:01 ` Linus Torvalds
@ 2012-07-03 13:47   ` Calvin Walton
  2012-07-03 17:07     ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Calvin Walton @ 2012-07-03 13:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Mon, 2012-07-02 at 16:01 -0700, Linus Torvalds wrote:
> On Mon, Jul 2, 2012 at 3:46 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > One of my test systems has a new 20s+ boot delay in 3.5rc*
> > It takes the floppy driver that long to determine I don't
> > have a floppy
> 
> Can you enable debug output ("floppy=debug" iirc) to give more details
> of what happens leading up to the delay? Maybe the timing of the debug
> messages would give some hints too..

I'm seeing the same issue, so I've given this a try. Here are the
messages printed when floppy debugging is enabled:

[    0.000000] floppy0: Setting flag 0x1
...
[    0.703201] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.703506] Non-volatile memory driver v1.3
[    0.714318] floppy0: reschedule timeout lock fdc
[    1.651396] Refined TSC clocksource calibration: 2698.759 MHz.
[    1.651548] Switching to clocksource tsc
[   20.729798] floppy0: reschedule timeout do wakeup
[   20.732654] floppy0: no floppy controllers found
[   20.733220] loop: module loaded
[   20.733394] ahci 0000:00:1f.2: version 3.0
[   20.733430] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X

The 'reschedule timeout' messages are only printed when debugging is
enabled. Hopefully that can give someone a hint as to what's going on?
-- 
Calvin Walton <calvin.walton@kepstin.ca>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 13:47   ` Calvin Walton
@ 2012-07-03 17:07     ` Linus Torvalds
  2012-07-03 17:40       ` Calvin Walton
  2012-07-03 17:45       ` Andi Kleen
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 17:07 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 6:47 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote:
>
> I'm seeing the same issue, so I've given this a try. Here are the
> messages printed when floppy debugging is enabled:
>
> [    0.714318] floppy0: reschedule timeout lock fdc
> [   20.729798] floppy0: reschedule timeout do wakeup

Ok, that 20 s timeout is a hint.

20s is the floppy "max timeout" value, and it's used in various
places. HOWEVER. The "lock fdc" case absolutely should not be using
that timeout. It should use the floppy "UDP->timeout" value, which
should be just 3s. Which is still a long time, but not nearly as long
as that.

Can you add a DPRINT() to the lock_fdc() function just before calling
reschedule_timeout() to print out what the "drive" is? Something like
yhhis (whitespace-damaged due to cut-and-paste):

    diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
    index cce7df367b79..00b52cef9d22 100644
    --- a/drivers/block/floppy.c
    +++ b/drivers/block/floppy.c
    @@ -876,6 +876,7 @@ static int lock_fdc(int drive, bool interruptible)

     	command_status = FD_COMMAND_NONE;

    +	DPRINT("lock_fdc: %d (out of %d)\n", drive, N_DRIVE);
     	reschedule_timeout(drive, "lock fdc");
     	set_fdc(drive);
     	return 0;

because maybe the 'drive' variable is crap there. In fact, add a

        WARN_ON_ONCE(dive < 0 || drive >= N_DRIVE);

in there too. Because if it is out of range, I'd want to see the call chain.

(And you still need floppy=debug to see that debug message, of course).

                Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:07     ` Linus Torvalds
@ 2012-07-03 17:40       ` Calvin Walton
  2012-07-03 17:50         ` Linus Torvalds
  2012-07-03 17:45       ` Andi Kleen
  1 sibling, 1 reply; 20+ messages in thread
From: Calvin Walton @ 2012-07-03 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, 2012-07-03 at 10:07 -0700, Linus Torvalds wrote:
> On Tue, Jul 3, 2012 at 6:47 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote:
> >
> > I'm seeing the same issue, so I've given this a try. Here are the
> > messages printed when floppy debugging is enabled:
> >
> > [    0.714318] floppy0: reschedule timeout lock fdc
> > [   20.729798] floppy0: reschedule timeout do wakeup
> 
> Ok, that 20 s timeout is a hint.
> 
> 20s is the floppy "max timeout" value, and it's used in various
> places. HOWEVER. The "lock fdc" case absolutely should not be using
> that timeout. It should use the floppy "UDP->timeout" value, which
> should be just 3s. Which is still a long time, but not nearly as long
> as that.
> 
> Can you add a DPRINT() to the lock_fdc() function just before calling
> reschedule_timeout() to print out what the "drive" is?

Ok, here's the dmesg output that I get:

[    0.661966] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
[    0.682589] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.703303] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.703599] Non-volatile memory driver v1.3
[    0.714316] floppy0: lock_fdc: -1 (out of 8)
[    0.714462] ------------[ cut here ]------------
[    0.714606] WARNING: at drivers/block/floppy.c:880 lock_fdc.isra.30+0x156/0x170()
[    0.717372] Hardware name: EX58-DS4
[    0.717511] Modules linked in:
[    0.717649] Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc4-00230-g15114c7-dirty #21
[    0.717872] Call Trace:
[    0.718010]  [<ffffffff810686aa>] warn_slowpath_common+0x7a/0xb0
[    0.718154]  [<ffffffff810686f5>] warn_slowpath_null+0x15/0x20
[    0.718306]  [<ffffffff813e3d96>] lock_fdc.isra.30+0x156/0x170
[    0.718448]  [<ffffffff813e46f1>] user_reset_fdc+0x11/0x80
[    0.718592]  [<ffffffff81cbb854>] floppy_init+0x7d7/0xe15
[    0.718737]  [<ffffffff81cbb07d>] ? set_cmos+0x6e/0x6e
[    0.718879]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[    0.719024]  [<ffffffff81c95c4f>] kernel_init+0x101/0x185
[    0.719169]  [<ffffffff81c955c5>] ? do_early_param+0x87/0x87
[    0.719317]  [<ffffffff81760d94>] kernel_thread_helper+0x4/0x10
[    0.719460]  [<ffffffff81c95b4e>] ? start_kernel+0x32f/0x32f
[    0.719604]  [<ffffffff81760d90>] ? gs_change+0xb/0xb
[    0.719748] ---[ end trace 16b38434e9defc9e ]---
[    0.719888] floppy0: reschedule timeout lock fdc
[    1.651398] Refined TSC clocksource calibration: 2698.759 MHz.
[    1.651551] Switching to clocksource tsc
[   20.729857] floppy0: reschedule timeout do wakeup
[   20.730009] floppy0: no floppy controllers found
[   20.730578] loop: module loaded
[   20.730746] ahci 0000:00:1f.2: version 3.0
[   20.730782] ahci 0000:00:1f.2: irq 42 for MSI/MSI-X

when using this debugging patch:

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cce7df3..8a41259 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -876,6 +876,9 @@ static int lock_fdc(int drive, bool interruptible)
 
 	command_status = FD_COMMAND_NONE;
 
+	DPRINT("lock_fdc: %d (out of %d)\n", drive, N_DRIVE);
+	WARN_ON_ONCE(drive < 0 || drive >= N_DRIVE);
+
 	reschedule_timeout(drive, "lock fdc");
 	set_fdc(drive);
 	return 0;

-- 
Calvin Walton <calvin.walton@kepstin.ca>


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:07     ` Linus Torvalds
  2012-07-03 17:40       ` Calvin Walton
@ 2012-07-03 17:45       ` Andi Kleen
  1 sibling, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Calvin Walton, Andi Kleen, Jiri Kosina, linux-kernel, shemminger



Sorry for the delay. Here's the output on my system with floppy=debug
and Linus' debug patch:


[    2.480521] calling  floppy_init+0x0/0xe46 @ 1
[    2.495325] floppy0: lock_fdc: -1 (out of 8)
[    2.495369] floppy0: reschedule timeout lock fdc
[   22.501736] floppy0: reschedule timeout do wakeup
[   22.501762] floppy0: no floppy controllers found
[   22.501896] initcall floppy_init+0x0/0xe46 returned -19 after
19588875 usecs


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:40       ` Calvin Walton
@ 2012-07-03 17:50         ` Linus Torvalds
  2012-07-03 17:57           ` Jiri Kosina
  2012-07-03 17:58           ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 17:50 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 10:40 AM, Calvin Walton <calvin.walton@kepstin.ca> wrote:
>
> Ok, here's the dmesg output that I get:
>
> [    0.714316] floppy0: lock_fdc: -1 (out of 8)

Ok, thanks. That -1 definitely explains the long delay. It's using
"drive=-1" because it has no actual drive number, just the
*controller* number.

And the backtrace shows what's going on too: it's the

  user_reset_fdc(-1, FD_RESET_ALWAYS, false);

that passes in -1 as the floppy controller number.

Ugh.

It looks like the code meant for -1 to mean "regular delay"
(current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1"
logic has gotten screwed.

                  Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:50         ` Linus Torvalds
@ 2012-07-03 17:57           ` Jiri Kosina
  2012-07-03 18:45             ` Andi Kleen
  2012-07-03 17:58           ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Jiri Kosina @ 2012-07-03 17:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger

On Tue, 3 Jul 2012, Linus Torvalds wrote:

> > Ok, here's the dmesg output that I get:
> >
> > [    0.714316] floppy0: lock_fdc: -1 (out of 8)
> 
> Ok, thanks. That -1 definitely explains the long delay. It's using
> "drive=-1" because it has no actual drive number, just the
> *controller* number.
> 
> And the backtrace shows what's going on too: it's the
> 
>   user_reset_fdc(-1, FD_RESET_ALWAYS, false);
> 
> that passes in -1 as the floppy controller number.
> 
> Ugh.
> 
> It looks like the code meant for -1 to mean "regular delay"
> (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1"
> logic has gotten screwed.

I am still trying to understand the exact situation.

Andi, do I understand you correctly that even if you revert 070ad7e793dc 
on top of current Linus' tree, you are still seeing the 20 second delay 
inside the initcall?

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:50         ` Linus Torvalds
  2012-07-03 17:57           ` Jiri Kosina
@ 2012-07-03 17:58           ` Linus Torvalds
  2012-07-03 18:15             ` Jiri Kosina
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 17:58 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 10:50 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It looks like the code meant for -1 to mean "regular delay"
> (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1"
> logic has gotten screwed.

So it looks like current_reqD still exists, and sets the drive to
"current_drive". Which *should* be 0, as far as I can tell.  In fact,
since "current_drive" is printed out by DPRINTK(), we *know* it's 0
(it's the "0" in the "floppy0:" message).

But how the heck does it pick that bogus 20 second delay then? The -1
should have been caught by

        if (drive == current_reqD)
                drive = current_drive;

and then we should have used the regular delay for drive 0.

Jiri, any ideas?

                  Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:58           ` Linus Torvalds
@ 2012-07-03 18:15             ` Jiri Kosina
  2012-07-03 18:58               ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Kosina @ 2012-07-03 18:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger

On Tue, 3 Jul 2012, Linus Torvalds wrote:

> > It looks like the code meant for -1 to mean "regular delay"
> > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1"
> > logic has gotten screwed.
> 
> So it looks like current_reqD still exists, and sets the drive to
> "current_drive". Which *should* be 0, as far as I can tell.  In fact,
> since "current_drive" is printed out by DPRINTK(), we *know* it's 0
> (it's the "0" in the "floppy0:" message).
> 
> But how the heck does it pick that bogus 20 second delay then? The -1
> should have been caught by
> 
>         if (drive == current_reqD)
>                 drive = current_drive;
> 
> and then we should have used the regular delay for drive 0.
> 
> Jiri, any ideas?

I am not seeing how this could be happening at the moment. I am also a bit 
puzzled by Andi's statement that reverting the conversion to the single wq 
is not fixing it.

Have we by any chance changed anything in wq processing during early 
bootup lately?

Calvin, Andi, could you please dump the value of UDP->timeout in 
__reschedule_timeout(), please?

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cce7df3..9850120 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -675,8 +675,11 @@ static void __reschedule_timeout(int drive, const char *message)
 	if (drive < 0 || drive >= N_DRIVE) {
 		delay = 20UL * HZ;
 		drive = 0;
-	} else
+	} else {
+		DPRINT("__reschedule_timeout: setting timeout to %ld\n",
+				UDP->timeout);
 		delay = UDP->timeout;
+	}
 
 	queue_delayed_work(floppy_wq, &fd_timeout, delay);
 	if (UDP->flags & FD_DEBUG)

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 17:57           ` Jiri Kosina
@ 2012-07-03 18:45             ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 18:45 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Linus Torvalds, Calvin Walton, Andi Kleen, linux-kernel,
	shemminger

On Tue, Jul 03, 2012 at 07:57:15PM +0200, Jiri Kosina wrote:
> On Tue, 3 Jul 2012, Linus Torvalds wrote:
> 
> > > Ok, here's the dmesg output that I get:
> > >
> > > [    0.714316] floppy0: lock_fdc: -1 (out of 8)
> > 
> > Ok, thanks. That -1 definitely explains the long delay. It's using
> > "drive=-1" because it has no actual drive number, just the
> > *controller* number.
> > 
> > And the backtrace shows what's going on too: it's the
> > 
> >   user_reset_fdc(-1, FD_RESET_ALWAYS, false);
> > 
> > that passes in -1 as the floppy controller number.
> > 
> > Ugh.
> > 
> > It looks like the code meant for -1 to mean "regular delay"
> > (current_reqD), and -2 to mean MAXTIMEOUT. For some reason the "-1"
> > logic has gotten screwed.
> 
> I am still trying to understand the exact situation.
> 
> Andi, do I understand you correctly that even if you revert 070ad7e793dc 
> on top of current Linus' tree, you are still seeing the 20 second delay 
> inside the initcall?

Correct. Here's the debug output with just reverted (without Linus
patch):

[    2.489559] calling  floppy_init+0x0/0xe46 @ 1
[    2.504363] floppy0: lock_fdc: -1 (out of 8)
[    2.504410] floppy0: reschedule timeout lock fdc
[   22.498784] floppy0: reschedule timeout do wakeup
[   22.498811] floppy0: no floppy controllers found
[   22.498944] initcall floppy_init+0x0/0xe46 returned -19 after
19577158 usecs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 18:15             ` Jiri Kosina
@ 2012-07-03 18:58               ` Andi Kleen
  2012-07-03 19:12                 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 18:58 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Linus Torvalds, Calvin Walton, Andi Kleen, linux-kernel,
	shemminger

> Calvin, Andi, could you please dump the value of UDP->timeout in 
> __reschedule_timeout(), please?


[    2.496493] calling  floppy_init+0x0/0xe46 @ 1
[    2.511303] floppy0: lock_fdc: -1 (out of 8)
[    2.511346] floppy0: __reschedule_timeout: setting timeout to 750
[    2.511372] floppy0: reschedule timeout lock fdc
[   22.501757] floppy0: reschedule timeout do wakeup
[   22.501784] floppy0: no floppy controllers found
[   22.501919] initcall floppy_init+0x0/0xe46 returned -19 after
19573258 usecs

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 18:58               ` Andi Kleen
@ 2012-07-03 19:12                 ` Linus Torvalds
  2012-07-03 19:36                   ` Calvin Walton
  2012-07-03 19:40                   ` Andi Kleen
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 19:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Kosina, Calvin Walton, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 11:58 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> [    2.496493] calling  floppy_init+0x0/0xe46 @ 1
> [    2.511303] floppy0: lock_fdc: -1 (out of 8)
> [    2.511346] floppy0: __reschedule_timeout: setting timeout to 750

Ahhah. And you have CONFIG_HZ250 set, don't you?

So it's actually asking for a 3-second timeout like it should.

> [    2.511372] floppy0: reschedule timeout lock fdc
> [   22.501757] floppy0: reschedule timeout do wakeup

But that actual timeout is too damn reliably pretty much exactly 20s
for it to be anything else.

I wonder if 'fd_timeout' had been previously queued already with the
20s timeout, and not canceled. So then the new queuing of a 3-second
timeout doesn't actually do anything, because the
WORK_STRUCT_PENDING_BIT is still set.

So maybe we're missing a cancel somewhere.

What happens if you add a

        cancel_delayed_work(&fd_timeout);

to before the queue_delayed_work() in __reschedule_timeout()? Does
that possibly make the delay really be 3 seconds?

                   Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 19:12                 ` Linus Torvalds
@ 2012-07-03 19:36                   ` Calvin Walton
  2012-07-03 19:42                     ` Andi Kleen
  2012-07-03 20:05                     ` Linus Torvalds
  2012-07-03 19:40                   ` Andi Kleen
  1 sibling, 2 replies; 20+ messages in thread
From: Calvin Walton @ 2012-07-03 19:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote:
> What happens if you add a
> 
>         cancel_delayed_work(&fd_timeout);
> 
> to before the queue_delayed_work() in __reschedule_timeout()? Does
> that possibly make the delay really be 3 seconds?

Yes, it does...
[    0.718571] floppy0: reschedule timeout lock fdc
[    1.650956] Refined TSC clocksource calibration: 2698.760 MHz.
[    1.651109] Switching to clocksource tsc
[    3.724664] floppy0: reschedule timeout do wakeup
[    3.724815] floppy0: no floppy controllers found

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cce7df3..c8064e4 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -678,6 +678,8 @@ static void __reschedule_timeout(int drive, const char *message)
 	} else
 		delay = UDP->timeout;
 
+	cancel_delayed_work(&fd_timeout);
+
 	queue_delayed_work(floppy_wq, &fd_timeout, delay);
 	if (UDP->flags & FD_DEBUG)
 		DPRINT("reschedule timeout %s\n", message);

Of course, a 3 second delay at this point in boot is still a fairly big
bit of waiting, given how fast everything else is nowadays.

-- 
Calvin Walton <calvin.walton@kepstin.ca>


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 19:12                 ` Linus Torvalds
  2012-07-03 19:36                   ` Calvin Walton
@ 2012-07-03 19:40                   ` Andi Kleen
  2012-07-03 23:01                     ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Jiri Kosina, Calvin Walton, linux-kernel, shemminger

On Tue, Jul 03, 2012 at 12:12:53PM -0700, Linus Torvalds wrote:
> On Tue, Jul 3, 2012 at 11:58 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > [    2.496493] calling  floppy_init+0x0/0xe46 @ 1
> > [    2.511303] floppy0: lock_fdc: -1 (out of 8)
> > [    2.511346] floppy0: __reschedule_timeout: setting timeout to 750
> 
> Ahhah. And you have CONFIG_HZ250 set, don't you?

Yes I do.

> 
> So it's actually asking for a 3-second timeout like it should.
> 
> > [    2.511372] floppy0: reschedule timeout lock fdc
> > [   22.501757] floppy0: reschedule timeout do wakeup
> 
> But that actual timeout is too damn reliably pretty much exactly 20s
> for it to be anything else.
> 
> I wonder if 'fd_timeout' had been previously queued already with the
> 20s timeout, and not canceled. So then the new queuing of a 3-second
> timeout doesn't actually do anything, because the
> WORK_STRUCT_PENDING_BIT is still set.
> 
> So maybe we're missing a cancel somewhere.
> 
> What happens if you add a
> 
>         cancel_delayed_work(&fd_timeout);
> 
> to before the queue_delayed_work() in __reschedule_timeout()? Does
> that possibly make the delay really be 3 seconds?

That works.  Patch cuts it down to 3s as before 

Tested-by: Andi Kleen <ak@linux.intel.com>

(and with the async probe patch I posted earlier the kernel
boot time goes down by ~1s from 14 to 13ish)

-Andi

With Linus patch:

[    2.492449] calling  floppy_init+0x0/0xe46 @ 1
[    2.507259] floppy0: lock_fdc: -1 (out of 8)
[    2.507304] floppy0: __reschedule_timeout: setting timeout to 750
[    2.507330] floppy0: reschedule timeout lock fdc
[    5.509642] floppy0: reschedule timeout do wakeup
[    5.509683] floppy0: no floppy controllers found
[    5.509796] initcall floppy_init+0x0/0xe46 returned -19 after 2952148
usecs
[    5.509826] calling  brd_init+0x0/0x1ce @ 1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 19:36                   ` Calvin Walton
@ 2012-07-03 19:42                     ` Andi Kleen
  2012-07-03 20:05                     ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 19:42 UTC (permalink / raw)
  To: Calvin Walton
  Cc: Linus Torvalds, Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, Jul 03, 2012 at 03:36:40PM -0400, Calvin Walton wrote:
> On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote:
> > What happens if you add a
> > 
> >         cancel_delayed_work(&fd_timeout);
> > 
> > to before the queue_delayed_work() in __reschedule_timeout()? Does
> > that possibly make the delay really be 3 seconds?
> 
> Yes, it does...
> [    0.718571] floppy0: reschedule timeout lock fdc
> [    1.650956] Refined TSC clocksource calibration: 2698.760 MHz.
> [    1.651109] Switching to clocksource tsc
> [    3.724664] floppy0: reschedule timeout do wakeup
> [    3.724815] floppy0: no floppy controllers found
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index cce7df3..c8064e4 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -678,6 +678,8 @@ static void __reschedule_timeout(int drive, const char *message)
>  	} else
>  		delay = UDP->timeout;
>  
> +	cancel_delayed_work(&fd_timeout);
> +
>  	queue_delayed_work(floppy_wq, &fd_timeout, delay);
>  	if (UDP->flags & FD_DEBUG)
>  		DPRINT("reschedule timeout %s\n", message);
> 
> Of course, a 3 second delay at this point in boot is still a fairly big
> bit of waiting, given how fast everything else is nowadays.

Here's the patch I did for that yesterday


>From 04a4d851d1c96b6685352d9cef41b77966cc7a1e Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Mon, 2 Jul 2012 17:07:08 -0700
Subject: [PATCH] floppy: Run floppy initialization asynchronous

floppy_init is quite slow, 3s on my test system to determine
that there is no floppy. Run it asynchronous to the other
init calls to improve boot time.

Cuts down boot time by 1s on a system of mine.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cce7df3..8b5769b 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -191,6 +191,7 @@ static int print_unex = 1;
 #include <linux/mutex.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/async.h>
 
 /*
  * PS/2 floppies have much slower step rates than regular floppies.
@@ -4122,7 +4123,7 @@ static struct kobject *floppy_find(dev_t dev, int *part, void *data)
 	return get_disk(disks[drive]);
 }
 
-static int __init floppy_init(void)
+static int __init do_floppy_init(void)
 {
 	int i, unit, drive;
 	int err, dr;
@@ -4337,6 +4338,24 @@ out_put_disk:
 	return err;
 }
 
+#ifndef MODULE
+static __init void floppy_async_init(void *data, async_cookie_t cookie)
+{
+	do_floppy_init();
+}
+#endif
+
+static int __init floppy_init(void)
+{
+#ifdef MODULE
+	return do_floppy_init(void);
+#else
+	/* Don't hold up the bootup by the floppy initialization */
+	async_schedule(floppy_async_init, NULL);
+	return 0;
+#endif
+}
+
 static const struct io_region {
 	int offset;
 	int size;
-- 
1.7.7.6






^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 19:36                   ` Calvin Walton
  2012-07-03 19:42                     ` Andi Kleen
@ 2012-07-03 20:05                     ` Linus Torvalds
  2012-07-03 20:11                       ` Andi Kleen
  2012-07-03 23:06                       ` Jiri Kosina
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 20:05 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Andi Kleen, Jiri Kosina, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 12:36 PM, Calvin Walton <calvin.walton@kepstin.ca> wrote:
> On Tue, 2012-07-03 at 12:12 -0700, Linus Torvalds wrote:
>> What happens if you add a
>>
>>         cancel_delayed_work(&fd_timeout);
>>
>> to before the queue_delayed_work() in __reschedule_timeout()? Does
>> that possibly make the delay really be 3 seconds?
>
> Yes, it does...

Ok. And I see the previous timeout - it's this:

        reschedule_timeout(MAXTIMEOUT, "floppy init");

that sets the fd_timeout to 20 seconds initially.

That whole fd_timeout code is broken, though. I'm not sure what the
floppy init timeout is supposed to do, when the floppy reset code
wants to re-use it.

And yes, the 3-second timeout is still too long.

Doing it asynchronously like Andi does helps a bit, but I suspect we
could make the reset timeout shorter still just to make the initial
"you don't have a floppy drive" code go faster.

That said, why even compile in the floppy driver any more? Even if you
have the hardware, it probably doesn't work after ten years of
gathering dust. Those floppy drives weren't exactly reliable even back
in the days..

Anyway, I looked up the 82078 docs for reset. Holding the reset low
*does* cause an interrupt, but I'm not finding how long the reset
might take. But for the controller it really should be milliseconds,
not seconds, I suspect. Can anybody find the appropriate reset
timeout?

                 Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 20:05                     ` Linus Torvalds
@ 2012-07-03 20:11                       ` Andi Kleen
  2012-07-03 23:06                       ` Jiri Kosina
  1 sibling, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-03 20:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Calvin Walton, Jiri Kosina, linux-kernel, shemminger

Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> Doing it asynchronously like Andi does helps a bit,

I think it would be more effective if the floppy driver initialized
earlier, I think I have some slowdown because it starts too late.

> but I suspect we
> could make the reset timeout shorter still just to make the initial
> "you don't have a floppy drive" code go faster.
>
> That said, why even compile in the floppy driver any more?

I used to boot simulators from make fdimage some time ago. Until
the kernels got too large :-)

But yes make isoimage is the way to go and

I'll remove it from my configs.

Thanks for the debugging.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 19:40                   ` Andi Kleen
@ 2012-07-03 23:01                     ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-07-03 23:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Kosina, Calvin Walton, linux-kernel, shemminger

On Tue, Jul 3, 2012 at 12:40 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> That works.  Patch cuts it down to 3s as before

Ok, I committed it and pushed out.

That said, I'm surprised that you report that the revert doesn't work,
though. Before commit 070ad7e793dc, there was a "del_timer()" there,
and re-introducing that del_timer() (and canceling the work is the
equivalent of that) should have taken care of the issue.

Odd, but I can't find it in myself to care too much.

               Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: long boot delays caused by 070ad7e7 floppy change
  2012-07-03 20:05                     ` Linus Torvalds
  2012-07-03 20:11                       ` Andi Kleen
@ 2012-07-03 23:06                       ` Jiri Kosina
  1 sibling, 0 replies; 20+ messages in thread
From: Jiri Kosina @ 2012-07-03 23:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Calvin Walton, Andi Kleen, linux-kernel, shemminger

On Tue, 3 Jul 2012, Linus Torvalds wrote:

> >> What happens if you add a
> >>
> >>         cancel_delayed_work(&fd_timeout);
> >>
> >> to before the queue_delayed_work() in __reschedule_timeout()? Does
> >> that possibly make the delay really be 3 seconds?
> >
> > Yes, it does...
> 
> Ok. And I see the previous timeout - it's this:
> 
>         reschedule_timeout(MAXTIMEOUT, "floppy init");
> 
> that sets the fd_timeout to 20 seconds initially.

Still doesn't explain why reverting 070ad7e793dc6 (which made the whole 
thing completely serialized) doesn't fix it though; that's quite a mystery 
to me.

> That whole fd_timeout code is broken, though. I'm not sure what the 
> floppy init timeout is supposed to do, when the floppy reset code wants 
> to re-use it.
> 
> And yes, the 3-second timeout is still too long.
> 
> Doing it asynchronously like Andi does helps a bit, but I suspect we
> could make the reset timeout shorter still just to make the initial
> "you don't have a floppy drive" code go faster.
> 
> That said, why even compile in the floppy driver any more? Even if you
> have the hardware, it probably doesn't work after ten years of
> gathering dust. Those floppy drives weren't exactly reliable even back
> in the days..
> 
> Anyway, I looked up the 82078 docs for reset. Holding the reset low
> *does* cause an interrupt, but I'm not finding how long the reset
> might take. But for the controller it really should be milliseconds,
> not seconds, I suspect. Can anybody find the appropriate reset
> timeout?

I will look into it once I am back again from vacation next week. In the 
meantime, I'd propose to simply add the cancel_delayed_work() for 3.5; 
it's an obvious bug I made during the timer -> workqueue conversion, as we 
used to do del_timer() before.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-03 23:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-02 22:46 long boot delays caused by 070ad7e7 floppy change Andi Kleen
2012-07-02 23:01 ` Linus Torvalds
2012-07-03 13:47   ` Calvin Walton
2012-07-03 17:07     ` Linus Torvalds
2012-07-03 17:40       ` Calvin Walton
2012-07-03 17:50         ` Linus Torvalds
2012-07-03 17:57           ` Jiri Kosina
2012-07-03 18:45             ` Andi Kleen
2012-07-03 17:58           ` Linus Torvalds
2012-07-03 18:15             ` Jiri Kosina
2012-07-03 18:58               ` Andi Kleen
2012-07-03 19:12                 ` Linus Torvalds
2012-07-03 19:36                   ` Calvin Walton
2012-07-03 19:42                     ` Andi Kleen
2012-07-03 20:05                     ` Linus Torvalds
2012-07-03 20:11                       ` Andi Kleen
2012-07-03 23:06                       ` Jiri Kosina
2012-07-03 19:40                   ` Andi Kleen
2012-07-03 23:01                     ` Linus Torvalds
2012-07-03 17:45       ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.