public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* softlockup interaction with slow consoles
@ 2006-02-20 21:18 David S. Miller
  2006-02-21  9:09 ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: David S. Miller @ 2006-02-20 21:18 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel


When register_console() runs, it dumps the entire log buffer
to the console device with interrupts off.

If you're going over a slow console, this can easily take more
than 10 seconds especially on SMP with many cpus brought up.

This makes the softlock fire, which is terribly annoying :-)

I think the bug is in the console registry code, I think it
should capture chunks of the existing console buffer into some
local memory and push things piece by piece with interrupts
enabled to the console driver(s).

Any better ideas?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-20 21:18 softlockup interaction with slow consoles David S. Miller
@ 2006-02-21  9:09 ` Ingo Molnar
  2006-02-21  9:16   ` David S. Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2006-02-21  9:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel


On Mon, 20 Feb 2006, David S. Miller wrote:

> When register_console() runs, it dumps the entire log buffer to the
> console device with interrupts off.
> 
> If you're going over a slow console, this can easily take more than 10
> seconds especially on SMP with many cpus brought up.
> 
> This makes the softlock fire, which is terribly annoying :-)
> 
> I think the bug is in the console registry code, I think it should
> capture chunks of the existing console buffer into some local memory and
> push things piece by piece with interrupts enabled to the console
> driver(s).
> 
> Any better ideas?

is this a boot-time-only problem? In recent updates to the softlockup 
code in -mm:

 timer-irq-driven-soft-watchdog-cleanups.patch
 timer-irq-driven-soft-watchdog-percpu-race-fix.patch
 timer-irq-driven-soft-watchdog-percpu-fix.patch
 timer-irq-driven-soft-watchdog-boot-fix.patch

i changed soft lockup detection to be turned off during bootup. That
should work around any boot-time warnings.

(if this can happen on a booted up system then the real fix would indeed
be to split up register_console()'s workload - that would also make it
more preemption-friendly. But at first sight it looks quite complex to
do.)

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21  9:09 ` Ingo Molnar
@ 2006-02-21  9:16   ` David S. Miller
  2006-02-21 13:49     ` Andi Kleen
  2006-02-21 19:23     ` Matt Mackall
  0 siblings, 2 replies; 12+ messages in thread
From: David S. Miller @ 2006-02-21  9:16 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

From: Ingo Molnar <mingo@redhat.com>
Date: Tue, 21 Feb 2006 04:09:58 -0500 (EST)

> i changed soft lockup detection to be turned off during bootup. That
> should work around any boot-time warnings.

Excellent.

> (if this can happen on a booted up system then the real fix would indeed
> be to split up register_console()'s workload - that would also make it
> more preemption-friendly. But at first sight it looks quite complex to
> do.)

Agreed.  I thought about buffering in the slow console driver itself
but that's bad because if it's a crash message we might not get the
events (interrupts, or whatever) in order to make forward progress
printing out the buffer, and thus we'd lose the valuable messages.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21  9:16   ` David S. Miller
@ 2006-02-21 13:49     ` Andi Kleen
  2006-02-21 20:01       ` David S. Miller
  2006-02-21 19:23     ` Matt Mackall
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2006-02-21 13:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, mingo

"David S. Miller" <davem@davemloft.net> writes:

> From: Ingo Molnar <mingo@redhat.com>
> Date: Tue, 21 Feb 2006 04:09:58 -0500 (EST)
> 
> > i changed soft lockup detection to be turned off during bootup. That
> > should work around any boot-time warnings.
> 
> Excellent.


Still you could probably see problems with very slow consoles even after
bootup, couldn't you?

(for some reason some people still use 9600 baud serial consoles, which
tend to trigger all kind of interesting races) 


-Andi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21  9:16   ` David S. Miller
  2006-02-21 13:49     ` Andi Kleen
@ 2006-02-21 19:23     ` Matt Mackall
  2006-02-21 19:58       ` David S. Miller
  1 sibling, 1 reply; 12+ messages in thread
From: Matt Mackall @ 2006-02-21 19:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: mingo, linux-kernel

On Tue, Feb 21, 2006 at 01:16:50AM -0800, David S. Miller wrote:
> From: Ingo Molnar <mingo@redhat.com>
> Date: Tue, 21 Feb 2006 04:09:58 -0500 (EST)
> 
> > i changed soft lockup detection to be turned off during bootup. That
> > should work around any boot-time warnings.
> 
> Excellent.

I don't like it. We should instead just have printk tickle the watchdog.
 
> > (if this can happen on a booted up system then the real fix would indeed
> > be to split up register_console()'s workload - that would also make it
> > more preemption-friendly. But at first sight it looks quite complex to
> > do.)
> 
> Agreed.  I thought about buffering in the slow console driver itself
> but that's bad because if it's a crash message we might not get the
> events (interrupts, or whatever) in order to make forward progress
> printing out the buffer, and thus we'd lose the valuable messages.

Absolutely.

Also note that sysrq-t can be substantially worse than the the
boot-time dump if you've got a lot of processes. So the boot-time hack
is insufficient. Even at 115kbps, this can be several seconds.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 19:23     ` Matt Mackall
@ 2006-02-21 19:58       ` David S. Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David S. Miller @ 2006-02-21 19:58 UTC (permalink / raw)
  To: mpm; +Cc: mingo, linux-kernel

From: Matt Mackall <mpm@selenic.com>
Date: Tue, 21 Feb 2006 13:23:40 -0600

> I don't like it. We should instead just have printk tickle the watchdog.

You can't, interrupts are disabled the entire time and thus
jiffies aren't advancing.  The "touch" just sets the local
cpu timestamp to whatever jiffies is.

That's the problem.

I agree that it would be nice to fix this, but the desire for
synchronous console output makes this very hard to solve.

Like I said, maybe some local chunking in release_console_sem()
but since interrupts will come in we can trigger more and more
lengthy console output and eventually overflow the printk log
buffer before the chunking loop can catch up and we'll thus
lose messages.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 13:49     ` Andi Kleen
@ 2006-02-21 20:01       ` David S. Miller
  2006-02-21 20:05         ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: David S. Miller @ 2006-02-21 20:01 UTC (permalink / raw)
  To: ak; +Cc: linux-kernel, mingo

From: Andi Kleen <ak@suse.de>
Date: 21 Feb 2006 14:49:49 +0100

> Still you could probably see problems with very slow consoles even
> after bootup, couldn't you?

Yes, others have mentioned this too, good point.

Depending upon what we're really trying to achieve with the
softlockup stuff, we can perhaps look into increasing the
timeout or making it configurable.  We could even do this
dynamically, so when we register a serial console running
at some low baud rate, we scale up the softlockup timeout
or something like that.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 20:01       ` David S. Miller
@ 2006-02-21 20:05         ` Andi Kleen
  2006-02-21 20:19           ` David S. Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2006-02-21 20:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, mingo

On Tuesday 21 February 2006 21:01, David S. Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: 21 Feb 2006 14:49:49 +0100
> 
> > Still you could probably see problems with very slow consoles even
> > after bootup, couldn't you?
> 
> Yes, others have mentioned this too, good point.
> 
> Depending upon what we're really trying to achieve with the
> softlockup stuff, we can perhaps look into increasing the
> timeout or making it configurable.  We could even do this
> dynamically, so when we register a serial console running
> at some low baud rate, we scale up the softlockup timeout
> or something like that.

The classic way is to just use touch_nmi_watchdog() somewhere
in the loop that does work. That touches the softwatchdog too
these days.

-Andi

 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 20:05         ` Andi Kleen
@ 2006-02-21 20:19           ` David S. Miller
  2006-02-21 20:27             ` Dave Jones
  2006-02-21 20:43             ` Andi Kleen
  0 siblings, 2 replies; 12+ messages in thread
From: David S. Miller @ 2006-02-21 20:19 UTC (permalink / raw)
  To: ak; +Cc: linux-kernel, mingo

From: Andi Kleen <ak@suse.de>
Date: Tue, 21 Feb 2006 21:05:37 +0100

> The classic way is to just use touch_nmi_watchdog() somewhere
> in the loop that does work. That touches the softwatchdog too
> these days.

"jiffies" aren't advancing, since interrupts are disabled by
release_console_sem(), so that doesn't work.

I tried that already :-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 20:19           ` David S. Miller
@ 2006-02-21 20:27             ` Dave Jones
  2006-02-21 20:50               ` David S. Miller
  2006-02-21 20:43             ` Andi Kleen
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Jones @ 2006-02-21 20:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, linux-kernel, mingo

On Tue, Feb 21, 2006 at 12:19:48PM -0800, David S. Miller wrote:
 > From: Andi Kleen <ak@suse.de>
 > Date: Tue, 21 Feb 2006 21:05:37 +0100
 > 
 > > The classic way is to just use touch_nmi_watchdog() somewhere
 > > in the loop that does work. That touches the softwatchdog too
 > > these days.
 > 
 > "jiffies" aren't advancing, since interrupts are disabled by
 > release_console_sem(), so that doesn't work.
 > 
 > I tried that already :-)

Where did you put it?  I hit a similar problem a few months back,
and this patch did the trick for me..

		Dave

NMI Watchdog detected LOCKUP on CPU2CPU 2
Modules linked in: loop usb_storage md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core rfcomm l2cap bluetooth sunrpc pcdPid: 3138, comm: gpm Not tainted 2.6.11-1.1290_FC4smp
RIP: 0010:[<ffffffff80273b8a>] <ffffffff80273b8a>{serial_in+106}
RSP: 0018:ffff81003afc3d50  EFLAGS: 00000002
RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000000
RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff804dcd60
RBP: 00000000000024fc R08: 000000000000000a R09: 0000000000000033
R10: ffff81001beb7c20 R11: 0000000000000020 R12: ffffffff804dcd60
R13: ffffffff804ade76 R14: 000000000000002b R15: 000000000000002c
FS:  00002aaaaaac4920(0000) GS:ffffffff804fca00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabcb000 CR3: 000000003c0d0000 CR4: 00000000000006e0
Process gpm (pid: 3138, threadinfo ffff81003afc2000, task ffff81003eb63780)
Stack: ffffffff80275f2e 0000000000000000 ffffffff80448380 0000000000007d6b
       000000000000002c fffffffffffffbbf 0000000000000292 0000000000008000
       ffffffff80138e8c 0000000000007d97
Call Trace:<ffffffff80275f2e>{serial8250_console_write+270} <ffffffff80138e8c>{__call_console_drivers+76}
       <ffffffff8013914b>{release_console_sem+315} <ffffffff80260325>{con_open+149}
       <ffffffff80254e99>{tty_open+537} <ffffffff80192713>{chrdev_open+387}
       <ffffffff80188824>{dentry_open+260} <ffffffff80188994>{filp_open+68}
       <ffffffff80187b73>{get_unused_fd+227} <ffffffff80188a6c>{sys_open+76}
       <ffffffff8010ebc6>{tracesys+209}

Code: 0f b6 c0 c3 66 90 41 57 49 89 f7 41 56 41 be 00 01 00 00 41
console shuts up ...


--- linux-2.6.11/drivers/serial/8250.c~	2005-05-14 02:49:02.000000000 -0400
+++ linux-2.6.11/drivers/serial/8250.c	2005-05-14 02:54:30.000000000 -0400
@@ -2098,9 +2098,11 @@ static inline void wait_for_xmitr(struct
 	/* Wait up to 1s for flow control if necessary */
 	if (up->port.flags & UPF_CONS_FLOW) {
 		tmout = 1000000;
-		while (--tmout &&
-		       ((serial_in(up, UART_MSR) & UART_MSR_CTS) == 0))
+		while (!(serial_in(up, UART_MSR) & UART_MSR_CTS) && --tmout) {
 			udelay(1);
+			if ((tmout % 1000) == 0)
+				touch_nmi_watchdog();
+		}
 	}
 }
 





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 20:19           ` David S. Miller
  2006-02-21 20:27             ` Dave Jones
@ 2006-02-21 20:43             ` Andi Kleen
  1 sibling, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2006-02-21 20:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, mingo

On Tuesday 21 February 2006 21:19, David S. Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Tue, 21 Feb 2006 21:05:37 +0100
> 
> > The classic way is to just use touch_nmi_watchdog() somewhere
> > in the loop that does work. That touches the softwatchdog too
> > these days.
> 
> "jiffies" aren't advancing, since interrupts are disabled by
> release_console_sem(), so that doesn't work.
> 
> I tried that already :-)

Ah I see the problem I guess. When you restart then your 
timer interrupt catches up the missing jiffies very quickly
and that triggers the softwatchdog. 

Nasty. Perhaps it needs to look at xtime instead and 
let touch_softlockup_watchdog() update that.  But that could
break with NTP then. Perhaps using monotonic_clock() ? 

-Andi
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: softlockup interaction with slow consoles
  2006-02-21 20:27             ` Dave Jones
@ 2006-02-21 20:50               ` David S. Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David S. Miller @ 2006-02-21 20:50 UTC (permalink / raw)
  To: davej; +Cc: ak, linux-kernel, mingo

From: Dave Jones <davej@redhat.com>
Date: Tue, 21 Feb 2006 15:27:57 -0500

> On Tue, Feb 21, 2006 at 12:19:48PM -0800, David S. Miller wrote:
>  > From: Andi Kleen <ak@suse.de>
>  > Date: Tue, 21 Feb 2006 21:05:37 +0100
>  > 
>  > > The classic way is to just use touch_nmi_watchdog() somewhere
>  > > in the loop that does work. That touches the softwatchdog too
>  > > these days.
>  > 
>  > "jiffies" aren't advancing, since interrupts are disabled by
>  > release_console_sem(), so that doesn't work.
>  > 
>  > I tried that already :-)
> 
> Where did you put it?  I hit a similar problem a few months back,
> and this patch did the trick for me..

Not the NMI watchdog, the softlockup watchdog.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-02-21 21:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-20 21:18 softlockup interaction with slow consoles David S. Miller
2006-02-21  9:09 ` Ingo Molnar
2006-02-21  9:16   ` David S. Miller
2006-02-21 13:49     ` Andi Kleen
2006-02-21 20:01       ` David S. Miller
2006-02-21 20:05         ` Andi Kleen
2006-02-21 20:19           ` David S. Miller
2006-02-21 20:27             ` Dave Jones
2006-02-21 20:50               ` David S. Miller
2006-02-21 20:43             ` Andi Kleen
2006-02-21 19:23     ` Matt Mackall
2006-02-21 19:58       ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox