* [PATCH] printk: Avoid softlockups in console_unlock()
@ 2013-01-15 17:58 Jan Kara
  2013-01-16  7:37 ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-15 17:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, jslaby, Jan Kara

A CPU can be caught in console_unlock() for a long time (tens of seconds are
reported by our customers) when other CPUs are using printk heavily and serial
console makes printing slow.  This triggers softlockup warnings because
interrupts are disabled for the whole time console_unlock() runs (e.g.
vprintk() calls console_unlock() with interrupts disabled).

We fix the issue by printing at most 1 KB of messages (unless we are in an
early boot stage or oops is happening) in one console_unlock() call. The rest
of the buffer will be printed either by further callers to printk() or by a
queued work item.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 357f714..13a633f 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -246,9 +246,22 @@ static enum log_flags console_prev;
 static u64 clear_seq;
 static u32 clear_idx;
 
+/* Worker to print accumulated data to console when there's too much of it */
+static void printk_worker(struct work_struct *work);
+static DECLARE_WORK(printk_work, printk_worker);
+
 #define PREFIX_MAX		32
 #define LOG_LINE_MAX		1024 - PREFIX_MAX
 
+/*
+ * How much do we print in one console_unlock(). We have this limit so that we
+ * don't have interrupts disabled on one CPU for too long when there is a lot to
+ * write. We are conservative because we can be printing to a 9600 baud serial
+ * console... Note that we check each time after printing one line so
+ * LOG_LINE_MAX actually limits the granularity of our checks.
+ */
+#define PRINT_LIMIT		1024
+
 /* record buffer */
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
 #define LOG_ALIGN 4
@@ -2030,23 +2043,26 @@ out:
  *
  * While the console_lock was held, console output may have been buffered
  * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * the output prior to releasing the lock. We print at most PRINT_LIMIT
+ * characters. The function returns true if there's more data that needs
+ * printing in the buffer.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
  * console_unlock(); may be called from any context.
  */
-void console_unlock(void)
+static bool __console_unlock(void)
 {
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	unsigned printed = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
-		return;
+		return false;
 	}
 
 	console_may_schedule = 0;
@@ -2072,7 +2088,8 @@ again:
 			console_prev = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq == log_next_seq ||
+		    (printed >= PRINT_LIMIT && !oops_in_progress && keventd_up()))
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2105,6 +2122,7 @@ skip:
 		call_console_drivers(level, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
+		printed += len;
 	}
 	console_locked = 0;
 	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
@@ -2127,14 +2145,40 @@ skip:
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
-	if (retry && console_trylock())
+	if (retry &&
+	    (printed < PRINT_LIMIT || oops_in_progress || !keventd_up()) &&
+	    console_trylock())
 		goto again;
 
 	if (wake_klogd)
 		wake_up_klogd();
+	return retry;
+}
+
+void console_unlock(void)
+{
+	if (__console_unlock()) {
+		/* Let worker do the rest of printing */
+		schedule_work(&printk_work);
+	}
 }
 EXPORT_SYMBOL(console_unlock);
 
+/*
+ * This is a worker function to print data from printk buffer when
+ * console_unlock() didn't write it all. The advantage of this function is that
+ * it does the printing in a well known context where we can reschedule to
+ * avoid locking up one CPU with printing.
+ */
+static void printk_worker(struct work_struct *work)
+{
+	console_lock();
+	while (__console_unlock()) {
+		cond_resched();
+		console_lock();
+	}
+}
+
 /**
  * console_conditional_schedule - yield the CPU if required
  *
-- 
1.7.1



* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-15 17:58 [PATCH] printk: Avoid softlockups in console_unlock() Jan Kara
@ 2013-01-16  7:37 ` Andrew Morton
  2013-01-16 10:16   ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2013-01-16  7:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Tue, 15 Jan 2013 18:58:34 +0100 Jan Kara <jack@suse.cz> wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds are
> reported by our customers) when other CPUs are using printk heavily and serial
> console makes printing slow.  This triggers softlockup warnings because
> interrupts are disabled for the whole time console_unlock() runs (e.g.
> vprintk() calls console_unlock() with interrupts disabled).

It should trigger the NMI watchdog in that case?

> We fix the issue by printing at most 1 KB of messages (unless we are in an
> early boot stage or oops is happening) in one console_unlock() call. The rest
> of the buffer will be printed either by further callers to printk() or by a
> queued work.

Complex.  Did you try just putting a touch_nmi_watchdog() in the loop?


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-16  7:37 ` Andrew Morton
@ 2013-01-16 10:16   ` Jan Kara
  2013-01-16 22:50     ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-16 10:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Tue 15-01-13 23:37:42, Andrew Morton wrote:
> On Tue, 15 Jan 2013 18:58:34 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > A CPU can be caught in console_unlock() for a long time (tens of seconds are
> > reported by our customers) when other CPUs are using printk heavily and serial
> > console makes printing slow.  This triggers softlockup warnings because
> > interrupts are disabled for the whole time console_unlock() runs (e.g.
> > vprintk() calls console_unlock() with interrupts disabled).
> 
> It should trigger the NMI watchdog in that case?
  Yes. I see soft lockup reports like:
BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u:126:791]
Modules linked in: sd_mod crc_t10dif usbhid hid uhci_hcd ehci_hcd thermal usbcore usb_common bfa scsi_transport_fc scsi_tgt processor thermal_sys hwmon dm_snapshot dm_mod ata_generic ata_piix libata hpsa cciss scsi_mod
Supported: Yes

Pid: 791, comm: kworker/u:126 Not tainted 3.0.13-0.9-pae #1 HP ProLiant DL380 G7
EIP: 0060:[<c027ceda>] EFLAGS: 00000202 CPU: 0
EIP is at smp_call_function_many+0x1aa/0x220
EAX: 00000292 EBX: f3006780 ECX: 00000004 EDX: 00000080
ESI: f3006798 EDI: 00000003 EBP: c08696b4 ESP: eeed7e68
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kworker/u:126 (pid: 791, ti=eeed6000 task=eeed4f60 task.ti=eeed6000)
Stack:
 00000080 01000000 00000000 c034bbd0 00000000 c034bbd0 f0c03850 ef5c7400
 c027cf6e 00000001 00000000 c027cf9b 00000000 f0c03840 f0c03850 ef5c7400
 c0351e55 00000001 ef5c743c 00000000 ef5c7400 f0c03840 ef5c7460 f1041c05
Call Trace:
 [<c027cf6e>] smp_call_function+0x1e/0x30
 [<c027cf9b>] on_each_cpu+0x1b/0x50
 [<c0351e55>] __blkdev_put+0x165/0x1a0
 [<c03db6f2>] register_disk+0x142/0x150
 [<c03db7e9>] add_disk+0xe9/0x160
 [<f823f166>] sd_probe_async+0xf6/0x1c0 [sd_mod]
 [<c026e0ff>] async_run_entry_fn+0x5f/0x160
 [<c026062c>] process_one_work+0x13c/0x2d0
 [<c0262e87>] worker_thread+0x127/0x370
 [<c0266764>] kthread+0x74/0x80
 [<c05ce126>] kernel_thread_helper+0x6/0xd
Code: e8 3c 9d 34 00 0f ae f0 89 f6 89 f0 ff 15 04 1e 83 c0
80 7c 24 07 00 0f 84 bf fe ff ff f6 43 10 01 0f 84 b5 fe ff ff 66 90 f3 90
<f6> 43 10 01 66 90 75 f6 83 c4 10 5b 5e 5f 5d c3 8d b6 00 00 00 
 
  Note that this is with our SLE11-SP2 kernel but the printk code is the
same upstream in this regard.

> > We fix the issue by printing at most 1 KB of messages (unless we are in an
> > early boot stage or oops is happening) in one console_unlock() call. The rest
> > of the buffer will be printed either by further callers to printk() or by a
> > queued work.
> 
> Complex.  Did you try just putting a touch_nmi_watchdog() in the loop?
  I didn't try that. I suppose touch_nmi_watchdog() +
rcu_cpu_stall_reset() would make the messages go away (yes, RCU eventually
freaks out as well). But is it really sane that we keep a single CPU busy,
unusable for anything else, for such a long time? No RCU callbacks can be
processed, no IPIs are processed (which is what triggers the softlockup
reports), etc.

I agree that if we silence all the warnings, everything will eventually
hang waiting for the stalled CPU, which will finish the printing and then
things start over (we tried silencing RCU with rcu_cpu_stall_reset()
and that makes the machine boot eventually). But it seems like papering
over a real problem?
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-16 10:16   ` Jan Kara
@ 2013-01-16 22:50     ` Andrew Morton
  2013-01-16 23:55       ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2013-01-16 22:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Wed, 16 Jan 2013 11:16:44 +0100
Jan Kara <jack@suse.cz> wrote:

> On Tue 15-01-13 23:37:42, Andrew Morton wrote:
> > On Tue, 15 Jan 2013 18:58:34 +0100 Jan Kara <jack@suse.cz> wrote:
> > 
> > > A CPU can be caught in console_unlock() for a long time (tens of seconds are
> > > reported by our customers) when other CPUs are using printk heavily and serial
> > > console makes printing slow.  This triggers softlockup warnings because
> > > interrupts are disabled for the whole time console_unlock() runs (e.g.
> > > vprintk() calls console_unlock() with interrupts disabled).
> > 
> > It should trigger the NMI watchdog in that case?
>   Yes. I see soft lockup reports like:
> BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u:126:791]

That's not an actual NMI watchdog expiry.  Doesn't matter.

> > > We fix the issue by printing at most 1 KB of messages (unless we are in an
> > > early boot stage or oops is happening) in one console_unlock() call. The rest
> > > of the buffer will be printed either by further callers to printk() or by a
> > > queued work.
> > 
> > Complex.  Did you try just putting a touch_nmi_watchdog() in the loop?
>   I didn't try that. I suppose touch_nmi_watchdog() +
> rcu_cpu_stall_reset() would make the messages go away (yes, RCU eventually
> freaks out as well). But is it really sane that we keep single CPU busy,
> unusable for anything else, for such a long time? There can be no RCU
> callbacks processed, no IPIs are processed (which is what triggers
> softlockup reports), etc.

What's not sane is doing large amounts of printk over a slow device.

> I agree that if we silence all the warnings, everything will eventually
> hang waiting for the stalled CPU, that will finish the printing and things
> start from the beginning (we tried silencing RCU with rcu_cpu_stall_reset()
> and that makes the machine boot eventually). But it seems like papering
> over a real problem?  

Well not really - we're doing what the printk() caller asked us to do -
to synchronously print stuff.  And simply sitting there pumping out the
characters is the simplest, most straightforward thing to do.  And
printk() should be simple and straightforward.

If this is all a problem then the calling code should stop doing so
much printing!

And punting the operation to a kernel thread is a pretty radical change
- it surely adds significant risk that output will be lost.


So hrm, I dunno.  Can we just put the touch_nmi_watchdog() in there
initially, see if it fixes things?  If people continue to hit problems
then we can take a second look?



* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-16 22:50     ` Andrew Morton
@ 2013-01-16 23:55       ` Jan Kara
  2013-01-17  0:11         ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-16 23:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Wed 16-01-13 14:50:05, Andrew Morton wrote:
> > > > We fix the issue by printing at most 1 KB of messages (unless we are in an
> > > > early boot stage or oops is happening) in one console_unlock() call. The rest
> > > > of the buffer will be printed either by further callers to printk() or by a
> > > > queued work.
> > > 
> > > Complex.  Did you try just putting a touch_nmi_watchdog() in the loop?
> >   I didn't try that. I suppose touch_nmi_watchdog() +
> > rcu_cpu_stall_reset() would make the messages go away (yes, RCU eventually
> > freaks out as well). But is it really sane that we keep single CPU busy,
> > unusable for anything else, for such a long time? There can be no RCU
> > callbacks processed, no IPIs are processed (which is what triggers
> > softlockup reports), etc.
> 
> What's not sane is doing large amounts of printk over a slow device.
  Yeah, but serial consoles are handy... These sysadmins are pretty
religious about them and when the machine doesn't boot with serial console
enabled they complain. Nitpickers!

> > I agree that if we silence all the warnings, everything will eventually
> > hang waiting for the stalled CPU, that will finish the printing and things
> > start from the beginning (we tried silencing RCU with rcu_cpu_stall_reset()
> > and that makes the machine boot eventually). But it seems like papering
> > over a real problem?  
> 
> Well not really - we're doing what the printk() caller asked us to do -
> to synchronously print stuff.  And simply sitting there pumping out the
> characters is the simplest, most straightforward thing to do.  And
> printk() should be simple and straightforward.
  Except that printk() isn't always synchronous. It writes a message into
the kernel buffer. If there is no one else pumping characters from the buffer
into the console at that moment, the printk caller starts doing so. But
otherwise printk() just returns and pumping of characters is left to
whoever started doing that thankless job. So that poor guy ends up doing
the pumping for all the others...
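
  Schematically it's something like this (heavily simplified from
vprintk_emit() / console_unlock(), not the literal code):

	/* every printk() does roughly: */
	log_store(...);				/* append the message to the ring buffer */
	if (console_trylock_for_printk(this_cpu))
		console_unlock();		/* the winner pumps everyone's backlog */
	/* losers of the trylock just return */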

> If this is all a problem then the calling code should stop doing so
> much printing!
  It's mostly device discovery that triggers the issue in practice. They
have over a thousand SCSI disks attached (multipath in use) and when each
disk prints ~400 bytes of messages (just check your dmesg) you end up with
~30s worth of printing over a 115200 baud console.
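  (Back of the envelope: 1000 disks * 400 bytes = ~400 KB, and 115200 baud
is roughly 11.5 KB/s, so a bit over half a minute of solid printing.)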

> And punting the operation to a kernel thread is a pretty radical change
> - it surely adds significant risk that output will be lost.
  I agree there is a higher chance the output will be lost.

> So hrm, I dunno.  Can we just put the touch_nmi_watchdog() in there
> intially, see if it fixes things?  If people continue to hit problems
> then we can take a second look?
  OK, I'll see if I can get this tested on one of those machines...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-16 23:55       ` Jan Kara
@ 2013-01-17  0:11         ` Andrew Morton
  2013-01-17 21:04           ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2013-01-17  0:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Thu, 17 Jan 2013 00:55:29 +0100
Jan Kara <jack@suse.cz> wrote:

> On Wed 16-01-13 14:50:05, Andrew Morton wrote:
> > > > > We fix the issue by printing at most 1 KB of messages (unless we are in an
> > > > > early boot stage or oops is happening) in one console_unlock() call. The rest
> > > > > of the buffer will be printed either by further callers to printk() or by a
> > > > > queued work.
> > > > 
> > > > Complex.  Did you try just putting a touch_nmi_watchdog() in the loop?
> > >   I didn't try that. I suppose touch_nmi_watchdog() +
> > > rcu_cpu_stall_reset() would make the messages go away (yes, RCU eventually
> > > freaks out as well). But is it really sane that we keep single CPU busy,
> > > unusable for anything else, for such a long time? There can be no RCU
> > > callbacks processed, no IPIs are processed (which is what triggers
> > > softlockup reports), etc.
> > 
> > What's not sane is doing large amounts of printk over a slow device.
>   Yeah, but serial consoles are handy... These sysadmins are pretty
> religious about them and when the machine doesn't boot with serial console
> enabled they complain. Nitpickers!

OK, so it's the boot-time message flood which is causing the problem.
Others may disagree, but I think that special case of message flooding
is quite justifiable.

> > > I agree that if we silence all the warnings, everything will eventually
> > > hang waiting for the stalled CPU, that will finish the printing and things
> > > start from the beginning (we tried silencing RCU with rcu_cpu_stall_reset()
> > > and that makes the machine boot eventually). But it seems like papering
> > > over a real problem?  
> > 
> > Well not really - we're doing what the printk() caller asked us to do -
> > to synchronously print stuff.  And simply sitting there pumping out the
> > characters is the simplest, most straightforward thing to do.  And
> > printk() should be simple and straightforward.
>   Except that printk() isn't always synchronous. It writes a message into
> the kernel buffer. If there is noone else pumping characters from the buffer
> into console at that moment, the printk caller starts doing so. But
> otherwise printk() just returns and pumping of characters is left on
> whoever started doing that thankless job. So that poor guy ends up doing
> the pumping for all the others...

Well yes; that's another of my hare-brained schemes, inflicted upon Linux
before everyone else was born.

> > If this is all a problem then the calling code should stop doing so
> > much printing!
>   It's mostly a device discovery that triggers the issues in practice. They
> have over thousand of SCSI disks attached (multipath in use) and when each
> disk prints ~400 bytes of messages (just check your dmesg) you end up with
> ~30s worth of printing over 115200 console.

OK.

If they're using 110 baud (nobody remembers what this word means) then
the 1000 chars will be too much.  More seriously, what happens if
they're using hardware flow control and someone kicks the cable out? 
Some people are going to want the kernel to just sit there waiting
until the cable gets plugged in again.

> > And punting the operation to a kernel thread is a pretty radical change
> > - it surely adds significant risk that output will be lost.
>   I agree there is a higher chance the output will be lost.
> 
> > So hrm, I dunno.  Can we just put the touch_nmi_watchdog() in there
> > intially, see if it fixes things?  If people continue to hit problems
> > then we can take a second look?
>   OK, I'll see if I can get this tested on one of those machines...

Thanks.


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17  0:11         ` Andrew Morton
@ 2013-01-17 21:04           ` Jan Kara
  2013-01-17 21:39             ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-17 21:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Wed 16-01-13 16:11:18, Andrew Morton wrote:
> On Thu, 17 Jan 2013 00:55:29 +0100
> Jan Kara <jack@suse.cz> wrote:
> > > If this is all a problem then the calling code should stop doing so
> > > much printing!
> >   It's mostly a device discovery that triggers the issues in practice. They
> > have over thousand of SCSI disks attached (multipath in use) and when each
> > disk prints ~400 bytes of messages (just check your dmesg) you end up with
> > ~30s worth of printing over 115200 console.
> 
> OK.
> 
> If they're using 110 baud (nobody remembers what this word means) then
  Heh, I remember times when I was connecting to the internet with a 2400
baud modem via dialup. So I do remember what baud means from those times ;).

> the 1000 chars will be too much.  More seriously, what happens if
> they're using hardware flow control and someone kicks the cable out? 
> Some people are going to want the kernel to just sit there waiting,
> until the cable gets plugged in again.
> 
> > > And punting the operation to a kernel thread is a pretty radical change
> > > - it surely adds significant risk that output will be lost.
> >   I agree there is a higher chance the output will be lost.
> > 
> > > So hrm, I dunno.  Can we just put the touch_nmi_watchdog() in there
> > > intially, see if it fixes things?  If people continue to hit problems
> > > then we can take a second look?
> >   OK, I'll see if I can get this tested on one of those machines...
> 
> Thanks.
  So I played a bit with this. To make things easier for me I added an
artificial mdelay(len*10) (effectively simulating a console able to print 100
characters per second) just after call_console_drivers() so that I can
trigger issues even on a machine easily available to me. Booting actually
doesn't trigger any problems because there aren't enough things happening
in parallel on a common machine during boot, but
  echo t >/proc/sysrq-trigger &
  for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
    name=`basename $i`; name=${name%.ko}; modprobe $name
  done
easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
CPUs).
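
  (For reference, the throttling hack is roughly just this - a debugging aid
only, not meant for merging; the surrounding lines are from the
console_unlock() loop:)

		call_console_drivers(level, text, len);
		mdelay(len * 10);	/* fake a ~100 chars/sec console */
		start_critical_timings();
		local_irq_restore(flags);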

  Adding
	touch_nmi_watchdog();
	touch_all_softlockup_watchdogs();
	rcu_cpu_stall_reset();
  into the printk loop did stop all the warnings and the machine eventually
came alive again after finishing printing sysrq-t output (while printing
the machine was responding to ping but ssh stopped working after a while -
not sure what was happening but apparently some IO requests weren't
completing and maybe networking started dropping packets because it
accumulated too much RCU work).

So your suggestion certainly seems plausible. I was just wondering:
  a) The above three lines can be pretty expensive on big machines as they
iterate over all CPUs. So we should probably limit it to once per jiffy or
something like that?
  b) The above three lines can make softlockup detection pretty useless if
there's enough printk traffic (it would be enough to printk() something
every 10s or so, which can happen with netfilter logging packets and such).
But if we touch the watchdogs only when we spend more than 1 jiffy in
console_unlock() we should hopefully touch those watchdogs only in rare
cases of heavy printk traffic.

  Thoughts?

								Honza

PS: sysrq-t was ~200 KB on that freshly booted machine, so on a busy machine
that would trigger softlockups with a 115200 baud serial console as well (I
actually do remember seeing it in some customers' reports). So it's not
just boot.
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17 21:04           ` Jan Kara
@ 2013-01-17 21:39             ` Andrew Morton
  2013-01-17 23:46               ` Jan Kara
  2013-01-31  7:44               ` anish singh
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2013-01-17 21:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Thu, 17 Jan 2013 22:04:42 +0100
Jan Kara <jack@suse.cz> wrote:

> ...
>
>   So I played a bit with this. To make things easier for me I added
> artificial mdelay(len*10) (effectively simulating console able to print 100
> characters per second) just after call_console_drivers() so that I can
> trigger issues even on a machine easily available to me. Booting actually
> doesn't trigger any problems because there aren't enough things happening
> in parallel on common machine during boot but
>   echo t >/proc/sysrq-trigger &
>   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
>     name=`basename $i`; name=${name%.ko}; modprobe $name
>   done
> easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
> CPUs).
> 
>   Adding
> 	touch_nmi_watchdog();
> 	touch_all_softlockup_watchdogs();
> 	rcu_cpu_stall_reset();

I'm not sure that touch_all_softlockup_watchdogs() is needed? 
touch_nmi_watchdog() itself calls touch_softlockup_watchdog().

If the rcu_cpu_stall_reset() is needed here then presumably it is
needed elsewhere and we should put a call to rcu_cpu_stall_reset() into
(the increasingly misnamed) touch_nmi_watchdog().

>   into the printk loop did stop all the warnings and the machine eventually
> came alive again after finishing printing sysrq-t output (while printing
> the machine was responding to ping but ssh stopped working after a while -
> not sure what was happening but apparently some IO requests weren't
> completing and maybe networking started dropping packets because it
> accumulated too much RCU work).
> 
> So your suggestion seems certainly plausible. I was just wondering
>   a) Above three lines can be pretty expensive on big machines as they
> iterate over all CPUs. So we should probably limit it to once per jiffy or
> something like that?

I guess so - is it hard to test the effects of such a change?  Maybe do
a few MB of printks with the output disabled with `dmesg -n' and see
what effect such a patch has?

If it does need ratelimiting, I'd worry about using jiffies for that. 
If the kernel is spending a long time with interrupts disabled, jiffies
might not be incrementing.  Using the CPU timestamp would be better
(eg, sched_clock()).

>   b) Above three lines can make softlockup detection pretty useless if
> there's enough printk traffic (it would be enough to printk() something
> every 10s or so which can happen with netfilter logging packets or so).

Yes, that is a concern.

> But if we touch the watchdogs only if we spend more than 1 jiffy in the
> console_unlock() we should hopefully touch those watchdogs only in rare
> cases of heavy printk traffic.

yup.  Another option might be to do the touch_nmi_watchdog() only if
there is a "large" amount of data being emitted within
console_unlock().  Like your 1000 character threshold.

> PS: sysrq-t was ~200 KB on that freshly booted machine so on a busy machine
> that would trigger softlockups with 115200 serial console as well (I
> actually do remember seeing it in some customer's reports). So it's not
> just boot. 

Yes, we have hit this before.  Large printks over slow console devices.
See the sad little touch_nmi_watchdog() in lib/show_mem.c, for
example.  I actually thought we'd fixed this in printk itself but it
seems not - the two touch_nmi_watchdog()s in there are for special
cases.

Actually, a bit of grepping for touch_nmi_watchdog() is interesting.  A
number of serial drivers are doing it, which makes me wonder why your
customers weren't saved by that - using the wrong driver, perhaps?  If
we can get this fixed centrally then a lot of those calls should be
removeable.

The presence of all those touch_nmi_watchdog() calls around the place
has implications for the testing of your patch, btw ;)



* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17 21:39             ` Andrew Morton
@ 2013-01-17 23:46               ` Jan Kara
  2013-01-17 23:50                 ` Andrew Morton
  2013-01-31  7:44               ` anish singh
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-17 23:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Thu 17-01-13 13:39:17, Andrew Morton wrote:
> On Thu, 17 Jan 2013 22:04:42 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > ...
> >
> >   So I played a bit with this. To make things easier for me I added
> > artificial mdelay(len*10) (effectively simulating console able to print 100
> > characters per second) just after call_console_drivers() so that I can
> > trigger issues even on a machine easily available to me. Booting actually
> > doesn't trigger any problems because there aren't enough things happening
> > in parallel on common machine during boot but
> >   echo t >/proc/sysrq-trigger &
> >   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
> >     name=`basename $i`; name=${name%.ko}; modprobe $name
> >   done
> > easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
> > CPUs).
> > 
> >   Adding
> > 	touch_nmi_watchdog();
> > 	touch_all_softlockup_watchdogs();
> > 	rcu_cpu_stall_reset();
> 
> I'm not sure that touch_all_softlockup_watchdogs() is needed? 
> touch_nmi_watchdog() itself calls touch_softlockup_watchdog().
  It is. I've tried without it and the machine died a horrible death
because softlockup reports added to the already too heavy printk traffic. The
problem is that the CPU doing the printing cannot handle IPIs, thus if someone
calls e.g. smp_call_function_many() that function will spin waiting for IPIs
on all CPUs to finish. And that doesn't happen until the printing is done, so
the CPU doing smp_call_function_many() gets locked up as well.
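
  In other words, roughly (csd_lock_wait() is the spin in kernel/smp.c, the
rest is paraphrased):

	/* CPU A */
	smp_call_function_many(mask, func, info, true);
		/* sends IPIs, then spins in csd_lock_wait() until every
		 * target CPU has run the handler */

	/* CPU B, meanwhile */
	console_unlock();
		/* runs with interrupts disabled, so the IPI isn't taken
		 * until all the printing is done - and CPU A looks locked
		 * up for that whole time as well */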

> If the rcu_cpu_stall_reset() is needed here then presumably it is
> needed elsewhere and we should put a call to rcu_cpu_stall_reset() into
> (the increasingly misnamed) touch_nmi_watchdog().
  Yeah, correct.
 
> >   into the printk loop did stop all the warnings and the machine eventually
> > came alive again after finishing printing sysrq-t output (while printing
> > the machine was responding to ping but ssh stopped working after a while -
> > not sure what was happening but apparently some IO requests weren't
> > completing and maybe networking started dropping packets because it
> > accumulated too much RCU work).
> > 
> > So your suggestion seems certainly plausible. I was just wondering
> >   a) Above three lines can be pretty expensive on big machines as they
> > iterate over all CPUs. So we should probably limit it to once per jiffy or
> > something like that?
> 
> I guess so - is it hard to test the effects of such a change?  Maybe do
> a few MB of printks with the output disabled with `dmesg -n' and see
> what effect such a patch has?
  OK, will try.
 
> If it does need ratelimiting, I'd worry about using jiffies for that. 
> If the kernel is spending a long time with interrupts disabled, jiffies
> might not be incrementing.  Using the CPU timestamp would be better
> (eg, sched_clock()).
> 
> >   b) Above three lines can make softlockup detection pretty useless if
> > there's enough printk traffic (it would be enough to printk() something
> > every 10s or so which can happen with netfilter logging packets or so).
> 
> Yes, that is a concern.
> 
> > But if we touch the watchdogs only if we spend more than 1 jiffy in the
> > console_unlock() we should hopefully touch those watchdogs only in rare
> > cases of heavy printk traffic.
> 
> yup.  Another option might be to do the touch_nmi_watchdog() only if
> there is a "large" amount of data being emitted within
> console_unlock().  Like your 1000 character threshold.
  OK, I'll experiment a bit more and will see what works.
 
> > PS: sysrq-t was ~200 KB on that freshly booted machine so on a busy machine
> > that would trigger softlockups with 115200 serial console as well (I
> > actually do remember seeing it in some customer's reports). So it's not
> > just boot. 
> 
> Yes, we have hit this before.  Large printks over slow console devices.
> See the sad little touch_nmi_watchdog() in lib/show_mem.c, for
> example.  I actually thought we'd fixed this in printk itself but it
> seems not - the two touch_nmi_watchdog()s in there are for special
> cases.
> 
> Actually, a bit of grepping for touch_nmi_watchdog() is interesting.  A
> number of serial drivers are doing it, which makes me wonder why your
> customers weren't saved by that - using the wrong driver, perhaps?  If
> we can get this fixed centrally then a lot of those calls should be
> removeable.
  Ah, finally I understand why the hardlockup detector doesn't trigger.
These touch_nmi_watchdog() calls silence it. So that's why only RCU
eventually complains and other CPUs get locked up due to IPIs. But I never
saw softlockup reports from the CPU doing the printing (which was actually
why it took us a rather long time to figure out the culprit of these stalls).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17 23:46               ` Jan Kara
@ 2013-01-17 23:50                 ` Andrew Morton
  2013-01-21 21:00                   ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2013-01-17 23:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Fri, 18 Jan 2013 00:46:14 +0100
Jan Kara <jack@suse.cz> wrote:

> On Thu 17-01-13 13:39:17, Andrew Morton wrote:
> > On Thu, 17 Jan 2013 22:04:42 +0100
> > Jan Kara <jack@suse.cz> wrote:
> > 
> > > ...
> > >
> > >   So I played a bit with this. To make things easier for me I added
> > > artificial mdelay(len*10) (effectively simulating console able to print 100
> > > characters per second) just after call_console_drivers() so that I can
> > > trigger issues even on a machine easily available to me. Booting actually
> > > doesn't trigger any problems because there aren't enough things happening
> > > in parallel on common machine during boot but
> > >   echo t >/proc/sysrq-trigger &
> > >   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
> > >     name=`basename $i`; name=${name%.ko}; modprobe $name
> > >   done
> > > easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
> > > CPUs).
> > > 
> > >   Adding
> > > 	touch_nmi_watchdog();
> > > 	touch_all_softlockup_watchdogs();
> > > 	rcu_cpu_stall_reset();
> > 
> > I'm not sure that touch_all_softlockup_watchdogs() is needed? 
> > touch_nmi_watchdog() itself calls touch_softlockup_watchdog().
>   It is. I've tried without it and the machine died a horrible death
> because softlockup reports added to already too heavy printk traffic. The
> problem is that CPU doing printing cannot handle IPIs thus if someone calls
> e.g. smp_call_function_many() that function will spin waiting for IPIs on
> all CPUs to finish. And that doesn't happen until printing is done so
> CPU doing smp_call_function_many() gets locked up as well.

erk.  I trust we'll have a nice comment explaining this mechanism ;)




* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17 23:50                 ` Andrew Morton
@ 2013-01-21 21:00                   ` Jan Kara
  2013-01-29 14:54                     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-21 21:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

[-- Attachment #1: Type: text/plain, Size: 2290 bytes --]

On Thu 17-01-13 15:50:29, Andrew Morton wrote:
> On Fri, 18 Jan 2013 00:46:14 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > On Thu 17-01-13 13:39:17, Andrew Morton wrote:
> > > On Thu, 17 Jan 2013 22:04:42 +0100
> > > Jan Kara <jack@suse.cz> wrote:
> > > 
> > > > ...
> > > >
> > > >   So I played a bit with this. To make things easier for me I added
> > > > artificial mdelay(len*10) (effectively simulating console able to print 100
> > > > characters per second) just after call_console_drivers() so that I can
> > > > trigger issues even on a machine easily available to me. Booting actually
> > > > doesn't trigger any problems because there aren't enough things happening
> > > > in parallel on common machine during boot but
> > > >   echo t >/proc/sysrq-trigger &
> > > >   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
> > > >     name=`basename $i`; name=${name%.ko}; modprobe $name
> > > >   done
> > > > easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
> > > > CPUs).
> > > > 
> > > >   Adding
> > > > 	touch_nmi_watchdog();
> > > > 	touch_all_softlockup_watchdogs();
> > > > 	rcu_cpu_stall_reset();
> > > 
> > > I'm not sure that touch_all_softlockup_watchdogs() is needed? 
> > > touch_nmi_watchdog() itself calls touch_softlockup_watchdog().
> >   It is. I've tried without it and the machine died a horrible death
> > because softlockup reports added to already too heavy printk traffic. The
> > problem is that CPU doing printing cannot handle IPIs thus if someone calls
> > e.g. smp_call_function_many() that function will spin waiting for IPIs on
> > all CPUs to finish. And that doesn't happen until printing is done so
> > CPU doing smp_call_function_many() gets locked up as well.
> 
> erk.  I trust we'll have a nice comment explaining this mechanism ;)
  So I was testing the attached patch which does what we discussed. The bad
news is I was able to trigger a situation (twice) when suddenly sda
disappeared and thus all IO requests failed with EIO. There is no trace of
what happened in the kernel log. I'm guessing that disabled interrupts on
the printing CPU caused the scsi layer to time out for some request and fail
the device. So where do we go from here?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-printk-Avoid-softlockup-reports-during-heavy-printk-.patch --]
[-- Type: text/x-patch, Size: 2902 bytes --]

>From 9373cb89d434519089a1a6a0955aef08c814d70f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 21 Jan 2013 20:30:20 +0100
Subject: [PATCH] printk: Avoid softlockup reports during heavy printk traffic

A CPU can be caught in console_unlock() for a long time (tens of seconds are
reported by our customers) when other CPUs are using printk heavily and serial
console makes printing slow.  This triggers softlockup warnings because
interrupts are disabled for the whole time console_unlock() runs (e.g.
vprintk() calls console_unlock() with interrupts disabled).

It doesn't trigger hardlockup reports because serial drivers already call
touch_nmi_watchdog(), but that's not enough in some cases. For example, if some
process uses on_each_cpu(), smp_call_function() will block until the IPI on the
printing CPU is processed, and that doesn't happen for tens of seconds, so the
CPU where on_each_cpu() is executing is locked up as well.

Postponing the printing to a worker thread was deemed too risky (see discussion
in the thread starting with https://lkml.org/lkml/2013/1/15/438) so here we
take a less risky approach and just silence all the watchdogs. We have to be
careful not to make them completely useless by touching them too often, so we
touch the watchdogs only once per 1000 characters printed in one console_unlock()
call. That should really happen only during boot, when printing sysrq-t output,
or in similarly extreme situations.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 357f714..1cbae2b 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -2043,6 +2043,7 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	unsigned int printed = 0;
 
 	if (console_suspended) {
 		up(&console_sem);
@@ -2103,8 +2104,30 @@ skip:
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
+		printed += len;
 		start_critical_timings();
 		local_irq_restore(flags);
+
+		/*
+		 * If we already printed too much, touch watchdogs. The limit
+		 * is set high enough so that it happens really rarely (and
+		 * printk traffic doesn't render lockup watchdogs useless) but
+		 * low enough so that even a slower serial console doesn't cause
+		 * lockup reports. A 9600 baud serial console prints 1000 chars
+		 * in about 1 second...
+		 */
+		if (printed > 1000) {
+			touch_nmi_watchdog();
+			/*
+			 * This is needed because other CPUs may be waiting on
+			 * this CPU to process IPI so they are stalled as well.
+			 */
+			touch_all_softlockup_watchdogs();
+			local_irq_save(flags);
+			rcu_cpu_stall_reset();
+			local_irq_restore(flags);
+			printed = 0;
+		}
 	}
 	console_locked = 0;
 	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
-- 
1.7.1



* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-21 21:00                   ` Jan Kara
@ 2013-01-29 14:54                     ` Jan Kara
  2013-01-31  0:08                       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-29 14:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Mon 21-01-13 22:00:08, Jan Kara wrote:
> On Thu 17-01-13 15:50:29, Andrew Morton wrote:
> > On Fri, 18 Jan 2013 00:46:14 +0100
> > Jan Kara <jack@suse.cz> wrote:
> > 
> > > On Thu 17-01-13 13:39:17, Andrew Morton wrote:
> > > > On Thu, 17 Jan 2013 22:04:42 +0100
> > > > Jan Kara <jack@suse.cz> wrote:
> > > > 
> > > > > ...
> > > > >
> > > > >   So I played a bit with this. To make things easier for me I added
> > > > > artificial mdelay(len*10) (effectively simulating console able to print 100
> > > > > characters per second) just after call_console_drivers() so that I can
> > > > > trigger issues even on a machine easily available to me. Booting actually
> > > > > doesn't trigger any problems because there aren't enough things happening
> > > > > in parallel on common machine during boot but
> > > > >   echo t >/proc/sysrq-trigger &
> > > > >   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do 
> > > > >     name=`basename $i`; name=${name%.ko}; modprobe $name
> > > > >   done
> > > > > easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
> > > > > CPUs).
> > > > > 
> > > > >   Adding
> > > > > 	touch_nmi_watchdog();
> > > > > 	touch_all_softlockup_watchdogs();
> > > > > 	rcu_cpu_stall_reset();
> > > > 
> > > > I'm not sure that touch_all_softlockup_watchdogs() is needed? 
> > > > touch_nmi_watchdog() itself calls touch_softlockup_watchdog().
> > >   It is. I've tried without it and the machine died a horrible death
> > > because softlockup reports added to already too heavy printk traffic. The
> > > problem is that CPU doing printing cannot handle IPIs thus if someone calls
> > > e.g. smp_call_function_many() that function will spin waiting for IPIs on
> > > all CPUs to finish. And that doesn't happen until printing is done so
> > > CPU doing smp_call_function_many() gets locked up as well.
> > 
> > erk.  I trust we'll have a nice comment explaining this mechanism ;)
>   So I was testing the attached patch which does what we discussed. The bad
> news is I was able to trigger a situation (twice) when suddently sda
> disappeared and thus all IO requests failed with EIO. There is no trace of
> what's happened in the kernel log. I'm guessing that disabled interrupts on
> the printing CPU caused scsi layer to time out for some request and fail the
> device. So where do we go from here?
  Andrew? I guess this fell off your radar via the "hrm, strange, need to
have a closer look later" path? Currently I'd be inclined to return to my
original solution...

								Honza


> From 9373cb89d434519089a1a6a0955aef08c814d70f Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Jan 2013 20:30:20 +0100
> Subject: [PATCH] printk: Avoid softlockup reports during heavy printk traffic
> 
> A CPU can be caught in console_unlock() for a long time (tens of seconds are
> reported by our customers) when other CPUs are using printk heavily and serial
> console makes printing slow.  This triggers softlockup warnings because
> interrupts are disabled for the whole time console_unlock() runs (e.g.
> vprintk() calls console_unlock() with interrupts disabled).
> 
> It doesn't trigger hardlockup reports because serial drivers already call
> touch_nmi_watchdog() but that's not enough in some cases. For example if some
> process uses on_each_cpu(), smp_call_function() will block until IPI on the
> printing CPU is processed and that doesn't happen for tens of seconds so the
> CPU where on_each_cpu() is executing is locked up as well.
> 
> Postponing the printing to a worker thread was deemed too risky (see discussion
> in the thread starting with https://lkml.org/lkml/2013/1/15/438) so here we
> take a less risky approach and just silence all the watchdogs. We have to be
> careful not to make them completely useless by touching them too often so we
> touch the watchdogs only every 1000 characters we print in one console_unlock()
> call. That should really happen only during boot, when printing sysrq-t output,
> or in similarly extreme situations.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/printk.c |   23 +++++++++++++++++++++++
>  1 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 357f714..1cbae2b 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -2043,6 +2043,7 @@ void console_unlock(void)
>  	unsigned long flags;
>  	bool wake_klogd = false;
>  	bool retry;
> +	unsigned int printed = 0;
>  
>  	if (console_suspended) {
>  		up(&console_sem);
> @@ -2103,8 +2104,30 @@ skip:
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(level, text, len);
> +		printed += len;
>  		start_critical_timings();
>  		local_irq_restore(flags);
> +
> +		/*
> +		 * If we already printed too much, touch watchdogs. The limit
> +		 * is set high enough so that it happens really rarely (and
> +		 * printk traffic doesn't render lockup watchdogs useless) but
> +		 * low enough so that even slower serial console doesn't cause
> +		 * lockup reports. 9600 baud serial console prints 1000 chars
> +		 * in about 1 second...
> +		 */
> +		if (printed > 1000) {
> +			touch_nmi_watchdog();
> +			/*
> +			 * This is needed because other CPUs may be waiting on
> +			 * this CPU to process IPI so they are stalled as well.
> +			 */
> +			touch_all_softlockup_watchdogs();
> +			local_irq_save(flags);
> +			rcu_cpu_stall_reset();
> +			local_irq_restore(flags);
> +			printed = 0;
> +		}
>  	}
>  	console_locked = 0;
>  	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
> -- 
> 1.7.1
> 

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-29 14:54                     ` Jan Kara
@ 2013-01-31  0:08                       ` Andrew Morton
  2013-01-31 12:46                         ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2013-01-31  0:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: Greg Kroah-Hartman, LKML, jslaby

On Tue, 29 Jan 2013 15:54:24 +0100
Jan Kara <jack@suse.cz> wrote:

> >   So I was testing the attached patch which does what we discussed. The bad
> > news is I was able to trigger a situation (twice) when suddently sda
> > disappeared and thus all IO requests failed with EIO. There is no trace of
> > what's happened in the kernel log. I'm guessing that disabled interrupts on
> > the printing CPU caused scsi layer to time out for some request and fail the
> > device. So where do we go from here?
>   Andrew? I guess this fell off your radar via the "hrm, strange, need to
> have a closer look later" path?

urgh.  I was hoping that if we left it long enough, one or both of us
would die :(

I fear we will rue the day when we changed printk() to bounce some of
its work up to a kernel thread.

> Currently I'd be inclined to return to my original solution...

Can we make it smarter?  Say, take a peek at the current
softlockup/nmi-watchdog intervals, work out for how long we can
afford to keep interrupts disabled and then use that period and
sched_clock() to work out if we're getting into trouble?  IOW, remove
the hard-wired "1000" thing which will always be too high or too low
for all situations.

Implementation-wise, that would probably end up adding a kernel-wide
function along the lines of

/*
 * Return the maximum number of nanoseconds for which interrupts may be disabled
 * on the current CPU
 */
u64 max_interrupt_disabled_duration(void)
{
	return min(softlockup duration, nmi watchdog duration);
}
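
and then console_unlock() could do something along these lines (just a
sketch, the "/ 2" safety margin and placement are arbitrary):

	u64 start = sched_clock();
	...
	/* after each call_console_drivers() */
	if (sched_clock() - start > max_interrupt_disabled_duration() / 2) {
		touch_nmi_watchdog();
		touch_all_softlockup_watchdogs();
		rcu_cpu_stall_reset();
		start = sched_clock();
	}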

Thinking ahead...

Other kernel sites which know they can disable interrupts for a long
time can perhaps use this.

Later, realtimeish systems (for example machine controllers) might want
to add a kernel tunable so they can set the
max_interrupt_disabled_duration() return value much lower.

To make that more accurate, we could add per-cpu, per-irq variables to
record sched_clock() when each CPU enters the interrupt, so the comment
becomes

/*
 * Return the remaining maximum number of nanoseconds for which interrupts may
 * be disabled on the current CPU
 */
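
and, purely hypothetically (the names are invented for illustration), the
bookkeeping might look something like:

	DEFINE_PER_CPU(u64, irq_disable_start);	/* stamped when IRQs are disabled */

	u64 remaining_interrupt_disabled_duration(void)
	{
		return max_interrupt_disabled_duration() -
		       (sched_clock() - __this_cpu_read(irq_disable_start));
	}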

This may all be crazy and hopefully we'll never do it, but the design
should permit such things from day one if practical.



* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-17 21:39             ` Andrew Morton
  2013-01-17 23:46               ` Jan Kara
@ 2013-01-31  7:44               ` anish singh
  2013-01-31 21:21                 ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread
From: anish singh @ 2013-01-31  7:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Fri, Jan 18, 2013 at 3:09 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 17 Jan 2013 22:04:42 +0100
> Jan Kara <jack@suse.cz> wrote:
>
>> ...
>>
>>   So I played a bit with this. To make things easier for me I added
>> artificial mdelay(len*10) (effectively simulating console able to print 100
>> characters per second) just after call_console_drivers() so that I can
>> trigger issues even on a machine easily available to me. Booting actually
>> doesn't trigger any problems because there aren't enough things happening
>> in parallel on common machine during boot but
>>   echo t >/proc/sysrq-trigger &
>>   for i in /lib/modules/3.8.0-rc3-0-default/kernel/fs/*/*.ko; do
>>     name=`basename $i`; name=${name%.ko}; modprobe $name
>>   done
>> easily triggers the problem (as modprobe uses both RCU & IPIs to signal all
>> CPUs).
>>
>>   Adding
>>       touch_nmi_watchdog();
>>       touch_all_softlockup_watchdogs();
>>       rcu_cpu_stall_reset();
>
> I'm not sure that touch_all_softlockup_watchdogs() is needed?
> touch_nmi_watchdog() itself calls touch_softlockup_watchdog().
>
> If the rcu_cpu_stall_reset() is needed here then presumably it is
> needed elsewhere and we should put a call to rcu_cpu_stall_reset() into
> (the increasingly misnamed) touch_nmi_watchdog().
>
>>   into the printk loop did stop all the warnings and the machine eventually
>> came alive again after finishing printing sysrq-t output (while printing
>> the machine was responding to ping but ssh stopped working after a while -
>> not sure what was happening but apparently some IO requests weren't
>> completing and maybe networking started dropping packets because it
>> accumulated too much RCU work).
>>
>> So your suggestion seems certainly plausible. I was just wondering
>>   a) Above three lines can be pretty expensive on big machines as they
>> iterate over all CPUs. So we should probably limit it to once per jiffy or
>> something like that?
>
> I guess so - is it hard to test the effects of such a change?  Maybe do
> a few MB of printks with the output disabled with `dmesg -n' and see
> what effect such a patch has?
>
> If it does need ratelimiting, I'd worry about using jiffies for that.
> If the kernel is spending a long time with interrupts disabled, jiffies
> might not be incrementing.  Using the CPU timestamp would be better
> (eg, sched_clock()).

http://lxr.free-electrons.com/source/kernel/sched/clock.c#L75
I am puzzled by this definition (above link). sched_clock is dependent on
jiffies, and jiffies is blocked, so how would sched_clock be better? (I am
100% missing something very obvious.)

Is it that sched_clock is not dependent on jiffies?
>
>>   b) Above three lines can make softlockup detection pretty useless if
>> there's enough printk traffic (it would be enough to printk() something
>> every 10s or so which can happen with netfilter logging packets or so).
>
> Yes, that is a concern.
>
>> But if we touch the watchdogs only if we spend more than 1 jiffy in the
>> console_unlock() we should hopefully touch those watchdogs only in rare
>> cases of heavy printk traffic.
>
> yup.  Another option might be to do the touch_nmi_watchdog() only if
> there is a "large" amount of data being emitted within
> console_unlock().  Like your 1000 character threshold.
>
>> PS: sysrq-t was ~200 KB on that freshly booted machine so on a busy machine
>> that would trigger softlockups with 115200 serial console as well (I
>> actually do remember seeing it in some customer's reports). So it's not
>> just boot.
>
> Yes, we have hit this before.  Large printks over slow console devices.
> See the sad little touch_nmi_watchdog() in lib/show_mem.c, for
> example.  I actually thought we'd fixed this in printk itself but it
> seems not - the two touch_nmi_watchdog()s in there are for special
> cases.
>
> Actually, a bit of grepping for touch_nmi_watchdog() is interesting.  A
> number of serial drivers are doing it, which makes me wonder why your
> customers weren't saved by that - using the wrong driver, perhaps?  If
> we can get this fixed centrally then a lot of those calls should be
> removeable.
>
> The presence of all those touch_nmi_watchdog() calls around the place
> has implications for the testing of your patch, btw ;)
>


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-31  0:08                       ` Andrew Morton
@ 2013-01-31 12:46                         ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2013-01-31 12:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Wed 30-01-13 16:08:27, Andrew Morton wrote:
> On Tue, 29 Jan 2013 15:54:24 +0100
> Jan Kara <jack@suse.cz> wrote:
> 
> > >   So I was testing the attached patch which does what we discussed. The bad
> > > news is I was able to trigger a situation (twice) when suddently sda
> > > disappeared and thus all IO requests failed with EIO. There is no trace of
> > > what's happened in the kernel log. I'm guessing that disabled interrupts on
> > > the printing CPU caused scsi layer to time out for some request and fail the
> > > device. So where do we go from here?
> >   Andrew? I guess this fell off your radar via the "hrm, strange, need to
> > have a closer look later" path?
> 
> urgh.  I was hoping that if we left it long enough, one of both of us
> would die :(
  I'm too young for this strategy to work for me :)

> I fear we will rue the day when we changed printk() to bounce some of
> its work up to a kernel thread.
> 
> > Currently I'd be inclined to return to my original solution...
> 
> Can we make it smarter?  Say, take a peek at the current
> softlockup/nmi-watchdog intervals, work out how for how long we can
> afford to keep interrupts disabled and then use that period and
> sched_clock() to work out if we're getting into trouble?  IOW, remove
> the hard-wired "1000" thing which will always be too high or too low
> for all situations.
  Yes, I also thought that making offloading more clever (so that offload
doesn't happen unless we really have no choice) could make the approach
more acceptable. 

> Implementation-wise, that would probably end up adding a kernel-wide
> function along the lines of
> 
> /*
>  * Return the maximum number of nanosecond for which interrupts may be disabled
>  * on the current CPU
>  */
> u64 max_interrupt_disabled_duration(void)
> {
> 	return min(sortirq duration, nmi watchdog duration);
> }
  OK, that sounds good. So I'll write a patch...
 
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] printk: Avoid softlockups in console_unlock()
  2013-01-31  7:44               ` anish singh
@ 2013-01-31 21:21                 ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2013-01-31 21:21 UTC (permalink / raw)
  To: anish singh; +Cc: Jan Kara, Greg Kroah-Hartman, LKML, jslaby

On Thu, 31 Jan 2013 13:14:05 +0530
anish singh <anish198519851985@gmail.com> wrote:

> > If it does need ratelimiting, I'd worry about using jiffies for that.
> > If the kernel is spending a long time with interrupts disabled, jiffies
> > might not be incrementing.  Using the CPU timestamp would be better
> > (eg, sched_clock()).
> 
> http://lxr.free-electrons.com/source/kernel/sched/clock.c#L75
> I am puzzled because of this definition(above link).Sched_clock is
> dependent on jiffies and jiffies is blocked so how sched_clock would
> be better(I am 100% missing something very obvious)?
> 
> Is it that sched_clock is not dependent on jiffies?

yes, I think sched_clock is dependent on jiffies for some architectures.

I was really using sched_clock as a place-filler for "some timer which
keeps running when interrupts are disabled" ;) I'm not really sure what
that would be nowadays - even get_cycles() isn't implemented on some
architectures.  I guess some architectures will need a lame fallback of
some sort.



