From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757816AbbIVKzI (ORCPT ); Tue, 22 Sep 2015 06:55:08 -0400 Received: from mx2.suse.de ([195.135.220.15]:44195 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756258AbbIVKzG (ORCPT ); Tue, 22 Sep 2015 06:55:06 -0400 Date: Tue, 22 Sep 2015 12:55:02 +0200 From: Jan Kara To: Andrew Morton Cc: Jan Kara , LKML , pmladek@suse.com, rostedt@goodmis.org, Gavin Hu , KY Srinivasan , Jan Kara Subject: Re: [PATCH 3/4] kernel: Avoid softlockups in stop_machine() during heavy printing Message-ID: <20150922105502.GK9028@quack.suse.cz> References: <1439998711-7013-1-git-send-email-jack@suse.com> <1439998711-7013-4-git-send-email-jack@suse.com> <20150918151521.c5c8caa59a0e254fdd713337@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150918151521.c5c8caa59a0e254fdd713337@linux-foundation.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 18-09-15 15:15:21, Andrew Morton wrote: > On Wed, 19 Aug 2015 17:38:30 +0200 Jan Kara wrote: > > > From: Jan Kara > > > > When there are lots of messages accumulated in printk buffer, printing > > them (especially over serial console) can take a long time (tens of > > seconds). stop_machine() will effectively make all cpus spin in > > multi_cpu_stop() waiting for the CPU doing printing to print all the > > messages which triggers NMI softlockup watchdog and RCU stall detector > > which add even more to the messages to print. Since machine doesn't do > > anything (except serving interrupts) during this time, also network > > connections are dropped and other disturbances may happen. > > > > Paper over the problem by waiting for printk buffer to be empty before > > starting to stop CPUs. In theory a burst of new messages can be appended > > to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100% > > solution but it works OK in practice and I'm not aware of a reasonably > > simple better solution. > > Confused. Why don't patches 1 and 2 already fix this problem? Because stop_machine() will not allow printing threads to run on any CPU (all but one CPUs are spinning in multi_cpu_stop() without possibility of preemption) and thus any printk offloading cannot happen. > > ... > > > > @@ -2489,6 +2489,28 @@ struct tty_driver *console_device(int *index) > > } > > > > /* > > + * Wait until all messages accumulated in the printk buffer are printed to > > + * console. Note that as soon as this function returns, new messages may be > > + * added to the printk buffer by other CPUs. > > + */ > > +void console_flush(void) > > This doesn't seem a very good name. We already have > console_cont_flush(), cont_flush(), etc. Can we think of something > more specific? printk_log_buf_drain() perhaps. Thanks for suggestion. I'll change the name. > > +{ > > + bool retry; > > + unsigned long flags; > > + > > + while (1) { > > + raw_spin_lock_irqsave(&logbuf_lock, flags); > > + retry = console_seq != log_next_seq; > > + raw_spin_unlock_irqrestore(&logbuf_lock, flags); > > Does this lock/unlock do anything useful? > > > + if (!retry || console_suspended) > > + break; > > + /* Cycle console_sem to wait for outstanding printing */ > > + console_lock(); > > + console_unlock(); > > + } > > +} > > + > > +/* > > * Prevent further output on the passed console device so that (for example) > > * serial drivers can disable console output before suspending a port, and can > > * re-enable output afterwards. > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > > index fd643d8c4b42..016d34621d2e 100644 > > --- a/kernel/stop_machine.c > > +++ b/kernel/stop_machine.c > > @@ -21,6 +21,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > > * Structure to determine completion condition and record errors. May > > @@ -543,6 +544,14 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus) > > return ret; > > } > > > > + /* > > + * If there are lots of outstanding messages, printing them can take a > > + * long time and all cpus would be spinning waiting for the printing to > > + * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the > > + * printing here to avoid these. > > + */ > > + console_flush(); > > This is pretty pointless if num_possible_cpus==1. I'd suggest setting > printk_offload_chars=0 in this case, add some early bale-out into > console_flush(). Or something along those lines. > > And make console_flush() go away altogether if CONFIG_SMP=n - it's > pointless bloat. Sure, I'll do that. Honza -- Jan Kara SUSE Labs, CR