From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============8247109897995723324=="
MIME-Version: 1.0
From: Petr Mladek
To: lkp@lists.01.org
Subject: Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test_stage
Date: Mon, 10 Apr 2017 13:54:32 +0200
Message-ID: <20170410115432.GV3452@pathway.suse.cz>
In-Reply-To: <20170410045339.GB2793@jagdpanzerIV.localdomain>
List-Id:

--===============8247109897995723324==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Mon 2017-04-10 13:53:39, Sergey Senozhatsky wrote:
> On (04/09/17 12:12), Pavel Machek wrote:
> [..]
> > > a side note,
> > > that's rather unclear to me how would "message delayed" really help.
> > > if your system hard-lockup so badly and there are no printk messages
> > > even from NMI watchdog, then we won't be able to print that message.
> >
> > We are talking about
> >
> > printk("unusual condition");
> > do_something_clever(); /* Which unfortunately hard-crashes the machine */
> >
> > that works with my proposal, but not with yours. Seen it happen many
> > times before.
>
> I see your point, sure.
> I can't completely agree on "that works with my proposal, but not with yours."
>
> on SMP system this would be true only if no other CPU holds the console_sem
> at the time we call printk(). (skipping irrelevant cases when we have suspended
> console or !online CPU and !CON_ANYTIME console). and there is nothing that
> makes "no other CPU holds the console_sem" always true on SMP system at any
> given point in time. so no, "A always works, B never works" is not accurate.
>
> but, once again, I see your point.

A compromise might be to move the offloading from vprintk_emit()
to console_unlock(). In other words, printk() could always try
to flush some messages to the console. The console might trigger
the offload (wake up the kthread) after a few lines or when the
printing takes too long.
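To make the compromise concrete, here is a minimal userspace sketch in C.
It is not kernel code and not a patch. It assumes that a fixed line budget
stands in for "after a few lines"; FLUSH_LINE_BUDGET, struct log_model,
model_printk() and model_console_unlock() are hypothetical names, and
waking the kthread is modelled by setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define FLUSH_LINE_BUDGET 3	/* assumed limit for "a few lines" */

struct log_model {
	size_t pending;		/* messages stored but not yet printed */
	size_t printed_inline;	/* messages printed by the printk() caller */
	bool kthread_woken;	/* set once the rest is handed to the kthread */
};

/* Stand-in for console_unlock(): print up to the budget, then offload. */
static void model_console_unlock(struct log_model *lm)
{
	size_t printed = 0;

	while (lm->pending > 0) {
		if (printed == FLUSH_LINE_BUDGET) {
			/* Budget used up: wake the modelled kthread and stop. */
			lm->kthread_woken = true;
			break;
		}
		lm->pending--;
		lm->printed_inline++;
		printed++;
	}
	/* The caller never prints more than the budget in one call. */
	assert(printed <= FLUSH_LINE_BUDGET);
}

/* Stand-in for printk(): store the message, then always try to flush. */
static void model_printk(struct log_model *lm)
{
	lm->pending++;
	model_console_unlock(lm);
}
```

In this model, a single model_printk() on an idle console is printed by
the caller itself, which is the "unusual condition" case quoted above.
With 10 messages already pending, one model_console_unlock() call prints
3 of them inline and leaves 7 for the kthread.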
We could go even further. We could replace the cond_resched()
in console_unlock() with a need_resched() check. Then we could
avoid sleeping with console_sem taken. It would avoid the
softlockups caused by printk().

It should work pretty well in most critical situations. Of course,
it will not guarantee that we will see all messages when there
is a flood of messages from many CPUs. But it was never guaranteed.

Best Regards,
Petr

--===============8247109897995723324==--
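The need_resched() idea can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions, not the kernel
implementation: the kernel's need_resched() is replaced by a counter
threshold, console_sem by a boolean, and struct flush_model,
model_need_resched() and model_flush() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct flush_model {
	size_t pending;		/* messages waiting for the console */
	size_t resched_after;	/* assumed: rescheduling is due after this many lines */
	size_t printed;		/* messages printed so far */
	bool sem_held;		/* models console_sem being taken */
	bool offloaded;		/* remaining messages were left to the kthread */
};

/* Stand-in for need_resched(): only checks, never sleeps. */
static bool model_need_resched(const struct flush_model *m)
{
	return m->printed >= m->resched_after;
}

/* Stand-in for the console_unlock() loop with a check instead of a sleep. */
static void model_flush(struct flush_model *m)
{
	assert(!m->sem_held);	/* the model does not nest the lock */
	m->sem_held = true;

	while (m->pending > 0) {
		if (model_need_resched(m)) {
			/* Do not sleep with the lock held: stop and offload. */
			m->offloaded = true;
			break;
		}
		m->pending--;
		m->printed++;
	}

	m->sem_held = false;	/* lock is dropped before any scheduling */
}
```

The point of the model is that the loop has exactly one exit path with
the lock held, and that path never sleeps: either everything is printed,
or the flusher stops early and leaves the remainder to the kthread.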