From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754039AbbE1MAk (ORCPT <rfc822;w@1wt.eu>);
	Thu, 28 May 2015 08:00:40 -0400
Received: from cantor2.suse.de ([195.135.220.15]:41955 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753989AbbE1MA3 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 28 May 2015 08:00:29 -0400
Date: Thu, 28 May 2015 14:00:27 +0200
From: Petr Mladek <pmladek@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Dave Anderson <anderson@redhat.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Kay Sievers <kay@vrfy.org>, Jiri Kosina <jkosina@suse.cz>,
        Michal Hocko <mhocko@suse.cz>, Jan Kara <jack@suse.cz>,
        linux-kernel@vger.kernel.org, Wang Long <long.wanglong@huawei.com>,
        peifeiyue@huawei.com, dzickus@redhat.com, morgan.wang@huawei.com,
        sasha.levin@oracle.com
Subject: Re: [PATCH 01/10] printk: Avoid deadlock in NMI context
Message-ID: <20150528120026.GD3135@pathway.suse.cz>
References: <1432557993-20458-1-git-send-email-pmladek@suse.cz>
 <1432557993-20458-2-git-send-email-pmladek@suse.cz>
 <20150527161346.fb3178d393ebbaafea4e3906@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150527161346.fb3178d393ebbaafea4e3906@linux-foundation.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 2015-05-27 16:13:46, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:24 +0200 Petr Mladek <pmladek@suse.cz> wrote:
> 
> > printk() cannot be used in NMI context safely because it uses an internal
> > lock and thus could cause a deadlock. This is fine because NMI context
> > is very special. The handlers should be short, effective, and do not
> > use printk().
> > 
> > But developers tend to print warnings even from NMI code. They are pretty
> > hard to debug when they lockup the machine and nothing appears in the logs.
> > 
> > This patch prevents from the deadlock on logbuf_lock by using trylock
> > rather than spin_lock. If the lock can not be taken, the message is
> > ignored and some warning is printed later on.
> > 
> > We also must not try to get console from NMI context. It needs
> > even more locks and there is even higher chance to hung up.
> > 
> > Unfortunately, we could not print more details about the lost message.
> > We could not alloc a buffer in NMI. Therefore we would need some
> > lockless mechanism to share a buffer between NMI and normal context.
> > But this would make printk() code much more complicated and
> > it is not worth it. There has already been an attempt to do so
> > and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
> > This is also the reason why we use the atomic counter.
> 
> hm, I expect it wouldn't be too messy to shove the text into a static
> per-cpu buffer.  So we at least get a few hundred bytes of stuff.

The problem is that we would need to read the static buffer from normal
context without a lock. An NMI could rewrite the buffer in the middle of
the read, so the result might be a garbled message. I could add some
lockless hackery to keep the content consistent, but this would make
the code more complex.
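
To show what kind of lockless hackery I mean, here is a rough userspace
sketch of a sequence-counter scheme. It is only an illustration, not
code from the patch: struct nmi_buf, nmi_buf_write(), nmi_buf_read(),
fake_nmi() and the mid_copy hook are all made-up names, and the "NMI"
is simulated by a plain function call in the middle of the copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NMI_BUF_SIZE 32
#define NMI_BUF_SPLIT 8		/* the simulated NMI hits after this many bytes */

/* Hypothetical per-CPU buffer; one instance stands in for one CPU here. */
struct nmi_buf {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	char text[NMI_BUF_SIZE];
};

static struct nmi_buf demo_buf;

/* Writer side ("NMI context"): bump seq before and after the update. */
static void nmi_buf_write(struct nmi_buf *b, const char *msg)
{
	assert(msg != NULL);
	atomic_fetch_add(&b->seq, 1);
	strncpy(b->text, msg, NMI_BUF_SIZE - 1);	/* pads the rest with zeros */
	b->text[NMI_BUF_SIZE - 1] = '\0';
	atomic_fetch_add(&b->seq, 1);
}

/*
 * Reader side ("normal context"): copy without a lock and retry when
 * seq was odd or changed during the copy. mid_copy is a test hook
 * that simulates an NMI arriving in the middle of the first attempt.
 * Returns the number of attempts that were needed.
 */
static int nmi_buf_read(struct nmi_buf *b, char *out,
			void (*mid_copy)(struct nmi_buf *))
{
	unsigned int start;
	int attempts = 0;

	do {
		attempts++;
		start = atomic_load(&b->seq);
		memcpy(out, b->text, NMI_BUF_SPLIT);
		if (mid_copy && attempts == 1)
			mid_copy(b);
		memcpy(out + NMI_BUF_SPLIT, b->text + NMI_BUF_SPLIT,
		       NMI_BUF_SIZE - NMI_BUF_SPLIT);
	} while ((start & 1) || atomic_load(&b->seq) != start);

	return attempts;
}

/* Simulated NMI handler that overwrites the buffer. */
static void fake_nmi(struct nmi_buf *b)
{
	nmi_buf_write(b, "second: from NMI");
}
```

Without the retry, a read interrupted by fake_nmi() would mix the start
of the old message with the end of the new one. The counter only
detects that; the older message is still overwritten, and the reader
gets a retry loop on top. The sketch is single-threaded, so it also
sidesteps the memory-ordering questions that real code would have to
answer.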

In either case, we would not be able to preserve all messages. So I am
not sure that a more complex solution is worth doing.

Note that we are talking about a corner case. printk() should not be
used in NMI context in the first place. If it is used, we still do our
best to get the message out, and we try even harder for Oops messages.
If the message is lost, it might mean that there is a flood of
printk()s, and then the message might get lost anyway.
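
For reference, the trylock-plus-counter idea from the patch description
looks roughly like this when reduced to a userspace sketch. This is not
the patch itself: log_lock, lost_messages, log_from_nmi() and the other
names are invented for the illustration, and a C11 atomic_flag stands
in for logbuf_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;	/* plays logbuf_lock */
static atomic_int lost_messages;	/* messages dropped on a busy lock */
static char log_buf[256];
static size_t log_len;

/* Returns 1 when the lock was taken, 0 when it is already held. */
static int log_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&log_lock,
						  memory_order_acquire);
}

static void log_unlock(void)
{
	atomic_flag_clear_explicit(&log_lock, memory_order_release);
}

/* Append text to log_buf, truncating when full. Caller holds log_lock. */
static void log_append(const char *text)
{
	size_t n = strlen(text);
	size_t room;

	assert(log_len < sizeof(log_buf));
	room = sizeof(log_buf) - 1 - log_len;
	if (n > room)
		n = room;
	memcpy(log_buf + log_len, text, n);
	log_len += n;
	log_buf[log_len] = '\0';
}

/* "NMI context": never spins. Returns 1 if stored, 0 if dropped. */
static int log_from_nmi(const char *text)
{
	if (!log_trylock()) {
		/* Only count the loss; there is no safe place for the text. */
		atomic_fetch_add(&lost_messages, 1);
		return 0;
	}
	log_append(text);
	log_unlock();
	return 1;
}

/* "Normal context": may spin, and reports earlier losses first. */
static void log_from_normal(const char *text)
{
	char warn[64];
	int lost;

	while (!log_trylock())
		;	/* spin until the lock is free */

	lost = atomic_exchange(&lost_messages, 0);
	if (lost) {
		snprintf(warn, sizeof(warn),
			 "Lost %d message(s) from NMI context!\n", lost);
		log_append(warn);
	}
	log_append(text);
	log_unlock();
}
```

The NMI path never waits, so it cannot deadlock on the lock, and the
only state it touches when the lock is busy is the atomic counter. The
price is that only the number of lost messages survives, not the text.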

> 
> > +		/* emit KERN_CRIT message */
> > +		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> > +					 NULL, 0, text, text_len);
> 
> s/2/LOGLEVEL_CRIT/

Good point.

Best Regards,
Petr