Date: Thu, 28 May 2015 14:00:27 +0200
From: Petr Mladek
To: Andrew Morton
Cc: Frederic Weisbecker, Steven Rostedt, Dave Anderson,
 "Paul E. McKenney", Kay Sievers, Jiri Kosina, Michal Hocko, Jan Kara,
 linux-kernel@vger.kernel.org, Wang Long, peifeiyue@huawei.com,
 dzickus@redhat.com, morgan.wang@huawei.com, sasha.levin@oracle.com
Subject: Re: [PATCH 01/10] printk: Avoid deadlock in NMI context
Message-ID: <20150528120026.GD3135@pathway.suse.cz>
References: <1432557993-20458-1-git-send-email-pmladek@suse.cz>
 <1432557993-20458-2-git-send-email-pmladek@suse.cz>
 <20150527161346.fb3178d393ebbaafea4e3906@linux-foundation.org>
In-Reply-To: <20150527161346.fb3178d393ebbaafea4e3906@linux-foundation.org>

On Wed 2015-05-27 16:13:46, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:24 +0200 Petr Mladek wrote:
>
> > printk() cannot be used in NMI context safely because it uses an internal
> > lock and thus could cause a deadlock. This is fine because NMI context
> > is very special. The handlers should be short, effective, and do not
> > use printk().
> >
> > But developers tend to print warnings even from NMI code. They are pretty
> > hard to debug when they lockup the machine and nothing appears in the logs.
> >
> > This patch prevents from the deadlock on logbuf_lock by using trylock
> > rather than spin_lock.
> > If the lock can not be taken, the message is
> > ignored and some warning is printed later on.
> >
> > We also must not try to get console from NMI context. It needs
> > even more locks and there is even higher chance to hung up.
> >
> > Unfortunately, we could not print more details about the lost message.
> > We could not alloc a buffer in NMI. Therefore we would need some
> > lockless mechanism to share a buffer between NMI and normal context.
> > But this would make printk() code much more complicated and
> > it is not worth it. There has already been an attempt to do so
> > and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
> > This is also the reason why we use the atomic counter.
>
> hm, I expect it wouldn't be too messy to shove the text into a static
> per-cpu buffer.  So we at least get a few hundred bytes of stuff.

The problem is that we would need to read the static buffer in the
normal context without a lock. The result might be a garbled message.
Alternatively, I could add some lockless hackery to keep the buffer
consistent, but this would make the code more complex. In either case,
we would not be able to preserve all messages. So, I am not sure that
a more complex solution is worth it.

Note that we are talking about a corner case. printk() should not be
used in NMI context in the first place. If it is used, we still do our
best to get the message out, and we try even harder for Oops messages.
If a message is lost, it might mean there was a flood of printk()s,
and the message might have been lost anyway.

> > +	/* emit KERN_CRIT message */
> > +	printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> > +				 NULL, 0, text, text_len);
>
> s/2/LOGLEVEL_CRIT/

Good point.

Best Regards,
Petr
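The trylock-plus-counter scheme described in the patch can be modelled in user space. This is a minimal sketch under stated assumptions and is not the patch's code. `logbuf_lock` is a pthread mutex standing in for the kernel spinlock. `sketch_printk()`, `logbuf_append()`, `nmi_message_lost`, the `in_nmi` parameter and the warning text are all hypothetical names chosen for illustration. The sketch shows that the NMI path never waits for the lock: it either takes the lock immediately or counts the message as lost. The next normal-context call then reports the count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's logbuf_lock and log buffer. */
static pthread_mutex_t logbuf_lock = PTHREAD_MUTEX_INITIALIZER;
static char logbuf[4096];
static size_t logbuf_len;

/* Messages dropped because logbuf_lock was busy in (simulated) NMI context. */
static atomic_uint nmi_message_lost;

/* Append a string to logbuf, truncating when full. Caller holds logbuf_lock. */
static void logbuf_append(const char *s)
{
	size_t room = sizeof(logbuf) - logbuf_len - 1; /* keep room for '\0' */
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(logbuf + logbuf_len, s, n);
	logbuf_len += n;
	logbuf[logbuf_len] = '\0';
	assert(logbuf_len < sizeof(logbuf));
}

/*
 * Hypothetical printk() model: the in_nmi parameter replaces the kernel's
 * in_nmi() check. Returns the number of message bytes stored, or 0 if the
 * message was dropped.
 */
static int sketch_printk(bool in_nmi, const char *text)
{
	if (in_nmi) {
		/* Never spin in NMI: the interrupted code may hold the lock. */
		if (pthread_mutex_trylock(&logbuf_lock) != 0) {
			atomic_fetch_add(&nmi_message_lost, 1);
			return 0;
		}
	} else {
		pthread_mutex_lock(&logbuf_lock);
		/* Normal context: report earlier drops before the new message. */
		unsigned int lost = atomic_exchange(&nmi_message_lost, 0);
		if (lost) {
			char warn[64];
			snprintf(warn, sizeof(warn),
				 "lost %u message(s) from NMI context\n", lost);
			logbuf_append(warn);
		}
	}

	size_t before = logbuf_len;
	logbuf_append(text);
	int stored = (int)(logbuf_len - before);

	/*
	 * A real printk() would now try to flush to the consoles. In NMI
	 * context that step would be skipped because it needs more locks.
	 */
	pthread_mutex_unlock(&logbuf_lock);
	return stored;
}
```

As an example, lock `logbuf_lock` and then call `sketch_printk(true, ...)` on the same thread. This models an NMI that interrupts code already holding the lock. The call returns 0 and increments the counter. The next `sketch_printk(false, ...)` stores a "lost 1 message(s)" line before its own text. Preserving the dropped text itself would need a lockless hand-off from NMI to normal context, which is the extra complexity argued against above.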