From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751337AbdLOFGN (ORCPT ); Fri, 15 Dec 2017 00:06:13 -0500 Received: from mail-pg0-f68.google.com ([74.125.83.68]:38686 "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751252AbdLOFGM (ORCPT ); Fri, 15 Dec 2017 00:06:12 -0500 X-Google-Smtp-Source: ACJfBou7LUt/sY2GiXMIcFXR4sVb3epvjGhrqLxa663YitBBpBywe1eijb7lfxTc5hoHy6FNza5P2Q== Date: Fri, 15 Dec 2017 14:06:07 +0900 From: Sergey Senozhatsky To: Steven Rostedt Cc: Sergey Senozhatsky , Tejun Heo , Sergey Senozhatsky , Petr Mladek , Jan Kara , Andrew Morton , Peter Zijlstra , Rafael Wysocki , Pavel Machek , Tetsuo Handa , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Message-ID: <20171215050607.GC11199@jagdpanzerIV> References: <20171204134825.7822-1-sergey.senozhatsky@gmail.com> <20171214142709.trgl76hbcdwaczzd@pathway.suse.cz> <20171214152551.GY3919388@devbig577.frc2.facebook.com> <20171214125506.52a7e5fa@gandalf.local.home> <20171214181153.GZ3919388@devbig577.frc2.facebook.com> <20171215021024.GA11199@jagdpanzerIV> <20171214221831.3ead0298@vmware.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171214221831.3ead0298@vmware.local.home> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On (12/14/17 22:18), Steven Rostedt wrote: > > Steven, your approach works ONLY when we have the following preconditions: > > > > a) there is a CPU that is calling printk() from the 'safe' (non-atomic, > > etc) context > > > > what does guarantee that? what happens if there is NO non-atomic > > CPU or that non-atomic simplky missses the console_owner != false > > point? we are going to conclude > > > > "if printk() doesn't work for you, it's because you are holding it wrong"? > > > > > > what if that non-atomic CPU does not call printk(), but instead > > it does console_lock()/console_unlock()? why there is no handoff? > > > > CPU0 CPU1 ~ CPU10 > > in atomic contexts [!]. ping-ponging console_sem > > ownership to each other. while what they really > > need to do is to simply up() and let CPU0 to > > handle it. > > printk > > console_lock() > > schedule() > > ... > > printk > > printk > > ... > > printk > > printk > > > > up() > > > > // woken up > > console_unlock() > > > > why do we make an emphasis on fixing vprintk_printk()? > > Where do we do the above? And has this been proven to be an issue? um... hundreds of cases. deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus) happening at the same moment + NMI backtraces from all the CPUs (more than 3 cpus) that follows the lockups, over not-so-fast serial console. exactly the bug report I received two days ago. so which one of the CPUs here is a good candidate to successfully emit all of the pending logbuf entries? none. all of them either have local IRQs disabled, or dump_stack() from either backtrace IPI or backtrace NMI (depending on the configuration). do we periodically do console_lock() on a running system? yes, we do. add to console_unlock() --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index b9006617710f..1c811f6d94bf 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2143,6 +2143,10 @@ void console_unlock(void) bool wake_klogd = false; bool do_cond_resched, retry; + if (!(current->flags & PF_KTHREAD)) + dump_stack(); + + if (console_suspended) { up_console_sem(); return; --- and just boot the system. I work for a company that has several thousand engineers spread across the globe. and people do use printk(), and issues do happen. the scenarios that Tejun and I talk about are not theoretical. if those scenarios are completely theoretical, as you suggest, - then, OK, what exactly guarantees that whenever atomic CPUs printk there is always a non-atomic CPU to take over the printing? > > b) non-atomic CPU sees console_owner set (which is set for a very short > > period of time) > > > > again. what if that non-atomic CPU does not see console_owner? > > "don't use printk()"? > > May I ask, why are we doing the printk in the first place? this argument is really may be applied against your patch as well. I really don't want us to have this type of "technical" discussion. printk() is a tool for developers. but developers can't use. > > c) the task that is looping in console_unlock() sees non-atomic CPU when > > console_owner is set. > > I haven't looked at the latest code, but my last patch didn't care > about "atomic" and "non-atomic" I know. and I think it is sort of a problem. lots of printk-s are happening from IRQs / softirqs and so on. take a look at CONFIG_IP_ROUTE_VERBOSE, for example. do_softirq() -> ip_handle_martian_source() and a bunch of other places. these irq->printk-s can "steal" the console_sem and go to console_unlock(). "don't use printk() then" type of argument does not really help to a guy who reports the lockup. > > Steven, I thought we reached the agreement [**] that the solution we should > > be working on is a combination of prinkt_kthread and console_sem hand > > off. Simply because it adds the missing "there is a non-atomic CPU wishing > > to console_unlock()" thing. > > > > lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com > > > > https://marc.info/?l=linux-kernel&m=151011840830776&w=2 > > https://marc.info/?l=linux-kernel&m=151015141407368&w=2 > > https://marc.info/?l=linux-kernel&m=151018900919386&w=2 > > https://marc.info/?l=linux-kernel&m=151019815721161&w=2 > > https://marc.info/?l=linux-kernel&m=151020275921953&w=2 > > ** https://marc.info/?l=linux-kernel&m=151020404622181&w=2 > > ** https://marc.info/?l=linux-kernel&m=151020565222469&w=2 > > I'm still fine with the hybrid approach, but I want to see a problem > first before we fix it. > > > > > > > what am I missing? > > The reproducer. will that printk_test module lkml.kernel.org/r/20171204135314.9122-2-sergey.senozhatsky@gmail.com suffice? -ss