Date: Thu, 14 Dec 2017 22:18:31 -0500
From: Steven Rostedt
To: Sergey Senozhatsky
Cc: Tejun Heo, Sergey Senozhatsky, Petr Mladek, Jan Kara, Andrew Morton,
    Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Message-ID: <20171214221831.3ead0298@vmware.local.home>
In-Reply-To: <20171215021024.GA11199@jagdpanzerIV>
References: <20171204134825.7822-1-sergey.senozhatsky@gmail.com>
    <20171214142709.trgl76hbcdwaczzd@pathway.suse.cz>
    <20171214152551.GY3919388@devbig577.frc2.facebook.com>
    <20171214125506.52a7e5fa@gandalf.local.home>
    <20171214181153.GZ3919388@devbig577.frc2.facebook.com>
    <20171215021024.GA11199@jagdpanzerIV>

On Fri, 15 Dec 2017 11:10:24 +0900
Sergey Senozhatsky wrote:

> Steven, your approach works ONLY when we have the following preconditions:
>
>  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
>     etc) context
>
>         what does guarantee that? what happens if there is NO non-atomic
>         CPU or that non-atomic simply misses the console_owner != false
>         point?
>         we are going to conclude
>
>         "if printk() doesn't work for you, it's because you are holding it wrong"?
>
>
>         what if that non-atomic CPU does not call printk(), but instead
>         it does console_lock()/console_unlock()? why there is no handoff?
>
>         CPU0                            CPU1 ~ CPU10
>                                         in atomic contexts [!]. ping-ponging console_sem
>                                         ownership to each other. while what they really
>                                         need to do is to simply up() and let CPU0 to
>                                         handle it.
>         printk
>         console_lock()
>          schedule()
>                                         ...
>                                         printk
>                                         printk
>                                         ...
>                                         printk
>                                         printk
>
>                                         up()
>
>         // woken up
>         console_unlock()
>
>         why do we make an emphasis on fixing vprintk_printk()?

Where do we do the above? And has this been proven to be an issue? If it
has, I think it's a separate issue from what I proposed, which is to fix
the case where lots of CPUs are doing printks and only one actually does
the write.

>
>
>  b) non-atomic CPU sees console_owner set (which is set for a very short
>     period of time)
>
>         again. what if that non-atomic CPU does not see console_owner?
>         "don't use printk()"?

May I ask, why are we doing the printk in the first place?

>
>  c) the task that is looping in console_unlock() sees non-atomic CPU when
>     console_owner is set.

I haven't looked at the latest code, but my last patch didn't care
about "atomic" and "non-atomic" issues, because I don't know if that is
indeed an issue in the real world.

>
>
> IOW, we need to have
>
>    the right CPU (a) at the very right moment (b && c) doing the very right thing.
>
>
>    * and the "very right moment" is tiny and additionally depends
>      on a foreign CPU [the one that is looping in console_unlock()].
>
>
>
> a simple question - how is that going to work for everyone? are we
> "fixing" a small fraction of possible use-cases?

Still sounds like you are ;-)

>
>
>
> Steven, I thought we reached the agreement [**] that the solution we should
> be working on is a combination of printk_kthread and console_sem hand
> off.
> Simply because it adds the missing "there is a non-atomic CPU wishing
> to console_unlock()" thing.
>
>       lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com
>
>       https://marc.info/?l=linux-kernel&m=151011840830776&w=2
>       https://marc.info/?l=linux-kernel&m=151015141407368&w=2
>       https://marc.info/?l=linux-kernel&m=151018900919386&w=2
>       https://marc.info/?l=linux-kernel&m=151019815721161&w=2
>       https://marc.info/?l=linux-kernel&m=151020275921953&w=2
>    ** https://marc.info/?l=linux-kernel&m=151020404622181&w=2
>    ** https://marc.info/?l=linux-kernel&m=151020565222469&w=2

I'm still fine with the hybrid approach, but I want to see a problem
first before we fix it.

>
>
> what am I missing?

The reproducer. Let Tejun do the test with just my patch, and if it
still has problems, then we can add more logic to the code. I like to
take things one step at a time. What I'm seeing is that there was a
problem that could be solved with my solution, but during this process,
people have found hundreds of theoretical problems and started down the
path to solve each of them. I want to see a real bug, before we go down
the path of having to have external threads and such, to solve a bug
that we don't really know exists yet.

-- Steve