Date: Wed, 10 Jan 2018 02:18:27 -0500
From: Steven Rostedt <rostedt@goodmis.org>
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
 Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
 linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Message-ID: <20180110021827.350ba374@vmware.local.home>
In-Reply-To: <20180109225356.GW3668920@devbig577.frc2.facebook.com>
References: <20171214152551.GY3919388@devbig577.frc2.facebook.com>
 <20171214125506.52a7e5fa@gandalf.local.home>
 <20171214181153.GZ3919388@devbig577.frc2.facebook.com>
 <20171214132109.32ae6a74@gandalf.local.home>
 <20171222000932.GG1084507@devbig577.frc2.facebook.com>
 <20171221231932.27727fab@vmware.local.home>
 <20180109200620.GQ3668920@devbig577.frc2.facebook.com>
 <20180109170847.28b41eec@vmware.local.home>
 <20180109221705.GU3668920@devbig577.frc2.facebook.com>
 <20180109225356.GW3668920@devbig577.frc2.facebook.com>

On Tue, 9 Jan 2018 14:53:56 -0800
Tejun Heo wrote:

> Hello, Steven.
>
> On Tue, Jan 09, 2018 at 05:47:50PM -0500, Steven Rostedt wrote:
> > > Maybe it can break out eventually but that can take a really long
> > > time. It's OOM. Most of userland is waiting for reclaim. There
> > > isn't all that much going on outside that and there can only be one
> > > CPU which is OOMing. The kernel isn't gonna be all that chatty.
> >
> > Are you saying that the OOM is stuck printing over and over on a single
> > CPU? Perhaps we should fix THAT.
>
> I'm not sure what you meant but OOM code isn't doing anything bad

My point is that your test is only hammering at a single CPU. You say
this is the scenario you see, which means the OOM code is printing out
more than it should: if it prints the message once, it should not print
it again for the same process, or go into a loop doing it over and over
on a single CPU. That would be a bug in the implementation.

> other than excluding others from doing OOM kills simultaneously, which
> is what we want, and printing a lot of messages and then gets caught
> up in a positive feedback loop.
>
> To me, the whole point of this effort is preventing printk messages
> from causing significant or critical disruptions to overall system
> operation.

I agree, and my patch helps with this tremendously, as long as we are
not doing something stupid like printk'ing thousands of times in an
interrupt handler, over and over on a single CPU.

> IOW, it's rather dumb if the machine goes down because
> somebody printk'd wrong or just failed to foresee the combinations of
> events which could lead to such conditions.

I would still like to see a trace of a real situation.

> It's not like we don't know how to fix this either.

But we don't want the fix to introduce regressions, and offloading
printk does. Heck, the current fixes to printk have caused issues for
me in my own debugging. For example, we can no longer do large dumps
via printk from NMI context, which I used to do when detecting a lockup
and then dumping the task list of all tasks.
Or even a ftrace_dump_on_oops:

  http://lkml.kernel.org/r/20180109162019.GL3040@hirez.programming.kicks-ass.net

-- Steve