From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751326AbcGKWqF (ORCPT ); Mon, 11 Jul 2016 18:46:05 -0400
Received: from mail-pf0-f170.google.com ([209.85.192.170]:36697 "EHLO
        mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750787AbcGKWqD (ORCPT );
        Mon, 11 Jul 2016 18:46:03 -0400
Date: Mon, 11 Jul 2016 15:46:01 -0700
From: Viresh Kumar
To: "Rafael J. Wysocki"
Cc: Jan Kara, Sergey Senozhatsky, Tejun Heo, Greg Kroah-Hartman,
        Linux Kernel Mailing List, vlevenetz@mm-sol.com,
        vaibhav.hiremath@linaro.org, alex.elder@linaro.org, johan@kernel.org,
        akpm@linux-foundation.org, rostedt@goodmis.org, Sergey Senozhatsky
Subject: Re: [Query] Preemption (hogging) of the work handler
Message-ID: <20160711224601.GJ4695@ubuntu>
References: <20160701165959.GR12473@ubuntu>
        <20160711154438.GA528@swordfish>
        <20160711223501.GI4695@ubuntu>
        <2231804.EWgFb9e2VG@vostro.rjw.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2231804.EWgFb9e2VG@vostro.rjw.lan>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 12-07-16, 00:44, Rafael J. Wysocki wrote:
> On Monday, July 11, 2016 03:35:01 PM Viresh Kumar wrote:
> > Hi Sergey and Jan,
> >
> > On 12-07-16, 00:44, Sergey Senozhatsky wrote:
> > > right. apart from cases when the existing console_unlock() behaviour can
> > > simply "block" a process to flush the log_buf to slow serial consoles
> > > (regardless the process execution context) and make the system less
> > > responsive, I have around ~10 absolutely different scenarios on my list that
> > > may cause soft/hard lockups, rcu stalls, oom-s, etc. and console_unlock() is
> > > the root cause there. the simplest ones involve heavy printk() usage, the
> > > trickier ones do not necessarily have anything that is abusing printk(): a
> > > moderate printk() pressure coming from other CPUs on the system and more or
> > > less active tty -> UART can do the trick, because uart interrupt service
> > > routine and call_console_drivers()->write() have to compete for the same
> > > uart port spin_lock. soft lockups are probably the most common problems,
> > > though, it's not all that easy to catch, because watchdog does not ring
> > > the bell straight after preempt_enable(), but from hrtimer interrupt, that
> > > happens approx every 4 seconds. by this time CPU can be somewhere far away
> > > from console_unlock(). I had an idea of doing watchdog soft lockup check
> > > from preempt_enable(), when it brings preempt_count down to zero, but not
> > > sure I can recall how well did it go.
> >
> > Thanks for your feedback guys, and I have one more blocking issue
> > where I need your help/advice.
> >
> > So, the excess printing in our case is done in parallel to system
> > suspend. And that can very much happen after all the non-boot CPUs are
> > offlined.
> >
> > Sometimes, the platform doesn't come back after suspend. I have tried
> > enabling no-console-suspend and the last line it prints is:
> >
> > Disabling non-boot CPUs
> >
> > And nothing after that at all. We have to forcefully reboot the phone
> > after that. Moving the prints to the synchronous way (using
> > echo 1 > /sys/module/printk/parameters/synchronous), fixes that issue.
>
> But no_console_suspend is best-effort by design.

Yeah, and I am not sure how I should go ahead with this issue now :)

> And *please* CC PM-related stuff to linux-pm.

Sure. When this thread got started, I wasn't sure that it was
PM-related stuff, so I didn't do it. It was all about printk and
hogging :)

-- 
viresh