From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754794AbcIIUNq (ORCPT );
	Fri, 9 Sep 2016 16:13:46 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:50034 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752040AbcIIUNp (ORCPT );
	Fri, 9 Sep 2016 16:13:45 -0400
Date: Fri, 9 Sep 2016 13:13:44 -0700
From: Andrew Morton
To: jsiddle@redhat.com
Cc: linux-kernel@vger.kernel.org, penguin-kernel@I-love.SAKURA.ne.jp
Subject: Re: [PATCH] hung_task: Allow hung_task_panic when hung_task_warnings is 0.
Message-Id: <20160909131344.c6995ea35e778db4e76abb4f@linux-foundation.org>
In-Reply-To: <1473450214-4049-1-git-send-email-jsiddle@redhat.com>
References: <1473450214-4049-1-git-send-email-jsiddle@redhat.com>
X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 9 Sep 2016 15:43:34 -0400 jsiddle@redhat.com wrote:

> From: John Siddle
>
> Previously hung_task_panic would not be respected if enabled after
> hung_task_warnings had already been decremented to 0.
>
> Permit the kernel to panic if hung_task_panic is enabled after
> hung_task_warnings has already been decremented to 0 and another task
> hangs for hung_task_timeout_secs seconds.
>
> Check if hung_task_panic is enabled so we don't return prematurely, and
> check if hung_task_warnings is non-zero so we don't print the warning
> unnecessarily.
>
> ...
>
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -98,7 +98,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>
>  	trace_sched_process_hang(t);
>
> -	if (!sysctl_hung_task_warnings)
> +	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
>  		return;
>
>  	if (sysctl_hung_task_warnings > 0)
> @@ -108,16 +108,18 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  	 * Ok, the task did not get scheduled for more than 2 minutes,
>  	 * complain:
>  	 */
> -	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		t->comm, t->pid, timeout);
> -	pr_err(" %s %s %.*s\n",
> -		print_tainted(), init_utsname()->release,
> -		(int)strcspn(init_utsname()->version, " "),
> -		init_utsname()->version);
> -	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
> -		" disables this message.\n");
> -	sched_show_task(t);
> -	debug_show_held_locks(t);
> +	if (sysctl_hung_task_warnings) {
> +		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> +			t->comm, t->pid, timeout);
> +		pr_err(" %s %s %.*s\n",
> +			print_tainted(), init_utsname()->release,
> +			(int)strcspn(init_utsname()->version, " "),
> +			init_utsname()->version);
> +		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
> +			" disables this message.\n");
> +		sched_show_task(t);
> +		debug_show_held_locks(t);
> +	}

This introduces an off-by-one error.  In the old code, if
sysctl_hung_task_warnings==1 on entry, we warn.  With the new code, we
no longer warn.

This?

--- a/kernel/hung_task.c~hung_task-allow-hung_task_panic-when-hung_task_warnings-is-0-fix
+++ a/kernel/hung_task.c
@@ -101,14 +101,12 @@ static void check_hung_task(struct task_
 	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
 		return;
 
-	if (sysctl_hung_task_warnings > 0)
-		sysctl_hung_task_warnings--;
-
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
+		sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
 			t->comm, t->pid, timeout);
 		pr_err(" %s %s %.*s\n",
_
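
The off-by-one described above can be checked with a small user-space model of the three variants of check_hung_task(): the old code, the patch as submitted, and the follow-up fix. This is only a sketch, not kernel code. The struct, the function names and count_warnings() are hypothetical, and the panic step is assumed to be a plain "if panic is set, panic" check after the warning block, since that part of the function is not shown in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two sysctls; not the kernel's variables. */
struct hung_task_state {
    int warnings;   /* models sysctl_hung_task_warnings */
    int panic;      /* models sysctl_hung_task_panic */
};

/* What one simulated check_hung_task() call did. */
struct outcome {
    bool warned;
    bool panicked;
};

/* Old code: returns early once warnings hit 0, so panic is never reached. */
static struct outcome check_old(struct hung_task_state *s)
{
    struct outcome o = { false, false };

    if (!s->warnings)
        return o;
    if (s->warnings > 0)
        s->warnings--;
    o.warned = true;
    o.panicked = s->panic != 0;   /* assumed shape of the panic step */
    return o;
}

/* Patch as submitted: decrement first, then test, which loses one warning. */
static struct outcome check_patched(struct hung_task_state *s)
{
    struct outcome o = { false, false };

    if (!s->warnings && !s->panic)
        return o;
    if (s->warnings > 0)
        s->warnings--;
    if (s->warnings)
        o.warned = true;
    o.panicked = s->panic != 0;
    return o;
}

/* Follow-up fix as posted: test first, decrement inside the block. */
static struct outcome check_fixed(struct hung_task_state *s)
{
    struct outcome o = { false, false };

    if (!s->warnings && !s->panic)
        return o;
    if (s->warnings) {
        s->warnings--;
        o.warned = true;
    }
    o.panicked = s->panic != 0;
    return o;
}

typedef struct outcome (*check_fn)(struct hung_task_state *);

/* Count warnings over `calls` hung-task detections, with panic disabled. */
static int count_warnings(check_fn check, int warnings, int calls)
{
    struct hung_task_state s = { warnings, 0 };
    int n = 0;

    assert(calls >= 0);
    for (int i = 0; i < calls; i++)
        if (check(&s).warned)
            n++;
    return n;
}
```

Starting from warnings == 1, count_warnings() gives 1 for check_old, 0 for check_patched and 1 for check_fixed, which matches the review comment. With warnings == 0 and panic == 1, only check_old fails to reach the panic step. Note that the fix as posted no longer has the "> 0" guard on the decrement, so a negative starting value would keep decreasing; the message does not say whether that is intended.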