From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932590AbbCQT0w (ORCPT ); Tue, 17 Mar 2015 15:26:52 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58671 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932335AbbCQT0t (ORCPT ); Tue, 17 Mar 2015 15:26:49 -0400
Date: Tue, 17 Mar 2015 20:24:50 +0100
From: Oleg Nesterov
To: Aaron Tomlin
Cc: akpm@linux-foundation.org, rientjes@google.com, dwysocha@redhat.com,
	linux-kernel@vger.kernel.org, Ingo Molnar
Subject: [PATCH 0/2] hung_task: improve rcu_lock_break() logic
Message-ID: <20150317192450.GA32579@redhat.com>
References: <1426601624-6703-1-git-send-email-atomlin@redhat.com>
	<1426601624-6703-2-git-send-email-atomlin@redhat.com>
	<20150317170920.GA21493@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150317170920.GA21493@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/17, Oleg Nesterov wrote:
>
> On 03/17, Aaron Tomlin wrote:
> >
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -169,7 +169,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> >  		return;
> >
> >  	rcu_read_lock();
> > -	do_each_thread(g, t) {
> > +	for_each_process_thread(g, t) {
> >  		if (!max_count--)
> >  			goto unlock;
> >  		if (!--batch_count) {
> > @@ -180,7 +180,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> >  		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
> >  		if (t->state == TASK_UNINTERRUPTIBLE)
> >  			check_hung_task(t, timeout);
> > -	} while_each_thread(g, t);
> > +	}
>
> Acked-by: Oleg Nesterov
>
> Perhaps it also makes sense to improve this rcu_lock_break a bit...
> For example, if 't' is dead but 'g' is alive we can continue the
> "for_each_process" part of this double loop. And if 't' is still
> alive then we can find the new leader and continue...
>
> But I agree, let's start from this fix, then we will see.

Something like this, on top of Aaron's change.

But actually I am not sure this really makes a lot of sense. But if yes,
we can do more. We can save g->start_time before rcu_lock_break(), and
then the "both dead" case can use for_each_process() to (try to) find the
first process which has start_time >= the saved one.

Oleg.
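
To make the cases above concrete, here is a small userspace toy model of
the decision taken after rcu_lock_break(). It is only a sketch and not
kernel code: struct toy_task, enum resume_kind, pick_resume() and
first_not_older() are hypothetical names invented for illustration, the
"alive" flag merely stands in for a pid_alive()-style check, and the
start_time scan assumes the process list is ordered by creation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	bool alive;			/* models a pid_alive()-style check */
	unsigned long start_time;	/* models the saved g->start_time */
};

/* Where the double loop could resume after the RCU lock was dropped. */
enum resume_kind {
	RESUME_SAME_THREAD,	/* g and t both alive: carry on as before */
	RESUME_NEXT_PROCESS,	/* t dead, g alive: continue the outer loop */
	RESUME_NEW_LEADER,	/* t alive, g dead: restart from t's new leader */
	RESUME_BY_START_TIME,	/* both dead: look up by saved start_time */
};

/* Classify the situation from the liveness of leader g and thread t. */
static enum resume_kind pick_resume(const struct toy_task *g,
				    const struct toy_task *t)
{
	assert(g != NULL && t != NULL);

	if (t->alive)
		return g->alive ? RESUME_SAME_THREAD : RESUME_NEW_LEADER;
	return g->alive ? RESUME_NEXT_PROCESS : RESUME_BY_START_TIME;
}

/*
 * "Both dead" case: return the first live process whose start_time is
 * >= the saved one, or NULL if there is none. Assumes procs[] is sorted
 * by creation time (an assumption of this toy model).
 */
static struct toy_task *first_not_older(struct toy_task *procs[], size_t n,
					unsigned long saved_start_time)
{
	for (size_t i = 0; i < n; i++) {
		if (procs[i]->alive &&
		    procs[i]->start_time >= saved_start_time)
			return procs[i];
	}
	return NULL;
}
```

In this model the current "give up" behaviour corresponds to returning
early whenever pick_resume() is anything other than RESUME_SAME_THREAD;
the other three values are the extra cases discussed above.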