From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753447AbcGYQXx (ORCPT ); Mon, 25 Jul 2016 12:23:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57308 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752930AbcGYQXv (ORCPT ); Mon, 25 Jul 2016 12:23:51 -0400 Date: Mon, 25 Jul 2016 18:23:48 +0200 From: Oleg Nesterov To: Andrew Morton Cc: Dave Anderson , Ingo Molnar , Peter Zijlstra , "Paul E. McKenney" , Wang Shu , linux-kernel@vger.kernel.org Subject: [PATCH 1/3] introduce for_each_process_thread_{break,continue}() helpers Message-ID: <20160725162348.GA23947@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160725162332.GA23935@redhat.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 25 Jul 2016 16:23:51 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Example: rcu_read_lock(); for_each_process_thread(p, t) { do_something_slow(p, t); if (SPENT_TOO_MANY_TIME) { for_each_process_thread_break(p, t); rcu_read_unlock(); schedule(); rcu_read_lock(); for_each_process_thread_continue(&p, &t); } } rcu_read_unlock(); This looks similar to rcu_lock_break(), but much better and the next patch changes check_hung_uninterruptible_tasks() to use these new helpers. But my real target is show_state_filter() which can trivially lead to lockup. Compared to rcu_lock_break(), for_each_process_thread_continue() never gives up, it relies on fact that both process and thread lists are sorted by the task->start_time key. So, for example, even if both leader/thread are already dead we can find the next alive process and continue. TODO: This is another indication we should rename pid_alive(tsk) (and probably change it), most users call it to ensure that rcu_read_lock() still protects this tsk. NOTE: it seems that, contrary to the comment, task_struct->start_time is not really monotonic, and this should be probably fixed. Until then _continue() might skip more threads with the same ->start_time than necessary. Signed-off-by: Oleg Nesterov --- include/linux/sched.h | 10 ++++++++++ kernel/exit.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 253538f..004c9d5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2859,6 +2859,16 @@ extern bool current_is_single_threaded(void); #define for_each_process_thread(p, t) \ for_each_process(p) for_each_thread(p, t) +static inline void +for_each_process_thread_break(struct task_struct *p, struct task_struct *t) +{ + get_task_struct(p); + get_task_struct(t); +} + +extern void +for_each_process_thread_continue(struct task_struct **, struct task_struct **); + static inline int get_nr_threads(struct task_struct *tsk) { return tsk->signal->nr_threads; diff --git a/kernel/exit.c b/kernel/exit.c index 9e6e135..08f49d4 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -210,6 +210,48 @@ repeat: goto repeat; } +void for_each_process_thread_continue(struct task_struct **p_leader, + struct task_struct **p_thread) +{ + struct task_struct *leader = *p_leader, *thread = *p_thread; + struct task_struct *prev, *next; + u64 start_time; + + if (pid_alive(thread)) { + /* mt exec could change the leader */ + *p_leader = thread->group_leader; + } else if (pid_alive(leader)) { + start_time = thread->start_time; + prev = leader; + + for_each_thread(leader, next) { + if (next->start_time > start_time) + break; + prev = next; + } + + *p_thread = prev; + } else { + start_time = leader->start_time; + prev = &init_task; + + for_each_process(next) { + if (next->start_time > start_time) + break; + prev = next; + } + + *p_leader = prev; + /* a new thread can come after that, but this is fine */ + *p_thread = list_last_entry(&prev->signal->thread_head, + struct task_struct, + thread_node); + } + + put_task_struct(leader); + put_task_struct(thread); +} + /* * Determine if a process group is "orphaned", according to the POSIX * definition in 2.2.2.52. Orphaned process groups are not to be affected -- 2.5.0