* [PATCH 0/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()
From: Oleg Nesterov @ 2018-09-12 16:33 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Michal Hocko, Paul E. McKenney, Peter Zijlstra, linux-kernel

Resend. IMHO this makes sense anyway, but mostly this is preparation for
other changes: show_state_filter() and other "slow" users of
for_each_process() can livelock and even trigger hard lockups.

Peter, you have already reviewed at least 1/2 (heh, two years ago) and
IIRC you mostly agreed, but pointed out that the for_each_xxx() loops in
_continue() could be "SPENT_TOO_MUCH_TIME" themselves. I added a note to
the changelog.

Oleg.

 include/linux/sched/signal.h | 10 ++++++++++
 kernel/exit.c                | 42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/hung_task.c           | 25 +++++++++----------------
 3 files changed, 61 insertions(+), 16 deletions(-)
* [PATCH 1/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()
From: Oleg Nesterov @ 2018-09-12 16:33 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Michal Hocko, Paul E. McKenney, Peter Zijlstra, linux-kernel

Usage:

	rcu_read_lock();
	for_each_process_thread(p, t) {
		do_something_slow(p, t);

		if (SPENT_TOO_MUCH_TIME) {
			for_each_process_thread_break(p, t);
			rcu_read_unlock();
			schedule();
			rcu_read_lock();
			for_each_process_thread_continue(&p, &t);
		}
	}
	rcu_read_unlock();

This looks similar to rcu_lock_break(), but much better, and the next patch
changes check_hung_uninterruptible_tasks() to use these new helpers. But my
real target is show_state_filter(), which can trivially lead to lockup.

Compared to rcu_lock_break(), for_each_process_thread_continue() never gives
up; it relies on the fact that both the process and thread lists are sorted
by the task->start_time key. So, for example, even if both the leader and the
thread are already dead, we can find the next alive process and continue.

Strictly speaking, the for_each_process/for_each_thread loops in _continue()
could be "SPENT_TOO_MUCH_TIME" by themselves, so perhaps we will add another
"max_scan" argument later or do something else. But at least they can not
livelock under heavy fork/exit loads; they are bounded by PID_MAX_DEFAULT in
the worst case.

NOTE: it seems that, contrary to the comment, task_struct->start_time is not
really monotonic, and this should probably be fixed. Until then, _continue()
might skip more threads with the same ->start_time than necessary.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched/signal.h | 10 ++++++++++
 kernel/exit.c                | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1be3572..1c957d4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -565,6 +565,16 @@ extern bool current_is_single_threaded(void);
 #define for_each_process_thread(p, t)	\
 	for_each_process(p) for_each_thread(p, t)
 
+static inline void
+for_each_process_thread_break(struct task_struct *p, struct task_struct *t)
+{
+	get_task_struct(p);
+	get_task_struct(t);
+}
+
+extern void
+for_each_process_thread_continue(struct task_struct **, struct task_struct **);
+
 typedef int (*proc_visitor)(struct task_struct *p, void *data);
 void walk_process_tree(struct task_struct *top, proc_visitor, void *);
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d..71380c7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -319,6 +319,48 @@ void rcuwait_wake_up(struct rcuwait *w)
 	rcu_read_unlock();
 }
 
+void for_each_process_thread_continue(struct task_struct **p_leader,
+				     struct task_struct **p_thread)
+{
+	struct task_struct *leader = *p_leader, *thread = *p_thread;
+	struct task_struct *prev, *next;
+	u64 start_time;
+
+	if (pid_alive(thread)) {
+		/* mt exec could change the leader */
+		*p_leader = thread->group_leader;
+	} else if (pid_alive(leader)) {
+		start_time = thread->start_time;
+		prev = leader;
+
+		for_each_thread(leader, next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_thread = prev;
+	} else {
+		start_time = leader->start_time;
+		prev = &init_task;
+
+		for_each_process(next) {
+			if (next->start_time > start_time)
+				break;
+			prev = next;
+		}
+
+		*p_leader = prev;
+		/* a new thread can come after that, but this is fine */
+		*p_thread = list_last_entry(&prev->signal->thread_head,
+					    struct task_struct,
+					    thread_node);
+	}
+
+	put_task_struct(leader);
+	put_task_struct(thread);
+}
+
 /*
  * Determine if a process group is "orphaned", according to the POSIX
  * definition in 2.2.2.52. Orphaned process groups are not to be affected
-- 
2.5.0
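[Editorial aside: the resume-by-key idea the changelog relies on — lists sorted by an
ever-growing ->start_time key, so a walker can re-find its position after dropping the
lock even if the remembered task has exited — can be modeled outside the kernel. A
minimal userspace sketch; struct toy_task and toy_continue() are illustrative names,
not kernel API:]

```c
/* Userspace model of the last branch of for_each_process_thread_continue():
 * the list is sorted ascending by start_time, so given the key we saved at
 * the break point we can return the last element whose key is <= it; the
 * caller's loop then resumes with that element's successors, so no element
 * newer than the old position is skipped, even if the old one is gone. */
#include <stdint.h>
#include <stddef.h>

struct toy_task {
	uint64_t start_time;		/* ascending down the list */
	struct toy_task *next;
};

struct toy_task *toy_continue(struct toy_task *head, uint64_t key)
{
	struct toy_task *prev = head;	/* head plays the role of init_task */
	struct toy_task *t;

	for (t = head->next; t != NULL; t = t->next) {
		if (t->start_time > key)
			break;
		prev = t;
	}
	return prev;
}
```

If the remembered element is still on the list, toy_continue() lands exactly on it;
if it has been removed, the walk resumes at the newest element that is still older,
which is the same never-give-up behavior the changelog contrasts with rcu_lock_break().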
* Re: [PATCH 1/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()
From: Andrew Morton @ 2018-09-12 19:25 UTC
To: Oleg Nesterov
Cc: Dmitry Vyukov, Michal Hocko, Paul E. McKenney, Peter Zijlstra, linux-kernel

On Wed, 12 Sep 2018 18:33:35 +0200 Oleg Nesterov <oleg@redhat.com> wrote:

> Usage:
>
> 	rcu_read_lock();
> 	for_each_process_thread(p, t) {
> 		do_something_slow(p, t);
>
> 		if (SPENT_TOO_MUCH_TIME) {
> 			for_each_process_thread_break(p, t);
> 			rcu_read_unlock();
> 			schedule();
> 			rcu_read_lock();
> 			for_each_process_thread_continue(&p, &t);
> 		}
> 	}
> 	rcu_read_unlock();
>
> This looks similar to rcu_lock_break(), but much better, and the next patch
> changes check_hung_uninterruptible_tasks() to use these new helpers. But my
> real target is show_state_filter(), which can trivially lead to lockup.
>
> Compared to rcu_lock_break(), for_each_process_thread_continue() never gives
> up; it relies on the fact that both the process and thread lists are sorted
> by the task->start_time key. So, for example, even if both the leader and
> the thread are already dead, we can find the next alive process and continue.
>
> Strictly speaking, the for_each_process/for_each_thread loops in _continue()
> could be "SPENT_TOO_MUCH_TIME" by themselves, so perhaps we will add another
> "max_scan" argument later or do something else. But at least they can not
> livelock under heavy fork/exit loads; they are bounded by PID_MAX_DEFAULT in
> the worst case.
>
> NOTE: it seems that, contrary to the comment, task_struct->start_time is not
> really monotonic, and this should probably be fixed. Until then, _continue()
> might skip more threads with the same ->start_time than necessary.
>
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -565,6 +565,16 @@ extern bool current_is_single_threaded(void);
>  #define for_each_process_thread(p, t)	\
>  	for_each_process(p) for_each_thread(p, t)
>
> +static inline void
> +for_each_process_thread_break(struct task_struct *p, struct task_struct *t)
> +{
> +	get_task_struct(p);
> +	get_task_struct(t);
> +}
> +
> +extern void
> +for_each_process_thread_continue(struct task_struct **, struct task_struct **);

These things will need some documentation, please. What they do, why they
do it, how people should use them, when and why they should use them.
Etcetera! This is tricky stuff.

> +void for_each_process_thread_continue(struct task_struct **p_leader,
> +				      struct task_struct **p_thread)
> +{
> +	struct task_struct *leader = *p_leader, *thread = *p_thread;
> +	struct task_struct *prev, *next;
> +	u64 start_time;
> +
> +	if (pid_alive(thread)) {
> +		/* mt exec could change the leader */
> +		*p_leader = thread->group_leader;
> +	} else if (pid_alive(leader)) {
> +		start_time = thread->start_time;
> +		prev = leader;
> +
> +		for_each_thread(leader, next) {
> +			if (next->start_time > start_time)
> +				break;
> +			prev = next;
> +		}
> +
> +		*p_thread = prev;
> +	} else {
> +		start_time = leader->start_time;
> +		prev = &init_task;
> +
> +		for_each_process(next) {
> +			if (next->start_time > start_time)
> +				break;
> +			prev = next;
> +		}
> +
> +		*p_leader = prev;
> +		/* a new thread can come after that, but this is fine */
> +		*p_thread = list_last_entry(&prev->signal->thread_head,
> +					    struct task_struct,
> +					    thread_node);
> +	}
> +
> +	put_task_struct(leader);
> +	put_task_struct(thread);
> +}

Should these be available to modules, like the rest of these things appear
to be? Or we could do that later if a need is shown.
* Re: [PATCH 1/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()
From: Oleg Nesterov @ 2018-09-13 15:55 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Michal Hocko, Paul E. McKenney, Peter Zijlstra, linux-kernel

On 09/12, Andrew Morton wrote:
>
> > Usage:
> >
> > 	rcu_read_lock();
> > 	for_each_process_thread(p, t) {
> > 		do_something_slow(p, t);
> >
> > 		if (SPENT_TOO_MUCH_TIME) {
> > 			for_each_process_thread_break(p, t);
> > 			rcu_read_unlock();
> > 			schedule();
> > 			rcu_read_lock();
> > 			for_each_process_thread_continue(&p, &t);
> > 		}
> > 	}
> > 	rcu_read_unlock();
...
> > +static inline void
> > +for_each_process_thread_break(struct task_struct *p, struct task_struct *t)
> > +{
> > +	get_task_struct(p);
> > +	get_task_struct(t);
> > +}
> > +
> > +extern void
> > +for_each_process_thread_continue(struct task_struct **, struct task_struct **);
>
> These things will need some documentation, please. What they do, why
> they do it, how people should use them, when and why they should use
> them. Etcetera! This is tricky stuff.

See "Usage" above; this is as simple as list_for_each_entry_continue_rcu(),
you just need to call _break() first.

OK, I'll try to add some comments and send V2.

> Should these be available to modules, like the rest of these things
> appear to be? Or we could do that later if a need is shown.

Yes, I think this can be exported on demand.

Oleg.
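[Editorial aside: the promised V2 does not appear in this thread. A kernel-doc
comment of the kind Andrew asks for might look like the following — a hypothetical
sketch, not text that was actually posted:]

```
/**
 * for_each_process_thread_break - pause a for_each_process_thread() walk
 * @p: the process the walk stopped at
 * @t: the thread the walk stopped at
 *
 * Pins @p and @t so that the caller can drop rcu_read_lock() and sleep.
 * Must be paired with for_each_process_thread_continue(), which drops
 * the two references and re-finds the walk position by ->start_time
 * after rcu_read_lock() has been re-taken.
 */
```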
* [PATCH 2/2] hung_task: change check_hung_uninterruptible_tasks() to use for_each_process_thread_break/continue
From: Oleg Nesterov @ 2018-09-12 16:33 UTC
To: Andrew Morton
Cc: Dmitry Vyukov, Michal Hocko, Paul E. McKenney, Peter Zijlstra, linux-kernel

Change check_hung_uninterruptible_tasks() to use the new helpers and remove
the "can_cont" logic from rcu_lock_break().

We could add for_each_process_thread_break/continue right into
rcu_lock_break() but see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/hung_task.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b9132d1..13af558 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -148,20 +148,11 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
  * For preemptible RCU it is sufficient to call rcu_read_unlock in order
  * to exit the grace period. For classic RCU, a reschedule is required.
  */
-static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
+static void rcu_lock_break(void)
 {
-	bool can_cont;
-
-	get_task_struct(g);
-	get_task_struct(t);
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
-	can_cont = pid_alive(g) && pid_alive(t);
-	put_task_struct(t);
-	put_task_struct(g);
-
-	return can_cont;
 }
 
 /*
@@ -185,16 +176,18 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	hung_task_show_lock = false;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
-		if (!max_count--)
+		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
+		if (t->state == TASK_UNINTERRUPTIBLE)
+			check_hung_task(t, timeout);
+
+		if (!--max_count)
 			goto unlock;
 		if (!--batch_count) {
 			batch_count = HUNG_TASK_BATCHING;
-			if (!rcu_lock_break(g, t))
-				goto unlock;
+			for_each_process_thread_break(g, t);
+			rcu_lock_break();
+			for_each_process_thread_continue(&g, &t);
 		}
-		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
-		if (t->state == TASK_UNINTERRUPTIBLE)
-			check_hung_task(t, timeout);
 	}
  unlock:
 	rcu_read_unlock();
-- 
2.5.0
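[Editorial aside: the restructured loop — do the per-task work first, then every
HUNG_TASK_BATCHING iterations break, drop the lock, and resume by key — can be
sketched in userspace. toy_walk() and BATCH are illustrative names, not kernel
code, and distinct sorted keys stand in for ->start_time:]

```c
/* Sketch of the batched-walk shape this patch gives
 * check_hung_uninterruptible_tasks(): visit each item, and after every
 * BATCH items remember the current key, "drop the lock" (a no-op here,
 * where rcu_lock_break() would sleep), and re-find the position as the
 * first item with a strictly larger key. With distinct ascending keys,
 * every item is visited exactly once. */
#include <stdint.h>
#include <stddef.h>

#define BATCH 2

size_t toy_walk(const uint64_t *keys, size_t n)
{
	size_t visited = 0, i = 0;
	int batch = BATCH;

	while (i < n) {
		visited++;			/* "check_hung_task(t, timeout)" */
		uint64_t key = keys[i];

		if (--batch == 0) {
			batch = BATCH;
			/* break: remember key; rcu_lock_break() would run
			 * here; continue: resume past the remembered key */
			size_t j = 0;
			while (j < n && keys[j] <= key)
				j++;
			i = j;
		} else {
			i++;
		}
	}
	return visited;
}
```

The re-find step is linear here, mirroring the changelog's caveat that the
_continue() scans could themselves be slow, but like them it is bounded and
cannot livelock: each resume strictly advances past the remembered key.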