public inbox for linux-kernel@vger.kernel.org
* Re: init's children list is long and slows reaping children.
From: Oleg Nesterov @ 2007-04-06  8:42 UTC
  To: Robin Holt
  Cc: Chris Snook, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
	linux-kernel

Robin Holt wrote:
>
> wait_task_zombie() is taking many seconds to get through the list.
> For the case of a modprobe, stop_machine creates one thread per cpu
> (remember big number). All are parented to init and their exit will
> cause wait_task_zombie to scan multiple times most of the way through
> this very long list looking for threads which need to be reaped.

Could you try this patch

	http://marc.info/?l=linux-kernel&m=117337194209912

?

It can't solve the whole problem, but at least an exiting kernel thread
won't kick init.

Oleg.


* init's children list is long and slows reaping children.
From: Robin Holt @ 2007-04-05 19:51 UTC
  To: Eric W. Biederman, Ingo Molnar, Linus Torvalds; +Cc: linux-kernel, Jack Steiner


We have been testing a new larger configuration and we are seeing a very
large scan time of init's tsk->children list.  In the cases we are seeing,
there are numerous kernel processes created for each cpu (e.g. events/0
... events/<big number>, xfslogd/0 ... xfslogd/<big number>).  These are
all on the list ahead of the processes we are currently trying to reap.

wait_task_zombie() is taking many seconds to get through the list.
For the case of a modprobe, stop_machine creates one thread per cpu
(remember big number). All are parented to init and their exit will
cause wait_task_zombie to scan multiple times most of the way through
this very long list looking for threads which need to be reaped.  As
a reference point, when we tried to mount the xfs root filesystem,
we ran out of pid space and had to recompile a kernel with a larger
default max pids.

For testing, Jack Steiner created the following patch.  All it does
is move tasks which are transitioning to the zombie state from where
they are in the children list to the head of the list.  In this way,
they will be the first found and reaping does speed up.  We will still
do a full scan of the list once the rearranged tasks are all removed.
This does not seem to be a significant problem.

This does, however, modify the order of reaping of children.  Is there a
guarantee of the order for reaping children which needs to be preserved
or can this simple patch be used to speed up the reaping?  If this
simple patch is not acceptable, are there any preferred methods for
linking together the tasks that have been zombied so they can be reaped
more quickly?  Maybe add a zombie list_head to the task_struct and chain
them together in the children list order?

In comparison, without this patch, after a modprobe on that particular
machine init is still reaping zombied tasks more than 30 seconds after
the command completes.  With this patch, all the zombied tasks are
removed within the first couple of seconds.

Any suggestions would be greatly appreciated.

Thanks,
Robin Holt

Patch against 2.6.16 SLES 10 kernel.

Index: linux-2.6.16/kernel/exit.c
===================================================================
--- linux-2.6.16.orig/kernel/exit.c	2007-03-28 21:56:20.601860403 -0500
+++ linux-2.6.16/kernel/exit.c	2007-03-28 22:01:12.233942431 -0500
@@ -710,6 +710,13 @@ static void exit_notify(struct task_stru
 	write_lock_irq(&tasklist_lock);
 
 	/*
+	 * Relink to head of parent's child list. This makes it easier to find.
+	 * On large systems, init has way too many children that never terminate.
+	 */
+	list_del_init(&tsk->sibling);
+	list_add(&tsk->sibling, &tsk->parent->children);
+
+	/*
 	 * This does two things:
 	 *
   	 * A.  Make init inherit all the child processes

