public inbox for linux-kernel@vger.kernel.org
* 2.4.4 sluggish under fork load
@ 2001-04-28 11:52 Peter Osterlund
  2001-04-28 14:16 ` J . A . Magallon
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Peter Osterlund @ 2001-04-28 11:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
fork load. This is caused by the "run child first after fork" patch. I
have tested on two different UP x86 systems running redhat 7.0.

For example, when running the gcc configure script, the X mouse pointer is
very jerky. The configure script itself runs approximately as fast as in
2.4.3.

Another thing is that the bash loop "while true ; do /bin/true ; done" is
not possible to interrupt with ctrl-c.

A third thing I noticed is that starting a gnome session in redhat 7.0
takes longer. (It takes more time for the gnome splash screen to appear.)

Reverting the fork patch makes all these problems go away on my machine.
I'm not saying that this is necessarily a good idea, that patch might be
good for other reasons.


--- linux-2.4.4/kernel/fork.c~	Sat Apr 28 09:46:58 2001
+++ linux-2.4.4/kernel/fork.c	Sat Apr 28 11:14:33 2001
@@ -674,9 +674,16 @@
 	 * and then exec(). This is only important in the first timeslice.
 	 * In the long run, the scheduling behavior is unchanged.
 	 */
+#if 0
 	p->counter = current->counter;
 	current->counter = 0;
 	current->need_resched = 1;
+#else
+	p->counter = (current->counter + 1) >> 1;
+	current->counter >>= 1;
+	if (!current->counter)
+		current->need_resched = 1;
+#endif

 	/*
 	 * Ok, add it to the run-queues and make it
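
As a rough illustration of the two branches in the hunk above, the counter arithmetic can be tried in user space. This is only a sketch: `struct slices` and both function names are made up for the example, and the real code operates on `current` and `p` inside do_fork().

```c
#include <assert.h>

/* Hypothetical stand-in for the fields the hunk touches. */
struct slices {
    int parent;              /* current->counter after fork */
    int child;               /* p->counter after fork */
    int parent_need_resched; /* current->need_resched after fork */
};

/* The "#if 0" branch (2.4.4): the child takes the whole timeslice. */
static struct slices give_all_to_child(int counter)
{
    struct slices s;

    assert(counter >= 0);
    s.child = counter;
    s.parent = 0;
    s.parent_need_resched = 1;
    return s;
}

/* The "#else" branch: split in half, the child gets the odd tick. */
static struct slices split_in_half(int counter)
{
    struct slices s;

    assert(counter >= 0);
    s.child = (counter + 1) >> 1;
    s.parent = counter >> 1;
    /* The parent only has to reschedule if it ran out of ticks. */
    s.parent_need_resched = (s.parent == 0);
    return s;
}
```

In both variants parent + child equals the counter before the fork, so the total amount of timeslice in the system is unchanged. The difference is only in who holds it right after fork().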

-- 
Peter Österlund             peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35            http://home1.swipnet.se/~w-15919
S-128 66 Sköndal            +46 8 942647
Sweden




* Re: 2.4.4 sluggish under fork load
  2001-04-28 11:52 Peter Osterlund
@ 2001-04-28 14:16 ` J . A . Magallon
  2001-04-28 14:26 ` Mohammad A. Haque
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: J . A . Magallon @ 2001-04-28 14:16 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: Linus Torvalds, linux-kernel


On 04.28 Peter Osterlund wrote:
> 
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.
> 

Just tried that under 2.4.4 on two terminals at the same time and the system
hardly even noticed it. Both cpus were running at about 45%user+55%sys, and I
was able to use balsa to read mail (disk access) and both loops stopped
immediately under Ctrl-C.

-- 
J.A. Magallon                                          #  Let the source
mailto:jamagallon@able.es                              #  be with you, Luke... 

Linux werewolf 2.4.4 #1 SMP Sat Apr 28 11:45:02 CEST 2001 i686



* Re: 2.4.4 sluggish under fork load
  2001-04-28 11:52 Peter Osterlund
  2001-04-28 14:16 ` J . A . Magallon
@ 2001-04-28 14:26 ` Mohammad A. Haque
  2001-04-28 15:07 ` Rene Puls
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Mohammad A. Haque @ 2001-04-28 14:26 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: Linus Torvalds, linux-kernel

Peter Osterlund wrote:
> 
> I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
> fork load. This is caused by the "run child first after fork" patch. I
> have tested on two different UP x86 systems running redhat 7.0.
> 
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.
> 
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.
> 

Just as a data point, I'm experiencing this also.

> Reverting the fork patch makes all these problems go away on my machine.
> I'm not saying that this is necessarily a good idea, that patch might be
> good for other reasons.

I'll try out this patch soon.
-- 

=====================================================================
Mohammad A. Haque                              http://www.haque.net/ 
                                               mhaque@haque.net

  "Alcohol and calculus don't mix.             Project Lead
   Don't drink and derive." --Unknown          http://wm.themes.org/
                                               batmanppc@themes.org
=====================================================================


* Re: 2.4.4 sluggish under fork load
  2001-04-28 11:52 Peter Osterlund
  2001-04-28 14:16 ` J . A . Magallon
  2001-04-28 14:26 ` Mohammad A. Haque
@ 2001-04-28 15:07 ` Rene Puls
  2001-04-28 17:10   ` John Kacur
  2001-04-28 17:54 ` Linus Torvalds
  2001-04-28 20:00 ` Harald Dunkel
  4 siblings, 1 reply; 21+ messages in thread
From: Rene Puls @ 2001-04-28 15:07 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: Linus Torvalds, linux-kernel

Peter Osterlund wrote:
> 
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

	Same thing here.

> A third thing I noticed is that starting a gnome session in redhat 7.0
> takes longer. (It takes more time for the gnome splash screen to
> appear.)

	I had similar problems with Sawfish: Starting a program from
the root menu would take about one or two seconds under 2.4.4.

> Reverting the fork patch makes all these problems go away on my
> machine.

	This patch worked for me as well.

bye,
Rene
 
-- 
Rene Puls <rpuls@gmx.net>                                 0x8652FFE2
http://www.lionking.org/~kianga/                    personal/pgp-key


* Re: 2.4.4 sluggish under fork load
  2001-04-28 15:07 ` Rene Puls
@ 2001-04-28 17:10   ` John Kacur
  2001-04-28 18:00     ` Peter Osterlund
  0 siblings, 1 reply; 21+ messages in thread
From: John Kacur @ 2001-04-28 17:10 UTC (permalink / raw)
  To: linux-kernel

>Peter Osterlund wrote:
>> 
>> Another thing is that the bash loop "while true ; do /bin/true ; done" is
>> not possible to interrupt with ctrl-c.

>        Same thing here.

I'm not having any problems. Just a quick question, is everyone who is
having a problem running with more than one cpu?

John Kacur


* Re: 2.4.4 sluggish under fork load
  2001-04-28 11:52 Peter Osterlund
                   ` (2 preceding siblings ...)
  2001-04-28 15:07 ` Rene Puls
@ 2001-04-28 17:54 ` Linus Torvalds
  2001-04-28 19:14   ` Peter Osterlund
  2001-04-28 20:00 ` Harald Dunkel
  4 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2001-04-28 17:54 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: linux-kernel


On Sat, 28 Apr 2001, Peter Osterlund wrote:
> 
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.

Ok. Fair enough. The new "run the child first" approach has advantages,
but it is entirely possible that the advantages unfairly prioritize things
that do a lot of forking.

> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

This, however, is a bash bug, not a kernel issue. Bash does something
strange with the terminal and ignores ^C at times, and basically only
reacts correctly to the ^C under the right circumstances. Changing the
child to run first probably makes the pre-existing bug much easier to see.

> Reverting the fork patch makes all these problems go away on my machine.

Reverting it outright may be an acceptable approach. I'll think about
it: the arguments _for_ the patch are true and real, and it shows up as
real improvements on some things..

An alternative approach might be to not give the child the _whole_
timeslice, but give it more than half. Partition it out 66% - 33% or
something.
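
A possible integer version of that 66% - 33% split, purely as a sketch: the function names are hypothetical, and rounding the child's share up is an assumption, chosen so that a counter of 1 still goes to the child.

```c
#include <assert.h>

/* Child's share: two thirds of the counter, rounded up. */
static int child_share(int counter)
{
    assert(counter >= 0);
    return (2 * counter + 2) / 3;
}

/* Parent keeps the rest, so the two shares always add up to counter. */
static int parent_share(int counter)
{
    return counter - child_share(counter);
}
```

Computing the parent's share as a remainder, rather than as counter / 3, keeps the total conserved for every input.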

		Linus




* Re: 2.4.4 sluggish under fork load
  2001-04-28 17:10   ` John Kacur
@ 2001-04-28 18:00     ` Peter Osterlund
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Osterlund @ 2001-04-28 18:00 UTC (permalink / raw)
  To: John Kacur; +Cc: linux-kernel


John Kacur <jkacur@home.com> writes:

> >Peter Osterlund wrote:
> >> 
> >> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> >> not possible to interrupt with ctrl-c.
> 
> >        Same thing here.
> 
> I'm not having any problems. Just a quick question, is everyone who is
> having a problem running with more than one cpu?

A clarification. The bash loop above doesn't cause any sluggishness on
my single cpu system. The non-working ctrl-c is probably just a bash
bug. The child process must eat some cpu time to provoke the
sluggishness, like in the following test program where the child busy
waits 100ms and then exits:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    double childTime = 0.10;
    if (argc > 1)
	childTime = atof(argv[1]);

    for (;;) {
	int child = fork();
	if (child == -1) {
	    perror("fork");
	    exit(1);
	} else if (child > 0) {
	    while (waitpid(child, NULL, 0) != child)
		;
	    printf("."); fflush(stdout);
	} else {
	    struct timeval tv1, tv2;
	    double t;
	    gettimeofday(&tv1, NULL);
	    for (;;) {
		gettimeofday(&tv2, NULL);
		t = (tv2.tv_sec - tv1.tv_sec) +
		    (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
		if (t > childTime)
		    break;
	    }
	    _exit(0);
	}
    }

    return 0;
}

-- 
Peter Österlund             peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35            http://home1.swipnet.se/~w-15919
S-128 66 Sköndal            +46 8 942647
Sweden



* Re: 2.4.4 sluggish under fork load
  2001-04-28 17:54 ` Linus Torvalds
@ 2001-04-28 19:14   ` Peter Osterlund
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Osterlund @ 2001-04-28 19:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sat, 28 Apr 2001, Linus Torvalds wrote:

> > Reverting the fork patch makes all these problems go away on my machine.
>
> Reverting it outright may be an acceptable approach. I'll think about
> it: the arguments _for_ the patch are true and real, and it shows up as
> real improvements on some things..

I agree with the reasoning for running the child first. Maybe the real
problem is somewhere else. I wrote two test programs to quantify the
behaviour. If I run "./fork 0.2" and "./lat 0.15" at the same time, lat
shows regular 160ms scheduling delays. (With the old fork.c the scheduling
delay is 20ms + epsilon as expected.)

Maybe some code path just forgets to reschedule?

-------- fork.c --------

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    double childTime = 0.10;
    if (argc > 1)
	childTime = atof(argv[1]);

    for (;;) {
	int child = fork();
	if (child == -1) {
	    perror("fork");
	    exit(1);
	} else if (child > 0) {
	    while (waitpid(child, NULL, 0) != child)
		;
	    printf("."); fflush(stdout);
	} else {
	    struct timeval tv1, tv2;
	    double t;
	    gettimeofday(&tv1, NULL);
	    for (;;) {
		gettimeofday(&tv2, NULL);
		t = (tv2.tv_sec - tv1.tv_sec) +
		    (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
		if (t > childTime)
		    break;
	    }
	    _exit(0);
	}
    }

    return 0;
}


-------- lat.c --------

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    double tLimit = 0.03;
    if (argc > 1)
	tLimit = atof(argv[1]);

    for (;;) {
	struct timeval tv1, tv2;
	double t;

	gettimeofday(&tv1, NULL);
	usleep(10000);
	gettimeofday(&tv2, NULL);
	t = (tv2.tv_sec - tv1.tv_sec) +
	    (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
	if (t > tLimit)
	    printf("t:%g\n", t);
    }
    return 0;
}
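
A variant of the measurement in lat.c, as a sketch only: it swaps gettimeofday()/usleep() for clock_gettime(CLOCK_MONOTONIC)/nanosleep(), so a wall-clock adjustment during the run cannot show up as a scheduling delay. The function names are made up for the example, and this is not what was used for the numbers in this thread.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two samples, a taken before b. */
static double elapsed_seconds(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) +
           (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Ask for a 10 ms sleep and return how long it really took. */
static double measure_sleep(void)
{
    struct timespec req = { 0, 10 * 1000 * 1000 };
    struct timespec t1, t2;
    int rc;

    rc = clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(rc == 0);
    nanosleep(&req, NULL);
    rc = clock_gettime(CLOCK_MONOTONIC, &t2);
    assert(rc == 0);
    return elapsed_seconds(t1, t2);
}
```

Calling measure_sleep() in a loop and printing every result above a limit gives the same kind of output as lat.c.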

-- 
Peter Österlund             peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35            http://home1.swipnet.se/~w-15919
S-128 66 Sköndal            +46 8 942647
Sweden




* Re: 2.4.4 sluggish under fork load
  2001-04-28 11:52 Peter Osterlund
                   ` (3 preceding siblings ...)
  2001-04-28 17:54 ` Linus Torvalds
@ 2001-04-28 20:00 ` Harald Dunkel
  4 siblings, 0 replies; 21+ messages in thread
From: Harald Dunkel @ 2001-04-28 20:00 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: linux-kernel

Peter Osterlund wrote:
> 
> I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
> fork load. This is caused by the "run child first after fork" patch. I
> have tested on two different UP x86 systems running redhat 7.0.
> 
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.
> 

That explains why xtoolwait did not work anymore. After applying the
patch everything is OK again.


Many thanx

Harri


* Re: 2.4.4 sluggish under fork load
@ 2001-04-29  7:14 Adam J. Richter
  0 siblings, 0 replies; 21+ messages in thread
From: Adam J. Richter @ 2001-04-29  7:14 UTC (permalink / raw)
  To: linux-kernel, mhaque, peter.osterlund; +Cc: torvalds

Peter Osterlund wrote:
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

	I have reproduced this on a uniprocessor machine and determined
that it is a bash bug.  I will submit a bash bug report and sample
patch that fixes the problem (but may be incorrect in other ways), and
will cc it to linux-kernel.  Look for the subject "Patch(?): bash-2.05/jobs.c
loses interrupts."

	I have not yet investigated the other report of "sluggish" behavior.

Adam J. Richter     __     ______________   4880 Stevens Creek Blvd, Suite 104
adam@yggdrasil.com     \ /                  San Jose, California 95129-1034
+1 408 261-6630         | g g d r a s i l   United States of America
fax +1 408 261-6631      "Free Software For The Rest Of Us."


* Re: 2.4.4 sluggish under fork load
@ 2001-04-29  8:04 Adam J. Richter
  0 siblings, 0 replies; 21+ messages in thread
From: Adam J. Richter @ 2001-04-29  8:04 UTC (permalink / raw)
  To: linux-kernel

	On rereading Linus's message, I see that he indicated that
"while true ; do /bin/true ; done" was known to be a bash bug, not
just a suggested possibility.  Sorry for acting as if this were
a new discovery.  Anyhow, I hope that at least the proposed bash
patch that I submitted may be of some use.

Adam J. Richter     __     ______________   4880 Stevens Creek Blvd, Suite 104
adam@yggdrasil.com     \ /                  San Jose, California 95129-1034
+1 408 261-6630         | g g d r a s i l   United States of America
fax +1 408 261-6631      "Free Software For The Rest Of Us."


* Re: 2.4.4 sluggish under fork load
       [not found] <Pine.LNX.4.21.0104281928080.10759-100000@penguin.transmeta.com>
@ 2001-04-29  8:26 ` Peter Osterlund
  2001-04-30 17:51   ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Osterlund @ 2001-04-29  8:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mark Hahn, Adam J. Richter, linux-kernel

On Sat, 28 Apr 2001, Linus Torvalds wrote:

> > could we leave it at half, but set the parent to SCHED_YIELD?
>
> Sounds like a good idea. Peter, how does that feel to you? I bet that I've
> never seen it simply because all my machines are (a) much too powerful
> for any reasonable use and (b) SMP.

That seems to work. The scheduling delays are back to 20ms and the
sluggishness feeling is gone. I wrote a simple test program to verify that
the child is still scheduled before the parent, so the performance
advantage should still be there. The only annoying thing is that it hides
the bash bug ;)
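
The test program mentioned here was not posted. One way such a check could look, as a sketch under assumptions: the child drops a byte into a pipe as its first action, and the parent does a non-blocking read as soon as fork() returns. The function name is made up, and on SMP the answer is racy because both processes can run at once.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child's marker byte was already in the pipe when the
 * parent came back from fork(), 0 if it was not, -1 on error.
 */
static int child_ran_first(void)
{
    int fds[2];
    char c = 0;
    pid_t pid;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    /* The parent must not block waiting for the child's byte. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: leave a one-byte marker and exit at once. */
        if (write(fds[1], "c", 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: a marker in the pipe means the child got here first. */
    n = read(fds[0], &c, 1);
    /* The parent still holds the write end, so read() cannot see EOF. */
    assert(n == 1 || n == -1);

    close(fds[0]);
    close(fds[1]);
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
    return n == 1;
}
```

A single call proves little. Running it a few thousand times and counting the 1s gives a ratio that can be compared between kernels.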

Patch below:

--- linux-2.4.4.orig/kernel/fork.c	Sat Apr 28 10:17:00 2001
+++ linux-2.4.4/kernel/fork.c	Sun Apr 29 10:06:42 2001
@@ -666,16 +666,18 @@
 	p->pdeath_signal = 0;

 	/*
-	 * Give the parent's dynamic priority entirely to the child.  The
-	 * total amount of dynamic priorities in the system doesn't change
-	 * (more scheduling fairness), but the child will run first, which
-	 * is especially useful in avoiding a lot of copy-on-write faults
-	 * if the child for a fork() just wants to do a few simple things
-	 * and then exec(). This is only important in the first timeslice.
-	 * In the long run, the scheduling behavior is unchanged.
+	 * "share" dynamic priority between parent and child, thus the
+	 * total amount of dynamic priorities in the system doesn't change,
+	 * more scheduling fairness. The parent yields to let the child run
+	 * first, which is especially useful in avoiding a lot of
+	 * copy-on-write faults if the child for a fork() just wants to do a
+	 * few simple things and then exec(). This is only important in the
+	 * first timeslice. In the long run, the scheduling behavior is
+	 * unchanged.
 	 */
-	p->counter = current->counter;
-	current->counter = 0;
+	p->counter = (current->counter + 1) >> 1;
+	current->counter >>= 1;
+	current->policy |= SCHED_YIELD;
 	current->need_resched = 1;

 	/*

-- 
Peter Österlund             peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35            http://home1.swipnet.se/~w-15919
S-128 66 Sköndal            +46 8 942647
Sweden




* Re: 2.4.4 sluggish under fork load
  2001-04-29  8:26 ` 2.4.4 sluggish under fork load Peter Osterlund
@ 2001-04-30 17:51   ` Andrea Arcangeli
  2001-04-30 21:45     ` Peter Osterlund
  2001-05-01  2:38     ` Rik van Riel
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2001-04-30 17:51 UTC (permalink / raw)
  To: Peter Osterlund; +Cc: Linus Torvalds, Mark Hahn, Adam J. Richter, linux-kernel

On Sun, Apr 29, 2001 at 10:26:57AM +0200, Peter Osterlund wrote:
> On Sat, 28 Apr 2001, Linus Torvalds wrote:
> 
> > > could we leave it at half, but set the parent to SCHED_YIELD?
> >
> > Sounds like a good idea. Peter, how does that feel to you? I bet that I've
> > never seen it simply because all my machines are (a) much too powerful
> > for any reasonable use and (b) SMP.
> 
> That seems to work. The scheduling delays are back to 20ms and the
> sluggishness feeling is gone. I wrote a simple test program to verify that
> the child is still scheduled before the parent, so the performance
> advantage should still be there. The only annoying thing is that it hides
> the bash bug ;)
> 
> Patch below:
> 
> --- linux-2.4.4.orig/kernel/fork.c	Sat Apr 28 10:17:00 2001
> +++ linux-2.4.4/kernel/fork.c	Sun Apr 29 10:06:42 2001
> @@ -666,16 +666,18 @@
>  	p->pdeath_signal = 0;
> 
>  	/*
> -	 * Give the parent's dynamic priority entirely to the child.  The
> -	 * total amount of dynamic priorities in the system doesn't change
> -	 * (more scheduling fairness), but the child will run first, which
> -	 * is especially useful in avoiding a lot of copy-on-write faults
> -	 * if the child for a fork() just wants to do a few simple things
> -	 * and then exec(). This is only important in the first timeslice.
> -	 * In the long run, the scheduling behavior is unchanged.
> +	 * "share" dynamic priority between parent and child, thus the
> +	 * total amount of dynamic priorities in the system doesn't change,
> +	 * more scheduling fairness. The parent yields to let the child run
> +	 * first, which is especially useful in avoiding a lot of
> +	 * copy-on-write faults if the child for a fork() just wants to do a
> +	 * few simple things and then exec(). This is only important in the
> +	 * first timeslice. In the long run, the scheduling behavior is
> +	 * unchanged.
>  	 */
> -	p->counter = current->counter;
> -	current->counter = 0;
> +	p->counter = (current->counter + 1) >> 1;
> +	current->counter >>= 1;
> +	current->policy |= SCHED_YIELD;
>  	current->need_resched = 1;
> 
>  	/*

please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
in the parent-timeslice patch in 2.4 that I fixed while backporting it
to 2.2aa, and I have now forward-ported the fix to 2.4aa. The fact that
2.4.4 gives the whole timeslice to the child just makes that bug more
visible. Unfortunately the fix doesn't apply cleanly to 2.4.4 (it's
incremental with the numa-scheduler patch) and I need to finish a few
more things before I can backport it myself.

Andrea


* Re: 2.4.4 sluggish under fork load
  2001-04-30 17:51   ` Andrea Arcangeli
@ 2001-04-30 21:45     ` Peter Osterlund
  2001-05-01  2:38     ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Osterlund @ 2001-04-30 21:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Mark Hahn, Adam J. Richter, linux-kernel

On Mon, 30 Apr 2001, Andrea Arcangeli wrote:

> please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
> in the parent-timeslice patch in 2.4 that I fixed while backporting it
> to 2.2aa and that I now forward ported the fix to 2.4aa. The fact
> 2.4.4 gives the whole timeslice to the child just gives more light to
> such bug. Unfortunately the fix doesn't apply cleanly to 2.4.4 (it's
> incremental with the numa-scheduler patch) and I need to finish a few
> more things before I can backport it myself.

I applied the 10_parent-timeslice-5 patch to 2.4.4 and tested. (If I
understood correctly, the idea of that patch is to give the remaining
child time-slice back to the parent when the child exits, but only if
there has been no time-slice recalculation since the child was created.)
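
To make that reading of the patch concrete, here is a user-space model of it. It is a sketch of my understanding only, not the 10_parent-timeslice-5 code: `struct task` and the three functions are hypothetical, and get_child_timeslice is used as a plain "no recalculation yet" flag.

```c
#include <assert.h>

/* Hypothetical, cut-down task with only the fields this model needs. */
struct task {
    int counter;             /* ticks left in the current timeslice */
    int get_child_timeslice; /* 1 until the counters are recalculated */
};

/* fork(): split the parent's ticks and mark the child as refundable. */
static void model_fork(struct task *parent, struct task *child)
{
    assert(parent->counter >= 0);
    child->counter = (parent->counter + 1) >> 1;
    parent->counter >>= 1;
    child->get_child_timeslice = 1;
}

/* Timeslice recalculation: the task's ticks no longer come from fork(). */
static void model_recalculate(struct task *t, int fresh_ticks)
{
    t->counter = fresh_ticks;
    t->get_child_timeslice = 0;
}

/* exit(): unused ticks go back to the parent only if still refundable. */
static void model_exit(struct task *parent, struct task *child)
{
    if (child->get_child_timeslice)
        parent->counter += child->counter;
    child->counter = 0;
}
```

The flag matters because after a recalculation the child's ticks are fresh ones, and handing them to the parent would create timeslice out of nothing.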

It is somewhat better than plain 2.4.4, but not much. I still see
scheduling delays in the range 30-120ms when running "./fork 0.4". (fork
is a program that starts a child, the child busy waits some time (0.4s)
and then exits. The parent then immediately respawns another child, etc.
See one of my previous messages.)

-- 
Peter Österlund             peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35            http://home1.swipnet.se/~w-15919
S-128 66 Sköndal            +46 8 942647
Sweden




* Re: 2.4.4 sluggish under fork load
  2001-04-30 17:51   ` Andrea Arcangeli
  2001-04-30 21:45     ` Peter Osterlund
@ 2001-05-01  2:38     ` Rik van Riel
  2001-05-01  5:18       ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2001-05-01  2:38 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Osterlund, Linus Torvalds, Mark Hahn, Adam J. Richter,
	linux-kernel

On Mon, 30 Apr 2001, Andrea Arcangeli wrote:
> On Sun, Apr 29, 2001 at 10:26:57AM +0200, Peter Osterlund wrote:

> > -	p->counter = current->counter;
> > -	current->counter = 0;
> > +	p->counter = (current->counter + 1) >> 1;
> > +	current->counter >>= 1;
> > +	current->policy |= SCHED_YIELD;
> >  	current->need_resched = 1;
> 
> please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
> in the parent-timeslice patch in 2.4 that I fixed while backporting it
> to 2.2aa and that I now forward ported the fix to 2.4aa. The fact
> 2.4.4 gives the whole timeslice to the child just gives more light to
> such bug.

The fact that 2.4.4 gives the whole timeslice to the child
is just bogus to begin with.

The problem people tried to solve was "make sure the kernel
runs the child first after a fork", this has just about
NOTHING to do with how the timeslice is distributed.

Now, since we are in a supposedly stable branch of the kernel,
why mess with the timeslice distribution between parent and
child?  The timeslice distribution that has worked very well
for the last YEARS...

I agree when people want to fix problems, but I really don't
think 2.4 is the time to also "fix" non-problems.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: 2.4.4 sluggish under fork load
@ 2001-05-01  4:18 Adam J. Richter
  0 siblings, 0 replies; 21+ messages in thread
From: Adam J. Richter @ 2001-05-01  4:18 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel

>The fact that 2.4.4 gives the whole timeslice to the child
>is just bogus to begin with.

        I only did that because I could not find another way
to make the child run first that worked in practice.  I tried
other things before that.  Since Peter Osterlund's SCHED_YIELD
thing works, we no longer have to give all of the CPU to the
child.  The scheduler time slices are currently enormous, so as
long as the child gets even one clock tick before the parent runs,
it should reach the exec() if that is its plan.  1 tick = 10ms = 10
million cycles on a 1GHz CPU, which should be enough time to encrypt
my /boot/vmlinux in twofish if it's in RAM.
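
The tick arithmetic above, written out as a small sketch. The function name is made up, and 100 ticks per second is taken from the "1 tick = 10ms" figure.

```c
#include <assert.h>

/* CPU cycles available in one scheduler tick. */
static unsigned long long cycles_per_tick(unsigned long long cpu_hz,
                                          unsigned int ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return cpu_hz / ticks_per_sec;
}
```

With cpu_hz = 1000000000 and ticks_per_sec = 100 this gives the 10 million cycles quoted above.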

Adam J. Richter     __     ______________   4880 Stevens Creek Blvd, Suite 104
adam@yggdrasil.com     \ /                  San Jose, California 95129-1034
+1 408 261-6630         | g g d r a s i l   United States of America
fax +1 408 261-6631      "Free Software For The Rest Of Us."


* Re: 2.4.4 sluggish under fork load
  2001-05-01  2:38     ` Rik van Riel
@ 2001-05-01  5:18       ` Andrea Arcangeli
  2001-05-01 16:55         ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2001-05-01  5:18 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Osterlund, Linus Torvalds, Mark Hahn, Adam J. Richter,
	linux-kernel

On Mon, Apr 30, 2001 at 11:38:23PM -0300, Rik van Riel wrote:
> On Mon, 30 Apr 2001, Andrea Arcangeli wrote:
> > On Sun, Apr 29, 2001 at 10:26:57AM +0200, Peter Osterlund wrote:
> 
> > > -	p->counter = current->counter;
> > > -	current->counter = 0;
> > > +	p->counter = (current->counter + 1) >> 1;
> > > +	current->counter >>= 1;
> > > +	current->policy |= SCHED_YIELD;
> > >  	current->need_resched = 1;
> > 
> > please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
> > in the parent-timeslice patch in 2.4 that I fixed while backporting it
> > to 2.2aa and that I now forward ported the fix to 2.4aa. The fact
> > 2.4.4 gives the whole timeslice to the child just gives more light to
> > such bug.
> 
> The fact that 2.4.4 gives the whole timeslice to the child
> is just bogus to begin with.
> 
> The problem people tried to solve was "make sure the kernel
> runs the child first after a fork", this has just about
> NOTHING to do with how the timeslice is distributed.
> 
> Now, since we are in a supposedly stable branch of the kernel,
> why mess with the timeslice distribution between parent and
> child?  The timeslice distribution that has worked very well
> for the last YEARS...

I have been running with the patch below applied for some time (I didn't
submit it because, for some reason, unless I do p->policy &=
~SCHED_YIELD, ksoftirqd deadlocks at boot; I haven't investigated why
yet, and I'd like to have the whole picture on it first):

diff -urN z/include/linux/sched.h z1/include/linux/sched.h
--- z/include/linux/sched.h	Mon Apr 30 04:22:25 2001
+++ z1/include/linux/sched.h	Mon Apr 30 02:45:07 2001
@@ -301,7 +301,7 @@
  * all fields in a single cacheline that are needed for
  * the goodness() loop in schedule().
  */
-	int counter;
+	volatile int counter;
 	int nice;
 	unsigned int policy;
 	struct mm_struct *mm;
diff -urN z/kernel/fork.c z1/kernel/fork.c
--- z/kernel/fork.c	Mon Apr 30 04:22:25 2001
+++ z1/kernel/fork.c	Mon Apr 30 03:49:26 2001
@@ -666,17 +666,17 @@
 	p->pdeath_signal = 0;
 
 	/*
-	 * Give the parent's dynamic priority entirely to the child.  The
-	 * total amount of dynamic priorities in the system doesn't change
-	 * (more scheduling fairness), but the child will run first, which
-	 * is especially useful in avoiding a lot of copy-on-write faults
-	 * if the child for a fork() just wants to do a few simple things
-	 * and then exec(). This is only important in the first timeslice.
-	 * In the long run, the scheduling behavior is unchanged.
+	 * Scheduling the child first is especially useful in avoiding a
+	 * lot of copy-on-write faults if the child for a fork() just wants
+	 * to do a few simple things and then exec().
 	 */
-	p->counter = current->counter;
-	current->counter = 0;
-	current->need_resched = 1;
+	{
+		int counter = current->counter >> 1;
+		current->counter = p->counter = counter;
+		p->policy &= ~SCHED_YIELD;
+		current->policy |= SCHED_YIELD;
+		current->need_resched = 1;
+	}
 	/* Tell the parent if it can get back its timeslice when child exits */
 	p->get_child_timeslice = 1;
 

The only point of my previous email is that if a fork loop has a very
invasive effect on the rest of the system, that more probably indicates
people got bitten by the bug in the parent-timeslice logic. Furthermore, I
never noticed any sluggish behaviour on my systems, and before posting my
previous email I had one definitive report that the bad behaviour
observed on vanilla 2.4.4 with parallel compiles in the background got
cured *completely* by my tree (which in the tested revision didn't yet
include the change inlined above). So I thought it was worth
mentioning the effect of the parent-timeslice bugfix here too.
This doesn't mean I don't want something like the patch inlined above
integrated.

Andrea


* Re: 2.4.4 sluggish under fork load
  2001-05-01  5:18       ` Andrea Arcangeli
@ 2001-05-01 16:55         ` Andrea Arcangeli
  2001-05-01 17:33           ` J . A . Magallon
  2001-05-01 20:34           ` Alan Cox
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2001-05-01 16:55 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Osterlund, Linus Torvalds, Mark Hahn, Adam J. Richter,
	linux-kernel, Alan Cox

On Tue, May 01, 2001 at 07:18:49AM +0200, Andrea Arcangeli wrote:
> I have been running with the patch below applied for some time (I didn't
> submit it because, for some reason, unless I do p->policy &=
> ~SCHED_YIELD, ksoftirqd deadlocks at boot; I haven't investigated why
> yet, and I'd like to have the whole picture on it first):

OK I found the explanation now. ksoftirqd was deadlocking on me without
the explicit clear of SCHED_YIELD in p->policy because a softirq event
was pending at the time of the first kernel_thread(). While returning
from the syscall it therefore took the ret_from_irq path, which skips the
reschedule [the one that was supposed to clear SCHED_YIELD and to
reschedule the child] because CS was pointing to the kernel descriptor.
So init then runs with SCHED_YIELD set, and when it executes
kernel_thread(ksoftirqd), ksoftirqd inherits SCHED_YIELD too (copied at
the top of do_fork) and never gets scheduled -> deadlock.

Basically there's no guarantee that any kernel_thread will return with
SCHED_YIELD cleared.

And if you fork off a child with its p->policy SCHED_YIELD set it will
never get scheduled in.

Only "just" running tasks can have SCHED_YIELD set.

So the lines below are the *right* and most robust approach as far as I can
tell. (Plus counter needs to be volatile, like every variable that can
change underneath the C code, even though it's probably not required by the
code involved with current->counter.)

> +	{
> +		int counter = current->counter >> 1;
> +		current->counter = p->counter = counter;
> +		p->policy &= ~SCHED_YIELD;
> +		current->policy |= SCHED_YIELD;
> +		current->need_resched = 1;
> +	}

Alan, the patch you merged in 2.4.4ac2 can fail like mine, but it may
fail in a much more subtle way. I notice if ksoftirqd never gets
scheduled, because I synchronize on it and I deadlock. Your
kupdate/bdflush/kswapd may be forked off correctly, but they can all
have SCHED_YIELD set, and then they will *never* get scheduled. You know
what can happen if kupdate never gets scheduled... I recommend being
careful with 2.4.4ac2.
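
To make the failure mode concrete, here is a small userspace sketch (not
kernel code). The struct, the function names and the flag value are made
up for the illustration, and the "can be picked" check simply encodes the
claim made in this mail rather than the real scheduler's selection logic:

```c
#include <assert.h>

#define MOCK_SCHED_YIELD 0x10	/* illustrative flag value only */

/* Hypothetical stand-in for the two task fields discussed here. */
struct mock_task {
	int policy;
	int counter;
};

/*
 * The child starts out as a copy of the parent, like the copy done at
 * the top of do_fork. With clear_yield set, the inherited flag is
 * dropped again, which is what the explicit clear in the patch does.
 */
void mock_fork(const struct mock_task *parent, struct mock_task *child,
	       int clear_yield)
{
	assert(parent && child);
	*child = *parent;
	if (clear_yield)
		child->policy &= ~MOCK_SCHED_YIELD;
}

/*
 * Assumption taken from the mail: a task that has never run but still
 * carries the yield flag is never selected.
 */
int mock_can_be_picked(const struct mock_task *t)
{
	assert(t);
	return !(t->policy & MOCK_SCHED_YIELD);
}
```

If the mock parent has the flag set when it forks, the child is only
pickable when mock_fork() is called with clear_yield != 0.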

My patch (part of it quoted above) is the right replacement for the code
in 2.4.4ac2. (You may want to do `counter = current->counter + 1 >> 1'
tricks in addition to that; I will change it a bit too for that minor
part.)
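
For comparison, the three ways of splitting the timeslice that came up in
this thread can be sketched in plain userspace C. This is only an
illustration: mock_task and the split_* names are hypothetical, and the
SCHED_YIELD handling is left out on purpose.

```c
#include <assert.h>

/* Hypothetical stand-in for the scheduler fields touched at fork time. */
struct mock_task {
	int counter;		/* remaining timeslice, in ticks */
	int need_resched;	/* nonzero: reschedule requested */
};

/* 2.4.4 behaviour: the child takes the whole remaining timeslice. */
void split_all_to_child(struct mock_task *parent, struct mock_task *child)
{
	assert(parent && child);
	child->counter = parent->counter;
	parent->counter = 0;
	parent->need_resched = 1;
}

/* Quoted patch: both get counter >> 1, so an odd tick is dropped. */
void split_half(struct mock_task *parent, struct mock_task *child)
{
	int half;

	assert(parent && child);
	half = parent->counter >> 1;
	parent->counter = child->counter = half;
	parent->need_resched = 1;
}

/* Rounded variant: the child gets the half rounded up, no tick is lost. */
void split_half_rounded(struct mock_task *parent, struct mock_task *child)
{
	assert(parent && child);
	child->counter = (parent->counter + 1) >> 1;
	parent->counter >>= 1;
	parent->need_resched = 1;
}
```

With 7 ticks left in the parent, the three variants give parent/child
values of 0/7, 3/3 and 3/4 respectively.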

Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.4 sluggish under fork load
  2001-05-01 16:55         ` Andrea Arcangeli
@ 2001-05-01 17:33           ` J . A . Magallon
  2001-05-01 20:34           ` Alan Cox
  1 sibling, 0 replies; 21+ messages in thread
From: J . A . Magallon @ 2001-05-01 17:33 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linux Kernel


On 05.01 Andrea Arcangeli wrote:
> 
> And if you fork off a child with its p->policy SCHED_YIELD set it will
> never get scheduled in.
> 
> Only "just" running tasks can have SCHED_YIELD set.
> 
> So the lines quoted below are the *right* and most robust approach as
> far as I can tell. (Plus counter needs to be volatile, like every
> variable that can change underneath the C code, even though that is
> probably not required by the code involved with current->counter.)
> 
> > +	{
> > +		int counter = current->counter >> 1;
> > +		current->counter = p->counter = counter;
> > +		p->policy &= ~SCHED_YIELD;
> > +		current->policy |= SCHED_YIELD;
> > +		current->need_resched = 1;
> > +	}
> 
> Alan, the patch you merged in 2.4.4ac2 can fail like mine, but it may
> fail in a much more subtle way. I notice if ksoftirqd never gets
> scheduled, because I synchronize on it and I deadlock. Your
> kupdate/bdflush/kswapd may be forked off correctly, but they can all
> have SCHED_YIELD set, and then they will *never* get scheduled. You know
> what can happen if kupdate never gets scheduled... I recommend being
> careful with 2.4.4ac2.
> 

It looks like this is related to my problem (see thread [Re: Linux-2.4.4-ac2]).
Function __start_kernel called kernel_thread(init,...), and it seems to hang
in cpu_idle().

-- 
J.A. Magallon                                          #  Let the source
mailto:jamagallon@able.es                              #  be with you, Luke... 

Linux werewolf 2.4.4-ac1 #1 SMP Tue May 1 11:35:17 CEST 2001 i686


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.4 sluggish under fork load
  2001-05-01 16:55         ` Andrea Arcangeli
  2001-05-01 17:33           ` J . A . Magallon
@ 2001-05-01 20:34           ` Alan Cox
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Cox @ 2001-05-01 20:34 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Peter Osterlund, Linus Torvalds, Mark Hahn,
	Adam J. Richter, linux-kernel, Alan Cox

> OK I found the explanation now. The reason ksoftirqd was deadlocking on
> me without the explicit clear of SCHED_YIELD in p->policy is because a
> softirq event was pending at the time of the first kernel_thread() and
> then while returning from the syscall it was so taking the ret_from_irq

Oh boy. 

> > +		current->policy |= SCHED_YIELD;
> > +		current->need_resched = 1;
> > +	}
> 
> Alan, the patch you merged in 2.4.4ac2 can fail like mine, but it may
> fail in a much more subtle way. I notice if ksoftirqd never gets
> scheduled, because I synchronize on it and I deadlock. Your
> kupdate/bdflush/kswapd may be forked off correctly, but they can all
> have SCHED_YIELD set, and then they will *never* get scheduled. You know
> what can happen if kupdate never gets scheduled... I recommend being
> careful with 2.4.4ac2.

Change merged for -ac3. Nice debugging


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.4 sluggish under fork load
@ 2001-05-03 14:02 Hubertus Franke
  0 siblings, 0 replies; 21+ messages in thread
From: Hubertus Franke @ 2001-05-03 14:02 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: riel, linux-kernel


I pointed that out to the person who proposed this and gave him a fix
that ensures that the child has a value at least 2 higher.

Giving the child everything and the parent nothing is TOTALLY BOGUS. The
parent essentially has to wait for a recalculate.
This so-called fix has to go in the next release.


Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability),
OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003



"Adam J. Richter" <adam@yggdrasil.com>@vger.kernel.org on 05/01/2001
12:18:10 AM

Sent by:  linux-kernel-owner@vger.kernel.org


To:   riel@conectiva.com.br
cc:   linux-kernel@vger.kernel.org
Subject:  Re: 2.4.4 sluggish under fork load



>The fact that 2.4.4 gives the whole timeslice to the child
>is just bogus to begin with.

        I only did that because I could not find another way
to make the child run first that worked in practice.  I tried
other things before that.  Since Peter Osterlund's SCHED_YIELD
thing works, we no longer have to give all of the CPU to the
child.  The scheduler time slices are currently enormous, so as
long as the child gets even one clock tick before the parent runs,
it should reach the exec() if that is its plan.  1 tick = 10ms = 10
million cycles on a 1GHz CPU, which should be enough time to encrypt
my /boot/vmlinux in twofish if it's in RAM.
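
The arithmetic above can be checked with a few lines of C. This is only a
sketch: cycles_per_tick is a hypothetical helper, and the 100 ticks per
second figure is the one implied by "1 tick = 10ms".

```c
#include <assert.h>

/*
 * Number of CPU cycles that elapse during one timer tick, given the
 * tick rate (ticks per second) and the CPU clock frequency in Hz.
 */
unsigned long long cycles_per_tick(unsigned long ticks_per_sec,
				   unsigned long long cpu_hz)
{
	assert(ticks_per_sec > 0);
	return cpu_hz / ticks_per_sec;
}
```

Calling cycles_per_tick(100, 1000000000ULL) gives the 10 million cycles
per tick quoted for a 1GHz CPU.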

Adam J. Richter     __     ______________   4880 Stevens Creek Blvd, Suite 104
adam@yggdrasil.com     \ /                  San Jose, California 95129-1034
+1 408 261-6630         | g g d r a s i l   United States of America
fax +1 408 261-6631      "Free Software For The Rest Of Us."




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2001-05-03 14:06 UTC | newest]

Thread overview: 21+ messages
     [not found] <Pine.LNX.4.21.0104281928080.10759-100000@penguin.transmeta.com>
2001-04-29  8:26 ` 2.4.4 sluggish under fork load Peter Osterlund
2001-04-30 17:51   ` Andrea Arcangeli
2001-04-30 21:45     ` Peter Osterlund
2001-05-01  2:38     ` Rik van Riel
2001-05-01  5:18       ` Andrea Arcangeli
2001-05-01 16:55         ` Andrea Arcangeli
2001-05-01 17:33           ` J . A . Magallon
2001-05-01 20:34           ` Alan Cox
2001-05-03 14:02 Hubertus Franke
  -- strict thread matches above, loose matches on Subject: below --
2001-05-01  4:18 Adam J. Richter
2001-04-29  8:04 Adam J. Richter
2001-04-29  7:14 Adam J. Richter
2001-04-28 11:52 Peter Osterlund
2001-04-28 14:16 ` J . A . Magallon
2001-04-28 14:26 ` Mohammad A. Haque
2001-04-28 15:07 ` Rene Puls
2001-04-28 17:10   ` John Kacur
2001-04-28 18:00     ` Peter Osterlund
2001-04-28 17:54 ` Linus Torvalds
2001-04-28 19:14   ` Peter Osterlund
2001-04-28 20:00 ` Harald Dunkel
