multi-queue scheduler update

public inbox for linux-kernel@vger.kernel.org

* multi-queue scheduler update
@ 2001-01-18 23:53 Mike Kravetz
  2001-01-19  0:26 ` Andrea Arcangeli
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Mike Kravetz @ 2001-01-18 23:53 UTC (permalink / raw)
  To: lse-tech; +Cc: linux-kernel

I just posted an updated version of the multi-queue scheduler
for the 2.4.0 kernel.  This version also contains support for
realtime tasks.  The patch can be found at:

http://lse.sourceforge.net/scheduling/

Here are some very preliminary numbers from sched_test_yield
(which was previously posted to this (lse-tech) list by Bill
Hartner).  Tests were run on a system with 8 700 MHz Pentium
III processors.

                           microseconds/yield
# threads      2.2.16-22           2.4        2.4-multi-queue
------------   ---------         --------     ---------------
16               18.740            4.603         1.455
32               17.702            5.134         1.456
64               23.300            5.586         1.466
128              47.273           18.812         1.480
256             105.701           71.147         1.517
512               FRC            143.500         1.661
1024              FRC            196.425         6.166
2048              FRC              FRC          23.291
4096              FRC              FRC          47.117

*FRC = failed to reach confidence level

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center
15450 SW Koll Parkway
Beaverton, OR 97006-6063                     (503)578-3494
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: multi-queue scheduler update
  2001-01-18 23:53 multi-queue scheduler update Mike Kravetz
@ 2001-01-19  0:26 ` Andrea Arcangeli
  2001-01-19  0:51   ` [Lse-tech] " Andi Kleen
                     ` (3 more replies)
  2001-01-19  0:43 ` Gerhard Mack
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 35+ messages in thread
From: Andrea Arcangeli @ 2001-01-19  0:26 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: lse-tech, linux-kernel

On Thu, Jan 18, 2001 at 03:53:11PM -0800, Mike Kravetz wrote:
> Here are some very preliminary numbers from sched_test_yield
> (which was previously posted to this (lse-tech) list by Bill
> Hartner).  Tests were run on a system with 8 700 MHz Pentium
> III processors.
> 
>                            microseconds/yield
> # threads      2.2.16-22           2.4        2.4-multi-queue
> ------------   ---------         --------     ---------------
> 16               18.740            4.603         1.455

I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
scheduler with more than 7 tasks in the runqueue (actually I'm not sure the
number was 7, but it was certainly under 10). So if you also use an O(1)
scheduler, as I guess you do (since you have a chance to run fast in the case
of lots of running tasks), the most interesting thing is how you score with
2/4/8 tasks in the runqueue (I think the tests on the O(1) scheduler patch were
done on at most a 2-way SMP, btw). (The argument for not including Davide's
patch was that most machines have fewer than 4 or 5 tasks in the runqueue at
the same time.)
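
To make the O(N) versus O(1) distinction concrete, here is a minimal user-space sketch. It is neither Davide's patch nor the mainline scheduler; `struct task`, `NPRIO`, `pick_scan`, `table_add` and `pick_table` are hypothetical names invented for illustration, and real schedulers weigh far more than a single priority number.

```c
#include <assert.h>
#include <stddef.h>

#define NPRIO 32                /* hypothetical number of priority levels */

/* Hypothetical task record; not the kernel's task_struct. */
struct task {
    int prio;                   /* 0 .. NPRIO-1, higher runs first */
    struct task *next;          /* link in whichever list holds the task */
};

/* O(N): walk one list of all runnable tasks and keep the best one. */
static struct task *pick_scan(struct task *runnable)
{
    struct task *best = NULL;

    for (struct task *t = runnable; t != NULL; t = t->next)
        if (best == NULL || t->prio > best->prio)
            best = t;
    return best;
}

/* Table-based: one list per priority plus a bitmap of non-empty lists. */
struct prio_table {
    unsigned int bitmap;        /* bit p set <=> queue[p] is non-empty */
    struct task *queue[NPRIO];
};

static void table_add(struct prio_table *pt, struct task *t)
{
    assert(t->prio >= 0 && t->prio < NPRIO);
    t->next = pt->queue[t->prio];
    pt->queue[t->prio] = t;
    pt->bitmap |= 1u << t->prio;
}

/* Cost is bounded by NPRIO, independent of how many tasks are runnable. */
static struct task *pick_table(const struct prio_table *pt)
{
    for (int p = NPRIO - 1; p >= 0; p--)
        if (pt->bitmap & (1u << p))
            return pt->queue[p];
    return NULL;
}
```

With only a handful of runnable tasks the linear scan touches very little memory, which is one way to read the argument that the simpler scheduler is good enough for typical loads.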

Andrea

* Re: multi-queue scheduler update
  2001-01-18 23:53 multi-queue scheduler update Mike Kravetz
  2001-01-19  0:26 ` Andrea Arcangeli
@ 2001-01-19  0:43 ` Gerhard Mack
  2001-01-23 16:49 ` [Lse-tech] " Jun Nakajima
       [not found] ` <LYR76657-1923-2001.01.23-08.54.49--mikek#sequent.com@lyris.sequent.com>
  3 siblings, 0 replies; 35+ messages in thread
From: Gerhard Mack @ 2001-01-19  0:43 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: lse-tech, linux-kernel

What effect does this scheduler have with 1 to 5 tasks?

	Gerhard


On Thu, 18 Jan 2001, Mike Kravetz wrote:

> I just posted an updated version of the multi-queue scheduler
> for the 2.4.0 kernel.  This version also contains support for
> realtime tasks.  The patch can be found at:
> 
> http://lse.sourceforge.net/scheduling/
> 
> Here are some very preliminary numbers from sched_test_yield
> (which was previously posted to this (lse-tech) list by Bill
> Hartner).  Tests were run on a system with 8 700 MHz Pentium
> III processors.
> 
>                            microseconds/yield
> # threads      2.2.16-22           2.4        2.4-multi-queue
> ------------   ---------         --------     ---------------
> 16               18.740            4.603         1.455
> 32               17.702            5.134         1.456
> 64               23.300            5.586         1.466
> 128              47.273           18.812         1.480
> 256             105.701           71.147         1.517
> 512               FRC            143.500         1.661
> 1024              FRC            196.425         6.166
> 2048              FRC              FRC          23.291
> 4096              FRC              FRC          47.117
> 
> *FRC = failed to reach confidence level
> 
> -- 
> Mike Kravetz                                 mkravetz@sequent.com
> IBM Linux Technology Center
> 15450 SW Koll Parkway
> Beaverton, OR 97006-6063                     (503)578-3494
> 

--
Gerhard Mack

gmack@innerfire.net

<>< As a computer I find your faith in technology amusing.


* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  0:26 ` Andrea Arcangeli
@ 2001-01-19  0:51   ` Andi Kleen
  2001-01-19  1:14     ` John Clemens
  2001-01-19  0:52   ` [Lse-tech] " Mike Kravetz
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2001-01-19  0:51 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Mike Kravetz, lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote:
> I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> scheduler with more than 7 tasks in the runqueue (actually I'm not sure the
> number was 7, but it was certainly under 10). So if you also use an O(1)
> scheduler, as I guess you do (since you have a chance to run fast in the case
> of lots of running tasks), the most interesting thing is how you score with
> 2/4/8 tasks in the runqueue (I think the tests on the O(1) scheduler patch were
> done on at most a 2-way SMP, btw). (The argument for not including Davide's
> patch was that most machines have fewer than 4 or 5 tasks in the runqueue at
> the same time.)

They seem to have tried that in a separate patch:
http://lse.sourceforge.net/scheduling/PrioScheduler.html

Very nice literate programming style btw @-)


-Andi 

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  0:26 ` Andrea Arcangeli
  2001-01-19  0:51   ` [Lse-tech] " Andi Kleen
@ 2001-01-19  0:52   ` Mike Kravetz
  2001-01-19  1:30     ` Andrea Arcangeli
  2001-01-19 16:06     ` David Lang
  2001-01-19  1:00   ` Mark Hahn
  2001-01-19 23:35   ` Mike Kravetz
  3 siblings, 2 replies; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19  0:52 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 03:53:11PM -0800, Mike Kravetz wrote:
> > Here are some very preliminary numbers from sched_test_yield
> > (which was previously posted to this (lse-tech) list by Bill
> > Hartner).  Tests were run on a system with 8 700 MHz Pentium
> > III processors.
> > 
> >                            microseconds/yield
> > # threads      2.2.16-22           2.4        2.4-multi-queue
> > ------------   ---------         --------     ---------------
> > 16               18.740            4.603         1.455
> 
> I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> scheduler with more than 7 tasks in the runqueue (actually I'm not sure the
> number was 7, but it was certainly under 10). So if you also use an O(1)
> scheduler, as I guess you do (since you have a chance to run fast in the case
> of lots of running tasks), the most interesting thing is how you score with
> 2/4/8 tasks in the runqueue (I think the tests on the O(1) scheduler patch were
> done on at most a 2-way SMP, btw). (The argument for not including Davide's
> patch was that most machines have fewer than 4 or 5 tasks in the runqueue at
> the same time.)
> 
> Andrea

Thanks for the suggestion.  The only reason I hesitated to test with
a small number of threads is because I was under the assumption that
this particular benchmark may have problems if the number of threads
was less than the number of processors.  I'll give the tests a try
with a smaller number of threads.  I'm also open to suggestions for
what benchmarks/test methods I could use for scheduler testing.  If
you remember what people have used in the past, please let me know.

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

* Re: multi-queue scheduler update
  2001-01-19  0:26 ` Andrea Arcangeli
  2001-01-19  0:51   ` [Lse-tech] " Andi Kleen
  2001-01-19  0:52   ` [Lse-tech] " Mike Kravetz
@ 2001-01-19  1:00   ` Mark Hahn
  2001-01-19  1:08     ` Andi Kleen
  2001-01-19  1:35     ` Andrea Arcangeli
  2001-01-19 23:35   ` Mike Kravetz
  3 siblings, 2 replies; 35+ messages in thread
From: Mark Hahn @ 2001-01-19  1:00 UTC (permalink / raw)
  To: linux-kernel

> >                            microseconds/yield
> > # threads      2.2.16-22           2.4        2.4-multi-queue
> > ------------   ---------         --------     ---------------
> > 16               18.740            4.603         1.455
> 
> I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)

isn't the normal case (as in "The Right Case to optimize") 
where there are close to zero runnable tasks?  what realistic/sane
scenarios have very large numbers of spinning threads?  all server
situations I can think of do not.  not volanomark -loopback, surely!


* Re: multi-queue scheduler update
  2001-01-19  1:00   ` Mark Hahn
@ 2001-01-19  1:08     ` Andi Kleen
  2001-01-19  1:23       ` Mike Kravetz
  2001-01-19  1:35     ` Andrea Arcangeli
  1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2001-01-19  1:08 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

On Thu, Jan 18, 2001 at 08:00:16PM -0500, Mark Hahn wrote:
> > >                            microseconds/yield
> > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > ------------   ---------         --------     ---------------
> > > 16               18.740            4.603         1.455
> > 
> > I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> 
> isn't the normal case (as in "The Right Case to optimize") 
> where there are close to zero runnable tasks?  what realistic/sane
> scenarios have very large numbers of spinning threads?  all server
> situations I can think of do not.  not volanomark -loopback, surely!

I think the main point of Mike's patch is decreasing locking and cache line
bouncing overhead of multi cpu scheduling, not optimizing lots of runnable tasks.
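
The locking idea can be sketched in user space as follows. This is not Mike's patch; `struct cpu_rq`, `rq_init`, `rq_enqueue` and `rq_total` are hypothetical names, `NCPU` simply mirrors the 8-way test machine, and the 64-byte cache-line size is an assumption. Each CPU gets its own lock and counter, so an enqueue on one CPU does not take a lock that every other CPU is also fighting over.

```c
#include <assert.h>
#include <pthread.h>

#define NCPU 8                  /* the test machine in the thread was 8-way */
#define LINE 64                 /* assumed cache-line size in bytes */

/* Hypothetical per-CPU run queue, reduced to a lock and a counter. */
struct cpu_rq {
    pthread_mutex_t lock;       /* protects only this CPU's queue */
    int nr_running;             /* runnable tasks queued on this CPU */
    char pad[LINE];             /* spacer: the next queue's fields start at
                                   least LINE bytes after nr_running */
};

static struct cpu_rq rq[NCPU];

static void rq_init(void)
{
    for (int i = 0; i < NCPU; i++) {
        pthread_mutex_init(&rq[i].lock, NULL);
        rq[i].nr_running = 0;
    }
}

/* Enqueue takes only the target CPU's lock, never a global one. */
static void rq_enqueue(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    pthread_mutex_lock(&rq[cpu].lock);
    rq[cpu].nr_running++;
    pthread_mutex_unlock(&rq[cpu].lock);
}

/* A global view has to visit every queue, one lock at a time. */
static int rq_total(void)
{
    int total = 0;

    for (int i = 0; i < NCPU; i++) {
        pthread_mutex_lock(&rq[i].lock);
        total += rq[i].nr_running;
        pthread_mutex_unlock(&rq[i].lock);
    }
    return total;
}
```

The trade-off shows in `rq_total`: anything that needs a system-wide view now costs one lock acquisition per CPU instead of one in total.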


-Andi

* Re: multi-queue scheduler update
  2001-01-19  0:51   ` [Lse-tech] " Andi Kleen
@ 2001-01-19  1:14     ` John Clemens
  0 siblings, 0 replies; 35+ messages in thread
From: John Clemens @ 2001-01-19  1:14 UTC (permalink / raw)
  To: linux-kernel

While I agree that this is probably only a win for very specialized tasks,
I'd be interested in seeing this patch implemented on a NUMA machine, with
one runqueue per node... anybody willing to try it? I don't have access to
one. How about from the Linux Scalability project at SGI? any comments?

john.c

-- 
John Clemens          http://www.deater.net/john
john@deater.net     ICQ: 7175925, IM: PianoManO8
      "I Hate Quotes" -- Samuel L. Clemens


* Re: multi-queue scheduler update
  2001-01-19  1:08     ` Andi Kleen
@ 2001-01-19  1:23       ` Mike Kravetz
  2001-01-19  1:38         ` Davide Libenzi
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19  1:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Hahn, linux-kernel

On Fri, Jan 19, 2001 at 02:08:52AM +0100, Andi Kleen wrote:
> On Thu, Jan 18, 2001 at 08:00:16PM -0500, Mark Hahn wrote:
> > > >                            microseconds/yield
> > > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > > ------------   ---------         --------     ---------------
> > > > 16               18.740            4.603         1.455
> > > 
> > > I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> > 
> > isn't the normal case (as in "The Right Case to optimize") 
> > where there are close to zero runnable tasks?  what realistic/sane
> > scenarios have very large numbers of spinning threads?  all server
> > situations I can think of do not.  not volanomark -loopback, surely!
> 
> I think the main point of Mike's patch is decreasing locking and cache line
> bouncing overhead of multi cpu scheduling, not optimizing lots of runnable tasks.
> 
> 
> -Andi

Andi is correct.  Although the results I posted may seem to indicate
we are concentrating on high thread counts, this is really secondary
to reducing lock contention within the scheduler.  A co-worker down
the hall just ran the pgbench (PostgreSQL database) benchmark and saw
contention on the runqueue lock at 57%.  Now, I know nothing about this
benchmark, but it will be interesting to see what happens after
applying my patch.

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  0:52   ` [Lse-tech] " Mike Kravetz
@ 2001-01-19  1:30     ` Andrea Arcangeli
  2001-01-19  1:34       ` Mike Kravetz
  2001-01-19  1:39       ` Davide Libenzi
  2001-01-19 16:06     ` David Lang
  1 sibling, 2 replies; 35+ messages in thread
From: Andrea Arcangeli @ 2001-01-19  1:30 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: lse-tech, linux-kernel

On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> was less than the number of processors.  I'll give the tests a try
> with a smaller number of threads.  I'm also open to suggestions for

OK!

> what benchmarks/test methods I could use for scheduler testing.  If
> you remember what people have used in the past, please let me know.

It was this one IIRC (it spawns threads calling sched_yield() in a loop).

/*
  Tester for the kernel's speed in scheduling.
  (C) 1999 / Willy Tarreau <willy@meta-x.org>

  Modified by Davide Libenzi <davidel@maticad.it>


  You can do whatever you want with this program, but I'm not
  responsible for any misuse. Be aware that it can heavily load
  a host. As it is multithreaded, it may take advantage of SMP.

  It creates the requested number of threads and measures their
  cumulative work (i.e. loop iterations/second). The output is
  easily usable by gnuplot.

  To compile, you need libpthread :

     gcc -O2 -fomit-frame-pointer -o threads threads.c -lpthread

  Output on stdout is :
     <nb_threads> <total_work/sec> <avg_work/sec per thread>
     <zero_work_threads> <variance / mean^2>

*/

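#include <stdio.h>
#include <stdlib.h>             /* exit(), atoi() */
#include <pthread.h>
#include <signal.h>
#include <unistd.h>             /* usleep(), sleep(), syscall() */
#include <sys/syscall.h>        /* SYS_sched_yield */
#include <time.h>



#define MAXTHREADS	450
#define MEASURE_TIME	60



pthread_t       thr[MAXTHREADS];
int             nbthreads = MAXTHREADS;
int             measure_time = MEASURE_TIME;
volatile int    actthreads = 0;         /* threads currently in the yield loop */

long long int   totalwork[MAXTHREADS];  /* per-thread loop iterations */
volatile int    stop = 0,
                start = 0,
                count = 0;

void           *oneatwork(void *arg)
{
    int             thr = (int) (long) arg;     /* thread index passed by main() */

    while (!start)              /* don't disturb pthread_create() */
        usleep(10000);

    __sync_fetch_and_add(&actthreads, 1);       /* GCC builtin: atomic increment */
    while (!stop)
    {
        if (count)
            totalwork[thr]++;

        syscall(SYS_sched_yield);   /* was syscall(158), the i386 number */
    }
    __sync_fetch_and_sub(&actthreads, 1);
    return NULL;
}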

int main(int argc, char **argv)
{

    int             i,
                    err,
                    avgwork,
                    thrzero;
    long long int   value,
                    avgvalue;
    double          sqrdev;
    time_t          ts,
                    te;

    if (argc < 3)
    {
        printf("usage: %s  threads  time\n", argv[0]);
        exit(1);
    }    

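    nbthreads = atoi(argv[1]);
    measure_time = atoi(argv[2]);

    /* thr[] and totalwork[] hold MAXTHREADS entries; both values are divisors later */
    if (nbthreads < 1 || nbthreads > MAXTHREADS || measure_time < 1)
    {
        fprintf(stderr, "threads must be 1..%d and time must be >= 1\n", MAXTHREADS);
        exit(1);
    }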
    
    
    start = 0;
    count = 0;
    stop = 0;
    actthreads = 0;
    thrzero = 0;
    value = 0;
    sqrdev = 0.0;

    fprintf(stderr, "\nCreating %d threads ...", nbthreads);
    for (i = 0; i < nbthreads; i++)
    {
        if ((err = pthread_create(&thr[i], NULL, (void *(*)(void *)) oneatwork, (void *) (long) i)) != 0)
        {
            fprintf(stderr, "thread %d pthread_create=%d -> ", i, err);
            perror("");
            exit(1);
        }
        pthread_detach(thr[i]);
    }

    for (i = 0; i < nbthreads; i++)
        totalwork[i] = 0;

    fprintf(stderr, " OK !\nWaiting for all threads to start ...");

    start = 1;
    while (actthreads != nbthreads)
        usleep(10000);         /* waiting for a bit of stability */

    fprintf(stderr, "Go !\n");

    count = 1;
    time(&ts);

    sleep(measure_time);

    count = 0;
    stop = 1;
    time(&te);


    for (i = 0; i < nbthreads; i++)
    {
        value += totalwork[i];
        if (totalwork[i] == 0)
            ++thrzero;
    }
    avgvalue = value / nbthreads;
    value /= (int) difftime(te, ts);
    avgwork = (int) (value / nbthreads);

    for (i = 0; i < nbthreads; i++)
    {
        double          difvv = (double) (totalwork[i] - avgvalue);

        sqrdev += difvv * difvv;
    }

    while (actthreads > 0)
        usleep(10000);

    printf("%d\t\t%lld\t\t%d\t\t%d\t\t%f\n", nbthreads, value, avgwork, thrzero,
            sqrdev / ((double) nbthreads * avgvalue * avgvalue));

    exit(0);

}
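
The last field the program prints is `sqrdev / (nbthreads * avgvalue * avgvalue)`, that is, the variance of the per-thread counts divided by the squared mean, rather than a standard deviation. A minimal sketch of that statistic as a standalone function follows. The name `normalized_variance` is made up here, and unlike the program above it keeps the mean in floating point instead of truncating it to an integer.

```c
#include <assert.h>

/*
 * Population variance of work[0..n-1] divided by the square of the mean.
 * 0.0 means every thread did exactly the same amount of work.
 */
static double normalized_variance(const long long *work, int n)
{
    double mean = 0.0, sqrdev = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        mean += (double) work[i];
    mean /= n;
    if (mean == 0.0)
        return 0.0;             /* no work at all: define the spread as zero */

    for (int i = 0; i < n; i++) {
        double d = (double) work[i] - mean;

        sqrdev += d * d;
    }
    return sqrdev / (n * mean * mean);
}
```

A value near zero suggests the scheduler spread yields evenly across threads; larger values mean some threads ran far more often than others.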

Andrea

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  1:30     ` Andrea Arcangeli
@ 2001-01-19  1:34       ` Mike Kravetz
  2001-01-19 20:49         ` Mike Kravetz
  2001-01-19  1:39       ` Davide Libenzi
  1 sibling, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19  1:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Mike Kravetz, lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> > was less than the number of processors.  I'll give the tests a try
> > with a smaller number of threads.  I'm also open to suggestions for
> 
> OK!
> 
> > what benchmarks/test methods I could use for scheduler testing.  If
> > you remember what people have used in the past, please let me know.
> 
> It was this one IIRC (it spawns threads calling sched_yield() in loop).

Thanks!

At first glance this looks to be the same type of test/benchmark
I have been using.

-
Mike

* Re: multi-queue scheduler update
  2001-01-19  1:00   ` Mark Hahn
  2001-01-19  1:08     ` Andi Kleen
@ 2001-01-19  1:35     ` Andrea Arcangeli
  2001-01-19  1:48       ` Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Andrea Arcangeli @ 2001-01-19  1:35 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

On Thu, Jan 18, 2001 at 08:00:16PM -0500, Mark Hahn wrote:
> > >                            microseconds/yield
> > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > ------------   ---------         --------     ---------------
> > > 16               18.740            4.603         1.455
> > 
> > I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> 
> isn't the normal case (as in "The Right Case to optimize") 
> where there are close to zero runnable tasks?  what realistic/sane
> scenarios have very large numbers of spinning threads?  all server
> situations I can think of do not.  not volanomark -loopback, surely!

This is why the numbers with 2/4/8 threads in the runqueue are the most
interesting ones 8)

Andrea

* Re: multi-queue scheduler update
  2001-01-19  1:23       ` Mike Kravetz
@ 2001-01-19  1:38         ` Davide Libenzi
  0 siblings, 0 replies; 35+ messages in thread
From: Davide Libenzi @ 2001-01-19  1:38 UTC (permalink / raw)
  To: Mike Kravetz, Andi Kleen; +Cc: Mark Hahn, linux-kernel

On Thursday 18 January 2001 17:33, Mike Kravetz wrote:
> On Fri, Jan 19, 2001 at 02:08:52AM +0100, Andi Kleen wrote:
> > On Thu, Jan 18, 2001 at 08:00:16PM -0500, Mark Hahn wrote:
> > > > >                            microseconds/yield
> > > > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > > > ------------   ---------         --------     ---------------
> > > > > 16               18.740            4.603         1.455
> > > >
> > > > I remember the O(1) scheduler from Davide Libenzi was beating the
> > > > mainline O(N)
> > >
> > > isn't the normal case (as in "The Right Case to optimize")
> > > where there are close to zero runnable tasks?  what realistic/sane
> > > scenarios have very large numbers of spinning threads?  all server
> > > situations I can think of do not.  not volanomark -loopback, surely!
> >
> > I think the main point of Mike's patch is decreasing locking and cache
> > line bouncing overhead of multi cpu scheduling, not optimizing lots of
> > runnable tasks.
> >
> >
> > -Andi
>
> Andi is correct.  Although the results I posted may seem to indicate
> we are concentrating on high thread counts, this is really secondary
> to reducing lock contention within the scheduler.  A co-worker down
> the hall just ran pgbench (a postgresql db) benchmark and saw
> contention on the runqueue lock at 57%.  Now, I know nothing about this
> benchmark, but it will be interesting to see what happens after
> applying my patch.

Yep, the patch works in a different way, and if these are the numbers it looks
interesting.
Could you post results for a smaller number of tasks?
I mean, what is the performance loss for 1, 2, ..., 5 tasks?

To test, you can use lmbench (I don't remember the link), and I should have
the program I used to test my patch somewhere.


- Davide

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  1:30     ` Andrea Arcangeli
  2001-01-19  1:34       ` Mike Kravetz
@ 2001-01-19  1:39       ` Davide Libenzi
  1 sibling, 0 replies; 35+ messages in thread
From: Davide Libenzi @ 2001-01-19  1:39 UTC (permalink / raw)
  To: Andrea Arcangeli, Mike Kravetz; +Cc: lse-tech, linux-kernel

On Thursday 18 January 2001 17:39, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> > was less than the number of processors.  I'll give the tests a try
> > with a smaller number of threads.  I'm also open to suggestions for
>
> OK!
>
> > what benchmarks/test methods I could use for scheduler testing.  If
> > you remember what people have used in the past, please let me know.
>
> It was this one IIRC (it spawns threads calling sched_yield() in loop).

Andrea found it before me :)


- Davide

* Re: multi-queue scheduler update
  2001-01-19  1:35     ` Andrea Arcangeli
@ 2001-01-19  1:48       ` Andi Kleen
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2001-01-19  1:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

On Fri, Jan 19, 2001 at 02:35:02AM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 08:00:16PM -0500, Mark Hahn wrote:
> > > >                            microseconds/yield
> > > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > > ------------   ---------         --------     ---------------
> > > > 16               18.740            4.603         1.455
> > > 
> > > I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> > 
> > isn't the normal case (as in "The Right Case to optimize") 
> > where there are close to zero runnable tasks?  what realistic/sane
> > scenarios have very large numbers of spinning threads?  all server
> > situations I can think of do not.  not volanomark -loopback, surely!
> 
> This is why the numbers with 2/4/8 threads in the runqueue are the most
> interesting ones 8)

With Arjan's patch to use prefetching for the runqueue scan, the numbers
will likely be different [at least on CPUs that can benefit from prefetching,
like the P2+ or Athlon].
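
For readers unfamiliar with the technique, prefetching during a list scan looks roughly like this. It is only a sketch and not Arjan's patch: `struct task`, `weight` and `scan_with_prefetch` are hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin rather than whatever helper the kernel patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; not the kernel's task_struct. */
struct task {
    int weight;                 /* stand-in for the scheduler's per-task score */
    struct task *next;
};

/* Linear scan for the highest weight, hinting the next node to the cache. */
static struct task *scan_with_prefetch(struct task *head)
{
    struct task *best = NULL;

    for (struct task *t = head; t != NULL; t = t->next) {
        /* Hint only: start loading the next node while this one is examined.
           A prefetch does not fault, so a NULL next pointer is harmless. */
        __builtin_prefetch(t->next);
        assert(t->weight >= 0);     /* sketch invariant: scores are non-negative */
        if (best == NULL || t->weight > best->weight)
            best = t;
    }
    return best;
}
```

Whether the hint helps depends on the CPU and on how much work is done per node; with very short lists there is little latency to hide.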

-Andi

* Re: [Lse-tech] Re: multi-queue scheduler update
@ 2001-01-19 15:47 Hubertus Franke
  2001-01-19 17:11 ` Mike Kravetz
  0 siblings, 1 reply; 35+ messages in thread
From: Hubertus Franke @ 2001-01-19 15:47 UTC (permalink / raw)
  To: lse-tech, linux-kernel; +Cc: Pratap Pattnaik

Indeed, Andi, we tried that priority==tablebased scheduler approach. If
you check the call for participation again, what we are trying to do is get
to the bottom of what actually impacts scheduler performance and subsequently
come up with a combined best of breed (i.e., one that satisfies both the high
end and the low end). Since this is still work in progress, here are a few
numbers that I got from running the 2.4.0-test12 kernels, vanilla and
priority-based, complementing Mike's numbers. I added them as extra columns
to Mike's table. Our machine is an 8-way 700 MHz Pentium with 2 MB caches,
though I don't think that makes a difference for the sched_yield test. I ran
with 50 seconds of runtime per test to get past the FRC problem.

                              microseconds/yield
#threads  2.2.16-22     2.4      2.4-MQ   2.4.0-test12   2.4.0-test12-Prio
--------  ---------   -------   -------   ------------   -----------------
16          18.740      4.603     1.455       4.51             4.39
32          17.702      5.134     1.456       5.01             4.06
64          23.300      5.586     1.466       5.70             3.99
128         47.273     18.812     1.480      12.06 %           3.99
256        105.701     71.147     1.517      60.2              4.05
512          FRC      143.500     1.661     132.5              4.19
1024         FRC      196.425     6.166     295.4 #            4.57
2048         FRC        FRC      23.291     460.4              5.34
4096         FRC        FRC      47.117     631.3              5.91

*FRC = failed to reach confidence level

Some comments on some of the numbers:
#) Mike measured 196, I measured 295?? Somebody has a typo here, I assume.
%) This actually varied between 8 and 14 on multiple runs, averaging 12.
Bill Hartner suggests that these might be cache issues (OT).

What you can see from these numbers is that MQ does an awesome job up to
1024 threads. From now on, when measuring, we will take the general
concern about low numbers of threads into account. Your points are well
taken. I'm pretty confident our MQ scheduler will be in the reasonable
ballpark of the current scheduler. To go on, the priority==tablebased
scheduler does better for very high numbers of processes. It actually beats
the vanilla version throughout (>= 16). It stays stable because we stop
immediately when we find a process that last ran on the invoking CPU. The
only way we could do better is to continue searching for an affinity boost
due to <mm>. Here the discussions might start. The next version of the
tablebased scheduler will take into account whether the table index covers
only one goodness range or multiple (e.g. RT). This could give somewhat
better performance for the general case.
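
A minimal sketch of the early-stop selection described above, for
illustration only. It is not taken from the posted patch: the names
(struct prio_table, pick_next, NR_PRIO) and the fallback rule of taking
the first schedulable task in the highest non-empty slot are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PRIO 8               /* hypothetical number of table slots */

struct task {
    int last_cpu;               /* CPU the task last ran on */
    int has_cpu;                /* nonzero while running on some CPU */
    struct task *next;          /* next task in the same priority slot */
};

/* slot[p] heads the list of runnable tasks with priority index p. */
struct prio_table {
    struct task *slot[NR_PRIO];
};

/*
 * Walk the table from the highest slot down.  In the first slot that
 * holds a schedulable task, stop at once if a task last ran on this_cpu;
 * otherwise fall back to the first schedulable task seen in that slot.
 */
struct task *pick_next(const struct prio_table *t, int this_cpu)
{
    assert(t != NULL);
    for (int p = NR_PRIO - 1; p >= 0; p--) {
        struct task *fallback = NULL;
        for (struct task *k = t->slot[p]; k != NULL; k = k->next) {
            if (k->has_cpu)
                continue;               /* already running elsewhere */
            if (k->last_cpu == this_cpu)
                return k;               /* affinity hit: stop searching */
            if (fallback == NULL)
                fallback = k;
        }
        if (fallback != NULL)
            return fallback;
    }
    return NULL;                        /* nothing schedulable */
}
```

The cost of a call depends on how far into one slot the first affinity
hit sits, not on the total number of runnable tasks, which would be
consistent with the flat 4-6 microsecond column in the table.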

The roadmap ahead for Mike and me and the rest of the crew is to combine
these methods. In our first attempt we wanted to demonstrate that the
MQ does a great job while emulating current scheduler semantics. Now if we
relax these semantics just a bit, e.g. if we tolerate a bit more priority
inversion (which any scheduler that deploys affinity boosts does), we can
probably do even better.

These are the things we are currently doing, and we should have some
results soon:

(1) We are preparing for LWE with a full measurement of the latest kernel.
    For this purpose we have frozen on 2.4.1-pre8, unless of course you
    tell us this is not a good kernel to run on.
(2) We will measure 1-4096 threads for vanilla, priority and MQ for two
    tests (both provided by Bill Hartner in Austin):
    (a) sched_yield: although not a meaningful benchmark, it really
        exposes the raw overhead of scheduling. The problem here is that
        it artificially generates lock contention at a rate we would not
        see in general applications.
    (b) chatroom: similar to VolanoBenchmark, but easier to use and
        measure. This gives a better idea of what the impact would be
        for real applications.

On the progress side: now that we already have a good idea of what the MQ
and the table==priority based schedulers can do, we want to combine them
and see how that impacts performance. Next, we still have the open issue
of whether keeping queues in priority order makes sense or not. That
exercise should be done for both the MQ and the table based scheduler.

Next, we have started looking into breaking up the CPU set. Right now we
scan all CPUs to find an appropriate CPU to preempt.
For a large number of CPUs that can be costly, particularly with a very
small number (1-4) of threads.
We are currently experimenting with breaking up the CPUs into smaller sets
and scheduling only within each set, i.e. we don't look beyond the set to
balance (e.g. priorities etc). Occasionally (at 1 Hz) we run a load
balancing mechanism to redistribute work.
We have a simple prototype running that demonstrates the idea. This could
also be useful for NUMA systems. We will post this patch on top of the MQ
on the site soon.
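
The CPU-set idea above could look roughly like the following sketch. It
is not the prototype itself: the set size, the per-set task counters and
the "move half the difference" balancing rule are assumptions made only
to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS  8                      /* matches the 8-way test machine */
#define SET_SIZE 4                      /* hypothetical set size */
#define NR_SETS  (NR_CPUS / SET_SIZE)

struct cpu_state {
    int curr_prio[NR_CPUS];     /* priority of the task running on each CPU */
    int nr_running[NR_SETS];    /* runnable tasks queued in each set */
};

static int set_of(int cpu)
{
    return cpu / SET_SIZE;
}

/* Choose the CPU to preempt by scanning only the set that contains cpu. */
int find_preempt_cpu(const struct cpu_state *s, int cpu)
{
    assert(s != NULL && cpu >= 0 && cpu < NR_CPUS);
    int first = set_of(cpu) * SET_SIZE;
    int best = first;
    for (int c = first + 1; c < first + SET_SIZE; c++)
        if (s->curr_prio[c] < s->curr_prio[best])
            best = c;                   /* lowest priority is cheapest to preempt */
    return best;
}

/*
 * Periodic (e.g. once per second) balancer: move half of the difference
 * from the busiest set to the idlest one.  Returns the number moved.
 */
int balance_sets(struct cpu_state *s)
{
    assert(s != NULL);
    int busiest = 0, idlest = 0;
    for (int i = 1; i < NR_SETS; i++) {
        if (s->nr_running[i] > s->nr_running[busiest])
            busiest = i;
        if (s->nr_running[i] < s->nr_running[idlest])
            idlest = i;
    }
    int moved = (s->nr_running[busiest] - s->nr_running[idlest]) / 2;
    s->nr_running[busiest] -= moved;
    s->nr_running[idlest] += moved;
    return moved;
}
```

With sets of 4 on an 8-way box, a preemption decision examines 4 CPUs
instead of 8; the price is that imbalance between sets is only corrected
when the balancer runs.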

Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003

Andi Kleen <ak@suse.de>@lists.sourceforge.net on 01/18/2001 07:51:01 PM

Sent by:  lse-tech-admin@lists.sourceforge.net

To:   Andrea Arcangeli <andrea@suse.de>
cc:   Mike Kravetz <mkravetz@sequent.com>, lse-tech@lists.sourceforge.net,
      linux-kernel@vger.kernel.org
Subject:  Re: [Lse-tech] Re: multi-queue scheduler update

On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote:
> I remeber the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> scheduler with over 7 tasks in the runqueue (actually I'm not sure if the
> number was 7 but certainly it was under 10). So if you also use a O(1)
> scheduler too as I guess (since you have a chance to run fast on the lots of
> tasks running case) the most interesting thing is how you score with 2/4/8
> tasks in the runqueue (I think the tests on the O(1) scheduler patch was done
> at max on a 2-way SMP btw). (the argument for which Davide's patch wasn't
> included is that most machines have less than 4/5 tasks in the runqueue at the
> same time)

They seem to have tried that in a separate patch:
http://lse.sourceforge.net/scheduling/PrioScheduler.html

Very nice literate programming style btw @-)

-Andi

_______________________________________________
Lse-tech mailing list
Lse-tech@lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/lse-tech


* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  0:52   ` [Lse-tech] " Mike Kravetz
  2001-01-19  1:30     ` Andrea Arcangeli
@ 2001-01-19 16:06     ` David Lang
  1 sibling, 0 replies; 35+ messages in thread
From: David Lang @ 2001-01-19 16:06 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrea Arcangeli, lse-tech, linux-kernel

another thing that would be interesting is what is the overhead on UP or
small (2-4 way) SMP machines

David Lang

On Thu, 18 Jan 2001, Mike Kravetz wrote:

> Date: Thu, 18 Jan 2001 16:52:25 -0800
> From: Mike Kravetz <mkravetz@sequent.com>
> To: Andrea Arcangeli <andrea@suse.de>
> Cc: lse-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
> Subject: Re: [Lse-tech] Re: multi-queue scheduler update
>
> On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote:
> > On Thu, Jan 18, 2001 at 03:53:11PM -0800, Mike Kravetz wrote:
> > > Here are some very preliminary numbers from sched_test_yield
> > > (which was previously posted to this (lse-tech) list by Bill
> > > Hartner).  Tests were run on a system with 8 700 MHz Pentium
> > > III processors.
> > >
> > >                            microseconds/yield
> > > # threads      2.2.16-22           2.4        2.4-multi-queue
> > > ------------   ---------         --------     ---------------
> > > 16               18.740            4.603         1.455
> >
> > I remeber the O(1) scheduler from Davide Libenzi was beating the mainline O(N)
> > scheduler with over 7 tasks in the runqueue (actually I'm not sure if the
> > number was 7 but certainly it was under 10). So if you also use a O(1)
> > scheduler too as I guess (since you have a chance to run fast on the lots of
> > tasks running case) the most interesting thing is how you score with 2/4/8
> > tasks in the runqueue (I think the tests on the O(1) scheduler patch was done
> > at max on a 2-way SMP btw). (the argument for which Davide's patch wasn't
> > included is that most machines have less than 4/5 tasks in the runqueue at the
> > same time)
> >
> > Andrea
>
> Thanks for the suggestion.  The only reason I hesitated to test with
> a small number of threads is because I was under the assumption that
> this particular benchmark may have problems if the number of threads
> was less than the number of processors.  I'll give the tests a try
> with a smaller number of threads.  I'm also open to suggestions for
> what benchmarks/test methods I could use for scheduler testing.  If
> you remember what people have used in the past, please let me know.
>
> --
> Mike Kravetz                                 mkravetz@sequent.com
> IBM Linux Technology Center

* Re: [Lse-tech] Re: multi-queue scheduler update
@ 2001-01-19 16:30 Hubertus Franke
  2001-01-19 16:33 ` nick
  0 siblings, 1 reply; 35+ messages in thread
From: Hubertus Franke @ 2001-01-19 16:30 UTC (permalink / raw)
  To: David Lang; +Cc: Mike Kravetz, Andrea Arcangeli, lse-tech, linux-kernel


Sure, we are measuring that as well.
We are running all the benchmarks and configurations that I mentioned in
my previous message on 1-, 2-, 4-, 6- and 8-way configurations.
We have posted some preliminary results on older kernels on the website:

http://lse.sourceforge.net/scheduling/prelim.html

The MQ scheduler is meaningless for a UP kernel; it is only built under the
SMP flag.
The priority==tablebased scheduler does make sense to run on a UP (i.e. not
SMP compiled) kernel.
Some more fine-tuning of the current code base might improve that case:
because affinity is not a concern, I can simply go to my top table hash,
retrieve the first P entry with !P->has_cpu, and I am ready to go.

Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003




* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 16:30 Hubertus Franke
@ 2001-01-19 16:33 ` nick
  2001-01-19 17:06   ` Tim Wright
  0 siblings, 1 reply; 35+ messages in thread
From: nick @ 2001-01-19 16:33 UTC (permalink / raw)
  To: Hubertus Franke
  Cc: David Lang, Mike Kravetz, Andrea Arcangeli, lse-tech,
	linux-kernel

You might want to rerun the tests with less cache heavy procs.  The 2meg
xeons you are using could distort things from what the average linux user
would see (running with 256-512k cache).
	Nick

On Fri, 19 Jan 2001, Hubertus Franke wrote:

> 
> Sure, we are measuring that as well.
> We are running all these benchmarks and configurations that I mentioned in
> my previous message on
> 1-2-4-6- and 8 way configurations.
> We have posted some preliminary results on older kernels on the website:
> 
> http://lse.sourceforge.net/scheduling/prelim.html
> 
> MQ scheduler is meaningless for a UP kernel that is only build under the
> SMP flag.
> The priority==tablebased scheduler does make sense to run on a UP (i.e. not
> SMP compiled) kernel.
> Some more fine-tuning of the current code base might improve that case,
> because affinity is not a concern
> I can simply go to my top table hash, retrieve the first P entry with
> !P->has_cpu and I am ready to go.
> 

* Re: [Lse-tech] Re: multi-queue scheduler update
@ 2001-01-19 16:47 Hubertus Franke
  0 siblings, 0 replies; 35+ messages in thread
From: Hubertus Franke @ 2001-01-19 16:47 UTC (permalink / raw)
  To: nick; +Cc: lse-tech, linux-kernel


In the sched_yield benchmark case this is not a problem, because the
threads don't have any memory footprint; they are all cloned.
For the chatroom benchmark, I agree with you. However, I assume that these
big irons (8-ways) will pretty much all be loaded with at least 1MB cache.
Maybe at this point another site with an 8-way system and small cache could
run this. I don't know whether those actually exist.

Alternatively, we could set up a smaller 4-way system (we have a 4-way
300 MHz P-II Xeon with 512KB cache) that would fit into your class, and we
could also collect the numbers on that and post them.

We are automating the reboot process right now by modifying lilo.conf,
so we can run many tests with different "maxcpus=.." unattended.

So little to do, so much time... ahhh make that so little time, so much to
do.

Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003



nick@snowman.net on 01/19/2001 11:33:34 AM

To:   Hubertus Franke/Watson/IBM@IBMUS
cc:   David Lang <dlang@diginsite.com>, Mike Kravetz
      <mkravetz@sequent.com>, Andrea Arcangeli <andrea@suse.de>,
      lse-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject:  Re: [Lse-tech] Re: multi-queue scheduler update



You might want to rerun the tests with less cache heavy procs.  The 2meg
xeons you are using could distort things from what the average linux user
would see (running with 256-512k cache).
     Nick


* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 16:33 ` nick
@ 2001-01-19 17:06   ` Tim Wright
  0 siblings, 0 replies; 35+ messages in thread
From: Tim Wright @ 2001-01-19 17:06 UTC (permalink / raw)
  To: nick
  Cc: Hubertus Franke, David Lang, Mike Kravetz, Andrea Arcangeli,
	lse-tech, linux-kernel

Hi Nick,
you can't run with <512K L2 for >2-way on Intel. The 256K L2 cache cumine
procs only support 2-way SMP. For 4-way and greater, you have to use Xeon
procs, and they come in three flavours - 512K, 1M, and 2M. The machine that
Mike is using has 1M parts (which are fairly common at the 4/8-way level).
Hubertus has the 2M parts which are more expensive. By the time you have 8
procs, the 2M part can give a substantial performance boost on some workloads.

Tim

On Fri, Jan 19, 2001 at 11:33:34AM -0500, nick@snowman.net wrote:
> You might want to rerun the tests with less cache heavy procs.  The 2meg
> xeons you are using could distort things from what the average linux user
> would see (running with 256-512k cache).
> 	Nick
> 

-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 15:47 [Lse-tech] " Hubertus Franke
@ 2001-01-19 17:11 ` Mike Kravetz
       [not found]   ` <LYR76657-5332-2001.01.19-12.12.38--mikek#sequent.com@lyris.sequent.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 17:11 UTC (permalink / raw)
  To: Hubertus Franke; +Cc: lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 10:47:06AM -0500, Hubertus Franke wrote:
<stuff deleted>
> What you can see from these numbers is that MQ does an awesome job up to
> 1024 threads. When measuring in the future, we will take from now on the
> general concern about low number of threads into account. Your points are
> well taken. I m pretty confident our MQ scheduler will be in reasonable
> ballpark of the current scheduler.
<more stuff deleted>

Hubertus,

'Hopefully' the multi-queue scheduler will be in the ballpark for
low number of threads.  However, remember the extra overhead being
incurred in the current implementation.  To maintain existing
scheduler behavior, we look at all CPU specific runqueues to find
the highest priority (goodness) task in the system.  Therefore,
when running with a single thread on an 8 processor system, we
examine 8 runqueues instead of the single global runqueue.  In
a test where tasks are simply spinning doing sched_yield()s, I
suspect this difference may be significant.
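
To make the cost concrete, here is a minimal sketch of such a scan.  It
is not code from the patch: the types task_sketch and runqueue_sketch,
the function pick_global_best() and the fixed array sizes are all
hypothetical, and goodness is assumed to be a precomputed integer.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH   8    /* hypothetical: matches the 8-way test system */
#define MAX_TASKS_PER_RQ 16   /* hypothetical fixed capacity for the sketch */

struct task_sketch {
    int goodness;             /* assumed precomputed; higher value wins */
};

struct runqueue_sketch {
    struct task_sketch *tasks[MAX_TASKS_PER_RQ];
    int nr_running;
};

/*
 * Walk every per-CPU queue and return the task with the highest
 * goodness, or NULL if all queues are empty.  Even with a single
 * runnable task, all ncpus queues are visited.
 */
struct task_sketch *pick_global_best(struct runqueue_sketch rq[], int ncpus)
{
    struct task_sketch *best = NULL;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        for (int i = 0; i < rq[cpu].nr_running; i++) {
            struct task_sketch *t = rq[cpu].tasks[i];

            if (best == NULL || t->goodness > best->goodness)
                best = t;
        }
    }
    return best;
}
```

With one runnable task on an 8-way system the outer loop still runs
eight times, which is the extra work relative to a single global queue.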

I'll run the IIRC benchmark with low thread counts, and post the
results.  In addition, I have some ideas on how to make intelligent
decisions to avoid examining all runqueues when the number of
running tasks is less than the number of processors.

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
@ 2001-01-19 18:03 Hubertus Franke
  2001-01-19 19:52 ` bert hubert
  0 siblings, 1 reply; 35+ messages in thread
From: Hubertus Franke @ 2001-01-19 18:03 UTC (permalink / raw)
  To: lse-tech, linux-kernel; +Cc: l

Mike, sounds good.  From now on we will do all our measurements with
thread counts covering the entire range from 1 to 16, then in powers of
two up to 2048, and for maxcpus=1,2,4,6,8. Do you think that 4096 is
overkill? So far the numbers you got and the ones we got over here are
the same. Andi suggested that <pre8> has some problems with IO scheduling.

You are right, "hopefully + ballpark" ~= 10%.
As for intelligent decisions, the general loadbalancing that we already
started might help out a bit here.

Other stuff we could look into....
Remember we talked about counting active idle threads at some point.

if (active_idle_threads < smp_num_cpus) {
     /* now we know that we simply give it to the first idle_thread found,
      * instead of collecting the max_na_goodness value and somewhat sorting
      * through it similar to the current Vanilla algorithm
      */
} else {
     /* current MQ algorithm */
}

Just shooting from the hip here; let's restart this discussion.

Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003

Mike Kravetz <mkravetz@sequent.com> on 01/19/2001 12:11:04 PM

To:   Hubertus Franke/Watson/IBM@IBMUS
cc:   lse-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject:  Re: [Lse-tech] Re: multi-queue scheduler update

On Fri, Jan 19, 2001 at 10:47:06AM -0500, Hubertus Franke wrote:
<stuff deleted>
> What you can see from these numbers is that MQ does an awesome job up to
> 1024 threads. When measuring in the future, we will take from now on the
> general concern about low number of threads into account. Your points are
> well taken. I m pretty confident our MQ scheduler will be in reasonable
> ballpark of the current scheduler.
<more stuff deleted>

Hubertus,

'Hopefully' the multi-queue scheduler will be in the ballpark for
low number of threads.  However, remember the extra overhead being
incurred in the current implementation.  To maintain existing
scheduler behavior, we look at all CPU specific runqueues to find
the highest priority (goodness) task in the system.  Therefore,
when running with a single thread on an 8 processor system, we
examine 8 runqueues instead of the single global runqueue.  In
a test where tasks are simply spinning doing sched_yield()s, I
suspect this difference may be significant.

I'll run the IIRC benchmark with low thread counts, and post the
results.  In addition, I have some ideas on how to make intelligent
decisions to avoid examining all runqueues when the number of
running tasks is less than the number of processors.

--
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 18:03 Hubertus Franke
@ 2001-01-19 19:52 ` bert hubert
  0 siblings, 0 replies; 35+ messages in thread
From: bert hubert @ 2001-01-19 19:52 UTC (permalink / raw)
  To: linux-kernel

On Fri, Jan 19, 2001 at 01:03:05PM -0500, Hubertus Franke wrote:
> 
> Mike sounds good, we will do all our measurements from now on with thread
> count for the entire range from 1 to 16 and
> then in power of twos upto 2048 and for maxcpus=1,2,4,6,8. Do you think
> that 4096 is overkill ? So far the numbers you got and we got over here are
> the same. Andi suggested that <pre8> has some problems with IO scheduling.

I have used up to 3000 threads in serious non-frivolous programs. Although I
have since been flamed over at #kernelnewbies that I should have been using
a statemachine :-)

Regards,

bert hubert

-- 
PowerDNS                     Versatile DNS Services  
Trilab                       The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
       [not found]   ` <LYR76657-5332-2001.01.19-12.12.38--mikek#sequent.com@lyris.sequent.com>
@ 2001-01-19 20:32     ` Mike Kravetz
  0 siblings, 0 replies; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 20:32 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

On Fri, Jan 19, 2001 at 03:12:11PM -0500, Mark Hahn wrote:
> > incurred in the current implementation.  To maintain existing
> > scheduler behavior, we look at all CPU specific runqueues to find
> > the highest priority (goodness) task in the system.  Therefore,
> 
> do you have cpu-affinity?  the mainstream scheduler at one time
> actually tuned the decision to move a task based on its expected
> timeslice and the worstcase cache-flush time.

We use the same cpu-affinity mechanism as the current scheduler.
This simply gives a 'priority boost' to tasks that last ran on the
current CPU.  In our multi-queue scheduler, tasks on a remote queue
must have high enough priority (to overcome this boost) before being
moved to the local queue.
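
A minimal sketch of that comparison, under stated assumptions: the
struct aff_task, the two functions and the bonus value below are
hypothetical names and numbers chosen for illustration, not taken from
the kernel or from the patch.

```c
/* Hypothetical size of the boost for having last run on this CPU. */
#define AFFINITY_BONUS_SKETCH 15

struct aff_task {
    int base_goodness;   /* priority before any affinity boost */
    int last_cpu;        /* CPU this task most recently ran on */
};

/* Goodness of a task as seen from this_cpu: boosted if it last ran here. */
int goodness_on(const struct aff_task *t, int this_cpu)
{
    int weight = t->base_goodness;

    if (t->last_cpu == this_cpu)
        weight += AFFINITY_BONUS_SKETCH;
    return weight;
}

/*
 * A task sitting on a remote queue is only worth moving if it beats
 * the best local task even after the local task's boost is counted.
 */
int should_pull_remote(const struct aff_task *local,
                       const struct aff_task *remote, int this_cpu)
{
    return goodness_on(remote, this_cpu) > goodness_on(local, this_cpu);
}
```

In this sketch a remote task that is only slightly better than the
local one stays where it is, which keeps tasks near their warm caches.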

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center
15450 SW Koll Parkway
Beaverton, OR 97006-6063                     (503)578-3494

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19  1:34       ` Mike Kravetz
@ 2001-01-19 20:49         ` Mike Kravetz
  2001-01-19 21:51           ` Mike Kravetz
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 20:49 UTC (permalink / raw)
  To: Andrea Arcangeli, Davide Libenzi; +Cc: lse-tech, linux-kernel

On Thu, Jan 18, 2001 at 05:34:35PM -0800, Mike Kravetz wrote:
> On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote:
> > On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> > > was less than the number of processors.  I'll give the tests a try
> > > with a smaller number of threads.  I'm also open to suggestions for
> > 
> > OK!
> > 
> > > what benchmarks/test methods I could use for scheduler testing.  If
> > > you remember what people have used in the past, please let me know.
> > 
> > It was this one IIRC (it spawns threads calling sched_yield() in loop).
> 
> Thanks!

It was my intention to post IIRC numbers for small thread counts today.
However, the benchmark (not the system) seems to hang on occasion.  This
occurs on both the unmodified 2.4.0 kernel and the one which contains
my multi-queue patch.  Therefore, I'm pretty sure it is not something
I did. :)

Anyone else see anything like this before?  I'll look into the reason
for the hang, but it will delay my posting of these numbers.

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 20:49         ` Mike Kravetz
@ 2001-01-19 21:51           ` Mike Kravetz
  2001-01-19 22:03             ` Davide Libenzi
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 21:51 UTC (permalink / raw)
  To: Andrea Arcangeli, Davide Libenzi; +Cc: lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 12:49:21PM -0800, Mike Kravetz showed his lack
of internet slang understanding and wrote:
> 
> It was my intention to post IIRC numbers for small thread counts today.
> However, the benchmark (not the system) seems to hang on occasion.  This
> occurs on both the unmodified 2.4.0 kernel and the one which contains
> my multi-queue patch.  Therefore, I'm pretty sure it is not something
> I did. :)
> 
> Anyone else see anything like this before?  I'll look into the reason
> for the hang, but it will delay my posting of these numbers.

I think I have found the problem.  Here is a code snippet from the
benchmark Andrea posted.

void            oneatwork(int thr)
{
    int             i;
    while (!start)              /* don't disturb pthread_create() */
        usleep(10000);                                                          

    actthreads++;
    while (!stop)
    {
        if (count)
            totalwork[thr]++;

        syscall(158); /* sys_sched_yield() */                                   
    }                                                                           
    actthreads--;                                                               
    pthread_exit(0);
}                                                                               

Note that actthreads is a global variable which is being updated
by multiple threads without any form of synchronization.  Because
of this, actthreads sometimes never goes to zero after all worker
threads have finished, and main() then waits forever for it to
reach zero.  I changed actthreads to be an atomic and used atomic
operations to manipulate it.  With this change, I was able to
complete one round of testing which I had not been able to do in
the past.

Does anyone maintain this benchmark code?  The changes I indicate
above should be made.  If you need more specifics I can provide
them.
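
For anyone who wants the specifics, here is a minimal sketch of the
kind of change meant above.  It is an assumption-laden illustration,
not the actual modification: it uses C11 <stdatomic.h> rather than
whatever atomic primitive was used in the real fix, calls sched_yield()
instead of syscall(158), and the names actthreads_fixed, stop_flag and
run_workers() are made up for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int actthreads_fixed;   /* stands in for the plain counter */
static atomic_int stop_flag;          /* stands in for the stop variable */

static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&actthreads_fixed, 1);   /* was: actthreads++ */
    while (!atomic_load(&stop_flag))
        sched_yield();
    atomic_fetch_sub(&actthreads_fixed, 1);   /* was: actthreads-- */
    return NULL;
}

/*
 * Start n workers, wait until all have checked in, tell them to stop,
 * join them, and return the final counter value (-1 for a bad n).
 */
int run_workers(int n)
{
    pthread_t tid[64];
    int started = 0;

    if (n < 1 || n > 64)
        return -1;
    atomic_store(&actthreads_fixed, 0);
    atomic_store(&stop_flag, 0);

    while (started < n &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;

    while (atomic_load(&actthreads_fixed) != started)
        sched_yield();

    atomic_store(&stop_flag, 1);
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&actthreads_fixed);
}
```

Because every increment and decrement is a single atomic
read-modify-write, the counter returns to zero once all workers exit.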

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 21:51           ` Mike Kravetz
@ 2001-01-19 22:03             ` Davide Libenzi
  2001-01-19 22:18               ` Mike Kravetz
  0 siblings, 1 reply; 35+ messages in thread
From: Davide Libenzi @ 2001-01-19 22:03 UTC (permalink / raw)
  To: Mike Kravetz, Andrea Arcangeli; +Cc: lse-tech, linux-kernel

On Friday 19 January 2001 13:59, Mike Kravetz wrote:
> On Fri, Jan 19, 2001 at 12:49:21PM -0800, Mike Kravetz showed his lack
>
> of internet slang understanding and wrote:
> > It was my intention to post IIRC numbers for small thread counts today.
> > However, the benchmark (not the system) seems to hang on occasion.  This
> > occurs on both the unmodified 2.4.0 kernel and the one which contains
> > my multi-queue patch.  Therefore, I'm pretty sure it is not something
> > I did. :)
> >
> > Anyone else see anything like this before?  I'll look into the reason
> > for the hang, but it will delay my posting of these numbers.
>
> I think I have found the problem.  Here is a code snippet from the
> benchmark Andrea posted.
>
> void            oneatwork(int thr)
> {
>     int             i;
>     while (!start)              /* don't disturb pthread_create() */
>         usleep(10000);
>
>     actthreads++;
>     while (!stop)
>     {
>         if (count)
>             totalwork[thr]++;
>
>         syscall(158); /* sys_sched_yield() */
>     }
>     actthreads--;
>     pthread_exit(0);
> }
>
> Note that actthreads is a global variable which is being updated
> by multiple threads without any form of synchronization.  Because
> of this actthreads sometimes never goes to zero after all worker
> threads have finished. 

If all threads complete successfully actthreads has to be zero.
If some thread dies, this won't be true.



- Davide

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 22:03             ` Davide Libenzi
@ 2001-01-19 22:18               ` Mike Kravetz
  2001-01-19 23:24                 ` Davide Libenzi
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 22:18 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andrea Arcangeli, lse-tech, linux-kernel

On Fri, Jan 19, 2001 at 02:03:06PM -0800, Davide Libenzi wrote:
<stuff deleted>
> > void            oneatwork(int thr)
> > {
> >     int             i;
> >     while (!start)              /* don't disturb pthread_create() */
> >         usleep(10000);
> >
> >     actthreads++;
> >     while (!stop)
> >     {
> >         if (count)
> >             totalwork[thr]++;
> >
> >         syscall(158); /* sys_sched_yield() */
> >     }
> >     actthreads--;
> >     pthread_exit(0);
> > }
> >
> > Note that actthreads is a global variable which is being updated
> > by multiple threads without any form of synchronization.  Because
> > of this actthreads sometimes never goes to zero after all worker
> > threads have finished. 
> 
> If all threads complete successfully actthreads has to be zero.

Not as currently coded.  If two threads try to decrement actthreads
at the same time, there is no guarantee that it will be decremented
twice.  That is why you need to put some type of synchronization in
place.
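
To spell out the interleaving, here is a small single-threaded sketch
that replays the load/modify/store steps of two decrements by hand.
The struct dec_steps and the function names are hypothetical; real
threads would hit this ordering only occasionally, whereas the sketch
forces it every time.

```c
/* One thread's private view of "counter--", split into three steps. */
struct dec_steps {
    int reg;   /* stands in for the CPU register holding the value */
};

static void step_load(struct dec_steps *s, const int *counter)
{
    s->reg = *counter;
}

static void step_sub(struct dec_steps *s)
{
    s->reg -= 1;
}

static void step_store(const struct dec_steps *s, int *counter)
{
    *counter = s->reg;
}

/* Both "threads" load before either stores: one decrement is lost. */
int interleaved_decrements(int start)
{
    int counter = start;
    struct dec_steps a, b;

    step_load(&a, &counter);
    step_load(&b, &counter);    /* B reads the value A has not yet updated */
    step_sub(&a);
    step_sub(&b);
    step_store(&a, &counter);
    step_store(&b, &counter);   /* overwrites A's result with the same value */
    return counter;
}

/* Each decrement runs to completion before the next starts. */
int serialized_decrements(int start)
{
    int counter = start;
    struct dec_steps a, b;

    step_load(&a, &counter);
    step_sub(&a);
    step_store(&a, &counter);
    step_load(&b, &counter);
    step_sub(&b);
    step_store(&b, &counter);
    return counter;
}
```

Starting from 2, the interleaved version ends at 1 while the serialized
version ends at 0, which is the stuck-above-zero case described above.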

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-19 22:18               ` Mike Kravetz
@ 2001-01-19 23:24                 ` Davide Libenzi
  0 siblings, 0 replies; 35+ messages in thread
From: Davide Libenzi @ 2001-01-19 23:24 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrea Arcangeli, lse-tech, linux-kernel

On Friday 19 January 2001 15:23, Mike Kravetz wrote:
> On Fri, Jan 19, 2001 at 02:03:06PM -0800, Davide Libenzi wrote:
> <stuff deleted>
>
> > > void            oneatwork(int thr)
> > > {
> > >     int             i;
> > >     while (!start)              /* don't disturb pthread_create() */
> > >         usleep(10000);
> > >
> > >     actthreads++;
> > >     while (!stop)
> > >     {
> > >         if (count)
> > >             totalwork[thr]++;
> > >
> > >         syscall(158); /* sys_sched_yield() */
> > >     }
> > >     actthreads--;
> > >     pthread_exit(0);
> > > }
> > >
> > > Note that actthreads is a global variable which is being updated
> > > by multiple threads without any form of synchronization.  Because
> > > of this actthreads sometimes never goes to zero after all worker
> > > threads have finished.
> >
> > If all threads complete successfully actthreads has to be zero.
>
> Not as currently coded.  If two threads try to decrement actthreads
> at the same time, there is no guarantee that it will be decremented
> twice.  That is why you need to put some type of synchronization in
> place.

Right, inc & dec are not atomic w/o the x86 LOCK prefix (#LOCK).


- Davide

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: multi-queue scheduler update
  2001-01-19  0:26 ` Andrea Arcangeli
                     ` (2 preceding siblings ...)
  2001-01-19  1:00   ` Mark Hahn
@ 2001-01-19 23:35   ` Mike Kravetz
  3 siblings, 0 replies; 35+ messages in thread
From: Mike Kravetz @ 2001-01-19 23:35 UTC (permalink / raw)
  To: lse-tech; +Cc: linux-kernel

As promised, here are some numbers for low thread counts from the
benchmark Andrea and Davide provided.  I ran the benchmark for
1,2,4 and 8 threads.  I ran the test 5 times for each thread count
and used 60 seconds as the measure time in each case.

2.4.0
-----
1               1785408         1785408         0               0.000000
1               1786130         1786130         0               0.000000
1               1786156         1786156         0               0.000000
1               1781575         1781575         0               0.000000
1               1780079         1780079         0               0.000000
2               1873405         936702          0               0.000000
2               2006473         1003236         0               0.000001
2               1953842         976921          0               0.000004
2               1951338         975669          0               0.000000
2               1887887         943943          0               0.000004
4               1936350         484087          0               0.000055
4               1814430         453607          0               0.000087
4               1972681         493170          0               0.000055
4               1951748         487937          0               0.000206
4               1862182         465545          0               0.000283
8               2917216         364652          0               0.000008
8               2655834         331979          0               0.000018
8               3026734         378341          0               0.000005
8               3010204         376275          0               0.000004
8               2569647         321205          0               0.000014

2.4.0-multi-queue
-----------------
1               1295498         1295498         0               0.000000
1               1295011         1295011         0               0.000000
1               1296768         1296768         0               0.000000
1               1296053         1296053         0               0.000000
1               1296472         1296472         0               0.000000
2               1999043         999521          0               0.000000
2               1410636         705318          0               0.000000
2               1414476         707238          0               0.000000
2               2014664         1007332         0               0.000001
2               1414509         707254          0               0.000000
4               2046182         511545          0               0.000232
4               2101535         525383          0               0.000115
4               2094828         523707          0               0.000155
4               2097406         524351          0               0.000144
4               2057331         514332          0               0.000132
8               3795829         474478          0               0.000185
8               4058329         507291          0               0.001871
8               3845934         480741          0               0.000248
8               3715243         464405          0               0.000084
8               3777303         472162          0               0.000194

As expected the single thread numbers for the multi-queue scheduler
are not as good as those of the existing scheduler.  However, at 2
threads it is getting pretty close and from 4 threads up, the
multi-queue scheduler does better.

In this multi-queue implementation, the amount of overhead is
related to the number of processors in the system.  Therefore,
I would expect the numbers to 'be better' for low thread counts
on systems with lower (<8) processor counts.  It would be
interesting to see if the point at which the multi-queue does
better stays at approximately CPUs/2 as we change system configurations.
Hopefully we will have some more extensive benchmark results in
the not too distant future.  Until then, we'll be looking into
optimizations to help out the multi-queue scheduler at low
thread counts.
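
For a quick comparison of the two tables, the following sketch averages
the five runs per thread count and forms the multi-queue/2.4.0 ratio.
The arrays hold the second column of the tables above; mean_of() and
mq_to_vanilla_ratio() are helper names made up for this example.

```c
#include <stddef.h>

/* Second column of the result tables above, five runs per thread count. */
static const long vanilla_1[5] = {1785408, 1786130, 1786156, 1781575, 1780079};
static const long mq_1[5]      = {1295498, 1295011, 1296768, 1296053, 1296472};
static const long vanilla_2[5] = {1873405, 2006473, 1953842, 1951338, 1887887};
static const long mq_2[5]      = {1999043, 1410636, 1414476, 2014664, 1414509};
static const long vanilla_4[5] = {1936350, 1814430, 1972681, 1951748, 1862182};
static const long mq_4[5]      = {2046182, 2101535, 2094828, 2097406, 2057331};
static const long vanilla_8[5] = {2917216, 2655834, 3026734, 3010204, 2569647};
static const long mq_8[5]      = {3795829, 4058329, 3845934, 3715243, 3777303};

/* Arithmetic mean of n samples; 0.0 for an empty set. */
double mean_of(const long *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return n ? sum / (double)n : 0.0;
}

/* Ratio above 1.0 means multi-queue did more work than 2.4.0. */
double mq_to_vanilla_ratio(const long *mq, const long *vanilla, size_t n)
{
    double base = mean_of(vanilla, n);

    return base != 0.0 ? mean_of(mq, n) / base : 0.0;
}
```

On these numbers the ratio is below 1.0 for 1 and 2 threads and above
1.0 for 4 and 8 threads, matching the summary given above.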

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
@ 2001-01-22 13:35 Hubertus Franke
  2001-01-22 16:58 ` Davide Libenzi
  0 siblings, 1 reply; 35+ messages in thread
From: Hubertus Franke @ 2001-01-22 13:35 UTC (permalink / raw)
  To: lse-tech; +Cc: linux-kernel


By popular demand, here are a few numbers for small thread counts
running the sched_yield_test benchmark on a 2-way SMP with the following
characteristics.

model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 551.266
cache size      : 512 KB

I compare 2.4.1-pre8  kernels (vanilla, table/prio scheduler and
multiqueue).

#T : van   prio  MQ
----------------------
 1 : 0.591 0.582 0.750
 2 : 0.295 0.293 0.377
 3 : 2.091 2.373 1.010
 4 : 1.894 1.783 1.558
 5 : 1.949 1.794 1.591
 6 : 2.003 1.803 1.605
 7 : 2.050 1.805 1.654
 8 : 2.118 1.816 1.676
 9 : 2.174 1.811 1.708
10 : 2.235 1.821 1.744
11 : 2.304 1.823 1.780
12 : 2.365 1.831 1.863
13 : 2.427 1.829 1.870
14 : 2.494 1.841 1.950
15 : 2.578 1.839 1.959
16 : 2.691 1.865 2.043
17 : 2.804 1.855 2.041
18 : 2.893 1.873 2.127
19 : 3.001 1.851 2.079
20 : 3.098 1.878 2.182
21 : 3.191 1.851 2.178
22 : 3.263 1.884 2.233
23 : 3.332 1.850 2.231
24 : 3.403 1.901 2.272
25 : 3.472 1.865 2.251
26 : 3.540 1.923 2.305
27 : 3.604 1.872 2.295
28 : 3.680 1.900 2.333
29 : 4.204 1.883 2.329
30 : 4.256 1.944 2.358
31 : 3.875 1.936 2.325
32 : 4.476 1.953 2.339


Hubertus Franke
Enterprise Linux Group (Mgr),  Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003    (fax) 914-945-4425   TL: 862-2003



Andrea Arcangeli <andrea@suse.de>@lists.sourceforge.net on 01/18/2001
08:30:41 PM

Sent by:  lse-tech-admin@lists.sourceforge.net


To:   Mike Kravetz <mkravetz@sequent.com>
cc:   lse-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject:  Re: [Lse-tech] Re: multi-queue scheduler update



On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> was less than the number of processors.  I'll give the tests a try
> with a smaller number of threads.  I'm also open to suggestions for

OK!

> what benchmarks/test methods I could use for scheduler testing.  If
> you remember what people have used in the past, please let me know.

It was this one IIRC (it spawns threads calling sched_yield() in loop).

/*
  Tester for the kernel's speed in scheduling.
  (C) 1999 / Willy Tarreau <willy@meta-x.org>

  Modified by Davide Libenzi <davidel@maticad.it>


  You can do whatever you want with this program, but I'm not
  responsible for any misuse. Be aware that it can heavily load
  a host. As it is multithreaded, it might take advantages of SMP.

  It basically creates a growing amount of threads and measures
  their cumulative work (i.e. loop iterations/second). The output
  is easily useable by gnuplot.

  To compile, you need libpthread :

     gcc -O2 -fomit-frame-pointer -o threads threads.c -lpthread

  Output on stdout is :
     <nb_threads> <average_work> <zero_work_threads> <std_deviation>

*/

#include <stdio.h>
#include <stdlib.h>     /* atoi(), exit() */
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <time.h>



#define MAXTHREADS  450
#define MEASURE_TIME     60



pthread_t       thr[MAXTHREADS];
int             nbthreads = MAXTHREADS;
int             measure_time = MEASURE_TIME;
volatile int    actthreads = 0;

long long int   totalwork[MAXTHREADS];
volatile int    stop = 0,
                start = 0,
                count = 0;

void            oneatwork(int thr)
{
    int             i;
    while (!start)              /* don't disturb pthread_create() */
        usleep(10000);

    actthreads++;
    while (!stop)
    {
        if (count)
            totalwork[thr]++;

        syscall(158); /* sys_sched_yield() */
    }
    actthreads--;
    pthread_exit(0);
}

main(int argc, char **argv)
{

    int             i,
                    err,
                    avgwork,
                    thrzero;
    long long int   value,
                    avgvalue;
    double          sqrdev;
    time_t          ts,
                    te;

    if (argc < 3)
    {
        printf("usage: %s  threads  time\n", argv[0]);
        exit(1);
    }

    nbthreads = atoi(argv[1]);
    measure_time = atoi(argv[2]);


    start = 0;
    count = 0;
    stop = 0;
    actthreads = 0;
    thrzero = 0;
    value = 0;
    sqrdev = 0.0;

    fprintf(stderr, "\nCreating %d threads ...", nbthreads);
    for (i = 0; i < nbthreads; i++)
    {
        if ((err = pthread_create(&thr[i], NULL, (void *) &oneatwork,
                                  (void *) i)) != 0)
        {
            fprintf(stderr, "thread %d pthread_create=%d -> ", i, err);
            perror("");
            exit(1);
        }
        pthread_detach(thr[i]);
    }

    for (i = 0; i < nbthreads; i++)
        totalwork[i] = 0;

    fprintf(stderr, " OK !\nWaiting for all threads to start ...");

    start = 1;
    while (actthreads != nbthreads)
        usleep(10000);         /* waiting for a bit of stability */

    fprintf(stderr, "Go !\n");

    count = 1;
    time(&ts);

    sleep(measure_time);

    count = 0;
    stop = 1;
    time(&te);


    for (i = 0; i < nbthreads; i++)
    {
        value += totalwork[i];
        if (totalwork[i] == 0)
            ++thrzero;
    }
    avgvalue = value / nbthreads;
    value /= (int) difftime(te, ts);
    avgwork = (int) (value / nbthreads);

    for (i = 0; i < nbthreads; i++)
    {
        double          difvv = (double) (totalwork[i] - avgvalue);

        sqrdev += difvv * difvv;
    }

    while (actthreads > 0)
        usleep(10000);

    printf("%d\t\t%lld\t\t%d\t\t%d\t\t%f\n", nbthreads, value, avgwork,
            thrzero,
            sqrdev / ((double) nbthreads * avgvalue * avgvalue));

    exit(0);

}

Andrea


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] Re: multi-queue scheduler update
  2001-01-22 13:35 Hubertus Franke
@ 2001-01-22 16:58 ` Davide Libenzi
  0 siblings, 0 replies; 35+ messages in thread
From: Davide Libenzi @ 2001-01-22 16:58 UTC (permalink / raw)
  To: Hubertus Franke, lse-tech; +Cc: linux-kernel

On Monday 22 January 2001 08:57, Hubertus Franke wrote:
> Per popular demand. Here are a few numbers for small thread counts
> running the sched_yield_test benchmark on a 2-way SMP with the following
> characteristics.
>
> model name      : Pentium III (Katmai)
> stepping        : 3
> cpu MHz         : 551.266
> cache size      : 512 KB
>
> I compare 2.4.1-pre8  kernels (vanilla, table/prio scheduler and
> multiqueue).

What's 'table/prio scheduler' ?



- Davide

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Lse-tech] multi-queue scheduler update
  2001-01-18 23:53 multi-queue scheduler update Mike Kravetz
  2001-01-19  0:26 ` Andrea Arcangeli
  2001-01-19  0:43 ` Gerhard Mack
@ 2001-01-23 16:49 ` Jun Nakajima
       [not found] ` <LYR76657-1923-2001.01.23-08.54.49--mikek#sequent.com@lyris.sequent.com>
  3 siblings, 0 replies; 35+ messages in thread
From: Jun Nakajima @ 2001-01-23 16:49 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: lse-tech, linux-kernel

I tried to run SDET (Software Development Environment Throughput), which
basically is a system level, throughput oriented benchmark, on the 2.4.0
kernel and 2.4.0 kernel with this patch. 

I guess many (old?) Unix guys are familiar with it, and it is (was?)
sometimes used to check some aspects of scalability of the system. The
details of this benchmark are not so important in this mail (available
upon request).

The following are very preliminary numbers from the benchmark. Tests
were run on a system with eight 550 MHz Pentium III processors. I think
those results are encouraging.

# of Scripts	Throughput 	Throughput
                2.4		2.4-multi-queue
---------	----------	--------
1       	2057.1		1978.0
2       	4114.3		4067.8
4       	7700.5		7700.5
6       	10746.3		10746.3
8       	12973.0		12576.4
10      	13186.8		13235.3
15      	13138.7		13235.3
20      	12996.4		13043.5
25      	13005.8		13005.8
30      	12811.4		13059.3
40      	12676.1		12732.1
50      	12121.2		12676.1
60      	12314.7		12442.4
70      	12051.6		11954.5
80      	11871.4		11985.0
90      	11608.7		11777.5
100     	10849.9		11523.7
125     	10678.7		10940.9
150     	10416.7		10503.8
175     	10187.6		10314.3
200     	9749.5		10106.7
250     	8343.4		8787.3

I also checked hot-spots with the 2.4.0 kernel (not with multi-queue)
with lockmeter (http://oss.sgi.com/projects/lockmeter/). The data were
sampled when the number of scripts was 175.

SPINLOCKS       HOLD          WAIT
  UTIL     CON  MEAN (MAX)    MEAN (MAX)      TOTAL  NAME
...
10.56%  26.89%  7.4us(175us)  3.4us(692us)  1569304  runqueue_lock
 2.23%  29.75%  4.5us(20us)   4.4us(646us)   550505   __wake_up+0x7c
 0.01%  11.62%  6.6us(15us)   1.0us(65us)      2056   __wake_up+0x128
 0.00%  14.29%  0.4us(2.6us)  3.0us(332us)     1393   deliver_signal+0x58
 0.00%   9.94%  7.2us(16us)   1.2us(56us)       332   process_timeout+0x14
 0.01%  26.70%  4.7us(16us)   5.0us(296us)     1457   schedule_tail+0x58
 7.53%  23.28%  11us(175us)   3.0us(692us)   781676   schedule+0xd0
 0.66%  35.42%  3.5us(23us)   2.8us(486us)   206008   schedule+0x458
 0.00%  11.79%  4.2us(78us)   1.1us(56us)       560   schedule+0x504
 0.11%   9.42%  5.0us(21us)   2.3us(420us)    25317   wake_up_process+0x14

The above result basically says that the utilization of runqueue_lock
was about 10% during the benchmark and that nearly 27% of the requests
for this lock had to spin and wait (the NAMEs below the lock are the
locations where that lock is taken). This might explain the throughput
improvements gained by the multi-queue scheduler.
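
To illustrate why splitting the run queue could relieve this lock, here
is a rough user-space sketch in C of the general idea: one run queue per
CPU, each guarded by its own lock, so that CPUs working on different
queues do not contend. This is not the code from Mike's patch. The names
(struct runqueue, enqueue_task, dequeue_task) and the test-and-set
spinlock built on C11 atomic_flag are assumptions standing in for the
kernel's own types.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 8   /* matches the 8-way test machine; otherwise arbitrary */

/* Minimal test-and-set spinlock, a stand-in for the kernel's spinlock_t. */
struct spinlock {
    atomic_flag locked;
};

void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;   /* spin until the current holder clears the flag */
}

void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* One run queue per CPU, each with its own lock (hypothetical layout). */
struct runqueue {
    struct spinlock lock;
    int nr_running;
};

struct runqueue runqueues[NR_CPUS];

/* Put every queue into a known empty, unlocked state. */
void runqueues_init(void)
{
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        atomic_flag_clear(&runqueues[cpu].lock.locked);
        runqueues[cpu].nr_running = 0;
    }
}

/* Make a task runnable on `cpu`; only that CPU's lock is taken. */
void enqueue_task(int cpu)
{
    struct runqueue *rq;

    assert(cpu >= 0 && cpu < NR_CPUS);
    rq = &runqueues[cpu];
    spin_lock(&rq->lock);
    rq->nr_running++;
    spin_unlock(&rq->lock);
}

/* Remove one task from `cpu`'s queue; returns 1 on success, 0 if empty. */
int dequeue_task(int cpu)
{
    struct runqueue *rq;
    int found = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    rq = &runqueues[cpu];
    spin_lock(&rq->lock);
    if (rq->nr_running > 0) {
        rq->nr_running--;
        found = 1;
    }
    spin_unlock(&rq->lock);
    return found;
}
```

A real scheduler also has to pick tasks across queues and balance load,
which needs more than one lock at a time; the sketch leaves that out.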

Now who has the largest utilization? Of course it's kernel_flag.
SPINLOCKS        HOLD              WAIT
  UTIL      CON  MEAN (MAX)        MEAN (MAX)        TOTAL  NAME
...
43.15%   33.08%  13us(95971us)     12us(95997us)   3558789  kernel_flag
 0.02%   38.26%  0.7us(29us)       34us(94975us)     23788    acct_process+0x1c
 0.02%   44.63%  8.3us(43us)       23us(675us)        2012    chrdev_open+0x4c
 0.00%   22.26%  0.9us(2.5us)      16us(525us)         283    de_put+0x28
 5.26%   38.34%  244us(1184us)     21us(53127us)     23788    do_exit+0xf8
 0.99%   36.22%  11us(840us)       12us(53195us)     96205    ext2_delete_inode+0x20
 0.46%   29.64%  1.2us(159us)      9.1us(53249us)   430421    ext2_discard_prealloc+0x20
 1.28%   40.60%  9.7us(152us)      22us(43404us)    146014    ext2_get_block+0x54
 0.00%   40.00%  0.4us(0.7us)      8.6us(34us)           5    locks_remove_flock+0x34
 0.00%   40.00%  0.6us(1.2us)      4.5us(14us)           5    locks_remove_posix+0x38
 0.92%   40.80%  12us(572us)       16us(47804us)     84618    lookup_hash+0x84
 0.16%   37.35%  1.0us(178us)      13us(53173us)    175002    notify_change+0x68
 7.78%   15.00%  46us(2523us)      3.1us(27213us)   188485    permission+0x38
20.34%   32.99%  12us(1981us)      12us(95997us)   1927065    real_lookup+0x64
 0.05%   47.31%  595us(51910us)    22us(270us)          93    schedule+0x490
 0.56%   42.11%  32861us(95971us)  41us(405us)          19    sync_old_buffers+0x20
 0.83%   40.22%  19us(1473us)      19us(41614us)     48081    sys_fcntl64+0x44
 0.01%   38.05%  1.3us(37us)       22us(49506us)     12422    sys_ioctl+0x4c
 0.06%   33.12%  0.5us(62us)       15us(49778us)    132230    sys_llseek+0x88
 0.00%   39.64%  0.9us(4.9us)      19us(849us)        5401    sys_lseek+0x6c
 0.00%   37.50%  28us(48us)        12us(222us)         200    sys_rename+0x1a0
 0.02%   42.29%  6.2us(22us)       81us(93181us)      3802    sys_sysctl+0x4c
 0.00%   52.27%  6.4us(29us)       13us(156us)         132    tty_read+0xbc
 0.01%   41.36%  13us(37us)        16us(434us)         810    tty_release+0x1c
 0.00%   48.12%  17us(143us)       22us(497us)         133    tty_write+0x1bc
 2.08%   41.32%  25us(309us)       18us(29470us)     92009    vfs_create+0x98
 0.52%   38.57%  85us(227us)       12us(698us)        6800    vfs_mkdir+0x90
 1.10%   38.40%  20us(317us)       14us(1100us)      60359    vfs_readdir+0x68
 0.07%   41.66%  12us(78us)        18us(1120us)       6800    vfs_rmdir+0x188
 0.00%  100.00%  24us(24us)        21us(27us)            2    vfs_statfs+0x4c
 0.60%   36.52%  7.2us(104us)      9.4us(904us)      91805    vfs_unlink+0x110

This tells many things, among them:
- utilization of kernel_flag is about 43%, and nearly half of that
  utilization (20.34%) comes from real_lookup.
- its mean hold-time (13us) is not particularly large, but its max
  wait-time (95997us) is.
- The long hold at sync_old_buffers+0x20 looks responsible for the
  longest wait-time (95997us, seen at real_lookup+0x64).
- sync_old_buffers is responsible for only 0.56% of lock utilization,
  but it has the largest mean (32861us) and max (95971us) hold-time.
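
The shares quoted in the list can be checked with a few lines of C. This
is only a sketch: util_share and print_kernel_flag_shares are names made
up here, and it assumes the per-location UTIL values in the lockmeter
output add up to the UTIL of the parent lock (they sum to roughly 43.1%
in the table above, so that looks reasonable).

```c
#include <assert.h>
#include <stdio.h>

/* UTIL of one call site as a share of the parent lock's UTIL, in percent. */
double util_share(double site_util, double lock_util)
{
    assert(lock_util > 0.0);
    return 100.0 * site_util / lock_util;
}

/* Print the share of kernel_flag utilization for a few call sites,
 * using UTIL values copied from the lockmeter output above. */
void print_kernel_flag_shares(void)
{
    const double kernel_flag = 43.15;

    printf("real_lookup+0x64      %5.1f%%\n", util_share(20.34, kernel_flag));
    printf("permission+0x38       %5.1f%%\n", util_share(7.78, kernel_flag));
    printf("do_exit+0xf8          %5.1f%%\n", util_share(5.26, kernel_flag));
    printf("sync_old_buffers+0x20 %5.1f%%\n", util_share(0.56, kernel_flag));
}
```

By this measure real_lookup+0x64 accounts for about 47% of the
kernel_flag utilization, with permission+0x38 (about 18%) and
do_exit+0xf8 (about 12%) next.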

So if we replace the big kernel lock with a fine-grained lock in the
real_lookup function, we would see more throughput improvements, at
least for this benchmark.

But I guess the reason for holding the big kernel lock in real_lookup()
is that not all filesystems implement an MP-safe lookup routine. Is
that a correct assumption?

For sync_old_buffers, we could hold the big kernel lock per filesystem,
for example. 

static struct dentry * real_lookup(struct dentry * parent,
                                   struct qstr * name, int flags)
{
 ...

        result = d_lookup(parent, name);
        if (!result) {
                struct dentry * dentry = d_alloc(parent, name);
                result = ERR_PTR(-ENOMEM);
                if (dentry) {
                        lock_kernel();
                        result = dir->i_op->lookup(dir, dentry);
                        unlock_kernel();
                        if (result)
                                dput(dentry);
                        else
                                result = dentry;
                }
                up(&dir->i_sem);
                return result;
        }

...
}

static int sync_old_buffers(void)
{
        lock_kernel();
        sync_supers(0);
        sync_inodes(0);
        unlock_kernel();

        flush_dirty_buffers(1);
        /* must really sync all the active I/O request to disk here */
        run_task_queue(&tq_disk);
        return 0;
}
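
One reading of the per-filesystem suggestion for sync_old_buffers()
above is to take and drop the lock around each filesystem rather than
holding it across all of them, so that waiters get a chance in between.
Below is a user-space mock in C of that shape only. Everything in it is
hypothetical: struct fs, sync_one and the atomic_flag standing in for
kernel_flag are not kernel interfaces, and the real sync_supers() and
sync_inodes() walk their own lists internally.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_FS 4   /* hypothetical number of mounted filesystems */

static atomic_flag big_lock = ATOMIC_FLAG_INIT;   /* stand-in for kernel_flag */
int lock_acquisitions;   /* how many times the lock has been taken */

void big_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&big_lock))
        ;   /* spin until the lock is free */
    lock_acquisitions++;   /* updated while holding the lock */
}

void big_lock_release(void)
{
    atomic_flag_clear(&big_lock);
}

/* Mock filesystem: just a dirty flag. */
struct fs {
    int dirty;
};

struct fs filesystems[NR_FS];

void mark_all_dirty(void)
{
    int i;

    for (i = 0; i < NR_FS; i++)
        filesystems[i].dirty = 1;
}

int count_dirty(void)
{
    int i, n = 0;

    for (i = 0; i < NR_FS; i++)
        n += filesystems[i].dirty;
    return n;
}

/* Stand-in for the per-filesystem sync work. */
void sync_one(struct fs *f)
{
    f->dirty = 0;
}

/* Current shape: one long hold that covers every filesystem. */
void sync_all_single_hold(void)
{
    int i;

    big_lock_acquire();
    for (i = 0; i < NR_FS; i++)
        sync_one(&filesystems[i]);
    big_lock_release();
}

/* Suggested shape: one short hold per filesystem. */
void sync_all_per_fs(void)
{
    int i;

    for (i = 0; i < NR_FS; i++) {
        big_lock_acquire();
        sync_one(&filesystems[i]);
        big_lock_release();
    }
}
```

Both variants do the same work; the second takes the lock NR_FS times
with shorter holds, which trades more lock traffic for a lower maximum
hold-time.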



Mike Kravetz wrote:
> 
> I just posted an updated version of the multi-queue scheduler
> for the 2.4.0 kernel.  This version also contains support for
> realtime tasks.  The patch can be found at:
> 
> http://lse.sourceforge.net/scheduling/
> 
> Here are some very preliminary numbers from sched_test_yield
> (which was previously posted to this (lse-tech) list by Bill
> Hartner).  Tests were run on a system with 8 700 MHz Pentium
> III processors.
> 
>                            microseconds/yield
> # threads      2.2.16-22           2.4        2.4-multi-queue
> ------------   ---------         --------     ---------------
> 16               18.740            4.603         1.455
> 32               17.702            5.134         1.456
> 64               23.300            5.586         1.466
> 128              47.273           18.812         1.480
> 256             105.701           71.147         1.517
> 512               FRC            143.500         1.661
> 1024              FRC            196.425         6.166
> 2048              FRC              FRC          23.291
> 4096              FRC              FRC          47.117
> 
> *FRC = failed to reach confidence level

-- 
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426

* Re: [Lse-tech] multi-queue scheduler update
       [not found] ` <LYR76657-1923-2001.01.23-08.54.49--mikek#sequent.com@lyris.sequent.com>
@ 2001-01-23 17:08   ` Mike Kravetz
  0 siblings, 0 replies; 35+ messages in thread
From: Mike Kravetz @ 2001-01-23 17:08 UTC (permalink / raw)
  To: Jun Nakajima; +Cc: lse-tech, linux-kernel

On Tue, Jan 23, 2001 at 11:49:27AM -0500, Jun Nakajima wrote:
> I tried to run SDET (Software Development Environment Throughput), which
> basically is a system level, throughput oriented benchmark, on the 2.4.0
> kernel and 2.4.0 kernel with this patch. 

Thanks for running this.  I too remember SDET, but I won't claim
to be old. :)

We were doing some more analysis on the multi-queue scheduler and
noticed that performance has regressed since posting preliminary
numbers with the 2.4.0-test10 kernel.  After comparing the code,
it looks like I have over-engineered for the worst case of lock
contention.  This was done at the expense of the normal case.
I'm currently working on this situation and expect to have a new
patch out in the not too distant future.

I expect the numbers will get better.
-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center

Thread overview: 35+ messages
2001-01-18 23:53 multi-queue scheduler update Mike Kravetz
2001-01-19  0:26 ` Andrea Arcangeli
2001-01-19  0:51   ` [Lse-tech] " Andi Kleen
2001-01-19  1:14     ` John Clemens
2001-01-19  0:52   ` [Lse-tech] " Mike Kravetz
2001-01-19  1:30     ` Andrea Arcangeli
2001-01-19  1:34       ` Mike Kravetz
2001-01-19 20:49         ` Mike Kravetz
2001-01-19 21:51           ` Mike Kravetz
2001-01-19 22:03             ` Davide Libenzi
2001-01-19 22:18               ` Mike Kravetz
2001-01-19 23:24                 ` Davide Libenzi
2001-01-19  1:39       ` Davide Libenzi
2001-01-19 16:06     ` David Lang
2001-01-19  1:00   ` Mark Hahn
2001-01-19  1:08     ` Andi Kleen
2001-01-19  1:23       ` Mike Kravetz
2001-01-19  1:38         ` Davide Libenzi
2001-01-19  1:35     ` Andrea Arcangeli
2001-01-19  1:48       ` Andi Kleen
2001-01-19 23:35   ` Mike Kravetz
2001-01-19  0:43 ` Gerhard Mack
2001-01-23 16:49 ` [Lse-tech] " Jun Nakajima
     [not found] ` <LYR76657-1923-2001.01.23-08.54.49--mikek#sequent.com@lyris.sequent.com>
2001-01-23 17:08   ` Mike Kravetz
  -- strict thread matches above, loose matches on Subject: below --
2001-01-19 15:47 [Lse-tech] " Hubertus Franke
2001-01-19 17:11 ` Mike Kravetz
     [not found]   ` <LYR76657-5332-2001.01.19-12.12.38--mikek#sequent.com@lyris.sequent.com>
2001-01-19 20:32     ` Mike Kravetz
2001-01-19 16:30 Hubertus Franke
2001-01-19 16:33 ` nick
2001-01-19 17:06   ` Tim Wright
2001-01-19 16:47 Hubertus Franke
2001-01-19 18:03 Hubertus Franke
2001-01-19 19:52 ` bert hubert
2001-01-22 13:35 Hubertus Franke
2001-01-22 16:58 ` Davide Libenzi
