From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264388AbUHGVxe (ORCPT ); Sat, 7 Aug 2004 17:53:34 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264389AbUHGVxe (ORCPT ); Sat, 7 Aug 2004 17:53:34 -0400 Received: from web13910.mail.yahoo.com ([216.136.172.95]:17558 "HELO web13910.mail.yahoo.com") by vger.kernel.org with SMTP id S264388AbUHGVxK (ORCPT ); Sat, 7 Aug 2004 17:53:10 -0400 Message-ID: <20040807215309.4746.qmail@web13910.mail.yahoo.com> Date: Sat, 7 Aug 2004 14:53:09 -0700 (PDT) From: Reply-To: spaminos-ker@yahoo.com Subject: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) To: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Hey guys I ran into this problem a few month ago while trying to migrate to 2.6 series at the time, it was using 2.6.1, I figured maybe it was fixed in the latest kernel, so I tried 2.6.7 but it shows the same signs. I am running in an interesting problem with, what seems to be, the scheduler. I managed to create a test program to reproduce it. This test system I use is an Athlon XP1800+ with 256 Mo RAM with Redhat 9 installed (but I could reproduce it the same way with Dual CPU systems). I can reproduce this problem on 2.6.7 (and before, I could reproduce it on a redhat kernel linux-2.4.20-19.9) Kernel that I compiled with and without CONFIG_PREEMPT, SMP on or off, to see no difference. Everything is fine on my regular (non redhat) 2.4 kernel (same box). Basically, here is what's happening on my server: * I have a daemon running, serving http requests via worker threads. * I have an other daemon running that does some sanity check on the system, basically to monitor the box (and it mostly sleeps). Under load, the sanity check daemon stops running properly. It seems like all the CPU is given to the http server, even though they are started the same way. This leads to all sorts of timeouts, not good for a production server. So how to reproduce the problem: I run in one shell a script to check for slowdowns: A=$(date +%s) ; while true ; do sleep 1s; B=$(date +%s) ; D=$(($B-$A)) ; if [ $D -ge 3 ] ; then date ; echo ">>>>>>> delta = $D" ; fi ; A=$B ; done and in an other, my test program (see below for source code): ./cputest 2 10000 1000 512 & note: I am running everything from ssh shells, not from X (don't know if this has any effect for this kind of test). And after a few seconds, I get this: Mon Aug 2 14:40:27 PDT 2004 >>>>>>> delta = 3 Mon Aug 2 14:41:37 PDT 2004 >>>>>>> delta = 3 Mon Aug 2 14:41:51 PDT 2004 >>>>>>> delta = 3 Mon Aug 2 14:41:57 PDT 2004 >>>>>>> delta = 3 Mon Aug 2 14:42:12 PDT 2004 >>>>>>> delta = 3 The script takes 3 seconds to execute?! That means that the script, that sleeps and doesn't use a lot of CPU, stops getting cycles, for some reason. On 2.4.x series kernel (from kernel.org, not redhat), I don't get any output for the script, meaning the fairness is good between processes (even if I start it with 60 worker threads instead of 2). If I try to increase the number of threads to 60 on 2.6.7, the script gets stuck for very long period of time (I've seen 90 seconds). I can post the .config file I used for compiling the 2.6.7 kernel, if needed, but I figured this post was already big enough :) After seeing a discussion on scheduling problems, I also tried different patches, in particular the effect of an alternate scheduler from Nick Piggin: 2.6.7 -> fails with 2 threads 2.6.7-bk20 -> fails with 2 threads 2.6.7-bk20-np8 -> works fine with 2 threads, fails with 20 threads 2.6.7-mm7 -> fails with 2 threads 2.6.7-mm7-np8 -> works fine with 2 threads, fails with 20 threads Any clue? Nicolas compile the code with: gcc -pthread -D_REENTRANT -lm cputest.c -o cputest ---------------------- cputest.c ----------------------------- #include #include #include #include #include #include #include int nbthreads = 0; int iterations = 0; int run = 1; int iterations2 = 0; #define DEFAULTBUFSIZE 1 int globalpipe[2]; int buflen = DEFAULTBUFSIZE; void exithandler(int s) { if (run) { run = 0; close(globalpipe[0]); close(globalpipe[1]); } } char *fname; void *thread_code(void *arg) { time_t lasttime = time(NULL); int jj = (int)arg; int localiter = iterations/jj; int fd; char *buf; buf = alloca(buflen); if (buf == NULL) { printf("Failed to alloca!\n"); exit(-1); } do { int i; time_t newtime; for(i=0; i < localiter && run; i++) { int j; double k = 1.0; for(j=0; j < iterations2 && run; j++) { k += j*k; k = (1-k)*(1+k); k = sqrt(k); } if (fd != -1) { if (read(globalpipe[0], buf, buflen) != buflen) { printf("%d, aborted read\n", jj); } } } sleep(1); newtime = time(NULL); printf("%d delta = %d\n", (int)arg, newtime-lasttime); lasttime = newtime; } while (run); return NULL; } int main(int argc, char **argv) { int i; pthread_t *mythreads; if (argc < 5) return -1; nbthreads = atoi(argv[1]); iterations = atoi(argv[2]); iterations2 = atoi(argv[3]); buflen = atoi(argv[4]); signal(SIGINT, exithandler); if (pipe(globalpipe) != 0) { return -2; } mythreads = (pthread_t *)calloc(nbthreads, sizeof(pthread_t)); for(i=0; i < nbthreads; i++) { if (pthread_create(&mythreads[i], NULL, thread_code, (void *)(i+1)) != 0) { return -5; } printf("Started %d\n", i); } void * res; char *buf; int localbuflen; localbuflen = buflen * nbthreads; buf = alloca(localbuflen); if (buf == NULL) { printf("Failed to alloca!\n"); run = 0; exit(-1); } sleep(3); do { write(globalpipe[1], buf, localbuflen); } while (run); for(i=0; i < nbthreads; i++) { printf("Waiting %d\n", i); pthread_join(mythreads[i], &res); } return 0; } -------------------- end cputest.c -------------------