* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  [not found] <20040811010116.GL11200@holomorphy.com>
@ 2004-08-11  2:21 ` spaminos-ker
  2004-08-11  2:23   ` William Lee Irwin III
  2004-08-11  3:09   ` Con Kolivas
  0 siblings, 2 replies; 21+ messages in thread
From: spaminos-ker @ 2004-08-11  2:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: William Lee Irwin III

--- William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams'
> SPA patches with the do_promotions() function? I suspect these should
> pass your tests.
>
> -- wli

I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch.

I only changed the value of /proc/sys/kernel/cpusched/mode to switch
between the different schedulers.

The 2-thread test passes successfully (an improvement over stock 2.6.7),
but none passed the 20-thread test:

eb
Tue Aug 10 19:10:48 PDT 2004 >>>>>>> delta = 6
Tue Aug 10 19:11:03 PDT 2004 >>>>>>> delta = 16
Tue Aug 10 19:11:13 PDT 2004 >>>>>>> delta = 9
Tue Aug 10 19:11:24 PDT 2004 >>>>>>> delta = 11
Tue Aug 10 19:11:34 PDT 2004 >>>>>>> delta = 10
Tue Aug 10 19:11:45 PDT 2004 >>>>>>> delta = 11
Tue Aug 10 19:11:56 PDT 2004 >>>>>>> delta = 11
Tue Aug 10 19:12:06 PDT 2004 >>>>>>> delta = 10

pb
Tue Aug 10 19:07:52 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:07:55 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:07:59 PDT 2004 >>>>>>> delta = 4
Tue Aug 10 19:08:02 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:08:05 PDT 2004 >>>>>>> delta = 3

sc
Tue Aug 10 19:08:28 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:08 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:17 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:23 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:49 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:53 PDT 2004 >>>>>>> delta = 3
Tue Aug 10 19:09:55 PDT 2004 >>>>>>> delta = 3

"eb" seemed to be the worst of the bunch, with quite long system hangs
on this particular test.
With the default settings of:

base_promotion_interval 255
compute 0
cpu_hog_threshold 900
ia_threshold 900
initial_ia_bonus 1
interactive 0
log_at_exit 0
max_ia_bonus 9
max_tpt_bonus 4
sched_batch_time_slice_multiplier 10
sched_iso_threshold 50
sched_rr_time_slice 100
time_slice 100

I am not very familiar with all the parameters, so I just kept the defaults.
Anything else I could try?

Nicolas

=====
------------------------------------------------------------
video meliora proboque deteriora sequor
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker @ 2004-08-11 2:23 ` William Lee Irwin III 2004-08-11 2:45 ` Peter Williams 2004-08-11 3:09 ` Con Kolivas 1 sibling, 1 reply; 21+ messages in thread From: William Lee Irwin III @ 2004-08-11 2:23 UTC (permalink / raw) To: spaminos-ker; +Cc: linux-kernel On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote: > I am not very familiar with all the parameters, so I just kept the defaults > Anything else I could try? > Nicolas No. It appeared that the SPA bits had sufficient fairness in them to pass this test but apparently not quite enough. -- wli ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:23 ` William Lee Irwin III
@ 2004-08-11  2:45 ` Peter Williams
  2004-08-11  2:47   ` Peter Williams
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Williams @ 2004-08-11  2:45 UTC (permalink / raw)
  To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel

William Lee Irwin III wrote:
> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>
>> I am not very familiar with all the parameters, so I just kept the defaults
>> Anything else I could try?
>> Nicolas
>
> No. It appeared that the SPA bits had sufficient fairness in them to
> pass this test but apparently not quite enough.

The interactive bonus may interfere with fairness (the throughput bonus
should actually help it for tasks with equal nice), so you could try
setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus).
With "eb" mode this should still give good interactive response, but
expect interactive response to suffer a little in "pb" mode; renicing
the X server to a negative value should help there, however.

Peter

PS There's a primitive GUI available for setting the scheduler parameters at
<http://prdownloads.sourceforge.net/cpuse/gcpuctl_hydra-1.3.tar.gz?download>.
This is just a Python script with a Glade XML file (gcpuctl_hydra.glade),
which needs to be in the same directory that you run the script from.

--
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 2:45 ` Peter Williams @ 2004-08-11 2:47 ` Peter Williams 2004-08-11 3:23 ` Peter Williams 0 siblings, 1 reply; 21+ messages in thread From: Peter Williams @ 2004-08-11 2:47 UTC (permalink / raw) To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel Peter Williams wrote: > William Lee Irwin III wrote: > >> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote: >> >>> I am not very familiar with all the parameters, so I just kept the >>> defaults >>> Anything else I could try? >>> Nicolas >> >> >> >> No. It appeared that the SPA bits had sufficient fairness in them to >> pass this test but apparently not quite enough. >> > > The interactive bonus may interfere with fairness (the throughput bonus > should actually help it for tasks with equal nice) so you could try > setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus). > With "eb" mode this should still give good interactive response but > expect interactive response to suffer a little in "pb" mode however > renicing the X server to a negative value should help. I should also have mentioned that fiddling with the promotion interval may help. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 2:47 ` Peter Williams @ 2004-08-11 3:23 ` Peter Williams 2004-08-11 3:31 ` Con Kolivas 2004-08-11 3:44 ` Peter Williams 0 siblings, 2 replies; 21+ messages in thread From: Peter Williams @ 2004-08-11 3:23 UTC (permalink / raw) To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel Peter Williams wrote: > Peter Williams wrote: > >> William Lee Irwin III wrote: >> >>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote: >>> >>>> I am not very familiar with all the parameters, so I just kept the >>>> defaults >>>> Anything else I could try? >>>> Nicolas >>> >>> >>> >>> >>> No. It appeared that the SPA bits had sufficient fairness in them to >>> pass this test but apparently not quite enough. >>> >> >> The interactive bonus may interfere with fairness (the throughput >> bonus should actually help it for tasks with equal nice) so you could >> try setting max_ia_bonus to zero (and possibly increasing >> max_tpt_bonus). With "eb" mode this should still give good interactive >> response but expect interactive response to suffer a little in "pb" >> mode however renicing the X server to a negative value should help. > > > I should also have mentioned that fiddling with the promotion interval > may help. Having reread your original e-mail I think that this problem is probably being caused by the interactive bonus mechanism classifying the httpd server threads as "interactive" threads and giving them a bonus. But for some reason the daemon is not identified as "interactive" meaning that it gets given a lower priority. In this situation if there's a large number of httpd threads (even with promotion) it could take quite a while for the daemon to get a look in. Without promotion total starvation is even a possibility. Peter PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on servers (where interactive responsiveness isn't an issue). 
PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1. -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 3:23 ` Peter Williams @ 2004-08-11 3:31 ` Con Kolivas 2004-08-11 3:46 ` Peter Williams 2004-08-11 3:44 ` Peter Williams 1 sibling, 1 reply; 21+ messages in thread From: Con Kolivas @ 2004-08-11 3:31 UTC (permalink / raw) To: Peter Williams; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel Peter Williams writes: > Peter Williams wrote: >> Peter Williams wrote: >> >>> William Lee Irwin III wrote: >>> >>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote: >>>> >>>>> I am not very familiar with all the parameters, so I just kept the >>>>> defaults >>>>> Anything else I could try? >>>>> Nicolas >>>> >>>> >>>> >>>> >>>> No. It appeared that the SPA bits had sufficient fairness in them to >>>> pass this test but apparently not quite enough. >>>> >>> >>> The interactive bonus may interfere with fairness (the throughput >>> bonus should actually help it for tasks with equal nice) so you could >>> try setting max_ia_bonus to zero (and possibly increasing >>> max_tpt_bonus). With "eb" mode this should still give good interactive >>> response but expect interactive response to suffer a little in "pb" >>> mode however renicing the X server to a negative value should help. >> >> >> I should also have mentioned that fiddling with the promotion interval >> may help. > > Having reread your original e-mail I think that this problem is probably > being caused by the interactive bonus mechanism classifying the httpd > server threads as "interactive" threads and giving them a bonus. But > for some reason the daemon is not identified as "interactive" meaning > that it gets given a lower priority. In this situation if there's a > large number of httpd threads (even with promotion) it could take quite > a while for the daemon to get a look in. Without promotion total > starvation is even a possibility. 
>
> Peter
>
> PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on
> servers (where interactive responsiveness isn't an issue).
> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.

No, compute should not be set to 1 for a server. It is reserved only for
computational nodes, not regular servers. "Compute" will increase
latency, which is undesirable.

Cheers,
Con

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 3:31 ` Con Kolivas @ 2004-08-11 3:46 ` Peter Williams 0 siblings, 0 replies; 21+ messages in thread From: Peter Williams @ 2004-08-11 3:46 UTC (permalink / raw) To: Con Kolivas; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel Con Kolivas wrote: > Peter Williams writes: > >> Peter Williams wrote: >> >>> Peter Williams wrote: >>> >>>> William Lee Irwin III wrote: >>>> >>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com >>>>> wrote: >>>>> >>>>>> I am not very familiar with all the parameters, so I just kept the >>>>>> defaults >>>>>> Anything else I could try? >>>>>> Nicolas >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> No. It appeared that the SPA bits had sufficient fairness in them to >>>>> pass this test but apparently not quite enough. >>>>> >>>> >>>> The interactive bonus may interfere with fairness (the throughput >>>> bonus should actually help it for tasks with equal nice) so you >>>> could try setting max_ia_bonus to zero (and possibly increasing >>>> max_tpt_bonus). With "eb" mode this should still give good >>>> interactive response but expect interactive response to suffer a >>>> little in "pb" mode however renicing the X server to a negative >>>> value should help. >>> >>> >>> >>> I should also have mentioned that fiddling with the promotion >>> interval may help. >> >> >> Having reread your original e-mail I think that this problem is >> probably being caused by the interactive bonus mechanism classifying >> the httpd server threads as "interactive" threads and giving them a >> bonus. But for some reason the daemon is not identified as >> "interactive" meaning that it gets given a lower priority. In this >> situation if there's a large number of httpd threads (even with >> promotion) it could take quite a while for the daemon to get a look >> in. Without promotion total starvation is even a possibility. 
>> >> Peter >> PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on >> servers (where interactive responsiveness isn't an issue). >> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1. > > > No, compute should not be set to 1 for a server. It is reserved only for > computational nodes, not regular servers. "Compute" will increase > latency which is undersirable. Sorry, my misunderstanding. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 3:23 ` Peter Williams 2004-08-11 3:31 ` Con Kolivas @ 2004-08-11 3:44 ` Peter Williams 2004-08-13 0:13 ` spaminos-ker 1 sibling, 1 reply; 21+ messages in thread From: Peter Williams @ 2004-08-11 3:44 UTC (permalink / raw) To: spaminos-ker; +Cc: Peter Williams, William Lee Irwin III, linux-kernel Peter Williams wrote: > Peter Williams wrote: > >> Peter Williams wrote: >> >>> William Lee Irwin III wrote: >>> >>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote: >>>> >>>>> I am not very familiar with all the parameters, so I just kept the >>>>> defaults >>>>> Anything else I could try? >>>>> Nicolas >>>> >>>> >>>> >>>> >>>> >>>> No. It appeared that the SPA bits had sufficient fairness in them to >>>> pass this test but apparently not quite enough. >>>> >>> >>> The interactive bonus may interfere with fairness (the throughput >>> bonus should actually help it for tasks with equal nice) so you could >>> try setting max_ia_bonus to zero (and possibly increasing >>> max_tpt_bonus). With "eb" mode this should still give good >>> interactive response but expect interactive response to suffer a >>> little in "pb" mode however renicing the X server to a negative value >>> should help. >> >> >> >> I should also have mentioned that fiddling with the promotion interval >> may help. > > > Having reread your original e-mail I think that this problem is probably > being caused by the interactive bonus mechanism classifying the httpd > server threads as "interactive" threads and giving them a bonus. But > for some reason the daemon is not identified as "interactive" meaning > that it gets given a lower priority. In this situation if there's a > large number of httpd threads (even with promotion) it could take quite > a while for the daemon to get a look in. Without promotion total > starvation is even a possibility. 
> > Peter > PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on > servers (where interactive responsiveness isn't an issue). > PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1. I've just run your tests on my desktop and with max_ia_bonus at its default value I see the "delta = 3" with 20 threads BUT when I set max_ia_bonus to zero they stop (in both "eb" and "pb" mode). So I then reran the tests with 60 threads and zero max_ia_bonus and no output was generated by your testdelay script in either "eb" or "pb" modes. I didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but Con has reported that the problem is absent in his latest patches so I'll update the "sc" mode in HYDRA to those patches. Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 3:44 ` Peter Williams @ 2004-08-13 0:13 ` spaminos-ker 2004-08-13 1:44 ` Peter Williams 0 siblings, 1 reply; 21+ messages in thread From: spaminos-ker @ 2004-08-13 0:13 UTC (permalink / raw) To: linux-kernel; +Cc: Peter Williams, William Lee Irwin III --- Peter Williams <pwil3058@bigpond.net.au> wrote: > I've just run your tests on my desktop and with max_ia_bonus at its > default value I see the "delta = 3" with 20 threads BUT when I set > max_ia_bonus to zero they stop (in both "eb" and "pb" mode). So I then > reran the tests with 60 threads and zero max_ia_bonus and no output was > generated by your testdelay script in either "eb" or "pb" modes. I > didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but > Con has reported that the problem is absent in his latest patches so > I'll update the "sc" mode in HYDRA to those patches. > I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4 I also have messages with 20 threads "delta = 3" that go away when I set max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode. But, unlike your desktop, the "eb" mode doesn't seem to get better by setting max_ia_bonus to 0 on my machine, maybe I need to tweak something else? (even though, the idea of tweaking for a given workload doesn't sound very good to me). The "pb" mode is very responsive with the system under heavy load, I like it :) I will run some tests over the week end with the actual server to see the effect of this patch on a more complex system. Nicolas PS: the machine I am using is a pure server, only accessible through ssh, so I can not really tell the behavior under X. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-13 0:13 ` spaminos-ker @ 2004-08-13 1:44 ` Peter Williams 0 siblings, 0 replies; 21+ messages in thread From: Peter Williams @ 2004-08-13 1:44 UTC (permalink / raw) To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III spaminos-ker@yahoo.com wrote: > --- Peter Williams <pwil3058@bigpond.net.au> wrote: > >>I've just run your tests on my desktop and with max_ia_bonus at its >>default value I see the "delta = 3" with 20 threads BUT when I set >>max_ia_bonus to zero they stop (in both "eb" and "pb" mode). So I then >>reran the tests with 60 threads and zero max_ia_bonus and no output was >>generated by your testdelay script in either "eb" or "pb" modes. I >>didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but >>Con has reported that the problem is absent in his latest patches so >>I'll update the "sc" mode in HYDRA to those patches. >> > > > I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4 > > I also have messages with 20 threads "delta = 3" that go away when I set > max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode. I'm going to do some experiments to measure the relationship between the size of max_ia_bonus and the observed delays to see if there's value that gives acceptable performance without turning bonuses off completely. > But, unlike your desktop, the "eb" mode doesn't seem to get better by setting > max_ia_bonus to 0 on my machine, maybe I need to tweak something else? (even > though, the idea of tweaking for a given workload doesn't sound very good to > me). You could try increasing "base_promotion_interval". When I have a better idea of the best values (for each mode) for the various parameters I'll reset their values when the mode is changed. > > The "pb" mode is very responsive with the system under heavy load, I like it :) That's good to hear. 
If you have time, I'd appreciate if you could try a few different values of max_ia_bonus to determine the minimum value that still gives good responsiveness for your system? I'm trying to get a feel for how much this varies from system to system. > > I will run some tests over the week end with the actual server to see the > effect of this patch on a more complex system. > > Nicolas > > PS: the machine I am using is a pure server, only accessible through ssh, so I > can not really tell the behavior under X. If it's a pure server I imagine that it's not running X. On a pure server I'd recommend setting max_ia_bonus to zero. Thanks Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) 2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker 2004-08-11 2:23 ` William Lee Irwin III @ 2004-08-11 3:09 ` Con Kolivas 2004-08-11 10:24 ` Prakash K. Cheemplavam ` (2 more replies) 1 sibling, 3 replies; 21+ messages in thread From: Con Kolivas @ 2004-08-11 3:09 UTC (permalink / raw) To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III spaminos-ker@yahoo.com writes: > --- William Lee Irwin III <wli@holomorphy.com> wrote: >> >> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams' >> SPA patches with the do_promotions() function? I suspect these should >> pass your tests. >> >> >> -- wli >> > > I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch > > I only changed the value of /proc/sys/kernel/cpusched/mode to switch between > different patches. > > The 2 threads test passes successfuly (improvement over stock 2.6.7) but none > passed the 20 threads test: Hi I tried this on the latest staircase patch (7.I) and am not getting any output from your script when tested up to 60 threads on my hardware. Can you try this version of staircase please? There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1 http://ck.kolivas.org/patches/2.6/2.6.8/ Cheers, Con ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:09 ` Con Kolivas
@ 2004-08-11 10:24 ` Prakash K. Cheemplavam
  2004-08-11 11:26   ` Scheduler fairness problem on 2.6 series Con Kolivas
  2004-08-12  2:04   ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
  2004-08-12  2:24   ` spaminos-ker
  2 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 10:24 UTC (permalink / raw)
  To: Con Kolivas; +Cc: spaminos-ker, linux-kernel, William Lee Irwin III

Con Kolivas wrote:
| I tried this on the latest staircase patch (7.I) and am not getting any
| output from your script when tested up to 60 threads on my hardware. Can
| you try this version of staircase please?
|
| There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|
| http://ck.kolivas.org/patches/2.6/2.6.8/

Hi,

I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
and compute. Is the compute stuff functional? I tried setting it to 1
within X, and after that X wasn't usable anymore (it looked locked up,
with a frozen/gone mouse cursor even). I managed to switch back to the
console and set it to 0, and all was OK again.

Setting interactive to 0 helped me with running multiple processes
locally using mpi. Nevertheless (only with interactive set to 1 is there
a regression relative to the vanilla scheduler; otherwise it behaves the
same), can't this be enhanced?

Details: I am working on a load-balancing class using mpi. For testing
purposes I am running multiple processes on my machine. So for a given
problem I can say it needs x time to solve. Using more processes on a
single machine, this time (except for communication and balancing
overhead) shouldn't be much larger. Unfortunately it is. E.g. a given
problem using two processes needs about 20 seconds to finish, but using
8 it already needs 47s (55s with interactive set to 1). And no, my
balancing framework is quite good: on a real cluster (small ones, and
larger ones tested up to 128 nodes) the overhead is as low as 3% to 5%,
i.e. it scales quite linearly.

Any idea how to tweak the staircase to get near the 20 seconds with more
processes? Or is this rather a problem of mpich used locally?

If you like I can send you my code to test (beware, it is not that small).

Cheers,

Prakash

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 10:24 ` Prakash K. Cheemplavam
@ 2004-08-11 11:26 ` Con Kolivas
  2004-08-11 12:05   ` Prakash K. Cheemplavam
  0 siblings, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-08-11 11:26 UTC (permalink / raw)
  To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 2465 bytes --]

Prakash K. Cheemplavam wrote:
> Con Kolivas wrote:
> | I tried this on the latest staircase patch (7.I) and am not getting any
> | output from your script when tested up to 60 threads on my hardware. Can
> | you try this version of staircase please?
> |
> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
> |
> | http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Hi,
>
> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
> and compute. Is the compute stuff functional? I tried setting it to 1
> within X and after that X wasn't usable anymore (it looked locked up,
> with a frozen/gone mouse cursor even). I managed to switch back to the
> console and set it to 0 and all was OK again.

Compute is very functional. However it isn't remotely meant to be run on
a desktop because of very large scheduling latencies (on purpose).

> Setting interactive to 0 helped me with running multiple processes
> locally using mpi. Nevertheless (only with interactive set to 1 is there
> a regression relative to the vanilla scheduler; otherwise it behaves the
> same), can't this be enhanced?

I don't understand your question. Can what be enhanced?

> Details: I am working on a load-balancing class using mpi. For testing
> purposes I am running multiple processes on my machine. So for a given
> problem I can say it needs x time to solve. Using more processes on a
> single machine, this time (except for communication and balancing
> overhead) shouldn't be much larger. Unfortunately it is. E.g. a given
> problem using two processes needs about 20 seconds to finish, but using
> 8 it already needs 47s (55s with interactive set to 1). And no, my
> balancing framework is quite good: on a real cluster (small ones, and
> larger ones tested up to 128 nodes) the overhead is as low as 3% to 5%,
> i.e. it scales quite linearly.

Once again I don't quite understand you. Are you saying that there is
more than 50% cpu overhead when running 8 processes? Or that the cpu is
distributed unfairly such that the longest will run for 47s?

> Any idea how to tweak the staircase to get near the 20 seconds with more
> processes? Or is this rather a problem of mpich used locally?

Compute mode is by far the most scalable mode in staircase for purely
computational tasks. The cost is that of interactivity; it is bad on
purpose since it is a no-compromise maximum cpu cache utilisation policy.

> If you like I can send you my code to test (beware, it is not that small).
>
> Cheers,
>
> Prakash

Cheers,
Con

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 11:26 ` Scheduler fairness problem on 2.6 series Con Kolivas
@ 2004-08-11 12:05 ` Prakash K. Cheemplavam
  2004-08-11 19:22   ` Prakash K. Cheemplavam
  0 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 12:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> Con Kolivas wrote:
|> | I tried this on the latest staircase patch (7.I) and am not getting any
|> | output from your script when tested up to 60 threads on my hardware. Can
|> | you try this version of staircase please?
|> |
|> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|> |
|> | http://ck.kolivas.org/patches/2.6/2.6.8/
|>
|> Hi,
|>
|> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
|> and compute. Is the compute stuff functional? I tried setting it to 1
|> within X and after that X wasn't usable anymore (it looked locked up,
|> with a frozen/gone mouse cursor even). I managed to switch back to the
|> console and set it to 0 and all was OK again.
|
| Compute is very functional. However it isn't remotely meant to be run on
| a desktop because of very large scheduling latencies (on purpose).

Uhm, OK, I didn't know it would have such a drastic effect. Perhaps you
should add a warning that this setting shouldn't be used under X. :-)

|> Setting interactive to 0 helped me with running multiple processes
|> locally using mpi. Nevertheless (only with interactive set to 1 is
|> there a regression relative to the vanilla scheduler; otherwise it
|> behaves the same), can't this be enhanced?
|
| I don't understand your question. Can what be enhanced?
|
|> Details: I am working on a load-balancing class using mpi. For testing
|> purposes I am running multiple processes on my machine. So for a given
|> problem I can say it needs x time to solve. Using more processes on a
|> single machine, this time (except for communication and balancing
|> overhead) shouldn't be much larger. Unfortunately it is. E.g. a given
|> problem using two processes needs about 20 seconds to finish, but using
|> 8 it already needs 47s (55s with interactive set to 1). And no, my
|> balancing framework is quite good: on a real cluster (small ones, and
|> larger ones tested up to 128 nodes) the overhead is as low as 3% to 5%,
|> i.e. it scales quite linearly.
|
| Once again I don't quite understand you. Are you saying that there is
| more than 50% cpu overhead when running 8 processes? Or that the cpu is
| distributed unfairly such that the longest will run for 47s?

I don't think it is the overhead. I rather think the way the kernel
scheduler gives mpich and the CPU-bound program resources is unfair. Or
is the timeslice too big?

The 8 processes in my test usually do a load-balancing step after 1
second of work. In that second all of those processes should be using
the CPU at the same time. Instead, I have the impression that the
processes get CPU time one after the other, which fools the load
balancer into thinking the CPU is fast (the job is done in "regular"
time, but the overhead seems to be big, as each process, after having
finished, now waits for the next one to finish and communicate with it).

Or to put it more graphically (with 4 processes consisting of 3 parts
each plus a final communication, just to make it clear; "xy" means
process x, part y, and "xc" means process x communicating):

What is done now:

11 12 13 1c 21 22 23 2c 31 32 33 3c 41 42 43 4c

What the scheduler should rather do:

11 21 31 41 12 22 32 42 13 23 33 43 1c 2c 3c 4c

So the balancer would then find the CPU to be slower by a factor of the
number of processes used, instead of thinking the overhead is big. (I am
not sure whether this really explains the steep increase of time wasted
as more processes are used. Perhaps it really is mpich, though I don't
understand why it would use up so much time. Any way for me to find out?
Via profiling?)

This is just a guess at what I think goes wrong. (Is the timeslice the
scheduler gives each process simply too big?)

hth,

Prakash

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 12:05 ` Prakash K. Cheemplavam
@ 2004-08-11 19:22 ` Prakash K. Cheemplavam
  2004-08-11 23:42 ` Con Kolivas
  0 siblings, 1 reply; 21+ messages in thread

From: Prakash K. Cheemplavam @ 2004-08-11 19:22 UTC (permalink / raw)
  Cc: Con Kolivas, linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

|
| I don't think it is the overhead. I rather think the way the kernel
| scheduler gives mpich and the cpu bound program resources is unfair.

Well, I don't know whether it helps, but I ran a profiler, and these are
the functions that waste the most CPU cycles when running 16 processes
of my example with mpich:

124910  9.8170  vmlinux  tcp_poll
123356  9.6949  vmlinux  sys_select
 85634  6.7302  vmlinux  do_select
 71858  5.6475  vmlinux  sysenter_past_esp
 62093  4.8801  vmlinux  kfree
 51658  4.0600  vmlinux  __copy_to_user_ll
 37495  2.9468  vmlinux  max_select_fd
 36949  2.9039  vmlinux  __kmalloc
 22700  1.7841  vmlinux  __copy_from_user_ll
 14587  1.1464  vmlinux  do_gettimeofday

Is anything scheduler related?

bye,

Prakash

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBGnHxxU2n/+9+t5gRAlF+AJ9z+OqbIJYkeiy4nAPVB22S/WLLnACg1khF
XeF+3Hq0adpoLjdbn+tmzn0=
=7Onu
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 19:22 ` Prakash K. Cheemplavam
@ 2004-08-11 23:42 ` Con Kolivas
  2004-08-12  8:08 ` Prakash K. Cheemplavam
  2004-08-12 18:18 ` Bill Davidsen
  0 siblings, 2 replies; 21+ messages in thread

From: Con Kolivas @ 2004-08-11 23:42 UTC (permalink / raw)
To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

Prakash K. Cheemplavam wrote:
> |
> | I don't think it is the overhead. I rather think the way the kernel
> | scheduler gives mpich and the cpu bound program resources is unfair.
>
> Well, I don't know whether it helps, but I ran a profiler and these are
> the functions which cause so much wasted CPU cycles when running 16
> processes of my example with mpich:
>
> 124910  9.8170  vmlinux  tcp_poll
> 123356  9.6949  vmlinux  sys_select
>  85634  6.7302  vmlinux  do_select
>  71858  5.6475  vmlinux  sysenter_past_esp
>  62093  4.8801  vmlinux  kfree
>  51658  4.0600  vmlinux  __copy_to_user_ll
>  37495  2.9468  vmlinux  max_select_fd
>  36949  2.9039  vmlinux  __kmalloc
>  22700  1.7841  vmlinux  __copy_from_user_ll
>  14587  1.1464  vmlinux  do_gettimeofday
>
> Is anything scheduler related?

No.

It looks like your select timeouts are too short, and when the cpu load
goes up they repeatedly time out, wasting cpu cycles.

I quote from `man select_tut` under the section SELECT LAW:

  1. You should always try to use select without a timeout. Your
     program should have nothing to do if there is no data available.
     Code that depends on timeouts is not usually portable and is
     difficult to debug.

Cheers,
Con

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 23:42 ` Con Kolivas
@ 2004-08-12  8:08 ` Prakash K. Cheemplavam
  2004-08-12 18:18 ` Bill Davidsen
  1 sibling, 0 replies; 21+ messages in thread

From: Prakash K. Cheemplavam @ 2004-08-12 8:08 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> 124910  9.8170  vmlinux  tcp_poll
|> 123356  9.6949  vmlinux  sys_select
|>  85634  6.7302  vmlinux  do_select
|>  71858  5.6475  vmlinux  sysenter_past_esp
|>  62093  4.8801  vmlinux  kfree
|>  51658  4.0600  vmlinux  __copy_to_user_ll
|>  37495  2.9468  vmlinux  max_select_fd
|>  36949  2.9039  vmlinux  __kmalloc
|>  22700  1.7841  vmlinux  __copy_from_user_ll
|>  14587  1.1464  vmlinux  do_gettimeofday
|>
| It looks like your select timeouts are too short and when the cpu load
| goes up they repeatedly time out, wasting cpu cycles.
| I quote from `man select_tut` under the section SELECT LAW:
|
|   1. You should always try to use select without a timeout. Your
|      program should have nothing to do if there is no data available.
|      Code that depends on timeouts is not usually portable and is
|      difficult to debug.

Thanks for your explanation. I cannot do anything about it myself, as it
is mpich related, so I'll ask them whether they could change its
behaviour a bit so that it eats less CPU on a single CPU machine.

Cheers,

Prakash

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBGyV1xU2n/+9+t5gRAqHEAJ9hW/AJYtMenL6mXQ4JZYvTvRrRkgCdHwQD
LbJ1MYJ/pbpNbrT8vvlD8uI=
=9AUE
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 23:42 ` Con Kolivas
  2004-08-12  8:08 ` Prakash K. Cheemplavam
@ 2004-08-12 18:18 ` Bill Davidsen
  1 sibling, 0 replies; 21+ messages in thread

From: Bill Davidsen @ 2004-08-12 18:18 UTC (permalink / raw)
To: linux-kernel

Con Kolivas wrote:
> Prakash K. Cheemplavam wrote:
>
>> |
>> | I don't think it is the overhead. I rather think the way the kernel
>> | scheduler gives mpich and the cpu bound program resources is unfair.
>>
>> Well, I don't know whether it helps, but I ran a profiler and these are
>> the functions which cause so much wasted CPU cycles when running 16
>> processes of my example with mpich:
>>
>> 124910  9.8170  vmlinux  tcp_poll
>> 123356  9.6949  vmlinux  sys_select
>>  85634  6.7302  vmlinux  do_select
>>  71858  5.6475  vmlinux  sysenter_past_esp
>>  62093  4.8801  vmlinux  kfree
>>  51658  4.0600  vmlinux  __copy_to_user_ll
>>  37495  2.9468  vmlinux  max_select_fd
>>  36949  2.9039  vmlinux  __kmalloc
>>  22700  1.7841  vmlinux  __copy_from_user_ll
>>  14587  1.1464  vmlinux  do_gettimeofday
>>
>> Is anything scheduler related?
>
> No
>
> It looks like your select timeouts are too short and when the cpu load
> goes up they repeatedly timeout wasting cpu cycles.
> I quote from `man select_tut` under the section SELECT LAW:
>
>   1. You should always try to use select without a timeout. Your
>      program should have nothing to do if there is no data available.
>      Code that depends on timeouts is not usually portable and
>      difficult to debug.

There's a generalization which is likely to confuse novice users...
correctly used, a timeout IS a debugging technique. Useful to detect
when a peer has gone walkabout, as a common example.

Sounds as if the timeout is way too low here, however. Perhaps they are
using it as poorly-done polling? In any case, not kernel misbehaviour.
--
-bill davidsen (davidsen@tmr.com)
  "The secret to procrastination is to put things off until the
   last possible moment - but no longer" -me

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:09 ` Con Kolivas
  2004-08-11 10:24 ` Prakash K. Cheemplavam
@ 2004-08-12  2:04 ` spaminos-ker
  2004-08-12  2:24 ` spaminos-ker
  2 siblings, 0 replies; 21+ messages in thread

From: spaminos-ker @ 2004-08-12 2:04 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III

--- Con Kolivas <kernel@kolivas.org> wrote:
> Hi
>
> I tried this on the latest staircase patch (7.I) and am not getting any
> output from your script when tested up to 60 threads on my hardware.
> Can you try this version of staircase please?
>
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>
> http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Cheers,
> Con

Just tried on my machine:

2.6.8-rc4 fails all tests (did the test just to be sure).

2.6.8-rc4 with the "from_2.6.8-rc4_to_staircase7.I" patch, and things
look pretty good: on my hardware I could put 60 threads too, my shells
are still very responsive etc., and I get no slowdowns with my watchdog
script.

A few strange things happened though (with 60 threads):

* After a few minutes, I got one message

Wed Aug 11 18:06:11 PDT 2004 >>>>>>> delta = 57

57 seconds!?! Very surprising.

* Shortly after that, I tried to run top or ps, and they all got stuck.
I waited a couple of minutes and they were still stuck. I opened a few
shells; I could do anything except run commands that enumerate the
process list. After a while, I killed the cputest program (Ctrl-C'd
it), and the stuck ps/top continued their execution.

I could not reproduce those problems; I even rebooted the machine, but
only got one delta = 3 message every 30 minutes or so.

Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:09 ` Con Kolivas
  2004-08-11 10:24 ` Prakash K. Cheemplavam
  2004-08-12  2:04 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
@ 2004-08-12  2:24 ` spaminos-ker
  2004-08-12  2:53 ` Con Kolivas
  2 siblings, 1 reply; 21+ messages in thread

From: spaminos-ker @ 2004-08-12 2:24 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III

--- Con Kolivas <kernel@kolivas.org> wrote:
>
> Hi
>
> I tried this on the latest staircase patch (7.I) and am not getting any
> output from your script when tested up to 60 threads on my hardware.
> Can you try this version of staircase please?
>
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>
> http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Cheers,
> Con

One thing to note is that I do get a lot of output from the script if I
set interactive to 0 (delays between 3 and 13 seconds with 60 threads).

Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-12  2:24 ` spaminos-ker
@ 2004-08-12  2:53 ` Con Kolivas
  0 siblings, 0 replies; 21+ messages in thread

From: Con Kolivas @ 2004-08-12 2:53 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III

spaminos-ker@yahoo.com writes:
> --- Con Kolivas <kernel@kolivas.org> wrote:
>>
>> Hi
>>
>> I tried this on the latest staircase patch (7.I) and am not getting any
>> output from your script when tested up to 60 threads on my hardware.
>> Can you try this version of staircase please?
>>
>> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>>
>> http://ck.kolivas.org/patches/2.6/2.6.8/
>>
>> Cheers,
>> Con
>
> One thing to note is that I do get a lot of output from the script if
> I set interactive to 0 (delays between 3 and 13 seconds with 60
> threads).

Sounds fair. With interactive==0 it will penalise tasks during their
bursts of cpu usage in the interest of fairness, and your script is
effectively BASH doing a burst of cpu, so 3-13 second delays when the
load is effectively >60 is pretty good.

Cheers,
Con

^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2004-08-13 1:44 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20040811010116.GL11200@holomorphy.com>
2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-11 2:23 ` William Lee Irwin III
2004-08-11 2:45 ` Peter Williams
2004-08-11 2:47 ` Peter Williams
2004-08-11 3:23 ` Peter Williams
2004-08-11 3:31 ` Con Kolivas
2004-08-11 3:46 ` Peter Williams
2004-08-11 3:44 ` Peter Williams
2004-08-13 0:13 ` spaminos-ker
2004-08-13 1:44 ` Peter Williams
2004-08-11 3:09 ` Con Kolivas
2004-08-11 10:24 ` Prakash K. Cheemplavam
2004-08-11 11:26 ` Scheduler fairness problem on 2.6 series Con Kolivas
2004-08-11 12:05 ` Prakash K. Cheemplavam
2004-08-11 19:22 ` Prakash K. Cheemplavam
2004-08-11 23:42 ` Con Kolivas
2004-08-12 8:08 ` Prakash K. Cheemplavam
2004-08-12 18:18 ` Bill Davidsen
2004-08-12 2:04 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-12 2:24 ` spaminos-ker
2004-08-12 2:53 ` Con Kolivas