* [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
@ 2004-02-16 12:59 Con Kolivas
  2004-02-16 13:20 ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread

From: Con Kolivas @ 2004-02-16 12:59 UTC (permalink / raw)
To: linux kernel mailing list; +Cc: Nick Piggin, Andrew Morton, Cliff White

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here's some nice evidence of the sched domains' patch value:
kernbench 0.20 running on an X440 8x1.5Ghz P4HT (2 node)

Time is in seconds. Lower is better (fixed font table)

Summary:
Kernel:        2.6.3-rc2  2.6.3-rc3-mm1
Half(-j8)          120.8          113.0
Optimal(-j64)       81.6           79.3
Max(-j)             82.9           80.3

Shorter summary:
2.6.3-rc3-mm1 kicks butt

Long-winded summary (look at the massive context switch differences):

results.2.6.3-rc2
Average Half Load Run:
Elapsed Time      120.808
User Time         802.428
System Time        92.072
Percent CPU       740
Context Switches  10613.6
Sleeps            26667

Average Optimum Load Run:
Elapsed Time       81.59
User Time        1007.89
System Time       112.36
Percent CPU      1372.6
Context Switches  63006.2
Sleeps            40406

Average Maximum Load Run:
Elapsed Time       82.944
User Time        1012.33
System Time       122.424
Percent CPU      1367.6
Context Switches  44822.2
Sleeps            22161

results.2.6.3-rc3-mm1:
Average Half Load Run:
Elapsed Time      113.008
User Time         742.786
System Time        90.65
Percent CPU       738
Context Switches  28062.6
Sleeps            24571.8

Average Optimum Load Run:
Elapsed Time       79.278
User Time        1007.69
System Time       107.388
Percent CPU      1407
Context Switches  33355
Sleeps            32720

Average Maximum Load Run:
Elapsed Time       80.33
User Time        1009.89
System Time       121.518
Percent CPU      1408.4
Context Switches  31802.4
Sleeps            22905

Con

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAML6+ZUg7+tp6mRURAiH3AJ0bNPTLxNncxVoT1LivWhe4sXrAyQCeJzXw
6IsBVGzd4yJpR9eW3gZYBPM=
=tDlR
-----END PGP SIGNATURE-----

^ permalink raw reply [flat|nested] 8+ messages in thread
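[Editor's note] The summary table above can be turned into relative improvements with a few lines of arithmetic. The snippet below is a hypothetical helper, not part of kernbench itself; the numbers are the averaged elapsed times from Con's table, and lower is better.

```python
# Hypothetical helper (not part of kernbench): derive the relative
# improvement of 2.6.3-rc3-mm1 over 2.6.3-rc2 from the elapsed-time
# summary in the mail above. Values are (rc2, mm1) elapsed seconds.
summary = {
    "Half(-j8)":     (120.8, 113.0),
    "Optimal(-j64)": (81.6, 79.3),
    "Max(-j)":       (82.9, 80.3),
}

def improvement_pct(baseline: float, patched: float) -> float:
    """Percentage reduction in elapsed time relative to the baseline."""
    return 100.0 * (baseline - patched) / baseline

for load, (rc2, mm1) in summary.items():
    # Half(-j8) works out to roughly a 6.5% improvement.
    print(f"{load}: {improvement_pct(rc2, mm1):.1f}% faster")
```

The half-load run shows the largest gain, which fits the thread's later observation that good scheduler tuning matters most when there is idle capacity to waste.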
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-16 12:59 [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench Con Kolivas
@ 2004-02-16 13:20 ` Nick Piggin
  2004-02-16 14:30   ` Con Kolivas
  0 siblings, 1 reply; 8+ messages in thread

From: Nick Piggin @ 2004-02-16 13:20 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list, Andrew Morton, Cliff White

Con Kolivas wrote:

>Here's some nice evidence of the sched domains' patch value:
>kernbench 0.20 running on an X440 8x1.5Ghz P4HT (2 node)
>
>Time is in seconds. Lower is better (fixed font table)
>
>Summary:
>Kernel:        2.6.3-rc2  2.6.3-rc3-mm1
>Half(-j8)          120.8          113.0
>Optimal(-j64)       81.6           79.3
>Max(-j)             82.9           80.3
>
>shorter summary:
>2.6.3-rc3-mm1 kicks butt

Thanks Con,
Results look pretty good. The half-load context switches are
increased - that is probably a result of active balancing.
And speaking of active balancing, it is not yet working across
nodes with the configuration you're on.

To get some idea of our worst case SMT performance (-j8), would
it be possible to do -j8 and -j64 runs with HT turned off?
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-16 13:20 ` Nick Piggin
@ 2004-02-16 14:30   ` Con Kolivas
  2004-02-17  0:42     ` bill davidsen
  0 siblings, 1 reply; 8+ messages in thread

From: Con Kolivas @ 2004-02-16 14:30 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux kernel mailing list

On Tue, 17 Feb 2004 00:20, Nick Piggin wrote:
> Con Kolivas wrote:
> >Here's some nice evidence of the sched domains' patch value:
> >kernbench 0.20 running on an X440 8x1.5Ghz P4HT (2 node)
> >
> >Time is in seconds. Lower is better (fixed font table)
> >
> >Summary:
> >Kernel:        2.6.3-rc2  2.6.3-rc3-mm1
> >Half(-j8)          120.8          113.0
> >Optimal(-j64)       81.6           79.3
> >Max(-j)             82.9           80.3
> >
> >shorter summary:
> >2.6.3-rc3-mm1 kicks butt
>
> Thanks Con,
> Results look pretty good. The half-load context switches are
> increased - that is probably a result of active balancing.
> And speaking of active balancing, it is not yet working across
> nodes with the configuration you're on.
>
> To get some idea of our worst case SMT performance (-j8), would
> it be possible to do -j8 and -j64 runs with HT turned off?

Sure.

results.2.6.3-rc3-mm1 + SMT:
Average Half Load Run:
Elapsed Time      113.008
User Time         742.786
System Time        90.65
Percent CPU       738
Context Switches  28062.6
Sleeps            24571.8

Average Optimum Load Run:
Elapsed Time       79.278
User Time        1007.69
System Time       107.388
Percent CPU      1407
Context Switches  33355
Sleeps            32720

2.6.3-rc3-mm1 no SMT:
Average Half Load Run:
Elapsed Time      133.51
User Time         799.268
System Time        92.784
Percent CPU       669
Context Switches  19340.8
Sleeps            24427.4

Average Optimum Load Run:
Elapsed Time       81.486
User Time        1006.37
System Time       106.952
Percent CPU      1366.8
Context Switches  33939
Sleeps            32453.4

Con
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-16 14:30 ` Con Kolivas
@ 2004-02-17  0:42   ` bill davidsen
  2004-02-17  4:22     ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread

From: bill davidsen @ 2004-02-17 0:42 UTC (permalink / raw)
To: linux-kernel

In article <200402170130.24070.kernel@kolivas.org>,
Con Kolivas <kernel@kolivas.org> wrote:
| On Tue, 17 Feb 2004 00:20, Nick Piggin wrote:
| >
| > Thanks Con,
| > Results look pretty good. The half-load context switches are
| > increased - that is probably a result of active balancing.
| > And speaking of active balancing, it is not yet working across
| > nodes with the configuration you're on.
| >
| > To get some idea of our worst case SMT performance (-j8), would
| > it be possible to do -j8 and -j64 runs with HT turned off?
|
| sure.

Now, I have a problem with the numbers here: either I don't understand
them (likely) or I don't believe them (also possible).

| results.2.6.3-rc3-mm1 + SMT:
| Average Half Load Run:
| Elapsed Time      113.008
| User Time         742.786
| System Time        90.65
| Percent CPU       738
| Context Switches  28062.6
| Sleeps            24571.8

| 2.6.3-rc3-mm1 no SMT:
| Average Half Load Run:
| Elapsed Time      133.51
| User Time         799.268
| System Time        92.784
| Percent CPU       669
| Context Switches  19340.8
| Sleeps            24427.4

As I look at these numbers, I see that with SMT the real time is lower,
the context switches are higher, and the percent CPU is higher. All of
which I would expect, since effectively the system has twice as many CPUs.

But the user time, there I have a problem understanding. User time is
time in the user program, and I would expect user time to go up, since
resource contention inside a CPU is likely to mean less work being done
per unit of time, and therefore if you measure CPU time from the outside
you need more of it to get the job done.

And what do you get running on one non-SMT CPU for the same mix? When I
run stuff I usually see the user CPU go up a tad and the e.t. go down a
little (SMT) or quite a bit (SMP).

I have faith in the reporting of the numbers, but I wonder about the way
the data were measured. Hopefully someone can clarify, because it looks
a little like what you would see if you counted "one" for one tick's worth
of user mode time in the CPU, regardless of whether one or two threads
were executing.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
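[Editor's note] Bill's accounting hypothesis can be sketched numerically. The toy model below is one reading of his suggestion (charging one tick per physical CPU per timer interrupt, rather than one per busy sibling thread); it is purely illustrative and is not actual kernel accounting code.

```python
# Toy model of tick-based CPU time accounting (illustrative only; an
# interpretation of Bill's hypothesis, not real kernel code).
TICKS = 1000  # timer interrupts during the run

def per_thread_accounting(siblings_busy: int) -> int:
    """Charge every busy sibling thread a full tick: with two HT
    siblings running, accounted user time roughly doubles for the
    same wall time."""
    return TICKS * siblings_busy

def per_core_accounting(siblings_busy: int) -> int:
    """Bill's hypothesis: charge one tick per physical core per timer
    interrupt, regardless of whether one or two sibling threads ran."""
    return TICKS if siblings_busy > 0 else 0

# If accounting were per-core, user time would look identical whether
# one or two siblings were busy, hiding SMT contention entirely --
# which would explain user time failing to rise in the SMT runs.
print(per_thread_accounting(2), per_core_accounting(2))  # 2000 1000
```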
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-17  0:42 ` bill davidsen
@ 2004-02-17  4:22   ` Nick Piggin
  2004-02-17 16:19     ` Bill Davidsen
  0 siblings, 1 reply; 8+ messages in thread

From: Nick Piggin @ 2004-02-17 4:22 UTC (permalink / raw)
To: bill davidsen; +Cc: linux-kernel, Con Kolivas

bill davidsen wrote:

>In article <200402170130.24070.kernel@kolivas.org>,
>Con Kolivas <kernel@kolivas.org> wrote:
>| On Tue, 17 Feb 2004 00:20, Nick Piggin wrote:
>| >
>| > Thanks Con,
>| > Results look pretty good. The half-load context switches are
>| > increased - that is probably a result of active balancing.
>| > And speaking of active balancing, it is not yet working across
>| > nodes with the configuration you're on.
>| >
>| > To get some idea of our worst case SMT performance (-j8), would
>| > it be possible to do -j8 and -j64 runs with HT turned off?
>|
>| sure.
>
>Now, I have a problem with the numbers here: either I don't understand
>them (likely) or I don't believe them (also possible).
>
>| results.2.6.3-rc3-mm1 + SMT:
>| Average Half Load Run:
>| Elapsed Time      113.008
>| User Time         742.786
>| System Time        90.65
>| Percent CPU       738
>| Context Switches  28062.6
>| Sleeps            24571.8
>
>| 2.6.3-rc3-mm1 no SMT:
>| Average Half Load Run:
>| Elapsed Time      133.51
>| User Time         799.268
>| System Time        92.784
>| Percent CPU       669
>| Context Switches  19340.8
>| Sleeps            24427.4
>
>As I look at these numbers, I see that with SMT the real time is lower,
>the context switches are higher, and the percent CPU is higher. All of
>which I would expect, since effectively the system has twice as many CPUs.
>
>But the user time, there I have a problem understanding. User time is
>time in the user program, and I would expect user time to go up, since
>resource contention inside a CPU is likely to mean less work being done
>per unit of time, and therefore if you measure CPU time from the outside
>you need more of it to get the job done.
>
>And what do you get running on one non-SMT CPU for the same mix? When I
>run stuff I usually see the user CPU go up a tad and the e.t. go down a
>little (SMT) or quite a bit (SMP).
>
>I have faith in the reporting of the numbers, but I wonder about the way
>the data were measured. Hopefully someone can clarify, because it looks
>a little like what you would see if you counted "one" for one tick's worth
>of user mode time in the CPU, regardless of whether one or two threads
>were executing.

Bill, I have CC'ed your message without modification because Con is
not subscribed to the list. Even for people who are subscribed, the
convention on lkml is to reply to all.

Anyway, the "no SMT" run is with CONFIG_SCHED_SMT turned off; P4 HT
is still on. This was my fault because I didn't specify clearly that
I wanted to see a run with hardware HT turned off, although these
numbers are still interesting.

Con hasn't tried HT off AFAIK because we couldn't work out how to
turn it off at boot time! :(
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-17  4:22 ` Nick Piggin
@ 2004-02-17 16:19   ` Bill Davidsen
  2004-02-17 16:45     ` Nick Piggin
  0 siblings, 1 reply; 8+ messages in thread

From: Bill Davidsen @ 2004-02-17 16:19 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-kernel, Con Kolivas

On Tue, 17 Feb 2004, Nick Piggin wrote:

> Bill, I have CC'ed your message without modification because Con is
> not subscribed to the list. Even for people who are subscribed, the
> convention on lkml is to reply to all.
>
> Anyway, the "no SMT" run is with CONFIG_SCHED_SMT turned off, P4 HT
> is still on. This was my fault because I didn't specify clearly that
> I wanted to see a run with hardware HT turned off, although these
> numbers are still interesting.
>
> Con hasn't tried HT off AFAIK because we couldn't work out how to
> turn it off at boot time! :(

The curse of the brain-dead BIOS :-(

So does CONFIG_SCHED_SMT turned off mean not using more than one sibling
per package, or just going back to using them poorly? Yes, I should go
root through the code.

Clearly it would be good to get one more data point with HT off in the
BIOS, but from this data it looks as if the SMT stuff really helps little
when the system is very heavily loaded (Nproc >= Nsibs), and does best
when the load is around Nproc == Ncpu. At least, that is how I read the
data. The really interesting data would be the -j64 load without HT,
using both schedulers.

I just got done looking at a mail server with HT that kept the load avg
at 40-70 for a week. Speaks highly for the stability of RHEL-3.0, but I
wouldn't mind a little more performance for free.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-17 16:19 ` Bill Davidsen
@ 2004-02-17 16:45   ` Nick Piggin
  2004-02-18  0:25     ` Con Kolivas
  0 siblings, 1 reply; 8+ messages in thread

From: Nick Piggin @ 2004-02-17 16:45 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-kernel, Con Kolivas

Bill Davidsen wrote:

>On Tue, 17 Feb 2004, Nick Piggin wrote:
>
>>Bill, I have CC'ed your message without modification because Con is
>>not subscribed to the list. Even for people who are subscribed, the
>>convention on lkml is to reply to all.
>>
>>Anyway, the "no SMT" run is with CONFIG_SCHED_SMT turned off, P4 HT
>>is still on. This was my fault because I didn't specify clearly that
>>I wanted to see a run with hardware HT turned off, although these
>>numbers are still interesting.
>>
>>Con hasn't tried HT off AFAIK because we couldn't work out how to
>>turn it off at boot time! :(
>
>The curse of the brain-dead BIOS :-(
>
>So does CONFIG_SCHED_SMT turned off mean not using more than one sibling
>per package, or just going back to using them poorly? Yes, I should go
>root through the code.

It just goes back to treating them the same as physical CPUs.
The option will eventually be removed.

>Clearly it would be good to get one more data point with HT off in BIOS,
>but from this data it looks as if the SMT stuff really helps little when
>the system is very heavily loaded (Nproc>=Nsibs), and does best when the
>load is around Nproc==Ncpu. At least as I read the data. The really
>interesting data would be the -j64 load without HT, using both schedulers.

The biggest problems with SMT happen when 1 < Nproc < Nsibs,
because every two processes that end up on one physical CPU
leave one physical CPU idle, and the non-HT scheduler can't
detect or correct this.

At higher numbers of processes, you fill all virtual CPUs,
so physical CPUs don't become idle. You can still be smarter
about cache and migration costs though.

>I just got done looking at a mail server with HT, kept the load avg 40-70
>for a week. Speaks highly for the stability of RHEL-3.0, but I wouldn't
>mind a little more performance for free.

Not sure if they have any sort of HT-aware scheduler or not.
If they do, it is probably a shared-runqueues type, which is
much the same as sched domains in terms of functionality.
I don't think it would help much here though.
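[Editor's note] The idle-sibling problem Nick describes (1 < Nproc < Nsibs) can be illustrated with a toy placement model. This is a hypothetical sketch, not the sched-domains code; it assumes virtual CPUs are enumerated sibling-adjacent (vcpus 0 and 1 on core 0, and so on), which is one common layout.

```python
# Toy model (not the sched-domains code): place N runnable tasks on
# 4 physical cores x 2 HT siblings and count idle physical cores.
SIBLINGS = 2
CORES = 4

def naive_placement(nproc: int) -> int:
    """HT-unaware: fill virtual CPUs in enumeration order, so both
    siblings of core 0 get work before core 1 gets any. Returns the
    number of physical cores left completely idle."""
    busy_cores = set()
    for task in range(min(nproc, CORES * SIBLINGS)):
        busy_cores.add(task // SIBLINGS)  # vcpus 0,1 -> core 0; 2,3 -> core 1 ...
    return CORES - len(busy_cores)

def smt_aware_placement(nproc: int) -> int:
    """HT-aware: spread one task per physical core before doubling up."""
    return CORES - min(nproc, CORES)

# 4 tasks on 4 cores: naive placement wastes 2 whole cores, HT-aware
# placement wastes none. With 8+ tasks all virtual CPUs fill up and
# the two policies leave the same number of cores idle (zero).
print(naive_placement(4), smt_aware_placement(4))  # 2 0
print(naive_placement(8), smt_aware_placement(8))  # 0 0
```

This matches the observation above: the benefit shows up at partial load, while at high load the remaining wins come from cache and migration awareness rather than avoiding idle cores.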
* Re: [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench
  2004-02-17 16:45 ` Nick Piggin
@ 2004-02-18  0:25   ` Con Kolivas
  0 siblings, 0 replies; 8+ messages in thread

From: Con Kolivas @ 2004-02-18 0:25 UTC (permalink / raw)
To: Nick Piggin; +Cc: Bill Davidsen, linux-kernel

Quoting Nick Piggin <piggin@cyberone.com.au>:
> Bill Davidsen wrote:
> >On Tue, 17 Feb 2004, Nick Piggin wrote:
> >>Con hasn't tried HT off AFAIK because we couldn't work out how to
> >>turn it off at boot time! :(
> >The curse of the brain-dead BIOS :-(

Err, not quite; it's an X440 that is 10000 miles away, so I can't really
access the BIOS easily :P

> >So does CONFIG_SCHED_SMT turned off mean not using more than one sibling
> >per package, or just going back to using them poorly? Yes, I should go
> >root through the code.
>
> It just goes back to treating them the same as physical CPUs.
> The option will be eventually removed.
>
> >Clearly it would be good to get one more data point with HT off in BIOS,
> >but from this data it looks as if the SMT stuff really helps little when
> >the system is very heavily loaded (Nproc>=Nsibs), and does best when the
> >load is around Nproc==Ncpu. At least as I read the data.

Actually, that machine is 8 packages, 16 logical cpus, and kernbench half
load by default is set to cpus / 2. The idea behind that load is to
minimise wasted idle time, so this is where good tuning should show the
most bonus, as it does.

> >The really interesting data would be the -j64 load without HT, using
> >both schedulers.
>
> The biggest problems with SMT happen when 1 < Nproc < Nsibs,
> because every two processes that end up on one physical CPU
> leaves one physical CPU idle, and the non HT scheduler can't
> detect or correct this.
>
> At higher numbers of processes, you fill all virtual CPUs,
> so physical CPUs don't become idle. You can still be smarter
> about cache and migration costs though.
>
> >I just got done looking at a mail server with HT, kept the load avg 40-70
> >for a week. Speaks highly for the stability of RHEL-3.0, but I wouldn't
> >mind a little more performance for free.
>
> Not sure if they have any sort of HT aware scheduler or not.
> If they do it is probably a shared runqueues type which is
> much the same as sched domains in terms of functionality.
> I don't think it would help much here though.

I think any bonus on the optimal and max loads in kernbench is remarkable,
since it's usually easy to keep all cpus busy when the load is 4 x num_cpus
or higher. The fact that sched_domains shows a decent percentage bonus even
with these loads speaks volumes about the effectiveness of this patch.

Con
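[Editor's note] Con's description of the load levels can be sketched as follows. The relations half = cpus / 2 and optimal = 4 x cpus are inferred from his remark and the -j8 / -j64 figures on this 16-logical-CPU machine; the actual kernbench 0.20 script may compute them differently.

```python
# Sketch of how kernbench 0.20 appears to pick its `make -j` levels,
# inferred from this thread (half = cpus/2, optimal = 4*cpus); the
# real script may differ.

def kernbench_jobs(ncpus: int) -> dict:
    return {
        "half":    max(1, ncpus // 2),  # minimise wasted idle time
        "optimal": 4 * ncpus,           # keep every cpu saturated
        "max":     None,                # plain `make -j`, unbounded
    }

# The X440 in this thread: 8 packages x 2 HT siblings = 16 logical cpus,
# giving the -j8 (half) and -j64 (optimal) runs reported above.
print(kernbench_jobs(16))
```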
end of thread, other threads:[~2004-02-18 0:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-16 12:59 [BENCHMARK] 2.6.3-rc2 v 2.6.3-rc3-mm1 kernbench Con Kolivas
2004-02-16 13:20 ` Nick Piggin
2004-02-16 14:30   ` Con Kolivas
2004-02-17  0:42     ` bill davidsen
2004-02-17  4:22       ` Nick Piggin
2004-02-17 16:19         ` Bill Davidsen
2004-02-17 16:45           ` Nick Piggin
2004-02-18  0:25             ` Con Kolivas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox