* aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-26 9:43 UTC
To: LKML; +Cc: mingo

I tested 2.6.24-rc1 on my x86_64 machine, which has 2 quad-core processors.

Compared with 2.6.23, aim7 has about a 30% regression. I did a bisect and found
that patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
caused the issue.

kbuild/SPECjbb2000/SPECjbb2005 also have big regressions. On another of my
machines, a tigerton (4 quad-core processors), SPECjbb2005 has a regression of
more than 40%. I didn't bisect those benchmarks, but I suspect the root cause
is the same as aim7's.

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Peter Zijlstra @ 2007-10-26 9:53 UTC
To: Zhang, Yanmin; +Cc: LKML, mingo

On Fri, 2007-10-26 at 17:43 +0800, Zhang, Yanmin wrote:
> I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
>
> Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect and found
> patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> caused the issue.

Bit weird that you point to a merge commit, and not an actual patch. Are
you sure git bisect pointed at this one?
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-29 0:15 UTC
To: Peter Zijlstra; +Cc: LKML, mingo

On Fri, 2007-10-26 at 11:53 +0200, Peter Zijlstra wrote:
> On Fri, 2007-10-26 at 17:43 +0800, Zhang, Yanmin wrote:
> > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> >
> > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect and found
> > patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > caused the issue.
>
> Bit weird that you point to a merge commit, and not an actual patch. Are
> you sure git bisect pointed at this one?
When I did the bisect, the kernel couldn't boot, and my testing log showed it
was at b5869ce7f68b233ceb81465a7644be0d9a5f3dbb. So I did a manual checkout:

#git clone ...
#git pull ...
#git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb

Then I compiled the kernel and tested it. Then I reversed the above patch and
recompiled/retested it.

When I run git log, I can see this tag in the list.

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Ingo Molnar @ 2007-10-26 11:23 UTC
To: Zhang, Yanmin; +Cc: LKML, Peter Zijlstra

* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
>
> Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> and found patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> caused the issue.

weird, that's a commit diff - i.e. it changes no code.

> kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> -40% regression. I didn't do a bisect on such benchmark testing, but I
> suspect the root cause is like aim7's.

these two commits might be relevant:

7a6c6bcee029a978f866511d6e41dbc7301fde4c
95dbb421d12fdd9796ed153853daf3679809274f

but a bisection result would be the best info.

	Ingo
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-29 2:22 UTC
To: Ingo Molnar; +Cc: LKML, Peter Zijlstra

On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
>
> > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> >
> > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > and found patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > caused the issue.
>
> weird, that's a commit diff - i.e. it changes no code.
I got the tag from #git log. As for the above link, I just prepended the http
address so readers could check the patch by clicking.

>
> > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > suspect the root cause is like aim7's.
>
> these two commits might be relevant:
>
> 7a6c6bcee029a978f866511d6e41dbc7301fde4c
I did a quick test. This patch has no impact.

> 95dbb421d12fdd9796ed153853daf3679809274f
The big patch above doesn't include this one, which means that if I do
'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't
include 95dbb421d12fdd9796ed153853daf3679809274f.

>
> but a bisection result would be the best info.
I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-29 9:37 UTC
To: Ingo Molnar; +Cc: LKML, Peter Zijlstra

On Mon, 2007-10-29 at 10:22 +0800, Zhang, Yanmin wrote:
> On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> >
> > > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> > >
> > > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > > and found patch
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > > caused the issue.
> >
> > weird, that's a commit diff - i.e. it changes no code.
> I got the tag from #git log. As for above link, I just added prior http address,
> so readers could check the patch by clicking.
>
> >
> > > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > > suspect the root cause is like aim7's.
> >
> > these two commits might be relevant:
> >
> > 7a6c6bcee029a978f866511d6e41dbc7301fde4c
> I did a quick testing. This patch has no impact.
>
> > 95dbb421d12fdd9796ed153853daf3679809274f
> Above big patch doesn't include this one, which means if I do
> 'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't include
> 95dbb421d12fdd9796ed153853daf3679809274f.
>
> >
> > but a bisection result would be the best info.
> I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.
I ran git bisect with the kernel version as the tag. It looks like git goes
crazy sometimes, so I checked the ChangeLog, used the number tag in place of
the kernel version, and retested.

It looks like at least 2 patches were responsible for the regression. I'm
doing a sub-bisect now.

I can see the aim7 regression on all my testing machines, although the
regression percentage differs.

Machine                                 regression
8-core stoakley                         30%
16-core tigerton                        6%
tulsa(dual-core+HT, 16 logical cpu)     20%

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-30 2:12 UTC
To: Ingo Molnar; +Cc: LKML, Peter Zijlstra

On Mon, 2007-10-29 at 17:37 +0800, Zhang, Yanmin wrote:
> On Mon, 2007-10-29 at 10:22 +0800, Zhang, Yanmin wrote:
> > On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > >
> > > > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> > > >
> > > > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > > > and found patch
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > > > caused the issue.
> > >
> > > weird, that's a commit diff - i.e. it changes no code.
> > I got the tag from #git log. As for above link, I just added prior http address,
> > so readers could check the patch by clicking.
> >
> > >
> > > > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > > > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > > > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > > > suspect the root cause is like aim7's.
> > >
> > > these two commits might be relevant:
> > >
> > > 7a6c6bcee029a978f866511d6e41dbc7301fde4c
> > I did a quick testing. This patch has no impact.
> >
> > > 95dbb421d12fdd9796ed153853daf3679809274f
> > Above big patch doesn't include this one, which means if I do
> > 'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't include
> > 95dbb421d12fdd9796ed153853daf3679809274f.
> >
> > >
> > > but a bisection result would be the best info.
> > I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.
> I ran git bisect with kernel version as the tag. It looks like git will
> be crazy sometimes. So I checked ChangeLog and used the number tag to replace
> the kernel version and retested it.
>
> It looks like at least 2 patches were responsible for the regression. I'm
> doing sub-bisect now.
>
> I could find aim7 regression on all my testing machines although the regression
> percentage is different.
>
> Machine                                 regression
> 8-core stoakley                         30%
> 16-core tigerton                        6%
> tulsa(dual-core+HT, 16 logical cpu)     20%
Sub-bisecting found that patch
38ad464d410dadceda1563f36bdb0be7fe4c8938 (sched: uniform tunings)
caused a 20% regression of aim7.

The remaining 10% should also be related to sched parameters, such as
sysctl_sched_min_granularity.

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Ingo Molnar @ 2007-10-30 7:26 UTC
To: Zhang, Yanmin; +Cc: LKML, Peter Zijlstra

* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> sub-bisecting captured patch
> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> caused 20% regression of aim7.
>
> The last 10% should be also related to sched parameters, such like
> sysctl_sched_min_granularity.

ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
please try to figure out what the best value for
/proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
/proc/sys/kernel_sched_min_granularity is?

there's a tuning constraint for kernel_sched_nr_latency:

- kernel_sched_nr_latency should always be set to
  kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
  tunable)

i suspect a good approach would be to double the value of
kernel_sched_latency and kernel_sched_nr_latency in each tuning
iteration, while keeping kernel_sched_min_granularity unchanged. That
will exercise the tuning values of the 2.6.23 kernel as well.

	Ingo
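A minimal user-space sketch of the sweep Ingo describes: double the latency and nr_latency together so that their ratio, the implied minimum granularity, stays fixed. The struct and function names are hypothetical and not kernel code. The starting values 20000000 ns and 20 used in the note below are the defaults that appear in Peter's later patch in this thread; the values on a given machine may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One candidate setting for the three related scheduler knobs (hypothetical type). */
struct sched_tuning {
	uint64_t latency_ns;         /* value to write to sched_latency_ns */
	uint64_t nr_latency;         /* value to write to sched_nr_latency */
	uint64_t min_granularity_ns; /* implied: latency_ns / nr_latency */
};

/*
 * Fill out[0..steps-1] with settings in which latency and nr_latency are
 * doubled at every step, so latency / nr_latency never changes.
 * Returns the number of entries written, or 0 if nr_latency is 0.
 */
static size_t build_sweep(uint64_t latency_ns, uint64_t nr_latency,
			  struct sched_tuning *out, size_t steps)
{
	size_t i;

	assert(out != NULL);
	if (nr_latency == 0)
		return 0; /* the implied granularity would be undefined */

	for (i = 0; i < steps; i++) {
		out[i].latency_ns = latency_ns;
		out[i].nr_latency = nr_latency;
		out[i].min_granularity_ns = latency_ns / nr_latency;
		latency_ns *= 2;
		nr_latency *= 2;
	}
	return i;
}

/* Print one line per setting, ready to be copied into a tuning script. */
static void print_sweep(const struct sched_tuning *t, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		printf("sched_latency_ns=%" PRIu64 " sched_nr_latency=%" PRIu64
		       " (min granularity %" PRIu64 " ns)\n",
		       t[i].latency_ns, t[i].nr_latency,
		       t[i].min_granularity_ns);
}
```

Calling build_sweep(20000000, 20, out, 6) ends on 640000000 ns and 640, the pair Yanmin reports testing later in the thread, with the implied granularity held at 1000000 ns throughout.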
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-30 8:36 UTC
To: Ingo Molnar; +Cc: LKML, Peter Zijlstra

On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
>
> > sub-bisecting captured patch
> > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > caused 20% regression of aim7.
> >
> > The last 10% should be also related to sched parameters, such like
> > sysctl_sched_min_granularity.
>
> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> please try to figure out what the best value for
> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> /proc/sys/kernel_sched_min_granularity is?
>
> there's a tuning constraint for kernel_sched_nr_latency:
>
> - kernel_sched_nr_latency should always be set to
> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> tunable)
>
> i suspect a good approach would be to double the value of
> kernel_sched_latency and kernel_sched_nr_latency in each tuning
> iteration, while keeping kernel_sched_min_granularity unchanged. That
> will exercise the tuning values of the 2.6.23 kernel as well.
I followed your idea to test 2.6.24-rc1. The improvement is slow.
With sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
is still about 15% less than 2.6.23.

-yanmin
* Re: aim7 -30% regression in 2.6.24-rc1
From: Zhang, Yanmin @ 2007-10-31 9:57 UTC
To: Ingo Molnar; +Cc: LKML, Peter Zijlstra

On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> >
> > > sub-bisecting captured patch
> > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > > caused 20% regression of aim7.
> > >
> > > The last 10% should be also related to sched parameters, such like
> > > sysctl_sched_min_granularity.
> >
> > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> > please try to figure out what the best value for
> > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> > /proc/sys/kernel_sched_min_granularity is?
> >
> > there's a tuning constraint for kernel_sched_nr_latency:
> >
> > - kernel_sched_nr_latency should always be set to
> > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> > tunable)
> >
> > i suspect a good approach would be to double the value of
> > kernel_sched_latency and kernel_sched_nr_latency in each tuning
> > iteration, while keeping kernel_sched_min_granularity unchanged. That
> > will exercise the tuning values of the 2.6.23 kernel as well.
> I followed your idea to test 2.6.24-rc1. The improvement is slow.
> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> is still about 15% less than 2.6.23.

I got the aim7 30% regression on my newly upgraded stoakley machine. I found
this machine is slower than the old one. Maybe the BIOS has issues, or the
memory (might not be dual-channel?) is slow. So I retested on the old stoakley
machine and found the regression there is about 6%, quite similar to the
regression on the tigerton machine.

With sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley
machine, the regression becomes about 2%. Other latency values show more
regression.

On my tulsa machine, with sched_nr_latency=640 and sched_latency_ns=640000000,
the regression becomes less than 1% (the original regression is about 20%).

When I ran a bad script to change the values of sched_nr_latency and
sched_latency_ns, I hit an OOPS on my tulsa machine. Below is the log. It
looks like sched_nr_latency becomes 0.

*******************Log************************************
divide error: 0000 [1] SMP
CPU 1
Modules linked in: megaraid_mbox megaraid_mm
Pid: 7326, comm: sh Not tainted 2.6.24-rc1 #2
RIP: 0010:[<ffffffff8022c2bf>] [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP: 0018:ffff810105909e38 EFLAGS: 00010046
RAX: 000000005a000000 RBX: 0000000000000000 RCX: 000000002d000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff810105909e40 R08: ffff810103bfed50 R09: 00000000ffffffff
R10: 0000000000000038 R11: 0000000000000296 R12: ffff810100d6db40
R13: ffff8101058c4148 R14: 0000000000000001 R15: ffff810104c34088
FS: 00002b851bc59f50(0000) GS:ffff810100cb1b40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c64d8 CR3: 000000010752c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 7326, threadinfo ffff810105908000, task ffff810104c34040)
Stack: 0000000000000800 ffff810105909e58 ffffffff8022c2db 00000000079d292b
 ffff810105909e88 ffffffff8022c36e ffff810100d6db40 ffff8101058c4148
 ffff8101058c4100 0000000000000001 ffff810105909ec8 ffffffff80232d0a
Call Trace:
 [<ffffffff8022c2db>] __sched_vslice+0x10/0x1d
 [<ffffffff8022c36e>] place_entity+0x86/0xc3
 [<ffffffff80232d0a>] task_new_fair+0x48/0xa5
 [<ffffffff8020b63e>] system_call+0x7e/0x83
 [<ffffffff80233325>] wake_up_new_task+0x70/0xa4
 [<ffffffff80235612>] do_fork+0x137/0x204
 [<ffffffff802818bd>] vfs_write+0x121/0x136
 [<ffffffff8023f017>] recalc_sigpending+0xe/0x25
 [<ffffffff8023f0ef>] sigprocmask+0x9e/0xc0
 [<ffffffff8020b957>] ptregscall_common+0x67/0xb0

Code: 48 f7 f3 48 89 c1 5b c9 48 89 c8 c3 55 48 89 e5 53 48 89 fb
RIP [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
 RSP <ffff810105909e38>
divide error: 0000 [2] SMP
CPU 0
Modules linked in: megaraid_mbox megaraid_mm
Pid: 3674, comm: automount Tainted: G D 2.6.24-rc1 #2
RIP: 0010:[<ffffffff8022c2bf>] [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP: 0018:ffff81010690de38 EFLAGS: 00010046
RAX: 000000005a000000 RBX: 0000000000000000 RCX: 000000002d000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff81010690de40 R08: ffff81010690c000 R09: 00000000ffffffff
R10: 0000000000000038 R11: ffff810104007040 R12: ffff810001033880
R13: ffff810100f2a828 R14: 0000000000000001 R15: ffff810104007088
FS: 0000000040021950(0063) GS:ffffffff8074e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b6cc4245000 CR3: 0000000105972000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process automount (pid: 3674, threadinfo ffff81010690c000, task ffff810104007040)
Stack: 0000000000000800 ffff81010690de58 ffffffff8022c2db 000000057aef240d
 ffff81010690de88 ffffffff8022c36e ffff810001033880 ffff810100f2a828
 ffff810100f2a7e0 0000000000000000 ffff81010690dec8 ffffffff80232d0a
Call Trace:
 [<ffffffff8022c2db>] __sched_vslice+0x10/0x1d
 [<ffffffff8022c36e>] place_entity+0x86/0xc3
 [<ffffffff80232d0a>] task_new_fair+0x48/0xa5
 [<ffffffff8020b63e>] system_call+0x7e/0x83
 [<ffffffff80233325>] wake_up_new_task+0x70/0xa4
 [<ffffffff80235612>] do_fork+0x137/0x204
 [<ffffffff8020b957>] ptregscall_common+0x67/0xb0

Code: 48 f7 f3 48 89 c1 5b c9 48 89 c8 c3 55 48 89 e5 53 48 89 fb
RIP [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
 RSP <ffff81010690de38>
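A user-space sketch of why a zero sched_nr_latency traps, under the assumption that __sched_period() stretches the period to latency * nr_running / nr_latency once nr_running exceeds nr_latency. The multiplication is visible in the patch context later in this thread; the division is inferred from the divide error. In both register dumps RBX is 0, and the first bytes on the Code: line (48 f7 f3) appear to decode to `div %rbx`, which is consistent with a zero divisor. The function name and the error-return convention below are hypothetical; the kernel function has no such check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the period calculation (an assumption, see the text above):
 * the period equals latency_ns until more than nr_latency tasks are
 * runnable, after which it is scaled by nr_running / nr_latency.
 *
 * Returns 0 and stores the period in *period_out on success.
 * Returns -1 when the scaling would divide by zero, which is the
 * situation the oops reports.
 */
static int sched_period_checked(uint64_t latency_ns, unsigned long nr_latency,
				unsigned long nr_running, uint64_t *period_out)
{
	uint64_t period = latency_ns;

	assert(period_out != NULL);

	if (nr_running > nr_latency) {
		if (nr_latency == 0)
			return -1; /* any runnable task reaches this branch */
		period *= nr_running;
		period /= nr_latency;
	}

	*period_out = period;
	return 0;
}
```

With nr_latency set to 0, every call with at least one runnable task takes the scaling branch, so the first fork after the bad write is enough to hit the division. That matches the do_fork frames in both traces.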
* Re: aim7 -30% regression in 2.6.24-rc1
From: Peter Zijlstra @ 2007-10-31 10:30 UTC
To: Zhang, Yanmin; +Cc: Ingo Molnar, LKML

On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> > On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > >
> > > > sub-bisecting captured patch
> > > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > > > caused 20% regression of aim7.
> > > >
> > > > The last 10% should be also related to sched parameters, such like
> > > > sysctl_sched_min_granularity.
> > >
> > > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> > > please try to figure out what the best value for
> > > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> > > /proc/sys/kernel_sched_min_granularity is?
> > >
> > > there's a tuning constraint for kernel_sched_nr_latency:
> > >
> > > - kernel_sched_nr_latency should always be set to
> > > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> > > tunable)
> > >
> > > i suspect a good approach would be to double the value of
> > > kernel_sched_latency and kernel_sched_nr_latency in each tuning
> > > iteration, while keeping kernel_sched_min_granularity unchanged. That
> > > will exercise the tuning values of the 2.6.23 kernel as well.
> > I followed your idea to test 2.6.24-rc1. The improvement is slow.
> > When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> > is still about 15% less than 2.6.23.
>
> I got the aim7 30% regression on my new upgraded stoakley machine. I found
> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
> be dual-channel?) is slow. So I retested it on the old machine and found on the old
> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
> machine.
>
> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
> the regression becomes about 2%. Other latency has more regression.
>
> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
> the regression becomes less than 1% (The original regression is about 20%).
>
> When I ran a bad script to change the values of sched_nr_latency and sched_latency_ns,
> I hit OOPS on my tulsa machine. Below is the log. It looks like sched_nr_latency becomes
> 0.

Oops, yeah I think I overlooked that case :-/

I think limiting the sysctl parameters makes most sense, as a 0 value
really doesn't.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3b4efbe..0f34c91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -94,6 +94,7 @@ static int two = 2;

 static int zero;
 static int one_hundred = 100;
+static int int_max = INT_MAX;

 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -239,7 +240,10 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_sched_nr_latency,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &one,
+		.extra2		= &int_max,
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
* Re: aim7 -30% regression in 2.6.24-rc1
From: Ingo Molnar @ 2007-11-01 8:58 UTC
To: Peter Zijlstra; +Cc: Zhang, Yanmin, LKML

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> static int one_hundred = 100;
> +static int int_max = INT_MAX;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> static int maxolduid = 65535;
> @@ -239,7 +240,10 @@ static struct ctl_table kern_table[] = {
> 		.data		= &sysctl_sched_nr_latency,
> 		.maxlen		= sizeof(unsigned int),
> 		.mode		= 0644,
> -		.proc_handler	= &proc_dointvec,
> +		.proc_handler	= &proc_dointvec_minmax,
> +		.strategy	= &sysctl_intvec,
> +		.extra1		= &one,
> +		.extra2		= &int_max,

could we instead just make sched_nr_latency non-tunable, and
recalculate it from the sysctl handler whenever sched_latency or
sched_min_granularity changes? That would avoid not only the division by
zero bug but also other out-of-spec tunings.

	Ingo
[parent not found: <1193922687.27652.279.camel@twins>]
[parent not found: <20071101150049.GB4044@elte.hu>]
* Re: aim7 -30% regression in 2.6.24-rc1
From: Peter Zijlstra @ 2007-11-01 15:29 UTC
To: Ingo Molnar; +Cc: LKML, Zhang, Yanmin

(restoring CCs which I inadvertently dropped)

On Thu, 2007-11-01 at 16:00 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > > could we instead just make sched_nr_latency non-tunable, and
> > > recalculate it from the sysctl handler whenever sched_latency or
> > > sched_min_granularity changes? That would avoid not only the
> > > division by zero bug but also other out-of-spec tunings.
> >
> > We don't have min_granularity anymore.
>
> i think we should reintroduce it in the SCHED_DEBUG case and make it the
> main tunable item - sched_nr is a nice performance optimization but
> quite unintuitive as a tuning knob.

ok, I don't particularly care either way, could be because I wrote the
stuff :-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1466,12 +1466,16 @@ extern void sched_idle_next(void);

 #ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_latency;
-extern unsigned int sysctl_sched_nr_latency;
+extern unsigned int sysctl_sched_min_granularity;
 extern unsigned int sysctl_sched_wakeup_granularity;
 extern unsigned int sysctl_sched_batch_wakeup_granularity;
 extern unsigned int sysctl_sched_child_runs_first;
 extern unsigned int sysctl_sched_features;
 extern unsigned int sysctl_sched_migration_cost;
+
+int sched_nr_latency_handler(struct ctl_table *table, int write,
+		struct file *file, void __user *buffer, size_t *length,
+		loff_t *ppos);
 #endif

 extern unsigned int sysctl_sched_compat_yield;
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -210,7 +210,7 @@ static int sched_debug_show(struct seq_f
 #define PN(x) \
 	SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
 	PN(sysctl_sched_latency);
-	PN(sysctl_sched_nr_latency);
+	PN(sysctl_sched_min_granularity);
 	PN(sysctl_sched_wakeup_granularity);
 	PN(sysctl_sched_batch_wakeup_granularity);
 	PN(sysctl_sched_child_runs_first);
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -35,16 +35,21 @@
 const_debug unsigned int sysctl_sched_latency = 20000000ULL;

 /*
- * After fork, child runs first. (default) If set to 0 then
- * parent will (try to) run first.
+ * Minimal preemption granularity for CPU-bound tasks:
+ * (default: 1 msec, units: nanoseconds)
  */
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+const_debug unsigned int sysctl_sched_min_granularity = 1000000ULL;

 /*
- * Minimal preemption granularity for CPU-bound tasks:
- * (default: 2 msec, units: nanoseconds)
+ * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
+ */
+const_debug unsigned int sched_nr_latency = 20;
+
+/*
+ * After fork, child runs first. (default) If set to 0 then
+ * parent will (try to) run first.
  */
-const_debug unsigned int sysctl_sched_nr_latency = 20;
+const_debug unsigned int sysctl_sched_child_runs_first = 1;

 /*
  * sys_sched_yield() compat mode
@@ -301,6 +306,21 @@ static inline struct sched_entity *__pic
  * Scheduling class statistics methods:
  */

+#ifdef CONFIG_SCHED_DEBUG
+int sched_nr_latency_handler(struct ctl_table *table, int write,
+		struct file *filp, void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+
+	if (!ret && write) {
+		sched_nr_latency =
+			sysctl_sched_latency / sysctl_sched_min_granularity;
+	}
+
+	return ret;
+}
+#endif

 /*
  * The idea is to set a period in which each task runs once.
@@ -313,7 +333,7 @@ static inline struct sched_entity *__pic
 static u64 __sched_period(unsigned long nr_running)
 {
 	u64 period = sysctl_sched_latency;
-	unsigned long nr_latency = sysctl_sched_nr_latency;
+	unsigned long nr_latency = sched_nr_latency;

 	if (unlikely(nr_running > nr_latency)) {
 		period *= nr_running;
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -235,11 +235,14 @@ static struct ctl_table kern_table[] = {
 #ifdef CONFIG_SCHED_DEBUG
 	{
 		.ctl_name	= CTL_UNNUMBERED,
-		.procname	= "sched_nr_latency",
-		.data		= &sysctl_sched_nr_latency,
+		.procname	= "sched_min_granularity_ns",
+		.data		= &sysctl_sched_min_granularity,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &sched_nr_latency_handler,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &min_sched_granularity_ns,
+		.extra2		= &max_sched_granularity_ns,
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
@@ -247,7 +250,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_sched_latency,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
+		.proc_handler	= &sched_nr_latency_handler,
 		.strategy	= &sysctl_intvec,
 		.extra1		= &min_sched_granularity_ns,
 		.extra2		= &max_sched_granularity_ns,
* Re: aim7 -30% regression in 2.6.24-rc1
From: Ingo Molnar @ 2007-11-01 15:36 UTC
To: Peter Zijlstra; +Cc: LKML, Zhang, Yanmin

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > > We don't have min_granularity anymore.
> >
> > i think we should reintroduce it in the SCHED_DEBUG case and make it
> > the main tunable item - sched_nr is a nice performance optimization
> > but quite unintuitive as a tuning knob.
>
> ok, I don't particularly care either way, could be because I wrote the
> stuff :-)

heh :-) I've applied your patch, it looks good to me.

	Ingo
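A user-space sketch of the approach settled on above: the latency and the minimum granularity are the writable knobs, and nr_latency is recomputed after every accepted write instead of being tunable itself. All names here are hypothetical, and the range limits are passed in by the caller because the values behind min_sched_granularity_ns and max_sched_granularity_ns are not shown in the thread. The clamp of the derived value to at least 1 is an extra precaution of this sketch and does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the three scheduler variables. */
struct sched_knobs {
	uint32_t latency_ns;         /* writable */
	uint32_t min_granularity_ns; /* writable, must stay non-zero */
	uint32_t nr_latency;         /* derived, never written directly */
};

/* Recompute the derived value; mirrors latency / min_granularity. */
static void knobs_recalc(struct sched_knobs *k)
{
	assert(k != NULL && k->min_granularity_ns != 0);
	k->nr_latency = k->latency_ns / k->min_granularity_ns;
	if (k->nr_latency == 0)
		k->nr_latency = 1; /* sketch-only clamp, not in the patch */
}

/* Accept a new latency if it lies in [lo, hi]; return -1 otherwise. */
static int knobs_set_latency(struct sched_knobs *k, uint32_t value,
			     uint32_t lo, uint32_t hi)
{
	if (value < lo || value > hi)
		return -1;
	k->latency_ns = value;
	knobs_recalc(k);
	return 0;
}

/* Accept a new minimum granularity if it is non-zero and in [lo, hi]. */
static int knobs_set_min_granularity(struct sched_knobs *k, uint32_t value,
				     uint32_t lo, uint32_t hi)
{
	if (value == 0 || value < lo || value > hi)
		return -1;
	k->min_granularity_ns = value;
	knobs_recalc(k);
	return 0;
}
```

Because a rejected write leaves all three fields untouched, the derived nr_latency can never be observed out of step with the two values it is computed from, which is the property Ingo asks for when he calls nr_latency "not a free tunable".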
* Re: aim7 -30% regression in 2.6.24-rc1 2007-10-31 9:57 ` Zhang, Yanmin 2007-10-31 10:30 ` Peter Zijlstra @ 2007-11-01 9:34 ` Zhang, Yanmin 2007-11-01 10:02 ` Cyrus Massoumi 1 sibling, 1 reply; 19+ messages in thread From: Zhang, Yanmin @ 2007-11-01 9:34 UTC (permalink / raw) To: Ingo Molnar; +Cc: LKML, Peter Zijlstra On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote: > On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote: > > On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote: > > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > > > > > > > sub-bisecting captured patch > > > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings) > > > > caused 20% regression of aim7. > > > > > > > > The last 10% should be also related to sched parameters, such like > > > > sysctl_sched_min_granularity. > > > > > > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you > > > please try to figure out what the best value for > > > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and > > > /proc/sys/kernel_sched_min_granularity is? > > > > > > there's a tuning constraint for kernel_sched_nr_latency: > > > > > > - kernel_sched_nr_latency should always be set to > > > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free > > > tunable) > > > > > > i suspect a good approach would be to double the value of > > > kernel_sched_latency and kernel_sched_nr_latency in each tuning > > > iteration, while keeping kernel_sched_min_granularity unchanged. That > > > will excercise the tuning values of the 2.6.23 kernel as well. > > I followed your idea to test 2.6.24-rc1. The improvement is slow. > > When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance > > is still about 15% less than 2.6.23. > > I got the aim7 30% regression on my new upgraded stoakley machine. I found > this mahcine is slower than the old one. Maybe BIOS has issues, or memeory(Might not > be dual-channel?) is slow. 
So I retested it on the old machine and found on the old > stoakley machine, the regression is about 6%, quite similiar to the regression on tigerton > machine. > > By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine, > the regression becomes about 2%. Other latency has more regression. > > On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000, > the regression becomes less than 1% (The original regression is about 20%). I rerun SPECjbb by ched_nr_latency=640 and sched_latency_ns=640000000. On tigerton, the regression is still more than 40%. On stoakley machine, it becomes worse (26%, original is 9%). I will do more investigation to make sure SPECjbb regression is also casued by the bad default values. We need a smarter method to calculate the best default values for the key tuning parameters. One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no regression). Good job! -yanmin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: aim7 -30% regression in 2.6.24-rc1 2007-11-01 9:34 ` Zhang, Yanmin @ 2007-11-01 10:02 ` Cyrus Massoumi 2007-11-05 1:24 ` Zhang, Yanmin 0 siblings, 1 reply; 19+ messages in thread From: Cyrus Massoumi @ 2007-11-01 10:02 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: Ingo Molnar, LKML, Peter Zijlstra Zhang, Yanmin wrote: > On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote: >> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote: >>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote: >>>> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: >>>> >>>>> sub-bisecting captured patch >>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings) >>>>> caused 20% regression of aim7. >>>>> >>>>> The last 10% should be also related to sched parameters, such as >>>>> sysctl_sched_min_granularity. >>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you >>>> please try to figure out what the best value for >>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and >>>> /proc/sys/kernel_sched_min_granularity is? >>>> >>>> there's a tuning constraint for kernel_sched_nr_latency: >>>> >>>> - kernel_sched_nr_latency should always be set to >>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free >>>> tunable) >>>> >>>> i suspect a good approach would be to double the value of >>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning >>>> iteration, while keeping kernel_sched_min_granularity unchanged. That >>>> will exercise the tuning values of the 2.6.23 kernel as well. >>> I followed your idea to test 2.6.24-rc1. The improvement is slow. >>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance >>> is still about 15% less than 2.6.23. >> I got the aim7 30% regression on my new upgraded stoakley machine. I found >> this machine is slower than the old one. Maybe BIOS has issues, or memory (might not >> be dual-channel?) is slow.
So I retested it on the old machine and found on the old >> stoakley machine, the regression is about 6%, quite similiar to the regression on tigerton >> machine. >> >> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine, >> the regression becomes about 2%. Other latency has more regression. >> >> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000, >> the regression becomes less than 1% (The original regression is about 20%). > I rerun SPECjbb by ched_nr_latency=640 and sched_latency_ns=640000000. On tigerton, > the regression is still more than 40%. On stoakley machine, it becomes worse (26%, > original is 9%). I will do more investigation to make sure SPECjbb regression is > also casued by the bad default values. > > We need a smarter method to calculate the best default values for the key tuning > parameters. > > One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no > regression). Good job! Do you mean you couldn't reproduce the regression which was reported with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? It would be nice if you could provide some numbers for 2.6.22, 2.6.23 and 2.6.24-rc1. > -yanmin greetings Cyrus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: aim7 -30% regression in 2.6.24-rc1 2007-11-01 10:02 ` Cyrus Massoumi @ 2007-11-05 1:24 ` Zhang, Yanmin 2007-11-05 9:37 ` Cyrus Massoumi 0 siblings, 1 reply; 19+ messages in thread From: Zhang, Yanmin @ 2007-11-05 1:24 UTC (permalink / raw) To: Cyrus Massoumi; +Cc: Ingo Molnar, LKML, Peter Zijlstra On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote: > Zhang, Yanmin wrote: > > On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote: > >> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote: > >>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote: > >>>> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > >>>> > >>>>> sub-bisecting captured patch > >>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings) > >>>>> caused 20% regression of aim7. > >>>>> > >>>>> The last 10% should be also related to sched parameters, such as > >>>>> sysctl_sched_min_granularity. > >>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you > >>>> please try to figure out what the best value for > >>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and > >>>> /proc/sys/kernel_sched_min_granularity is? > >>>> > >>>> there's a tuning constraint for kernel_sched_nr_latency: > >>>> > >>>> - kernel_sched_nr_latency should always be set to > >>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free > >>>> tunable) > >>>> > >>>> i suspect a good approach would be to double the value of > >>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning > >>>> iteration, while keeping kernel_sched_min_granularity unchanged. That > >>>> will exercise the tuning values of the 2.6.23 kernel as well. > >>> I followed your idea to test 2.6.24-rc1. The improvement is slow. > >>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance > >>> is still about 15% less than 2.6.23. > >> I got the aim7 30% regression on my new upgraded stoakley machine.
I found > >> this machine is slower than the old one. Maybe BIOS has issues, or memory (might not > >> be dual-channel?) is slow. So I retested it on the old machine and found on the old > >> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton > >> machine. > >> > >> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine, > >> the regression becomes about 2%. Other latency has more regression. > >> > >> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000, > >> the regression becomes less than 1% (The original regression is about 20%). > > I reran SPECjbb with sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton, > > the regression is still more than 40%. On stoakley machine, it becomes worse (26%, > > original is 9%). I will do more investigation to make sure SPECjbb regression is > > also caused by the bad default values. > > > > We need a smarter method to calculate the best default values for the key tuning > > parameters. > > > > One interesting thing is that sysbench+mysql(readonly) got the same result as 2.6.22 (no > > regression). Good job! > > Do you mean you couldn't reproduce the regression which was reported > with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? It looks like you missed my emails. Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22, 2.6.23-rc and 2.6.23. Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now. Then I retested it with 2.6.24-rc1 on a couple of x86_64 machines. The issue disappeared. You could test it with 2.6.24-rc1. > It > would be nice if you could provide some numbers for 2.6.22, 2.6.23 and > 2.6.24-rc1. Sorry. Intel policy doesn't allow me to publish the numbers because only specific departments in Intel could do that. But I can talk about the regression percentages. -yanmin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: aim7 -30% regression in 2.6.24-rc1 2007-11-05 1:24 ` Zhang, Yanmin @ 2007-11-05 9:37 ` Cyrus Massoumi 2007-11-07 5:30 ` Zhang, Yanmin 0 siblings, 1 reply; 19+ messages in thread From: Cyrus Massoumi @ 2007-11-05 9:37 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: mingo, linux-kernel, a.p.zijlstra Zhang, Yanmin wrote: > On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote: >> Zhang, Yanmin wrote: >>> On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote: >>>> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote: >>>>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote: >>>>>> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: >>>>>> >>>>>>> sub-bisecting captured patch >>>>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings) >>>>>>> caused 20% regression of aim7. >>>>>>> >>>>>>> The last 10% should be also related to sched parameters, such as >>>>>>> sysctl_sched_min_granularity. >>>>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you >>>>>> please try to figure out what the best value for >>>>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and >>>>>> /proc/sys/kernel_sched_min_granularity is? >>>>>> >>>>>> there's a tuning constraint for kernel_sched_nr_latency: >>>>>> >>>>>> - kernel_sched_nr_latency should always be set to >>>>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free >>>>>> tunable) >>>>>> >>>>>> i suspect a good approach would be to double the value of >>>>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning >>>>>> iteration, while keeping kernel_sched_min_granularity unchanged. That >>>>>> will exercise the tuning values of the 2.6.23 kernel as well. >>>>> I followed your idea to test 2.6.24-rc1. The improvement is slow. >>>>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance >>>>> is still about 15% less than 2.6.23. >>>> I got the aim7 30% regression on my new upgraded stoakley machine.
I found >>>> this machine is slower than the old one. Maybe BIOS has issues, or memory (might not >>>> be dual-channel?) is slow. So I retested it on the old machine and found on the old >>>> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton >>>> machine. >>>> >>>> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine, >>>> the regression becomes about 2%. Other latency has more regression. >>>> >>>> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000, >>>> the regression becomes less than 1% (The original regression is about 20%). >>> I reran SPECjbb with sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton, >>> the regression is still more than 40%. On stoakley machine, it becomes worse (26%, >>> original is 9%). I will do more investigation to make sure SPECjbb regression is >>> also caused by the bad default values. >>> >>> We need a smarter method to calculate the best default values for the key tuning >>> parameters. >>> >>> One interesting thing is that sysbench+mysql(readonly) got the same result as 2.6.22 (no >>> regression). Good job! >> Do you mean you couldn't reproduce the regression which was reported >> with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? > It looks like you missed my emails. Yeah :( > Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22, > 2.6.23-rc and 2.6.23. > > Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now. That's nice, could you please point me to the commit? > Then I retested it with 2.6.24-rc1 on a couple of x86_64 machines. The issue > disappeared. You could test it with 2.6.24-rc1. Will do! >> It >> would be nice if you could provide some numbers for 2.6.22, 2.6.23 and >> 2.6.24-rc1. > Sorry. Intel policy doesn't allow me to publish the numbers because only > specific departments in Intel could do that. But I can talk about the regression > percentages.
Fair enough :) > -yanmin greetings Cyrus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: aim7 -30% regression in 2.6.24-rc1 2007-11-05 9:37 ` Cyrus Massoumi @ 2007-11-07 5:30 ` Zhang, Yanmin 0 siblings, 0 replies; 19+ messages in thread From: Zhang, Yanmin @ 2007-11-07 5:30 UTC (permalink / raw) To: Cyrus Massoumi; +Cc: mingo, linux-kernel, a.p.zijlstra On Mon, 2007-11-05 at 10:37 +0100, Cyrus Massoumi wrote: > Zhang, Yanmin wrote: > > On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote: > >> Zhang, Yanmin wrote: > >>> On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote: > >>>> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote: > >>>>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote: > >>>>>> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > >>>>>> > >>>>>>> sub-bisecting captured patch > >>>>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings) > >>>>>>> caused 20% regression of aim7. > >>>>>>> > >>>>>>> The last 10% should be also related to sched parameters, such as > >>>>>>> sysctl_sched_min_granularity. > >>>>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you > >>>>>> please try to figure out what the best value for > >>>>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and > >>>>>> /proc/sys/kernel_sched_min_granularity is? > >>>>>> > >>>>>> there's a tuning constraint for kernel_sched_nr_latency: > >>>>>> > >>>>>> - kernel_sched_nr_latency should always be set to > >>>>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free > >>>>>> tunable) > >>>>>> > >>>>>> i suspect a good approach would be to double the value of > >>>>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning > >>>>>> iteration, while keeping kernel_sched_min_granularity unchanged. That > >>>>>> will exercise the tuning values of the 2.6.23 kernel as well. > >>>>> I followed your idea to test 2.6.24-rc1. The improvement is slow. > >>>>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance > >>>>> is still about 15% less than 2.6.23.
> >>>> I got the aim7 30% regression on my new upgraded stoakley machine. I found > >>>> this machine is slower than the old one. Maybe BIOS has issues, or memory (might not > >>>> be dual-channel?) is slow. So I retested it on the old machine and found on the old > >>>> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton > >>>> machine. > >>>> > >>>> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine, > >>>> the regression becomes about 2%. Other latency has more regression. > >>>> > >>>> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000, > >>>> the regression becomes less than 1% (The original regression is about 20%). > >>> I reran SPECjbb with sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton, > >>> the regression is still more than 40%. On stoakley machine, it becomes worse (26%, > >>> original is 9%). I will do more investigation to make sure SPECjbb regression is > >>> also caused by the bad default values. > >>> > >>> We need a smarter method to calculate the best default values for the key tuning > >>> parameters. > >>> > >>> One interesting thing is that sysbench+mysql(readonly) got the same result as 2.6.22 (no > >>> regression). Good job! > >> Do you mean you couldn't reproduce the regression which was reported > >> with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? > > It looks like you missed my emails. > > Yeah :( > > > Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22, > > 2.6.23-rc and 2.6.23. > > > > Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now. > > That's nice, could you please point me to the commit? The patch is very big. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb ^ permalink raw reply [flat|nested] 19+ messages in thread
Thread overview: 19+ messages
2007-10-26 9:43 aim7 -30% regression in 2.6.24-rc1 Zhang, Yanmin
2007-10-26 9:53 ` Peter Zijlstra
2007-10-29 0:15 ` Zhang, Yanmin
2007-10-26 11:23 ` Ingo Molnar
2007-10-29 2:22 ` Zhang, Yanmin
2007-10-29 9:37 ` Zhang, Yanmin
2007-10-30 2:12 ` Zhang, Yanmin
2007-10-30 7:26 ` Ingo Molnar
2007-10-30 8:36 ` Zhang, Yanmin
2007-10-31 9:57 ` Zhang, Yanmin
2007-10-31 10:30 ` Peter Zijlstra
2007-11-01 8:58 ` Ingo Molnar
[not found] ` <1193922687.27652.279.camel@twins>
[not found] ` <20071101150049.GB4044@elte.hu>
2007-11-01 15:29 ` Peter Zijlstra
2007-11-01 15:36 ` Ingo Molnar
2007-11-01 9:34 ` Zhang, Yanmin
2007-11-01 10:02 ` Cyrus Massoumi
2007-11-05 1:24 ` Zhang, Yanmin
2007-11-05 9:37 ` Cyrus Massoumi
2007-11-07 5:30 ` Zhang, Yanmin