* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
@ 2006-07-12 16:11 shin, jacob
2006-07-12 16:14 ` Langsdorf, Mark
2006-07-13 13:06 ` Pavel Machek
0 siblings, 2 replies; 29+ messages in thread
From: shin, jacob @ 2006-07-12 16:11 UTC (permalink / raw)
To: Deguara, Joachim, Andi Kleen
Cc: Langsdorf, Mark, discuss, linux-kernel, cpufreq
On Wednesday, July 12, 2006 9:06 AM Joachim Deguara wrote:
> Here are the further findings after letting the machine toggle between
> 1GHz and 2.2Ghz every two seconds for roughly 24 hours. Unfortunately
> there is an oops after bringing CPU2 online and CPU3 will not come
> online. Still the differences in TSC are not bad:
Can I get more information on how to reproduce the Oops? Kernel version?
.config? your hardware?
I have run basic set of CPU Hotplug on/offline tests, and I could not
reproduce it..
-Jacob Shin
AMD, Inc.
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-12 16:11 [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time shin, jacob
@ 2006-07-12 16:14 ` Langsdorf, Mark
2006-07-13 13:06 ` Pavel Machek
1 sibling, 0 replies; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-12 16:14 UTC (permalink / raw)
To: shin, jacob, Deguara, Joachim, Andi Kleen; +Cc: discuss, linux-kernel, cpufreq
> > Here are the further findings after letting the machine toggle between
> > 1GHz and 2.2Ghz every two seconds for roughly 24 hours. Unfortunately
> > there is an oops after bringing CPU2 online and CPU3 will not come
> > online. Still the differences in TSC are not bad:
>
> Can I get more information on how to reproduce the Oops? Kernel version?
> .config? your hardware?
>
> I have run basic set of CPU Hotplug on/offline tests, and I could not
> reproduce it..
There's probably something in my patch that is causing it. Are you
testing with that?
Joachim - Have you had a chance to measure TSC drift without PN?
I'd like to know if the patch is making the problem worse or not.
-Mark Langsdorf
AMD, Inc.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-12 16:11 [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time shin, jacob
2006-07-12 16:14 ` Langsdorf, Mark
@ 2006-07-13 13:06 ` Pavel Machek
2006-07-13 14:32 ` Joachim Deguara
1 sibling, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2006-07-13 13:06 UTC (permalink / raw)
To: shin, jacob
Cc: Deguara, Joachim, Andi Kleen, Langsdorf, Mark, discuss, linux-kernel, cpufreq
Hi!
> > Here are the further findings after letting the machine toggle between
> > 1GHz and 2.2Ghz every two seconds for roughly 24 hours. Unfortunately
> > there is an oops after bringing CPU2 online and CPU3 will not come
> > online. Still the differences in TSC are not bad:
>
> Can I get more information on how to reproduce the Oops? Kernel version?
> .config? your hardware?
>
> I have run basic set of CPU Hotplug on/offline tests, and I could not
> reproduce it..
Can you run two such tests *in parallel*? That seemed to break it
really quickly.
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-13 13:06 ` Pavel Machek
@ 2006-07-13 14:32 ` Joachim Deguara
2006-07-16 1:56 ` Pavel Machek
0 siblings, 1 reply; 29+ messages in thread
From: Joachim Deguara @ 2006-07-13 14:32 UTC (permalink / raw)
To: Pavel Machek
Cc: shin, jacob, Andi Kleen, Langsdorf, Mark, discuss, linux-kernel, cpufreq
On Thu, 2006-07-13 at 13:06 +0000, Pavel Machek wrote:
> Can you run two such tests *in parallel*? That seemed to break it
> really quickly.
parallel sounds fun, but I don't get it. Two machine or trying to go
online and offline at the same time? Firestorming two busy parallel
while loops, one turning the core offline and the other online, did not
bring an oops so I guess this kernel is in the clear in that regard.
I can't get it to crash again and I am afraid that it crashed under an
old devel kernel. After another ~20 hour test with heavy freq changes
with the tscsync patch
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff 4 cycles, maxerr 499 cycles)
...
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -105 cycles, maxerr 600 cycles)
...
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1126 cycles)
after 5 hours of no PowerNow!
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -3 cycles, maxerr 598 cycles)
...
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -124 cycles, maxerr 1129 cycles)
...
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -124 cycles, maxerr 1127 cycles)
huh?? I don't understand but it does not matter what I do or how long I
do it, the difference looks to always be about the same.
-joachim
^ permalink raw reply [flat|nested] 29+ messages in thread
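[Editorial sketch, not part of the original thread.] To compare runs like the two in the message above side by side, the sync lines can be reduced to one row per CPU. The helper name tsc_sync_summary is hypothetical; it assumes the exact message format quoted in this thread, reads log text on stdin, and expects wrapped log lines to have been re-joined first.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the thread):
# print "cpu last_diff maxerr" for each "synchronized TSC" line on stdin.
# Lines in any other format, including "Syncing TSC" lines, are skipped.
tsc_sync_summary() {
    sed -n 's/.*CPU \([0-9][0-9]*\): synchronized TSC with CPU 0 (last diff \(-\{0,1\}[0-9][0-9]*\) cycles, maxerr \([0-9][0-9]*\) cycles).*/\1 \2 \3/p'
}
```

One plausible use is `dmesg | tsc_sync_summary` after each replug, saving the output of each run to a file so that the maxerr columns can be compared with diff.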
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-13 14:32 ` Joachim Deguara
@ 2006-07-16 1:56 ` Pavel Machek
2006-07-17 7:37 ` Joachim Deguara
0 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2006-07-16 1:56 UTC (permalink / raw)
To: Joachim Deguara
Cc: shin, jacob, Andi Kleen, Langsdorf, Mark, discuss, linux-kernel, cpufreq
> On Thu, 2006-07-13 at 13:06 +0000, Pavel Machek wrote:
> > Can you run two such tests *in parallel*? That seemed to break it
> > really quickly.
> parallel sounds fun, but I don't get it. Two machine or trying to go
> online and offline at the same time? Firestorming two busy parallel
Trying to online and offline at the same time.
> while loops, one turning the core offline and the other online, did not
> bring an oops so I guess this kernel is in the clear in that regard.
Better run two tight loops, each doing online; offline. I got reports
it crashed machines before, but maybe it is solved.
> I can't get it to crash again and I am afraid that it crashed under an
> old devel kernel. After another ~20 hour test with heavy freq changes
> with the tscsync patch
>
> CPU 1: Syncing TSC to CPU 0.
> CPU 1: synchronized TSC with CPU 0 (last diff 4 cycles, maxerr 499 cycles)
> ...
> CPU 2: Syncing TSC to CPU 0.
> CPU 2: synchronized TSC with CPU 0 (last diff -105 cycles, maxerr 600 cycles)
> ...
> CPU 3: Syncing TSC to CPU 0.
> CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1126 cycles)
>
> after 5 hours of no PowerNow!
>
> CPU 1: Syncing TSC to CPU 0.
> CPU 1: synchronized TSC with CPU 0 (last diff -3 cycles, maxerr 598 cycles)
> ...
> CPU 2: Syncing TSC to CPU 0.
> CPU 2: synchronized TSC with CPU 0 (last diff -124 cycles, maxerr 1129 cycles)
> ...
> CPU 3: Syncing TSC to CPU 0.
> CPU 3: synchronized TSC with CPU 0 (last diff -124 cycles, maxerr 1127 cycles)
>
> huh?? I don't understand but it does not matter what I do or how long I
> do it, the difference looks to always be about the same.
>
> -joachim
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-16 1:56 ` Pavel Machek
@ 2006-07-17 7:37 ` Joachim Deguara
2006-07-20 15:59 ` Pavel Machek
0 siblings, 1 reply; 29+ messages in thread
From: Joachim Deguara @ 2006-07-17 7:37 UTC (permalink / raw)
To: Pavel Machek
Cc: shin, jacob, Andi Kleen, Langsdorf, Mark, discuss, linux-kernel, cpufreq
On Sun, 2006-07-16 at 03:56 +0200, Pavel Machek wrote:
> > On Thu, 2006-07-13 at 13:06 +0000, Pavel Machek wrote:
> > > Can you run two such tests *in parallel*? That seemed to break it
> > > really quickly.
> > parallel sounds fun, but I don't get it. Two machine or trying to go
> > online and offline at the same time? Firestorming two busy parallel
>
> Trying to online and offline at the same time.
>
> > while loops, one turning the core offline and the other online, did not
> > bring an oops so I guess this kernel is in the clear in that regard.
>
> Better run two tight loops, each doing online; offline. I got reports
> it crashed machines before, but maybe it is solved.
yeah, that's what I did. Some things are easier described in bash than in
English. Nothing crashed or oopsed so the green light is there for
online and offline in 2.6.18-rc1 (with my setup).
-joachim
^ permalink raw reply [flat|nested] 29+ messages in thread
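[Editorial sketch, not part of the original thread.] The thread never shows the script that was actually used, so the following is only a guess at what "two tight loops, each doing online; offline" could look like. The function names are hypothetical, and pointing both loops at the same CPU is an assumption. The path of the sysfs `online` file is passed in as an argument, so the functions can be tried against an ordinary file first.

```shell
#!/bin/sh
# Hypothetical sketch (not the script used in the thread).
# hotplug_loop FILE COUNT: write 0 then 1 to FILE, COUNT times.
hotplug_loop() {
    online_file=$1
    count=$2
    n=0
    while [ "$n" -lt "$count" ]; do
        echo 0 > "$online_file"   # take the CPU offline
        echo 1 > "$online_file"   # bring it back online
        n=$((n + 1))
    done
}

# parallel_hotplug FILE COUNT: run two such loops at once on FILE
# and wait for both to finish.
parallel_hotplug() {
    hotplug_loop "$1" "$2" &
    pid1=$!
    hotplug_loop "$1" "$2" &
    pid2=$!
    wait "$pid1" "$pid2"
}
```

On a test machine this might be run as root with something like `parallel_hotplug /sys/devices/system/cpu/cpu1/online 1000`. Individual writes may report errors when the two loops race, which is the condition the test is meant to provoke.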
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-17 7:37 ` Joachim Deguara
@ 2006-07-20 15:59 ` Pavel Machek
0 siblings, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2006-07-20 15:59 UTC (permalink / raw)
To: Joachim Deguara
Cc: shin, jacob, Andi Kleen, Langsdorf, Mark, discuss, linux-kernel, cpufreq
> On Sun, 2006-07-16 at 03:56 +0200, Pavel Machek wrote:
> > > On Thu, 2006-07-13 at 13:06 +0000, Pavel Machek wrote:
> > > > Can you run two such tests *in parallel*? That seemed to break it
> > > > really quickly.
> > > parallel sounds fun, but I don't get it. Two machine or trying to go
> > > online and offline at the same time? Firestorming two busy parallel
> >
> > Trying to online and offline at the same time.
> >
> > > while loops, one turning the core offline and the other online, did not
> > > bring an oops so I guess this kernel is in the clear in that regard.
> >
> > Better run two tight loops, each doing online; offline. I got reports
> > it crashed machines before, but maybe it is solved.
>
> yeah, that's what I did. Some things are easier described in bash than in
> English. Nothing crashed or oopsed so the green light is there for
> online and offline in 2.6.18-rc1 (with my setup).
Okay, I tried hard here, and reproduced some fork failures (and swsusp
panic :-), but not oops.
Pavel
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <Pine.LNX.4.64.0607061519040.9066@solonow.amd.com>]
* Re: [PATCH] Allow all Opteron processors to change pstate at same time
[not found] <Pine.LNX.4.64.0607061519040.9066@solonow.amd.com>
@ 2006-07-07 12:10 ` Andi Kleen
2006-07-07 17:36 ` [discuss] " Langsdorf, Mark
` (4 more replies)
0 siblings, 5 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-07 12:10 UTC (permalink / raw)
To: Mark Langsdorf; +Cc: discuss, linux-kernel, cpufreq
"Mark Langsdorf" <mark.langsdorf@amd.com> writes:
[cc'ing back to discuss and cpufreq]
> The current generation of Opteron processors do not provide a frequency
> independent TSC. This causes wild gettimeofday skew on systems that
> enable cpufreq while using TSC as a gtod source.
>
> This patch provides a workaround by changing all processors to the same
> frequency at the same time, so that the TSC on each processor never
> increments at a different rate than the TSC on another processor.
>
> the "powernow-k8.tscsync=1" option enables simultaneous transitions.
> Other options are necessary to force the use of TSC as a gtod source.
>
> This patch should apply cleanly to the 2.6.18-rc1 kernel.
Your patch seems to be ^M damaged.
I'm still dubious if the result is really correct if the hardware
wasn't designed to guarantee synchronous TSC operation.
Can you do the following test please?
- Set this option
- Let the system run for let's say a day or two with some freq transitions
  and varying loads
  [Better would be to let two systems run in this way to compare]
- Then hotunplug all the CPUs >0 with
  for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
- Wait a bit
- Restart them again with
  for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done
The kernel should now print the results of the TSC resync for the
replugged CPUs with output like this
CPU N: Syncing TSC to CPU 0.
CPU N: synchronized TSC with CPU 0 (last diff XXX cycles, maxerr YYY cycles)
How do these numbers look like, also compared to the original boot
output?
If the cycles diverge more between the different CPUs it would be a bad sign.
It would mean that the error would add up over longer runtime
and timing would get more and more unstable.
-Andi
^ permalink raw reply [flat|nested] 29+ messages in thread
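[Editorial sketch, not part of the original thread.] The unplug, wait, replug sequence described in the message above can be wrapped in two small functions. The names set_all_cpus and replug_test, the 10-second default wait, and the optional root-directory argument are all assumptions added for illustration; only the sysfs `online` files and the "synchronized TSC" message come from the thread.

```shell
#!/bin/sh
# Hypothetical wrappers (assumptions, not from the thread).
# set_all_cpus STATE [ROOT]: write STATE (0 or 1) to every cpu*/online
# file under ROOT. CPUs without an "online" file are left alone.
set_all_cpus() {
    state=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu*/online; do
        [ -e "$f" ] || continue   # glob matched nothing
        echo "$state" > "$f"
    done
}

# replug_test [WAIT_SECONDS]: unplug, wait, replug, then show the
# resync results from the kernel log. Needs root on a real system.
replug_test() {
    set_all_cpus 0
    sleep "${1:-10}"
    set_all_cpus 1
    dmesg | grep 'synchronized TSC'
}
```

The ROOT argument exists only so that set_all_cpus can be exercised against a scratch directory before it is pointed at /sys.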
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 12:10 ` Andi Kleen
@ 2006-07-07 17:36 ` Langsdorf, Mark
2006-07-10 12:45 ` Joachim Deguara
2006-07-07 18:14 ` Scott Lampert
` (3 subsequent siblings)
4 siblings, 1 reply; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-07 17:36 UTC (permalink / raw)
To: ak; +Cc: discuss, linux-kernel, cpufreq
> > the "powernow-k8.tscsync=1" option enables simultaneous transitions.
> > Other options are necessary to force the use of TSC as a gtod source.
> >
> > This patch should apply cleanly to the 2.6.18-rc1 kernel.
>
> Your patch seems to be ^M damaged.
I'll smack my mailer around again. Sorry about that.
> I'm still dubious if the result is really correct if the
> hardware wasn't designed to guarantee synchronous TSC operation.
>
> Can you do the following test please?
We'll try to have the results back by Monday evening.
-Mark Langsdorf
AMD, Inc.
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 17:36 ` [discuss] " Langsdorf, Mark
@ 2006-07-10 12:45 ` Joachim Deguara
2006-07-10 13:02 ` Joachim Deguara
0 siblings, 1 reply; 29+ messages in thread
From: Joachim Deguara @ 2006-07-10 12:45 UTC (permalink / raw)
To: Langsdorf, Mark; +Cc: ak, discuss, linux-kernel, cpufreq
[-- Attachment #1: Type: text/plain, Size: 544 bytes --]
On Fri, 2006-07-07 at 12:36 -0500, Langsdorf, Mark wrote:
> > Your patch seems to be ^M damaged.
>
> I'll smack my mailer around again. Sorry about that.
revised patch attached (also corrected tscsync qualifier and initial
state of req_state).
> > I'm still dubious if the result is really correct if the
> > hardware wasn't designed to guarantee synchronous TSC operation.
> >
> > Can you do the following test please?
>
> We'll try to have the results back by Monday evening.
may have to wait till Tuesday for my results.
-joachim
[-- Attachment #2: 2.6.18-rc1-pntscsync.patch --]
[-- Type: text/x-patch, Size: 6927 bytes --]
--- arch/i386/kernel/cpu/cpufreq/powernow-k8.c.orig 2006-07-10 13:25:47.000000000 +0200
+++ arch/i386/kernel/cpu/cpufreq/powernow-k8.c 2006-07-10 14:15:37.000000000 +0200
@@ -46,13 +46,15 @@
 
 #define PFX "powernow-k8: "
 #define BFX PFX "BIOS error: "
-#define VERSION "version 2.00.00"
+#define VERSION "version 2.10.00"
 #include "powernow-k8.h"
 
 /* serialize freq changes */
 static DEFINE_MUTEX(fidvid_mutex);
 
 static struct powernow_k8_data *powernow_data[NR_CPUS];
+static int *req_state = NULL;
+static int tscsync = 0;
 
 static int cpu_family = CPU_OPTERON;
 
@@ -205,6 +207,17 @@ static int write_new_fid(struct powernow
 	dprintk("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
 		fid, lo, data->plllock * PLL_LOCK_CONVERSION);
 
+	if (tscsync) {
+		int i;
+		cpumask_t oldmask = current->cpus_allowed;
+		for_each_online_cpu(i) {
+			set_cpus_allowed(current, cpumask_of_cpu(i));
+			schedule();
+			wrmsr(MSR_FIDVID_CTL, lo & ~MSR_C_LO_INIT_FID_VID, data->plllock * PLL_LOCK_CONVERSION);
+		}
+		set_cpus_allowed(current, oldmask);
+		schedule();
+	}
 	do {
 		wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
 		if (i++ > 100) {
@@ -247,6 +260,17 @@ static int write_new_vid(struct powernow
 	dprintk("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
 		vid, lo, STOP_GRANT_5NS);
 
+	if (tscsync) {
+		int i;
+		cpumask_t oldmask = current->cpus_allowed;
+		for_each_online_cpu(i) {
+			set_cpus_allowed(current, cpumask_of_cpu(i));
+			schedule();
+			wrmsr(MSR_FIDVID_CTL, lo & ~MSR_C_LO_INIT_FID_VID, STOP_GRANT_5NS);
+		}
+		set_cpus_allowed(current, oldmask);
+		schedule();
+	}
 	do {
 		wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
 		if (i++ > 100) {
@@ -386,7 +410,8 @@ static int core_frequency_transition(str
 	}
 
 	if (data->currfid == reqfid) {
-		printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n", data->currfid);
+		if (!tscsync)
+			printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n", data->currfid);
 		return 0;
 	}
 
@@ -960,9 +985,21 @@ static int transition_frequency_fidvid(s
 	u32 vid = 0;
 	int res, i;
 	struct cpufreq_freqs freqs;
+	cpumask_t changing_cores;
 
 	dprintk("cpu %d transition to index %u\n", smp_processor_id(), index);
 
+	/* if all processors are transitioning in step, find the highest
+	 * current state and go to that
+	 */
+
+	if (tscsync && req_state) {
+		req_state[smp_processor_id()] = index;
+		for_each_online_cpu(i)
+			if (req_state[i] < index)
+				index = req_state[i];
+	}
+
 	/* fid/vid correctness check for k8 */
 	/* fid are the lower 8 bits of the index we stored into
 	 * the cpufreq frequency table in find_psb_table, vid
@@ -983,6 +1020,8 @@ static int transition_frequency_fidvid(s
 	}
 
 	if ((fid < HI_FID_TABLE_BOTTOM) && (data->currfid < HI_FID_TABLE_BOTTOM)) {
+		if (tscsync && (data->currfid == fid))
+			return 0;
 		printk(KERN_ERR PFX
 		       "ignoring illegal change in lo freq table-%x to 0x%x\n",
 		       data->currfid, fid);
@@ -994,7 +1033,11 @@ static int transition_frequency_fidvid(s
 	freqs.old = find_khz_freq_from_fid(data->currfid);
 	freqs.new = find_khz_freq_from_fid(fid);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	if (tscsync)
+		changing_cores = cpu_online_map;
+	else
+		changing_cores = *(data->available_cores);
+	for_each_cpu_mask(i, changing_cores) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -1002,10 +1045,16 @@ static int transition_frequency_fidvid(s
 	res = transition_fid_vid(data, fid, vid);
 
 	freqs.new = find_khz_freq_from_fid(data->currfid);
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask(i, changing_cores) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
+	if (tscsync)
+		for_each_online_cpu(i)
+			if (powernow_data[i]) {
+				powernow_data[i]->currfid = data->currfid;
+				powernow_data[i]->currvid = data->currvid;
+			}
 	return res;
 }
 
@@ -1054,7 +1103,7 @@ static int powernowk8_target(struct cpuf
 	u32 checkfid;
 	u32 checkvid;
 	unsigned int newstate;
-	int ret = -EIO;
+	int ret = 0;//-EIO;
 
 	if (!data)
 		return -EINVAL;
@@ -1089,7 +1138,7 @@ static int powernowk8_target(struct cpuf
 	dprintk("targ: curr fid 0x%x, vid 0x%x\n",
 		data->currfid, data->currvid);
 
-	if ((checkvid != data->currvid) || (checkfid != data->currfid)) {
+	if (!tscsync && ((checkvid != data->currvid) || (checkfid != data->currfid))) {
 		printk(KERN_INFO PFX
 			"error - out of sync, fix 0x%x 0x%x, vid 0x%x 0x%x\n",
 			checkfid, data->currfid, checkvid, data->currvid);
@@ -1100,7 +1149,6 @@ static int powernowk8_target(struct cpuf
 		goto err_out;
 
 	mutex_lock(&fidvid_mutex);
-
 	powernow_k8_acpi_pst_values(data, newstate);
 
 	if (cpu_family == CPU_HW_PSTATE)
@@ -1137,6 +1185,32 @@ static int powernowk8_verify(struct cpuf
 	return cpufreq_frequency_table_verify(pol, data->powernow_table);
 }
 
+/* On an MP system that is transitioning all cores in sync, adjust the
+ * vids for each frequency to the highest. Otherwise, systems made up
+ * of different steppings may fail.
+ */
+static void sync_tables(int curcpu)
+{
+	int j;
+	for (j = 0; j < powernow_data[curcpu]->numps; j++) {
+		int i;
+		int maxvid = 0;
+		for_each_online_cpu(i) {
+			int testvid;
+			if (!powernow_data[i] || !powernow_data[i]->powernow_table)
+				continue;
+			testvid = powernow_data[i]->powernow_table[j].index & 0xff00;
+			if (testvid > maxvid)
+				maxvid = testvid;
+		}
+		for_each_online_cpu(i) {
+			if (!powernow_data[i] || ! powernow_data[i]->powernow_table)
+				continue;
+			powernow_data[i]->powernow_table[j].index &= 0xff;
+			powernow_data[i]->powernow_table[j].index |= maxvid;
+		}
+	}
+}
 /* per CPU init entry point to the driver */
 static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
 {
@@ -1241,6 +1315,8 @@ static int __cpuinit powernowk8_cpu_init
 
 	powernow_data[pol->cpu] = data;
 
+	if (tscsync && (cpu_family == CPU_OPTERON))
+		sync_tables(pol->cpu);
 	return 0;
 
 err_out:
@@ -1323,6 +1399,16 @@ static int __cpuinit powernowk8_init(voi
 	}
 
 	if (supported_cpus == num_online_cpus()) {
+		if (tscsync) {
+			req_state = kalloc(sizeof(int)*NR_CPUS, GFP_KERNEL);
+			if (!req_state) {
+				printk(KERN_ERR PFX "Unable to allocate memory!\n");
+				return -ENOMEM;
+			}
+			//necessary for dual-cores (99=just a large number)
+			for(i=0; i < NR_CPUS; i++)
+				req_state[i] = 99;
+		}
 		printk(KERN_INFO PFX "Found %d %s "
 			"processors (" VERSION ")\n", supported_cpus,
 			boot_cpu_data.x86_model_id);
@@ -1337,6 +1423,9 @@ static void __exit powernowk8_exit(void)
 {
 	dprintk("exit\n");
 
+	if (tscsync)
+		kfree(req_state);
+
 	cpufreq_unregister_driver(&cpufreq_amd64_driver);
 }
 
@@ -1346,3 +1435,6 @@ MODULE_LICENSE("GPL");
 
 late_initcall(powernowk8_init);
 module_exit(powernowk8_exit);
+
+module_param(tscsync, int, 0);
+MODULE_PARM_DESC(tscsync, "enable tsc by synchronizing powernow-k8 changes");
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-10 12:45 ` Joachim Deguara
@ 2006-07-10 13:02 ` Joachim Deguara
0 siblings, 0 replies; 29+ messages in thread
From: Joachim Deguara @ 2006-07-10 13:02 UTC (permalink / raw)
To: Langsdorf, Mark; +Cc: ak, discuss, linux-kernel, cpufreq
[-- Attachment #1: Type: text/plain, Size: 352 bytes --]
On Mon, 2006-07-10 at 14:46 +0200, Joachim Deguara wrote:
> On Fri, 2006-07-07 at 12:36 -0500, Langsdorf, Mark wrote:
> > > Your patch seems to be ^M damaged.
> >
> > I'll smack my mailer around again. Sorry about that.
>
> revised patch attached (also corrected tscsync qualifier and initial
> state of req_state).
typo corrected (sorry)
-joachim
[-- Attachment #2: 2.6.18-rc1-pntscsync.patch --]
[-- Type: text/x-patch, Size: 6928 bytes --]
--- arch/i386/kernel/cpu/cpufreq/powernow-k8.c.orig 2006-07-10 13:25:47.000000000 +0200
+++ arch/i386/kernel/cpu/cpufreq/powernow-k8.c 2006-07-10 14:52:11.000000000 +0200
@@ -46,13 +46,15 @@
 
 #define PFX "powernow-k8: "
 #define BFX PFX "BIOS error: "
-#define VERSION "version 2.00.00"
+#define VERSION "version 2.10.00"
 #include "powernow-k8.h"
 
 /* serialize freq changes */
 static DEFINE_MUTEX(fidvid_mutex);
 
 static struct powernow_k8_data *powernow_data[NR_CPUS];
+static int *req_state = NULL;
+static int tscsync = 0;
 
 static int cpu_family = CPU_OPTERON;
 
@@ -205,6 +207,17 @@ static int write_new_fid(struct powernow
 	dprintk("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
 		fid, lo, data->plllock * PLL_LOCK_CONVERSION);
 
+	if (tscsync) {
+		int i;
+		cpumask_t oldmask = current->cpus_allowed;
+		for_each_online_cpu(i) {
+			set_cpus_allowed(current, cpumask_of_cpu(i));
+			schedule();
+			wrmsr(MSR_FIDVID_CTL, lo & ~MSR_C_LO_INIT_FID_VID, data->plllock * PLL_LOCK_CONVERSION);
+		}
+		set_cpus_allowed(current, oldmask);
+		schedule();
+	}
 	do {
 		wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
 		if (i++ > 100) {
@@ -247,6 +260,17 @@ static int write_new_vid(struct powernow
 	dprintk("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
 		vid, lo, STOP_GRANT_5NS);
 
+	if (tscsync) {
+		int i;
+		cpumask_t oldmask = current->cpus_allowed;
+		for_each_online_cpu(i) {
+			set_cpus_allowed(current, cpumask_of_cpu(i));
+			schedule();
+			wrmsr(MSR_FIDVID_CTL, lo & ~MSR_C_LO_INIT_FID_VID, STOP_GRANT_5NS);
+		}
+		set_cpus_allowed(current, oldmask);
+		schedule();
+	}
 	do {
 		wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
 		if (i++ > 100) {
@@ -386,7 +410,8 @@ static int core_frequency_transition(str
 	}
 
 	if (data->currfid == reqfid) {
-		printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n", data->currfid);
+		if (!tscsync)
+			printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n", data->currfid);
 		return 0;
 	}
 
@@ -960,9 +985,21 @@ static int transition_frequency_fidvid(s
 	u32 vid = 0;
 	int res, i;
 	struct cpufreq_freqs freqs;
+	cpumask_t changing_cores;
 
 	dprintk("cpu %d transition to index %u\n", smp_processor_id(), index);
 
+	/* if all processors are transitioning in step, find the highest
+	 * current state and go to that
+	 */
+
+	if (tscsync && req_state) {
+		req_state[smp_processor_id()] = index;
+		for_each_online_cpu(i)
+			if (req_state[i] < index)
+				index = req_state[i];
+	}
+
 	/* fid/vid correctness check for k8 */
 	/* fid are the lower 8 bits of the index we stored into
 	 * the cpufreq frequency table in find_psb_table, vid
@@ -983,6 +1020,8 @@ static int transition_frequency_fidvid(s
 	}
 
 	if ((fid < HI_FID_TABLE_BOTTOM) && (data->currfid < HI_FID_TABLE_BOTTOM)) {
+		if (tscsync && (data->currfid == fid))
+			return 0;
 		printk(KERN_ERR PFX
 		       "ignoring illegal change in lo freq table-%x to 0x%x\n",
 		       data->currfid, fid);
@@ -994,7 +1033,11 @@ static int transition_frequency_fidvid(s
 	freqs.old = find_khz_freq_from_fid(data->currfid);
 	freqs.new = find_khz_freq_from_fid(fid);
 
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	if (tscsync)
+		changing_cores = cpu_online_map;
+	else
+		changing_cores = *(data->available_cores);
+	for_each_cpu_mask(i, changing_cores) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 	}
@@ -1002,10 +1045,16 @@ static int transition_frequency_fidvid(s
 	res = transition_fid_vid(data, fid, vid);
 
 	freqs.new = find_khz_freq_from_fid(data->currfid);
-	for_each_cpu_mask(i, *(data->available_cores)) {
+	for_each_cpu_mask(i, changing_cores) {
 		freqs.cpu = i;
 		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 	}
+	if (tscsync)
+		for_each_online_cpu(i)
+			if (powernow_data[i]) {
+				powernow_data[i]->currfid = data->currfid;
+				powernow_data[i]->currvid = data->currvid;
+			}
 	return res;
 }
 
@@ -1054,7 +1103,7 @@ static int powernowk8_target(struct cpuf
 	u32 checkfid;
 	u32 checkvid;
 	unsigned int newstate;
-	int ret = -EIO;
+	int ret = 0;//-EIO;
 
 	if (!data)
 		return -EINVAL;
@@ -1089,7 +1138,7 @@ static int powernowk8_target(struct cpuf
 	dprintk("targ: curr fid 0x%x, vid 0x%x\n",
 		data->currfid, data->currvid);
 
-	if ((checkvid != data->currvid) || (checkfid != data->currfid)) {
+	if (!tscsync && ((checkvid != data->currvid) || (checkfid != data->currfid))) {
 		printk(KERN_INFO PFX
 			"error - out of sync, fix 0x%x 0x%x, vid 0x%x 0x%x\n",
 			checkfid, data->currfid, checkvid, data->currvid);
@@ -1100,7 +1149,6 @@ static int powernowk8_target(struct cpuf
 		goto err_out;
 
 	mutex_lock(&fidvid_mutex);
-
 	powernow_k8_acpi_pst_values(data, newstate);
 
 	if (cpu_family == CPU_HW_PSTATE)
@@ -1137,6 +1185,32 @@ static int powernowk8_verify(struct cpuf
 	return cpufreq_frequency_table_verify(pol, data->powernow_table);
 }
 
+/* On an MP system that is transitioning all cores in sync, adjust the
+ * vids for each frequency to the highest. Otherwise, systems made up
+ * of different steppings may fail.
+ */
+static void sync_tables(int curcpu)
+{
+	int j;
+	for (j = 0; j < powernow_data[curcpu]->numps; j++) {
+		int i;
+		int maxvid = 0;
+		for_each_online_cpu(i) {
+			int testvid;
+			if (!powernow_data[i] || !powernow_data[i]->powernow_table)
+				continue;
+			testvid = powernow_data[i]->powernow_table[j].index & 0xff00;
+			if (testvid > maxvid)
+				maxvid = testvid;
+		}
+		for_each_online_cpu(i) {
+			if (!powernow_data[i] || ! powernow_data[i]->powernow_table)
+				continue;
+			powernow_data[i]->powernow_table[j].index &= 0xff;
+			powernow_data[i]->powernow_table[j].index |= maxvid;
+		}
+	}
+}
 /* per CPU init entry point to the driver */
 static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
 {
@@ -1241,6 +1315,8 @@ static int __cpuinit powernowk8_cpu_init
 
 	powernow_data[pol->cpu] = data;
 
+	if (tscsync && (cpu_family == CPU_OPTERON))
+		sync_tables(pol->cpu);
 	return 0;
 
 err_out:
@@ -1323,6 +1399,16 @@ static int __cpuinit powernowk8_init(voi
 	}
 
 	if (supported_cpus == num_online_cpus()) {
+		if (tscsync) {
+			req_state = kmalloc(sizeof(int)*NR_CPUS, GFP_KERNEL);
+			if (!req_state) {
+				printk(KERN_ERR PFX "Unable to allocate memory!\n");
+				return -ENOMEM;
+			}
+			//necessary for dual-cores (99=just a large number)
+			for(i=0; i < NR_CPUS; i++)
+				req_state[i] = 99;
+		}
 		printk(KERN_INFO PFX "Found %d %s "
 			"processors (" VERSION ")\n", supported_cpus,
 			boot_cpu_data.x86_model_id);
@@ -1337,6 +1423,9 @@ static void __exit powernowk8_exit(void)
 {
 	dprintk("exit\n");
 
+	if (tscsync)
+		kfree(req_state);
+
 	cpufreq_unregister_driver(&cpufreq_amd64_driver);
 }
 
@@ -1346,3 +1435,6 @@ MODULE_LICENSE("GPL");
 
 late_initcall(powernowk8_init);
 module_exit(powernowk8_exit);
+
+module_param(tscsync, int, 0);
+MODULE_PARM_DESC(tscsync, "enable tsc by synchronizing powernow-k8 changes");
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 12:10 ` Andi Kleen
2006-07-07 17:36 ` [discuss] " Langsdorf, Mark
@ 2006-07-07 18:14 ` Scott Lampert
2006-07-07 18:26 ` Langsdorf, Mark
2006-07-11 12:55 ` Joachim Deguara
` (2 subsequent siblings)
4 siblings, 1 reply; 29+ messages in thread
From: Scott Lampert @ 2006-07-07 18:14 UTC (permalink / raw)
Cc: discuss, cpufreq
As an aside, is this what the "AMD Dual-Core Optimizer" driver located at:
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html
does for that other OS? Does this solution work there?
-Scott
Andi Kleen wrote:
> "Mark Langsdorf" <mark.langsdorf@amd.com> writes:
>
> [cc'ing back to discuss and cpufreq]
>
>> The current generation of Opteron processors do not provide a frequency
>> independent TSC. This causes wild gettimeofday skew on systems that
>> enable cpufreq while using TSC as a gtod source.
>>
>> This patch provides a workaround by changing all processors to the same
>> frequency at the same time, so that the TSC on each processor never
>> increments at a different rate than the TSC on another processor.
>>
>> the "powernow-k8.tscsync=1" option enables simultaneous transitions.
>> Other options are necessary to force the use of TSC as a gtod source.
>>
>> This patch should apply cleanly to the 2.6.18-rc1 kernel.
>
> Your patch seems to be ^M damaged.
>
> I'm still dubious if the result is really correct if the hardware
> wasn't designed to guarantee synchronous TSC operation.
>
> Can you do the following test please?
>
> - Set this option
> - Let the system run for let's say a day or two with some freq transitions
>   and varying loads
>   [Better would be to let two systems run in this way to compare]
> - Then hotunplug all the CPUs >0 with
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
> - Wait a bit
> - Restart them again with
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done
>
> The kernel should now print the results of the TSC resync for the
> replugged CPUs with output like this
>
> CPU N: Syncing TSC to CPU 0.
> CPU N: synchronized TSC with CPU 0 (last diff XXX cycles, maxerr YYY cycles)
>
> How do these numbers look like, also compared to the original boot
> output?
>
> If the cycles diverge more between the different CPUs it would be a bad sign.
> It would mean that the error would add up over longer runtime
> and timing would get more and more unstable.
>
> -Andi
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 18:14 ` Scott Lampert
@ 2006-07-07 18:26 ` Langsdorf, Mark
0 siblings, 0 replies; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-07 18:26 UTC (permalink / raw)
To: Scott Lampert; +Cc: discuss, cpufreq
> As an aside, is this what the "AMD Dual-Core Optimizer"
> driver located at:
>
> http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html
>
> does for that other OS? Does this solution work there?
No, that tool forces the TSCs on a single dual-core processor
into sync by rewriting them once per second. There's no attempt
to handle multiprocessor systems.
My initial attempt at solving the TSC/PN problem was to
synchronize all TSCs to the TSC of the first processor after
a PN! transition. It didn't work because I couldn't guarantee
the timing closely enough, and it was an ugly hack into the
guts of arch/x86_64/kernel/time.c that would also need changes
to the userland tools. This patch is better because it's a
userland transparent change to the powernow-k8 code.
The ideal solution is to go to a per-core TSC counter, but
that's some complex code I haven't figured out yet.
-Mark Langsdorf
AMD, Inc.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 12:10 ` Andi Kleen
2006-07-07 17:36 ` [discuss] " Langsdorf, Mark
2006-07-07 18:14 ` Scott Lampert
@ 2006-07-11 12:55 ` Joachim Deguara
2006-07-11 13:07 ` Andi Kleen
2006-07-12 14:06 ` Joachim Deguara
2006-07-25 21:47 ` Langsdorf, Mark
2006-07-26 16:42 ` Langsdorf, Mark
4 siblings, 2 replies; 29+ messages in thread
From: Joachim Deguara @ 2006-07-11 12:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Mark Langsdorf, discuss, linux-kernel, cpufreq

On Fri, 2006-07-07 at 14:10 +0200, Andi Kleen wrote:
> What do these numbers look like, also compared to the original boot
> output?
>
> If the cycles diverge more between the different CPUs it would be a
> bad sign.
> It would mean that the error would add up over longer runtime
> and timing would get more and more unstable.

Here are some initial findings. When I say I change the frequencies,
I mean that a script toggles from max to min or from min to max
every two seconds. I will let the machine run overnight to compare, but
the interesting fact is that the cycle differences were the same for a
short test (2 minutes) and a medium test (2 hours). The machine is a
two-way dual-core Tyan Thunder K8WE 2895.

After letting the computer run while toggling the freqs for 2 hours, I
set the freqs to max, then put all but cpu0 offline:

Jul 11 21:21:52 gradient kernel: Breaking affinity for irq 0
Jul 11 21:21:52 gradient kernel: Breaking affinity for irq 1
Jul 11 21:21:52 gradient kernel: CPU 1 is now offline
Jul 11 21:21:52 gradient kernel: Breaking affinity for irq 201
Jul 11 21:21:52 gradient kernel: CPU 2 is now offline
Jul 11 21:21:52 gradient [powersave]: WARNING (adjustSpeed:159) calcCPULoad failed. Cannot adjust speeds: -1
Jul 11 21:21:52 gradient kernel: CPU 3 is now offline
Jul 11 21:21:52 gradient kernel: SMP alternatives: switching to UP code

Waited a bit, then put all cores online:

Jul 11 21:23:35 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:23:35 gradient kernel: Booting processor 1/4 APIC 0x1
Jul 11 21:23:35 gradient kernel: Initializing CPU#1
Jul 11 21:23:35 gradient kernel: Calibrating delay using timer specific routine.. 2009.40 BogoMIPS (lpj=4018809)
Jul 11 21:23:35 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU 1/1 -> Node 0
Jul 11 21:23:35 gradient kernel: CPU: Physical Processor ID: 0
Jul 11 21:23:35 gradient kernel: CPU: Processor Core ID: 1
Jul 11 21:23:35 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:23:35 gradient kernel: CPU 1: Syncing TSC to CPU 0.
Jul 11 21:23:35 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 3 cycles, maxerr 502 cycles)
Jul 11 21:23:35 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:23:35 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:23:35 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:23:35 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 11 21:23:35 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:23:35 gradient kernel: Booting processor 2/4 APIC 0x2
Jul 11 21:23:35 gradient kernel: Initializing CPU#2
Jul 11 21:23:35 gradient kernel: Calibrating delay using timer specific routine.. 2009.41 BogoMIPS (lpj=4018838)
Jul 11 21:23:35 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU 2/2 -> Node 1
Jul 11 21:23:35 gradient kernel: CPU: Physical Processor ID: 1
Jul 11 21:23:35 gradient kernel: CPU: Processor Core ID: 0
Jul 11 21:23:35 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:23:35 gradient kernel: CPU 2: Syncing TSC to CPU 0.
Jul 11 21:23:35 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -91 cycles, maxerr 621 cycles)
Jul 11 21:23:35 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:23:35 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:23:35 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:23:35 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 11 21:23:35 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:23:35 gradient kernel: Booting processor 3/4 APIC 0x3
Jul 11 21:23:35 gradient kernel: Initializing CPU#3
Jul 11 21:23:35 gradient kernel: Calibrating delay using timer specific routine.. 4420.77 BogoMIPS (lpj=8841555)
Jul 11 21:23:35 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:23:35 gradient kernel: CPU 3/3 -> Node 1
Jul 11 21:23:35 gradient kernel: CPU: Physical Processor ID: 1
Jul 11 21:23:35 gradient kernel: CPU: Processor Core ID: 1
Jul 11 21:23:35 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:23:35 gradient kernel: CPU 3: Syncing TSC to CPU 0.
Jul 11 21:23:35 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1129 cycles)
Jul 11 21:23:35 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:23:35 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:23:35 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:23:35 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12

Going offline again results in:

Jul 11 21:24:39 gradient kernel: CPU 1 is now offline
Jul 11 21:24:39 gradient kernel: Breaking affinity for irq 201
Jul 11 21:24:39 gradient kernel: powernow-k8: limiting to CPU 2 failed in powernowk8_get
Jul 11 21:24:39 gradient kernel: powernow-k8: limiting to CPU 2 failed in powernowk8_get
Jul 11 21:24:39 gradient kernel: CPU 2 is now offline
Jul 11 21:24:39 gradient kernel: CPU 3 is now offline
Jul 11 21:24:39 gradient kernel: SMP alternatives: switching to UP code

I waited just a little bit and did not do any freq changes, but when I
put the cores online, the differences were about the same???

Jul 11 21:25:24 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:25:24 gradient kernel: Booting processor 1/4 APIC 0x1
Jul 11 21:25:24 gradient kernel: Initializing CPU#1
Jul 11 21:25:24 gradient kernel: Calibrating delay using timer specific routine.. 2009.44 BogoMIPS (lpj=4018898)
Jul 11 21:25:24 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU 1/1 -> Node 0
Jul 11 21:25:24 gradient kernel: CPU: Physical Processor ID: 0
Jul 11 21:25:24 gradient kernel: CPU: Processor Core ID: 1
Jul 11 21:25:24 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:25:24 gradient kernel: CPU 1: Syncing TSC to CPU 0.
Jul 11 21:25:24 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 4 cycles, maxerr 499 cycles)
Jul 11 21:25:24 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:25:24 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:25:24 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:25:24 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 11 21:25:24 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:25:24 gradient kernel: Booting processor 2/4 APIC 0x2
Jul 11 21:25:24 gradient kernel: Initializing CPU#2
Jul 11 21:25:24 gradient kernel: Calibrating delay using timer specific routine.. 2009.43 BogoMIPS (lpj=4018861)
Jul 11 21:25:24 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU 2/2 -> Node 1
Jul 11 21:25:24 gradient kernel: CPU: Physical Processor ID: 1
Jul 11 21:25:24 gradient kernel: CPU: Processor Core ID: 0
Jul 11 21:25:24 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:25:24 gradient kernel: CPU 2: Syncing TSC to CPU 0.
Jul 11 21:25:24 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -93 cycles, maxerr 625 cycles)
Jul 11 21:25:24 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:25:24 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:25:24 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:25:24 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 11 21:25:24 gradient kernel: SMP alternatives: switching to SMP code
Jul 11 21:25:24 gradient kernel: Booting processor 3/4 APIC 0x3
Jul 11 21:25:24 gradient kernel: Initializing CPU#3
Jul 11 21:25:24 gradient kernel: Calibrating delay using timer specific routine.. 4420.67 BogoMIPS (lpj=8841353)
Jul 11 21:25:24 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 11 21:25:24 gradient kernel: CPU 3/3 -> Node 1
Jul 11 21:25:24 gradient kernel: CPU: Physical Processor ID: 1
Jul 11 21:25:24 gradient kernel: CPU: Processor Core ID: 1
Jul 11 21:25:24 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 11 21:25:24 gradient kernel: CPU 3: Syncing TSC to CPU 0.
Jul 11 21:25:24 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1126 cycles)
Jul 11 21:25:24 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 11 21:25:24 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 11 21:25:24 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 11 21:25:24 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12

Sorry if this word-wrapping looks as bad for you as it does with
Evolution.

-joachim

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 12:55 ` Joachim Deguara
@ 2006-07-11 13:07 ` Andi Kleen
2006-07-11 13:14 ` Arjan van de Ven
2006-07-11 13:31 ` Langsdorf, Mark
2006-07-12 14:06 ` Joachim Deguara
1 sibling, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-11 13:07 UTC (permalink / raw)
To: Joachim Deguara
Cc: Mark Langsdorf, discuss, linux-kernel, cpufreq

>
> Jul 11 21:23:35 gradient kernel: CPU 2: Syncing TSC to CPU 0.
> Jul 11 21:23:35 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -91 cycles, maxerr 621 cycles)

> Jul 11 21:23:35 gradient kernel: CPU 3: Syncing TSC to CPU 0.
> Jul 11 21:23:35 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1129 cycles)

This means the CPUs diverged between 500 and 1100 cycles in the night.
This can already cause severe timing problems with the clock going
backwards if a task switches CPUs - and there are many programs that
don't like that. If the system is up longer it will be worse.

The only way to possibly make the concept work would be regular TSC
resyncs during runtime, but I think I would prefer using per CPU TSC
offsets using RDTSCP instead because they should be able to tolerate
arbitrary shifts.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
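The "clock going backwards" concern can be illustrated with a toy model. This is only a sketch: it assumes a hypothetical 1100-cycle skew on CPU 3 (roughly the largest maxerr figure quoted above, taken at face value), a fixed 2.2 GHz clock, and a task that migrates between two reads 500 cycles apart. None of the names below come from the kernel.

```python
CPU_HZ = 2_200_000_000  # 2.2 GHz, the top frequency in the posted logs

# Hypothetical TSC skew of each CPU relative to CPU 0, in cycles.
SKEW_CYCLES = {0: 0, 3: -1100}


def tsc_to_ns(cycles, hz=CPU_HZ):
    """Convert a cycle count to nanoseconds at a fixed frequency."""
    return cycles * 1_000_000_000 // hz


def read_time_ns(true_cycles, cpu):
    """Time a task would compute from the raw TSC of `cpu` with a shared base."""
    return tsc_to_ns(true_cycles + SKEW_CYCLES[cpu])


t1 = read_time_ns(1_000_000, cpu=0)  # first read, task runs on CPU 0
t2 = read_time_ns(1_000_500, cpu=3)  # 500 cycles later, task now on CPU 3

# The second read happened later in real time but reports an earlier time.
print(t1, t2, "went backwards:", t2 < t1)
```

In this model the later read returns a smaller timestamp whenever the skew exceeds the real time elapsed between the two reads, which is the situation programs tend to handle badly.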
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 13:07 ` Andi Kleen
@ 2006-07-11 13:14 ` Arjan van de Ven
2006-07-11 16:15 ` Alan Cox
2006-07-11 13:31 ` Langsdorf, Mark
1 sibling, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2006-07-11 13:14 UTC (permalink / raw)
To: Andi Kleen
Cc: Joachim Deguara, Mark Langsdorf, discuss, linux-kernel, cpufreq

>
> The only way to possibly make the concept work would be regular TSC resyncs
> during runtime, but I think I would prefer using per CPU TSC offsets
> using RDTSCP instead because they should be able to tolerate arbitrary
> shifts.

if you have per cpu offset and speed, then you don't even need to tie
all frequencies together... sounds like the best solution to me..

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 13:14 ` Arjan van de Ven
@ 2006-07-11 16:15 ` Alan Cox
2006-07-11 16:01 ` Arjan van de Ven
2006-07-11 16:04 ` Andi Kleen
0 siblings, 2 replies; 29+ messages in thread
From: Alan Cox @ 2006-07-11 16:15 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Andi Kleen, Joachim Deguara, Mark Langsdorf, discuss, linux-kernel, cpufreq

On Tue, 2006-07-11 at 15:14 +0200, Arjan van de Ven wrote:
> if you have per cpu offset and speed, then you don't even need to tie
> all frequencies together... sounds like the best solution to me..

CPU clocks on some systems are not stable relative to one another. Doing
the maths only works if you know the divergence isn't caused by
independent clock sources.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 16:15 ` Alan Cox
@ 2006-07-11 16:01 ` Arjan van de Ven
2006-07-11 16:04 ` Andi Kleen
1 sibling, 0 replies; 29+ messages in thread
From: Arjan van de Ven @ 2006-07-11 16:01 UTC (permalink / raw)
To: Alan Cox
Cc: discuss, cpufreq, Andi Kleen, linux-kernel

On Tue, 2006-07-11 at 17:15 +0100, Alan Cox wrote:
> On Tue, 2006-07-11 at 15:14 +0200, Arjan van de Ven wrote:
> > if you have per cpu offset and speed, then you don't even need to tie
> > all frequencies together... sounds like the best solution to me..
>
> CPU clocks on some systems are not stable relative to one another. Doing
> the maths only works if you know the divergence isn't caused by
> independent clock sources.

obviously you do math only on your local cpu, with the values for your
local cpu. And just never ever look at tsc values from another cpu,
consider them entirely uncorrelated for all I care ;)

Within those constraints it should be reasonably ok (there still is
trouble if the tsc stops entirely in idle, but that's a different thing)

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 16:15 ` Alan Cox
2006-07-11 16:01 ` Arjan van de Ven
@ 2006-07-11 16:04 ` Andi Kleen
1 sibling, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-11 16:04 UTC (permalink / raw)
To: Alan Cox
Cc: Arjan van de Ven, Joachim Deguara, Mark Langsdorf, discuss, linux-kernel, cpufreq, vojtech

On Tuesday 11 July 2006 18:15, Alan Cox wrote:
> On Tue, 2006-07-11 at 15:14 +0200, Arjan van de Ven wrote:
> > if you have per cpu offset and speed, then you don't even need to tie
> > all frequencies together... sounds like the best solution to me..
>
> CPU clocks on some systems are not stable relative to one another. Doing
> the maths only works if you know the divergence isn't caused by
> independent clock sources.

You misunderstood the proposal (actually there is a prototype, so it's
more than that).

The reason the TSCs need to be synchronized is that gettimeofday always
takes the offset against a global variable set by the last timer
interrupt. So your TSC needs to be synchronized to the TSC of the CPU
that runs the timer interrupt.

If instead the per CPU timers set a cpu local variable then you can do
the offset calculation per CPU. The scheduler already uses this trick
by keeping sched_clock comparisons always CPU local.

In practice there are a few more complications, but that's it in a
nutshell.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
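The difference between the global-base and per-CPU-base schemes described above can be shown with a small simulation. This is a toy model under loudly stated assumptions, not the prototype mentioned in the message: the `Cpu` class, `gtod_global`, and `gtod_percpu` are invented names, the clock runs at 1 GHz so one cycle equals one nanosecond, and every CPU is assumed to take its timer tick at the same wall-clock instant (one of the "complications" a real implementation would have to handle).

```python
HZ = 1_000_000_000  # 1 GHz: one cycle equals one nanosecond in this toy model


class Cpu:
    """Hypothetical CPU with its own TSC offset and its own tick snapshot."""

    def __init__(self, tsc_offset):
        self.tsc_offset = tsc_offset  # skew relative to an ideal counter
        self.base_ns = 0              # wall time recorded at the last tick
        self.base_tsc = 0             # this CPU's TSC value at the last tick

    def rdtsc(self, true_cycles):
        return true_cycles + self.tsc_offset

    def tick(self, true_cycles, wall_ns):
        self.base_ns = wall_ns
        self.base_tsc = self.rdtsc(true_cycles)


def gtod_global(cpus, cpu_id, true_cycles):
    """Global scheme: the local TSC is compared against CPU 0's snapshot."""
    base = cpus[0]
    delta = cpus[cpu_id].rdtsc(true_cycles) - base.base_tsc
    return base.base_ns + delta * 1_000_000_000 // HZ


def gtod_percpu(cpus, cpu_id, true_cycles):
    """Per-CPU scheme: the local TSC is compared against the local snapshot."""
    cpu = cpus[cpu_id]
    delta = cpu.rdtsc(true_cycles) - cpu.base_tsc
    return cpu.base_ns + delta * 1_000_000_000 // HZ


cpus = [Cpu(0), Cpu(-1100)]  # CPU 1 lags CPU 0 by a hypothetical 1100 cycles
for cpu in cpus:
    cpu.tick(true_cycles=1_000_000, wall_ns=1_000_000)

# Read the time on CPU 1, 300 cycles (300 ns) after the tick.
print(gtod_global(cpus, 1, 1_000_300))  # skew leaks into the result
print(gtod_percpu(cpus, 1, 1_000_300))  # skew cancels out
```

In the per-CPU variant the offset appears in both the snapshot and the current read, so it cancels; the global variant inherits the full skew between the reading CPU and the CPU that took the tick.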
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 13:07 ` Andi Kleen
2006-07-11 13:14 ` Arjan van de Ven
@ 2006-07-11 13:31 ` Langsdorf, Mark
2006-07-11 13:34 ` Arjan van de Ven
1 sibling, 1 reply; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-11 13:31 UTC (permalink / raw)
To: Andi Kleen, Deguara, Joachim
Cc: discuss, linux-kernel, cpufreq

> > Jul 11 21:23:35 gradient kernel: CPU 2: Syncing TSC to CPU 0.
> > Jul 11 21:23:35 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -91 cycles, maxerr 621 cycles)
>
> > Jul 11 21:23:35 gradient kernel: CPU 3: Syncing TSC to CPU 0.
> > Jul 11 21:23:35 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1129 cycles)
>
> This means the CPUs diverged between 500 and 1100 cycles in the night.
> This can already cause severe timing problems with the clock
> going backwards if a task switches CPUs - and there are many
> programs that don't like that. If the system is up longer it
> will be worse.

Joachim -

Can you run Andi's test without changing PN! frequency? I'd like to
see a baseline for how bad TSC is by itself, and whether the TSCnow!
code is making the problem worse or better.

Customers in the field seem to want to use TSC for gtod, so I want to
know how awful an idea that is.

> The only way to possibly make the concept work would be
> regular TSC resyncs during runtime, but I think I would
> prefer using per CPU TSC offsets using RDTSCP instead because
> they should be able to tolerate arbitrary shifts.

I would prefer that, too, but I don't have the resources to code that
solution in the timeframe I have available.

-Mark Langsdorf
AMD, Inc.

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 13:31 ` Langsdorf, Mark
@ 2006-07-11 13:34 ` Arjan van de Ven
2006-07-11 13:51 ` Andi Kleen
0 siblings, 1 reply; 29+ messages in thread
From: Arjan van de Ven @ 2006-07-11 13:34 UTC (permalink / raw)
To: Langsdorf, Mark
Cc: Andi Kleen, Deguara, Joachim, discuss, linux-kernel, cpufreq

> Customers in the field seem to want to use TSC for gtod,
> so I want to know how awful an idea that is.

in userspace or in the kernel?
And do you happen to know why they don't want to use hpet?

Greetings,
Arjan van de Ven

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 13:34 ` Arjan van de Ven
@ 2006-07-11 13:51 ` Andi Kleen
0 siblings, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-11 13:51 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Langsdorf, Mark, Deguara, Joachim, discuss, linux-kernel, cpufreq

On Tuesday 11 July 2006 15:34, Arjan van de Ven wrote:
>
> > Customers in the field seem to want to use TSC for gtod,
> > so I want to know how awful an idea that is.
>
> in userspace or in the kernel?

It has to be in kernel. User space is hopeless.

> And do you happen to know why they don't want to use hpet?

HPET is slow too (although not as bad as PM) and most BIOSes don't
enable it.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-11 12:55 ` Joachim Deguara
2006-07-11 13:07 ` Andi Kleen
@ 2006-07-12 14:06 ` Joachim Deguara
2006-07-12 14:54 ` Andi Kleen
1 sibling, 1 reply; 29+ messages in thread
From: Joachim Deguara @ 2006-07-12 14:06 UTC (permalink / raw)
To: Andi Kleen
Cc: Mark Langsdorf, discuss, linux-kernel, cpufreq

On Tue, 2006-07-11 at 14:55 +0200, Joachim Deguara wrote:
> Here are some initial findings.

Here are the further findings after letting the machine toggle between
1 GHz and 2.2 GHz every two seconds for roughly 24 hours. Unfortunately
there is an oops after bringing CPU2 online and CPU3 will not come
online. Still the differences in TSC are not bad:

after 24 hours toggling the freq
Jul 12 23:39:27 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 13 cycles, maxerr 471 cycles)
Jul 12 23:39:27 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff 22 cycles, maxerr 580 cycles)
CPU3 does not appear because of oops

Again from yesterday's test after 2 hours toggling the freq
Jul 11 21:23:35 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 3 cycles, maxerr 502 cycles)
Jul 11 21:23:35 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -91 cycles, maxerr 621 cycles)
Jul 11 21:23:35 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1129 cycles)

after 2 minutes of no toggling (what I see as best case)
Jul 11 21:25:24 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 4 cycles, maxerr 499 cycles)
Jul 11 21:25:24 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff -93 cycles, maxerr 625 cycles)
Jul 11 21:25:24 gradient kernel: CPU 3: synchronized TSC with CPU 0 (last diff -122 cycles, maxerr 1126 cycles)

At 1 GHz, 1000 cycles translates to 1 microsecond, which happens to be
exactly the resolution of gettimeofday. And we are way below this with
the last diff, and the maxerr is theoretical as it is just a measure of
the round trip from the sync algo.

Full log of going online with the 24 hours test is below

-joachim

Jul 12 23:39:27 gradient kernel: Booting processor 1/4 APIC 0x1
Jul 12 23:39:27 gradient kernel: Initializing CPU#1
Jul 12 23:39:27 gradient kernel: Calibrating delay using timer specific routine.. 2009.42 BogoMIPS (lpj=4018859)
Jul 12 23:39:27 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 12 23:39:27 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 12 23:39:27 gradient kernel: CPU 1(2) -> Node 0 -> Core 1
Jul 12 23:39:27 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 12 23:39:27 gradient kernel: CPU 1: Syncing TSC to CPU 0.
Jul 12 23:39:27 gradient kernel: CPU 1: synchronized TSC with CPU 0 (last diff 13 cycles, maxerr 471 cycles)
Jul 12 23:39:27 gradient kernel: kobject machinecheck1: registering. parent: machinecheck, set: machinecheck
Jul 12 23:39:27 gradient kernel: kobject_uevent
Jul 12 23:39:27 gradient kernel: fill_kobj_path: path = '/devices/system/machinecheck/machinecheck1'
Jul 12 23:39:27 gradient kernel: kobject threshold1: registering. parent: threshold, set: threshold
Jul 12 23:39:27 gradient kernel: kobject_uevent
Jul 12 23:39:27 gradient kernel: fill_kobj_path: path = '/devices/system/threshold/threshold1'
Jul 12 23:39:27 gradient kernel: cpufreq-core: adding CPU 1
Jul 12 23:39:27 gradient kernel: powernow-k8: 0 : fid 0xe, vid 0x8
Jul 12 23:39:27 gradient kernel: powernow-k8: 1 : fid 0xc, vid 0xa
Jul 12 23:39:27 gradient kernel: powernow-k8: 2 : fid 0xa, vid 0xc
Jul 12 23:39:27 gradient kernel: powernow-k8: 3 : fid 0x2, vid 0x12
Jul 12 23:39:27 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 12 23:39:27 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 12 23:39:27 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 12 23:39:27 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 12 23:39:27 gradient kernel: powernow-k8: cpu1, init lo 0x1202, hi 0x1
Jul 12 23:39:27 gradient kernel: powernow-k8: policy current frequency 1000000 kHz
Jul 12 23:39:27 gradient kernel: freq-table: table entry 0: 2200000 kHz, 2062 index
Jul 12 23:39:27 gradient kernel: freq-table: table entry 1: 2000000 kHz, 2572 index
Jul 12 23:39:27 gradient kernel: freq-table: table entry 2: 1800000 kHz, 3082 index
Jul 12 23:39:27 gradient kernel: freq-table: table entry 3: 1000000 kHz, 4610 index
Jul 12 23:39:27 gradient kernel: freq-table: setting show_table for cpu 1 to ffff81007f619800
Jul 12 23:39:27 gradient kernel: powernow-k8: cpu_init done, current fid 0x2, vid 0x12
Jul 12 23:39:27 gradient kernel: cpufreq-core: CPU already managed, adding link
Jul 12 23:39:27 gradient kernel: printk: 31 messages suppressed.
Jul 12 23:39:27 gradient kernel: freq-table: clearing show_table for cpu 1
Jul 12 23:39:27 gradient kernel: kobject_uevent
Jul 12 23:39:27 gradient kernel: fill_kobj_path: path = '/devices/system/cpu/cpu1'
Jul 12 23:39:27 gradient kernel: Booting processor 2/4 APIC 0x2
Jul 12 23:39:27 gradient kernel: Initializing CPU#2
Jul 12 23:39:27 gradient kernel: Calibrating delay using timer specific routine.. 2009.34 BogoMIPS (lpj=4018684)
Jul 12 23:39:27 gradient kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
Jul 12 23:39:27 gradient kernel: CPU: L2 Cache: 1024K (64 bytes/line)
Jul 12 23:39:27 gradient kernel: CPU 2(2) -> Node 1 -> Core 0
Jul 12 23:39:27 gradient kernel: Dual Core AMD Opteron(tm) Processor 275 stepping 02
Jul 12 23:39:27 gradient kernel: CPU 2: Syncing TSC to CPU 0.
Jul 12 23:39:27 gradient kernel: CPU 2: synchronized TSC with CPU 0 (last diff 22 cycles, maxerr 580 cycles)
Jul 12 23:39:27 gradient kernel: kobject machinecheck2: registering. parent: machinecheck, set: machinecheck
Jul 12 23:39:27 gradient kernel: kobject_uevent
Jul 12 23:39:27 gradient kernel: fill_kobj_path: path = '/devices/system/machinecheck/machinecheck2'
Jul 12 23:39:27 gradient kernel: kobject threshold2: registering. parent: threshold, set: threshold
Jul 12 23:39:27 gradient kernel: kobject_uevent
Jul 12 23:39:27 gradient kernel: fill_kobj_path: path = '/devices/system/threshold/threshold2'
Jul 12 23:39:27 gradient kernel: cpufreq-core: adding CPU 2
Jul 12 23:39:27 gradient kernel: powernow-k8: 0 : fid 0xe, vid 0x8
Jul 12 23:39:27 gradient kernel: powernow-k8: 1 : fid 0xc, vid 0xa
Jul 12 23:39:27 gradient kernel: powernow-k8: 2 : fid 0xa, vid 0xc
Jul 12 23:39:28 gradient kernel: powernow-k8: 3 : fid 0x2, vid 0x12
Jul 12 23:39:28 gradient kernel: powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8
Jul 12 23:39:28 gradient kernel: powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xa
Jul 12 23:39:28 gradient kernel: powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xc
Jul 12 23:39:28 gradient kernel: powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
Jul 12 23:39:28 gradient kernel: powernow-k8: cpu2, init lo 0x802, hi 0x1
Jul 12 23:39:28 gradient kernel: powernow-k8: policy current frequency 1000000 kHz
Jul 12 23:39:28 gradient kernel: freq-table: table entry 0: 2200000 kHz, 2062 index
Jul 12 23:39:28 gradient kernel: freq-table: table entry 1: 2000000 kHz, 2572 index
Jul 12 23:39:28 gradient kernel: freq-table: table entry 2: 1800000 kHz, 3082 index
Jul 12 23:39:28 gradient kernel: freq-table: table entry 3: 1000000 kHz, 4610 index
Jul 12 23:39:28 gradient kernel: freq-table: setting show_table for cpu 2 to ffff8101d8339480
Jul 12 23:39:28 gradient kernel: powernow-k8: cpu_init done, current fid 0x2, vid 0x8
Jul 12 23:39:28 gradient kernel: Unable to handle kernel NULL pointer dereference at 0000000000000001 RIP:
Jul 12 23:39:28 gradient kernel: <ffffffff882fa00c>{:powernow_k8:powernowk8_cpu_init+3115}
Jul 12 23:39:28 gradient kernel: PGD 1d24f5067 PUD 1d9035067 PMD 0
Jul 12 23:39:28 gradient kernel: Oops: 0000 [1] SMP
Jul 12 23:39:28 gradient kernel: last sysfs file: /devices/system/cpu/cpu2/online
Jul 12 23:39:28 gradient kernel: CPU 2
Jul 12 23:39:28 gradient kernel: Modules linked in: xt_pkttype ipt_LOG xt_limit cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod forcedeth snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc floppy reiserfs edd fan thermal processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
Jul 12 23:39:28 gradient kernel: Pid: 25406, comm: bash Tainted: G U 2.6.16.20-20060612161415-smp #14
Jul 12 23:39:28 gradient kernel: RIP: 0010:[<ffffffff882fa00c>] <ffffffff882fa00c>{:powernow_k8:powernowk8_cpu_init+3115}
Jul 12 23:39:28 gradient kernel: RSP: 0018:ffff810078263c18 EFLAGS: 00010202
Jul 12 23:39:28 gradient kernel: RAX: 0000000000000001 RBX: 0000000000000800 RCX: 0000000000000001
Jul 12 23:39:28 gradient kernel: RDX: 0000000000000001 RSI: 0000000000000080 RDI: ffffffff8045c760
Jul 12 23:39:28 gradient kernel: RBP: 0000000000000000 R08: ffffffff8045c760 R09: ffff810078263928
Jul 12 23:39:28 gradient kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
Jul 12 23:39:28 gradient kernel: R13: 0000000000000002 R14: ffff8101d83394a0 R15: ffff81007f390400
Jul 12 23:39:28 gradient kernel: FS: 00002af6fc81fae0(0000) GS:ffff8101dadd4240(0000) knlGS:0000000000000000
Jul 12 23:39:28 gradient kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 12 23:39:28 gradient kernel: CR2: 0000000000000001 CR3: 00000001cfe35000 CR4: 00000000000006a0
Jul 12 23:39:28 gradient kernel: Process bash (pid: 25406, threadinfo ffff810078262000, task ffff81007f48f080)
Jul 12 23:39:28 gradient kernel: Stack: 0000000000000000 00000003801e9054 ffff8101d8339480 0000000000000000
Jul 12 23:39:28 gradient kernel: 0000000000000004 00000000000000c0 0000000000000004 0000000000000000
Jul 12 23:39:28 gradient kernel: ffffffffffffffff ffffffffffffffff
Jul 12 23:39:28 gradient kernel: Call Trace: <ffffffff80268b4f>{cpufreq_add_dev+414}
Jul 12 23:39:28 gradient kernel: <ffffffff802cf074>{thread_return+0} <ffffffff801911b1>{alloc_inode+42}
Jul 12 23:39:28 gradient kernel: <ffffffff801b83bf>{sysfs_new_dirent+52} <ffffffff801b863d>{sysfs_make_dirent+27}
Jul 12 23:39:28 gradient kernel: <ffffffff80268f97>{cpufreq_cpu_callback+49} <ffffffff802d2805>{notifier_call_chain+28}
Jul 12 23:39:28 gradient kernel: <ffffffff801476bd>{cpu_up+169} <ffffffff80250b94>{store_online+85}
Jul 12 23:39:28 gradient kernel: <ffffffff801b7d75>{sysfs_write_file+185} <ffffffff8017adc0>{vfs_write+215}
Jul 12 23:39:28 gradient kernel: <ffffffff8017b381>{sys_write+69} <ffffffff8010a7be>{system_call+126}
Jul 12 23:39:28 gradient kernel:
Jul 12 23:39:28 gradient kernel: Code: 8b 04 28 25 00 ff 00 00 39 c3 0f 4c d8 ff c2 be 80 00 00 00
Jul 12 23:39:28 gradient kernel: RIP <ffffffff882fa00c>{:powernow_k8:powernowk8_cpu_init+3115} RSP <ffff810078263c18>
Jul 12 23:39:28 gradient kernel: CR2: 0000000000000001

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-12 14:06 ` Joachim Deguara
@ 2006-07-12 14:54 ` Andi Kleen
0 siblings, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-12 14:54 UTC (permalink / raw)
To: discuss
Cc: Joachim Deguara, Mark Langsdorf, linux-kernel, cpufreq

On Wednesday 12 July 2006 16:06, Joachim Deguara wrote:
> On Tue, 2006-07-11 at 14:55 +0200, Joachim Deguara wrote:
> > Here are some initial findings.
>
> Here are the further findings after letting the machine toggle between
> 1 GHz and 2.2 GHz every two seconds for roughly 24 hours. Unfortunately
> there is an oops after bringing CPU2 online and CPU3 will not come
> online.

Perhaps Mark/Jacob can take a look at the oops.

> Still the differences in TSC are not bad:

Think: any difference you get for 24h will be 10x when the system runs
10 times longer and 365 times when the system runs for a year (and
there are Linux systems that run much longer than a year).

Also even small non-monotonicities between CPUs in gettimeofday can
cause big trouble.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
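The two arithmetic claims in this exchange (1000 cycles at 1 GHz is one microsecond; a 24-hour difference scales with uptime) can be written down directly. This is a back-of-the-envelope sketch: the linear growth is the worst-case assumption made in the message above, not a measured property, and the 22-cycle figure is simply the CPU 2 "last diff" from the 24-hour log, which also contains measurement noise. The function names are invented for this illustration.

```python
def cycles_to_us(cycles, freq_hz):
    """Convert a cycle count to microseconds at a fixed clock frequency."""
    return cycles * 1e6 / freq_hz


def extrapolate_linear(cycles_per_day, days):
    """Worst-case assumption: the per-day difference accumulates linearly."""
    return cycles_per_day * days


ONE_GHZ = 1e9

# 1000 cycles at 1 GHz is the microsecond resolution of gettimeofday.
print(cycles_to_us(1000, ONE_GHZ))

# CPU 2 showed a last diff of 22 cycles after roughly 24 hours of toggling.
after_year = extrapolate_linear(22, 365)
print(after_year, "cycles,", cycles_to_us(after_year, ONE_GHZ), "us at 1 GHz")
```

Under these assumptions the extrapolated one-year figure would exceed the one-microsecond resolution, which is the point being argued; whether the difference really accumulates is exactly what the longer baseline runs are meant to show.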
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 12:10 ` Andi Kleen
` (2 preceding siblings ...)
2006-07-11 12:55 ` Joachim Deguara
@ 2006-07-25 21:47 ` Langsdorf, Mark
2006-07-26 11:31 ` Joachim Deguara
2006-07-26 16:42 ` Langsdorf, Mark
4 siblings, 1 reply; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-25 21:47 UTC (permalink / raw)
To: ak, Gulam, Nagib
Cc: discuss, linux-kernel, cpufreq

> > This patch provides a workaround by changing all processors to the
> > same frequency at the same time, so that the TSC on each processor
> > never increments at a different rate than the TSC on
> > another processor.
>
> I'm still dubious if the result is really correct if the
> hardware wasn't designed to guarantee synchronous TSC operation.
>
> Can you do the following test please?
>
> - Set this option
> - Let the system run for let's say a day or two with some
> freq transitions and varying loads [Better would be to let
> two systems run in this way to compare]
> - Then hotunplug all the CPUs >0 with
> for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
> - Wait a bit
> - Restart them again with
> for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done

I started running a baseline test on Thursday, July 13th, and continued
until today. The system was not running cpufreq but it was using TSC
as the sole gtod timesource.

Here are the numbers (again, after 12 days of uptime):

SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789204)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 1
AMD Opteron(tm) Processor 854 stepping 01
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 966 cycles)
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789200)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2/2 -> Node 2
AMD Opteron(tm) Processor 838 stepping 01
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 953 cycles)
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 4394.58 BogoMIPS (lpj=8789169)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3/3 -> Node 3
AMD Opteron(tm) Processor 838 stepping 01
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -257 cycles, maxerr 864 cycles)

Compared to the same machine, immediately after reboot:

SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789216)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 1
AMD Opteron(tm) Processor 854 stepping 01
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 972 cycles)
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789212)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2/2 -> Node 2
AMD Opteron(tm) Processor 838 stepping 01
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -129 cycles, maxerr 949 cycles)
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 4394.57 BogoMIPS (lpj=8789142)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3/3 -> Node 3
AMD Opteron(tm) Processor 838 stepping 01
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -173 cycles, maxerr 1648 cycles)

I don't see any significant drift. It looks to me like current
generation Opterons are TSC-safe.

Joachim was supposed to be collecting the data for the system
with PN! enabled, if he hasn't posted it already.

-Mark Langsdorf
AMD, Inc.

^ permalink raw reply [flat|nested] 29+ messages in thread
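The two boot logs in the message above are easier to compare once the resync figures are pulled out side by side. The following is a minimal sketch that extracts `last diff` and `maxerr` from the "synchronized TSC" lines and prints the change between the two runs. The helper name `parse_sync` is hypothetical; the log lines are copied from the message.

```python
import re

# Matches the kernel's resync report line as it appears in the logs above.
SYNC_RE = re.compile(
    r"CPU (\d+): synchronized TSC with CPU 0 "
    r"\(last diff (-?\d+) cycles, maxerr (\d+) cycles\)"
)


def parse_sync(log):
    """Map CPU number -> (last diff, maxerr) for every resync line in `log`."""
    return {int(c): (int(d), int(m)) for c, d, m in SYNC_RE.findall(log)}


after_12_days = parse_sync("""
CPU 1: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 966 cycles)
CPU 2: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 953 cycles)
CPU 3: synchronized TSC with CPU 0 (last diff -257 cycles, maxerr 864 cycles)
""")

after_reboot = parse_sync("""
CPU 1: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 972 cycles)
CPU 2: synchronized TSC with CPU 0 (last diff -129 cycles, maxerr 949 cycles)
CPU 3: synchronized TSC with CPU 0 (last diff -173 cycles, maxerr 1648 cycles)
""")

for cpu in sorted(after_reboot):
    d_diff = after_12_days[cpu][0] - after_reboot[cpu][0]
    d_err = after_12_days[cpu][1] - after_reboot[cpu][1]
    print(f"CPU {cpu}: last diff changed by {d_diff:+d} cycles, "
          f"maxerr by {d_err:+d} cycles")
```

The same parser can be pointed at saved `dmesg` output from any later hot-add run.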
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-25 21:47 ` Langsdorf, Mark
@ 2006-07-26 11:31 ` Joachim Deguara
0 siblings, 0 replies; 29+ messages in thread
From: Joachim Deguara @ 2006-07-26 11:31 UTC (permalink / raw)
To: Langsdorf, Mark
Cc: ak, Gulam, Nagib, discuss, linux-kernel, cpufreq

On Tue, 2006-07-25 at 16:47 -0500, Langsdorf, Mark wrote:
> Joachim was supposed to be collecting the data for the system
> with PN! enabled, if he hasn't posted it already.
>

Yeah, sorry. When I checked today my machine had an uptime of only 21 hours
and somehow has some new entries in the mcelog, great. I can redo the
long-term test.

-joachim

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-07 12:10 ` Andi Kleen
` (3 preceding siblings ...)
2006-07-25 21:47 ` Langsdorf, Mark
@ 2006-07-26 16:42 ` Langsdorf, Mark
2006-07-26 16:54 ` Andi Kleen
4 siblings, 1 reply; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-26 16:42 UTC (permalink / raw)
To: ak, Gulam, Nagib
Cc: discuss, linux-kernel, cpufreq

> - Set this option
> - Let the system run for let's say a day or two with some
> freq transitions and varying loads [Better would be to let
> two systems run in this way to compare]
> - Then hotunplug all the CPUs >0 with
> for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
> - Wait a bit
> - Restart them again with
> for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done
>
> The kernel should now print the results of the TSC resync for
> the replugged CPUs with output like this
>
> CPU N: Syncing TSC to CPU 0.
> CPU N: synchronized TSC with CPU 0 (last diff XXX cycles,
> maxerr YYY cycles)
>
> How do these numbers look like, also compared to the original
> boot output?
>
> If the cycles diverge more between the different CPUs it
> would be a bad sign.
> It would mean that the error would add up over longer runtime
> and timing would get more and more unstable.

Andi -

Are you sure this test is testing what you think it is testing?

I just ran my system with the stock 2.6.18 kernel and TSC enabled.
I ran PN! on a varying load long enough for the clocks to become
completely unglued. The time delta on the command
`date;sleep 5; date` was going from -10 to 30 seconds.

However, the results of your test show no problems:

during bootup

SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789206)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 1
AMD Opteron(tm) Processor 854 stepping 01
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -135 cycles, maxerr 968 cycles)
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789208)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2/2 -> Node 2
AMD Opteron(tm) Processor 838 stepping 01
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -132 cycles, maxerr 949 cycles)
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 4394.60 BogoMIPS (lpj=8789214)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3/3 -> Node 3
AMD Opteron(tm) Processor 838 stepping 01
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -184 cycles, maxerr 1648 cycles)
Brought up 4 CPUs

hot add of offlined CPUs

Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 1997.56 BogoMIPS (lpj=3995129)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 1
AMD Opteron(tm) Processor 854 stepping 01
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -102 cycles, maxerr 685 cycles)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0xc
powernow-k8: 1 : fid 0xc (2000 MHz), vid 0xe
powernow-k8: 2 : fid 0xa (1800 MHz), vid 0x10
powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 1997.54 BogoMIPS (lpj=3995084)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2/2 -> Node 2
AMD Opteron(tm) Processor 838 stepping 01
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -97 cycles, maxerr 663 cycles)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x6
powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x8
powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa
powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 1997.54 BogoMIPS (lpj=3995099)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3/3 -> Node 3
AMD Opteron(tm) Processor 838 stepping 01
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -109 cycles, maxerr 1024 cycles)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x6
powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x8
powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa
powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12

Is there a better test we can be using?

-Mark Langsdorf
AMD, Inc.

^ permalink raw reply [flat|nested] 29+ messages in thread
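The `date;sleep 5; date` check described in the message above can be turned into a number rather than read off by eye. This is a minimal sketch, assuming that the wall clock (`time.time()`) plays the role of `date`; the function name `sleep_delta` is hypothetical, and a one-second sleep is used here only to keep the demo short.

```python
import time


def sleep_delta(seconds):
    """Return how far the wall-clock interval measured across a sleep
    deviates from the requested sleep time, in seconds."""
    start = time.time()      # wall clock, analogous to the first `date`
    time.sleep(seconds)
    end = time.time()        # analogous to the second `date`
    return (end - start) - seconds


# On a healthy system this is a small positive number (scheduling latency).
# Deviations of whole seconds, or negative values, would correspond to the
# "-10 to 30 seconds" behaviour reported above.
print(f"deviation: {sleep_delta(1):+.3f} s")
```

Running it in a loop while the load varies would show whether the deviation is stable or jumps around.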
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-26 16:42 ` Langsdorf, Mark
@ 2006-07-26 16:54 ` Andi Kleen
2006-07-26 18:34 ` Langsdorf, Mark
0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2006-07-26 16:54 UTC (permalink / raw)
To: Langsdorf, Mark
Cc: discuss, cpufreq, linux-kernel, Gulam, Nagib

> AMD Opteron(tm) Processor 838 stepping 01
> CPU 3: Syncing TSC to CPU 0.
> CPU 3: synchronized TSC with CPU 0 (last diff -109 cycles, maxerr 1024

Hmm, indeed - I would have expected higher max errors too.
It should have worked in theory. No explanation currently.

> cycles)
> powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x6
> powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x8
> powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa
> powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
>
> Is there a better test we can be using?

I don't know of any. OK, I guess it would be possible to write
something in user space, but it would likely look similar to
the algorithm the kernel uses.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-26 16:54 ` Andi Kleen
@ 2006-07-26 18:34 ` Langsdorf, Mark
2006-07-26 18:53 ` Andi Kleen
0 siblings, 1 reply; 29+ messages in thread
From: Langsdorf, Mark @ 2006-07-26 18:34 UTC (permalink / raw)
To: Andi Kleen
Cc: Gulam, Nagib, discuss, linux-kernel, cpufreq

> > AMD Opteron(tm) Processor 838 stepping 01
> > CPU 3: Syncing TSC to CPU 0.
> > CPU 3: synchronized TSC with CPU 0 (last diff -109 cycles, maxerr 1024
>
> Hmm, indeed - i would have expected higher max errors too.
> It should have worked in theory. No explanation currently.

That's unfortunate.

> > cycles)
> > powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x6
> > powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x8
> > powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa
> > powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12
> >
> > Is there a better test we can be using?
>
> I don't know of any. Ok I guess it would be possible to write
> something in user space, but it would likely look similar to
> the algorithm the kernel uses.

I ran the following simple test on the 4P system with TSC
gtod for over a week:

while true; do date; sleep 3600; done

The first entry went in at July 13 15:39:48, the last entry
at July 25 15:39:50. A drift of 2 seconds over 12 days is
within specification, I believe.

In contrast, the same machine running with TSC and standard
PN! sees massive drift, upwards of an hour, within an hour.

If the TSCnow! patch reduces measured drift down to a second
a week, would you consider that acceptable?

-Mark Langsdorf
AMD, Inc.

^ permalink raw reply [flat|nested] 29+ messages in thread
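The drift figures in the message above can be expressed as a rate, which makes the 12-day baseline and the proposed "one second a week" target directly comparable. This is a minimal sketch that takes the two timestamps at face value; whether the 2 seconds come from the TSC or from overhead in the shell loop cannot be determined from the thread. The helper name `drift_ppm` is hypothetical.

```python
from datetime import datetime


def drift_ppm(drift_seconds, elapsed_seconds):
    """Express a clock drift as parts per million of the elapsed time."""
    return drift_seconds / elapsed_seconds * 1e6


# Timestamps of the first and last `date` entries quoted above.
first = datetime(2006, 7, 13, 15, 39, 48)
last = datetime(2006, 7, 25, 15, 39, 50)

nominal = 12 * 24 * 3600                        # 288 hourly sleeps, in seconds
drift = (last - first).total_seconds() - nominal

print(f"baseline: {drift:.0f} s over 12 days = {drift_ppm(drift, nominal):.2f} ppm")
print(f"target:   1 s per week = {drift_ppm(1, 7 * 86400):.2f} ppm")
```

Both rates come out within a fraction of a ppm of each other, so the proposed target is roughly what the baseline run without cpufreq already showed.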
* Re: [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time
2006-07-26 18:34 ` Langsdorf, Mark
@ 2006-07-26 18:53 ` Andi Kleen
0 siblings, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2006-07-26 18:53 UTC (permalink / raw)
To: discuss
Cc: cpufreq, linux-kernel, Gulam, Nagib

> In contrast, the same machine running with TSC and standard
> PN! sees massive drift, upwards of an hour, within an hour.

Do you see the same drift when you lock date on a single
CPU with taskset?

> If the TSCnow! patch reduces measured drift down to a second
> a week, would you consider that acceptable?

No, because even one second a week will break timing badly over time.

I believe the only good solution, unless hardware helps, would be to use
per-CPU TSC offsets as discussed earlier. Even that is a bit risky
because there can still be very small drifts, but they should be
bounded by a maximum error of one clock tick and might work out.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
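The taskset question in the message above suggests a related user-space check: read the wall clock while pinned to each CPU in turn and look for readings that go backwards between CPUs. This is a minimal sketch, assuming a Linux system where Python's `os.sched_setaffinity` is available; the function name `time_on_each_cpu` is hypothetical, and this is not the algorithm the kernel uses for its resync.

```python
import os
import time


def time_on_each_cpu():
    """Read the wall clock while pinned to each allowed CPU in turn.
    Returns a list of (cpu, timestamp) pairs in the order sampled."""
    if not hasattr(os, "sched_setaffinity"):    # not Linux: no pinning
        return [(None, time.time())]
    original = os.sched_getaffinity(0)
    samples = []
    try:
        for cpu in sorted(original):
            try:
                os.sched_setaffinity(0, {cpu})  # like `taskset -c <cpu>`
            except OSError:
                continue                        # CPU not usable, skip it
            samples.append((cpu, time.time()))
    finally:
        os.sched_setaffinity(0, original)       # restore the original mask
    return samples


samples = time_on_each_cpu()
for (cpu_a, t_a), (cpu_b, t_b) in zip(samples, samples[1:]):
    flag = "  <-- went backwards" if t_b < t_a else ""
    print(f"CPU {cpu_a} -> CPU {cpu_b}: {t_b - t_a:+.6f} s{flag}")
```

Since the samples are taken one after another, each step should be a small positive number; a negative step would be the kind of inter-CPU non-monotonicity the thread is concerned about.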
end of thread, other threads:[~2006-07-26 18:53 UTC | newest]
Thread overview: 29+ messages
2006-07-12 16:11 [discuss] Re: [PATCH] Allow all Opteron processors to change pstate at same time shin, jacob
2006-07-12 16:14 ` Langsdorf, Mark
2006-07-13 13:06 ` Pavel Machek
2006-07-13 14:32 ` Joachim Deguara
2006-07-16 1:56 ` Pavel Machek
2006-07-17 7:37 ` Joachim Deguara
2006-07-20 15:59 ` Pavel Machek
[not found] <Pine.LNX.4.64.0607061519040.9066@solonow.amd.com>
2006-07-07 12:10 ` Andi Kleen
2006-07-07 17:36 ` [discuss] " Langsdorf, Mark
2006-07-10 12:45 ` Joachim Deguara
2006-07-10 13:02 ` Joachim Deguara
2006-07-07 18:14 ` Scott Lampert
2006-07-07 18:26 ` Langsdorf, Mark
2006-07-11 12:55 ` Joachim Deguara
2006-07-11 13:07 ` Andi Kleen
2006-07-11 13:14 ` Arjan van de Ven
2006-07-11 16:15 ` Alan Cox
2006-07-11 16:01 ` Arjan van de Ven
2006-07-11 16:04 ` Andi Kleen
2006-07-11 13:31 ` Langsdorf, Mark
2006-07-11 13:34 ` Arjan van de Ven
2006-07-11 13:51 ` Andi Kleen
2006-07-12 14:06 ` Joachim Deguara
2006-07-12 14:54 ` Andi Kleen
2006-07-25 21:47 ` Langsdorf, Mark
2006-07-26 11:31 ` Joachim Deguara
2006-07-26 16:42 ` Langsdorf, Mark
2006-07-26 16:54 ` Andi Kleen
2006-07-26 18:34 ` Langsdorf, Mark
2006-07-26 18:53 ` Andi Kleen