* Extreme system overhead on large IP27
@ 2006-10-21 19:59 Karl-Johan Karlsson
2006-10-22 15:21 ` Ralf Baechle
0 siblings, 1 reply; 22+ messages in thread
From: Karl-Johan Karlsson @ 2006-10-21 19:59 UTC (permalink / raw)
To: linux-mips
I have an Origin 2000 with 16 R12000 and 16 R10000 CPU:s, running a git
snapshot kernel from 20060618 based on 2.6.17.10 (the latest available in
Gentoo). Light loads run without problems, but as soon as the load average
goes above 4-5 system overhead skyrockets and almost no useful work is being
done (see top output below). OProfile is no help, since the daemon just
throws away everything the kernel gives it (see output from strace of
oprofiled below).
Does anyone know where this overhead is coming from, or how to get some data
from OProfile so I can search for it myself? I'll try booting just the R12000
part sometime soon to see if that helps with either problem.
----------
top - 14:00:40 up 6 days, 20:25, 3 users, load average: 11.98, 11.58, 8.03
Tasks: 314 total, 11 running, 302 sleeping, 0 stopped, 1 zombie
Cpu0 : 0.0%us, 35.9%sy, 4.5%ni, 56.0%id, 0.0%wa, 0.0%hi, 3.6%si, 0.0%st
Cpu1 : 4.2%us, 37.4%sy, 3.9%ni, 50.0%id, 0.6%wa, 0.0%hi, 3.9%si, 0.0%st
Cpu2 : 0.0%us, 9.0%sy, 0.9%ni, 83.5%id, 0.0%wa, 0.0%hi, 6.6%si, 0.0%st
Cpu3 : 0.0%us, 3.3%sy, 0.3%ni, 89.7%id, 0.6%wa, 0.0%hi, 6.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 93.4%id, 0.0%wa, 0.0%hi, 6.6%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 93.4%id, 0.0%wa, 0.0%hi, 6.6%si, 0.0%st
Cpu6 : 0.0%us, 34.6%sy, 3.3%ni, 59.3%id, 0.0%wa, 0.0%hi, 2.7%si, 0.0%st
Cpu7 : 0.0%us, 40.5%sy, 3.0%ni, 52.9%id, 0.0%wa, 0.0%hi, 3.6%si, 0.0%st
Cpu8 : 0.0%us, 13.6%sy, 0.9%ni, 77.9%id, 0.0%wa, 0.0%hi, 7.6%si, 0.0%st
Cpu9 : 0.0%us, 18.5%sy, 2.4%ni, 70.8%id, 0.0%wa, 0.0%hi, 8.2%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni, 90.5%id, 0.0%wa, 0.0%hi, 9.5%si, 0.0%st
Cpu11 : 0.0%us, 15.6%sy, 2.1%ni, 73.3%id, 0.0%wa, 0.0%hi, 8.9%si, 0.0%st
Cpu12 : 0.0%us, 22.2%sy, 1.2%ni, 66.5%id, 0.0%wa, 0.0%hi, 10.2%si, 0.0%st
Cpu13 : 0.0%us, 0.0%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 12.5%si, 0.0%st
Cpu14 : 0.0%us, 31.7%sy, 1.3%ni, 59.2%id, 0.0%wa, 0.0%hi, 7.8%si, 0.0%st
Cpu15 : 0.0%us, 53.9%sy, 0.6%ni, 34.7%id, 0.0%wa, 0.0%hi, 10.7%si, 0.0%st
Cpu16 : 0.0%us, 13.1%sy, 0.3%ni, 78.0%id, 0.3%wa, 0.0%hi, 8.3%si, 0.0%st
Cpu17 : 0.0%us, 59.5%sy, 2.4%ni, 31.1%id, 0.0%wa, 0.0%hi, 7.0%si, 0.0%st
Cpu18 : 0.0%us, 17.5%sy, 0.3%ni, 74.2%id, 0.0%wa, 0.0%hi, 8.0%si, 0.0%st
Cpu19 : 0.0%us, 45.8%sy, 0.3%ni, 45.8%id, 0.0%wa, 0.0%hi, 8.0%si, 0.0%st
Cpu20 : 0.0%us, 8.3%sy, 0.0%ni, 81.7%id, 0.0%wa, 0.0%hi, 9.9%si, 0.0%st
Cpu21 : 0.0%us, 78.3%sy, 0.3%ni, 12.8%id, 0.0%wa, 0.0%hi, 8.6%si, 0.0%st
Cpu22 : 0.0%us, 62.6%sy, 0.0%ni, 27.5%id, 0.0%wa, 0.0%hi, 9.9%si, 0.0%st
Cpu23 : 0.0%us, 30.9%sy, 0.0%ni, 59.0%id, 0.0%wa, 0.0%hi, 10.1%si, 0.0%st
Cpu24 : 0.0%us, 31.4%sy, 0.0%ni, 56.5%id, 0.0%wa, 0.0%hi, 12.1%si, 0.0%st
Cpu25 : 0.0%us, 56.6%sy, 0.0%ni, 30.9%id, 0.0%wa, 0.0%hi, 12.5%si, 0.0%st
Cpu26 : 0.0%us, 68.5%sy, 0.0%ni, 22.7%id, 0.0%wa, 0.0%hi, 8.8%si, 0.0%st
Cpu27 : 0.0%us, 8.7%sy, 0.0%ni, 81.9%id, 0.0%wa, 0.0%hi, 9.4%si, 0.0%st
Cpu28 : 0.9%us, 59.3%sy, 0.0%ni, 35.1%id, 0.0%wa, 0.0%hi, 4.7%si, 0.0%st
Cpu29 : 0.0%us, 36.3%sy, 0.0%ni, 59.6%id, 0.0%wa, 0.0%hi, 4.0%si, 0.0%st
Cpu30 : 0.0%us, 20.6%sy, 0.0%ni, 74.3%id, 0.0%wa, 0.0%hi, 5.1%si, 0.0%st
Cpu31 : 0.0%us, 75.3%sy, 0.3%ni, 19.9%id, 0.0%wa, 0.0%hi, 4.4%si, 0.0%st
Mem: 10975264k total, 1860404k used, 9114860k free, 486912k buffers
Swap: 2007992k total, 0k used, 2007992k free, 783448k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19535 portage 28 3 4232 1076 124 R 97 0.0 0:04.08 sh
19524 portage 28 3 4232 1152 200 R 95 0.0 0:07.35 sh
19541 portage 28 3 2212 392 304 R 94 0.0 0:03.26 sed
19487 root 25 0 2200 424 340 R 91 0.0 0:18.38 find
19533 portage 28 3 4556 1116 148 R 91 0.0 0:04.46 sh
19543 portage 28 3 8900 944 676 R 79 0.0 0:02.62 cc1
19545 portage 28 3 6248 168 128 R 71 0.0 0:02.35 cc1
19491 portage 28 3 4232 1276 324 S 66 0.0 0:15.70 sh
19544 portage 21 3 5472 2676 824 S 41 0.0 0:01.37 as
19530 portage 28 3 2344 604 492 S 24 0.0 0:02.81 mips-unknown-li
19520 portage 28 3 2344 608 496 S 17 0.0 0:05.66 mips-unknown-li
19549 portage 23 3 4232 1264 312 R 11 0.0 0:00.36 sh
19518 creideik 16 0 4028 1400 976 R 10 0.0 0:04.35 top
19550 portage 22 3 4232 1220 268 R 9 0.0 0:00.31 sh
18578 portage 28 3 4556 2108 1140 S 8 0.0 0:44.73 sh
19059 portage 21 3 4232 2068 1116 S 8 0.0 0:15.94 sh
19551 portage 24 3 4232 1084 132 R 4 0.0 0:00.14 sh
----------
FD 3 = /dev/oprofile/buffer
1161459953.480231 lseek(3, 0, SEEK_SET) = 0 <0.001000>
1161459953.485233
read(3, "\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\v\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\3\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\10\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\25\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\31\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\35\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\34\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\20\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\27\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\32\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\33\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\n\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\16\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\4\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\22\377\377\377\377\377\377\377\377\0\0\0\0\0\0"...,
1048576) = 786456 <101.700285>
1161460055.353596 open("/var/lib/oprofile/complete_dump", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 <0.002001>
1161460055.357598 fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.001000>
1161460055.361600 old_mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aad6000 <0.001000>
1161460055.364601 write(4, "1\n", 2) = 2 <0.002001>
1161460055.368603 close(4) = 0 <0.002001>
1161460055.373605 munmap(0x2aad6000, 65536) = 0 <0.002001>
1161460055.379608 lseek(3, 0, SEEK_SET) = 0 <0.001001>
1161460055.383610 read(3, <unfinished ...>
--
Karl-Johan Karlsson
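
For readers who want to look inside the buffer contents shown in the strace excerpt above, here is a minimal Python sketch that splits the bytes into big-endian 64-bit words and looks for escape/CPU-switch triples. It rests on two assumptions that the thread itself does not state: that an all-ones word is oprofile's escape marker, and that an escape followed by the value 2 announces a CPU number. The names decode_words, classify and SAMPLE are made up for this illustration; SAMPLE is simply the first 72 bytes of the read() shown above.

```python
import struct

# Assumption: an all-ones 64-bit word is oprofile's escape marker, and
# an escape followed by 2 means "the next word is a CPU number".
ESCAPE_CODE = 0xFFFFFFFFFFFFFFFF
CPU_SWITCH_CODE = 2

# First 72 bytes of the read() from the strace excerpt (octal escapes).
SAMPLE = (
    b"\377\377\377\377\377\377\377\377" b"\0\0\0\0\0\0\0\2" b"\0\0\0\0\0\0\0\v"
    b"\377\377\377\377\377\377\377\377" b"\0\0\0\0\0\0\0\2" b"\0\0\0\0\0\0\0\3"
    b"\377\377\377\377\377\377\377\377" b"\0\0\0\0\0\0\0\2" b"\0\0\0\0\0\0\0\10"
)


def decode_words(raw):
    """Split raw bytes into big-endian 64-bit words, dropping a partial tail."""
    usable = len(raw) - len(raw) % 8
    return [word for (word,) in struct.iter_unpack(">Q", raw[:usable])]


def classify(words):
    """Return (cpu_numbers, other_words) for a list of 64-bit words."""
    cpus, other = [], []
    i = 0
    while i < len(words):
        is_cpu_switch = (
            i + 2 < len(words)
            and words[i] == ESCAPE_CODE
            and words[i + 1] == CPU_SWITCH_CODE
        )
        if is_cpu_switch:
            cpus.append(words[i + 2])
            i += 3
        else:
            other.append(words[i])
            i += 1
    return cpus, other


cpus, other = classify(decode_words(SAMPLE))
print("CPU switch markers:", cpus)
print("words that are not part of a CPU switch:", len(other))
```

If the two assumptions hold, the visible part of the buffer contains only CPU-switch markers and no sample words, which would fit the daemon having nothing to record. The read() length of 786456 bytes is also a multiple of 24, the size of one such three-word record, but that alone proves nothing about the rest of the buffer.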
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-21 19:59 Extreme system overhead on large IP27 Karl-Johan Karlsson
@ 2006-10-22 15:21 ` Ralf Baechle
2006-10-22 23:23 ` Ralf Baechle
0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2006-10-22 15:21 UTC (permalink / raw)
To: Karl-Johan Karlsson; +Cc: linux-mips
On Sat, Oct 21, 2006 at 09:59:02PM +0200, Karl-Johan Karlsson wrote:
> I have an Origin 2000 with 16 R12000 and 16 R10000 CPU:s, running a git
> snapshot kernel from 20060618 based on 2.6.17.10 (the latest available in
> Gentoo). Light loads run without problems, but as soon as the load average
> goes above 4-5 system overhead skyrockets and almost no useful work is being
> done (see top output below). OProfile is no help, since the daemon just
> throws away everything the kernel gives it (see output from strace of
> oprofiled below).
>
> Does anyone know where this overhead is coming from, or how to get some data
> from OProfile so I can search for it myself? I'll try booting just the R12000
> part sometime soon to see if that helps with either problem.
Oprofile is a bit of a bitch on mixed processor systems since it assumes
all processors to have identical performance counters. However, SGI in
its wisdom decided the R12000 had to be better than the R10000 and
changed it. It is possible to work around that, but lacking any mixed
CPU configuration I've never done that.

That said, the kernel part of oprofile doesn't yet support R1x000
processors; I'll try to cook up something. Should be easy enough
since the interface is nearly identical to MIPS32/MIPS64.

  Ralf
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-22 15:21 ` Ralf Baechle
@ 2006-10-22 23:23 ` Ralf Baechle
2006-10-23 0:19 ` Ralf Baechle
0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2006-10-22 23:23 UTC (permalink / raw)
To: Karl-Johan Karlsson; +Cc: linux-mips
On Sun, Oct 22, 2006 at 04:21:58PM +0100, Ralf Baechle wrote:
> > I have an Origin 2000 with 16 R12000 and 16 R10000 CPU:s, running a git
> > snapshot kernel from 20060618 based on 2.6.17.10 (the latest available in
> > Gentoo). Light loads run without problems, but as soon as the load average
> > goes above 4-5 system overhead skyrockets and almost no useful work is being
> > done (see top output below). OProfile is no help, since the daemon just
> > throws away everything the kernel gives it (see output from strace of
> > oprofiled below).
> >
> > Does anyone know where this overhead is coming from, or how to get some data
> > from OProfile so I can search for it myself? I'll try booting just the R12000
> > part sometime soon to see if that helps with either problem.
>
> Oprofile is a bit of a bitch on mixed processor systems since it assumes
> all processors to have identical performance counters. However, SGI in
> its wisdom decided the R12000 had to be better than the R10000 and
> changed it. It is possible to work around that, but lacking any mixed
> CPU configuration I've never done that.
>
> That said, the kernel part of oprofile doesn't yet support R1x000
> processors; I'll try to cook up something. Should be easy enough
> since the interface is nearly identical to MIPS32/MIPS64.
Okay, it turns out, as I suspected, that one of the facts well disguised by
the R10000, MIPS32 and MIPS64 architecture manuals is that the R10000 MFPS,
MFPC, MTPS, MTPC instructions use the same encoding as MIPS32/MIPS64 mfc0
instructions with a selector argument. So getting oprofile to actually
work on the R10000 family won't be hard.

  Ralf
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-22 23:23 ` Ralf Baechle
@ 2006-10-23 0:19 ` Ralf Baechle
2006-10-23 21:30 ` Karl-Johan Karlsson
0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2006-10-23 0:19 UTC (permalink / raw)
To: Karl-Johan Karlsson; +Cc: linux-mips
On Mon, Oct 23, 2006 at 12:23:16AM +0100, Ralf Baechle wrote:
> Okay, it turns out, as I suspected, that one of the facts well disguised by
> the R10000, MIPS32 and MIPS64 architecture manuals is that the R10000 MFPS,
> MFPC, MTPS, MTPC instructions use the same encoding as MIPS32/MIPS64 mfc0
> instructions with a selector argument. So getting oprofile to actually
> work on the R10000 family won't be hard.
Can you test the patch below, which adds oprofile support for the R10000
family processors? The patch only adds support for these processors; it
doesn't attempt to get things right for mixes of R10000 and R12000
processors, so here are some hints for usage:

 * oprofile will detect the number of performance counters on whatever
   processor (likely to be CPU 0) executes its initialization code.
   Whatever the result of this detection, oprofile will believe all
   processors are identical.
 * R10000 processors have 2 counters, R12000 processors have 4 counters.
   As a result, a mixed configuration will be limited to only 2 counters.
 * R10000 and R12000 processors have different events. Only the events
   that are identical on both processors can be used. The most interesting
   of these are:

     Counters 0, 1   CYCLES
     Counter 0       INSTRUCTIONS_GRADUATED

  Ralf

[MIPS] Oprofile: kernel support for the R10000.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

diff --git a/arch/mips/oprofile/Makefile b/arch/mips/oprofile/Makefile
index 0a50aad..a54362c 100644
--- a/arch/mips/oprofile/Makefile
+++ b/arch/mips/oprofile/Makefile
@@ -12,5 +12,6 @@ oprofile-y := $(DRIVER_OBJS) common.o
 
 oprofile-$(CONFIG_CPU_MIPS32) += op_model_mipsxx.o
 oprofile-$(CONFIG_CPU_MIPS64) += op_model_mipsxx.o
+oprofile-$(CONFIG_CPU_R10000) += op_model_r10k.o
 oprofile-$(CONFIG_CPU_SB1) += op_model_mipsxx.o
 oprofile-$(CONFIG_CPU_RM9000) += op_model_rm9000.o
diff --git a/arch/mips/oprofile/common.c b/arch/mips/oprofile/common.c
index 65eb554..2524215 100644
--- a/arch/mips/oprofile/common.c
+++ b/arch/mips/oprofile/common.c
@@ -15,6 +15,7 @@ #include <asm/cpu-info.h>
 #include "op_impl.h"
 
 extern struct op_mips_model op_model_mipsxx_ops __attribute__((weak));
+extern struct op_mips_model op_model_r10k_ops __attribute__((weak));
 extern struct op_mips_model op_model_rm9000_ops __attribute__((weak));
 
 static struct op_mips_model *model;
@@ -86,6 +87,12 @@ int __init oprofile_arch_init(struct opr
 		lmodel = &op_model_mipsxx_ops;
 		break;
 
+	case CPU_R10000:
+	case CPU_R12000:
+	case CPU_R14000:
+		lmodel = &op_model_r10k_ops;;
+		break;
+
 	case CPU_RM9000:
 		lmodel = &op_model_rm9000_ops;
 		break;
diff --git a/arch/mips/oprofile/op_model_r10k.c b/arch/mips/oprofile/op_model_r10k.c
new file mode 100644
index 0000000..6dd2a5e
--- /dev/null
+++ b/arch/mips/oprofile/op_model_r10k.c
@@ -0,0 +1,227 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2004, 05, 06 by Ralf Baechle
+ * Copyright (C) 2005 by MIPS Technologies, Inc.
+ */
+#include <linux/oprofile.h>
+#include <linux/interrupt.h>
+#include <linux/smp.h>
+#include <asm/irq_regs.h>
+
+#include "op_impl.h"
+
+#define M_PERFCTL_EXL			(1UL << 0)
+#define M_PERFCTL_KERNEL		(1UL << 1)
+#define M_PERFCTL_SUPERVISOR		(1UL << 2)
+#define M_PERFCTL_USER			(1UL << 3)
+#define M_PERFCTL_INTERRUPT_ENABLE	(1UL << 4)
+#define M_PERFCTL_EVENT(event)		((event) << 5)
+#define M_PERFCTL_WIDE			(1UL << 30)
+#define M_PERFCTL_MORE			(1UL << 31)
+
+#define M_COUNTER_OVERFLOW		(1UL << 31)
+
+struct op_mips_model op_model_r10k_ops;
+
+static struct r10k_register_config {
+	unsigned int control[4];
+	unsigned int counter[4];
+} reg;
+
+/* Compute all of the registers in preparation for enabling profiling. */
+
+static void r10k_reg_setup(struct op_counter_config *ctr)
+{
+	unsigned int counters = op_model_r10k_ops.num_counters;
+	int i;
+
+	/* Compute the performance counter control word. */
+	/* For now count kernel and user mode */
+	for (i = 0; i < counters; i++) {
+		reg.control[i] = 0;
+		reg.counter[i] = 0;
+
+		if (!ctr[i].enabled)
+			continue;
+
+		reg.control[i] = M_PERFCTL_EVENT(ctr[i].event) |
+		                 M_PERFCTL_INTERRUPT_ENABLE;
+		if (ctr[i].kernel)
+			reg.control[i] |= M_PERFCTL_KERNEL;
+		if (ctr[i].user)
+			reg.control[i] |= M_PERFCTL_USER;
+		if (ctr[i].exl)
+			reg.control[i] |= M_PERFCTL_EXL;
+		reg.counter[i] = 0x80000000 - ctr[i].count;
+	}
+}
+
+/* Program all of the registers in preparation for enabling profiling. */
+
+static void r10k_cpu_setup (void *args)
+{
+	unsigned int counters = op_model_r10k_ops.num_counters;
+
+	switch (counters) {
+	case 4:
+		write_c0_perfctrl3(0);
+		write_c0_perfcntr3(reg.counter[3]);
+	case 3:
+		write_c0_perfctrl2(0);
+		write_c0_perfcntr2(reg.counter[2]);
+	case 2:
+		write_c0_perfctrl1(0);
+		write_c0_perfcntr1(reg.counter[1]);
+	case 1:
+		write_c0_perfctrl0(0);
+		write_c0_perfcntr0(reg.counter[0]);
+	}
+}
+
+/* Start all counters on current CPU */
+static void r10k_cpu_start(void *args)
+{
+	unsigned int counters = op_model_r10k_ops.num_counters;
+
+	switch (counters) {
+	case 4:
+		write_c0_perfctrl3(reg.control[3]);
+	case 3:
+		write_c0_perfctrl2(reg.control[2]);
+	case 2:
+		write_c0_perfctrl1(reg.control[1]);
+	case 1:
+		write_c0_perfctrl0(reg.control[0]);
+	}
+}
+
+/* Stop all counters on current CPU */
+static void r10k_cpu_stop(void *args)
+{
+	unsigned int counters = op_model_r10k_ops.num_counters;
+
+	switch (counters) {
+	case 4:
+		write_c0_perfctrl3(0);
+	case 3:
+		write_c0_perfctrl2(0);
+	case 2:
+		write_c0_perfctrl1(0);
+	case 1:
+		write_c0_perfctrl0(0);
+	}
+}
+
+static int r10k_perfcount_handler(void)
+{
+	unsigned int counters = op_model_r10k_ops.num_counters;
+	unsigned int control;
+	unsigned int counter;
+	int handled = 0;
+
+	switch (counters) {
+#define HANDLE_COUNTER(n) \
+	case n + 1: \
+		control = read_c0_perfctrl ## n(); \
+		counter = read_c0_perfcntr ## n(); \
+		if ((control & M_PERFCTL_INTERRUPT_ENABLE) && \
+		    (counter & M_COUNTER_OVERFLOW)) { \
+			oprofile_add_sample(get_irq_regs(), n); \
+			write_c0_perfcntr ## n(reg.counter[n]); \
+			handled = 1; \
+		}
+	HANDLE_COUNTER(3)
+	HANDLE_COUNTER(2)
+	HANDLE_COUNTER(1)
+	HANDLE_COUNTER(0)
+	}
+
+	return handled;
+}
+
+#define M_CONFIG1_PC	(1 << 4)
+
+static inline int n_counters(void)
+{
+	switch (current_cpu_data.cputype) {
+	case CPU_R10000:
+	case CPU_R12000:
+		return 2;
+
+	case CPU_R14000:
+		return 4;
+	}
+
+	return 0;
+}
+
+static inline void reset_counters(int counters)
+{
+	switch (counters) {
+	case 4:
+		write_c0_perfctrl3(0);
+		write_c0_perfcntr3(0);
+	case 3:
+		write_c0_perfctrl2(0);
+		write_c0_perfcntr2(0);
+	case 2:
+		write_c0_perfctrl1(0);
+		write_c0_perfcntr1(0);
+	case 1:
+		write_c0_perfctrl0(0);
+		write_c0_perfcntr0(0);
+	}
+}
+
+static int __init r10k_init(void)
+{
+	int counters;
+
+	counters = n_counters();
+	if (counters == 0) {
+		printk(KERN_ERR "Oprofile: CPU has no performance counters\n");
+		return -ENODEV;
+	}
+
+	reset_counters(counters);
+
+	op_model_r10k_ops.num_counters = counters;
+	switch (current_cpu_data.cputype) {
+	case CPU_R10000:
+		op_model_r10k_ops.cpu_type = "mips/r10000";
+		break;
+
+	case CPU_R12000:
+	case CPU_R14000:
+		op_model_r10k_ops.cpu_type = "mips/r12000";
+		break;
+
+	default:
+		printk(KERN_ERR "Profiling unsupported for this CPU\n");
+
+		return -ENODEV;
+	}
+
+	perf_irq = r10k_perfcount_handler;
+
+	return 0;
+}
+
+static void r10k_exit(void)
+{
+	reset_counters(op_model_r10k_ops.num_counters);
+
+	perf_irq = null_perf_irq;
+}
+
+struct op_mips_model op_model_r10k_ops = {
+	.reg_setup	= r10k_reg_setup,
+	.cpu_setup	= r10k_cpu_setup,
+	.init		= r10k_init,
+	.exit		= r10k_exit,
+	.cpu_start	= r10k_cpu_start,
+	.cpu_stop	= r10k_cpu_stop,
+};
^ permalink raw reply related [flat|nested] 22+ messages in thread
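
A note on the counter preload in the patch above: r10k_reg_setup() stores 0x80000000 - count in each counter, and the interrupt handler treats bit 31 (M_COUNTER_OVERFLOW) as the overflow flag. The following Python sketch only works through that arithmetic; it assumes a 32-bit counter that increments by one per event, and the names preload and events_until_overflow are invented for this illustration, not taken from the kernel.

```python
# Bit 31 is what the patch calls M_COUNTER_OVERFLOW.
OVERFLOW_BIT = 1 << 31
COUNTER_MASK = 0xFFFFFFFF  # assumption: counters are 32 bits wide


def preload(count):
    """Start value used by the patch: 0x80000000 - count, kept to 32 bits."""
    return (0x80000000 - count) & COUNTER_MASK


def events_until_overflow(start):
    """Events needed before bit 31 becomes set when counting up from start."""
    if start & OVERFLOW_BIT:
        return 0
    return OVERFLOW_BIT - start


sample_period = 100000  # example value for ctr[i].count
start = preload(sample_period)
print(hex(start), events_until_overflow(start))
```

Under those assumptions the handler sees the overflow bit after exactly the requested number of events, records one sample and reloads the same start value, so the sampling period stays at ctr[i].count.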
* Re: Extreme system overhead on large IP27
2006-10-23 0:19 ` Ralf Baechle
@ 2006-10-23 21:30 ` Karl-Johan Karlsson
[not found] ` <20061023224318.GA1732@linux-mips.org>
0 siblings, 1 reply; 22+ messages in thread
From: Karl-Johan Karlsson @ 2006-10-23 21:30 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Monday 23 October 2006 02:19, Ralf Baechle wrote:
> Can you test the patch below, which adds oprofile support for the R10000
> family processors?
I've tried it, and it doesn't solve my problem. With the patch applied,
"opcontrol --list-events" seems correct, but I still get no data from
OProfile, neither from the CYCLES nor the INSTRUCTIONS_GRADUATED event.
/var/lib/oprofile/oprofiled.log just repeats:

Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0

I tried both on the full machine and on only the R12000 rack, with
identical results. The R12000 rack alone also has the original problem
with large system overhead.
--
Karl-Johan Karlsson
^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <20061023224318.GA1732@linux-mips.org>]
* Re: Extreme system overhead on large IP27
[not found] ` <20061023224318.GA1732@linux-mips.org>
@ 2006-10-24 13:53 ` Karl-Johan Karlsson
2006-10-24 14:06 ` Ralf Baechle
0 siblings, 1 reply; 22+ messages in thread
From: Karl-Johan Karlsson @ 2006-10-24 13:53 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Tue, October 24, 2006 00:43, Ralf Baechle wrote:
> If you reduce your system to just 4 processors, do you also have that
> extremely high overhead? The reason I'm asking is that my own Origin 200
> system has just 4 processors.
I can't get physical access to the system to pull out CPU boards today, so
I did the best I could do remotely - powered down all modules but one and
am now running a kernel built with support for only 4 of the 8 remaining
R12000 CPU:s.

Overhead is not as extreme as with more CPU:s, but still high. Running
four copies of "md5sum /dev/zero", top shows around 95% useful work and 5%
system overhead per CPU, while a "make -j4" of the kernel gives me 20-30%
system and 70-80% user time (down from a maximum of 80% system time with
all 32 CPU:s).

This is still on the Gentoo 2.6.17.10 kernel, by the way (which is a
mips-git snapshot from 2006-06-18 plus extra patches from e.g.
<URL:http://ftp.du.se/pub/os/gentoo/distfiles/mips-sources-generic_patches-1.25.tar.bz2>).
I tried a git snapshot from earlier today, but the only thing that kernel
did was print the NUMA-link topology and then hang.

Now that I actually look at Gentoo's patchset, I see there's a large patch
(misc-2.6.17-ioc3-metadriver-r26.patch) touching serial and ethernet
drivers for the IOC3. Perhaps the snapshot actually did boot, but just
couldn't talk to me without that patch? The patch doesn't apply to the
current git, though, so I think I'll leave that to someone who knows what
they're doing.
--
Karl-Johan Karlsson
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 13:53 ` Karl-Johan Karlsson
@ 2006-10-24 14:06 ` Ralf Baechle
2006-10-24 15:33 ` Ilya A. Volynets-Evenbakh
2006-10-24 15:44 ` Karl-Johan Karlsson
0 siblings, 2 replies; 22+ messages in thread
From: Ralf Baechle @ 2006-10-24 14:06 UTC (permalink / raw)
To: Karl-Johan Karlsson; +Cc: linux-mips
On Tue, Oct 24, 2006 at 03:53:56PM +0200, Karl-Johan Karlsson wrote:
> I can't get physical access to the system to pull out CPU boards today, so
> I did the best I could do remotely - powered down all modules but one and
> am now running a kernel built with support for only 4 of the 8 remaining
> R12000 CPU:s.
The kernel has a maxcpus=<somenumber> option, which is even easier. You
can also disable processors at the boot prompt. Pulling node boards is
strongly discouraged; the connectors are very fragile.
> Overhead is not as extreme as with more CPU:s, but still high. Running
> four copies of "md5sum /dev/zero", top shows around 95% useful work and 5%
> system overhead per CPU, while a "make -j4" of the kernel gives me 20-30%
> system and 70-80% user time (down from a maximum of 80% system time with
> all 32 CPU:s).
>
> This is still on the Gentoo 2.6.17.10 kernel, by the way (which is a
> mips-git snapshot from 2006-06-18 plus extra patches from e.g.
> <URL:http://ftp.du.se/pub/os/gentoo/distfiles/mips-sources-generic_patches-1.25.tar.bz2>).
> I tried a git snapshot from earlier today, but the only thing that kernel
> did was print the NUMA-link topology and then hang.
To use the linux-mips.org git kernel you also need my IP27 patchset,
available from /pub/linux/mips/people/ralf/ip27/ on ftp.linux-mips.org.
> Now that I actually look at Gentoo's patchset, I see there's a large patch
> (misc-2.6.17-ioc3-metadriver-r26.patch) touching serial and ethernet
> drivers for the IOC3. Perhaps the snapshot actually did boot, but just
> couldn't talk to me without that patch? The patch doesn't apply to the
> current git, though, so I think I'll leave that to someone who knows what
> they're doing.
That metadriver thing is primarily necessary for the sake of Octanes.

  Ralf
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 14:06 ` Ralf Baechle
@ 2006-10-24 15:33 ` Ilya A. Volynets-Evenbakh
2006-10-24 15:44 ` Karl-Johan Karlsson
1 sibling, 0 replies; 22+ messages in thread
From: Ilya A. Volynets-Evenbakh @ 2006-10-24 15:33 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Karl-Johan Karlsson, linux-mips
Ralf Baechle wrote:
> On Tue, Oct 24, 2006 at 03:53:56PM +0200, Karl-Johan Karlsson wrote:
>
>> Now that I actually look at Gentoo's patchset, I see there's a large patch
>> (misc-2.6.17-ioc3-metadriver-r26.patch) touching serial and ethernet
>> drivers for the IOC3. Perhaps the snapshot actually did boot, but just
>> couldn't talk to me without that patch? The patch doesn't apply to the
>> current git, though, so I think I'll leave that to someone who knows what
>> they're doing.
>>
>
> That metadriver thing is primarily necessary for the sake of Octanes.
>
Actually I wasn't able to get my O2K to boot without it, last time I tried.
> Ralf
>
--
Ilya A. Volynets-Evenbakh
Total Knowledge. CTO
http://www.total-knowledge.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 14:06 ` Ralf Baechle
2006-10-24 15:33 ` Ilya A. Volynets-Evenbakh
@ 2006-10-24 15:44 ` Karl-Johan Karlsson
2006-10-24 15:50 ` Ralf Baechle
` (2 more replies)
1 sibling, 3 replies; 22+ messages in thread
From: Karl-Johan Karlsson @ 2006-10-24 15:44 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Tue, October 24, 2006 16:06, Ralf Baechle wrote:
> On Tue, Oct 24, 2006 at 03:53:56PM +0200, Karl-Johan Karlsson wrote:
>> This is still on the Gentoo 2.6.17.10 kernel, by the way (which is a
>> mips-git snapshot from 2006-06-18 plus extra patches from e.g.
>> <URL:http://ftp.du.se/pub/os/gentoo/distfiles/mips-sources-generic_patches-1.25.tar.bz2>).
>> I tried a git snapshot from earlier today, but the only thing that kernel
>> did was print the NUMA-link topology and then hang.
>
> To use the linux-mips.org git kernel you also need my IP27 patchset
> available from /pub/linux/mips/people/ralf/ip27/ on ftp.linux-mips.org.
That at least made it boot. Problems:

1. System overhead is still there, still 20-30% per CPU when building a
   kernel with "make -j4" with 4 CPU:s enabled.

2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
   the system clock advances at about 1/16 of real time.

# zcat /proc/config.gz | grep HZ | grep -v ^#
CONFIG_HZ_250=y
CONFIG_SYS_SUPPORTS_ARBIT_HZ=y
CONFIG_HZ=250
# zcat /proc/config.gz | grep RTC | grep -v ^#
CONFIG_RTC=y
CONFIG_SGI_IP27_RTC=y

3. When booting, the kernel started, did a bit of initialization,
   restarted, and did everything all over again, this time going all the
   way.

Loading dksc(0,1,8)/vmlinux... Reading 3659336 bytes... OK. Entering kernel.
[17179569.184000] Linux version 2.6.19-rc3 (root@viggen) (gcc version 4.1.1 (Gentoo 4.1.1)) #2 SMP Tue Oct 24 16:49:51 CEST 2006
[17179569.184000] ARCH: SGI-IP27
[17179569.184000] PROMLIB: ARC firmware Version 64 Revision 0
[17179569.184000] Discovered 8 cpus on 4 nodes
[17179569.184000] ************** Topology ********************
[17179569.184000]    00 01 02 03
[17179569.184000] 00  0  1  2  2
[17179569.184000] 01  1  0  2  2
[17179569.184000] 02  2  2  0  1
[17179569.184000] 03  2  2  1  0
[17179569.184000] Router 0: 1 0 r
[17179569.184000] Router 1: 3 2 r
[17179569.184000] CPU revision is: 00000e23
[17179569.184000] FPU revision is: 00000900
[17179569.184000] IP27: Running on node 0.
[17179569.184000] Node 0 has a primary CPU, CPU is running.
[17179569.184000] Node 0 has a secondary CPU, CPU is running.
[17179569.184000] Machine is in M mode.
[17179569.184000] Cpu 0, Nasid 0x0: partnum 0x0 is is xbow
[17179569.184000] Cpu 0, Nasid 0x0, widget 0x8 (partnum 0xc002) is a bridge
[17179569.184000] Bridge SSRAM size 1kB
[17179569.184000] b_even_resp: 0000ba98
[17179569.184000] b_odd_resp: 0000ba98
[17179569.184000] Cpu 0, Nasid 0x0, widget 0xe (partnum 0xc002) is a bridge
[17179569.184000] Bridge SSRAM size 1kB
[17179569.184000] b_even_resp: 0000ba98
[17179569.184000] b_odd_resp: 0000ba98
[17179569.184000] CPU 0 clock is 300MHz.
[17179569.184000] Determined physical RAM map:
[17179569.184000] REPLICATION: ON nasid 0, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 1, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 2, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 3, ktext from nasid 0, kdata from nasid 0
[17179569.184000] Built 4 zonelists. Total pages: 1551360
[17179569.184000] Kernel command line: root=/dev/md2 append console=ttyS0
[17179569.184000] Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
[17179569.184000] Primary data cache 32kB, 2-way, linesize 32 bytes.
[17179569.184000] Unified secondary cache 8192kB 2-way, linesize 128 bytes.
[17179569.184000] Virtual address space probed at 44 bits
[17179569.184000] Physical address space probed at 40 bits
[17179569.184000] Synthesized TLB refill handler (41 instructions).
[17179569.184000] Synthesized TLB load handler fastpath (55 instructions).
[17179569.184000] Synthesized TLB store handler fastpath (55 instructions).
[17179569.184000] Synthesized TLB modify handler fastpath (54 instructions).
[17179569.184000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[17179569.184000] Using 1.250 MHz high precision timer.
[17179569.184000] Linux version 2.6.19-rc3 (root@viggen) (gcc version 4.1.1 (Gentoo 4.1.1)) #2 SMP Tue Oct 24 16:49:51 CEST 2006
[17179569.184000] ARCH: SGI-IP27
[17179569.184000] PROMLIB: ARC firmware Version 64 Revision 0
[17179569.184000] Discovered 8 cpus on 4 nodes
[17179569.184000] ************** Topology ********************
[17179569.184000]    00 01 02 03
[17179569.184000] 00  0  1  2  2
[17179569.184000] 01  1  0  2  2
[17179569.184000] 02  2  2  0  1
[17179569.184000] 03  2  2  1  0
[17179569.184000] Router 0: 1 0 r
[17179569.184000] Router 1: 3 2 r
[17179569.184000] CPU revision is: 00000e23
[17179569.184000] FPU revision is: 00000900
[17179569.184000] IP27: Running on node 0.
[17179569.184000] Node 0 has a primary CPU, CPU is running.
[17179569.184000] Node 0 has a secondary CPU, CPU is running.
[17179569.184000] Machine is in M mode.
[17179569.184000] Cpu 0, Nasid 0x0: partnum 0x0 is is xbow
[17179569.184000] Cpu 0, Nasid 0x0, widget 0x8 (partnum 0xc002) is a bridge
[17179569.184000] Bridge SSRAM size 1kB
[17179569.184000] b_even_resp: 0000ba98
[17179569.184000] b_odd_resp: 0000ba98
[17179569.184000] Cpu 0, Nasid 0x0, widget 0xe (partnum 0xc002) is a bridge
[17179569.184000] Bridge SSRAM size 1kB
[17179569.184000] b_even_resp: 0000ba98
[17179569.184000] b_odd_resp: 0000ba98
[17179569.184000] CPU 0 clock is 300MHz.
[17179569.184000] Determined physical RAM map:
[17179569.184000] REPLICATION: ON nasid 0, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 1, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 2, ktext from nasid 0, kdata from nasid 0
[17179569.184000] REPLICATION: ON nasid 3, ktext from nasid 0, kdata from nasid 0
[17179569.184000] Built 4 zonelists. Total pages: 1551360
[17179569.184000] Kernel command line: root=/dev/md2 append console=ttyS0
[17179569.184000] Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
[17179569.184000] Primary data cache 32kB, 2-way, linesize 32 bytes.
[17179569.184000] Unified secondary cache 8192kB 2-way, linesize 128 bytes.
[17179569.184000] Virtual address space probed at 44 bits
[17179569.184000] Physical address space probed at 40 bits
[17179569.184000] Synthesized TLB refill handler (41 instructions).
[17179569.184000] Synthesized TLB load handler fastpath (55 instructions).
[17179569.184000] Synthesized TLB store handler fastpath (55 instructions).
[17179569.184000] Synthesized TLB modify handler fastpath (54 instructions).
[17179569.184000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[17179569.184000] Using 1.250 MHz high precision timer.
[17179569.260000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[17179569.332000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[17179571.216000] Memory: 4091192k/4194304k available (2492k kernel code, 103112k reserved, 850k data, 232k init, 0k highmem)
[...]

# zcat /proc/config.gz | grep -v ^# | grep .
CONFIG_MIPS=y
CONFIG_SGI_IP27=y
CONFIG_SGI_SN_M_MODE=y
CONFIG_EARLY_PRINTK=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_ARC=y
CONFIG_DMA_IP27=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_SYS_SUPPORTS_BIG_ENDIAN=y
CONFIG_MIPS_L1_CACHE_SHIFT=7
CONFIG_ARC64=y
CONFIG_BOOT_ELF64=y
CONFIG_CPU_R10000=y
CONFIG_SYS_HAS_CPU_R10000=y
CONFIG_SYS_SUPPORTS_64BIT_KERNEL=y
CONFIG_CPU_SUPPORTS_32BIT_KERNEL=y
CONFIG_CPU_SUPPORTS_64BIT_KERNEL=y
CONFIG_64BIT=y
CONFIG_PAGE_SIZE_4KB=y
CONFIG_CPU_HAS_PREFETCH=y
CONFIG_MIPS_MT_DISABLED=y
CONFIG_CPU_HAS_LLSC=y
CONFIG_CPU_HAS_SYNC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_IRQ_PER_CPU=y
CONFIG_CPU_SUPPORTS_HIGHMEM=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_NUMA=y
CONFIG_SYS_SUPPORTS_NUMA=y
CONFIG_NODES_SHIFT=6
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_DISCONTIGMEM_MANUAL=y
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_SMP=y
CONFIG_SYS_SUPPORTS_SMP=y
CONFIG_NR_CPUS=4
CONFIG_HZ_250=y
CONFIG_SYS_SUPPORTS_ARBIT_HZ=y
CONFIG_HZ=250
CONFIG_PREEMPT_NONE=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CPUSETS=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_DEFAULT_AS=y
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_HW_HAS_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_MMU=y
CONFIG_BINFMT_ELF=y
CONFIG_BUILD_ELF64=y
CONFIG_MIPS32_COMPAT=y
CONFIG_COMPAT=y
CONFIG_MIPS32_O32=y
CONFIG_BINFMT_ELF32=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_FIB_HASH=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_QLOGIC_1280=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_NETDEVICES=y
CONFIG_PHYLIB=y
CONFIG_MARVELL_PHY=y
CONFIG_DAVICOM_PHY=y
CONFIG_QSEMI_PHY=y
CONFIG_LXT_PHY=y
CONFIG_CICADA_PHY=y
CONFIG_VITESSE_PHY=y
CONFIG_SMSC_PHY=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_SGI_IOC3_ETH=y
CONFIG_SERIO=y
CONFIG_SERIO_SERPORT=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_HW_RANDOM=y
CONFIG_RTC=y
CONFIG_SGI_IP27_RTC=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_FS_POSIX_ACL=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_AUTOFS4_FS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_MSDOS_PARTITION=y
CONFIG_SGI_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_DEBUG_INFO=y
CONFIG_CMDLINE=""
CONFIG_CRC32=y
CONFIG_PLIST=y

--
Karl-Johan Karlsson

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 15:44 ` Karl-Johan Karlsson
@ 2006-10-24 15:50 ` Ralf Baechle
2006-10-24 17:34 ` Atsushi Nemoto
2006-10-25 8:45 ` Atsushi Nemoto
2 siblings, 0 replies; 22+ messages in thread
From: Ralf Baechle @ 2006-10-24 15:50 UTC (permalink / raw)
To: Karl-Johan Karlsson; +Cc: linux-mips

On Tue, Oct 24, 2006 at 05:44:41PM +0200, Karl-Johan Karlsson wrote:

> 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> the system clock advances at about 1/16 of real time.

This one was caused by changeset ebca9aafa9bd5086d9f310205a8e30e225c5a5a6
which apparently wasn't quite ripe. You can work around it by
revoking this changeset for now. The time damage affects other systems
as well ...

Ralf

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 15:44 ` Karl-Johan Karlsson
2006-10-24 15:50 ` Ralf Baechle
@ 2006-10-24 17:34 ` Atsushi Nemoto
2006-10-24 17:50 ` Ralf Baechle
2006-10-25 8:45 ` Atsushi Nemoto
2 siblings, 1 reply; 22+ messages in thread
From: Atsushi Nemoto @ 2006-10-24 17:34 UTC (permalink / raw)
To: ralf; +Cc: creideiki+linux-mips, linux-mips

On Tue, 24 Oct 2006 16:50:45 +0100, Ralf Baechle <ralf@linux-mips.org> wrote:
> > 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> > the system clock advances at about 1/16 of real time.
>
> This one was caused by changeset ebca9aafa9bd5086d9f310205a8e30e225c5a5a6
> which apparently wasn't quite ripe. You can work around it by
> revoking this changeset for now. The time damage affects other systems
> as well ...

I'm looking at my patch again but still cannot find any problem...

One question:

> # zcat /proc/config.gz | grep HZ | grep -v ^#
> CONFIG_HZ_250=y
> CONFIG_SYS_SUPPORTS_ARBIT_HZ=y
> CONFIG_HZ=250

Does IP27 really support HZ=250?

---
Atsushi Nemoto

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 17:34 ` Atsushi Nemoto
@ 2006-10-24 17:50 ` Ralf Baechle
0 siblings, 0 replies; 22+ messages in thread
From: Ralf Baechle @ 2006-10-24 17:50 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: creideiki+linux-mips, linux-mips

On Wed, Oct 25, 2006 at 02:34:28AM +0900, Atsushi Nemoto wrote:

> Date: Wed, 25 Oct 2006 02:34:28 +0900 (JST)
> To: ralf@linux-mips.org
> Cc: creideiki+linux-mips@ferretporn.se, linux-mips@linux-mips.org
> Subject: Re: Extreme system overhead on large IP27
> From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
> Content-Type: Text/Plain; charset=us-ascii
>
> On Tue, 24 Oct 2006 16:50:45 +0100, Ralf Baechle <ralf@linux-mips.org> wrote:
> > > 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> > > the system clock advances at about 1/16 of real time.
> >
> > This one was caused by changeset ebca9aafa9bd5086d9f310205a8e30e225c5a5a6
> > which apparently wasn't quite ripe. You can work around it by
> > revoking this changeset for now. The time damage affects other systems
> > as well ...
>
> I'm looking at my patch again but still cannot find any problem...
>
> One question:
>
> > # zcat /proc/config.gz | grep HZ | grep -v ^#
> > CONFIG_HZ_250=y
> > CONFIG_SYS_SUPPORTS_ARBIT_HZ=y
> > CONFIG_HZ=250
>
> Does IP27 really support HZ=250?

Arbitrary frequency actually. The timer used is the HUB timer, which
ticks every 800ns afair. It's like 53 bits or so. There is also a
compare register, so it's somewhat similar to the cop0 counter / compare
timer.

Ralf

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-24 15:44 ` Karl-Johan Karlsson
2006-10-24 15:50 ` Ralf Baechle
2006-10-24 17:34 ` Atsushi Nemoto
@ 2006-10-25 8:45 ` Atsushi Nemoto
2006-10-26 4:05 ` Atsushi Nemoto
2 siblings, 1 reply; 22+ messages in thread
From: Atsushi Nemoto @ 2006-10-25 8:45 UTC (permalink / raw)
To: creideiki+linux-mips; +Cc: ralf, linux-mips

On Tue, 24 Oct 2006 17:44:41 +0200 (CEST), "Karl-Johan Karlsson" <creideiki+linux-mips@ferretporn.se> wrote:
> 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> the system clock advances at about 1/16 of real time.

Does this problem still happen if you disable CONFIG_OPROFILE?

> 3. When booting, the kernel started, did a bit of initialization,
> restarted, and did everything all over again, this time going all the way.

This is not a problem. Your log shows the usual EARLY_PRINTK behaviour.

> [17179569.184000] Linux version 2.6.19-rc3 (root@viggen) (gcc version
> 4.1.1 (Gentoo 4.1.1)) #2 SMP Tue Oct 24 16:49:51 CEST 2006
> [17179569.184000] ARCH: SGI-IP27
...
> [17179569.184000] Using 1.250 MHz high precision timer.

These lines are printed by the initial console driver (ioc3).

> [17179569.184000] Linux version 2.6.19-rc3 (root@viggen) (gcc version
> 4.1.1 (Gentoo 4.1.1)) #2 SMP Tue Oct 24 16:49:51 CEST 2006
> [17179569.184000] ARCH: SGI-IP27

And the rest are printed by the standard console driver (8250).

---
Atsushi Nemoto

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-25 8:45 ` Atsushi Nemoto
@ 2006-10-26 4:05 ` Atsushi Nemoto
2006-10-26 7:42 ` Manish Lachwani
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Atsushi Nemoto @ 2006-10-26 4:05 UTC (permalink / raw)
To: anemo; +Cc: creideiki+linux-mips, ralf, linux-mips

On Wed, 25 Oct 2006 17:45:04 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> > 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> > the system clock advances at about 1/16 of real time.
>
> Does this problem still happen if you disable CONFIG_OPROFILE?

I think I found the problem at last.

	static struct clocksource clocksource_mips = {
		.name		= "MIPS",
		.rating		= 250,
		.read		= read_mips_hpt,
		.shift		= 24,
		.is_continuous	= 1,
	};

This shift value is too large for the ip27 HPT (1.25MHz).

	temp = (u64) NSEC_PER_SEC << clocksource_mips.shift;
	do_div(temp, mips_hpt_frequency);
	clocksource_mips.mult = (unsigned)temp;

If mips_hpt_frequency is less than 0x1000000 (16777216), temp would be
larger than a 32-bit value can hold. I'll cook a patch later, but until
then you can use a smaller shift value, for example 20.

---
Atsushi Nemoto

^ permalink raw reply [flat|nested] 22+ messages in thread
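For readers following the arithmetic, here is a minimal user-space sketch of the multiplier computation quoted above. It is not the kernel code: the helper names `mult_full`, `mult_truncated` and `mult_overflows` are made up for illustration, `uint64_t`/`uint32_t` stand in for the kernel's `u64`/`unsigned`, and plain division stands in for `do_div()`. The only inputs taken from the thread are the 1.25 MHz HPT frequency and the shift values 24 and 20.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Full-width multiplier, (NSEC_PER_SEC << shift) / freq, before any cast.
 * Hypothetical helper; the kernel computes this with do_div(). */
static uint64_t mult_full(uint32_t freq_hz, unsigned shift)
{
    assert(freq_hz != 0 && shift < 32);
    return (NSEC_PER_SEC << shift) / freq_hz;
}

/* What is left after the value is cast down to 32 bits, as the
 * (unsigned) cast in the quoted snippet does. */
static uint32_t mult_truncated(uint32_t freq_hz, unsigned shift)
{
    return (uint32_t)mult_full(freq_hz, shift);
}

/* Nonzero if the full-width multiplier does not fit in 32 bits. */
static int mult_overflows(uint32_t freq_hz, unsigned shift)
{
    return mult_full(freq_hz, shift) > UINT32_MAX;
}
```

Under these assumptions, a 1.25 MHz timer with shift 24 gives a full-width value of 800 << 24 = 13421772800, which is cut down to 536870912 (2^29) by the cast, so each tick would be accounted as 32 ns instead of 800 ns. With shift 20 the value is 838860800 and fits without truncation.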
* Re: Extreme system overhead on large IP27
2006-10-26 4:05 ` Atsushi Nemoto
@ 2006-10-26 7:42 ` Manish Lachwani
2006-10-26 14:16 ` Atsushi Nemoto
2006-10-26 8:41 ` Karl-Johan Karlsson
2006-10-26 12:56 ` Ralf Baechle
2 siblings, 1 reply; 22+ messages in thread
From: Manish Lachwani @ 2006-10-26 7:42 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: creideiki+linux-mips, ralf, linux-mips

Hi Atsushi,

It could be that I am seeing a similar issue on the SWARM board
(sb1250) as well. Your patch removed the shifts for mips_hpt_frequency
from arch/mips/sibyte/sb1250/time.c and in sb1250_hpt_read(). The
Sibyte HPT is 1 MHz. However, when I added those shifts back, I did not
see any issues with the system clock. I could possibly try out your
patch with lower clocksource shift values and see if the system clock
is still wrong.

Btw, the clocksource changes seem to work well on the BCM 1480 based
board.

Thanks,
Manish Lachwani

--- Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:

> On Wed, 25 Oct 2006 17:45:04 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> > > 2. Timekeeping is broken. The clock in /proc/driver/rtc seems correct, but
> > > the system clock advances at about 1/16 of real time.
> >
> > Does this problem still happen if you disable CONFIG_OPROFILE?
>
> I think I found the problem at last.
>
> static struct clocksource clocksource_mips = {
> 	.name		= "MIPS",
> 	.rating		= 250,
> 	.read		= read_mips_hpt,
> 	.shift		= 24,
> 	.is_continuous	= 1,
> };
>
> This shift value is too large for the ip27 HPT (1.25MHz).
>
> 	temp = (u64) NSEC_PER_SEC << clocksource_mips.shift;
> 	do_div(temp, mips_hpt_frequency);
> 	clocksource_mips.mult = (unsigned)temp;
>
> If mips_hpt_frequency is less than 0x1000000 (16777216), temp would be
> larger than a 32-bit value can hold. I'll cook a patch later, but until
> then you can use a smaller shift value, for example 20.
>
> ---
> Atsushi Nemoto
>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-26 7:42 ` Manish Lachwani
@ 2006-10-26 14:16 ` Atsushi Nemoto
2006-10-27 1:55 ` mlachwani
0 siblings, 1 reply; 22+ messages in thread
From: Atsushi Nemoto @ 2006-10-26 14:16 UTC (permalink / raw)
To: m_lachwani; +Cc: creideiki+linux-mips, ralf, linux-mips

On Thu, 26 Oct 2006 00:42:16 -0700 (PDT), Manish Lachwani <m_lachwani@yahoo.com> wrote:
> It could be that I am seeing a similar issue on the
> SWARM board (sb1250) as well. Your patch removed the
> shifts for mips_hpt_frequency from
> arch/mips/sibyte/sb1250/time.c and in
> sb1250_hpt_read(). The Sibyte HPT is 1 MHz. However,
> when I added those shifts back, I did not see any
> issues with the system clock. I could possibly try out
> your patch with lower clocksource shift values and see
> if the system clock is still wrong.

I just sent the patch. Please try it.

> Btw, the clocksource changes seem to work well on the
> BCM 1480 based board.

Thanks, good news!

As Ralf pointed out, the current code is still problematic on some SMP
systems, but I think IP27, SB1250 and BCM1480 should be OK now, since
their mips_hpt_read implementations do not use per-CPU cp0 timers.

---
Atsushi Nemoto

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-26 14:16 ` Atsushi Nemoto
@ 2006-10-27 1:55 ` mlachwani
0 siblings, 0 replies; 22+ messages in thread
From: mlachwani @ 2006-10-27 1:55 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: m_lachwani, creideiki+linux-mips, ralf, linux-mips

Hi Atsushi,

I tried out your patch on the SWARM SMP and it works.

Thanks,
Manish Lachwani

Atsushi Nemoto wrote:
> On Thu, 26 Oct 2006 00:42:16 -0700 (PDT), Manish Lachwani <m_lachwani@yahoo.com> wrote:
>
>> It could be that I am seeing a similar issue on the
>> SWARM board (sb1250) as well. Your patch removed the
>> shifts for mips_hpt_frequency from
>> arch/mips/sibyte/sb1250/time.c and in
>> sb1250_hpt_read(). The Sibyte HPT is 1 MHz. However,
>> when I added those shifts back, I did not see any
>> issues with the system clock. I could possibly try out
>> your patch with lower clocksource shift values and see
>> if the system clock is still wrong.
>>
>
> I just sent the patch. Please try it.
>
>
>> Btw, the clocksource changes seem to work well on the
>> BCM 1480 based board.
>>
>
> Thanks, good news!
>
> As Ralf pointed out, the current code is still problematic on some SMP
> systems, but I think IP27, SB1250 and BCM1480 should be OK now, since
> their mips_hpt_read implementations do not use per-CPU cp0 timers.
>
> ---
> Atsushi Nemoto
>
>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-26 4:05 ` Atsushi Nemoto
2006-10-26 7:42 ` Manish Lachwani
@ 2006-10-26 8:41 ` Karl-Johan Karlsson
2006-10-26 12:56 ` Ralf Baechle
2 siblings, 0 replies; 22+ messages in thread
From: Karl-Johan Karlsson @ 2006-10-26 8:41 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: linux-mips

On Thu, October 26, 2006 06:05, Atsushi Nemoto wrote:
> static struct clocksource clocksource_mips = {
> 	.name		= "MIPS",
> 	.rating		= 250,
> 	.read		= read_mips_hpt,
> 	.shift		= 24,
> 	.is_continuous	= 1,
> };
>
> [...]
>
> I'll cook a patch later but until
> then you can use lesser shift value, for example, 20.

Setting it to 20 works, thanks.

--
Karl-Johan Karlsson

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-26 4:05 ` Atsushi Nemoto
2006-10-26 7:42 ` Manish Lachwani
2006-10-26 8:41 ` Karl-Johan Karlsson
@ 2006-10-26 12:56 ` Ralf Baechle
2006-10-26 13:51 ` Kevin D. Kissell
2 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2006-10-26 12:56 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: creideiki+linux-mips, linux-mips

On Thu, Oct 26, 2006 at 01:05:52PM +0900, Atsushi Nemoto wrote:

> I think I found the problem at last.

I'm afraid there is more than one problem.

On the 34K core each VPE has its own c0_count and c0_compare registers.
However the reset values are undefined. Which means the time offset
calculated by

	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;

may differ wildly between processors, resulting in a time jitter of up to
almost 215s between both VPEs. Unfortunately there is an unavoidable
race condition when attempting to synchronize the two counters. But
the 34K's nature shrinks the time window to somewhere in the single digit
range of cycles, so on a hardcore that would be a handful of nanoseconds.

Anything that is less than the shortest time for a process to migrate
from one processor (VPE in case of 34K) to another is good enough, as it
will guarantee that time cannot jump backward - but the jitter may still
be a slight problem for the most demanding programs.

Others like RM9000x2 may have similar issues if the counter registers
don't come out of reset synchronized; need to look into that.

Ralf

^ permalink raw reply [flat|nested] 22+ messages in thread
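The masked subtraction in the message above can be tried out in isolation. The following is only a sketch under stated assumptions: `cycle_offset` is a hypothetical stand-alone helper rather than a kernel function, the counter is assumed to be 32 bits wide (mask 0xffffffff), and the counter values used with it are invented for illustration.

```c
#include <stdint.h>

/* Wrap-safe cycle delta, mirroring the quoted expression
 *   (clocksource_read(clock) - clock->cycle_last) & clock->mask
 * with the counter value passed in directly. Hypothetical helper. */
static uint64_t cycle_offset(uint64_t now, uint64_t cycle_last, uint64_t mask)
{
    return (now - cycle_last) & mask;
}
```

With a single counter the mask makes wraparound harmless: a reading of 0x10 against a cycle_last of 0xfffffff0 yields 0x20 elapsed cycles. The trouble described above appears when cycle_last was recorded from one counter and the next read comes from another that started at a different value. If one counter reads 1000 and the other reads 1000 + 0x80000000 at the same instant, a shared cycle_last of 900 gives offsets of 100 and 0x80000064 respectively.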
* Re: Extreme system overhead on large IP27
@ 2006-10-26 13:51 ` Kevin D. Kissell
0 siblings, 0 replies; 22+ messages in thread
From: Kevin D. Kissell @ 2006-10-26 13:51 UTC (permalink / raw)
To: Ralf Baechle, Atsushi Nemoto; +Cc: creideiki+linux-mips, linux-mips

> On Thu, Oct 26, 2006 at 01:05:52PM +0900, Atsushi Nemoto wrote:
>
> > I think I found the problem at last.
>
> I'm afraid there is more than one problem.
>
> On the 34K core each VPE has its own c0_count and c0_compare registers.
> However the reset values are undefined. Which means the time offset
> calculated by
>
> 	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
>
> may differ wildly between processors, resulting in a time jitter of up to
> almost 215s between both VPEs. Unfortunately there is an unavoidable
> race condition when attempting to synchronize the two counters. But
> the 34K's nature shrinks the time window to somewhere in the single digit
> range of cycles, so on a hardcore that would be a handful of nanoseconds.

I don't see what's different here than in any other SMP case. Is it really
true that the MIPS SMP support *requires* that all CPUs in the system
come out of reset on the same clock, with the same value in Count?
I find that very surprising (and a little disappointing). Is this a general
limitation of Linux? MIPS32/MIPS64 PRAs call out the reset value
of Count as being undefined, and chip specs for pre-MIPS32 CPUs
like the R10000 and the R4400 do not call out any reset value for
Count either.

If there's going to be skew between CPU clocks, all it really means
is that one cannot directly compare timestamps generated by different
CPUs. At a given point in time, "How long will it be until you hit an
absolute Count value X?" will have a slightly different answer on each CPU
if there is skew, but "What will the local Count value be N jiffies from now?"
should be something that can be correctly calculated independently on each
node. Where are we depending on the former, and can that usage be converted
into something more like the latter?

Regards,

Kevin K.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Extreme system overhead on large IP27
2006-10-26 13:51 ` Kevin D. Kissell
(?)
@ 2006-10-26 16:50 ` Ralf Baechle
-1 siblings, 0 replies; 22+ messages in thread
From: Ralf Baechle @ 2006-10-26 16:50 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: Atsushi Nemoto, creideiki+linux-mips, linux-mips

On Thu, Oct 26, 2006 at 03:51:35PM +0200, Kevin D. Kissell wrote:

> I don't see what's different here than in any other SMP case.

It just happened to be a configuration which triggered the issue. But
the underlying problem could exist on any other SMP system using
per-processor timers.

> Is it really
> true that the MIPS SMP support *requires* that all CPUs in the system
> come out of reset on the same clock, with the same value in Count?

There isn't even a requirement to use the cp0 counter at all. It just
happens to be that the VSMP kernel is using that timer. It also happens
to be quite a logical choice on the Malta, where the alternative would be
specific to one of the several system controllers.

SGI systems are infamous for potentially using mixed spec CPUs from the
same family. That includes different clock speeds; something like having
180MHz R10000 and 500MHz R14000 would be possible. The only sane cure
for the time code in such cases is avoiding c0_count and relying on some
other system-wide time source. The same may be needed in case of
variable CPU clock.

That said, Linux doesn't care; it just needs a little bit of glue code to
deal with arbitrary timers.

> I find that very surprising (and a little disappointing). Is this a general
> limitation of Linux? MIPS32/MIPS64 PRAs call out the reset value
> of Count as being undefined, and chip specs for pre-MIPS32 CPUs
> like the R10000 and the R4400 do not call out any reset value for
> Count either.

The count / compare code very much originated on uniprocessor systems,
and the sole thing it cares about is the speed the counter is
incrementing at, not the absolute value.

> If there's going to be skew between CPU clocks, all it really means
> is that one cannot directly compare timestamps generated by different
> CPUs. At a given point in time, "How long will it be until you hit an
> absolute Count value X?" will have a slightly different answer on each CPU
> if there is skew, but "What will the local Count value be N jiffies from now?"
> should be something that can be correctly calculated independently on each
> node. Where are we depending on the former, and can that usage be converted
> into something more like the latter?

> Kevin K.

Ralf

^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2006-10-27 1:55 UTC | newest]
Thread overview: 22+ messages
2006-10-21 19:59 Extreme system overhead on large IP27 Karl-Johan Karlsson
2006-10-22 15:21 ` Ralf Baechle
2006-10-22 23:23 ` Ralf Baechle
2006-10-23 0:19 ` Ralf Baechle
2006-10-23 21:30 ` Karl-Johan Karlsson
[not found] ` <20061023224318.GA1732@linux-mips.org>
2006-10-24 13:53 ` Karl-Johan Karlsson
2006-10-24 14:06 ` Ralf Baechle
2006-10-24 15:33 ` Ilya A. Volynets-Evenbakh
2006-10-24 15:44 ` Karl-Johan Karlsson
2006-10-24 15:50 ` Ralf Baechle
2006-10-24 17:34 ` Atsushi Nemoto
2006-10-24 17:50 ` Ralf Baechle
2006-10-25 8:45 ` Atsushi Nemoto
2006-10-26 4:05 ` Atsushi Nemoto
2006-10-26 7:42 ` Manish Lachwani
2006-10-26 14:16 ` Atsushi Nemoto
2006-10-27 1:55 ` mlachwani
2006-10-26 8:41 ` Karl-Johan Karlsson
2006-10-26 12:56 ` Ralf Baechle
2006-10-26 13:51 ` Kevin D. Kissell
2006-10-26 13:51 ` Kevin D. Kissell
2006-10-26 16:50 ` Ralf Baechle