* sched isolcpus=1 related OOPS in 2.6.9
@ 2004-12-02 15:42 devik
2004-12-03 16:28 ` devik
0 siblings, 1 reply; 7+ messages in thread
From: devik @ 2004-12-02 15:42 UTC (permalink / raw)
To: linux-kernel
Hello,
on a Soyo dual-CPU PII/350 system I get an early
OOPS (even ksymdump can't save it) during CPU#1
initialization when I boot with isolcpus=1 on the command line to
force use of CPU#0 only (I want to use affinity to select CPU#1).
The OOPS triggers every time I use isolcpus.

I traced the problem down to sched.c:1928 (find_busiest_group),
where group->cpu_power was zero (thus a division by zero occurred).
The call trace goes swapper->schedule()->........->find_busiest_group.
Important registers there: eax=ecx=edx=0, ebx!=0.
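
For readers who do not have the source at hand, the failure mode described above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel's code: `struct group`, `scaled_load_checked` and the constant `LOAD_SCALE` (set to 128 here as a stand-in for the scheduler's SCHED_LOAD_SCALE) are hypothetical names chosen for the example.

```c
/* Assumed stand-in for the scheduler's SCHED_LOAD_SCALE. */
#define LOAD_SCALE 128UL

/* Hypothetical, simplified model of a scheduler group. */
struct group {
	unsigned long cpu_power;	/* 0 models "never initialised" */
};

/*
 * Normalise a raw load by the group's power, refusing to divide when
 * cpu_power was never set.
 * Returns 0 on success and stores the result in *out,
 * or -1 if the divisor would be zero.
 */
static int scaled_load_checked(const struct group *g, unsigned long raw_load,
			       unsigned long *out)
{
	if (g->cpu_power == 0)
		return -1;
	*out = raw_load * LOAD_SCALE / g->cpu_power;
	return 0;
}
```

An unguarded version of the same expression is what traps on x86 when the divisor is zero; the guard is shown only to make the precondition visible, not as a proposed fix.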
Config and vmlinux:
http://luxik.cdi.cz/~devik/files/isolcpus-oops/
Sorry, no oops text yet (I can't get it via ksymoops or serial),
but I can provide further info (a screen photo).
Can anyone at least direct me where to look further?
(I found no general description of the group scheduling code,
so I'm lost in it.)

thanks a lot,
-------------------------------
Martin Devera aka devik
Linux kernel QoS/HTB maintainer
http://luxik.cdi.cz/~devik/

* Re: sched isolcpus=1 related OOPS in 2.6.9
2004-12-02 15:42 sched isolcpus=1 related OOPS in 2.6.9 devik
@ 2004-12-03 16:28 ` devik
2004-12-03 17:18 ` Randy.Dunlap
0 siblings, 1 reply; 7+ messages in thread
From: devik @ 2004-12-03 16:28 UTC (permalink / raw)
To: linux-kernel

> only CPU#0 use (I want to use affinity to select CPU#1).
> The OOPS triggers every time when I use isolcpus.
>
> I traced the problem down into sched.c:1928 (find_busiest_group)
> where group->cpu_power was zero (thus division by zero occurred).
> In call trace it goes swapper->schedule()->........->find_busiest_group.
> Important registers there: eax=ecx=edx=0, ebx!=0.

Well, I have more info. I set up the Bochs SMP emulator and hacked
printk to output to port e9, which is then redirected to a file.
I also turned on sched_domains debugging. From the result (below)
it is clear that there is a bug in the isolated domains setup.

devik

<6>BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 0000000000100000 - 0000000002000000 (usable)
<5>32MB LOWMEM available.
<6>found SMP MP-table at 000fd0f0
<7>On node 0 totalpages: 8192
<7> DMA zone: 4096 pages, LIFO batch:1
<7> Normal zone: 4096 pages, LIFO batch:1
<7> HighMem zone: 0 pages, LIFO batch:1
<6>DMI not present.
<3>ACPI: Unable to locate RSDP
<6>Intel MultiProcessor Specification v1.4
<6> Virtual Wire compatibility mode.
<6>OEM ID: BOCHSCPU Product ID: 0.1 APIC at: 0xFEE00000
Processor #0 6:0 APIC version 17
Processor #1 6:0 APIC version 17
<6>I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
<6>Processors: 2
Built 1 zonelists
Kernel command line: BOOT_IMAGE=linux ro root=301 apic=debug noapic isolcpus=1
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
<6>Initializing CPU#0
CPU 0 irqstacks, hard=c035b000 soft=c0359000
PID hash table entries: 256 (order: 8, 4096 bytes)
Detected 2.001 MHz processor.
<6>Using tsc for high-res timesource
Console: colour VGA+ 80x50
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
<6>Memory: 29372k/32768k available (1511k kernel code, 2960k reserved, 698k data, 168k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
<7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
<7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After vendor identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After all inits, caps: 0180a379 00000000 00000000 00000040
<6>Enabling fast FPU save and restore... done.
<6>Checking 'hlt' instruction... OK.
CPU0: Intel Pentium III (Coppermine) stepping 03
per-CPU timeslice cutoff: 81.50 usecs.
task migration cache decay timeout: 1 msecs.
Getting VERSION: 170011
Getting VERSION: 170011
Getting ID: 0
Getting LVT0: 0
Getting LVT1: 0
enabled ExtINT on CPU#0
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c035c000 soft=c035a000
<6>Initializing CPU#1
masked ExtINT on CPU#1
<7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
<7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After vendor identify, caps: 0180a379 00000000 00000000 00000000
<7>CPU: After all inits, caps: 0180a379 00000000 00000000 00000040
CPU1: Intel Pentium III (Coppermine) stepping 03
<6>Total of 2 processors activated (16.38 BogoMIPS).
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1.0999 MHz.
..... host bus clock speed is 1.0999 MHz.
<6>checking TSC synchronization across 2 CPUs:
<6>CPU#0 had 0 usecs TSC skew, fixed it up.
<6>CPU#1 had 0 usecs TSC skew, fixed it up.
Brought up 2 CPUs
<6>Setting up cpu 1 isolated.
<7>CPU0: online
<7> domain 0: span 3
<7> groups: 1 2
<7>CPU1: online
<7> domain 0: span 2
<7>ERROR domain->cpu_power not set
<7> groups: 2
<1>divide error: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c0116fd3>] Not tainted VLI
EFLAGS: 00010046 (2.6.9imq)
EIP is at find_busiest_group+0x2b3/0x310
eax: 00000000 ebx: c10b2e74 ecx: 00000000 edx: 00000000
esi: c0360ee8 edi: c0351000 ebp: c10b2e84 esp: c10b2e38
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=c10b2000 task=c10b15a0)
Stack: c10b2e74 00000002 00000002 00004441 08bca3a6 c0351000 00000000 00000001
00000080 00000080 00000080 00000000 00000000 c0360edc 00000000 00000002
00000040 c1044940 00000001 c10b2eb8 c0117125 c1044940 00000000 c10b2ea8
Call Trace:
[<c01071ff>] show_stack+0x7f/0xa0
[<c01073ae>] show_registers+0x15e/0x1c0
[<c01075c4>] die+0xf4/0x180
[<c010775b>] do_divide_error+0x10b/0x130
[<c0106ded>] error_code+0x2d/0x38
[<c0117125>] load_balance+0x35/0x1a0
[<c01175fa>] rebalance_tick+0xba/0xd0
[<c0117732>] scheduler_tick+0x122/0x480
[<c0124b85>] update_process_times+0x45/0x50
[<c0111928>] smp_apic_timer_interrupt+0xf8/0x100
[<c0106d52>] apic_timer_interrupt+0x1a/0x20
[<c032d03a>] unpack_to_rootfs+0x17a/0x200
[<c032d0eb>] populate_rootfs+0x2b/0x120
[<c010058a>] init+0x8a/0x1e0
[<c0104565>] kernel_thread_helper+0x5/0x10
Code: 00 0f 4d c2 83 f8 01 89 c1 7e ad 8b 4d d0 85 c9 0f 84 fe fd ff ff 8b 45 e0 01 45 dc 89 c2 8b 4e 08 01 4d d4 c1 e2 07 89 d0 31 d2 <f7> f1 8b 55 cc 85 d2 89 45 e0 75 1c 8b 45 e4 39 45 e0 76 09 89
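
As a reading aid for the "Code:" line in the oops above, the bytes just before the faulting instruction can be decoded by hand. The decoding in the comment below is an interpretation of those bytes, not output from a disassembler run on the posted vmlinux, and `model_sequence` and `struct div_result` are hypothetical names used only for this sketch. It models why eax, ecx and edx are all zero at the fault: the shifted load is zero, the divisor loaded from memory is zero, and edx is cleared just before the divide.

```c
/*
 * Hand-decoded tail of the "Code:" line (an interpretation, AT&T syntax):
 *   8b 4e 08    mov  0x8(%esi),%ecx    ; divisor loaded from memory
 *   01 4d d4    add  %ecx,-0x2c(%ebp)
 *   c1 e2 07    shl  $0x7,%edx         ; multiply by 128
 *   89 d0       mov  %edx,%eax
 *   31 d2       xor  %edx,%edx         ; clear high half of the dividend
 *   f7 f1       div  %ecx              ; <f7> marks the faulting instruction
 */
struct div_result {
	int faulted;		/* 1 if the CPU would raise a divide error */
	unsigned int quotient;	/* valid only when faulted == 0 */
};

/* Hypothetical helper modelling the instruction sequence above. */
static struct div_result model_sequence(unsigned int load, unsigned int divisor)
{
	struct div_result r = { 0, 0 };
	unsigned int eax = load << 7;	/* shl $7, then mov to %eax */

	if (divisor == 0) {		/* div %ecx with %ecx == 0 traps */
		r.faulted = 1;
		return r;
	}
	r.quotient = eax / divisor;	/* %edx is zero, so a plain 32-bit divide */
	return r;
}
```

Calling `model_sequence(0, 0)` reproduces the register picture from the oops (all three registers zero and a divide error), while a nonzero divisor such as 128 simply returns the original load.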

* Re: sched isolcpus=1 related OOPS in 2.6.9
2004-12-03 16:28 ` devik
@ 2004-12-03 17:18 ` Randy.Dunlap
2004-12-03 17:46 ` devik
2004-12-03 18:15 ` [PATCH] " devik
0 siblings, 2 replies; 7+ messages in thread
From: Randy.Dunlap @ 2004-12-03 17:18 UTC (permalink / raw)
To: devik; +Cc: linux-kernel, piggin

devik wrote:
>>only CPU#0 use (I want to use affinity to select CPU#1).
>>The OOPS triggers every time when I use isolcpus.
>>
>>I traced the problem down into sched.c:1928 (find_busiest_group)
>>where group->cpu_power was zero (thus division by zero occurred).
>>In call trace it goes swapper->schedule()->........->find_busiest_group.
>>Important registers there: eax=ecx=edx=0, ebx!=0.
>
>
> Well, I have more info. I set up the Bochs SMP emulator and hacked
> printk to output to port e9, which is then redirected to a file.
> I also turned on sched_domains debugging. From the result (below)
> it is clear that there is a bug in the isolated domains setup.

You are correct, of course. If "isolcpus" is used, the isolated
cpu(s) (in <cpu_isolated_map>) are not init like the remaining
cpus are.

I don't know what's intended here... but it's not the divide by 0.

> devik
>
> <6>BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> BIOS-e820: 0000000000100000 - 0000000002000000 (usable)
> <5>32MB LOWMEM available.
> <6>found SMP MP-table at 000fd0f0
> <7>On node 0 totalpages: 8192
> <7> DMA zone: 4096 pages, LIFO batch:1
> <7> Normal zone: 4096 pages, LIFO batch:1
> <7> HighMem zone: 0 pages, LIFO batch:1
> <6>DMI not present.
> <3>ACPI: Unable to locate RSDP
> <6>Intel MultiProcessor Specification v1.4
> <6> Virtual Wire compatibility mode.
> <6>OEM ID: BOCHSCPU Product ID: 0.1 APIC at: 0xFEE00000
> Processor #0 6:0 APIC version 17
> Processor #1 6:0 APIC version 17
> <6>I/O APIC #2 Version 17 at 0xFEC00000.
> Enabling APIC mode: Flat. Using 1 I/O APICs
> <6>Processors: 2
> Built 1 zonelists
> Kernel command line: BOOT_IMAGE=linux ro root=301 apic=debug noapic isolcpus=1
> mapped APIC to ffffd000 (fee00000)
> mapped IOAPIC to ffffc000 (fec00000)
> <6>Initializing CPU#0
> CPU 0 irqstacks, hard=c035b000 soft=c0359000
> PID hash table entries: 256 (order: 8, 4096 bytes)
> Detected 2.001 MHz processor.
> <6>Using tsc for high-res timesource
> Console: colour VGA+ 80x50
> Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
> Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
> <6>Memory: 29372k/32768k available (1511k kernel code, 2960k reserved, 698k data, 168k init, 0k highmem)
> Checking if this processor honours the WP bit even in supervisor mode... Ok.
> <7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
> Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
> <7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
> <7>CPU: After vendor identify, caps: 0180a379 00000000 00000000 00000000
> <7>CPU: After all inits, caps: 0180a379 00000000 00000000 00000040
> <6>Enabling fast FPU save and restore... done.
> <6>Checking 'hlt' instruction... OK.
> CPU0: Intel Pentium III (Coppermine) stepping 03
> per-CPU timeslice cutoff: 81.50 usecs.
> task migration cache decay timeout: 1 msecs.
> Getting VERSION: 170011
> Getting VERSION: 170011
> Getting ID: 0
> Getting LVT0: 0
> Getting LVT1: 0
> enabled ExtINT on CPU#0
> Booting processor 1/1 eip 2000
> CPU 1 irqstacks, hard=c035c000 soft=c035a000
> <6>Initializing CPU#1
> masked ExtINT on CPU#1
> <7>Calibrating delay loop... 8.19 BogoMIPS (lpj=4096)
> <7>CPU: After generic identify, caps: 0180a379 00000000 00000000 00000000
> <7>CPU: After vendor identify, caps: 0180a379 00000000 00000000 00000000
> <7>CPU: After all inits, caps: 0180a379 00000000 00000000 00000040
> CPU1: Intel Pentium III (Coppermine) stepping 03
> <6>Total of 2 processors activated (16.38 BogoMIPS).
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 1.0999 MHz.
> ..... host bus clock speed is 1.0999 MHz.
> <6>checking TSC synchronization across 2 CPUs:
> <6>CPU#0 had 0 usecs TSC skew, fixed it up.
> <6>CPU#1 had 0 usecs TSC skew, fixed it up.
> Brought up 2 CPUs
> <6>Setting up cpu 1 isolated.
> <7>CPU0: online
> <7> domain 0: span 3
> <7> groups: 1 2
> <7>CPU1: online
> <7> domain 0: span 2
> <7>ERROR domain->cpu_power not set
> <7> groups: 2
> <1>divide error: 0000 [#1]
> SMP
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c0116fd3>] Not tainted VLI
> EFLAGS: 00010046 (2.6.9imq)
> EIP is at find_busiest_group+0x2b3/0x310
> eax: 00000000 ebx: c10b2e74 ecx: 00000000 edx: 00000000
> esi: c0360ee8 edi: c0351000 ebp: c10b2e84 esp: c10b2e38
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 1, threadinfo=c10b2000 task=c10b15a0)
> Stack: c10b2e74 00000002 00000002 00004441 08bca3a6 c0351000 00000000 00000001
> 00000080 00000080 00000080 00000000 00000000 c0360edc 00000000 00000002
> 00000040 c1044940 00000001 c10b2eb8 c0117125 c1044940 00000000 c10b2ea8
> Call Trace:
> [<c01071ff>] show_stack+0x7f/0xa0
> [<c01073ae>] show_registers+0x15e/0x1c0
> [<c01075c4>] die+0xf4/0x180
> [<c010775b>] do_divide_error+0x10b/0x130
> [<c0106ded>] error_code+0x2d/0x38
> [<c0117125>] load_balance+0x35/0x1a0
> [<c01175fa>] rebalance_tick+0xba/0xd0
> [<c0117732>] scheduler_tick+0x122/0x480
> [<c0124b85>] update_process_times+0x45/0x50
> [<c0111928>] smp_apic_timer_interrupt+0xf8/0x100
> [<c0106d52>] apic_timer_interrupt+0x1a/0x20
> [<c032d03a>] unpack_to_rootfs+0x17a/0x200
> [<c032d0eb>] populate_rootfs+0x2b/0x120
> [<c010058a>] init+0x8a/0x1e0
> [<c0104565>] kernel_thread_helper+0x5/0x10
> Code: 00 0f 4d c2 83 f8 01 89 c1 7e ad 8b 4d d0 85 c9 0f 84 fe fd ff ff 8b 45 e0 01 45 dc 89 c2 8b 4e 08 01 4d d4 c1 e2 07 89 d0 31 d2 <f7> f1 8b 55 cc 85 d2 89 45 e0 75 1c 8b 45 e4 39 45 e0 76 09 89

--
~Randy

* Re: sched isolcpus=1 related OOPS in 2.6.9
2004-12-03 17:18 ` Randy.Dunlap
@ 2004-12-03 17:46 ` devik
2004-12-03 18:15 ` [PATCH] " devik
1 sibling, 0 replies; 7+ messages in thread
From: devik @ 2004-12-03 17:46 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel, piggin

> > Well, I have more info. I set up the Bochs SMP emulator and hacked
> > printk to output to port e9, which is then redirected to a file.
> > I also turned on sched_domains debugging. From the result (below)
> > it is clear that there is a bug in the isolated domains setup.
>
> You are correct, of course. If "isolcpus" is used, the isolated
> cpu(s) (in <cpu_isolated_map>) are not init like the remaining
> cpus are.

The problem seems worse to me. See:

Brought up 2 CPUs
<7>CPU0: online
<7> domain 0: span 3
<7> groups: 1[128] 2[0]
<7>ERROR groups don't span domain->span
<7>CPU1: online
<7> domain 0: span 2
<7>ERROR domain->cpu_power not set
<7> groups: 2[0]
<1>divide error: 0000 [#1]

I added code which dumps cpu_power for each group (in brackets),
and it seems that the power is computed only for the first group
(even for the regular, non-isolated domain).
Also, the span for cpu0 should be 1 (it can be fixed by:
sd->span = cpu_possible_map; => sd->span = cpu_default_map;
at line 4484). Even then the group list is still bad.
I'll dig more into it...

devik
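
The span values in the debug output above are CPU bitmasks printed as numbers (3 is CPU0 plus CPU1, 2 is CPU1 alone). A minimal sketch of the arithmetic behind the suggested change follows, assuming that the default map is the set of possible CPUs with the isolated ones removed. `cpumask` and `default_map` are hypothetical names for this illustration and do not use the kernel's cpumask API.

```c
/* Hypothetical bitmask model: bit n set means CPU n is in the set. */
typedef unsigned long cpumask;

/*
 * CPUs that take part in normal balancing under the stated assumption:
 * every possible CPU except the isolated ones.
 */
static cpumask default_map(cpumask possible, cpumask isolated)
{
	return possible & ~isolated;
}
```

With two possible CPUs (mask 0x3) and CPU1 isolated (mask 0x2), `default_map` yields 0x1, which matches the statement that the span for cpu0 should be 1 rather than 3.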

* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
2004-12-03 17:18 ` Randy.Dunlap
2004-12-03 17:46 ` devik
@ 2004-12-03 18:15 ` devik
2004-12-03 19:47 ` Dimitri Sivanich
1 sibling, 1 reply; 7+ messages in thread
From: devik @ 2004-12-03 18:15 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel, piggin

[-- Attachment #1: Type: TEXT/PLAIN, Size: 483 bytes --]

> You are correct, of course. If "isolcpus" is used, the isolated
> cpu(s) (in <cpu_isolated_map>) are not init like the remaining
> cpus are.
>
> I don't know what's intended here... but it's not the divide by 0.

A patch is attached which fixes the problems with isolated
domains for me. I hope it is correct :-) I will try it on
real SMP hardware when I am at work (for now it boots on Bochs).

enjoy,
Martin Devera aka devik
Linux kernel QoS/HTB maintainer
http://luxik.cdi.cz/~devik/

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1224 bytes --]

--- linux-2.6.9/kernel/sched.c	Mon Oct 18 23:54:55 2004
+++ kernel/sched.c	Fri Dec  3 19:06:04 2004
@@ -4480,7 +4480,7 @@
 #ifdef CONFIG_NUMA
 		sd->span = nodemask;
 #else
-		sd->span = cpu_possible_map;
+		sd->span = cpu_default_map;
 #endif
 		sd->parent = p;
 		sd->groups = &sched_group_phys[group];
@@ -4512,11 +4512,14 @@
 
 	/* Set up isolated groups */
 	for_each_cpu_mask(i, cpu_isolated_map) {
+		int group;
 		cpumask_t mask;
 		cpus_clear(mask);
 		cpu_set(i, mask);
 		init_sched_build_groups(sched_group_isolated, mask,
 						&cpu_to_isolated_group);
+		group = cpu_to_isolated_group(i);
+		sched_group_isolated[group].cpu_power = SCHED_LOAD_SCALE;
 	}
 
 #ifdef CONFIG_NUMA
@@ -4532,7 +4535,7 @@
 						&cpu_to_phys_group);
 	}
 #else
-	init_sched_build_groups(sched_group_phys, cpu_possible_map,
+	init_sched_build_groups(sched_group_phys, cpu_default_map,
 						&cpu_to_phys_group);
 #endif
 
@@ -4634,7 +4637,7 @@
 			cpus_or(groupmask, groupmask, group->cpumask);
 
 			cpumask_scnprintf(str, NR_CPUS, group->cpumask);
-			printk(" %s", str);
+			printk(" %s[%ld]", str, group->cpu_power);
 			group = group->next;
 		} while (group != sd->groups);

* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
2004-12-03 18:15 ` [PATCH] " devik
@ 2004-12-03 19:47 ` Dimitri Sivanich
2004-12-03 20:21 ` devik
0 siblings, 1 reply; 7+ messages in thread
From: Dimitri Sivanich @ 2004-12-03 19:47 UTC (permalink / raw)
To: devik; +Cc: Randy.Dunlap, linux-kernel, piggin

On Fri, Dec 03, 2004 at 07:15:58PM +0100, devik wrote:
> A patch is attached which fixes problems with isolated
> domains for me. I hope it is correct :-) I will try on

Martin,

After a quick look, this patch looks OK (although I haven't had a chance to
test it yet). I don't know what was intended with a default cpu_power
of 0, but I don't believe that a value of SCHED_LOAD_SCALE should negatively
affect the isolated domains (versus any other value).

Note that sched_init() does use SCHED_LOAD_SCALE as a default.

* Re: [PATCH] sched isolcpus=1 related OOPS in 2.6.9
2004-12-03 19:47 ` Dimitri Sivanich
@ 2004-12-03 20:21 ` devik
0 siblings, 0 replies; 7+ messages in thread
From: devik @ 2004-12-03 20:21 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: Randy.Dunlap, linux-kernel, piggin

> After a quick look, this patch looks OK (although I haven't had a chance to
> test it yet). I don't know what was intended with a default cpu_power
> of 0, but I don't believe that a value of SCHED_LOAD_SCALE should negatively
> affect the isolated domains (versus any other value).

Hello Dimitri,

thanks for your check. As I understand the code (it took me 5 hours,
eh eh), only the relative sizes of cpu_power within one domain matter.
Thus in an isolated domain one can use any nonzero value.
Also, SCHED_LOAD_SCALE is probably OK in principle because the
value means "power of one typical CPU" AFAIK.

I'm not sure who the official maintainer of the scheduler is
and whether he will see/integrate the patch ...

devik
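
The point about relative sizes can be illustrated with a small user-space sketch. It is a simplified model and not the scheduler's balancing code: `busiest_index` is a hypothetical helper, and `LOAD_SCALE` of 128 is an assumed stand-in for SCHED_LOAD_SCALE. Each group's raw load is scaled by its power and the most loaded group is picked.

```c
/* Assumed stand-in for the scheduler's SCHED_LOAD_SCALE. */
#define LOAD_SCALE 128UL

/*
 * Return the index of the group with the highest load after scaling each
 * raw load by its group's power. Hypothetical helper for illustration;
 * every entry of power[] must be nonzero.
 */
static int busiest_index(const unsigned long *raw_load,
			 const unsigned long *power, int n)
{
	int i;
	int best = 0;
	unsigned long best_load = 0;

	for (i = 0; i < n; i++) {
		unsigned long scaled = raw_load[i] * LOAD_SCALE / power[i];

		if (scaled > best_load) {
			best_load = scaled;
			best = i;
		}
	}
	return best;
}
```

In this model, doubling every group's power leaves the chosen group unchanged, while changing the ratio between the powers can change it. A domain with a single group has nothing to compare against, so any nonzero power gives the same choice there.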