* Re: 2.6.12-mm2 [not found] <fa.h6rvsi4.j68fhk@ifi.uio.no> @ 2005-06-27 6:44 ` Reuben Farrelly 2005-06-27 7:24 ` 2.6.12-mm2 Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Reuben Farrelly @ 2005-06-27 6:44 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Hi, On 26/06/2005 11:12 a.m., Andrew Morton wrote: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12/2.6.12-mm2/ > > > - A reminder that there is a vger mailing list for tracking patches which > are added to -mm. Do > > `echo subscribe mm-commits | mail majordomo@vger.kernel.org' > > - Lots of merges. I'm holding off on the 80-odd pcmcia patches until we get > the recent PCI breakage sorted out. > > - Big arch/cris update. Some bad stuff seems to be happening here (this is new to -mm2; -mm1 did not have this problem). It's 100% reproduceable, although seems to happen at slightly different places in the bootup, especially at the end. Did I miss a patch for this? reuben Linux version 2.6.12-mm2 (root@tornado) (gcc version 4.0.0 20050622 (Red Hat 4.0.0-13)) #1 SMP Mon Jun 27 01:19:41 NZST 2005 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f800 (usable) BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000001fff0000 (usable) BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS) BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) 511MB LOWMEM available. found SMP MP-table at 000f52e0 DMI 2.3 present. ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 15:3 APIC version 20 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 15:3 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 20000000 (gap: 20000000:dec00000) Built 1 zonelists Initializing CPU#0 Kernel command line: ro root=/dev/md2 console=ttyS1,57600 PID hash table entries: 2048 (order: 11, 32768 bytes) Detected 2813.906 MHz processor. Using tsc for high-res timesource Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 515240k/524224k available (2133k kernel code, 8504k reserved, 920k data, 204k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 5635.43 BogoMIPS (lpj=11270866) Mount-cache hash table entries: 512 monitor/mwait feature present. using mwait in idle threads. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (12) available CPU0: Thermal monitoring enabled Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 03 Booting processor 1/1 eip 2000 Initializing CPU#1 Calibrating delay using timer specific routine.. 5627.52 BogoMIPS (lpj=11255052) monitor/mwait feature present. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel P4/Xeon Extended MCE MSRs (12) available CPU1: Thermal monitoring enabled CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 03 Total of 2 processors activated (11262.95 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 pin1=2 pin2=-1 checking TSC synchronization across 2 CPUs: passed. softlockup thread 0 started up. Brought up 2 CPUs softlockup thread 1 started up. -> [0][1][ 524288] 0.0 [ 0.0] (0): ( 30446 15223) -> [0][1][ 551882] 0.0 [ 0.0] (0): ( 12568 16550) -> [0][1][ 580928] 0.0 [ 0.0] (0): ( -2456 15787) -> [0][1][ 611503] 0.0 [ 0.0] (0): ( -4468 8899) -> [0][1][ 643687] 0.0 [ 0.0] (0): ( -10064 7247) -> [0][1][ 677565] 0.0 [ 0.0] (0): ( 6817 12064) -> [0][1][ 713226] 0.0 [ 0.0] (0): ( 15269 10258) -> [0][1][ 750764] 0.0 [ 0.0] (0): ( 16819 5904) -> found max. [0][1] working set size found: 524288, cost: 30446 --------------------- | migration cost matrix (max_cache_size: 1048576, cpu: 2813 MHz): --------------------- [00] [01] [00]: - 0.0(0) [01]: 0.0(0) - -------------------------------- | cacheflush times [1]: 0.0 (60892) | calibration delay: 0 seconds -------------------------------- NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xfb440, last bus=3 PCI: Using configuration type 1 mtrr: v2.0 (20020519) ACPI: Subsystem revision 20050309 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 *7 9 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0,disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0,disabled. ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 *9 10 11 12 14 15) SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report Machine check exception polling timer started. inotify device minor=63 Initializing Cryptographic API Real Time Clock Driver v1.12 hw_random: RNG not detected Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds). Hangcheck: Using monotonic_clock(). cn_fork is registered serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [<c0103ad0>] dump_stack+0x17/0x19 [<c01cab4b>] spin_bug+0x5b/0x67 [<c01cac9c>] _raw_spin_lock+0x78/0x7a [<c0314ad9>] _spin_lock+0x8/0xa [<c0313370>] schedule+0x6c0/0xd68 [<c0100d31>] cpu_idle+0x64/0x66 [<c01002c5>] rest_init+0x25/0x27 [<c03fe8af>] start_kernel+0x154/0x167 [<c010020f>] 0xc010020f Kernel panic - not syncing: bad locking Badness in smp_call_function at arch/i386/kernel/smp.c:553 [<c0103ad0>] dump_stack+0x17/0x19 [<c010f980>] smp_call_function+0x137/0x13c [<c010fb49>] smp_send_stop+0x1e/0x27 [<c011c2cf>] panic+0x4c/0x102 [<c01cab57>] __spin_lock_debug+0x0/0xcd [<c01cac9c>] _raw_spin_lock+0x78/0x7a [<c0314ad9>] _spin_lock+0x8/0xa [<c0313370>] schedule+0x6c0/0xd68 [<c0100d31>] cpu_idle+0x64/0x66 [<c01002c5>] rest_init+0x25/0x27 [<c03fe8af>] start_kernel+0x154/0x167 [<c010020f>] 0xc010020f ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 6:44 ` 2.6.12-mm2 Reuben Farrelly @ 2005-06-27 7:24 ` Andrew Morton 2005-06-27 7:47 ` 2.6.12-mm2 Reuben Farrelly 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2005-06-27 7:24 UTC (permalink / raw) To: Reuben Farrelly; +Cc: linux-kernel, Ingo Molnar Reuben Farrelly <reuben-lkml@reub.net> wrote: > > ... > > Some bad stuff seems to be happening here (this is new to -mm2; -mm1 did not > have this problem). > > It's 100% reproduceable, although seems to happen at slightly different places > in the bootup, especially at the end. Did I miss a patch for this? > Why do you keep breaking my kernel? > ... > Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled > ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A > ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A > [<c0103ad0>] dump_stack+0x17/0x19 > [<c01cab4b>] spin_bug+0x5b/0x67 > [<c01cac9c>] _raw_spin_lock+0x78/0x7a > [<c0314ad9>] _spin_lock+0x8/0xa > [<c0313370>] schedule+0x6c0/0xd68 > [<c0100d31>] cpu_idle+0x64/0x66 > [<c01002c5>] rest_init+0x25/0x27 > [<c03fe8af>] start_kernel+0x154/0x167 > [<c010020f>] 0xc010020f > Kernel panic - not syncing: bad locking That's odd - we lost a printk there: printk("BUG: spinlock %s on CPU#%d, %s/%d, %p\n", msg, smp_processor_id(), current->comm, current->pid, lock); which is a shame, because it would have told us stuff. Do you have any traces which do have that message? Anyway, scary trace. It look like some spinlock is thought to be in the wrong state in schedule(). Send the .config, please. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 7:24 ` 2.6.12-mm2 Andrew Morton @ 2005-06-27 7:47 ` Reuben Farrelly 2005-06-27 8:22 ` 2.6.12-mm2 Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Reuben Farrelly @ 2005-06-27 7:47 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar Hi, On 27/06/2005 7:24 p.m., Andrew Morton wrote: > Reuben Farrelly <reuben-lkml@reub.net> wrote: >> ... >> >> Some bad stuff seems to be happening here (this is new to -mm2; -mm1 did not >> have this problem). >> >> It's 100% reproduceable, although seems to happen at slightly different places >> in the bootup, especially at the end. Did I miss a patch for this? >> > > Why do you keep breaking my kernel? Sadistic enjoyment ;-) >> Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled >> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A >> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A >> [<c0103ad0>] dump_stack+0x17/0x19 >> [<c01cab4b>] spin_bug+0x5b/0x67 >> [<c01cac9c>] _raw_spin_lock+0x78/0x7a >> [<c0314ad9>] _spin_lock+0x8/0xa >> [<c0313370>] schedule+0x6c0/0xd68 >> [<c0100d31>] cpu_idle+0x64/0x66 >> [<c01002c5>] rest_init+0x25/0x27 >> [<c03fe8af>] start_kernel+0x154/0x167 >> [<c010020f>] 0xc010020f >> Kernel panic - not syncing: bad locking > > That's odd - we lost a printk there: > > printk("BUG: spinlock %s on CPU#%d, %s/%d, %p\n", msg, > smp_processor_id(), current->comm, current->pid, lock); > > which is a shame, because it would have told us stuff. Do you have any > traces which do have that message? Uh. Likely got munged within hyperterm. Here's a better one just created using QVT: usbcore: registered new driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver usb 3-1: new full speed USB device using uhci_hcd and address 2 hub 3-1:1.0: USB hub found hub 3-1:1.0: 4 ports detected usbcore: registered new driver hiddev usb 4-1: new full speed USB device using uhci_hcd and address 2 drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0 alt 0 proto 2 vid 0x03F0 pid 0x6204 usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.01:USB HID core driver mice: PS/2 mouse device common for all mice input: PC Speaker md: raid1 personality registered as nr 3 md: md driver 0.90.2 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 3.38 NET: Registered protocol family 2 IP: routing cache hash table of 1024 buckets, 24Kbytes TCP established hash table entries: 32768 (order: 7, 786432 bytes) TCP bind hash table entries: 32768 (order: 7, 655360 bytes) TCP: Hash tables configured (established 32768 bind 32768) TCP reno registered GRE over IPv4 tunneling driver ip_conntrack version 2.1 (4095 buckets, 32760 max) - 212 bytes per conntrack usb 3-1.1: new low speed USB device using uhci_hcd and address 3 BUG: spinlock recursion on CPU#0, swapper/0, c1407160 [<c0103ad0>] dump_stack+0x17/0x19 [<c01cab4b>] spin_bug+0x5b/0x67 [<c01cac9c>] _raw_spin_lock+0x78/0x7a [<c0314ad9>] _spin_lock+0x8/0xa [<c0117399>] scheduler_tick+0xd0/0x37c [<c01256b3>] update_process_times+0x58/0xd7 [<c0110f50>] smp_apic_timer_interrupt+0xde/0xe0 [<c0103614>] apic_timer_interrupt+0x1c/0x24 [<c0100d1e>] cpu_idle+0x51/0x66 [<c01002c5>] rest_init+0x25/0x27 [<c03fe8af>] start_kernel+0x154/0x167 [<c010020f>] 0xc010020f Kernel panic - not syncing: bad locking Badness in smp_call_function at arch/i386/kernel/smp.c:553 [<c0103ad0>] dump_stack+0x17/0x19 [<c010f980>] smp_call_function+0x137/0x13c [<c010fb49>] smp_send_stop+0x1e/0x27 [<c011c2cf>] panic+0x4c/0x102 [<c01cab57>] __spin_lock_debug+0x0/0xcd [<c01cac9c>] _raw_spin_lock+0x78/0x7a [<c0314ad9>] _spin_lock+0x8/0xa [<c0117399>] scheduler_tick+0xd0/0x37c [<c01256b3>] update_process_times+0x58/0xd7 [<c0110f50>] smp_apic_timer_interrupt+0xde/0xe0 [<c0103614>] apic_timer_interrupt+0x1c/0x24 [<c0100d1e>] cpu_idle+0x51/0x66 [<c01002c5>] rest_init+0x25/0x27 [<c03fe8af>] start_kernel+0x154/0x167 [<c010020f>] 0xc010020f > Anyway, scary trace. It look like some spinlock is thought to be in the > wrong state in schedule(). Send the .config, please. Now online at http://www.reub.net/kernel/.config Reuben ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 7:47 ` 2.6.12-mm2 Reuben Farrelly @ 2005-06-27 8:22 ` Andrew Morton 2005-06-27 9:37 ` 2.6.12-mm2 Ingo Molnar 2005-06-27 9:48 ` [patch] spinlock-debug fix Ingo Molnar 0 siblings, 2 replies; 9+ messages in thread From: Andrew Morton @ 2005-06-27 8:22 UTC (permalink / raw) To: Reuben Farrelly; +Cc: linux-kernel, mingo Reuben Farrelly <reuben-lkml@reub.net> wrote: > > > Anyway, scary trace. It look like some spinlock is thought to be in the > > wrong state in schedule(). Send the .config, please. > > Now online at http://www.reub.net/kernel/.config Me too. BUG: spinlock recursion on CPU#0, swapper/0, c120d520 [<c01039ed>] dump_stack+0x19/0x20 [<c01d9af2>] spin_bug+0x42/0x54 [<c01d9bfa>] _raw_spin_lock+0x3e/0x84 [<c031d0ad>] _spin_lock+0x9/0x10 [<c031b9e9>] schedule+0x479/0xbc8 [<c0100cb4>] cpu_idle+0x88/0x8c [<c01002c1>] rest_init+0x21/0x28 [<c0442899>] start_kernel+0x151/0x158 [<c010020f>] 0xc010020f Kernel panic - not syncing: bad locking The bug is in the new spinlock debugging code itself. Ingo, can you test that .config please? Reuben, I guess disabling CONFIG_DEBUG_SPINLOCK will get you going. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 8:22 ` 2.6.12-mm2 Andrew Morton @ 2005-06-27 9:37 ` Ingo Molnar 2005-06-27 21:14 ` 2.6.12-mm2 Andrew Morton 2005-06-27 9:48 ` [patch] spinlock-debug fix Ingo Molnar 1 sibling, 1 reply; 9+ messages in thread From: Ingo Molnar @ 2005-06-27 9:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Reuben Farrelly, linux-kernel is the fput()/sysfs_release() crash below known? Ingo Linux version 2.6.12-mm2 (mingo@jupiter) (gcc version 3.4.1 20040831 (Red Hat 3.4.1-10)) #11 SMP Mon Jun 27 11:19:41 CEST 2005 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 0000000010000000 (usable) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved) 256MB LOWMEM available. found SMP MP-table at 000f5b30 On node 0 totalpages: 65536 DMA zone: 4096 pages, LIFO batch:1 Normal zone: 61440 pages, LIFO batch:31 HighMem zone: 0 pages, LIFO batch:1 early console enabled DMI 2.2 present. ABIT i440BX-W83977 detected: force use of acpi=ht ACPI: Unable to locate RSDP Intel MultiProcessor Specification v1.1 Virtual Wire compatibility mode. OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000 Processor #0 6:6 APIC version 17 Processor #1 6:6 APIC version 17 I/O APIC #2 Version 17 at 0xFEC00000. Enabling APIC mode: Flat. Using 1 I/O APICs Processors: 2 Allocating PCI resources starting at 10000000 (gap: 10000000:eec00000) Built 1 zonelists mapped APIC to ffffd000 (fee00000) mapped IOAPIC to ffffc000 (fec00000) Initializing CPU#0 Kernel command line: root=/dev/hda1 debug earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 console=tty0 3 maxcpus=2 nmi_watchdog=1 debug profile=0 kernel profiling enabled (shift: 0) PID hash table entries: 2048 (order: 11, 32768 bytes) Detected 467.796 MHz processor. Using tsc for high-res timesource disabling early console Console: colour VGA+ 80x25 Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 244664k/262144k available (2592k kernel code, 17052k reserved, 1061k data, 224k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 937.21 BogoMIPS (lpj=1874423) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0183fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: After vendor identify, caps: 0183fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 128K CPU: After all inits, caps: 0183fbff 00000000 00000000 00000040 00000000 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Enabling fast FPU save and restore... done. Checking 'hlt' instruction... OK. CPU0: Intel Celeron (Mendocino) stepping 05 Booting processor 1/1 eip 2000 Initializing CPU#1 Calibrating delay using timer specific routine.. 935.62 BogoMIPS (lpj=1871246) CPU: After generic identify, caps: 0183fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: After vendor identify, caps: 0183fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 128K CPU: After all inits, caps: 0183fbff 00000000 00000000 00000040 00000000 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel Celeron (Mendocino) stepping 05 Total of 2 processors activated (1872.83 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 pin1=2 pin2=0 checking TSC synchronization across 2 CPUs: passed. softlockup thread 0 started up. Brought up 2 CPUs softlockup thread 1 started up. -> [0][1][ 65536] 0.3 [ 0.3] (0): ( 357620 178810) -> [0][1][ 68985] 0.3 [ 0.3] (0): ( 354253 91088) -> [0][1][ 72615] 0.3 [ 0.3] (0): ( 390935 63885) -> [0][1][ 76436] 0.4 [ 0.4] (0): ( 415617 44283) -> [0][1][ 80458] 0.4 [ 0.4] (0): ( 428850 28758) -> [0][1][ 84692] 0.4 [ 0.4] (0): ( 460558 30233) -> [0][1][ 89149] 0.5 [ 0.5] (0): ( 501929 35802) -> [0][1][ 93841] 0.5 [ 0.5] (0): ( 543036 38454) -> [0][1][ 98780] 0.6 [ 0.6] (0): ( 609270 52344) -> [0][1][ 103978] 0.5 [ 0.6] (0): ( 596005 32804) -> [0][1][ 109450] 0.6 [ 0.6] (0): ( 607209 22004) -> [0][1][ 115210] 0.6 [ 0.6] (0): ( 643212 29003) -> [0][1][ 121273] 0.7 [ 0.7] (0): ( 716535 51163) -> [0][1][ 127655] 0.7 [ 0.7] (0): ( 795865 65246) -> [0][1][ 134373] 0.7 [ 0.7] (0): ( 745335 57888) -> [0][1][ 141445] 0.7 [ 0.7] (0): ( 796304 54428) -> [0][1][ 148889] 0.8 [ 0.8] (0): ( 820203 39163) -> [0][1][ 156725] 0.7 [ 0.8] (0): ( 716504 71431) -> [0][1][ 164973] 0.6 [ 0.8] (0): ( 679884 54025) -> [0][1][ 173655] 0.6 [ 0.8] (0): ( 662305 35802) -> [0][1][ 182794] 0.5 [ 0.8] (0): ( 598857 49625) -> [0][1][ 192414] 0.6 [ 0.8] (0): ( 628063 39415) -> [0][1][ 202541] 0.5 [ 0.8] (0): ( 577476 45001) -> found max. [0][1] working set size found: 148889, cost: 820203 --------------------- | migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.6(0) [01]: 1.6(0) - -------------------------------- | cacheflush times [1]: 1.6 (1640406) | calibration delay: 0 seconds -------------------------------- NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xfb420, last bus=1 PCI: Using configuration type 1 mtrr: v2.0 (20020519) mtrr: your CPUs had inconsistent fixed MTRR settings mtrr: probably your BIOS does not setup all CPUs. mtrr: corrected configuration. ACPI: Subsystem revision 20050309 ACPI: Interpreter disabled. SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Probing PCI hardware PCI: Probing PCI hardware (bus 00) Boot video device is 0000:01:00.0 PCI: Using IRQ router PIIX/ICH [8086/7110] at 0000:00:07.0 PCI->APIC IRQ transform: 0000:00:07.2[D] -> IRQ 19 PCI->APIC IRQ transform: 0000:00:0b.0[A] -> IRQ 18 PCI->APIC IRQ transform: 0000:00:0d.0[A] -> IRQ 17 PCI->APIC IRQ transform: 0000:00:0f.0[A] -> IRQ 16 PCI->APIC IRQ transform: 0000:00:13.0[A] -> IRQ 18 PCI->APIC IRQ transform: 0000:00:13.1[B] -> IRQ 18 PCI->APIC IRQ transform: 0000:01:00.0[A] -> IRQ 16 Machine check exception polling timer started. inotify device minor=63 Initializing Cryptographic API Limiting direct PCI/PCI transfers. Real Time Clock Driver v1.12 Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds). Hangcheck: Using monotonic_clock(). cn_fork is registered serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled ÿttyS0 at I/O 0x3f8 (irq = 4) is a 16550A ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A io scheduler noop registered io scheduler anticipatory registered Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com Intel(R) PRO/1000 Network Driver - version 6.0.54-k2-NAPI Copyright (c) 1999-2004 Intel Corporation. e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection e100: Intel(R) PRO/100 Network Driver, 3.4.8-k2-NAPI e100: Copyright(c) 1999-2005 Intel Corporation e100: eth1: e100_probe: addr 0xef140000, irq 16, MAC addr 00:90:27:8C:A0:50 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx PIIX4: IDE controller at PCI slot 0000:00:07.1 PIIX4: chipset revision 1 PIIX4: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA Probing IDE interface ide0... hda: QUANTUM FIREBALLP LM20.5, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... hdc: QUANTUM FIREBALL SE4.3A, ATA DISK drive hdd: LG CD-ROM CRD-8522B, ATAPI CD/DVD-ROM drive ide1 at 0x170-0x177,0x376 on irq 15 hda: max request size: 128KiB hda: 40132503 sectors (20547 MB) w/1900KiB Cache, CHS=39813/16/63, UDMA(33) hda: cache flushes not supported hda: hda1 hdc: max request size: 128KiB hdc: 8418816 sectors (4310 MB) w/80KiB Cache, CHS=14848/9/63, UDMA(33) hdc: cache flushes not supported hdc: hdc1 hdc2 libata version 1.11 loaded. USB Universal Host Controller Interface driver v2.3 uhci_hcd 0000:00:07.2: Intel Corporation 82371AB/EB/MB PIIX4 USB uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:07.2: irq 19, io base 0x0000b000 hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected usbcore: registered new driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver usbcore: registered new driver hiddev usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.01:USB HID core driver mice: PS/2 mouse device common for all mice input: PC Speaker md: raid1 personality registered as nr 3 md: md driver 0.90.2 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 3.38 NET: Registered protocol family 2 input: AT Translated Set 2 keyboard on isa0060/serio0 IP: routing cache hash table of 512 buckets, 12Kbytes TCP established hash table entries: 16384 (order: 6, 393216 bytes) TCP bind hash table entries: 16384 (order: 6, 327680 bytes) TCP: Hash tables configured (established 16384 bind 16384) TCP reno registered GRE over IPv4 tunneling driver ip_conntrack version 2.1 (2048 buckets, 16384 max) - 212 bytes per conntrack ip_tables: (C) 2000-2002 Netfilter core team NET: Registered protocol family 1 NET: Registered protocol family 17 Testing NMI watchdog ... OK. Starting balanced_irq Using IPI Shortcut mode BIOS EDD facility v0.16 2004-Jun-25, 2 devices found md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. VFS: Mounted root (ext3 filesystem) readonly. Freeing unused kernel memory: 224k freed Unable to handle kernel paging request at virtual address 6b6b6ceb printing eip: c018b51c *pde = 00000000 Oops: 0002 [#1] SMP Modules linked in: CPU: 1 EIP: 0060:[<c018b51c>] Not tainted VLI EFLAGS: 00010206 (2.6.12-mm2) EIP is at sysfs_release+0x3f/0x79 eax: 6b6b6beb ebx: 6b6b6b6b ecx: ce5b5e64 edx: 00000000 esi: cf266ca4 edi: cfa3a544 ebp: ce635f50 esp: ce635f44 ds: 007b es: 007b ss: 0068 Process udev (pid: 1286, threadinfo=ce634000 task=c1fe6040) Stack: 00000010 c1ca8b78 c1d4656c ce635f74 c01580b2 00000000 cf399e10 cf399e10 ce5b5e64 c1d4656c c1cb275c 00000000 ce635f84 c0157f36 c1cb275c c1d4656c ce635f9c c0156873 00000004 00000004 c1cb275c c1cb2760 ce635fb4 c015690a Call Trace: [<c0103a18>] show_stack+0x7c/0x92 [<c0103b99>] show_registers+0x152/0x1ca [<c0103d96>] die+0xf4/0x16f [<c011433f>] do_page_fault+0x466/0x684 [<c0103683>] error_code+0x4f/0x54 [<c01580b2>] __fput+0x176/0x1a9 [<c0157f36>] fput+0x3b/0x41 [<c0156873>] filp_close+0x36/0x65 [<c015690a>] sys_close+0x68/0x83 [<c0102b13>] sysenter_past_esp+0x54/0x75 Code: 58 8b 70 14 8b 41 58 8b 40 14 85 f6 8b 58 04 74 07 89 f0 e8 8b 22 05 00 85 db 74 1a b8 00 e0 ff ff 21 e0 8b 40 10 c1 e0 07 01 d8 <ff> 88 00 01 00 00 83 3b 02 74 22 85 ff 74 0e 8b 47 0c 85 c0 75 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 9:37 ` 2.6.12-mm2 Ingo Molnar @ 2005-06-27 21:14 ` Andrew Morton 2005-06-28 7:30 ` 2.6.12-mm2 Ingo Molnar 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2005-06-27 21:14 UTC (permalink / raw) To: Ingo Molnar; +Cc: reuben-lkml, linux-kernel Ingo Molnar <mingo@elte.hu> wrote: > > is the fput()/sysfs_release() crash below known? It doesn't ring any bells. You have a use-after-free error when udev is dinking with a sysfs file. It could be anything. Could you debug it a bit, please, work out which file caused the crash? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.12-mm2 2005-06-27 21:14 ` 2.6.12-mm2 Andrew Morton @ 2005-06-28 7:30 ` Ingo Molnar 0 siblings, 0 replies; 9+ messages in thread From: Ingo Molnar @ 2005-06-28 7:30 UTC (permalink / raw) To: Andrew Morton; +Cc: reuben-lkml, linux-kernel * Andrew Morton <akpm@osdl.org> wrote: > Ingo Molnar <mingo@elte.hu> wrote: > > > > is the fput()/sysfs_release() crash below known? > > It doesn't ring any bells. > > You have a use-after-free error when udev is dinking with a sysfs > file. It could be anything. Could you debug it a bit, please, work > out which file caused the crash? unfortunately it's totally unreproducible, it triggered only once during hundreds of bootups. Will keep eyes open though. Ingo ^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch] spinlock-debug fix 2005-06-27 8:22 ` 2.6.12-mm2 Andrew Morton 2005-06-27 9:37 ` 2.6.12-mm2 Ingo Molnar @ 2005-06-27 9:48 ` Ingo Molnar 2005-06-27 10:57 ` Reuben Farrelly 1 sibling, 1 reply; 9+ messages in thread From: Ingo Molnar @ 2005-06-27 9:48 UTC (permalink / raw) To: Andrew Morton; +Cc: Reuben Farrelly, linux-kernel * Andrew Morton <akpm@osdl.org> wrote: > Reuben Farrelly <reuben-lkml@reub.net> wrote: > > > > > Anyway, scary trace. It look like some spinlock is thought to be in the > > > wrong state in schedule(). Send the .config, please. > > > > Now online at http://www.reub.net/kernel/.config > > Me too. > > BUG: spinlock recursion on CPU#0, swapper/0, c120d520 > [<c01039ed>] dump_stack+0x19/0x20 > [<c01d9af2>] spin_bug+0x42/0x54 > [<c01d9bfa>] _raw_spin_lock+0x3e/0x84 > [<c031d0ad>] _spin_lock+0x9/0x10 > [<c031b9e9>] schedule+0x479/0xbc8 > [<c0100cb4>] cpu_idle+0x88/0x8c > [<c01002c1>] rest_init+0x21/0x28 > [<c0442899>] start_kernel+0x151/0x158 > [<c010020f>] 0xc010020f > Kernel panic - not syncing: bad locking > > The bug is in the new spinlock debugging code itself. Ingo, can you > test that .config please? couldnt reproduce it on an UP box, nor on an SMP/HT 2/4-way box, but it finally triggered on a 2-way SMP box. the bug is that current->pid is not a unique identifier on SMP (doh!). The patch below fixes the bug - which also happens to be a speedup for the debugging code, as the ->pid dereferencing does not have to be done anymore. Also, i've disabled the panicing for now. Ingo - change owner_pid to owner, to fix bad pid uniqueness assumption on SMP - some more debug output printed - dont panic for now Signed-off-by: Ingo Molnar <mingo@elte.hu> include/linux/spinlock_types.h | 16 ++++++++++------ kernel/sched.c | 2 +- lib/spinlock_debug.c | 30 +++++++++++++++++++----------- 3 files changed, 30 insertions(+), 18 deletions(-) Index: linux/include/linux/spinlock_types.h =================================================================== --- linux.orig/include/linux/spinlock_types.h +++ linux/include/linux/spinlock_types.h @@ -21,11 +21,12 @@ typedef struct { unsigned int break_lock; #endif #ifdef CONFIG_DEBUG_SPINLOCK - unsigned int magic, owner_pid, owner_cpu; + unsigned int magic, owner_cpu; + void *owner; #endif } spinlock_t; -#define SPINLOCK_MAGIC 0xdead4ead +#define SPINLOCK_MAGIC 0xdead4ead typedef struct { raw_rwlock_t raw_lock; @@ -33,22 +34,25 @@ typedef struct { unsigned int break_lock; #endif #ifdef CONFIG_DEBUG_SPINLOCK - unsigned int magic, owner_pid, owner_cpu; + unsigned int magic, owner_cpu; + void *owner; #endif } rwlock_t; -#define RWLOCK_MAGIC 0xdeaf1eed +#define RWLOCK_MAGIC 0xdeaf1eed + +#define SPINLOCK_OWNER_INIT ((void *)-1L) #ifdef CONFIG_DEBUG_SPINLOCK # define SPIN_LOCK_UNLOCKED \ (spinlock_t) { .raw_lock = __RAW_SPIN_LOCK_UNLOCKED, \ .magic = SPINLOCK_MAGIC, \ - .owner_pid = -1, \ + .owner = SPINLOCK_OWNER_INIT, \ .owner_cpu = -1 } #define RW_LOCK_UNLOCKED \ (rwlock_t) { .raw_lock = __RAW_RW_LOCK_UNLOCKED, \ .magic = RWLOCK_MAGIC, \ - .owner_pid = -1, \ + .owner = SPINLOCK_OWNER_INIT, \ .owner_cpu = -1 } #else # define SPIN_LOCK_UNLOCKED \ Index: linux/kernel/sched.c =================================================================== --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -1604,7 +1604,7 @@ static inline void finish_task_switch(ru prev_task_flags = prev->flags; #ifdef CONFIG_DEBUG_SPINLOCK /* this is a valid case when another task releases the spinlock */ - rq->lock.owner_pid = current->pid; + rq->lock.owner = current; #endif finish_arch_switch(prev); finish_lock_switch(rq, prev); Index: linux/lib/spinlock_debug.c =================================================================== --- linux.orig/lib/spinlock_debug.c +++ linux/lib/spinlock_debug.c @@ -14,16 +14,24 @@ static void spin_bug(spinlock_t *lock, const char *msg) { static long print_once = 1; + struct task_struct *owner = NULL; if (xchg(&print_once, 0)) { - printk("BUG: spinlock %s on CPU#%d, %s/%d, %p\n", msg, - smp_processor_id(), current->comm, current->pid, lock); + if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT) + owner = lock->owner; + printk("BUG: spinlock %s on CPU#%d, %s/%d\n", + msg, smp_processor_id(), current->comm, current->pid); + printk(" lock: %p, .magic: %08x, .owner: %s/%d, .owner_cpu: %d\n", + lock, lock->magic, + owner ? owner->comm : "<none>", + owner ? owner->pid : -1, + lock->owner_cpu); dump_stack(); #ifdef CONFIG_SMP /* * We cannot continue on SMP: */ - panic("bad locking"); +// panic("bad locking"); #endif } } @@ -33,7 +41,7 @@ static void spin_bug(spinlock_t *lock, c static inline void debug_spin_lock_before(spinlock_t *lock) { SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic"); - SPIN_BUG_ON(lock->owner_pid == current->pid, lock, "recursion"); + SPIN_BUG_ON(lock->owner == current, lock, "recursion"); SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(), lock, "cpu recursion"); } @@ -41,17 +49,17 @@ static inline void debug_spin_lock_befor static inline void debug_spin_lock_after(spinlock_t *lock) { lock->owner_cpu = raw_smp_processor_id(); - lock->owner_pid = current->pid; + lock->owner = current; } static inline void debug_spin_unlock(spinlock_t *lock) { SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic"); SPIN_BUG_ON(!spin_is_locked(lock), lock, "already unlocked"); - SPIN_BUG_ON(lock->owner_pid != current->pid, lock, "wrong owner"); + SPIN_BUG_ON(lock->owner != current, lock, "wrong owner"); SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(), lock, "wrong CPU"); - lock->owner_pid = -1; + lock->owner = SPINLOCK_OWNER_INIT; lock->owner_cpu = -1; } @@ -176,7 +184,7 @@ void _raw_read_unlock(rwlock_t *lock) static inline void debug_write_lock_before(rwlock_t *lock) { RWLOCK_BUG_ON(lock->magic != RWLOCK_MAGIC, lock, "bad magic"); - RWLOCK_BUG_ON(lock->owner_pid == current->pid, lock, "recursion"); + RWLOCK_BUG_ON(lock->owner == current, lock, "recursion"); RWLOCK_BUG_ON(lock->owner_cpu == raw_smp_processor_id(), lock, "cpu recursion"); } @@ -184,16 +192,16 @@ static inline void debug_write_lock_befo static inline void debug_write_lock_after(rwlock_t *lock) { lock->owner_cpu = raw_smp_processor_id(); - lock->owner_pid = current->pid; + lock->owner = current; } static inline void debug_write_unlock(rwlock_t *lock) { RWLOCK_BUG_ON(lock->magic != RWLOCK_MAGIC, lock, "bad magic"); - RWLOCK_BUG_ON(lock->owner_pid != current->pid, lock, "wrong owner"); + RWLOCK_BUG_ON(lock->owner != current, lock, "wrong owner"); RWLOCK_BUG_ON(lock->owner_cpu != raw_smp_processor_id(), lock, "wrong CPU"); - lock->owner_pid = -1; + lock->owner = SPINLOCK_OWNER_INIT; lock->owner_cpu = -1; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch] spinlock-debug fix 2005-06-27 9:48 ` [patch] spinlock-debug fix Ingo Molnar @ 2005-06-27 10:57 ` Reuben Farrelly 0 siblings, 0 replies; 9+ messages in thread From: Reuben Farrelly @ 2005-06-27 10:57 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel Hi, On 27/06/2005 9:48 p.m., Ingo Molnar wrote: > * Andrew Morton <akpm@osdl.org> wrote: > >> Reuben Farrelly <reuben-lkml@reub.net> wrote: >>> > Anyway, scary trace. It look like some spinlock is thought to be in the >>> > wrong state in schedule(). Send the .config, please. >>> >>> Now online at http://www.reub.net/kernel/.config >> Me too. >> >> BUG: spinlock recursion on CPU#0, swapper/0, c120d520 >> [<c01039ed>] dump_stack+0x19/0x20 >> [<c01d9af2>] spin_bug+0x42/0x54 >> [<c01d9bfa>] _raw_spin_lock+0x3e/0x84 >> [<c031d0ad>] _spin_lock+0x9/0x10 >> [<c031b9e9>] schedule+0x479/0xbc8 >> [<c0100cb4>] cpu_idle+0x88/0x8c >> [<c01002c1>] rest_init+0x21/0x28 >> [<c0442899>] start_kernel+0x151/0x158 >> [<c010020f>] 0xc010020f >> Kernel panic - not syncing: bad locking >> >> The bug is in the new spinlock debugging code itself. Ingo, can you >> test that .config please? > > couldnt reproduce it on an UP box, nor on an SMP/HT 2/4-way box, but it > finally triggered on a 2-way SMP box. > > the bug is that current->pid is not a unique identifier on SMP (doh!). > > The patch below fixes the bug - which also happens to be a speedup for > the debugging code, as the ->pid dereferencing does not have to be done > anymore. Also, i've disabled the panicing for now. > > Ingo This patch fixes it - thanks Ingo. reuben ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-06-28 7:36 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <fa.h6rvsi4.j68fhk@ifi.uio.no>
2005-06-27 6:44 ` 2.6.12-mm2 Reuben Farrelly
2005-06-27 7:24 ` 2.6.12-mm2 Andrew Morton
2005-06-27 7:47 ` 2.6.12-mm2 Reuben Farrelly
2005-06-27 8:22 ` 2.6.12-mm2 Andrew Morton
2005-06-27 9:37 ` 2.6.12-mm2 Ingo Molnar
2005-06-27 21:14 ` 2.6.12-mm2 Andrew Morton
2005-06-28 7:30 ` 2.6.12-mm2 Ingo Molnar
2005-06-27 9:48 ` [patch] spinlock-debug fix Ingo Molnar
2005-06-27 10:57 ` Reuben Farrelly
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox