* 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-15 16:51 Petr Vandrovec
2005-09-15 17:33 ` Petr Vandrovec
[not found] ` <20050916023005.4146e499.akpm@osdl.org>
0 siblings, 2 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-15 16:51 UTC (permalink / raw)
To: Linux-kernel
Hello,
so now that the crashes on UP systems have been sorted out, I tried to
put the new kernel on my SMP host, and I'm sorry to say it does not
seem to work as advertised :-( It seems that we somehow got
blocks from CPU#1 into the memory blocks of CPU#0: free_block()
complains that the caller holds cachep->nodelists[0]->list_lock
while the nodeid of the block passed to free_block() comes from
processor (and node) #1...

I cannot find how this happened. Hopefully somebody else
will know... Meanwhile I'll try to get rid of PREEMPT; apparently,
although it now masquerades as 'Low-latency desktop', it
is still somewhat dangerous. If it is triggered by preempt, that is.
Thanks,
Petr Vandrovec
ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:1849
invalid operand: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-1619 #1
RIP: 0010:[<ffffffff8016e826>] <ffffffff8016e826>{free_block+294}
RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
RBP: ffff81007ffde000 R08: ffff81003ffaed90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007ffc9b50
R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
FS: 0000000000000000(0000) GS:ffffffff805fb800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
Stack: 0000000000000000 0000000000000000 0000000000000213 0000000200000000
ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
0000000000000000 ffff81007ffda080
Call Trace:<ffffffff8016fdc7>{drain_array_locked+167} <ffffffff8016feee>{cache_reap+206}
<ffffffff803a2374>{_spin_lock_irqsave+36} <ffffffff8016fe20>{cache_reap+0}
<ffffffff8014a1bc>{worker_thread+476} <ffffffff80132610>{default_wake_function+0}
<ffffffff80132610>{default_wake_function+0} <ffffffff80149fe0>{worker_thread+0}
<ffffffff8014ebc2>{kthread+146} <ffffffff8010ed12>{child_rip+8}
<ffffffff80149fe0>{worker_thread+0} <ffffffff8014eb30>{kthread+0}
<ffffffff8010ed0a>{child_rip+0}
Code: 0f 0b 68 bd aa 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
RIP <ffffffff8016e826>{free_block+294} RSP <ffff81007ff21d88>
<6>note: events/0[8] exited with preempt_count 1
hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
<and box is dead>
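Petr's reading of the oops is that free_block() was entered while holding node 0's list_lock but was handed objects whose slab lives on node 1. The following minimal user-space model of that invariant may make the failure easier to picture. Everything in it (`struct model_obj`, `struct model_node_list`, `model_free_block()`) is hypothetical and far simpler than mm/slab.c. A violation is counted where the real kernel would hit BUG().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 2

/* Hypothetical object: remembers the NUMA node of the slab it came from. */
struct model_obj {
	int slab_node;
};

/* Hypothetical per-node free list; lock_held stands in for list_lock. */
struct model_node_list {
	bool lock_held;
	size_t free_count;
};

/*
 * Model of the invariant: the caller must hold lists[node]'s lock, and
 * every object must belong to a slab on that same node. Returns how many
 * objects violate it (where the kernel would BUG()); the others are
 * counted as freed onto lists[node].
 */
static size_t model_free_block(struct model_node_list lists[MODEL_MAX_NODES],
			       const struct model_obj *objs, size_t nr, int node)
{
	size_t violations = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!lists[node].lock_held || objs[i].slab_node != node) {
			violations++;
			continue;
		}
		lists[node].free_count++;
	}
	return violations;
}
```

In this model, freeing node-1 objects while holding only node 0's lock reports every object as a violation. If Petr's diagnosis is right, that is the situation behind the "Kernel BUG at mm/slab.c:1849" line.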
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-15 16:51 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849 Petr Vandrovec
@ 2005-09-15 17:33 ` Petr Vandrovec
[not found] ` <20050916023005.4146e499.akpm@osdl.org>
1 sibling, 0 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-15 17:33 UTC (permalink / raw)
To: Linux-kernel

Petr Vandrovec wrote:
> Hello,
> so now once crashes on UP system were sorted out, I tried to
> put new kernel on my SMP host - and sorry to say, but it does not
> seem to work as advertised :-( It seems that we somehow got
> blocks from CPU#1 into memory blocks on CPU#0, and free_block
> complains that caller holds cachep->nodelists[0]->list_lock
> while nodeid for block passed to free_block() comes from processor
> (and node) #1...
>
> I cannot find how this happened. Hopefully somebody else
> will know... Meanwhile I'll try to get rid of PREEMPT, apparently
> although it is now masqueraded under 'Low-latency desktop' it
> is still somewhat dangerous. If it is triggered by preempt, that is.

It is not caused by preempt; a non-preempt kernel crashes exactly the same way.
Petr
^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20050916023005.4146e499.akpm@osdl.org>]
[parent not found: <432AA00D.4030706@vc.cvut.cz>]
[parent not found: <20050916230809.789d6b0b.akpm@osdl.org>]
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
[not found] ` <20050916230809.789d6b0b.akpm@osdl.org>
@ 2005-09-19 16:02 ` Petr Vandrovec
2005-09-19 18:29 ` Andrew Morton
0 siblings, 1 reply; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-19 16:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-kernel

Andrew Morton wrote:
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
>>Andrew Morton wrote:
>> > Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>> >
>> >> so now once crashes on UP system were sorted out, I tried to
>> >> put new kernel on my SMP host - and sorry to say, but it does not
>> >> seem to work as advertised :-(
>> >
>> > .config (again), please.
>>
>> Any SMP with NUMA. One which I'm trying to debug now is attached.
>> It is available at http://vana.vc.cvut.cz/config as well.
>
> I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.

It still dies for me with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). The config is available at
http://platan.vc.cvut.cz/config-vana.txt. The box is a dual-Opteron Tyan K8W (S2885).

Any idea how to track the problem down? I'm not sure bisecting will work without
a lot of interaction and patching, as almost all kernels after 2.6.13 were dying
with other problems on that box...
Thanks,
Petr Vandrovec

Bootdata ok (command line is BOOT_IMAGE=Linux ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose)
Linux version 2.6.14-rc1-6c07 (root@vana) (gcc version 3.3.3 (Debian 20040401)) #4 SMP Mon Sep 19 17:44:44 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> Node 1
SRAT: Node 0 PXM 0 100000-3fffffff
SRAT: Node 1 PXM 1 40000000-7fffffff
SRAT: Node 0 PXM 0 0-3fffffff
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007ffeffff
ACPI: PM-Timer IO Port: 0x5008
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xff4ff000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xff4ff000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xff4fe000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xff4fe000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Setting APIC routing to flat
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7f780000)
Checking aperture...
CPU 0: aperture @ c0000000 size 512 MB
CPU 1: aperture @ c0000000 size 512 MB
Built 2 zonelists
Kernel command line: BOOT_IMAGE=Linux ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose
Parameter psmouse_noext is obsolete, ignored
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 1993.374 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2056328k/2097088k available (2616k kernel code, 40372k reserved, 1869k data, 248k init)
Calibrating delay using timer specific routine.. 3991.40 BogoMIPS (lpj=19957004)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.458 MHz APIC timer.
softlockup thread 0 started up.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3986.60 BogoMIPS (lpj=19933040)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1 -> Core 0
AMD Opteron(tm) Processor 246 stepping 0a
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -8 cycles, maxerr 1095 cycles)
Brought up 2 CPUs
softlockup thread 1 started up.
time.c: Using HPET based timekeeping.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Root Bridge [PCIB] (0000:04)
PCI: Probing PCI hardware (bus 04)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)
hpet0: at MMIO 0xfec01000, IRQs 2, 8, 0
hpet0: 69ns tick, 3 32-bit timers
agpgart: Detected AMD 8151 AGP Bridge rev B3
agpgart: AGP aperture is 512M @ 0xc0000000
PCI-DMA: Disabling IOMMU.
pnp: 00:09: ioport range 0x680-0x6ff has been reserved
pnp: 00:09: ioport range 0x295-0x296 has been reserved
pnp: 00:09: ioport range 0xb78-0xb7f has been reserved
pnp: 00:09: ioport range 0xf78-0xf7f has been reserved
PCI: Bridge: 0000:00:06.0
IO window: 9000-afff
MEM window: ff100000-ff2fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: ff300000-ff3fffff
PREFETCH window: 9e900000-9e9fffff
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:04:01.0
IO window: c000-cfff
MEM window: ff500000-ff5fffff
PREFETCH window: 9eb00000-beafffff
IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com>
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1127144995.320:1): initialized
Total HugeTLB memory allocated, 0
SELinux: Registering netfilter hooks
Initializing Cryptographic API
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz
radeonfb: PLL min 20000 max 40000
radeonfb: Monitor 1 type DFP found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
Console: switching to colour frame buffer device 240x75
radeonfb (0000:05:00.0): ATI Radeon Yd
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
Using specific hotkey driver
ACPI: CPU0 (power states: C1[C1])
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: CPU1 (power states: C1[C1])
Real Time Clock Driver v1.12
hpet_acpi_add: no address or irqs in _CRS
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 controller doesn't have AUX irq; using default 12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:1849
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
0000000200000000
ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
0000000000000000 ffff81007ffda080
Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
<ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
<ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
<ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
<ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
<ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
<ffffffff8010ec1a>{child_rip+0}

Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 16:02 ` Petr Vandrovec
@ 2005-09-19 18:29 ` Andrew Morton
2005-09-19 18:51 ` Christoph Lameter
2005-09-19 18:56 ` Petr Vandrovec
0 siblings, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2005-09-19 18:29 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: linux-kernel, Christoph Lameter

Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
> Andrew Morton wrote:
> > Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
> >
> >>Andrew Morton wrote:
> >> > Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
> >> >
> >> >> so now once crashes on UP system were sorted out, I tried to
> >> >> put new kernel on my SMP host - and sorry to say, but it does not
> >> >> seem to work as advertised :-(
> >> >
> >> > .config (again), please.
> >>
> >> Any SMP with NUMA. One which I'm trying to debug now is attached.
> >> It is available at http://vana.vc.cvut.cz/config as well.
> >
> > I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.
>
> It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
> commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
> http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.
>
> ...
Well.
The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me - it
takes cachep->nodelists[node]->list_lock and then calls drain_alien_cache(),
which appears to take the same lock. But that's not the problem here.

The code in cache_reap() recalculates numa_node_id() multiple times, so if
the caller changes CPUs between those calls, this assertion will trigger.
However, it's running under keventd here, which is pinned to a single CPU.
Still, it would be useful if you could try putting preempt_disable()s in
cache_reap(), or change cache_reap() to evaluate numa_node_id() just once
and cache the result in a local variable.

I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
asking for preempt non-atomicity bugs.
^ permalink raw reply [flat|nested] 43+ messages in thread
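Andrew's second point is that cache_reap() looks up numa_node_id() once to pick the list whose lock it takes and again for the node it passes to drain_array_locked(). That can be modelled in user space. The sketch below is hypothetical: `model_numa_node_id()` stands in for numa_node_id(), and a counter simulates the task being moved to another node between two lookups. It contrasts re-reading the id with reading it once into a local variable, which is the fix Andrew suggests.

```c
#include <stdbool.h>

/* Hypothetical task state; the "scheduler" may move it between lookups. */
struct model_reaper {
	int node;          /* node the task is running on right now */
	int lookups;       /* node-id lookups performed so far */
	int migrate_after; /* migrate once this many lookups happened; -1 = never */
	int migrate_to;    /* node the task migrates to */
};

/* Stand-in for numa_node_id(): may observe a migration since the last call. */
static int model_numa_node_id(struct model_reaper *t)
{
	if (t->migrate_after >= 0 && t->lookups == t->migrate_after)
		t->node = t->migrate_to;
	t->lookups++;
	return t->node;
}

/* Re-reads the node id for each use. Returns true if the node whose list
 * is locked and the node handed to the drain step agree. */
static bool model_reap_rereading(struct model_reaper *t)
{
	int locked_node = model_numa_node_id(t);  /* nodelists[numa_node_id()] */
	int drained_node = model_numa_node_id(t); /* drain_array_locked(..., numa_node_id()) */

	return locked_node == drained_node;
}

/* Reads the node id once into a local variable and reuses it. */
static bool model_reap_caching(struct model_reaper *t)
{
	int node = model_numa_node_id(t);
	int locked_node = node;
	int drained_node = node;

	return locked_node == drained_node;
}
```

If the task migrates between the two lookups, `model_reap_rereading()` returns false and `model_reap_caching()` returns true. The cached copy can still be stale, because the task may no longer run on `node`. Caching only guarantees that both uses agree, which is why Andrew also mentions preempt_disable().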
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 18:29 ` Andrew Morton
@ 2005-09-19 18:51 ` Christoph Lameter
2005-09-19 19:28 ` Andrew Morton
2005-09-19 18:56 ` Petr Vandrovec
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-19 18:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: Petr Vandrovec, alokk, linux-kernel

On Mon, 19 Sep 2005, Andrew Morton wrote:

> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.

drain_array_cache_locked calls check_spinlock_acquired_node, which in
turn ensures that interrupts are off. So no move to a different processor
should be possible.

However, that is contradicted by __wake_up calling
drain_array_cache_locked. The process just woke up?

> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.

Accessing arrays indexed by node number works even if the process
continues to execute on another node.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 18:51 ` Christoph Lameter
@ 2005-09-19 19:28 ` Andrew Morton
2005-09-19 21:20 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2005-09-19 19:28 UTC (permalink / raw)
To: Christoph Lameter; +Cc: vandrove, alokk, linux-kernel

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Mon, 19 Sep 2005, Andrew Morton wrote:
>
> > Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> > it takes cachep->nodelists[node]->list_lock and then calls
> > drain_alien_cache() which appears to take the same lock. But that's not
> > the problem here.
> >
> > The code in cache_reap() recalculates numa_node_id() multiple times, so if
> > the caller changes CPUs then this assertion will trigger. However it's
> > running under keventd here, which is pinned to a single CPU. Still, it
> > would be useful if you could try putting preempt_disable()s in
> > cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> > once, and cache that in a local variable.
>
> drain_array_cache_locked calls check_spinlock_acquired_node which is in
> turn insuring that interrupts are off. So no move to a different processor
> should be possible.

	list_for_each(walk, &cache_chain) {
		kmem_cache_t *searchp;
		struct list_head* p;
		int tofree;
		struct slab *slabp;

		searchp = list_entry(walk, kmem_cache_t, next);

		if (searchp->flags & SLAB_NO_REAP)
			goto next;

		check_irq_on();

		l3 = searchp->nodelists[numa_node_id()];
		if (l3->alien)
			drain_alien_cache(searchp, l3);
->preempt here
		spin_lock_irq(&l3->list_lock);

		drain_array_locked(searchp, ac_data(searchp), 0,
				numa_node_id());
->oops, wrong node.

Still, this should all be pinned to one CPU, by happenstance.

> However, that is contradicted by __wake_up calling
> drain_array_cache_locked. The process just woke up?

Not sure what you mean here.

> > I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> > asking for preempt non-atomicity bugs.
>
> Accessing arrays indexed by node number even works if the process
> continues to be executed on another node.

That's a special case and the callers should be changed to use a new
raw_numa_node_id() in that case.

Code which calls numa_node_id() and then continues to use the result of
that in preemptible code is often buggy.

Code which reevaluates numa_node_id() in preemptible code and assumes that
it returned the same thing is even buggier (unless it happens to be CPU
pinned).

numa_node_id() is doing a bad thing and should be converted to use
smp_processor_id() so we can identify all the possibly-buggy callsites.
^ permalink raw reply [flat|nested] 43+ messages in thread
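Andrew's proposal rests on the difference between a checked and an unchecked CPU-id accessor. With CONFIG_DEBUG_PREEMPT, the kernel's smp_processor_id() complains when it is called from preemptible code, unless interrupts are off or the task is bound to a single CPU. raw_smp_processor_id() never complains. The following hypothetical user-space model roughly captures those conditions. The names are made up, the affinity mask is reduced to a CPU count, and the real debug check prints a report instead of bumping a counter.

```c
#include <stdbool.h>

/* Hypothetical execution context of a task. */
struct model_ctx {
	int cpu;            /* CPU currently running the task */
	int preempt_count;  /* > 0 while preemption is disabled */
	bool irqs_disabled; /* true while local interrupts are off */
	int allowed_cpus;   /* size of the task's affinity mask (1 = pinned) */
};

/* Unchecked accessor, in the spirit of raw_smp_processor_id(). */
static int model_raw_cpu_id(const struct model_ctx *c)
{
	return c->cpu;
}

/*
 * Checked accessor: increments *warnings when the caller could be
 * migrated right after reading the id, i.e. preemption and interrupts
 * are enabled and the task is not pinned to a single CPU.
 */
static int model_checked_cpu_id(const struct model_ctx *c, int *warnings)
{
	bool pinned = (c->allowed_cpus == 1);

	if (c->preempt_count == 0 && !c->irqs_disabled && !pinned)
		(*warnings)++;
	return c->cpu;
}
```

A keventd-style pinned thread does not trigger the checked accessor, which matches Andrew's "unless it happens to be CPU pinned" caveat. An unpinned preemptible caller does get flagged, and that is how building numa_node_id() on the checked accessor would surface the possibly-buggy call sites.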
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 19:28 ` Andrew Morton
@ 2005-09-19 21:20 ` Christoph Lameter
2005-09-20 5:16 ` Andrew Morton
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-19 21:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: vandrove, alokk, linux-kernel, manfred

On Mon, 19 Sep 2005, Andrew Morton wrote:

> 	list_for_each(walk, &cache_chain) {
> 		kmem_cache_t *searchp;
> 		struct list_head* p;
> 		int tofree;
> 		struct slab *slabp;
>
> 		searchp = list_entry(walk, kmem_cache_t, next);
>
> 		if (searchp->flags & SLAB_NO_REAP)
> 			goto next;
>
> 		check_irq_on();
>
> 		l3 = searchp->nodelists[numa_node_id()];
> 		if (l3->alien)
> 			drain_alien_cache(searchp, l3);
> ->preempt here
> 		spin_lock_irq(&l3->list_lock);
>
> 		drain_array_locked(searchp, ac_data(searchp), 0,
> 				numa_node_id());
> ->oops, wrong node.

This is called from keventd, which exists per processor. Hmmm... This looks
as if it can change processors after all, but the slab allocator depends on
it running on the right processor. So does the page allocator. Sigh. What
is the point of having per-processor workqueues if they do not stay on
the assigned processor?

The fast fix for this case is to get the node number once and then use it
consistently. But we really need to audit the slab and page allocators for
additional cases like this, or disable preemption and check for the right
processor in cache_reap().

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2005-09-19 14:10:33.489800899 -0700
+++ linux-2.6/mm/slab.c	2005-09-19 14:10:44.555105862 -0700
@@ -3262,6 +3262,7 @@
 {
 	struct list_head *walk;
 	struct kmem_list3 *l3;
+	int node = numa_node_id();
 
 	if (down_trylock(&cache_chain_sem)) {
 		/* Give up. Setup the next iteration. */
@@ -3282,13 +3283,13 @@
 
 		check_irq_on();
 
-		l3 = searchp->nodelists[numa_node_id()];
+		l3 = searchp->nodelists[node];
 		if (l3->alien)
 			drain_alien_cache(searchp, l3);
 		spin_lock_irq(&l3->list_lock);
 
 		drain_array_locked(searchp, ac_data(searchp), 0,
-				numa_node_id());
+				node);
 
 		if (time_after(l3->next_reap, jiffies))
 			goto next_unlock;
@@ -3297,7 +3298,7 @@
 
 		if (l3->shared)
 			drain_array_locked(searchp, l3->shared, 0,
-				numa_node_id());
+				node);
 
 		if (l3->free_touched) {
 			l3->free_touched = 0;
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 21:20 ` Christoph Lameter
@ 2005-09-20 5:16 ` Andrew Morton
2005-09-20 8:34 ` Alok Kataria
2005-09-20 13:58 ` Petr Vandrovec
0 siblings, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2005-09-20 5:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: vandrove, alokk, linux-kernel, manfred

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Mon, 19 Sep 2005, Andrew Morton wrote:
>
> > 	list_for_each(walk, &cache_chain) {
> > 		kmem_cache_t *searchp;
> > 		struct list_head* p;
> > 		int tofree;
> > 		struct slab *slabp;
> >
> > 		searchp = list_entry(walk, kmem_cache_t, next);
> >
> > 		if (searchp->flags & SLAB_NO_REAP)
> > 			goto next;
> >
> > 		check_irq_on();
> >
> > 		l3 = searchp->nodelists[numa_node_id()];
> > 		if (l3->alien)
> > 			drain_alien_cache(searchp, l3);
> > ->preempt here
> > 		spin_lock_irq(&l3->list_lock);
> >
> > 		drain_array_locked(searchp, ac_data(searchp), 0,
> > 				numa_node_id());
> > ->oops, wrong node.
>
> This is called from keventd which exists per processor. Hmmm... This looks
> as if it can change processors after all

Well no, it would be a big bug if a keventd thread were to change CPUs.

It's OK to rely upon the pinnedness of keventd I guess - a comment would be
nice.

> but the slab allocator depends on
> it running on the right processor. So does the page allocator. sigh. What
> is the point of having per processor workqueues if they do not stay on
> the assigned processor?

They do. I don't believe that preemption is the source of this BUG.
(Petr, does CONFIG_PREEMPT=n fix it?)

> The fast fix for this case is to get the node number once and then use it
> consistently.

If one is writing preempt-safe code then one should disable preemption
before copying the current CPU number into a local variable.

> But we really need to audit the slab and page allocator for
> additional cases like this or disable preempt and check for the right
> processor in cache_reap().
numa_node_id() must use smp_processor_id(), not raw_smp_processor_id(). Then all the runtime squawks need to be audited and fixed, or switched to (new) raw_numa_node_id() if is is verified that a CPU/node switch at any time is OK. ^ permalink raw reply [flat|nested] 43+ messages in thread
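Christoph's "fast fix" and Andrew's preempt-safety rule both come down to a single invariant in cache_reap(): the node whose list_lock is taken must be the same value as the node passed to drain_array_locked(). The userspace sketch below models that invariant under stated assumptions. fake_numa_node_id(), simulated_node and struct reap_result are hypothetical stand-ins, not kernel APIs. The "migration" between calls is also simulated, and Andrew argues that a pinned keventd never actually migrates. In real kernel code the single read would also be done with preemption or interrupts disabled.

```c
#include <assert.h>

/* Hypothetical stand-in for numa_node_id() on a two-node machine:
 * every call simulates the caller having moved to the other node. */
static int simulated_node = 0;

static int fake_numa_node_id(void)
{
	int node = simulated_node;
	simulated_node ^= 1;	/* pretend we migrate before the next call */
	return node;
}

/* Which node's list_lock was taken and which node the drain was told
 * to use; if they differ, check_spinlock_acquired_node() would fire. */
struct reap_result {
	int locked_node;
	int drained_node;
};

/* Racy shape: the node id is looked up twice. */
static struct reap_result reap_racy(void)
{
	struct reap_result r;

	r.locked_node = fake_numa_node_id();	/* l3 = nodelists[numa_node_id()] */
	r.drained_node = fake_numa_node_id();	/* drain_array_locked(..., numa_node_id()) */
	return r;
}

/* Fixed shape: take one snapshot and reuse it for both uses. */
static struct reap_result reap_fixed(void)
{
	struct reap_result r;
	int node = fake_numa_node_id();

	r.locked_node = node;
	r.drained_node = node;
	return r;
}
```

With simulated_node reset to 0, reap_racy() reports lock node 0 but drain node 1. reap_fixed() always reports the same node for both uses.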
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-20 5:16 ` Andrew Morton
@ 2005-09-20 8:34 ` Alok Kataria
2005-09-20 13:58 ` Petr Vandrovec
1 sibling, 0 replies; 43+ messages in thread
From: Alok Kataria @ 2005-09-20 8:34 UTC
To: Andrew Morton
Cc: Christoph Lameter, vandrove, linux-kernel, manfred, Ravikiran G Thirumalai

[-- Attachment #1: Type: text/plain, Size: 2716 bytes --]

Hi,
Attached is a patch which stores the numa_node_id in a local variable
after disabling interrupts, in the cache_reap code path.
I was not able to reproduce the bug that Petr was talking about, but if
the cache reap threads do schedule across CPUs, which might be the
problem here, then this should fix it.

Andrew, I also have a patch which fixes the CPU_DOWN code path, which I
will send you later.

Thanks & Regards,
Alok.

On Tue, 2005-09-20 at 10:46, Andrew Morton wrote:
> Christoph Lameter <clameter@engr.sgi.com> wrote:
> >
> > On Mon, 19 Sep 2005, Andrew Morton wrote:
> >
> > >  list_for_each(walk, &cache_chain) {
> > >          kmem_cache_t *searchp;
> > >          struct list_head* p;
> > >          int tofree;
> > >          struct slab *slabp;
> > >
> > >          searchp = list_entry(walk, kmem_cache_t, next);
> > >
> > >          if (searchp->flags & SLAB_NO_REAP)
> > >                  goto next;
> > >
> > >          check_irq_on();
> > >
> > >          l3 = searchp->nodelists[numa_node_id()];
> > >          if (l3->alien)
> > >                  drain_alien_cache(searchp, l3);
> > ->preempt here
> > >          spin_lock_irq(&l3->list_lock);
> > >
> > >          drain_array_locked(searchp, ac_data(searchp), 0,
> > >                          numa_node_id());
> > ->oops, wrong node.
> >
> > This is called from keventd which exists per processor. Hmmm... This looks
> > as if it can change processors after all
>
> Well no, it would be a big bug if a keventd thread were to change CPUs.
>
> It's OK to rely upon the pinnedness of keventd I guess - a comment would be
> nice.
>
> > but the slab allocator depends on
> > it running on the right processor. So does the page allocator. sigh. What
> > is the point of having per processor workqueues if they do not stay on
> > the assigned processor?
>
> They do.  I don't believe that preemption is the source of this BUG.
> (Petr, does CONFIG_PREEMPT=n fix it?)
>
> > The fast fix for this case is to get the node number once and then use it
> > consistently.
>
> If one is writing preempt-safe code then one should disable preemption
> before copying the current CPU number into a local variable.
>
> > But we really need to audit the slab and page allocator for
> > additional cases like this or disable preempt and check for the right
> > processor in cache_reap().
>
> numa_node_id() must use smp_processor_id(), not raw_smp_processor_id().
> Then all the runtime squawks need to be audited and fixed, or switched to
> (new) raw_numa_node_id() if is is verified that a CPU/node switch at any
> time is OK.
>
-- 
There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.
[-- Attachment #2: cache_reap_nodeid.patch --]
[-- Type: text/x-patch, Size: 1442 bytes --]

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>

Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c	2005-09-13 20:56:33.040284000 +0530
+++ linux-2.6.13/mm/slab.c	2005-09-20 13:22:08.328464250 +0530
@@ -3273,7 +3273,7 @@
 	list_for_each(walk, &cache_chain) {
 		kmem_cache_t *searchp;
 		struct list_head* p;
-		int tofree;
+		int tofree, nodeid;
 		struct slab *slabp;
 
 		searchp = list_entry(walk, kmem_cache_t, next);
@@ -3283,13 +3283,19 @@
 
 		check_irq_on();
 
-		l3 = searchp->nodelists[numa_node_id()];
+		nodeid = numa_node_id();
+		l3 = searchp->nodelists[nodeid];
 		if (l3->alien)
 			drain_alien_cache(searchp, l3);
-		spin_lock_irq(&l3->list_lock);
+
+		local_irq_disable();
+		nodeid = numa_node_id();
+		l3 = searchp->nodelists[nodeid];
+
+		spin_lock(&l3->list_lock);
 
 		drain_array_locked(searchp, ac_data(searchp), 0,
-				numa_node_id());
+				nodeid);
 
 		if (time_after(l3->next_reap, jiffies))
 			goto next_unlock;
@@ -3298,7 +3304,7 @@
 
 		if (l3->shared)
 			drain_array_locked(searchp, l3->shared, 0,
-				numa_node_id());
+				nodeid);
 
 		if (l3->free_touched) {
 			l3->free_touched = 0;
@@ -3327,7 +3333,8 @@
 			spin_lock_irq(&l3->list_lock);
 		} while(--tofree > 0);
 next_unlock:
-		spin_unlock_irq(&l3->list_lock);
+		spin_unlock(&l3->list_lock);
+		local_irq_enable();
 next:
 		cond_resched();
 	}
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-20 5:16 ` Andrew Morton
2005-09-20 8:34 ` Alok Kataria
@ 2005-09-20 13:58 ` Petr Vandrovec
2005-09-21 1:03 ` Christoph Lameter
2005-09-28 21:02 ` Ravikiran G Thirumalai
1 sibling, 2 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-20 13:58 UTC
To: Andrew Morton; +Cc: Christoph Lameter, alokk, linux-kernel, manfred

Andrew Morton wrote:
> Christoph Lameter <clameter@engr.sgi.com> wrote:
>
>>On Mon, 19 Sep 2005, Andrew Morton wrote:
>>
>>
>>> list_for_each(walk, &cache_chain) {
>>>         kmem_cache_t *searchp;
>>>         struct list_head* p;
>>>         int tofree;
>>>         struct slab *slabp;
>>>
>>>         searchp = list_entry(walk, kmem_cache_t, next);
>>>
>>>         if (searchp->flags & SLAB_NO_REAP)
>>>                 goto next;
>>>
>>>         check_irq_on();
>>>
>>>         l3 = searchp->nodelists[numa_node_id()];
>>>         if (l3->alien)
>>>                 drain_alien_cache(searchp, l3);
>>>->preempt here
>>>         spin_lock_irq(&l3->list_lock);
>>>
>>>         drain_array_locked(searchp, ac_data(searchp), 0,
>>>                         numa_node_id());
>>>->oops, wrong node.
>>
>>This is called from keventd which exists per processor. Hmmm... This looks
>>as if it can change processors after all
>
>
> Well no, it would be a big bug if a keventd thread were to change CPUs.
>
> It's OK to rely upon the pinnedness of keventd I guess - a comment would be
> nice.
>
>
>>but the slab allocator depends on
>>it running on the right processor. So does the page allocator. sigh. What
>>is the point of having per processor workqueues if they do not stay on
>>the assigned processor?
>
>
> They do.  I don't believe that preemption is the source of this BUG.
> (Petr, does CONFIG_PREEMPT=n fix it?)

No, it does not. I've even added printks here and there to show the node
number, and everything works as it should. Maybe there are some problems
with numa_node_id() and migrating between processors when memory gets
released; I do not know. The only thing I know is that if I add the
WARN_ON below to free_block(), it triggers...

@free_block
        slabp = GET_PAGE_SLAB(virt_to_page(objp));
        nodeid = slabp->nodeid;
+       WARN_ON(nodeid != numa_node_id());    <<<<<
        l3 = cachep->nodelists[nodeid];
        list_del(&slabp->list);
        objnr = (objp - slabp->s_mem) / cachep->objsize;
        check_spinlock_acquired_node(cachep, nodeid);
        check_slabp(cachep, slabp);
        ...

...saying that keventd/0 tries to operate on a slab belonging to node #1
while holding the lock for the cachep list belonging to node #0. Because
of this, check_spinlock_acquired_node(cachep, nodeid) fails
(check_spinlock_acquired_node(cachep, 0) would succeed).
Petr
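Petr's WARN_ON shows the mismatch directly. free_block() takes the target node from the slab itself (slabp->nodeid == 1), while the caller holds only node 0's list_lock, so check_spinlock_acquired_node() fires. The userspace sketch below models just that check. struct mini_slab, struct mini_cache and both helpers are hypothetical simplifications, not the kernel's types. A boolean per node stands in for "holds nodelists[n]->list_lock".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2

/* Hypothetical miniature slab: only the recorded node matters here. */
struct mini_slab {
	int nodeid;
};

/* Hypothetical miniature cache: which per-node list_lock is "held". */
struct mini_cache {
	bool list_lock_held[MAX_NODES];
};

/* Models the idea behind check_spinlock_acquired_node(): the caller
 * must hold the list_lock of the node whose lists are being modified. */
static bool lock_held_for(const struct mini_cache *c, int node)
{
	return node >= 0 && node < MAX_NODES && c->list_lock_held[node];
}

/* Pre-patch free_block() shape: the node comes from the slab, not from
 * the caller. Returns false where the kernel's debug check would BUG(). */
static bool old_free_block_lock_ok(const struct mini_cache *c,
				   const struct mini_slab *slabp)
{
	int nodeid = slabp->nodeid;

	return lock_held_for(c, nodeid);
}
```

With node 0's lock held and a slab recorded as node 1, old_free_block_lock_ok() returns false. This matches the reported failure of check_spinlock_acquired_node(cachep, nodeid) while the check for node 0 would pass.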
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-20 13:58 ` Petr Vandrovec
@ 2005-09-21 1:03 ` Christoph Lameter
2005-09-21 1:22 ` Petr Vandrovec
2005-09-28 21:02 ` Ravikiran G Thirumalai
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-21 1:03 UTC
To: Petr Vandrovec; +Cc: Andrew Morton, alokk, linux-kernel, manfred

On Tue, 20 Sep 2005, Petr Vandrovec wrote:

> slab belonging to node#1, while having acquired lock for cachep belonging
> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
> (check_spinlock_acquired_node(cachep, 0) would succeed).

Hmmm. If a node runs out of memory then pages from another node may end up
on the slab list of a node. But it seems that free_block cannot handle
that properly.

How are you producing the problem?

Could you try the following patch:

---

The NUMA slab allocator may allocate pages from foreign nodes onto the
lists for a particular node if a node runs out of memory. Inspecting the
slab->nodeid field will not reflect that the page is now in use for the
slabs of another node.

This patch fixes that issue by adding a node parameter to free_block() so
that the caller can indicate which node currently uses a slab.

It also removes the check for the current node from kmem_cache_alloc_node(),
since the process may later shift to another node, which could lead to an
allocation on a node other than the intended one.
Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.14-rc1/mm/slab.c
===================================================================
--- linux-2.6.14-rc1.orig/mm/slab.c	2005-09-21 00:09:05.000000000 +0000
+++ linux-2.6.14-rc1/mm/slab.c	2005-09-21 00:48:12.000000000 +0000
@@ -639,7 +639,7 @@ static enum {
 
 static DEFINE_PER_CPU(struct work_struct, reap_work);
 
-static void free_block(kmem_cache_t* cachep, void** objpp, int len);
+static void free_block(kmem_cache_t* cachep, void** objpp, int len, int node);
 static void enable_cpucache (kmem_cache_t *cachep);
 static void cache_reap (void *unused);
 static int __node_shrink(kmem_cache_t *cachep, int node);
@@ -804,7 +804,7 @@ static inline void __drain_alien_cache(k
 
 	if (ac->avail) {
 		spin_lock(&rl3->list_lock);
-		free_block(cachep, ac->entry, ac->avail);
+		free_block(cachep, ac->entry, ac->avail, node);
 		ac->avail = 0;
 		spin_unlock(&rl3->list_lock);
 	}
@@ -925,7 +925,7 @@ static int __devinit cpuup_callback(stru
 			/* Free limit for this kmem_list3 */
 			l3->free_limit -= cachep->batchcount;
 			if (nc)
-				free_block(cachep, nc->entry, nc->avail);
+				free_block(cachep, nc->entry, nc->avail, node);
 
 			if (!cpus_empty(mask)) {
 				spin_unlock(&l3->list_lock);
@@ -934,7 +934,7 @@ static int __devinit cpuup_callback(stru
 
 			if (l3->shared) {
 				free_block(cachep, l3->shared->entry,
-					l3->shared->avail);
+					l3->shared->avail, node);
 				kfree(l3->shared);
 				l3->shared = NULL;
 			}
@@ -1882,12 +1882,13 @@ static void do_drain(void *arg)
 {
 	kmem_cache_t *cachep = (kmem_cache_t*)arg;
 	struct array_cache *ac;
+	int node = numa_node_id();
 
 	check_irq_off();
 	ac = ac_data(cachep);
-	spin_lock(&cachep->nodelists[numa_node_id()]->list_lock);
-	free_block(cachep, ac->entry, ac->avail);
-	spin_unlock(&cachep->nodelists[numa_node_id()]->list_lock);
+	spin_lock(&cachep->nodelists[node]->list_lock);
+	free_block(cachep, ac->entry, ac->avail, node);
+	spin_unlock(&cachep->nodelists[node]->list_lock);
 	ac->avail = 0;
 }
 
@@ -2608,7 +2609,7 @@ done:
 /*
  * Caller needs to acquire correct kmem_list's list_lock
  */
-static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects)
+static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects, int node)
 {
 	int i;
 	struct kmem_list3 *l3;
@@ -2617,14 +2618,12 @@ static void free_block(kmem_cache_t *cac
 		void *objp = objpp[i];
 		struct slab *slabp;
 		unsigned int objnr;
-		int nodeid = 0;
 
 		slabp = GET_PAGE_SLAB(virt_to_page(objp));
-		nodeid = slabp->nodeid;
-		l3 = cachep->nodelists[nodeid];
+		l3 = cachep->nodelists[node];
 		list_del(&slabp->list);
 		objnr = (objp - slabp->s_mem) / cachep->objsize;
-		check_spinlock_acquired_node(cachep, nodeid);
+		check_spinlock_acquired_node(cachep, node);
 		check_slabp(cachep, slabp);
 
 
@@ -2664,13 +2663,14 @@ static void cache_flusharray(kmem_cache_
 {
 	int batchcount;
 	struct kmem_list3 *l3;
+	int node = numa_node_id();
 
 	batchcount = ac->batchcount;
 #if DEBUG
 	BUG_ON(!batchcount || batchcount > ac->avail);
 #endif
 	check_irq_off();
-	l3 = cachep->nodelists[numa_node_id()];
+	l3 = cachep->nodelists[node];
 	spin_lock(&l3->list_lock);
 	if (l3->shared) {
 		struct array_cache *shared_array = l3->shared;
@@ -2686,7 +2686,7 @@ static void cache_flusharray(kmem_cache_
 		}
 	}
 
-	free_block(cachep, ac->entry, batchcount);
+	free_block(cachep, ac->entry, batchcount, node);
 free_done:
 #if STATS
 	{
@@ -2751,7 +2751,7 @@ static inline void __cache_free(kmem_cac
 			} else {
 				spin_lock(&(cachep->nodelists[nodeid])->
 						list_lock);
-				free_block(cachep, &objp, 1);
+				free_block(cachep, &objp, 1, nodeid);
 				spin_unlock(&(cachep->nodelists[nodeid])->
 						list_lock);
 			}
@@ -2844,7 +2844,7 @@ void *kmem_cache_alloc_node(kmem_cache_t
 	unsigned long save_flags;
 	void *ptr;
 
-	if (nodeid == numa_node_id() || nodeid == -1)
+	if (nodeid == -1)
 		return __cache_alloc(cachep, flags);
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
@@ -3079,7 +3079,7 @@ static int alloc_kmemlist(kmem_cache_t *
 
 			if ((nc = cachep->nodelists[node]->shared))
 				free_block(cachep, nc->entry,
-							nc->avail);
+							nc->avail, node);
 
 			l3->shared = new;
 			if (!cachep->nodelists[node]->alien) {
@@ -3160,7 +3160,7 @@ static int do_tune_cpucache(kmem_cache_t
 		if (!ccold)
 			continue;
 		spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
-		free_block(cachep, ccold->entry, ccold->avail);
+		free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
 		spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
 		kfree(ccold);
 	}
@@ -3240,7 +3240,7 @@ static void drain_array_locked(kmem_cach
 		if (tofree > ac->avail) {
 			tofree = (ac->avail+1)/2;
 		}
-		free_block(cachep, ac->entry, tofree);
+		free_block(cachep, ac->entry, tofree, node);
 		ac->avail -= tofree;
 		memmove(ac->entry, &(ac->entry[tofree]),
 					sizeof(void*)*ac->avail);
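Christoph's hypothesis can be modelled the same way: a cache grown for node 0 receives a page from node 1 because node 0 is out of pages. The slab then sits on node 0's lists while its recorded node says 1. In the sketch below, every name is hypothetical. free_target_old() stands for reading slabp->nodeid, modelled here as the page's node per the hypothesis. free_target_new() stands for the patched free_block(..., node) signature, where the caller supplies the node. This illustrates the hypothesis only, not a diagnosis: Petr's next reply points out that node 0 has 1GB of memory and should never need to borrow.

```c
#include <assert.h>

#define MAX_NODES 2

/* Hypothetical slab record distinguishing the two notions of "node". */
struct mini_slab {
	int page_node;	/* node the backing page physically came from */
	int list_node;	/* node whose slab lists this slab is placed on */
};

/* Hypothetical page allocator with fallback: use the preferred node if
 * it has free pages, otherwise the first node that does (-1 if none). */
static int alloc_page_node(int preferred, const int free_pages[MAX_NODES])
{
	if (free_pages[preferred] > 0)
		return preferred;
	for (int n = 0; n < MAX_NODES; n++)
		if (free_pages[n] > 0)
			return n;
	return -1;
}

/* Grow the cache on behalf of 'node': the slab goes on that node's
 * lists even when its page was borrowed from another node. */
static struct mini_slab grow_for_node(int node, const int free_pages[MAX_NODES])
{
	struct mini_slab s;

	s.page_node = alloc_page_node(node, free_pages);
	s.list_node = node;
	return s;
}

/* Pre-patch shape: the node whose lists are touched comes from the slab. */
static int free_target_old(const struct mini_slab *s)
{
	return s->page_node;
}

/* Patched shape: the caller, which holds that node's list_lock, says
 * which node's lists to touch. */
static int free_target_new(int caller_node)
{
	return caller_node;
}
```

With node 0 out of pages, grow_for_node(0, ...) yields a slab on node 0's list backed by a node 1 page. The old lookup then picks node 1's lists, while the patched one stays on node 0, whose lock the caller holds.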
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-21 1:03 ` Christoph Lameter
@ 2005-09-21 1:22 ` Petr Vandrovec
2005-09-21 15:59 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-21 1:22 UTC
To: Christoph Lameter; +Cc: Andrew Morton, alokk, linux-kernel, manfred

Christoph Lameter wrote:
> On Tue, 20 Sep 2005, Petr Vandrovec wrote:
>
>
>>slab belonging to node#1, while having acquired lock for cachep belonging
>>to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
>>(check_spinlock_acquired_node(cachep, 0) would succeed).
>
>
> Hmmm. If a node runs out of memory then pages from another node may end up
> on the slab list of a node. But it seems that free_block cannot handle
> that properly.
>
> How are you producing the problem?

Simple... I just boot any kernel after 2.6.13, and it dies in front of me.
Currently I'm using the config below, which I boot with 'rootdelay=60' so
that the panic in keventd happens before the panic caused by the missing
root filesystem. No ACPI. Nothing. 100% reproducible. Maybe I should
enable the embedded options and remove all the other device drivers still
present in the kernel.

Below the config is the dmesg from 2.6.13, which has no problems coming
up. Maybe you'll find some clue there, but I see none. Node #0 has 1GB of
memory, so it should have no need to borrow blocks from node #1 when this
kernel is able to boot in 16MB of memory...
Petr # # Automatically generated make config: don't edit # Linux kernel version: 2.6.14-rc1-6c07 # Wed Sep 21 03:03:20 2005 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y # # Code maturity level options # # CONFIG_EXPERIMENTAL is not set CONFIG_CLEAN_COMPILE=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_SHOW_LOGO is not set # # General setup # CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set # CONFIG_SWAP is not set # CONFIG_SYSVIPC is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_SYSCTL is not set # CONFIG_HOTPLUG is not set # CONFIG_IKCONFIG is not set # CONFIG_CPUSETS is not set CONFIG_INITRAMFS_SOURCE="" # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_CC_ALIGN_FUNCTIONS=0 CONFIG_CC_ALIGN_LABELS=0 CONFIG_CC_ALIGN_LOOPS=0 CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # # Loadable module support # # CONFIG_MODULES is not set # # Processor type and features # CONFIG_MK8=y # CONFIG_MPSC is not set # CONFIG_GENERIC_CPU is not set CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y # CONFIG_MICROCODE is not set # CONFIG_X86_MSR is not set # CONFIG_X86_CPUID is not set CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y # CONFIG_MTRR is not set CONFIG_SMP=y # CONFIG_SCHED_SMT is not set CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set # CONFIG_PREEMPT_BKL is not set CONFIG_K8_NUMA=y # CONFIG_NUMA_EMU is not set CONFIG_ARCH_DISCONTIGMEM_ENABLE=y CONFIG_NUMA=y CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y CONFIG_ARCH_SPARSEMEM_ENABLE=y 
CONFIG_DISCONTIGMEM=y CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_NEED_MULTIPLE_NODES=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y CONFIG_NR_CPUS=8 CONFIG_HPET_TIMER=y CONFIG_DUMMY_IOMMU=y CONFIG_X86_MCE=y # CONFIG_X86_MCE_INTEL is not set CONFIG_PHYSICAL_START=0x100000 # CONFIG_SECCOMP is not set CONFIG_HZ_100=y # CONFIG_HZ_250 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=100 CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_ISA_DMA_API=y CONFIG_GENERIC_PENDING_IRQ=y # # Power management options # # CONFIG_PM is not set # # ACPI (Advanced Configuration and Power Interface) Support # # CONFIG_ACPI is not set # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI etc.) # # CONFIG_PCI is not set # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PCI Hotplug Support # # # Executable file formats / Emulations # # CONFIG_BINFMT_ELF is not set # CONFIG_BINFMT_MISC is not set # CONFIG_IA32_EMULATION is not set # # Networking # # CONFIG_NET is not set # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y # CONFIG_PREVENT_FIRMWARE_BUILD is not set # CONFIG_FW_LOADER is not set # CONFIG_DEBUG_DRIVER is not set # # Connector - unified userspace <-> kernelspace linker # # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # # # Block devices # # CONFIG_BLK_DEV_FD is not set # CONFIG_BLK_DEV_COW_COMMON is not set # CONFIG_BLK_DEV_LOOP is not set # CONFIG_BLK_DEV_RAM is not set CONFIG_BLK_DEV_RAM_COUNT=16 # CONFIG_LBD is not set # CONFIG_CDROM_PKTCDVD is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set # CONFIG_IOSCHED_DEADLINE is not set # CONFIG_IOSCHED_CFQ is not set # # ATA/ATAPI/MFM/RLL support # # CONFIG_IDE is not set # # SCSI device support # # CONFIG_RAID_ATTRS is not set # CONFIG_SCSI is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set 
# # Fusion MPT device support # # CONFIG_FUSION is not set # # IEEE 1394 (FireWire) support # # # I2O device support # # # Network device support # # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set # # ISDN subsystem # # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y # CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set # CONFIG_INPUT_EVDEV is not set # CONFIG_INPUT_EVBUG is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set # CONFIG_INPUT_MOUSE is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Hardware I/O ports # CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # CONFIG_GAMEPORT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y # CONFIG_CONSOLE_KNOWS_9B is not set CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # # CONFIG_SERIAL_8250 is not set # # Non-8250 serial port support # CONFIG_UNIX98_PTYS=y # CONFIG_LEGACY_PTYS is not set # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_NVRAM is not set # CONFIG_RTC is not set # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # # Ftape, the floppy tape device driver # # CONFIG_AGP is not set # CONFIG_MWAVE is not set # CONFIG_RAW_DRIVER is not set # CONFIG_HANGCHECK_TIMER is not set # # TPM devices # # # I2C support # # CONFIG_I2C is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Hardware Monitoring 
support # # CONFIG_HWMON is not set # CONFIG_HWMON_VID is not set # # Misc devices # # # Multimedia Capabilities Port drivers # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # # Graphics support # # CONFIG_FB is not set # CONFIG_VIDEO_SELECT is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y # # Sound # # CONFIG_SOUND is not set # # USB support # # CONFIG_USB_ARCH_HAS_HCD is not set # CONFIG_USB_ARCH_HAS_OHCI is not set # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # # SN Devices # # # Firmware Drivers # # CONFIG_DELL_RBU is not set # CONFIG_DCDBAS is not set # # File systems # # CONFIG_EXT2_FS is not set # CONFIG_EXT3_FS is not set # CONFIG_JBD is not set # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set # CONFIG_FS_POSIX_ACL is not set # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_INOTIFY is not set # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y # CONFIG_AUTOFS_FS is not set # CONFIG_AUTOFS4_FS is not set # CONFIG_FUSE_FS is not set # # CD-ROM/DVD Filesystems # # CONFIG_ISO9660_FS is not set # CONFIG_UDF_FS is not set # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_TMPFS is not set # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # CONFIG_RELAYFS_FS is not set # # Miscellaneous filesystems # # CONFIG_HFSPLUS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y # # Native Language Support # # CONFIG_NLS is not set # # Kernel hacking # # 
CONFIG_PRINTK_TIME is not set CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y CONFIG_LOG_BUF_SHIFT=17 CONFIG_DETECT_SOFTLOCKUP=y CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_INFO=y CONFIG_DEBUG_FS=y # CONFIG_FRAME_POINTER is not set CONFIG_INIT_DEBUG=y # CONFIG_KPROBES is not set # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # # CONFIG_CRYPTO is not set # # Hardware crypto devices # # # Library routines # # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set Bootdata ok (command line is BOOT_IMAGE=2.6.13-64 ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose) Linux version 2.6.13 (root@vana) (gcc version 3.3.3 (Debian 20040401)) #2 SMP Tue Aug 30 02:41:20 CEST 2005 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000007fff0000 (usable) BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data) BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS) BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved) ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f6e10 ACPI: XSDT (v001 A M I OEMXSDT 0x06000514 MSFT 0x00000097) @ 0x000000007fff0100 ACPI: FADT (v001 A M I OEMFACP 0x06000514 MSFT 0x00000097) @ 0x000000007fff0281 ACPI: MADT (v001 A M I OEMAPIC 0x06000514 MSFT 0x00000097) @ 0x000000007fff0380 ACPI: OEMB (v001 A M I OEMBIOS 0x06000514 MSFT 0x00000097) @ 0x000000007ffff040 ACPI: SRAT (v001 A M I OEMSRAT 0x06000514 MSFT 0x00000097) @ 0x000000007fff4260 ACPI: HPET (v001 A M I OEMHPET 0x06000514 MSFT 0x00000097) @ 0x000000007fff4370 ACPI: ASF! 
(v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x000000007fff43b0 ACPI: DSDT (v001 0AAAA 0AAAA001 0x00000001 INTL 0x02002026) @ 0x0000000000000000 SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0 SRAT: PXM 1 -> APIC 1 -> CPU 1 -> Node 1 SRAT: Node 0 PXM 0 100000-3fffffff SRAT: Node 1 PXM 1 40000000-7fffffff SRAT: Node 0 PXM 0 0-3fffffff Using 24 for the hash shift. Max adder is 7fffffff Bootmem setup node 0 0000000000000000-000000003fffffff Bootmem setup node 1 0000000040000000-000000007ffeffff On node 0 totalpages: 262046 DMA zone: 3999 pages, LIFO batch:1 Normal zone: 258047 pages, LIFO batch:31 HighMem zone: 0 pages, LIFO batch:1 On node 1 totalpages: 262127 DMA zone: 0 pages, LIFO batch:1 Normal zone: 262127 pages, LIFO batch:31 HighMem zone: 0 pages, LIFO batch:1 ACPI: PM-Timer IO Port: 0x5008 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:5 APIC version 16 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 15:5 APIC version 16 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23 ACPI: IOAPIC (id[0x03] address[0xff4ff000] gsi_base[24]) IOAPIC[1]: apic_id 3, version 17, address 0xff4ff000, GSI 24-27 ACPI: IOAPIC (id[0x04] address[0xff4fe000] gsi_base[28]) IOAPIC[2]: apic_id 4, version 17, address 0xff4fe000, GSI 28-31 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Setting APIC routing to flat ACPI: HPET id: 0x102282a0 base: 0xfec01000 Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 80000000 (gap: 80000000:7f780000) Checking aperture... 
CPU 0: aperture @ c0000000 size 512 MB CPU 1: aperture @ c0000000 size 512 MB Built 2 zonelists Kernel command line: BOOT_IMAGE=2.6.13-64 ro root=801 ramdisk=0 console=ttyS0,115200 console=tty0 nmi_watchdog=2 psmouse_noext=1 verbose Parameter psmouse_noext is obsolete, ignored Initializing CPU#0 PID hash table entries: 4096 (order: 12, 131072 bytes) time.c: Using 14.318180 MHz HPET timer. time.c: Detected 1991.621 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Memory: 2055660k/2097088k available (2668k kernel code, 0k reserved, 2318k data, 232k init) Calibrating delay using timer specific routine.. 3988.04 BogoMIPS (lpj=19940224) Security Framework v1.0.0 initialized SELinux: Initializing. SELinux: Starting in permissive mode Mount-cache hash table entries: 256 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 0(1) -> Node 0 -> Core 0 mtrr: v2.0 (20020519) Using local APIC timer interrupts. Detected 12.447 MHz APIC timer. Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 3983.13 BogoMIPS (lpj=19915667) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 1(1) -> Node 1 -> Core 0 AMD Opteron(tm) Processor 246 stepping 0a CPU 1: Syncing TSC to CPU 0. CPU 1: synchronized TSC with CPU 0 (last diff -136 cycles, maxerr 901 cycles) Brought up 2 CPUs time.c: Using HPET based timekeeping. testing NMI watchdog ... OK. 
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050408
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] segment is 0
ACPI: Assume root bridge [\_SB_.PCIB] segment is 0
PCI: Scanning bus 0000:00
PCI: Found 0000:00:06.0 [1022/7460] 000604 01
PCI: Found 0000:00:07.0 [1022/7468] 000601 00
PCI: Found 0000:00:07.1 [1022/7469] 000101 00
PCI: Found 0000:00:07.2 [1022/746a] 000c05 00
PCI: Found 0000:00:07.3 [1022/746b] 000680 00
PCI: Found 0000:00:07.5 [1022/746d] 000401 00
PCI: Found 0000:00:0a.0 [1022/7450] 000604 01
PCI: Found 0000:00:0a.1 [1022/7451] 000800 00
PCI: Found 0000:00:0b.0 [1022/7450] 000604 01
PCI: Found 0000:00:0b.1 [1022/7451] 000800 00
PCI: Found 0000:00:18.0 [1022/1100] 000600 00
PCI: Found 0000:00:18.1 [1022/1101] 000600 00
PCI: Found 0000:00:18.2 [1022/1102] 000600 00
PCI: Found 0000:00:18.3 [1022/1103] 000600 00
PCI: Found 0000:00:19.0 [1022/1100] 000600 00
PCI: Found 0000:00:19.1 [1022/1101] 000600 00
PCI: Found 0000:00:19.2 [1022/1102] 000600 00
PCI: Found 0000:00:19.3 [1022/1103] 000600 00
PCI: Fixups for bus 0000:00
PCI: Scanning behind PCI bridge 0000:00:06.0, config 010100, pass 0
PCI: Scanning bus 0000:01
PCI: Found 0000:01:00.0 [1022/7464] 000c03 00
PCI: Found 0000:01:00.1 [1022/7464] 000c03 00
PCI: Found 0000:01:0b.0 [1095/3114] 000180 00
PCI: Found 0000:01:0c.0 [104c/8023] 000c00 00
PCI: Fixups for bus 0000:01
PCI: Bus scan for 0000:01 returning with max=01
PCI: Scanning behind PCI bridge 0000:00:0a.0, config 020200, pass 0
PCI: Scanning bus 0000:02
PCI: Found 0000:02:07.0 [1131/7146] 000480 00
PCI: Found 0000:02:09.0 [14e4/16a7] 000200 00
PCI: Fixups for bus 0000:02
PCI: Bus scan for 0000:02 returning with max=02
PCI: Scanning behind PCI bridge 0000:00:0b.0, config 030300, pass 0
PCI: Scanning bus 0000:03
PCI: Fixups for bus 0000:03
PCI: Bus scan for 0000:03 returning with max=03
PCI: Scanning behind PCI bridge 0000:00:06.0, config 010100, pass 1
PCI: Scanning behind PCI bridge 0000:00:0a.0, config 020200, pass 1
PCI: Scanning behind PCI bridge 0000:00:0b.0, config 030300, pass 1
PCI: Bus scan for 0000:00 returning with max=03
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLB._PRT]
ACPI: PCI Root Bridge [PCIB] (0000:04)
PCI: Probing PCI hardware (bus 04)
ACPI: Assume root bridge [\_SB_.PCI0] segment is 0
ACPI: Assume root bridge [\_SB_.PCIB] segment is 0
PCI: Scanning bus 0000:04
PCI: Found 0000:04:00.0 [1022/7454] 000600 00
PCI: Found 0000:04:01.0 [1022/7455] 000604 01
PCI: Fixups for bus 0000:04
PCI: Scanning behind PCI bridge 0000:04:01.0, config 050504, pass 0
PCI: Scanning bus 0000:05
PCI: Found 0000:05:00.0 [1002/5964] 000300 00
Boot video device is 0000:05:00.0
PCI: Found 0000:05:00.1 [1002/5d44] 000380 00
PCI: Fixups for bus 0000:05
PCI: Bus scan for 0000:05 returning with max=05
PCI: Scanning behind PCI bridge 0000:04:01.0, config 050504, pass 1
PCI: Bus scan for 0000:04 returning with max=05
ACPI: PCI Interrupt Routing Table [\_SB_.PCIB.PBP2._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
got res [80000000:8007ffff] bus [80000000:8007ffff] flags 7200 for BAR 6 of 0000:01:0b.0
PCI: Bridge: 0000:00:06.0
  IO window: 9000-afff
  MEM window: ff100000-ff2fffff
  PREFETCH window: 80000000-800fffff
got res [9e900000:9e90ffff] bus [9e900000:9e90ffff] flags 7200 for BAR 6 of 0000:02:09.0
PCI: Bridge: 0000:00:0a.0
  IO window: disabled.
  MEM window: ff300000-ff3fffff
  PREFETCH window: 9e900000-9e9fffff
PCI: Bridge: 0000:00:0b.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
got res [9eb00000:9eb1ffff] bus [9eb00000:9eb1ffff] flags 7202 for BAR 6 of 0000:05:00.0
PCI: Bridge: 0000:04:01.0
  IO window: c000-cfff
  MEM window: ff500000-ff5fffff
  PREFETCH window: 9eb00000-beafffff
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)
hpet0: at MMIO 0xfec01000, IRQs 2, 8, 0
hpet0: 69ns tick, 3 32-bit timers
agpgart: Detected AMD 8151 AGP Bridge rev B3
agpgart: AGP aperture is 512M @ 0xc0000000
PCI-DMA: Disabling IOMMU.
pnp: 00:09: ioport range 0x680-0x6ff has been reserved
pnp: 00:09: ioport range 0x295-0x296 has been reserved
pnp: 00:09: ioport range 0xb78-0xb7f has been reserved
pnp: 00:09: ioport range 0xf78-0xf7f has been reserved
IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com>
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1127264758.390:1): initialized
Total HugeTLB memory allocated, 0
SELinux: Registering netfilter hooks
Initializing Cryptographic API
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz
radeonfb: PLL min 20000 max 40000
radeonfb: Monitor 1 type DFP found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
Console: switching to colour frame buffer device 240x75
radeonfb (0000:05:00.0): ATI Radeon Yd
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
Using specific hotkey driver
ACPI: CPU0 (power states: C1[C1])
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: CPU1 (power states: C1[C1])
Real Time Clock Driver v1.12
hpet_acpi_add: no address or irqs in _CRS
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 controller doesn't have AUX irq; using default 0xc
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 112
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
ata1: SATA max UDMA/100 cmd 0xFFFFC20000004C80 ctl 0xFFFFC20000004C8A bmdma 0xFFFFC20000004C00 irq 177
ata2: SATA max UDMA/100 cmd 0xFFFFC20000004CC0 ctl 0xFFFFC20000004CCA bmdma 0xFFFFC20000004C08 irq 177
ata3: SATA max UDMA/100 cmd 0xFFFFC20000004E80 ctl 0xFFFFC20000004E8A bmdma 0xFFFFC20000004E00 irq 177
ata4: SATA max UDMA/100 cmd 0xFFFFC20000004EC0 ctl 0xFFFFC20000004ECA bmdma 0xFFFFC20000004E08 irq 177
ata1: dev 0 cfg 49:2f00 82:74eb 83:7feb 84:4123 85:74e8 86:3c03 87:4123 88:207f
ata1: dev 0 ATA, max UDMA/133, 781422768 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: no device found (phy stat 00000000)
scsi1 : sata_sil
ata3: no device found (phy stat 00000000)
scsi2 : sata_sil
ata4: no device found (phy stat 00000000)
scsi3 : sata_sil
Vendor: ATA Model: HDS724040KLSA80 Rev: KFAO
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
Fusion MPT misc device (ioctl) driver 3.03.02
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
aoe: aoe_init: AoE v2.6-10 initialised.
mice: PS/2 mouse device common for all mice
input: PC Speaker
i2c /dev entries driver
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 131072 (order: 9, 3145728 bytes)
TCP bind hash table entries: 65536 (order: 8, 1572864 bytes)
input: AT Translated Set 2 keyboard on isa0060/serio0
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
NET: Registered protocol family 17
BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
Found 512b device! Using larger block size...
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 232k freed
kjournald starting. Commit interval 5 seconds
Found 512b device! Using larger block size...
Found 512b device! Using larger block size...
Adding 1959916k swap on /dev/sda2. Priority:-1 extents:1
EXT3 FS on sda1, internal journal
i2c_adapter i2c-6: Detecting device at 6,0x2e with COMPANY: 0x41 and VERSTEP: 0x62
i2c_adapter i2c-6: Autodetecting device at 6,0x2e ...
lm85 6-002e: Initializing device
lm85 6-002e: LM85_REG_CONFIG is: 0x05
lm85 6-002e: Setting CONFIG to: 0x05
powernow-k8: Found 2 AMD Athlon 64 / Opteron processors (version 1.50.3)
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0xe (1200 mV)
cpu_init done, current fid 0xc, vid 0x2
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0xe (1200 mV)
cpu_init done, current fid 0xc, vid 0x2
tg3.c:v3.37 (August 25, 2005)
ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 185
eth0: Tigon3 [partno(BCM95703A30) rev 1002 PHY(5703)] (PCI:33MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2c:90:0a
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[763f0000]
Found 512b device! Using larger block size...
Found 512b device! Using larger block size...
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Found 512b device! Using larger block size...
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Intel 810 + AC97 Audio, version 1.01, 19:40:55 Aug 29 2005
ACPI: PCI Interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 177
i810: AMD-8111 IOHub found at IO 0xbc00 and 0xb800, MEM 0x0000 and 0x0000, IRQ 177
i810_audio: Audio Controller supports 6 channels.
i810_audio: Defaulting to base 2 channel mode.
i810_audio: Resetting connection 0
ac97_codec: AC97 Audio codec, id: ADS116 (Analog Devices AD1981B)
i810_audio: AC'97 codec 0 supports AMAP, total channels = 2
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 193
ohci_hcd 0000:01:00.0: Advanced Micro Devices [AMD] AMD-8111 USB
ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:01:00.0: irq 193, io mem 0xff2fd000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ACPI: PCI Interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 193
ohci_hcd 0000:01:00.1: Advanced Micro Devices [AMD] AMD-8111 USB (#2)
ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:01:00.1: irq 193, io mem 0xff2fe000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ieee1394: Initialized config rom entry `ip1394'
ohci1394: $Rev: 1299 $ Ben Collins <bcollins@debian.org>
ACPI: PCI Interrupt 0000:01:0c.0[A] -> GSI 19 (level, low) -> IRQ 193
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[193] MMIO=[ff2ff000-ff2ff7ff] Max Packet=[2048]
ieee1394: Host added: ID:BUS[0-00:1023] GUID[00e0810000303bdf]
lm85 6-002e: Reading sensor values
eth1394: $Rev: 1264 $ Ben Collins <bcollins@debian.org>
eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
lm85 6-002e: Reading config values
NET: Registered protocol family 15
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 7460 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7450 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7455 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
hw_random: AMD768 system management I/O registers at 0x5000.
hw_random hardware driver 1.0.0 loaded
saa7146: register extension 'budget_ci dvb'.
ACPI: PCI Interrupt 0000:02:07.0[A] -> GSI 26 (level, low) -> IRQ 201
saa7146: found saa7146 @ mem ffffc20000198c00 (revision 1, irq 201) (0x13c2,0x1011).
DVB: registering new adapter (TT-Budget/WinTV-NOVA-T PCI).
adapter has MAC addr = 00:d0:5c:03:23:34
DVB: registering frontend 0 (Philips TDA10045H DVB-T)...
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
hda: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NET: Registered protocol family 10
Disabled Privacy Extensions on device ffffffff8052efc0(lo)
IPv6 over IPv4 tunneling driver
WDT driver for the Winbond(TM) W83627HF Super I/O chip initialising.
w83627hf WDT: initialized. timeout=60 sec (nowayout=0)
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
crc32c: Unknown symbol crc32c_le
Initializing IPsec netlink socket
ipcomp6: Unknown symbol xfrm6_tunnel_free_spi
ipcomp6: Unknown symbol xfrm6_tunnel_alloc_spi
ipcomp6: Unknown symbol xfrm6_tunnel_spi_lookup
NET: Registered protocol family 4
ioctl32(ipx_configure:6768): Unknown cmd fd(3) cmd(000089e1){00} arg(ffffd85b) on socket:[13557]
Process accounting paused
lm85 6-002e: Reading sensor values
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-21 1:22 ` Petr Vandrovec
@ 2005-09-21 15:59 ` Christoph Lameter
2005-09-22 19:52 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-21 15:59 UTC
To: Petr Vandrovec; +Cc: Andrew Morton, alokk, linux-kernel, manfred

On Wed, 21 Sep 2005, Petr Vandrovec wrote:

> Simple... I just boot any kernel after 2.6.13, and it dies in front of me.
> Currently I'm using config below, which I boot with 'rootdelay=60' so panic
> in keventd happens before panic due to no root filesystem. No ACPI.
> Nothing. 100% reproducible. Maybe I should enable embedded options and
> remove all other device drivers still present in the kernel.
>
> Below config is dmesg from 2.6.13, which has no problems with comming up.
> Maybe
> you'll find some clue there, but I see none. Node #0 has 1GB of memory, so
> it should have no need to borrow blocks from node #1 when this kernel is able
> to boot in 16MB of memory...

Hmm. This likely has something to do with debugging code. I was unable to
reproduce this on amd64 with your config. I get another failure with
2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
The system works fine if no debugging is configured:

kernel BUG at kernel/workqueue.c:541!
swapper[1]: bugcheck! 0 [1]
Modules linked in:

Pid: 1, CPU 0, comm: swapper
psr : 00001010085a6010 ifs : 8000000000000105 ip : [<a0000001000e5b10>] Not tainted
ip is at init_workqueues+0x90/0xa0
unat: 0000000000000000 pfs : 0000000000000105 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 000000000001003e pr : 000000000000a541
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000e5b10 b6 : e000003002385ad0 b7 : e000000001fffc00
f6 : 1003e000000003f34da65 f7 : 1003e20c49ba5e353f7cf
f8 : 1003e00000000000000c8 f9 : 10006c7fffffffd73ea5c
f10 : 0fffdfffffffffaf00000 f11 : 1003e0000000000000000
r1 : a000000100cd7670 r2 : 0000000000000001 r3 : e00000b0057c0e00
r8 : 0000000000000029 r9 : 0000000000004000 r10 : 0000000000000001
r11 : 0000000000000002 r12 : e00000b0057c7de0 r13 : e00000b0057c0000
r14 : a0000001009690b0 r15 : e00000b0057c0df4 r16 : e00000b0057c0e00
r17 : 0000000000000001 r18 : 0000000000000002 r19 : a0000001009690b0
r20 : a000000100ad78f8 r21 : ffffffffffffffff r22 : e000000001fffc00
r23 : a000000100b11748 r24 : 0000000000000000 r25 : 0000000000000004
r26 : e00000b0057c0df0 r27 : e00000b0057c7cc8 r28 : e00000b0057c7cd0
r29 : 0000000000000c46 r30 : 0000000000000c46 r31 : 0000000000000308
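Christoph later attributed this ia64 `kernel/workqueue.c:541` BUG to a structure that had grown larger than the maximum the slab allocator allows. The thread does not say which structure. The C sketch below is only an illustration of the arithmetic. It assumes a structure shaped like a fixed header plus one element per possible CPU, which is an assumption and not something stated in the thread. Under that assumption, the structure's size scales with `CONFIG_NR_CPUS` (512 in the ia64 config posted in this thread), and a debugging option that enlarges each element can push the total past an allocator ceiling. The names `HYPOTHETICAL_SLAB_MAX`, `per_cpu_struct_size()` and `fits_in_slab()`, and the 128 KiB limit, are all invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ceiling on one slab/kmalloc object, for illustration only
 * (128 KiB here); the real limit depends on kernel version and config. */
#define HYPOTHETICAL_SLAB_MAX ((size_t)128 * 1024)

/* Estimated size of a hypothetical structure made of a fixed header plus
 * one element per possible CPU (like `elem per_cpu[NR_CPUS]`).
 * Padding between members is ignored in this estimate. */
static size_t per_cpu_struct_size(size_t header, size_t elem, unsigned int nr_cpus)
{
    assert(nr_cpus > 0);
    return header + elem * (size_t)nr_cpus;
}

/* True if an object of `size` bytes stays within the assumed ceiling. */
static bool fits_in_slab(size_t size)
{
    return size <= HYPOTHETICAL_SLAB_MAX;
}
```

With 512 CPUs, a 128-byte element plus a 64-byte header comes to 65600 bytes, which is within the assumed ceiling. If a debug option grew each element to 512 bytes, the total would be 262208 bytes, which is not. Both element sizes are made up to show the scaling and are not measurements from this kernel.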
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-21 15:59 ` Christoph Lameter
@ 2005-09-22 19:52 ` Christoph Lameter
2005-09-22 20:01 ` Andrew Morton
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-22 19:52 UTC
To: Petr Vandrovec; +Cc: Andrew Morton, alokk, linux-kernel, manfred

On Wed, 21 Sep 2005, Christoph Lameter wrote:

> Hmm. This likely has something to do with debugging code. I was unable to
> reproduce this on amd64 with your config. I get another failure with
> 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> The system works fine if no debugging is configured:
>
> kernel BUG at kernel/workqueue.c:541!
> swapper[1]: bugcheck! 0 [1]

I fixed the above issue (a structure became larger than the maximum
allowed by the slab allocator) and the kernel boots fine now on an 8 way
ia64. Cannot reproduce the problem.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.14-rc2-git1
# Thu Sep 22 10:31:35 2005
#

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
# CONFIG_IKCONFIG is not set
CONFIG_CPUSETS=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Processor type and features
#
CONFIG_IA64=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_TIME_INTERPOLATION=y
CONFIG_EFI=y
CONFIG_GENERIC_IOMAP=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_IA64_UNCACHED_ALLOCATOR=y
# CONFIG_IA64_GENERIC is not set
# CONFIG_IA64_DIG is not set
# CONFIG_IA64_HP_ZX1 is not set
# CONFIG_IA64_HP_ZX1_SWIOTLB is not set
CONFIG_IA64_SGI_SN2=y
# CONFIG_IA64_HP_SIM is not set
# CONFIG_ITANIUM is not set
CONFIG_MCKINLEY=y
# CONFIG_IA64_PAGE_SIZE_4KB is not set
# CONFIG_IA64_PAGE_SIZE_8KB is not set
CONFIG_IA64_PAGE_SIZE_16KB=y
# CONFIG_IA64_PAGE_SIZE_64KB is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_IA64_L1_CACHE_SHIFT=7
CONFIG_NUMA=y
CONFIG_VIRTUAL_MEM_MAP=y
CONFIG_HOLES_IN_ZONE=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
# CONFIG_IA64_CYCLONE is not set
CONFIG_IOSAPIC=y
CONFIG_IA64_SGI_SN_XP=m
CONFIG_FORCE_MAX_ZONEORDER=18
CONFIG_SMP=y
CONFIG_NR_CPUS=512
# CONFIG_HOTPLUG_CPU is not set
CONFIG_SCHED_SMT=y
CONFIG_PREEMPT=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_IA32_SUPPORT=y
CONFIG_COMPAT=y
CONFIG_IA64_MCA_RECOVERY=y
CONFIG_PERFMON=y
CONFIG_IA64_PALINFO=y

#
# Firmware Drivers
#
CONFIG_EFI_VARS=y
CONFIG_EFI_PCDP=y
# CONFIG_DELL_RBU is not set
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set

#
# Power management and ACPI
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_BUTTON is not set
# CONFIG_ACPI_FAN is not set
# CONFIG_ACPI_PROCESSOR is not set
CONFIG_ACPI_NUMA=y
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_ACPI_CONTAINER is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA)
#
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
# CONFIG_PCI_DEBUG is not set

#
# PCI Hotplug Support
#
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_SGI=y

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y
CONFIG_IPV6=m
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_NETFILTER is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_SCHED is not set
# CONFIG_NET_CLS_ROUTE is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_DEBUG_DRIVER is not set

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CDROM_PKTCDVD is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_ATA_OVER_ETH=m

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SGIIOC4=y
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
CONFIG_SCSI_SATA=y
# CONFIG_SCSI_SATA_AHCI is not set
# CONFIG_SCSI_SATA_SVW is not set
# CONFIG_SCSI_ATA_PIIX is not set
# CONFIG_SCSI_SATA_MV is not set
# CONFIG_SCSI_SATA_NV is not set
# CONFIG_SCSI_SATA_PROMISE is not set
# CONFIG_SCSI_SATA_QSTOR is not set
# CONFIG_SCSI_SATA_SX4 is not set
# CONFIG_SCSI_SATA_SIL is not set
# CONFIG_SCSI_SATA_SIS is not set
# CONFIG_SCSI_SATA_ULI is not set
# CONFIG_SCSI_SATA_VIA is not set
CONFIG_SCSI_SATA_VITESSE=y
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_FC is not set
CONFIG_SCSI_QLOGIC_1280=y
# CONFIG_SCSI_QLOGIC_1280_1040 is not set
CONFIG_SCSI_QLA2XXX=y
# CONFIG_SCSI_QLA21XX is not set
CONFIG_SCSI_QLA22XX=y
CONFIG_SCSI_QLA2300=y
CONFIG_SCSI_QLA2322=y
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_QLA24XX is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID5=y
# CONFIG_MD_RAID6 is not set
CONFIG_MD_MULTIPATH=y
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_EMC=m

#
# Fusion MPT device support
#
CONFIG_FUSION=y
CONFIG_FUSION_SPI=y
CONFIG_FUSION_FC=y
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# PHY device support
#

#
# Ethernet (10 or 100Mbit)
#
# CONFIG_NET_ETHERNET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SK98LIN is not set
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_CHELSIO_T1 is not set
# CONFIG_IXGB is not set
CONFIG_S2IO=m
# CONFIG_S2IO_NAPI is not set
# CONFIG_2BUFF_MODE is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
CONFIG_NETCONSOLE=y
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_RX is not set
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
# CONFIG_CYCLADES is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
# CONFIG_SYNCLINKMP is not set
# CONFIG_N_HDLC is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_STALDRV is not set
CONFIG_SGI_SNSC=y
CONFIG_SGI_TIOCX=y
CONFIG_SGI_MBCS=m

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_SGI_L1_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_SERIAL_SGI_IOC4=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_HW_RANDOM is not set
CONFIG_EFI_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_AGP is not set
# CONFIG_DRM is not set
CONFIG_RAW_DRIVER=m
# CONFIG_HPET is not set
CONFIG_MAX_RAW_DEVS=256
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_MMTIMER=y

#
# TPM devices
#
# CONFIG_TCG_TPM is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#

#
# Multimedia Capabilities Port drivers
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB=m
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
# CONFIG_USB_DEVICEFS is not set
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=m
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_ISP116X_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_BIG_ENDIAN is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_BLUETOOTH_TTY is not set
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information
#
# CONFIG_USB_STORAGE is not set

#
# USB Input Devices
#
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
# CONFIG_USB_HIDDEV is not set

#
# USB HID Boot Protocol drivers
#
# CONFIG_USB_KBD is not set
# CONFIG_USB_MOUSE is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_ACECAD is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_ITMTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_YEALINK is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set
# CONFIG_USB_KEYSPAN_REMOTE is not set
# CONFIG_USB_APPLETOUCH is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
CONFIG_USB_MON=y

#
# USB port drivers
#

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set

#
# USB DSL modem support
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
# CONFIG_INFINIBAND_USER_ACCESS is not set
CONFIG_INFINIBAND_MTHCA=m
# CONFIG_INFINIBAND_MTHCA_DEBUG is not set
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_DEBUG is not set

#
# SN Devices
#
CONFIG_SGI_IOC4=y

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_REISERFS_FS_SECURITY=y
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_XFS_FS=y
CONFIG_XFS_EXPORT=y
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_SECURITY is not set
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
CONFIG_QUOTA=y
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
CONFIG_QUOTACTL=y
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
CONFIG_NFS_DIRECTIO=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_RPCSEC_GSS_SPKM3 is not set
CONFIG_SMB_FS=m
# CONFIG_SMB_NLS_DEFAULT is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set CONFIG_EFI_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=y # # Library routines # # CONFIG_CRC_CCITT is not set # CONFIG_CRC16 is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=m CONFIG_ZLIB_DEFLATE=m CONFIG_GENERIC_ALLOCATOR=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_PENDING_IRQ=y # # Profiling support # # CONFIG_PROFILING is not set # # Kernel hacking # # CONFIG_PRINTK_TIME is not set CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y CONFIG_LOG_BUF_SHIFT=20 CONFIG_DETECT_SOFTLOCKUP=y 
CONFIG_SCHEDSTATS=y CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_PREEMPT=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_INFO=y CONFIG_DEBUG_FS=y # CONFIG_KPROBES is not set CONFIG_IA64_GRANULE_16MB=y # CONFIG_IA64_GRANULE_64MB is not set # CONFIG_IA64_PRINT_HAZARDS is not set # CONFIG_DISABLE_VHPT is not set # CONFIG_IA64_DEBUG_CMPXCHG is not set # CONFIG_IA64_DEBUG_IRQ is not set CONFIG_SYSVIPC_COMPAT=y # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y # CONFIG_CRYPTO_NULL is not set # CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=m # CONFIG_CRYPTO_SHA256 is not set # CONFIG_CRYPTO_SHA512 is not set # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set CONFIG_CRYPTO_DES=m # CONFIG_CRYPTO_BLOWFISH is not set # CONFIG_CRYPTO_TWOFISH is not set # CONFIG_CRYPTO_SERPENT is not set # CONFIG_CRYPTO_AES is not set # CONFIG_CRYPTO_CAST5 is not set # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=m # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_CRC32C is not set # CONFIG_CRYPTO_TEST is not set # # Hardware crypto devices # ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 19:52 ` Christoph Lameter
@ 2005-09-22 20:01 ` Andrew Morton
2005-09-22 21:25 ` Petr Vandrovec
0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2005-09-22 20:01 UTC
To: Christoph Lameter; +Cc: vandrove, alokk, linux-kernel, manfred

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Wed, 21 Sep 2005, Christoph Lameter wrote:
>
> > Hmm. This likely has something to do with debugging code. I was unable to
> > reproduce this on amd64 with your config. I get another failure with
> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> > The system works fine if no debugging is configured:
> >
> > kernel BUG at kernel/workqueue.c:541!
> > swapper[1]: bugcheck! 0 [1]
>
> I fixed the above issue (a structure became larger than the maximum
> allowed by the slab allocator) and the kernel boots fine now on an 8 way
> ia64. Cannot reproduce the problem.

Petr can. I think we're still waiting for him to test the below (please):

Begin forwarded message:

Date: Tue, 20 Sep 2005 18:03:54 -0700 (PDT)
From: Christoph Lameter <clameter@engr.sgi.com>
To: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Andrew Morton <akpm@osdl.org>, alokk@calsoftinc.com, linux-kernel@vger.kernel.org, manfred@colorfullife.com
Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

On Tue, 20 Sep 2005, Petr Vandrovec wrote:

> slab belonging to node#1, while having acquired lock for cachep belonging
> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
> (check_spinlock_acquired_node(cachep, 0) would succeed).

Hmmm. If a node runs out of memory then pages from another node may end up
on the slab list of a node. But it seems that free_block cannot handle that
properly.

How are you producing the problem?
Could you try the following patch:

---

The numa slab allocator may allocate pages from foreign nodes onto the lists
for a particular node if a node runs out of memory. Inspecting the slab->nodeid
field will not reflect that the page is now in use for the slabs of another node.

This patch fixes that issue by adding a node parameter to free_block so that the caller
can indicate which node currently uses a slab.

Also removes the check for the current node from kmem_cache_alloc_node since the
process may shift later to another node which may lead to an allocation on another
node than intended.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.14-rc1/mm/slab.c
===================================================================
--- linux-2.6.14-rc1.orig/mm/slab.c 2005-09-21 00:09:05.000000000 +0000
+++ linux-2.6.14-rc1/mm/slab.c 2005-09-21 00:48:12.000000000 +0000
@@ -639,7 +639,7 @@ static enum {
 static DEFINE_PER_CPU(struct work_struct, reap_work);
-static void free_block(kmem_cache_t* cachep, void** objpp, int len);
+static void free_block(kmem_cache_t* cachep, void** objpp, int len, int node);
 static void enable_cpucache (kmem_cache_t *cachep);
 static void cache_reap (void *unused);
 static int __node_shrink(kmem_cache_t *cachep, int node);
@@ -804,7 +804,7 @@ static inline void __drain_alien_cache(k
 if (ac->avail) {
 spin_lock(&rl3->list_lock);
- free_block(cachep, ac->entry, ac->avail);
+ free_block(cachep, ac->entry, ac->avail, node);
 ac->avail = 0;
 spin_unlock(&rl3->list_lock);
 }
@@ -925,7 +925,7 @@ static int __devinit cpuup_callback(stru
 /* Free limit for this kmem_list3 */
 l3->free_limit -= cachep->batchcount;
 if (nc)
- free_block(cachep, nc->entry, nc->avail);
+ free_block(cachep, nc->entry, nc->avail, node);
 if (!cpus_empty(mask)) {
 spin_unlock(&l3->list_lock);
@@ -934,7 +934,7 @@ static int __devinit cpuup_callback(stru
 if (l3->shared) {
 free_block(cachep, l3->shared->entry,
- l3->shared->avail);
+ l3->shared->avail, node);
 kfree(l3->shared);
 l3->shared = NULL;
 }
@@ -1882,12 +1882,13 @@ static void do_drain(void *arg)
 {
 kmem_cache_t *cachep = (kmem_cache_t*)arg;
 struct array_cache *ac;
+ int node = numa_node_id();
 check_irq_off();
 ac = ac_data(cachep);
- spin_lock(&cachep->nodelists[numa_node_id()]->list_lock);
- free_block(cachep, ac->entry, ac->avail);
- spin_unlock(&cachep->nodelists[numa_node_id()]->list_lock);
+ spin_lock(&cachep->nodelists[node]->list_lock);
+ free_block(cachep, ac->entry, ac->avail, node);
+ spin_unlock(&cachep->nodelists[node]->list_lock);
 ac->avail = 0;
 }
@@ -2608,7 +2609,7 @@ done:
 /*
  * Caller needs to acquire correct kmem_list's list_lock
  */
-static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects)
+static void free_block(kmem_cache_t *cachep, void **objpp, int nr_objects, int node)
 {
 int i;
 struct kmem_list3 *l3;
@@ -2617,14 +2618,12 @@ static void free_block(kmem_cache_t *cac
 void *objp = objpp[i];
 struct slab *slabp;
 unsigned int objnr;
- int nodeid = 0;
 slabp = GET_PAGE_SLAB(virt_to_page(objp));
- nodeid = slabp->nodeid;
- l3 = cachep->nodelists[nodeid];
+ l3 = cachep->nodelists[node];
 list_del(&slabp->list);
 objnr = (objp - slabp->s_mem) / cachep->objsize;
- check_spinlock_acquired_node(cachep, nodeid);
+ check_spinlock_acquired_node(cachep, node);
 check_slabp(cachep, slabp);
@@ -2664,13 +2663,14 @@ static void cache_flusharray(kmem_cache_
 {
 int batchcount;
 struct kmem_list3 *l3;
+ int node = numa_node_id();
 batchcount = ac->batchcount;
 #if DEBUG
 BUG_ON(!batchcount || batchcount > ac->avail);
 #endif
 check_irq_off();
- l3 = cachep->nodelists[numa_node_id()];
+ l3 = cachep->nodelists[node];
 spin_lock(&l3->list_lock);
 if (l3->shared) {
 struct array_cache *shared_array = l3->shared;
@@ -2686,7 +2686,7 @@ static void cache_flusharray(kmem_cache_
 }
 }
- free_block(cachep, ac->entry, batchcount);
+ free_block(cachep, ac->entry, batchcount, node);
 free_done:
 #if STATS
 {
@@ -2751,7 +2751,7 @@ static inline void __cache_free(kmem_cac
 } else {
 spin_lock(&(cachep->nodelists[nodeid])->
 list_lock);
- free_block(cachep, &objp, 1);
+ free_block(cachep, &objp, 1, nodeid);
 spin_unlock(&(cachep->nodelists[nodeid])->
 list_lock);
 }
@@ -2844,7 +2844,7 @@ void *kmem_cache_alloc_node(kmem_cache_t
 unsigned long save_flags;
 void *ptr;
- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
 return __cache_alloc(cachep, flags);
 if (unlikely(!cachep->nodelists[nodeid])) {
@@ -3079,7 +3079,7 @@ static int alloc_kmemlist(kmem_cache_t *
 if ((nc = cachep->nodelists[node]->shared))
 free_block(cachep, nc->entry,
- nc->avail);
+ nc->avail, node);
 l3->shared = new;
 if (!cachep->nodelists[node]->alien) {
@@ -3160,7 +3160,7 @@ static int do_tune_cpucache(kmem_cache_t
 if (!ccold)
 continue;
 spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
- free_block(cachep, ccold->entry, ccold->avail);
+ free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
 spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
 kfree(ccold);
 }
@@ -3240,7 +3240,7 @@ static void drain_array_locked(kmem_cach
 if (tofree > ac->avail) {
 tofree = (ac->avail+1)/2;
 }
- free_block(cachep, ac->entry, tofree);
+ free_block(cachep, ac->entry, tofree, node);
 ac->avail -= tofree;
 memmove(ac->entry, &(ac->entry[tofree]),
 sizeof(void*)*ac->avail);

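For readers following the free_block() change: the failure mode described in the changelog can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code. sketch_list3, sketch_slab, sketch_free_old() and sketch_free_new() are made-up stand-ins for struct kmem_list3, struct slab and the old and new free_block(), and a plain lock_held flag replaces the real list_lock and check_spinlock_acquired_node(). A slab whose nodeid is 1 but which sits on node 0's list fails the lock check when the node is taken from slabp->nodeid, and passes when the caller supplies the node whose lock it holds. As Christoph notes later in the thread, this change masks, rather than explains, how such slabs got onto the wrong list.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_NODES 2

/* Hypothetical stand-in for struct kmem_list3: only records whether the
 * caller holds this node's list_lock and how many objects were freed. */
struct sketch_list3 {
    bool lock_held;
    int nr_freed;
};

/* Hypothetical stand-in for struct slab: nodeid is the node the page was
 * allocated from, which may differ from the list the slab sits on. */
struct sketch_slab {
    int nodeid;
};

static struct sketch_list3 sketch_nodelists[SKETCH_NR_NODES];

/* Pre-patch behaviour: the list (and the lock that gets checked) is picked
 * from slabp->nodeid. Returns false where the kernel's
 * check_spinlock_acquired_node() would BUG(). */
static bool sketch_free_old(const struct sketch_slab *slabp)
{
    struct sketch_list3 *l3 = &sketch_nodelists[slabp->nodeid];

    if (!l3->lock_held)
        return false;
    l3->nr_freed++;
    return true;
}

/* Patched behaviour: the caller passes the node whose list_lock it holds. */
static bool sketch_free_new(const struct sketch_slab *slabp, int node)
{
    struct sketch_list3 *l3;

    (void)slabp; /* slabp->nodeid is deliberately no longer consulted */
    assert(node >= 0 && node < SKETCH_NR_NODES);
    l3 = &sketch_nodelists[node];
    if (!l3->lock_held)
        return false;
    l3->nr_freed++;
    return true;
}
```

In this model the caller holds only node 0's lock, so the old lookup consults node 1's (unheld) lock and fails, while the new one succeeds and counts the free against node 0.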
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 20:01 ` Andrew Morton
@ 2005-09-22 21:25 ` Petr Vandrovec
2005-09-22 21:32 ` Christoph Lameter
2005-09-22 21:46 ` Andrew Morton
0 siblings, 2 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-22 21:25 UTC
To: Andrew Morton; +Cc: Christoph Lameter, alokk, linux-kernel, manfred

Andrew Morton wrote:
> Christoph Lameter <clameter@engr.sgi.com> wrote:
>
>>On Wed, 21 Sep 2005, Christoph Lameter wrote:
>>
>> > Hmm. This likely has something to do with debugging code. I was unable to
>> > reproduce this on amd64 with your config. I get another failure with
>> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
>> > The system works fine if no debugging is configured:
>> >
>> > kernel BUG at kernel/workqueue.c:541!
>> > swapper[1]: bugcheck! 0 [1]
>>
>> I fixed the above issue (a structure became larger than the maximum
>> allowed by the slab allocator) and the kernel boots fine now on an 8 way
>> ia64. Cannot reproduce the problem.
>
>
> Petr can. I think we're still waiting for him to test the below (please):

Sorry, I missed that half of the email completely. Yes, it seems to fix the problem:
the box currently has 8 min uptime, which is 7:55 more than it survived before.

Thanks,
Petr Vandrovec

> Could you try the following patch:
>
> ---
>
> The numa slab allocator may allocate pages from foreign nodes onto the lists
> for a particular node if a node runs out of memory. Inspecting the slab->nodeid
> field will not reflect that the page is now in use for the slabs of another node.
>
> This patch fixes that issue by adding a node parameter to free_block so that the caller
> can indicate which node currently uses a slab.
>
> Also removes the check for the current node from kmem_cache_alloc_node since the
> process may shift later to another node which may lead to an allocation on another
> node than intended.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 21:25 ` Petr Vandrovec
@ 2005-09-22 21:32 ` Christoph Lameter
2005-09-22 21:46 ` Andrew Morton
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-22 21:32 UTC
To: Petr Vandrovec; +Cc: Andrew Morton, alokk, linux-kernel, manfred

On Thu, 22 Sep 2005, Petr Vandrovec wrote:

> Sorry, I missed that half of the email completely. Yes, it seems to fix the problem:
> the box currently has 8 min uptime, which is 7:55 more than it survived before.

I thought the box did not boot at all? The problem appears on an otherwise idle
machine after bootup?

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 21:25 ` Petr Vandrovec
2005-09-22 21:32 ` Christoph Lameter
@ 2005-09-22 21:46 ` Andrew Morton
2005-09-22 21:54 ` Christoph Lameter
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2005-09-22 21:46 UTC
To: Petr Vandrovec; +Cc: clameter, alokk, linux-kernel, manfred

Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
> Andrew Morton wrote:
> > Christoph Lameter <clameter@engr.sgi.com> wrote:
> >
> >>On Wed, 21 Sep 2005, Christoph Lameter wrote:
> >>
> >> > Hmm. This likely has something to do with debugging code. I was unable to
> >> > reproduce this on amd64 with your config. I get another failure with
> >> > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have.
> >> > The system works fine if no debugging is configured:
> >> >
> >> > kernel BUG at kernel/workqueue.c:541!
> >> > swapper[1]: bugcheck! 0 [1]
> >>
> >> I fixed the above issue (a structure became larger than the maximum
> >> allowed by the slab allocator) and the kernel boots fine now on an 8 way
> >> ia64. Cannot reproduce the problem.
> >
> >
> > Petr can. I think we're still waiting for him to test the below (please):
>
> Sorry, I missed that half of the email completely. Yes, it seems to fix the problem:
> the box currently has 8 min uptime, which is 7:55 more than it survived before.

Great, thanks. Christoph, was that patch the final official version?

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 21:46 ` Andrew Morton
@ 2005-09-22 21:54 ` Christoph Lameter
2005-09-23 0:25 ` Petr Vandrovec
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-22 21:54 UTC
To: Andrew Morton; +Cc: Petr Vandrovec, alokk, linux-kernel, manfred

On Thu, 22 Sep 2005, Andrew Morton wrote:

> Great, thanks. Christoph, was that patch the final official version?

This should deal with the node ownership issue. So yes.

I still have some open question on how pages ended up on the wrong node.
This should only happen if a zone / node has run out of memory. If pages
ended up on the wrong node without that then there may be a different
issue still to be fixed.

Maybe Petr can give us some more details on when the problem occurs?

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-22 21:54 ` Christoph Lameter
@ 2005-09-23 0:25 ` Petr Vandrovec
0 siblings, 0 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-23 0:25 UTC
To: Christoph Lameter; +Cc: Andrew Morton, alokk, linux-kernel, manfred

Christoph Lameter wrote:
> On Thu, 22 Sep 2005, Andrew Morton wrote:
>
>
>>Great, thanks. Christoph, was that patch the final official version?
>
>
> This should deal with the node ownership issue. So yes.
>
> I still have some open question on how pages ended up on the wrong node.
> This should only happen if a zone / node has run out of memory. If pages
> ended up on the wrong node without that then there may be a different
> issue still to be fixed.
>
> Maybe Petr can give us some more details on when the problem occurs?

The problem seems to happen immediately: the very first run of cache_reap
(2 seconds after eventd initializes, if I understand it correctly) already
finds the problem.

But I'm confused. I've just added code which is supposed to verify all
additions to the cache entry[] (http://platan.vc.cvut.cz/verify-all-entry-add.diff)
on top of Christoph's patch, to catch the one which later causes the problem in
cache_reap, and it logs nothing at the time the crash was happening :-(

The only incident it logs is in the "while (batchcount > 0)" loop in
cache_alloc_refill, saying that objp ffff81007ffd9430 belonging to the slab
ffff81007ffd9000 which belongs to node 1 was added to array_cache belonging
to node 0 (called from ffffffff8016e4a9) (mm/slab.c ~ line 2430) ... cache avc_node

This repeats a couple of times, for avc_node, mnt_cache, proc_inode_cache and
bdev_cache. Nothing else.

So I reverted your fix, and still I did not catch the offender, so I'm probably
missing some place which populates array_cache entry[] :-( Only after I added
logging to free_block() was I able to find that the offender is proc_inode_cache.
But I have no idea how this object appeared in the incorrect node cache...

Petr

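The check Petr describes, comparing the node of each object's slab with the node of the array_cache it is added to, can be sketched as follows. His verify-all-entry-add.diff is only linked, not quoted, so this is not his code: sketch_slab, sketch_obj, sketch_array_cache and sketch_ac_add_checked() are hypothetical simplifications, and the object-to-slab lookup the kernel does with GET_PAGE_SLAB(virt_to_page(objp)) is replaced by a plain pointer.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_AC_LIMIT 16

/* Hypothetical, simplified stand-ins for the slab structures. */
struct sketch_slab {
    int nodeid;                         /* node the slab's page came from */
};

struct sketch_obj {
    const struct sketch_slab *slabp;    /* kernel: GET_PAGE_SLAB(virt_to_page(objp)) */
};

struct sketch_array_cache {
    int node;                           /* node whose lists this cache feeds */
    int avail;
    const struct sketch_obj *entry[SKETCH_AC_LIMIT];
};

/* Store objp in ac->entry[] and log it if its slab belongs to another node.
 * Returns 1 for a foreign object, 0 otherwise. */
static int sketch_ac_add_checked(struct sketch_array_cache *ac,
                                 const struct sketch_obj *objp,
                                 const char *cache_name)
{
    int foreign = objp->slabp->nodeid != ac->node;

    assert(ac->avail < SKETCH_AC_LIMIT);
    if (foreign)
        fprintf(stderr, "%s: object %p from node %d added to array_cache of node %d\n",
                cache_name, (const void *)objp, objp->slabp->nodeid, ac->node);
    ac->entry[ac->avail++] = objp;
    return foreign;
}
```

Like the logging Petr describes, the sketch only reports a foreign object and still stores it, so it helps locate the offending caller without changing allocator behaviour.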
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-20 13:58 ` Petr Vandrovec
2005-09-21 1:03 ` Christoph Lameter
@ 2005-09-28 21:02 ` Ravikiran G Thirumalai
2005-09-28 22:50 ` Christoph Lameter
2005-09-29 16:43 ` Petr Vandrovec
1 sibling, 2 replies; 43+ messages in thread
From: Ravikiran G Thirumalai @ 2005-09-28 21:02 UTC
To: Petr Vandrovec
Cc: Andrew Morton, Christoph Lameter, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen

On Tue, Sep 20, 2005 at 03:58:16PM +0200, Petr Vandrovec wrote:
> Andrew Morton wrote:
> >Christoph Lameter <clameter@engr.sgi.com> wrote:
> >
> >...
> >They do. I don't believe that preemption is the source of this BUG.
> >(Petr, does CONFIG_PREEMPT=n fix it?)
>
> No, it does not. I've even added printks here and there to show node
> number,
> and everything works as it should. Maybe there are some problems with
> numa_node_id() and migrating between processors when memory gets released,
> I do not know.
>
> Only thing I know that if I'll add WARN_ON below to the free_block(), it
> triggers...
>
> @free_block
> slabp = GET_PAGE_SLAB(virt_to_page(objp));
> nodeid = slabp->nodeid;
> + WARN_ON(nodeid != numa_node_id()); <<<<<
> l3 = cachep->nodelists[nodeid];
> list_del(&slabp->list);
> objnr = (objp - slabp->s_mem) / cachep->objsize;
> check_spinlock_acquired_node(cachep, nodeid);
> check_slabp(cachep, slabp);
>
> ... saying that keventd/0 tries to operate on
> slab belonging to node#1, while having acquired lock for cachep belonging
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^
> to node #0
  ^^^^^^^^^^

Just might be relevant here, I found a bug with the recent
x86_64 changes to 2.6.14-rc* which causes the node_to_cpumask[] to go bad for
the boot processor. This happens on both amd and em64t boxes. I guess the
keventd/0 cpus_allowed mask might be changed by the bad node_to_cpumask[]
here?
On an Opteron box (courtesy of Ananth M):
# cat /sys/devices/system/node/node0/cpumap
00000000
# cat /sys/devices/system/node/node1/cpumap
00000003

On our EM64T IBM x460 NUMA box:
# cat /sys/devices/system/node/node0/cpumap
0000000e
# cat /sys/devices/system/node/node1/cpumap
000000f1

Here is a fix for that. I have sounded out Andi Kleen on this and am waiting
for his comments. Maybe somebody can test the patch below on AMD machines?

Thanks,
Kiran

---
Patch to fix the BP node_to_cpumask.

2.6.14-rc* broke the boot CPU bit, as cpu_to_node(0) is now not set up early
enough for numa_init_array. cpu_to_node[] is set up much later, in
srat_detect_node, on ACPI SRAT-based EM64T machines. This seems to be a problem
on AMD machines too; tested on EM64T only, though.
/sys/devices/system/node/node0/cpumap shows up sanely after this patch.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.14-rc1/arch/x86_64/mm/numa.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/x86_64/mm/numa.c 2005-09-19 17:58:10.000000000 -0700
+++ linux-2.6.14-rc1/arch/x86_64/mm/numa.c 2005-09-27 01:34:20.000000000 -0700
@@ -178,7 +178,6 @@
 rr++;
 }

- set_bit(0, &node_to_cpumask[cpu_to_node(0)]);
 }

 #ifdef CONFIG_NUMA_EMU
@@ -266,9 +265,7 @@

 __cpuinit void numa_add_cpu(int cpu)
 {
- /* BP is initialized elsewhere */
- if (cpu)
- set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
+ set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }

 unsigned long __init numa_free_all_bootmem(void)

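The cpumap files shown above hold a hexadecimal CPU mask in which bit n stands for CPU n. A small userspace helper (hypothetical, not part of the patch) makes the symptom explicit: on both machines CPU 0, the boot processor, is absent from node 0's mask and present in node 1's, which is consistent with the bad node_to_cpumask[] entry for the BP described above. The parser assumes the mask fits in an unsigned long; machines with many CPUs print comma-separated 32-bit groups, which this sketch does not handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse the hex mask printed in /sys/devices/system/node/nodeN/cpumap.
 * Bit n set means CPU n belongs to the node. Sketch only: assumes a single
 * group that fits in an unsigned long. */
static unsigned long parse_cpumap(const char *text)
{
    char *end;
    unsigned long mask = strtoul(text, &end, 16);

    assert(end != text); /* at least one hex digit was consumed */
    return mask;
}

/* Is CPU 'cpu' present in the mask? */
static int cpu_in_mask(unsigned long mask, int cpu)
{
    return (int)((mask >> cpu) & 1UL);
}

/* Number of CPUs present in the mask. */
static int cpus_in_mask(unsigned long mask)
{
    int n = 0;

    for (; mask; mask >>= 1)
        n += (int)(mask & 1UL);
    return n;
}
```

For example, the x460's node0 value 0000000e decodes to CPUs 1 to 3, and node1's 000000f1 to CPUs 0 and 4 to 7.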
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-28 21:02 ` Ravikiran G Thirumalai
@ 2005-09-28 22:50 ` Christoph Lameter
2005-09-29 16:43 ` Petr Vandrovec
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-28 22:50 UTC
To: Andrew Morton
Cc: Ravikiran G Thirumalai, Petr Vandrovec, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen

On Wed, 28 Sep 2005, Ravikiran G Thirumalai wrote:

> Just might be relevant here, I found a bug with the recent
> x86_64 changes to 2.6.14-rc* which causes the node_to_cpumask[] to go bad for
> the boot processor. This happens on both amd and em64t boxes. I guess the
> keventd/0 cpus_allowed mask might be changed by the bad node_to_cpumask[]
> here?

Andrew, could we add the following patch to the kernel to detect such conditions
in the future? This code will only be compiled in if slab debugging is enabled.

---
[SLAB] Add additional debugging to detect slabs from the wrong node

This patch adds some stack dumps if the slab logic is processing slab blocks
from the wrong node. This is necessary in order to detect situations as
encountered by Petr.
Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.14-rc2/mm/slab.c
===================================================================
--- linux-2.6.14-rc2.orig/mm/slab.c 2005-09-27 13:22:30.000000000 -0700
+++ linux-2.6.14-rc2/mm/slab.c 2005-09-28 15:46:31.000000000 -0700
@@ -2421,6 +2421,7 @@ retry:
 next = slab_bufctl(slabp)[slabp->free];
 #if DEBUG
 slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
+ WARN_ON(numa_node_id() != slabp->nodeid);
 #endif
 slabp->free = next;
 }
@@ -2635,8 +2636,10 @@ static void free_block(kmem_cache_t *cac
 check_spinlock_acquired_node(cachep, node);
 check_slabp(cachep, slabp);
-
 #if DEBUG
+ /* Verify that the slab belongs to the intended node */
+ WARN_ON(slabp->nodeid != node);
+
 if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 printk(KERN_ERR "slab: double free detected in cache "
 "'%s', objp %p\n", cachep->name, objp);

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-28 21:02 ` Ravikiran G Thirumalai
2005-09-28 22:50 ` Christoph Lameter
@ 2005-09-29 16:43 ` Petr Vandrovec
2005-09-29 18:11 ` Ravikiran G Thirumalai
2005-09-30 5:45 ` Ravikiran G Thirumalai
1 sibling, 2 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-29 16:43 UTC
To: Ravikiran G Thirumalai
Cc: Andrew Morton, Christoph Lameter, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen

Ravikiran G Thirumalai wrote:

Unfortunately I must confirm that it does not fix the problem. But it pointed
out to me another thing - proc_inode_cache stuff is put into caches
BEFORE this code is executed. So if anything in mm/slab.c relies
on node_to_cpumask[] being valid (and if it relies on some other things
which are set this late), it probably won't work.

Petr

> On Tue, Sep 20, 2005 at 03:58:16PM +0200, Petr Vandrovec wrote:
> Index: linux-2.6.14-rc1/arch/x86_64/mm/numa.c
> ===================================================================
> --- linux-2.6.14-rc1.orig/arch/x86_64/mm/numa.c 2005-09-19 17:58:10.000000000 -0700
> +++ linux-2.6.14-rc1/arch/x86_64/mm/numa.c 2005-09-27 01:34:20.000000000 -0700
> @@ -178,7 +178,6 @@
> rr++;
> }
>
> - set_bit(0, &node_to_cpumask[cpu_to_node(0)]);
> }
>
> #ifdef CONFIG_NUMA_EMU
> @@ -266,9 +265,7 @@
>
> __cpuinit void numa_add_cpu(int cpu)
> {
> - /* BP is initialized elsewhere */
> - if (cpu)
> - set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> + set_bit(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> }
>
> unsigned long __init numa_free_all_bootmem(void)
>

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-29 16:43 ` Petr Vandrovec
@ 2005-09-29 18:11 ` Ravikiran G Thirumalai
2005-09-29 18:38 ` Christoph Lameter
2005-09-30 5:45 ` Ravikiran G Thirumalai
1 sibling, 1 reply; 43+ messages in thread
From: Ravikiran G Thirumalai @ 2005-09-29 18:11 UTC
To: Petr Vandrovec
Cc: Andrew Morton, Christoph Lameter, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen, bos

On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> Ravikiran G Thirumalai wrote:
>
> Unfortunately I must confirm that it does not fix the problem. But it pointed
> out to me another thing - proc_inode_cache stuff is put into caches
> BEFORE this code is executed. So if anything in mm/slab.c relies
> on node_to_cpumask[] being valid (and if it relies on some other things
> which are set this late), it probably won't work.

Hmmm. Another data point for this bug. Bryan, who encountered the same bug
on his box just tried 2.6.13 stock + numa slab patches from 2.6.13-mm s, and
apparently, the kernel booted up on his opteron. So I guess we should
concentrate on the x86_64 bootup part.

Thanks,
Kiran

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-29 18:11 ` Ravikiran G Thirumalai
@ 2005-09-29 18:38 ` Christoph Lameter
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-29 18:38 UTC
To: Ravikiran G Thirumalai
Cc: Petr Vandrovec, Andrew Morton, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen, bos

On Thu, 29 Sep 2005, Ravikiran G Thirumalai wrote:

> Hmmm. Another data point for this bug. Bryan, who encountered the same bug
> on his box just tried 2.6.13 stock + numa slab patches from 2.6.13-mm s, and
> apparently, the kernel booted up on his opteron. So I guess we should
> concentrate on the x86_64 bootup part.

Careful with the patchsets. Some of them contain my fix that masks the problem.
Be sure to either have the WARN_ON statements in there that check for valid
node numbers or use a version before I added the node parameter to free_block.

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-29 16:43 ` Petr Vandrovec
2005-09-29 18:11 ` Ravikiran G Thirumalai
@ 2005-09-30 5:45 ` Ravikiran G Thirumalai
2005-09-30 6:05 ` Andrew Morton
2005-09-30 16:55 ` Christoph Lameter
1 sibling, 2 replies; 43+ messages in thread
From: Ravikiran G Thirumalai @ 2005-09-30 5:45 UTC
To: Petr Vandrovec
Cc: Andrew Morton, Christoph Lameter, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen

On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> Ravikiran G Thirumalai wrote:
>
> Unfortunately I must confirm that it does not fix the problem. But it pointed
> out to me another thing - proc_inode_cache stuff is put into caches
> BEFORE this code is executed. So if anything in mm/slab.c relies
> on node_to_cpumask[] being valid (and if it relies on some other things
> which are set this late), it probably won't work.

The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP]
is not set up early enough by numa_init_array due to the x86_64 changes in
2.6.14-rc*, and unfortunately set wrongly by the workaround code in
numa_init_array(). cpu_to_node[0] gets set to 1 early and later gets set
properly to 0 during identify_cpu() when all cpus are brought up, but
confusing the numa slab in the process.

Here is a quick fix for this. The right fix obviously is to have
cpu_to_node[bsp] set up early for numa_init_array(). The following patch
will fix the problem now, and the code can stay on even when cpu_to_node[BP]
gets fixed early correctly.

Thanks to Petr for access to his box.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>

Index: slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c
===================================================================
--- slab-x86_64-fix-2.6.14-rc2.orig/arch/x86_64/mm/numa.c 2005-09-29 20:39:25.000000000 -0700
+++ slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c 2005-09-29 21:38:05.000000000 -0700
@@ -167,15 +167,14 @@
 mapping. To avoid this fill in the mapping for all possible
 CPUs, as the number of CPUs is not known yet.
 We round robin the existing nodes. */
- rr = 0;
+ rr = first_node(node_online_map);
 for (i = 0; i < NR_CPUS; i++) {
 if (cpu_to_node[i] != NUMA_NO_NODE)
 continue;
+ cpu_to_node[i] = rr;
 rr = next_node(rr, node_online_map);
 if (rr == MAX_NUMNODES)
 rr = first_node(node_online_map);
- cpu_to_node[i] = rr;
- rr++;
 }

 }

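The effect of moving the assignment can be reproduced outside the kernel. This hypothetical sketch copies the shape of the old and new loops, with simplified stand-ins sk_next_node() and sk_first_node() for the kernel's next_node() and first_node() on an unsigned long bitmap, and made-up sizes (4 CPUs, nodes 0 and 1 online, every entry still unset). In the old loop the first next_node() call already skips node 0, so CPU 0 is mapped to node 1, which matches the report that cpu_to_node[0] is set to 1 early; in this model the extra rr++ also sends every later CPU to node 0. The patched loop starts from first_node() and assigns before advancing, giving 0, 1, 0, 1.

```c
#include <assert.h>

#define SK_NR_CPUS      4
#define SK_MAX_NUMNODES 8
#define SK_NO_NODE      (-1)

/* Simplified stand-in for next_node(): the next set bit after n, or
 * SK_MAX_NUMNODES if there is none. */
static int sk_next_node(int n, unsigned long map)
{
    for (int b = n + 1; b < SK_MAX_NUMNODES; b++)
        if (map & (1UL << b))
            return b;
    return SK_MAX_NUMNODES;
}

/* Simplified stand-in for first_node(): the lowest set bit. */
static int sk_first_node(unsigned long map)
{
    return sk_next_node(-1, map);
}

/* Loop shape before the patch: advance, assign, then rr++. */
static void sk_fill_old(int cpu_to_node[], unsigned long online)
{
    int rr = 0;

    for (int i = 0; i < SK_NR_CPUS; i++) {
        if (cpu_to_node[i] != SK_NO_NODE)
            continue;
        rr = sk_next_node(rr, online);
        if (rr == SK_MAX_NUMNODES)
            rr = sk_first_node(online);
        cpu_to_node[i] = rr;
        rr++;
    }
}

/* Loop shape after the patch: assign the current node, then advance. */
static void sk_fill_new(int cpu_to_node[], unsigned long online)
{
    int rr = sk_first_node(online);

    assert(rr < SK_MAX_NUMNODES); /* at least one node must be online */
    for (int i = 0; i < SK_NR_CPUS; i++) {
        if (cpu_to_node[i] != SK_NO_NODE)
            continue;
        cpu_to_node[i] = rr;
        rr = sk_next_node(rr, online);
        if (rr == SK_MAX_NUMNODES)
            rr = sk_first_node(online);
    }
}
```

Entries that already hold a node (for example, set up from SRAT) are skipped by both loops, exactly as the NUMA_NO_NODE test in the patch does.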
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 5:45 ` Ravikiran G Thirumalai
@ 2005-09-30 6:05 ` Andrew Morton
2005-09-30 6:28 ` Ravikiran G Thirumalai
2005-09-30 16:55 ` Christoph Lameter
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2005-09-30 6:05 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: vandrove, clameter, alokk, linux-kernel, manfred, shai, ananth, ak
Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> On Thu, Sep 29, 2005 at 06:43:05PM +0200, Petr Vandrovec wrote:
> > Ravikiran G Thirumalai wrote:
> >
> > Unfortunately I must confirm that it does not fix problem. But it pointed
> > out to me another thing - proc_inode_cache stuff is put into caches
> > BEFORE this code is executed. So if anything in mm/slab.c relies
> > on node_to_mask[] being valid (and if it relies on some other things
> > which are set this late), it probably won't work.
>
> The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP]
> is not setup early enough by numa_init_array due to the x86_64 changes in
> 2.6.14-rc*, and unfortunately set wrongly by the work around code in
> numa_init_array(). cpu_to_node[0] gets set with 1 early and later gets set
> properly to 0 during identify_cpu() when all cpus are brought up, but
> confusing the numa slab in the process.
>
> Here is a quick fix for this. The right fix obviously is to have
> cpu_to_node[bsp] setup early for numa_init_array(). The following patch
> will fix the problem now, and the code can stay on even when cpu_to_node[BP]
> gets fixed early correctly.
>
> Thanks to Petr for access to his box.
>
> Signed off by: Ravikiran Thirumalai <kiran@scalex86.org>
> Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
>
> Index: slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c
> ===================================================================
> --- slab-x86_64-fix-2.6.14-rc2.orig/arch/x86_64/mm/numa.c 2005-09-29 20:39:25.000000000 -0700
> +++ slab-x86_64-fix-2.6.14-rc2/arch/x86_64/mm/numa.c 2005-09-29 21:38:05.000000000 -0700
> @@ -167,15 +167,14 @@
> mapping. To avoid this fill in the mapping for all possible
> CPUs, as the number of CPUs is not known yet.
> We round robin the existing nodes. */
> - rr = 0;
> + rr = first_node(node_online_map);
> for (i = 0; i < NR_CPUS; i++) {
> if (cpu_to_node[i] != NUMA_NO_NODE)
> continue;
> + cpu_to_node[i] = rr;
> rr = next_node(rr, node_online_map);
> if (rr == MAX_NUMNODES)
> rr = first_node(node_online_map);
> - cpu_to_node[i] = rr;
> - rr++;
> }
>
> }
Is this fix independent from
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm2/broken-out/x86_64-fix-the-bp-node_to_cpumask.patch
?
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 6:05 ` Andrew Morton
@ 2005-09-30 6:28 ` Ravikiran G Thirumalai
2005-09-30 15:16 ` Bryan O'Sullivan
0 siblings, 1 reply; 43+ messages in thread
From: Ravikiran G Thirumalai @ 2005-09-30 6:28 UTC (permalink / raw)
To: Andrew Morton
Cc: vandrove, clameter, alokk, linux-kernel, manfred, shai, ananth, ak
On Thu, Sep 29, 2005 at 11:05:40PM -0700, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> Is this fix independent from
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm2/broken-out/x86_64-fix-the-bp-node_to_cpumask.patch
> ?
Yes.
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 6:28 ` Ravikiran G Thirumalai
@ 2005-09-30 15:16 ` Bryan O'Sullivan
2005-09-30 15:57 ` Christoph Lameter
2005-09-30 20:11 ` Andi Kleen
0 siblings, 2 replies; 43+ messages in thread
From: Bryan O'Sullivan @ 2005-09-30 15:16 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: Andrew Morton, vandrove, clameter, alokk, linux-kernel, manfred, shai, ananth, ak
On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:
> Yes.
Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
patch, but not without it.
Thanks for your help.
<b
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 15:16 ` Bryan O'Sullivan
@ 2005-09-30 15:57 ` Christoph Lameter
2005-09-30 16:45 ` Bryan O'Sullivan
2005-09-30 20:11 ` Andi Kleen
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-30 15:57 UTC (permalink / raw)
To: Bryan O'Sullivan
Cc: Ravikiran G Thirumalai, Andrew Morton, vandrove, alokk, linux-kernel, manfred, shai, ananth, ak
On Fri, 30 Sep 2005, Bryan O'Sullivan wrote:
> Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> patch, but not without it.
The patch is not in rc2-mm2 right? I can now reproduce it on a AMD64
single processor with numa emulation (numa=fake=2). So all x86_64 NUMA
systems will throw these same stacktraces for rc2-mm2?
Bootdata ok (command line is root=/dev/sda2 console=tty0 console=ttyS0,38400n8 numa=fake=2)
Linux version 2.6.14-rc2-mm2 (christoph@qirst.com) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #2 SMP Fri Sep 30 15:49:50 UTC 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000fefffc00 - 00000000ff000000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Faking node 0 at 0000000000000000-000000000fffffff (255MB)
Faking node 1 at 0000000010000000-000000003fff0000 (767MB)
Bootmem setup node 0 0000000000000000-000000000fffffff
Bootmem setup node 1 0000000010000000-000000003fff0000
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:12 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:bec00000)
Built 2 zonelists
Initializing CPU#0
Kernel command line: root=/dev/sda2 console=tty0 console=ttyS0,38400n8 numa=fake=2
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2210.110 MHz processor.
...
powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.50.3)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x2 (1500 mV)
powernow-k8: 1 : fid 0xc (2000 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0xa (1800 MHz), vid 0xa (1300 mV)
powernow-k8: 3 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
cpu_init done, current fid 0xe, vid 0x2
Badness in cache_alloc_refill at mm/slab.c:2424
Call Trace:<ffffffff80164f8a>{cache_alloc_refill+458} <ffffffff80165858>{kmem_cache_alloc+136}
<ffffffff8019f41e>{alloc_vfsmnt+30} <ffffffff801893d2>{do_kern_mount+82}
<ffffffff8018fd5f>{getname+31} <ffffffff801a07ef>{do_new_mount+143}
<ffffffff801a0f5b>{do_mount+363} <ffffffff8018fd5f>{getname+31}
<ffffffff8015ee60>{__get_free_pages+16} <ffffffff801a134b>{sys_mount+139}
<ffffffff8010b4ae>{name_to_dev_t+62} <ffffffff80514e70>{prepare_namespace+80}
<ffffffff8010b16a>{init+250} <ffffffff8010ed2e>{child_rip+8}
<ffffffff8010b070>{init+0} <ffffffff8010ed26>{child_rip+0}
Badness in cache_alloc_refill at mm/slab.c:2424
Call Trace:<ffffffff80164f8a>{cache_alloc_refill+458} <ffffffff80165858>{kmem_cache_alloc+136}
<ffffffff8019f41e>{alloc_vfsmnt+30} <ffffffff801893d2>{do_kern_mount+82}
<ffffffff8018fd5f>{getname+31} <ffffffff801a07ef>{do_new_mount+143}
<ffffffff801a0f5b>{do_mount+363} <ffffffff8018fd5f>{getname+31}
<ffffffff8015ee60>{__get_free_pages+16} <ffffffff801a134b>{sys_mount+139}
<ffffffff8010b4ae>{name_to_dev_t+62} <ffffffff80514e70>{prepare_namespace+80}
<ffffffff8010b16a>{init+250} <ffffffff8010ed2e>{child_rip+8}
<ffffffff8010b070>{init+0} <ffffffff8010ed26>{child_rip+0}
Badness in cache_alloc_refill at mm/slab.c:2424
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 15:57 ` Christoph Lameter
@ 2005-09-30 16:45 ` Bryan O'Sullivan
0 siblings, 0 replies; 43+ messages in thread
From: Bryan O'Sullivan @ 2005-09-30 16:45 UTC (permalink / raw)
To: Christoph Lameter
Cc: Ravikiran G Thirumalai, Andrew Morton, vandrove, alokk, linux-kernel, manfred, shai, ananth, ak
On Fri, 2005-09-30 at 08:57 -0700, Christoph Lameter wrote:
> On Fri, 30 Sep 2005, Bryan O'Sullivan wrote:
>
> > Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> > patch, but not without it.
>
> The patch is not in rc2-mm2 right?
Correct.
> I can now reproduce it on a AMD64
> single processor with numa emulation (numa=fake=2).
That's helpful for reproducing the problem. Thanks.
> So all x86_64 NUMA
> systems will throw these same stacktraces for rc2-mm2?
I've only tried with HDAMA motherboards, but based on your report and
Petr's, it seems somewhat likely.
<b
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 15:16 ` Bryan O'Sullivan
2005-09-30 15:57 ` Christoph Lameter
@ 2005-09-30 20:11 ` Andi Kleen
2005-09-30 20:23 ` Ravikiran G Thirumalai
1 sibling, 1 reply; 43+ messages in thread
From: Andi Kleen @ 2005-09-30 20:11 UTC (permalink / raw)
To: Bryan O'Sullivan
Cc: Ravikiran G Thirumalai, Andrew Morton, vandrove, clameter, alokk, linux-kernel, manfred, shai, ananth
On Friday 30 September 2005 17:16, Bryan O'Sullivan wrote:
> On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:
> > Yes.
>
> Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> patch, but not without it.
>
> Thanks for your help.
It's already on its way to Linus. Thanks Kiran.
BTW for my defense: my NUMA boxes booted just fine with the original
patchkit.
-Andi
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 20:11 ` Andi Kleen
@ 2005-09-30 20:23 ` Ravikiran G Thirumalai
0 siblings, 0 replies; 43+ messages in thread
From: Ravikiran G Thirumalai @ 2005-09-30 20:23 UTC (permalink / raw)
To: Andi Kleen
Cc: Bryan O'Sullivan, Andrew Morton, vandrove, clameter, alokk, linux-kernel, manfred, shai, ananth
On Fri, Sep 30, 2005 at 10:11:15PM +0200, Andi Kleen wrote:
> On Friday 30 September 2005 17:16, Bryan O'Sullivan wrote:
> > On Thu, 2005-09-29 at 23:28 -0700, Ravikiran G Thirumalai wrote:
> > > Yes.
> >
> > Kiran, your patch works for me, too. I can boot 2.6.14-rc2 with your
> > patch, but not without it.
> >
> > Thanks for your help.
>
> It's already on its way to Linus. Thanks Kiran.
Thanks are also due to Alok for spending long hours trying out all
combinations on Petr's box. Thanks Alok.
Kiran
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-30 5:45 ` Ravikiran G Thirumalai
2005-09-30 6:05 ` Andrew Morton
@ 2005-09-30 16:55 ` Christoph Lameter
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-30 16:55 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: Petr Vandrovec, Andrew Morton, alokk, linux-kernel, manfred, Shai Fultheim (Shai@scalex86.org), ananth, Andi Kleen
On Thu, 29 Sep 2005, Ravikiran G Thirumalai wrote:
> Here is a quick fix for this. The right fix obviously is to have
> cpu_to_node[bsp] setup early for numa_init_array(). The following patch
> will fix the problem now, and the code can stay on even when cpu_to_node[BP]
> gets fixed early correctly.
This fixes the problem that I can produce by booting with numa=fake=2
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 18:29 ` Andrew Morton
2005-09-19 18:51 ` Christoph Lameter
@ 2005-09-19 18:56 ` Petr Vandrovec
2005-09-19 19:08 ` Christoph Lameter
1 sibling, 1 reply; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-19 18:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Christoph Lameter
Andrew Morton wrote:
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
>>Andrew Morton wrote:
>>
>>>Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>>>
>>>
>>>>Andrew Morton wrote:
>>>>
>>>>>Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>>>>>
>>>>>
>>>>>> so now once crashes on UP system were sorted out, I tried to
>>>>>>put new kernel on my SMP host - and sorry to say, but it does not
>>>>>>seem to work as advertised :-(
>>>>>
>>>>>.config (again), please.
>>>>
>>>>Any SMP with NUMA. One which I'm trying to debug now is attached.
>>>>It is available at http://vana.vc.cvut.cz/config as well.
>>>
>>>I can get 2.6.14-rc1 to crash with your .config, but current -linus is OK.
>>
>>It still dies for me, with current git (tree 7513cdadc661cfe0bd1625145a4876e54df191ca,
>>commit 6c0741fbdee5bd0f8ed13ac287c4ab18e8ba7d83). Config is available at
>>http://platan.vc.cvut.cz/config-vana.txt. Box is dual opteron Tyan K8W, S2885.
>>
>>...
>>
>> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
>> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
>>----------- [cut here ] --------- [please bite here ] ---------
>>Kernel BUG at mm/slab.c:1849
>>invalid operand: 0000 [1] SMP
>>CPU 0
>>Modules linked in:
>>Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-6c07 #1
>>RIP: 0010:[<ffffffff8016d316>] <ffffffff8016d316>{free_block+294}
>>RSP: 0000:ffff81007ff21d88 EFLAGS: 00010002
>>RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
>>RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
>>RBP: ffff81007ffde000 R08: ffff81003ffa0d50 R09: 0000000000000000
>>R10: 00000000ffffffff R11: 0000000000000000 R12: ffff81007ffc9b50
>>R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
>>FS: 0000000000000000(0000) GS:ffffffff805f2800(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
>>Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
>>Stack: 0000000000000000 0000000000000000 0000000000000292 hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
>>0000000200000000
>> ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
>> 0000000000000000 ffff81007ffda080
>>Call Trace:<ffffffff8016e8b7>{drain_array_locked+167} <ffffffff8016e9f7>{cache_reap+231}
>> <ffffffff80131e23>{__wake_up+67} <ffffffff8016e910>{cache_reap+0}
>> <ffffffff8014930c>{worker_thread+476} <ffffffff80131d60>{default_wake_function+0}
>> <ffffffff80131d60>{default_wake_function+0} <ffffffff80149130>{worker_thread+0}
>> <ffffffff8014db82>{kthread+146} <ffffffff8010ec22>{child_rip+8}
>> <ffffffff80149130>{worker_thread+0} <ffffffff8014daf0>{kthread+0}
>> <ffffffff8010ec1a>{child_rip+0}
>>
>>Code: 0f 0b 68 9d 26 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
>>RIP <ffffffff8016d316>{free_block+294} RSP <ffff81007ff21d88>
>> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>
> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.
>
> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.
I've thought that this is problem, but as far as I can tell while this is
problem it does not happen here. Just free_block() finds that pointer it
got from caller belongs to the slab that belongs to the CPU#1/node#1
while caller obtained lock on CPU#0/node#0 structures. Which suggests
that drain_array_locked() was issued with node #0 while array_cache->entry
it got contains blocks which belong to node #1. Which I cannot explain.
Petr
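Andrew's suggestion to evaluate numa_node_id() once and cache it in a local variable can be illustrated with a minimal sketch. It models no real kernel API. `model_numa_node_id()` is a hypothetical stand-in for numa_node_id() that replays a scripted sequence of node ids, so a migration between two calls is simulated by the sequence {0, 1}. As Petr notes above, this race is not what fired here, because keventd is pinned to one CPU. The sketch only shows why re-reading the node id is fragile.

```c
#include <assert.h>

/* Hypothetical model: model_numa_node_id() stands in for numa_node_id().
 * It replays a scripted sequence of node ids, so a "migration" between
 * two calls is simulated by a sequence such as {0, 1}. */
struct node_source {
    const int *seq;  /* node ids returned by successive calls */
    int len;         /* number of entries in seq */
    int pos;         /* index of the next entry to return */
};

static int model_numa_node_id(struct node_source *src)
{
    assert(src->pos < src->len);
    return src->seq[src->pos++];
}

struct reap_result {
    int locked_node;  /* node whose list_lock the caller took */
    int freed_node;   /* node the objects are handed back to */
};

/* Re-evaluates the node id at each step, so the two results can disagree. */
static struct reap_result reap_rereading(struct node_source *src)
{
    struct reap_result r;
    r.locked_node = model_numa_node_id(src);
    r.freed_node = model_numa_node_id(src);
    return r;
}

/* Evaluates the node id once and reuses the cached local value. */
static struct reap_result reap_cached(struct node_source *src)
{
    struct reap_result r;
    int node = model_numa_node_id(src);
    r.locked_node = node;
    r.freed_node = node;
    return r;
}
```

Given the sequence {0, 1}, `reap_rereading()` locks node 0 but frees to node 1. That is the same kind of mismatch that the check at mm/slab.c:1849 catches. `reap_cached()` always uses the same node for both steps.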
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-19 18:56 ` Petr Vandrovec
@ 2005-09-19 19:08 ` Christoph Lameter
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-19 19:08 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: Andrew Morton, linux-kernel, alokk
On Mon, 19 Sep 2005, Petr Vandrovec wrote:
> I've thought that this is problem, but as far as I can tell while this is
> problem it does not happen here. Just free_block() finds that pointer it
> got from caller belongs to the slab that belongs to the CPU#1/node#1
> while caller obtained lock on CPU#0/node#0 structures. Which suggests
> that drain_array_locked() was issued with node #0 while array_cache->entry
> it got contains blocks which belong to node #1. Which I cannot explain.
That can happen if node 0 runs out of memory and the page_allocator falls
back to take memory from node 1 for node 0 requests. Maybe we have a
problem here.
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Alok Kataria @ 2005-09-23 19:34 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]
On Wed, 2005-09-21 at 06:33, Christoph Lameter wrote:
Hi Christoph,
I have some doubts over this...
>On Tue, 20 Sep 2005, Petr Vandrovec wrote:
>
>> slab belonging to node#1, while having acquired lock for cachep belonging
>> to node #0. Due to this check_spinlock_acquired_node(cachep, nodeid) fails
>> (check_spinlock_acquired_node(cachep, 0) would succeed).
>
>Hmmm. If a node runs out of memory then pages from another node may end up
>on the slab list of a node. But it seems that free_block cannot handle
>that properly.
>
>How are you producing the problem?
>
>Could you try the following patch:
>
>---
>
>The numa slab allocator may allocate pages from foreign nodes onto the lists
>for a particular node if a node runs out of memory. Inspecting the slab->nodeid
>field will not reflect that the page is now in use for the slabs of another node.
>
>
IMO the slab->nodeid field just lets us know to which nodes list3 is
this slab attached, irrespective of the node from
which node the memory was got.
>This patch fixes that issue by adding a node field to free_block so that the caller
>can indicate which node currently uses a slab.
>
>
>
But the nodeid is already accessible through the slab-descriptor of this
object, and this nodeid is set in the cache_grow
function.
>Also removes the check for the current node from kmalloc_cache_node since the
>process may shift later to another node which may lead to an allocation on another
>node than intended.
>
>
Yeah that is possible, but won't putting a check in __cache_alloc_node
after disabling the interrupt be better, because
kmalloc_node/kmem_cache_alloc_node can be called at runtime as well, and
getting the object directly from the slabs, instead of the arraycaches
may slow up things.
Thus tweaking the patch a little.
Thanks & Regards,
Alok
[-- Attachment #2: cache_alloc_node.patch --]
[-- Type: text/x-patch, Size: 1880 bytes --]
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c 2005-09-24 00:08:00.221900000 +0530
+++ linux-2.6.13/mm/slab.c 2005-09-24 00:24:12.206645250 +0530
@@ -2507,16 +2507,12 @@
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif
-
-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *____cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
{
- unsigned long save_flags;
void* objp;
struct array_cache *ac;
- cache_alloc_debugcheck_before(cachep, flags);
-
- local_irq_save(save_flags);
+ check_irq_off();
ac = ac_data(cachep);
if (likely(ac->avail)) {
STATS_INC_ALLOCHIT(cachep);
@@ -2526,6 +2522,18 @@
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags);
}
+ return objp;
+}
+
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+ unsigned long save_flags;
+ void* objp;
+
+ cache_alloc_debugcheck_before(cachep, flags);
+
+ local_irq_save(save_flags);
+ objp = ____cache_alloc(cachep, flags);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
return objp;
@@ -2841,7 +2849,7 @@
unsigned long save_flags;
void *ptr;
- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
return __cache_alloc(cachep, flags);
if (unlikely(!cachep->nodelists[nodeid])) {
@@ -2852,6 +2860,8 @@
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
+ if (nodeid == numa_node_id())
+ ____cache_alloc(cachep, flags);
ptr = __cache_alloc_node(cachep, flags, nodeid);
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-23 19:34 Alok Kataria
@ 2005-09-23 23:57 ` Christoph Lameter
2005-09-24 0:05 ` Christoph Lameter
2005-09-24 12:52 ` Manfred Spraul
2 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-23 23:57 UTC (permalink / raw)
To: Alok Kataria; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
On Sat, 24 Sep 2005, Alok Kataria wrote:
> But the nodeid is already accessible through the slab-descriptor of this
> object, and this nodeid is set in the cache_grow
> function.
Correct. We still have no explanation why the slab was later assigned to
the wrong node. The patch fixes the locking issue though because the wrong
nodeid field is now ignored. There is certainly more to fix here.
> > Also removes the check for the current node from kmalloc_cache_node since the
> > process may shift later to another node which may lead to an allocation on another
> > node than intended.
> >
> Yeah that is possible, but won't putting a check in __cache_alloc_node after
> disabling the interrupt be better, because kmalloc_node/kmem_cache_alloc_node
> can be called at runtime as well, and getting the object directly from the
> slabs, instead of the arraycaches may slow up things.
> Thus tweaking the patch a little.
Good
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
@ 2005-09-24 0:05 ` Christoph Lameter
2005-09-24 12:52 ` Manfred Spraul
2 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2005-09-24 0:05 UTC (permalink / raw)
To: Alok Kataria; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
Patch was not sent inline. So this is going to look a bit strange.
Comments on the code:
@@ -2852,6 +2860,8 @@
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
+ if (nodeid == numa_node_id())
+ ____cache_alloc(cachep, flags);
ptr = __cache_alloc_node(cachep, flags, nodeid);
This should be
ptr = ___cache_alloc(cachep, flags)
else
ptr = __cache_alloc_node(...)
right?
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr,
__builtin_return_address(0));
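Christoph's correction concerns the dispatch at the end of kmem_cache_alloc_node(). In the first version, the result of ____cache_alloc() was discarded and __cache_alloc_node() ran unconditionally. The following hypothetical userspace model is not the slab code. `model_local_alloc()` and `model_node_alloc()` are stubs that only record which path ran. The model shows the intended shape: the node comparison and the allocation happen together, and both branches assign `ptr`.

```c
#include <assert.h>

/* Hypothetical model of the corrected dispatch; names are illustrative.
 * The two allocators are stubs that just record which path ran. */
enum alloc_path { PATH_NONE, PATH_LOCAL, PATH_NODE };

static enum alloc_path last_path = PATH_NONE;
static char local_obj, node_obj;

/* Stands in for ____cache_alloc(): the per-CPU array-cache fast path. */
static void *model_local_alloc(void)
{
    last_path = PATH_LOCAL;
    return &local_obj;
}

/* Stands in for __cache_alloc_node(): allocation from a specific node. */
static void *model_node_alloc(int nodeid)
{
    assert(nodeid >= 0);
    last_path = PATH_NODE;
    return &node_obj;
}

/* this_node models a node id read once with interrupts disabled (after
 * local_irq_save() in the real patch). Both branches assign ptr, so
 * exactly one allocator runs and its result is returned. */
static void *model_alloc_node(int nodeid, int this_node)
{
    void *ptr;
    if (nodeid == this_node)
        ptr = model_local_alloc();
    else
        ptr = model_node_alloc(nodeid);
    return ptr;
}
```

A request for the local node takes the fast path, and any other node goes to the node-specific path. This matches the `if`/`else` in Alok's updated patch below.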
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
2005-09-24 0:05 ` Christoph Lameter
@ 2005-09-24 12:52 ` Manfred Spraul
2 siblings, 0 replies; 43+ messages in thread
From: Manfred Spraul @ 2005-09-24 12:52 UTC (permalink / raw)
To: Alok Kataria
Cc: Christoph Lameter, Petr Vandrovec, Andrew Morton, linux-kernel
Alok Kataria wrote:
>
> IMO the slab->nodeid field just lets us know to which nodes list3 is
> this slab attached, irrespective of the node from
> which node the memory was got.
>
Correct. Otherwise the code wouldn't work on ia32 NUMAQ systems: They
have the whole ZONE_NORMAL in node 0. When a slab is allocated, it's
assigned to the node that did the alloc, regardless of the physical
location of the memory.
--
Manfred
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-25 14:16 Alok Kataria
2005-09-26 18:00 ` Christoph Lameter
0 siblings, 1 reply; 43+ messages in thread
From: Alok Kataria @ 2005-09-25 14:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
On Sat, 2005-09-24 at 05:35, Christoph Lameter wrote:
> Comments on the code:
>
> @@ -2852,6 +2860,8 @@
>
> cache_alloc_debugcheck_before(cachep, flags);
> local_irq_save(save_flags);
> + if (nodeid == numa_node_id())
> + ____cache_alloc(cachep, flags);
> ptr = __cache_alloc_node(cachep, flags, nodeid);
>
> This should be
>
> ptr = ___cache_alloc(cachep, flags)
> else
> ptr = __cache_alloc_node(...)
>
> right?
>
> local_irq_restore(save_flags);
> ptr = cache_alloc_debugcheck_after(cachep, flags, ptr,
> __builtin_return_address(0));
Oh a major blunder !! Updated the patch
--
As pointed out by Christoph, in kmalloc_node we are checking whether the allocation is for the
same node while interrupts are "on"; this may lead to an allocation on another node than intended.
This patch moves the check for the current node to the point where interrupts are disabled,
just before __cache_alloc_node is called.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Cc : Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c 2005-09-25 18:48:16.068349500 +0530
+++ linux-2.6.13/mm/slab.c 2005-09-25 18:48:18.484500500 +0530
@@ -2508,16 +2508,12 @@
#define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
#endif
-
-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *____cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
{
- unsigned long save_flags;
void* objp;
struct array_cache *ac;
- cache_alloc_debugcheck_before(cachep, flags);
-
- local_irq_save(save_flags);
+ check_irq_off();
ac = ac_data(cachep);
if (likely(ac->avail)) {
STATS_INC_ALLOCHIT(cachep);
@@ -2527,6 +2523,18 @@
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags);
}
+ return objp;
+}
+
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+ unsigned long save_flags;
+ void* objp;
+
+ cache_alloc_debugcheck_before(cachep, flags);
+
+ local_irq_save(save_flags);
+ objp = ____cache_alloc(cachep, flags);
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp,
__builtin_return_address(0));
@@ -2844,7 +2852,7 @@
unsigned long save_flags;
void *ptr;
- if (nodeid == numa_node_id() || nodeid == -1)
+ if (nodeid == -1)
return __cache_alloc(cachep, flags);
if (unlikely(!cachep->nodelists[nodeid])) {
@@ -2855,7 +2863,10 @@
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
- ptr = __cache_alloc_node(cachep, flags, nodeid);
+ if (nodeid == numa_node_id())
+ ptr = ____cache_alloc(cachep, flags);
+ else
+ ptr = __cache_alloc_node(cachep, flags, nodeid);
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-25 14:16 Alok Kataria
@ 2005-09-26 18:00 ` Christoph Lameter
2005-09-26 19:34 ` Alok Kataria
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2005-09-26 18:00 UTC (permalink / raw)
To: Alok Kataria; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
On Sun, 25 Sep 2005, Alok Kataria wrote:
> As pointed out by Christoph, in kmalloc_node we are checking whether the allocation is for the
> same node while interrupts are "on"; this may lead to an allocation on another node than intended.
> This patch moves the check for the current node to the point where interrupts are disabled,
> just before __cache_alloc_node is called.
Alokk, could you verify that this patch works?
Petr, could you check that this patch fixes your issue? I am a bit
skeptical. I do not think we have found the real problem yet. We need to
have some way to reproduce the problem if it still persists after applying
Alokk's patch.
* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
2005-09-26 18:00 ` Christoph Lameter
@ 2005-09-26 19:34 ` Alok Kataria
0 siblings, 0 replies; 43+ messages in thread
From: Alok Kataria @ 2005-09-26 19:34 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred
On Mon, 2005-09-26 at 23:30, Christoph Lameter wrote:
> On Sun, 25 Sep 2005, Alok Kataria wrote:
>
> > As pointed out by Christoph, in kmalloc_node we are checking whether the allocation is for the
> > same node while interrupts are "on"; this may lead to an allocation on another node than intended.
> > This patch moves the check for the current node to the point where interrupts are disabled,
> > just before __cache_alloc_node is called.
>
> Alokk, could you verify that this patch works?
Yes it does work at my end, i am still not able to reproduce the BUG so
don't know if we really fix that BUG.
>
> Petr, could you check that this patch fixes your issue? I am a bit
> skeptical. I do not think we have found the real problem yet. We need to
> have some way to reproduce the problem if it still persists after applying
> Alokk's patch.
Yep, that will help, if it still BUG's the information that you provided
with verify_entry will be great.
Thanks & Regards,
Alok
end of thread, other threads:[~2005-09-30 20:23 UTC | newest]
Thread overview: 43+ messages
2005-09-15 16:51 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849 Petr Vandrovec
2005-09-15 17:33 ` Petr Vandrovec
[not found] ` <20050916023005.4146e499.akpm@osdl.org>
[not found] ` <432AA00D.4030706@vc.cvut.cz>
[not found] ` <20050916230809.789d6b0b.akpm@osdl.org>
2005-09-19 16:02 ` Petr Vandrovec
2005-09-19 18:29 ` Andrew Morton
2005-09-19 18:51 ` Christoph Lameter
2005-09-19 19:28 ` Andrew Morton
2005-09-19 21:20 ` Christoph Lameter
2005-09-20 5:16 ` Andrew Morton
2005-09-20 8:34 ` Alok Kataria
2005-09-20 13:58 ` Petr Vandrovec
2005-09-21 1:03 ` Christoph Lameter
2005-09-21 1:22 ` Petr Vandrovec
2005-09-21 15:59 ` Christoph Lameter
2005-09-22 19:52 ` Christoph Lameter
2005-09-22 20:01 ` Andrew Morton
2005-09-22 21:25 ` Petr Vandrovec
2005-09-22 21:32 ` Christoph Lameter
2005-09-22 21:46 ` Andrew Morton
2005-09-22 21:54 ` Christoph Lameter
2005-09-23 0:25 ` Petr Vandrovec
2005-09-28 21:02 ` Ravikiran G Thirumalai
2005-09-28 22:50 ` Christoph Lameter
2005-09-29 16:43 ` Petr Vandrovec
2005-09-29 18:11 ` Ravikiran G Thirumalai
2005-09-29 18:38 ` Christoph Lameter
2005-09-30 5:45 ` Ravikiran G Thirumalai
2005-09-30 6:05 ` Andrew Morton
2005-09-30 6:28 ` Ravikiran G Thirumalai
2005-09-30 15:16 ` Bryan O'Sullivan
2005-09-30 15:57 ` Christoph Lameter
2005-09-30 16:45 ` Bryan O'Sullivan
2005-09-30 20:11 ` Andi Kleen
2005-09-30 20:23 ` Ravikiran G Thirumalai
2005-09-30 16:55 ` Christoph Lameter
2005-09-19 18:56 ` Petr Vandrovec
2005-09-19 19:08 ` Christoph Lameter
-- strict thread matches above, loose matches on Subject: below --
2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
2005-09-24 0:05 ` Christoph Lameter
2005-09-24 12:52 ` Manfred Spraul
2005-09-25 14:16 Alok Kataria
2005-09-26 18:00 ` Christoph Lameter
2005-09-26 19:34 ` Alok Kataria