* [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
From: Mel Gorman @ 2009-04-24 9:51 UTC
To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel Mailing List

On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> Another week, another -rc.
>

I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
although a very clear pattern is not forming as to what exactly is
causing it. However, the libhugetlbfs regression tests (make && make
func) are triggering the following oops when calling mlock() and so are
likely related.

------------[ cut here ]------------
kernel BUG at arch/powerpc/mm/pgtable.c:243!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020
TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980
GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001
GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81
GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000
GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980
GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8
GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000
GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8
NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c
LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c
Call Trace:
[c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable)
[c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654
[c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708
[c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364
[c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0
[c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0
[c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228
[c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128
[c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec
[c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40
Instruction dump:
0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62
79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff 780007c6
---[ end trace 36a7faa04fa9452b ]---

This corresponds to

#ifdef CONFIG_DEBUG_VM
void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
{
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;

        if (mm == &init_mm)
                return;
        pgd = mm->pgd + pgd_index(addr);
        BUG_ON(pgd_none(*pgd));
        pud = pud_offset(pgd, addr);
        BUG_ON(pud_none(*pud));
        pmd = pmd_offset(pud, addr);
        BUG_ON(!pmd_present(*pmd));             <----- THIS LINE
        BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
}
#endif /* CONFIG_DEBUG_VM */

This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
problem with this patch but I can't remember what it was.

Full dmesg is

==== dmesg ====
Using pSeries machine description
Page orders: linear mapping = 24, virtual = 12, io = 12, vmemmap = 24
Found initrd at 0xc000000003300000:0xc000000004b67000
console [udbg0] enabled
Partition configured for 8 cpus.
CPU maps initialized for 2 threads per core (thread shift is 1) Starting Linux PPC64 #1 SMP Fri Apr 24 09:08:10 UTC 2009 ----------------------------------------------------- ppc64_pft_size = 0x1b physicalMemorySize = 0x1e8000000 htab_hash_mask = 0xfffff ----------------------------------------------------- Initializing cgroup subsys cpuset Linux version 2.6.30-rc3-autokern1 (root@elm3a121) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Apr 24 09:08:10 UTC 2009 [boot]0012 Setup Arch Node 0 Memory: 0x0-0xee000000 Node 1 Memory: 0xee000000-0x1e8000000 PCI host bridge /pci@800000020000001 ranges: IO 0x000003fe00100000..0x000003fe001fffff -> 0x0000000000000000 MEM 0x0000040080000000..0x00000400bfffffff -> 0x00000000c0000000 PCI host bridge /pci@800000020000002 ranges: IO 0x000003fe00600000..0x000003fe006fffff -> 0x0000000000000000 MEM 0x0000040100000000..0x000004017fffffff -> 0x0000000080000000 PCI host bridge /pci@800000020000003 ranges: IO 0x000003fe00300000..0x000003fe003fffff -> 0x0000000000000000 MEM 0x00000400c0000000..0x00000400ffffffff -> 0x00000000c0000000 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Using dedicated idle loop Zone PFN ranges: DMA 0x00000000 -> 0x001e8000 Normal 0x001e8000 -> 0x001e8000 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0: 0x00000000 -> 0x000ee000 1: 0x000ee000 -> 0x001e8000 On node 0 totalpages: 974848 DMA zone: 13328 pages used for memmap DMA zone: 0 pages reserved DMA zone: 961520 pages, LIFO batch:31 On node 1 totalpages: 1024000 DMA zone: 14000 pages used for memmap DMA zone: 0 pages reserved DMA zone: 1010000 pages, LIFO batch:31 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. 
Total pages: 1971520 Policy zone: DMA Kernel command line: loglevel=8 autobench_args: root=/dev/sda3 ABAT:1240564260 loglevel=8 NR_IRQS:512 [boot]0020 XICS Init [boot]0021 XICS Done pic: no ISA interrupt controller PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 238.060000 MHz time_init: processor frequency = 1904.480000 MHz clocksource: timebase mult[10cd6fc] shift[22] registered clockevent: decrementer mult[3cf1] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg0] -> real [hvc0] freeing bootmem node 0 freeing bootmem node 1 Memory: 7834904k/7995392k available (7808k kernel code, 160488k reserved, 1312k data, 1010k bss, 324k init) SLUB: Genslabs=14, HWalign=128, Order=0-3, MinObjects=0, CPUs=8, Nodes=16 Calibrating delay loop... 475.13 BogoMIPS (lpj=950272) Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) Mount-cache hash table entries: 256 Initializing cgroup subsys ns Initializing cgroup subsys cpuacct irq: irq 2 on host null mapped to virtual irq 16 clockevent: decrementer mult[3cf1] shift[16] cpu[1] Processor 1 found. clockevent: decrementer mult[3cf1] shift[16] cpu[2] Processor 2 found. clockevent: decrementer mult[3cf1] shift[16] cpu[3] Processor 3 found. clockevent: decrementer mult[3cf1] shift[16] cpu[4] Processor 4 found. clockevent: decrementer mult[3cf1] shift[16] cpu[5] Processor 5 found. clockevent: decrementer mult[3cf1] shift[16] cpu[6] Processor 6 found. clockevent: decrementer mult[3cf1] shift[16] cpu[7] Processor 7 found. 
Brought up 8 CPUs Node 0 CPUs: 0-3 Node 1 CPUs: 4-7 CPU0 attaching sched-domain: domain 0: span 0-1 level SIBLING groups: 0 1 domain 1: span 0-3 level CPU groups: 0-1 2-3 domain 2: span 0-7 level NODE groups: 0-3 (__cpu_power = 2048) 4-7 (__cpu_power = 2048) CPU1 attaching sched-domain: domain 0: span 0-1 level SIBLING groups: 1 0 domain 1: span 0-3 level CPU groups: 0-1 2-3 domain 2: span 0-7 level NODE groups: 0-3 (__cpu_power = 2048) 4-7 (__cpu_power = 2048) CPU2 attaching sched-domain: domain 0: span 2-3 level SIBLING groups: 2 3 domain 1: span 0-3 level CPU groups: 2-3 0-1 domain 2: span 0-7 level NODE groups: 0-3 (__cpu_power = 2048) 4-7 (__cpu_power = 2048) CPU3 attaching sched-domain: domain 0: span 2-3 level SIBLING groups: 3 2 domain 1: span 0-3 level CPU groups: 2-3 0-1 domain 2: span 0-7 level NODE groups: 0-3 (__cpu_power = 2048) 4-7 (__cpu_power = 2048) CPU4 attaching sched-domain: domain 0: span 4-5 level SIBLING groups: 4 5 domain 1: span 4-7 level CPU groups: 4-5 6-7 domain 2: span 0-7 level NODE groups: 4-7 (__cpu_power = 2048) 0-3 (__cpu_power = 2048) CPU5 attaching sched-domain: domain 0: span 4-5 level SIBLING groups: 5 4 domain 1: span 4-7 level CPU groups: 4-5 6-7 domain 2: span 0-7 level NODE groups: 4-7 (__cpu_power = 2048) 0-3 (__cpu_power = 2048) CPU6 attaching sched-domain: domain 0: span 6-7 level SIBLING groups: 6 7 domain 1: span 4-7 level CPU groups: 6-7 4-5 domain 2: span 0-7 level NODE groups: 4-7 (__cpu_power = 2048) 0-3 (__cpu_power = 2048) CPU7 attaching sched-domain: domain 0: span 6-7 level SIBLING groups: 7 6 domain 1: span 4-7 level CPU groups: 6-7 4-5 domain 2: span 0-7 level NODE groups: 4-7 (__cpu_power = 2048) 0-3 (__cpu_power = 2048) net_namespace: 1352 bytes NET: Registered protocol family 16 IBM eBus Device Driver PCI: Probing PCI hardware IOMMU table initialized, virtual merging enabled irq: irq 83 on host null mapped to virtual irq 83 pci 0000:c8:01.0: supports D1 D2 pci 0000:c8:01.0: PME# supported from D0 D1 D2 
D3hot pci 0000:c8:01.0: PME# disabled pci 0000:c8:01.1: supports D1 D2 pci 0000:c8:01.1: PME# supported from D0 D1 D2 D3hot pci 0000:c8:01.1: PME# disabled pci 0000:c8:01.2: supports D1 D2 pci 0000:c8:01.2: PME# supported from D0 D1 D2 D3hot pci 0000:c8:01.2: PME# disabled irq: irq 85 on host null mapped to virtual irq 85 pci 0000:d0:01.0: PME# supported from D0 D3hot D3cold pci 0000:d0:01.0: PME# disabled pci 0000:d0:01.1: PME# supported from D0 D3hot D3cold pci 0000:d0:01.1: PME# disabled irq: irq 87 on host null mapped to virtual irq 87 irq: irq 88 on host null mapped to virtual irq 88 pci 0001:c8:01.0: supports D1 D2 pci 0001:c8:01.0: PME# supported from D0 D1 D2 D3hot pci 0001:c8:01.0: PME# disabled irq: irq 165 on host null mapped to virtual irq 165 irq: irq 167 on host null mapped to virtual irq 167 irq: irq 117 on host null mapped to virtual irq 117 pci 0002:d0:01.0: supports D1 irq: irq 119 on host null mapped to virtual irq 119 irq: irq 115 on host null mapped to virtual irq 115 PCI: Probing PCI hardware done bio: create slab <bio-0> at 0 SCSI subsystem initialized libata version 3.00 loaded. usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb NET: Registered protocol family 2 Switched to high resolution mode on CPU 0 Switched to high resolution mode on CPU 1 Switched to high resolution mode on CPU 2 Switched to high resolution mode on CPU 3 Switched to high resolution mode on CPU 4 Switched to high resolution mode on CPU 5 Switched to high resolution mode on CPU 6 Switched to high resolution mode on CPU 7 IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered NET: Registered protocol family 1 checking if image is initramfs... 
rootfs image is not initramfs (junk in compressed archive); looks like an initrd Freeing initrd memory: 24988k freed irq: irq 655360 on host null mapped to virtual irq 17 irq: irq 589825 on host null mapped to virtual irq 18 RTAS daemon started audit: initializing netlink socket (disabled) type=2000 audit(1240564423.424:1): initialized HugeTLB registered 16 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.2 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Installing knfsd (copyright (C) 1996 okir@monad.swb.de). msgmni has been set to 15351 alg: No test for stdrng (krng) Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) io scheduler noop registered io scheduler anticipatory registered (default) io scheduler deadline registered io scheduler cfq registered matroxfb: Matrox G450 detected PInS data found at offset 31168 PInS memtype = 5 matroxfb: 640x480x8bpp (virtual: 640x26214) matroxfb: framebuffer at 0x40170000000, mapped to 0xd000080080080000, size 33554432 Console: switching to colour frame buffer device 80x30 fb0: MATROX frame buffer device matroxfb_crtc2: secondary head of fb0 was registered as fb1 vio_register_driver: driver hvc_console registering HVSI: registered 0 devices Generic RTC Driver v1.07 Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled brd: module loaded Uniform Multi-Platform E-IDE driver ide-gd driver 1.18 ide-cd driver 5.00 ipr: IBM Power RAID SCSI Device Driver version: 2.4.2 (January 21, 2009) ipr 0000:c0:01.0: enabling device (0140 -> 0142) ipr 0000:c0:01.0: Found IOA with IRQ: 83 ipr 0000:c0:01.0: Initializing IOA. ipr 0000:c0:01.0: Starting IOA initialization sequence. ipr 0000:c0:01.0: Adapter firmware version: 020A004E ipr 0000:c0:01.0: IOA initialized. 
scsi0 : IBM 570B Storage Adapter scsi 0:0:15:0: Enclosure IBM VSBPD3E U4SCSI 4812 PQ: 0 ANSI: 2 scsi: unknown device type 31 scsi 0:255:255:255: No Device IBM 570B001 0150 PQ: 0 ANSI: 0 ipr 0002:c8:01.0: Found IOA with IRQ: 117 ipr 0002:c8:01.0: Starting IOA initialization sequence. ipr 0002:c8:01.0: Adapter firmware version: 020A004E ipr 0002:c8:01.0: IOA initialized. scsi1 : IBM 570B Storage Adapter scsi 1:0:4:0: Direct-Access IBM H0 HUS103014FL3800 RPQF PQ: 0 ANSI: 4 scsi 1:0:5:0: Direct-Access IBM ST373453LC C51A PQ: 0 ANSI: 3 scsi 1:0:15:0: Enclosure IBM VSBPD3E U4SCSI 4812 PQ: 0 ANSI: 2 scsi: unknown device type 31 scsi 1:255:255:255: No Device IBM 570B001 0150 PQ: 0 ANSI: 0 vio_register_driver: driver ibmvscsi registering st: Version 20081215, fixed bufsize 32768, s/g segs 256 Driver 'st' needs updating - please use bus_type methods Driver 'sd' needs updating - please use bus_type methods Driver 'sr' needs updating - please use bus_type methods scsi 0:0:15:0: Attached scsi generic sg0 type 13 scsi 0:255:255:255: Attached scsi generic sg1 type 31 sd 1:0:4:0: Attached scsi generic sg2 type 0 sd 1:0:5:0: Attached scsi generic sg3 type 0 scsi 1:0:15:0: Attached scsi generic sg4 type 13 scsi 1:255:255:255: Attached scsi generic sg5 type 31 Intel(R) PRO/1000 Network Driver - version 7.3.21-k3-NAPI Copyright (c) 1999-2006 Intel Corporation. 
e1000 0000:d0:01.0: enabling device (0140 -> 0143) sd 1:0:4:0: [sda] 286748000 512-byte hardware sectors: (146 GB/136 GiB) sd 1:0:5:0: [sdb] 143374000 512-byte hardware sectors: (73.4 GB/68.3 GiB) sd 1:0:5:0: [sdb] Write Protect is off sd 1:0:5:0: [sdb] Mode Sense: cb 00 10 08 sd 1:0:5:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA sdb: sdb1 sdb2 sdb3 sd 1:0:5:0: [sdb] Attached SCSI disk sd 1:0:4:0: [sda] Write Protect is off sd 1:0:4:0: [sda] Mode Sense: d3 00 10 08 sd 1:0:4:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA e1000: 0000:d0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0d:9c sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection e1000 0000:d0:01.1: enabling device (0140 -> 0143) sd 1:0:4:0: [sda] Attached SCSI disk e1000: 0000:d0:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0d:9d e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI e100: Copyright(c) 1999-2006 Intel Corporation drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03 vio_register_driver: driver ibmveth registering console [netcon0] enabled netconsole: network logging started ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver ehci_hcd 0000:c8:01.2: enabling device (0140 -> 0142) ehci_hcd 0000:c8:01.2: EHCI Host Controller ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1 ehci_hcd 0000:c8:01.2: Enabling legacy PCI PM ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000 ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 5 ports detected ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver ohci_hcd 0000:c8:01.0: OHCI Host Controller ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:c8:01.0: 
irq 85, io mem 0x400a0001000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 3 ports detected ohci_hcd 0000:c8:01.1: OHCI Host Controller ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3 ohci_hcd 0000:c8:01.1: irq 85, io mem 0x400a0000000 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected Initializing USB Mass Storage driver... usbcore: registered new interface driver usb-storage USB Mass Storage support registered. mice: PS/2 mouse device common for all mice md: linear personality registered for level -1 md: raid0 personality registered for level 0 md: raid1 personality registered for level 1 device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com oprofile: using ppc64/power5 performance monitoring. IPv4 over IPv4 tunneling driver TCP cubic registered NET: Registered protocol family 17 RPC: Registered udp transport module. RPC: Registered tcp transport module. registered taskstats version 1 md: Waiting for all devices to be available before autodetect md: If you don't use raid, use raid=noautodetect md: Autodetecting RAID arrays. md: Scanned 0 and added 0 devices. md: autorun ... md: ... autorun DONE. RAMDISK: cramfs filesystem found at block 0 RAMDISK: Loading 24988KiB [1 disk] into ram disk... done. VFS: Mounted root (cramfs filesystem) readonly on device 1:0. Freeing unused kernel memory: 324k freed nf_conntrack version 0.5.0 (16384 buckets, 65536 max) CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or sysctl net.netfilter.nf_conntrack_acct=1 to enable it. ip_tables: (C) 2000-2006 Netfilter Core Team Netfilter messages via NETLINK v0.30. 
loop: module loaded QLogic Fibre Channel HBA Driver: 8.03.01-k1 qla2xxx 0001:d0:01.0: enabling device (0140 -> 0143) qla2xxx 0001:d0:01.0: Found an ISP2300, irq 167, iobase 0xd00008008001a000 qla2xxx 0001:d0:01.0: Configuring PCI space... qla2xxx 0001:d0:01.0: Configure NVRAM parameters... qla2xxx 0001:d0:01.0: Verifying loaded RISC code... qla2xxx 0001:d0:01.0: firmware: requesting ql2300_fw.bin qla2xxx 0001:d0:01.0: Firmware image unavailable. qla2xxx 0001:d0:01.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/. qla2xxx 0001:d0:01.0: Failed to initialize adapter qla2xxx 0002:c0:01.0: enabling device (0140 -> 0143) qla2xxx 0002:c0:01.0: Found an ISP2300, irq 115, iobase 0xd00008008001e000 qla2xxx 0002:c0:01.0: Configuring PCI space... qla2xxx 0002:c0:01.0: Configure NVRAM parameters... qla2xxx 0002:c0:01.0: Verifying loaded RISC code... qla2xxx 0002:c0:01.0: firmware: requesting ql2300_fw.bin qla2xxx 0002:c0:01.0: Firmware image unavailable. qla2xxx 0002:c0:01.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/. qla2xxx 0002:c0:01.0: Failed to initialize adapter kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda3, internal journal e1000: lan0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX ------------[ cut here ]------------ kernel BUG at arch/powerpc/mm/pgtable.c:243! 
Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=128 NUMA pSeries Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000 REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1) MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020 TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3 GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980 GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001 GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81 GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000 GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980 GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8 GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000 GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8 NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c Call Trace: [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable) [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654 [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708 [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364 [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0 [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0 [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228 [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128 [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40 Instruction dump: 0b000000 78892662 79291f24 
7d69582a 7d600074 7800d182 0b000000 78895e62 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff 780007c6
---[ end trace 36a7faa04fa9452b ]---

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
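The order of the checks in the quoted assert_pte_locked() can be modelled in userspace to make it easier to see which condition the reported line corresponds to. This is a minimal sketch, not kernel code: `struct mock_walk`, `enum walk_failure`, and `first_failed_check` are hypothetical names, and the booleans merely stand in for what each BUG_ON() inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the state each BUG_ON() inspects. */
struct mock_walk {
    bool is_init_mm;   /* mm == &init_mm: the function returns early */
    bool pgd_none;     /* top-level entry is empty */
    bool pud_none;     /* upper-level entry is empty */
    bool pmd_present;  /* pmd entry is marked present */
    bool ptl_locked;   /* page table lock is held */
};

enum walk_failure {
    WALK_OK = 0,
    FAIL_PGD_NONE,
    FAIL_PUD_NONE,
    FAIL_PMD_NOT_PRESENT,  /* the check marked "THIS LINE" in the report */
    FAIL_PTL_NOT_LOCKED
};

/* Return the first check that would fire, in the same order as the
 * BUG_ON() calls in the quoted function. */
enum walk_failure first_failed_check(const struct mock_walk *w)
{
    assert(w != NULL);
    if (w->is_init_mm)
        return WALK_OK;
    if (w->pgd_none)
        return FAIL_PGD_NONE;
    if (w->pud_none)
        return FAIL_PUD_NONE;
    if (!w->pmd_present)
        return FAIL_PMD_NOT_PRESENT;
    if (!w->ptl_locked)
        return FAIL_PTL_NOT_LOCKED;
    return WALK_OK;
}
```

In this model, a walk whose pmd entry is not marked present reports FAIL_PMD_NOT_PRESENT before the lock state is ever examined, which matches the position of the flagged line in the function.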
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
From: Michael Ellerman @ 2009-04-24 15:24 UTC
To: Mel Gorman; +Cc: linuxppc-dev, Linus Torvalds, Linux Kernel Mailing List

On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > Another week, another -rc.
> >
>
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.
>
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/mm/pgtable.c:243!
> Oops: Exception in kernel mode, sig: 5 [#1] > SMP NR_CPUS=128 NUMA pSeries > Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx > loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables > nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT > xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables > NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000 > REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1) > MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020 > TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3 > GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980 > GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001 > GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81 > GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000 > GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980 > GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8 > GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000 > GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8 > NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c > LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c > Call Trace: > [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable) > [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654 > [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708 > [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364 > [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0 > [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0 > [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228 > [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128 > [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec > [c0000000ea92be30] [c00000000000852c] 
syscall_exit+0x0/0x40
> Instruction dump:
> 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62
> 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff
> 780007c6
> ---[ end trace 36a7faa04fa9452b ]---
>
> This corresponds to
>
> #ifdef CONFIG_DEBUG_VM
> void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> {
>         pgd_t *pgd;
>         pud_t *pud;
>         pmd_t *pmd;
>
>         if (mm == &init_mm)
>                 return;
>         pgd = mm->pgd + pgd_index(addr);
>         BUG_ON(pgd_none(*pgd));
>         pud = pud_offset(pgd, addr);
>         BUG_ON(pud_none(*pud));
>         pmd = pmd_offset(pud, addr);
>         BUG_ON(!pmd_present(*pmd));             <----- THIS LINE
>         BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
> }
> #endif /* CONFIG_DEBUG_VM */
>
> This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
> in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
> problem with this patch but I can't remember what it was.

It broke modules, but I don't remember anything hugepage related.

So the code changed from:

-#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({                                                                         \
-        int __changed = !pte_same(*(__ptep), __entry);                     \
-        if (__changed) {                                                   \
-                __ptep_set_access_flags(__ptep, __entry, __dirty);         \
-                flush_tlb_page_nohash(__vma, __address);                   \
-        }                                                                  \
-        __changed;                                                         \
-})

to:

+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+                          pte_t *ptep, pte_t entry, int dirty)
+{
+        int changed;
+        if (!dirty && pte_need_exec_flush(entry, 0))
+                entry = do_dcache_icache_coherency(entry);
+        changed = !pte_same(*(ptep), entry);
+        if (changed) {
+                assert_pte_locked(vma->vm_mm, address);
+                __ptep_set_access_flags(ptep, entry);
+                flush_tlb_page_nohash(vma, address);
+        }
+        return changed;
+}

So the call to assert_pte_locked() is new. And it's never going to work
for huge pages, the page table structure is different right? Notice
pte_update() checks (arch/powerpc/include/asm/pgtable-ppc64.h):

 198         /* huge pages use the old page table lock */
 199         if (!huge)
 200                 assert_pte_locked(mm, addr);

But unlike pte_update() ptep_set_access_flags() has no way of knowing
it's been called from huge_ptep_set_access_flags().

So my guess is we either remove the call to assert_pte_locked() in
there, or have assert_pte_locked() check whether it's being called for a
huge pte.

cheers
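The contrast drawn in this message can be sketched in userspace. This is a hedged model, not the kernel code: `mock_assert_pte_locked`, `mock_pte_update`, `mock_ptep_set_access_flags`, and `lock_assertion_count` are hypothetical stand-ins, and integers stand in for PTE values. It shows only that a caller-supplied `huge` flag lets one path skip the lock assertion, while the other path takes no such parameter and so asserts whenever the entry changed.

```c
#include <assert.h>

/* Counts how often the stand-in for assert_pte_locked() runs. */
static int lock_assertions;

/* Hypothetical stand-in: records the call instead of walking page tables. */
static void mock_assert_pte_locked(void)
{
    lock_assertions++;
}

/* Modelled on the quoted pte_update() snippet: the caller says whether
 * the entry is a huge page, and the assertion is skipped if so. */
void mock_pte_update(int huge)
{
    assert(huge == 0 || huge == 1);
    if (!huge)
        mock_assert_pte_locked();
}

/* Modelled on the new ptep_set_access_flags(): no huge flag is passed
 * in, so the assertion runs whenever the entry changed. */
int mock_ptep_set_access_flags(int old_entry, int new_entry)
{
    int changed = (old_entry != new_entry);

    if (changed)
        mock_assert_pte_locked();
    return changed;
}

/* Accessor so callers can observe how many assertions ran. */
int lock_assertion_count(void)
{
    return lock_assertions;
}
```

Calling `mock_pte_update(1)` leaves the counter untouched, whereas any call to `mock_ptep_set_access_flags()` with differing entries increments it, regardless of what kind of mapping the caller had in mind.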
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
From: Mel Gorman @ 2009-04-30 20:59 UTC
To: Michael Ellerman; +Cc: linuxppc-dev, Linus Torvalds, Linux Kernel Mailing List

On Sat, Apr 25, 2009 at 01:24:50AM +1000, Michael Ellerman wrote:
> On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> > On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > > Another week, another -rc.
> > >
> >
> > I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> > although a very clear pattern is not forming as to what exactly is
> > causing it. However, the libhugetlbfs regression tests (make && make
> > func) are triggering the following oops when calling mlock() and so are
> > likely related.
> >
> > ------------[ cut here ]------------
> > kernel BUG at arch/powerpc/mm/pgtable.c:243!
> > Oops: Exception in kernel mode, sig: 5 [#1]
> > SMP NR_CPUS=128 NUMA pSeries
> > Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
> > loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> > xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
> > NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
> > REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1)
> > MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020
> > TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
> > GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980
> > GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001
> > GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81
> > GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000
> > GPR16: c0000000ea92bbf0 c0000000009f06f0
0000001900000113 c0000000ec7d4980 > > GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8 > > GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000 > > GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8 > > NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c > > LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c > > Call Trace: > > [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable) > > [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654 > > [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708 > > [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364 > > [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0 > > [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0 > > [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228 > > [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128 > > [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec > > [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40 > > Instruction dump: > > 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62 > > 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff > > 780007c6 > > ---[ end trace 36a7faa04fa9452b ]--- > > > > This corresponds to > > > > #ifdef CONFIG_DEBUG_VM > > void assert_pte_locked(struct mm_struct *mm, unsigned long addr) > > { > > pgd_t *pgd; > > pud_t *pud; > > pmd_t *pmd; > > > > if (mm == &init_mm) > > return; > > pgd = mm->pgd + pgd_index(addr); > > BUG_ON(pgd_none(*pgd)); > > pud = pud_offset(pgd, addr); > > BUG_ON(pud_none(*pud)); > > pmd = pmd_offset(pud, addr); > > BUG_ON(!pmd_present(*pmd)); <----- THIS LINE > > BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd))); > > } > > #endif /* CONFIG_DEBUG_VM */ > > > > This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36 > > in the 2.6.30-rc1 timeframe. 
I think there was another hugepage-related > > problem with this patch but I can't remember what it was. > > It broke modules, but I don't remember anything hugepage related. > > So the code changed from: > > -#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \ > -({ \ > - int __changed = !pte_same(*(__ptep), __entry); \ > - if (__changed) { \ > - __ptep_set_access_flags(__ptep, __entry, __dirty); \ > - flush_tlb_page_nohash(__vma, __address); \ > - } \ > - __changed; \ > -}) > > to: > > +int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, > + pte_t *ptep, pte_t entry, int dirty) > +{ > + int changed; > + if (!dirty && pte_need_exec_flush(entry, 0)) > + entry = do_dcache_icache_coherency(entry); > + changed = !pte_same(*(ptep), entry); > + if (changed) { > + assert_pte_locked(vma->vm_mm, address); > + __ptep_set_access_flags(ptep, entry); > + flush_tlb_page_nohash(vma, address); > + } > + return changed; > +} > > So the call to assert_pte_locked() is new. And it's never going to work > for huge pages, the page table structure is different right? Right > Notice > pte_update() checks (arch/powerpc/include/asm/pgtable-ppc64.h): > > 198 /* huge pages use the old page table lock */ > 199 if (!huge) > 200 assert_pte_locked(mm, addr); > > But unlike pte_update() ptep_set_access_flags() has no way of knowing > it's been called from huge_ptep_set_access_flags(). > It does because it has the VMA. This patch fixes the problem for me. If it looks ok, I'll resend to Paul Mackerras for the powerpc tree. As Ben says, it doesn't explain the functional difficulties I had in sysbench+postgres so I'll reinvestigate that. ==== CUT HERE ==== powerpc: Do not assert pte_locked for hugepage PTE entries With CONFIG_DEBUG_VM, an assertion is made when changing the protection flags of a PTE that the PTE is locked. 
Huge pages use a different pagetable format and the assertion is bogus and
will always trigger with a bug looking something like

Unable to handle kernel paging request for data at address 0xf1a00235800006f8
Faulting instruction address: 0xc000000000034a80
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA Maple
Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop
    evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr
    libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat
    windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp
    windfarm_core i2c_powermac
NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff
DAR: f1a00235800006f8, DSISR: 0000000040010000
TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
Call Trace:
[c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
[c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
[c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
[c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
[c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
[c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
[c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
Instruction dump:
7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000

This patch fixes the problem by not asserting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 arch/powerpc/mm/pgtable.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index f5c6fd4..ae1d67c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -219,7 +219,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
                 entry = do_dcache_icache_coherency(entry);
         changed = !pte_same(*(ptep), entry);
         if (changed) {
-                assert_pte_locked(vma->vm_mm, address);
+                if (!(vma->vm_flags & VM_HUGETLB))
+                        assert_pte_locked(vma->vm_mm, address);
                 __ptep_set_access_flags(ptep, entry);
                 flush_tlb_page_nohash(vma, address);
         }
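The shape of the guard in the patch above can be modelled outside the kernel. What follows is only a user-space sketch under stated assumptions, not kernel code: `demo_vma`, `demo_counters`, `DEMO_VM_HUGETLB` and `demo_set_access_flags()` are hypothetical stand-ins invented for illustration, a plain `unsigned long` plays the role of a PTE, and a counter increment marks the point where `assert_pte_locked()` would be called in the real `ptep_set_access_flags()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; not taken from kernel headers. */
#define DEMO_VM_HUGETLB 0x00400000UL

/* Hypothetical stand-in for struct vm_area_struct: only the flags matter. */
struct demo_vma {
        unsigned long vm_flags;
};

/* Records what the model did, so the behaviour can be checked. */
struct demo_counters {
        int lock_assertions;    /* times the lock check would have run */
        int updates;            /* times the entry was actually rewritten */
};

/* True when the VMA is backed by huge pages. */
static bool demo_is_hugetlb(const struct demo_vma *vma)
{
        return (vma->vm_flags & DEMO_VM_HUGETLB) != 0;
}

/*
 * Model of the patched function: update the entry only if it changed,
 * and run the lock check only for VMAs that are not hugetlb-backed.
 */
static int demo_set_access_flags(const struct demo_vma *vma,
                                 unsigned long *ptep, unsigned long entry,
                                 struct demo_counters *c)
{
        int changed = (*ptep != entry);

        if (changed) {
                if (!demo_is_hugetlb(vma))
                        c->lock_assertions++;   /* assert_pte_locked() goes here */
                *ptep = entry;
                c->updates++;
        }
        return changed;
}
```

Calling `demo_set_access_flags()` with a VMA whose `vm_flags` has `DEMO_VM_HUGETLB` set still rewrites the entry, but leaves `lock_assertions` untouched. That is the behaviour the patch is after: the update proceeds and only the base-page lock check is skipped.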
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
  2009-04-30 20:59 ` Mel Gorman
@ 2009-04-30 21:48 ` Benjamin Herrenschmidt
  2009-05-18 17:13 ` Mel Gorman
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2009-04-30 21:48 UTC
To: Mel Gorman; +Cc: linuxppc-dev, Linus Torvalds, Linux Kernel Mailing List

On Thu, 2009-04-30 at 21:59 +0100, Mel Gorman wrote:

> This patch fixes the problem by not asserting the PTE is locked for VMAs
> backed by huge pages.

Thanks, will apply.

Cheers,
Ben.

> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  arch/powerpc/mm/pgtable.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index f5c6fd4..ae1d67c 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -219,7 +219,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>                  entry = do_dcache_icache_coherency(entry);
>          changed = !pte_same(*(ptep), entry);
>          if (changed) {
> -                assert_pte_locked(vma->vm_mm, address);
> +                if (!(vma->vm_flags & VM_HUGETLB))
> +                        assert_pte_locked(vma->vm_mm, address);
>                  __ptep_set_access_flags(ptep, entry);
>                  flush_tlb_page_nohash(vma, address);
>          }
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
  2009-04-30 21:48 ` Benjamin Herrenschmidt
@ 2009-05-18 17:13 ` Mel Gorman
  2009-05-18 17:26 ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2009-05-18 17:13 UTC
To: Benjamin Herrenschmidt
Cc: linuxppc-dev, Linus Torvalds, Linux Kernel Mailing List, ebmunson

On Fri, May 01, 2009 at 07:48:46AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2009-04-30 at 21:59 +0100, Mel Gorman wrote:
>
> > This patch fixes the problem by not asserting the PTE is locked for VMAs
> > backed by huge pages.
>
> Thanks, will apply.
>

What's the story with this patch? I'm still hearing of failures with huge
pages that this patch fixes but I'm not seeing it upstream. Was the patch
rejected or did it just slip through the cracks?

To refresh, an assertion is being made on ppc64 that only makes sense for
base pages. Hugepages throw a wobbly every time. For convenience, here is
the patch again.

Thanks.

==== CUT HERE ====
powerpc: Do not assert pte_locked for hugepage PTE entries

With DEBUG_VM enabled, an assertion is made when changing the protection
flags of a PTE that the PTE is locked.
Huge pages use a different pagetable format and the assertion is bogus and
will always trigger with a bug looking something like

Unable to handle kernel paging request for data at address 0xf1a00235800006f8
Faulting instruction address: 0xc000000000034a80
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA Maple
Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop
    evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr
    libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat
    windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp
    windfarm_core i2c_powermac
NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff
DAR: f1a00235800006f8, DSISR: 0000000040010000
TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
Call Trace:
[c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
[c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
[c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
[c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
[c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
[c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
[c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
Instruction dump:
7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000

This patch fixes the problem by not asserting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 arch/powerpc/mm/pgtable.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index f5c6fd4..ae1d67c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -219,7 +219,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
                 entry = do_dcache_icache_coherency(entry);
         changed = !pte_same(*(ptep), entry);
         if (changed) {
-                assert_pte_locked(vma->vm_mm, address);
+                if (!(vma->vm_flags & VM_HUGETLB))
+                        assert_pte_locked(vma->vm_mm, address);
                 __ptep_set_access_flags(ptep, entry);
                 flush_tlb_page_nohash(vma, address);
         }
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
  2009-05-18 17:13 ` Mel Gorman
@ 2009-05-18 17:26 ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2009-05-18 17:26 UTC
To: Mel Gorman; +Cc: ebmunson, Linux Kernel Mailing List, linuxppc-dev

On Mon, 18 May 2009, Mel Gorman wrote:
>
> What's the story with this patch? I'm still hearing of failures with huge pages
> that this patch fixes but I'm not seeing it upstream. Was the patch
> rejected or did it just slip through the cracks?

It didn't slip through the cracks, it was apparently just delayed. It's
part of the merge requests I've gotten today (well, strictly speaking it
seems to have hit my inbox just before midnight yesterday, but that's
because those silly aussies stand upside down and sleep at odd hours).

In fact, I just merged it, I haven't even had time to push that out.

Linus
* Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
  2009-04-24 9:51 ` [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages Mel Gorman
  2009-04-24 15:24 ` Michael Ellerman
@ 2009-04-27 8:15 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2009-04-27 8:15 UTC
To: Mel Gorman; +Cc: linuxppc-dev, Linus Torvalds, Linux Kernel Mailing List

On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.

This would be a spurious WARN_ON().. the test I added in there should
not apply to huge pages.

However, I don't see that causing a functional problem with
sysbench+postgres

Ben.