* [BUG] __GFP_THISNODE is not always honored
From: Adam Litke @ 2008-08-15 22:01 UTC
To: linux-mm; +Cc: linux-kernel, Andrew Morton, nacc, mel, apw, agl

While running the libhugetlbfs test suite on a NUMA machine with 2.6.27-rc3, I
discovered some strange behavior with __GFP_THISNODE. The hugetlb function
alloc_fresh_huge_page_node() calls alloc_pages_node() with __GFP_THISNODE but
occasionally a page that is not on the requested node is returned. Since the
hugetlb code assumes that the page will be on the requested node, badness follows
when the page is added to the wrong node's free_list.

There is clearly something wrong with the buddy allocator since __GFP_THISNODE
cannot be trusted. Until that is fixed, the hugetlb code should not assume
that the newly allocated page is on the node asked for. This patch prevents
the hugetlb pool counters from being corrupted and allows the code to cope with
unbalanced numa allocations.

So far my debugging has led me to get_page_from_freelist() inside the
for_each_zone_zonelist() loop. When buffered_rmqueue() returns a page I
compare the value of page_to_nid(page), zone->node and the node that the
hugetlb code requested with __GFP_THISNODE. These all match -- except when the
problem triggers. In that case, zone->node matches the node we asked for but
page_to_nid() does not.

Workaround patch:
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67a7119..7a30a61 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -568,7 +568,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 			__free_pages(page, huge_page_order(h));
 			return NULL;
 		}
-		prep_new_huge_page(h, page, nid);
+		prep_new_huge_page(h, page, page_to_nid(page));
 	}
 
 	return page;

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: [BUG] __GFP_THISNODE is not always honored
From: Mel Gorman @ 2008-08-18 10:59 UTC
To: Adam Litke; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl

On (15/08/08 17:01), Adam Litke didst pronounce:
> While running the libhugetlbfs test suite on a NUMA machine with 2.6.27-rc3, I
> discovered some strange behavior with __GFP_THISNODE. The hugetlb function
> alloc_fresh_huge_page_node() calls alloc_pages_node() with __GFP_THISNODE but
> occasionally a page that is not on the requested node is returned.

That's bad in itself and has wider reaching consequences than hugetlb
getting its counters wrong. I believe SLUB depends on __GFP_THISNODE
being obeyed for example. Can you boot the machine in question with
mminit_loglevel=4 and loglevel=8 set on the command line and send me the
dmesg please? It should output the zonelists and I might be able to
figure out what's going wrong. Thanks

> Since the
> hugetlb code assumes that the page will be on the requested node, badness follows
> when the page is added to the wrong node's free_list.
> 
> There is clearly something wrong with the buddy allocator since __GFP_THISNODE
> cannot be trusted. Until that is fixed, the hugetlb code should not assume
> that the newly allocated page is on the node asked for. This patch prevents
> the hugetlb pool counters from being corrupted and allows the code to cope with
> unbalanced numa allocations.
> 
> So far my debugging has led me to get_page_from_freelist() inside the
> for_each_zone_zonelist() loop. When buffered_rmqueue() returns a page I
> compare the value of page_to_nid(page), zone->node and the node that the
> hugetlb code requested with __GFP_THISNODE. These all match -- except when the
> problem triggers. In that case, zone->node matches the node we asked for but
> page_to_nid() does not.
> 

Feels like the wrong zonelist is being used. The dmesg with
mminit_loglevel may tell.

> Workaround patch:
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67a7119..7a30a61 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -568,7 +568,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  			__free_pages(page, huge_page_order(h));
>  			return NULL;
>  		}
> -		prep_new_huge_page(h, page, nid);
> +		prep_new_huge_page(h, page, page_to_nid(page));
>  	}

This will mask the bug for hugetlb but I wonder if this should be a
VM_BUG_ON(page_to_nid(page) != nid) ?

>  
>  	return page;
> 
> -- 
> Adam Litke - (agl at us.ibm.com)
> IBM Linux Technology Center
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [BUG] __GFP_THISNODE is not always honored
From: Adam Litke @ 2008-08-18 18:16 UTC
To: Mel Gorman; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl

On Mon, 2008-08-18 at 11:59 +0100, Mel Gorman wrote:
> On (15/08/08 17:01), Adam Litke didst pronounce:
> > While running the libhugetlbfs test suite on a NUMA machine with 2.6.27-rc3, I
> > discovered some strange behavior with __GFP_THISNODE. The hugetlb function
> > alloc_fresh_huge_page_node() calls alloc_pages_node() with __GFP_THISNODE but
> > occasionally a page that is not on the requested node is returned.
> 
> That's bad in itself and has wider reaching consequences than hugetlb
> getting its counters wrong. I believe SLUB depends on __GFP_THISNODE
> being obeyed for example. Can you boot the machine in question with
> mminit_loglevel=4 and loglevel=8 set on the command line and send me the
> dmesg please? It should output the zonelists and I might be able to
> figure out what's going wrong. Thanks

dmesg output is attached.

> > Since the
> > hugetlb code assumes that the page will be on the requested node, badness follows
> > when the page is added to the wrong node's free_list.
> > 
> > There is clearly something wrong with the buddy allocator since __GFP_THISNODE
> > cannot be trusted. Until that is fixed, the hugetlb code should not assume
> > that the newly allocated page is on the node asked for. This patch prevents
> > the hugetlb pool counters from being corrupted and allows the code to cope with
> > unbalanced numa allocations.
> > 
> > So far my debugging has led me to get_page_from_freelist() inside the
> > for_each_zone_zonelist() loop. When buffered_rmqueue() returns a page I
> > compare the value of page_to_nid(page), zone->node and the node that the
> > hugetlb code requested with __GFP_THISNODE. These all match -- except when the
> > problem triggers. In that case, zone->node matches the node we asked for but
> > page_to_nid() does not.
> > 
> 
> Feels like the wrong zonelist is being used. The dmesg with
> mminit_loglevel may tell.
> 
> > Workaround patch:
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 67a7119..7a30a61 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -568,7 +568,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
> >  			__free_pages(page, huge_page_order(h));
> >  			return NULL;
> >  		}
> > -		prep_new_huge_page(h, page, nid);
> > +		prep_new_huge_page(h, page, page_to_nid(page));
> >  	}
> 
> This will mask the bug for hugetlb but I wonder if this should be a
> VM_BUG_ON(page_to_nid(page) != nid) ?

Yeah, the patch was provided for illustrative purposes only.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

[-- Attachment #2: dmesg.out --]
[-- Type: text/plain, Size: 39044 bytes --]

Using pSeries machine description
Page orders: linear mapping = 24, virtual = 12, io = 12, vmemmap = 24
Using 1TB segments
Found initrd at 0xc000000002c00000:0xc000000002f0fc00
console [udbg0] enabled
Partition configured for 8 cpus.
CPU maps initialized for 2 threads per core (thread shift is 1)
Starting Linux PPC64 #1 SMP Mon Aug 18 16:19:50 UTC 2008
-----------------------------------------------------
ppc64_pft_size = 0x19
physicalMemorySize = 0x80000000
htab_hash_mask = 0x3ffff
-----------------------------------------------------
Initializing cgroup subsys cpuset
Linux version 2.6.27-rc3-autokern1 (root@tundro5.rchland.ibm.com) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Mon Aug 18 16:19:50 UTC 2008
[boot]0012 Setup Arch
mminit::memory_register Entering add_active_range(1, 0x0, 0x8000) 0 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x8000, 0xc000) 1 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0xc000, 0x10000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x10000, 0x14000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x14000, 0x18000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x18000, 0x1c000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x1c000, 0x20000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x20000, 0x24000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x24000, 0x28000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x28000, 0x2c000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x2c000, 0x30000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x30000, 0x34000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x34000, 0x38000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x38000, 0x3c000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x3c000, 0x40000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(0, 0x40000, 0x44000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x44000, 0x48000) 2 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x48000, 0x4c000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x4c000, 0x50000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x50000, 0x54000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x54000, 0x58000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x58000, 0x5c000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x5c000, 0x60000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x60000, 0x64000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x64000, 0x68000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x68000, 0x6c000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x6c000, 0x70000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x70000, 0x74000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x74000, 0x78000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x78000, 0x7c000) 3 entries of 256 used
mminit::memory_register Entering add_active_range(1, 0x7c000, 0x80000) 3 entries of 256 used
Node 0 Memory: 0x8000000-0x44000000
Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
EEH: No capable adapters found
PPC64 nvram contains 15360 bytes
Using shared processor idle loop
Zone PFN ranges:
  DMA      0x00000000 -> 0x00080000
  Normal   0x00080000 -> 0x00080000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    1: 0x00000000 -> 0x00008000
    0: 0x00008000 -> 0x00044000
    1: 0x00044000 -> 0x00080000
mminit::pageflags_layout_widths Section 0 Node 4 Zone 2 Flags 19
mminit::pageflags_layout_shifts Section 20 Node 4 Zone 2
mminit::pageflags_layout_offsets Section 0 Node 60 Zone 58
mminit::pageflags_layout_zoneid Zone ID: 58 -> 64
mminit::pageflags_layout_usage location: 64 -> 58 unused 58 -> 19 flags 19 -> 0
On node 0 totalpages: 245760
mminit::memmap_init DMA zone: 3360 pages used for memmap
mminit::memmap_init DMA zone: 0 pages reserved
  DMA zone: 242400 pages, LIFO batch:31
mminit::memmap_init Initialising map node 0 zone 0 pfns 32768 -> 278528
mminit::memmap_init Normal zone: 0 pages used for memmap
mminit::memmap_init Movable zone: 0 pages used for memmap
On node 1 totalpages: 278528
mminit::memmap_init DMA zone: 7168 pages used for memmap
mminit::memmap_init DMA zone: 0 pages reserved
  DMA zone: 271360 pages, LIFO batch:31
mminit::memmap_init Initialising map node 1 zone 0 pfns 0 -> 524288
mminit::memmap_init Normal zone: 0 pages used for memmap
mminit::memmap_init Movable zone: 0 pages used for memmap
[boot]0015 Setup Done
mminit::zonelist general 0:DMA = 0:DMA 1:DMA
mminit::zonelist thisnode 0:DMA = 0:DMA
mminit::zonelist general 1:DMA = 1:DMA 0:DMA
mminit::zonelist thisnode 1:DMA = 1:DMA
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 513760
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1219080427 mminit_loglevel=4 loglevel=8
[boot]0020 XICS Init
[boot]0021 XICS Done
pic: no ISA interrupt controller
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 512.000000 MHz
time_init: processor frequency = 4208.000000 MHz
clocksource: timebase mult[7d0000] shift[22] registered
clockevent: decrementer mult[8312] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 2036920k/2097152k available (7204k kernel code, 60876k reserved, 1328k data, 607k bss, 292k init)
SLUB: Genslabs=13, HWalign=128, Order=0-3, MinObjects=0, CPUs=8, Nodes=16
Calibrating delay loop... 1021.95 BogoMIPS (lpj=2043904)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
clockevent: decrementer mult[8312] shift[16] cpu[1]
Processor 1 found.
clockevent: decrementer mult[8312] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[8312] shift[16] cpu[3]
Processor 3 found.
clockevent: decrementer mult[8312] shift[16] cpu[4]
Processor 4 found.
clockevent: decrementer mult[8312] shift[16] cpu[5]
Processor 5 found.
clockevent: decrementer mult[8312] shift[16] cpu[6]
Processor 6 found.
clockevent: decrementer mult[8312] shift[16] cpu[7]
Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-7
Node 1 CPUs:
CPU0 attaching sched-domain:
 domain 0: span 0-1 level SIBLING
  groups: 0 1
  domain 1: span 0-7 level CPU
   groups: 0-1 2-3 4-5 6-7
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU1 attaching sched-domain:
 domain 0: span 0-1 level SIBLING
  groups: 1 0
  domain 1: span 0-7 level CPU
   groups: 0-1 2-3 4-5 6-7
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU2 attaching sched-domain:
 domain 0: span 2-3 level SIBLING
  groups: 2 3
  domain 1: span 0-7 level CPU
   groups: 2-3 4-5 6-7 0-1
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU3 attaching sched-domain:
 domain 0: span 2-3 level SIBLING
  groups: 3 2
  domain 1: span 0-7 level CPU
   groups: 2-3 4-5 6-7 0-1
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU4 attaching sched-domain:
 domain 0: span 4-5 level SIBLING
  groups: 4 5
  domain 1: span 0-7 level CPU
   groups: 4-5 6-7 0-1 2-3
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU5 attaching sched-domain:
 domain 0: span 4-5 level SIBLING
  groups: 5 4
  domain 1: span 0-7 level CPU
   groups: 4-5 6-7 0-1 2-3
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU6 attaching sched-domain:
 domain 0: span 6-7 level SIBLING
  groups: 6 7
  domain 1: span 0-7 level CPU
   groups: 6-7 0-1 2-3 4-5
   domain 2: span 0-7 level NODE
    groups: 0-7
CPU7 attaching sched-domain:
 domain 0: span 6-7 level SIBLING
  groups: 7 6
  domain 1: span 0-7 level CPU
   groups: 6-7 0-1 2-3 4-5
   domain 2: span 0-7 level NODE
    groups: 0-7
net_namespace: 1152 bytes
NET: Registered protocol family 16
IBM eBus Device Driver
PCI: Probing PCI hardware
PCI: Probing PCI hardware done
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
NET: Registered protocol family 2
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 5
Switched to high resolution mode on CPU 6
Switched to high resolution mode on CPU 7
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
NET: Registered protocol family 1
checking if image is initramfs... it is
Freeing initrd memory: 3135k freed
IOMMU table initialized, virtual merging enabled
RTAS daemon started
audit: initializing netlink socket (disabled)
type=2000 audit(1219080539.432:1): initialized
RTAS: event: 952, Type: Platform Error, Severity: 2
HugeTLB registered 16 MB page size, pre-allocated 0 pages
HugeTLB registered 16 GB page size, pre-allocated 0 pages
HugeTLB registered 64 KB page size, pre-allocated 0 pages
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
msgmni has been set to 3984
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver4 ports, IRQ sharing disabled
brd: module loaded
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
Copyright (c) 1999-2006 Intel Corporation.
pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI e100: Copyright(c) 1999-2006 Intel Corporation drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03 vio_register_driver: driver ibmveth registering console [netcon0] enabled netconsole: network logging started Uniform Multi-Platform E-IDE driver ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007) vio_register_driver: driver ibmvscsi registering ibmvscsi 30000003: SRP_VERSION: 16.a scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8 ibmvscsi 30000003: partner initialization complete ibmvscsi 30000003: sent SRP login ibmvscsi 30000003: SRP_LOGIN succeeded ibmvscsi 30000003: host srp version: 16.a, host partition tundro1 (1), OS 2, max io 262144 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi 0:0:1:0: Direct-Access IBM VDASD blkdev 0001 PQ: 0 ANSI: 4 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), 
using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 scsi scan: INQUIRY result too short (5), using 36 st: Version 20080504, fixed bufsize 32768, s/g segs 256 Driver 'st' needs updating - please use bus_type methods Driver 'sd' needs updating - please use bus_type methods sd 0:0:1:0: [sda] 73400922 512-byte hardware sectors (37581 MB) sd 0:0:1:0: [sda] Write Protect is off sd 0:0:1:0: [sda] Mode Sense: 0c 00 00 08 sd 0:0:1:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA sd 0:0:1:0: [sda] 73400922 512-byte hardware sectors (37581 MB) sd 0:0:1:0: [sda] Write Protect is off sd 0:0:1:0: [sda] Mode Sense: 0c 00 00 08 sd 0:0:1:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > sd 0:0:1:0: [sda] Attached SCSI disk Driver 'sr' needs updating - please use bus_type methods sd 0:0:1:0: Attached scsi generic sg0 type 0 ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver Initializing USB Mass Storage driver... usbcore: registered new interface driver usb-storage USB Mass Storage support registered. mice: PS/2 mouse device common for all mice md: linear personality registered for level -1 md: raid0 personality registered for level 0 md: raid1 personality registered for level 1 device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid usbhid: v2.6:USB HID core driver oprofile: using ppc64/power6 performance monitoring. IPv4 over IPv4 tunneling driver TCP cubic registered NET: Registered protocol family 17 RPC: Registered udp transport module. RPC: Registered tcp transport module. 
registered taskstats version 1 Freeing unused kernel memory: 292k freed IBM eHEA ethernet device driver (Release EHEA_0092) ehea: eth1: Jumbo frames are disabled ehea: eth1 -> logical port id #2 kjournald starting.  Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. Unable to find swap-space signature EXT3 FS on sda6, internal journal Unable to find swap-space signature ehea: lan0: Physical port up ehea: External switch port is backup port
* Re: [BUG] __GFP_THISNODE is not always honored 2008-08-18 18:16 ` Adam Litke @ 2008-08-18 19:57 ` Mel Gorman 0 siblings, 0 replies; 23+ messages in thread From: Mel Gorman @ 2008-08-18 19:57 UTC To: Adam Litke; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl On (18/08/08 13:16), Adam Litke didst pronounce: > <MUCH SNIPPAGE> > mminit::memmap_init Initialising map node 0 zone 0 pfns 32768 -> 278528 > mminit::memmap_init Initialising map node 1 zone 0 pfns 0 -> 524288 This might be the problem here. This machine has overlapping nodes, which is a fairly rare situation. I think it's possible the page linkages for node 0 are getting overwritten with their node 1 equivalents. If this is happening, it would lead to some oddness.
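Mel's hypothesis can be illustrated with a small user-space model. This is a sketch, not kernel code: the pfn ranges come from the topology reported in this thread (4 KiB pages), while `page_nid[]`, `pfn_owner()` and `init_span()` are hypothetical stand-ins for the node recorded in struct page and for an early_pfn_to_nid()-style ownership lookup. Initialising node 1's whole span without an ownership check relabels every node 0 page as node 1; skipping pfns owned by another node keeps node 0's labels intact.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reported topology (4 KiB pages):
 *   node 0: pfns [32768, 278528)
 *   node 1: pfns [0, 32768) and [278528, 524288)
 * Node 1's span [0, 524288) therefore covers all of node 0. */
#define MAX_PFN 524288UL

struct pfn_range { unsigned long start, end; int nid; };

static const struct pfn_range ranges[] = {
	{ 0UL,      32768UL,  1 },
	{ 32768UL,  278528UL, 0 },
	{ 278528UL, 524288UL, 1 },
};

/* Stand-in for the node link stored in each struct page. */
static signed char page_nid[MAX_PFN];

/* Which node really owns this pfn? Returns -1 for a hole. */
static int pfn_owner(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (pfn >= ranges[i].start && pfn < ranges[i].end)
			return ranges[i].nid;
	return -1;
}

/* Initialise the node link for every pfn in [start, end). */
static void init_span(int nid, unsigned long start, unsigned long end,
		      bool check_owner)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (check_owner && pfn_owner(pfn) != nid)
			continue; /* owned by an overlapping node: leave it alone */
		page_nid[pfn] = (signed char)nid;
	}
}

/* Initialise node 0, then node 1, in the order the mminit log shows. */
static void init_all(bool check_owner)
{
	init_span(0, 32768UL, 278528UL, check_owner);
	init_span(1, 0UL, MAX_PFN, check_owner);
}
```

With `init_all(false)` a node 0 pfn such as 100000 ends up labelled as node 1; with `init_all(true)` it keeps node 0.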
* Re: [BUG] __GFP_THISNODE is not always honored 2008-08-18 10:59 ` Mel Gorman @ 2008-08-18 19:14 ` Christoph Lameter -1 siblings, 0 replies; 23+ messages in thread From: Christoph Lameter @ 2008-08-18 19:14 UTC To: Mel Gorman Cc: Adam Litke, linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl > That's bad in itself and has wider reaching consequences than hugetlb > getting its counters wrong. I believe SLUB depends on __GFP_THISNODE > being obeyed for example. Can you boot the machine in question with > mminit_loglevel=4 and loglevel=8 set on the command line and send me the > dmesg please? It should output the zonelists and I might be able to > figure out what's going wrong. Thanks It's SLAB that depends on it, and it will corrupt data if the wrong node is returned. SLAB has BUG_ONs that should trigger if anything like that occurs. > This will mask the bug for hugetlb but I wonder if this should be a > VM_BUG_ON(page_to_nid(page) != nid) ? Right.
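A sketch of the two options discussed here, assuming heavily simplified, hypothetical structures (`struct fake_page`, `struct fake_hstate`; the real hstate keeps per-node counters such as nr_huge_pages_node[]). `check_node()` reports the mismatch a VM_BUG_ON(page_to_nid(page) != nid) would catch, and `account_new_page()` contrasts charging the requested node with the workaround of charging the node the page actually lives on.

```c
#include <stdio.h>

#define MAX_NODES 2

/* Hypothetical, heavily simplified stand-ins for struct page and struct hstate. */
struct fake_page { int nid; };
struct fake_hstate { unsigned long nr_huge_pages_node[MAX_NODES]; };

/* In the spirit of VM_BUG_ON(page_to_nid(page) != nid), but reporting the
 * mismatch instead of crashing so the model can keep running.
 * Returns 1 if the page is on the requested node, 0 otherwise. */
static int check_node(const struct fake_page *page, int nid)
{
	if (page->nid != nid) {
		fprintf(stderr, "page is on node %d, but node %d was requested\n",
			page->nid, nid);
		return 0;
	}
	return 1;
}

/* Account a newly allocated huge page. use_page_nid selects the workaround
 * (charge the page's real node) instead of the original behaviour
 * (charge the requested node, corrupting the counters on a mismatch). */
static void account_new_page(struct fake_hstate *h, const struct fake_page *page,
			     int requested_nid, int use_page_nid)
{
	int nid = use_page_nid ? page->nid : requested_nid;

	h->nr_huge_pages_node[nid]++;
}
```

The check exposes the allocator bug, while the workaround only keeps the per-node counters consistent; as Mel notes, it masks the underlying problem rather than fixing it.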
* Re: [BUG] __GFP_THISNODE is not always honored 2008-08-15 22:01 ` Adam Litke @ 2008-08-18 19:21 ` Christoph Lameter -1 siblings, 0 replies; 23+ messages in thread From: Christoph Lameter @ 2008-08-18 19:21 UTC To: Adam Litke; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, mel, apw, agl Adam Litke wrote: > > So far my debugging has led me to get_page_from_freelist() inside the > for_each_zone_zonelist() loop. When buffered_rmqueue() returns a page I > compare the value of page_to_nid(page), zone->node and the node that the > hugetlb code requested with __GFP_THISNODE. These all match -- except when the > problem triggers. In that case, zone->node matches the node we asked for but > page_to_nid() does not. Uhhh.. A page that was just taken off the freelist? So we may have freed or coalesced a page to the wrong zone? Looks like there is something more fundamental that broke here.
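Christoph's question can be turned into a simple invariant: every page on a zone's free list should report the zone's node. The sketch below checks that invariant over a hypothetical singly linked free list (`struct fpage`, `struct fzone`); the kernel's free lists are per-order, per-migratetype list_heads, so this only illustrates the idea.

```c
#include <stddef.h>

/* Hypothetical model of a page sitting on a zone's free list. */
struct fpage {
	int nid;            /* node the page really belongs to */
	struct fpage *next;
};

struct fzone {
	int node;           /* node the zone belongs to */
	struct fpage *free_list;
};

/* Count free pages whose node differs from the zone's node. A non-zero
 * result means a page was freed, coalesced or moved onto the wrong zone. */
static size_t count_misplaced(const struct fzone *zone)
{
	size_t bad = 0;

	for (const struct fpage *p = zone->free_list; p != NULL; p = p->next)
		if (p->nid != zone->node)
			bad++;
	return bad;
}
```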
* Re: [BUG] __GFP_THISNODE is not always honored 2008-08-18 19:21 ` Christoph Lameter @ 2008-08-18 19:52 ` Mel Gorman -1 siblings, 0 replies; 23+ messages in thread From: Mel Gorman @ 2008-08-18 19:52 UTC To: Christoph Lameter Cc: Adam Litke, linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl On (18/08/08 14:21), Christoph Lameter didst pronounce: > Adam Litke wrote: > > > > So far my debugging has led me to get_page_from_freelist() inside the > > for_each_zone_zonelist() loop. When buffered_rmqueue() returns a page I > > compare the value of page_to_nid(page), zone->node and the node that the > > hugetlb code requested with __GFP_THISNODE. These all match -- except when the > > problem triggers. In that case, zone->node matches the node we asked for but > > page_to_nid() does not. > > Uhhh.. A page that was just taken off the freelist? So we may have freed or > coalesced a page to the wrong zone? Looks like there is something more > fundamental that broke here. > It's still a bit hard to tell but I don't believe we are coalescing wrong at the moment. buffered_rmqueue() is pretty high in the call chain for the page allocator. The problem could have been explained if the zonelist walking for __GFP_THISNODE was screwed but the dmesg output seems to show that's ok at least. It could also be something really wacky like the page linkages don't match the zone->node linkages. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab
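Mel's point that a broken zonelist walk is not needed to explain the symptom can be shown with a toy model. Here __GFP_THISNODE is modelled, as a simplification and not as the kernel builds its zonelists, by skipping zones on other nodes; each hypothetical `struct tzone` records only the node of the first page on its free list. If a zone on node 0 holds a node 1 page, the walk returns the right zone but the wrong page, which matches Adam's observation that zone->node was correct while page_to_nid() was not.

```c
#include <stddef.h>

/* Hypothetical zone: its node, and the node of the first page on its
 * free list (-1 if the free list is empty). */
struct tzone {
	int node;
	int free_page_nid;
};

/* Walk a zonelist and hand out the first available page. With thisnode
 * set, zones on other nodes are skipped. Returns the node of the page
 * handed out (or -1) and stores the node of the zone it came from. */
static int alloc_from_zonelist(const struct tzone *zl, size_t n, int nid,
			       int thisnode, int *zone_node)
{
	for (size_t i = 0; i < n; i++) {
		if (thisnode && zl[i].node != nid)
			continue;
		if (zl[i].free_page_nid < 0)
			continue;
		*zone_node = zl[i].node;
		return zl[i].free_page_nid;
	}
	return -1;
}
```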
* [BUG] Make setup_zone_migrate_reserve() aware of overlapping nodes 2008-08-15 22:01 ` Adam Litke @ 2008-08-20 17:08 ` Adam Litke -1 siblings, 0 replies; 23+ messages in thread From: Adam Litke @ 2008-08-20 17:08 UTC To: linux-mm; +Cc: linux-kernel, Andrew Morton, nacc, mel, apw, agl

I have gotten to the root cause of the hugetlb badness I reported back on August 15th. My system has the following memory topology (note the overlapping node):

Node 0 Memory: 0x8000000-0x44000000
Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000

setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking for a pageblock to move onto the MIGRATE_RESERVE list. Finding no candidates, it happily continues the scan into 0x8000000-0x44000000. When a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on the wrong zone. Oops.

(Andrew: once the proper fix is agreed upon, this should also be a candidate for -stable.)

setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.

Signed-off-by: Adam Litke <agl@us.ibm.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af982f7..f297a9b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2512,6 +2512,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 							pageblock_order;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		/* Watch out for overlapping nodes */
+		if (!early_pfn_in_nid(pfn, zone->node))
+			continue;
+
 		if (!pfn_valid(pfn))
 			continue;
 		page = pfn_to_page(pfn);

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
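Adam's analysis can be reproduced in miniature. This sketch models the reserve scan for node 1's zone, whose span [0, 524288) covers all of node 0 (pfns 32768 to 278528, matching the mminit output quoted earlier in the thread). The pageblock size and the `block_is_candidate()` rule are assumptions, chosen so that node 1's low range has no candidates as in the report; `pfn_to_owner_nid()` plays the role that early_pfn_in_nid() plays in the patch. Without the ownership check the first block picked lies in node 0; with it, the scan moves on to node 1's upper range.

```c
#include <stdbool.h>

/* Assumption: 16 MiB pageblocks of 4 KiB pages. */
#define PAGEBLOCK_NR_PAGES 4096UL

/* Owner of a pfn under the topology from this report (4 KiB pages). */
static int pfn_to_owner_nid(unsigned long pfn)
{
	if (pfn >= 32768UL && pfn < 278528UL)
		return 0;
	return 1;
}

/* Hypothetical stand-in for "this pageblock could become MIGRATE_RESERVE".
 * Node 1's low range [0, 32768) has no candidates, as in the report. */
static bool block_is_candidate(unsigned long pfn)
{
	return pfn >= 32768UL;
}

/* Return the first pageblock a reserve scan over [start, end) would pick
 * for a zone on zone_nid, or end if none qualifies. */
static unsigned long pick_reserve_block(int zone_nid, unsigned long start,
					unsigned long end, bool skip_foreign)
{
	for (unsigned long pfn = start; pfn < end; pfn += PAGEBLOCK_NR_PAGES) {
		/* The fix: ignore pageblocks owned by an overlapping node. */
		if (skip_foreign && pfn_to_owner_nid(pfn) != zone_nid)
			continue;
		if (block_is_candidate(pfn))
			return pfn;
	}
	return end;
}
```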
* Re: [BUG] Make setup_zone_migrate_reserve() aware of overlapping nodes 2008-08-20 17:08 ` Adam Litke @ 2008-08-20 18:11 ` Dave Hansen -1 siblings, 0 replies; 23+ messages in thread From: Dave Hansen @ 2008-08-20 18:11 UTC To: Adam Litke; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, mel, apw, agl

On Wed, 2008-08-20 at 12:08 -0500, Adam Litke wrote:
> I have gotten to the root cause of the hugetlb badness I reported back
> on August 15th. My system has the following memory topology (note the
> overlapping node):
>
> Node 0 Memory: 0x8000000-0x44000000
> Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
>
> setup_zone_migrate_reserve() scans the address range 0x0-0x8000000
> looking for a pageblock to move onto the MIGRATE_RESERVE list. Finding
> no candidates, it happily continues the scan into 0x8000000-0x44000000.
> When a pageblock is found, the pages are moved to the MIGRATE_RESERVE
> list on the wrong zone. Oops.

This eventually gets down into move_freepages() via:

->setup_zone_migrate_reserve()
 ->move_freepages_block()
  ->move_freepages()

right? It looks like there have been bugs in this area before in move_freepages(). Should there be a more stringent check in *there*? Maybe a warning?

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2512,6 +2512,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  							pageblock_order;
>
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> +		/* Watch out for overlapping nodes */
> +		if (!early_pfn_in_nid(pfn, zone->node))
> +			continue;

zone->node doesn't exist on !CONFIG_NUMA. :( You probably want:

	if (!early_pfn_in_nid(pfn, zone_to_nid(zone)))
		continue;

Are you sure you need the "early_" variant here? We're not using early_pfn_valid() right below it. I guess you could also use:

	if (page_to_nid(page) != zone_to_nid(zone))
		continue;

-- 
Dave
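Dave's suggestion of a stricter check inside move_freepages() could look roughly like the following sketch. `struct mpage` and `move_freepages_checked()` are hypothetical; the fprintf() stands in for a kernel WARN_ON(), and the check compares each page's node with the zone's node, the same comparison Dave proposes via page_to_nid() and zone_to_nid(). Refusing to move a mixed range keeps a stray pageblock from being relabelled on the wrong zone.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page model: owning node and current migrate type. */
struct mpage {
	int nid;
	int migratetype;
};

/* Before changing the migrate type of a range, verify that every page
 * belongs to the zone's node. On a mismatch, warn (stand-in for
 * WARN_ON) and move nothing. Returns the number of pages moved. */
static size_t move_freepages_checked(struct mpage *pages, size_t n,
				     int zone_nid, int migratetype)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].nid != zone_nid) {
			fprintf(stderr,
				"warning: page %zu is on node %d, zone is on node %d\n",
				i, pages[i].nid, zone_nid);
			return 0;
		}
	}
	for (size_t i = 0; i < n; i++)
		pages[i].migratetype = migratetype;
	return n;
}
```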
* [BUG] [PATCH v2] Make setup_zone_migrate_reserve() aware of overlapping nodes
From: Adam Litke @ 2008-08-20 19:55 UTC
To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andrew Morton, nacc, mel, apw, agl

Changes since V1
 - Fix build for !NUMA
 - Add VM_BUG_ON() to catch this problem at the source

I have gotten to the root cause of the hugetlb badness I reported back on
August 15th. My system has the following memory topology (note the
overlapping node):

Node 0 Memory: 0x8000000-0x44000000
Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000

setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
candidates, it happily continues the scan into 0x8000000-0x44000000. When
a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
the wrong zone. Oops.

(Andrew: once the proper fix is agreed upon, this should also be a
candidate for -stable.)

setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.

Signed-off-by: Adam Litke <agl@us.ibm.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af982f7..feb7916 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -694,6 +694,9 @@ static int move_freepages(struct zone *zone,
 #endif
 
 	for (page = start_page; page <= end_page;) {
+		/* Make sure we are not inadvertently changing nodes */
+		VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
+
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
 			continue;
@@ -2516,6 +2519,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 		page = pfn_to_page(pfn);
 
+		/* Watch out for overlapping nodes */
+		if (page_to_nid(page) != zone_to_nid(zone))
+			continue;
+
 		/* Blocks with reserved pages will never free, skip them. */
 		if (PageReserved(page))
 			continue;

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: [BUG] [PATCH v2] Make setup_zone_migrate_reserve() aware of overlapping nodes
From: Mel Gorman @ 2008-08-21 11:33 UTC
To: Adam Litke
Cc: Dave Hansen, linux-mm, linux-kernel, Andrew Morton, nacc, apw, agl

On (20/08/08 14:55), Adam Litke didst pronounce:
> Changes since V1
> - Fix build for !NUMA
> - Add VM_BUG_ON() to catch this problem at the source
>
> I have gotten to the root cause of the hugetlb badness I reported back on
> August 15th. My system has the following memory topology (note the
> overlapping node):
>
> Node 0 Memory: 0x8000000-0x44000000
> Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
>
> setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
> for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
> candidates, it happily continues the scan into 0x8000000-0x44000000. When
> a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
> the wrong zone. Oops.
>
> (Andrew: once the proper fix is agreed upon, this should also be a
> candidate for -stable.)
>
> setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.
>
> Signed-off-by: Adam Litke <agl@us.ibm.com>
>

zone_to_nid(zone) is called every time in the loop even though it will never
change. This is less than optimal but setup_zone_migrate_reserve() is only
called during init and when min_free_kbytes is adjusted so it's not worth
worrying about. Otherwise it looks good.

Acked-by: Mel Gorman <mel@csn.ul.ie>

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index af982f7..feb7916 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -694,6 +694,9 @@ static int move_freepages(struct zone *zone,
>  #endif
>  
>  	for (page = start_page; page <= end_page;) {
> +		/* Make sure we are not inadvertently changing nodes */
> +		VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
> +
>  		if (!pfn_valid_within(page_to_pfn(page))) {
>  			page++;
>  			continue;
> @@ -2516,6 +2519,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
>  			continue;
>  		page = pfn_to_page(pfn);
>  
> +		/* Watch out for overlapping nodes */
> +		if (page_to_nid(page) != zone_to_nid(zone))
> +			continue;
> +
>  		/* Blocks with reserved pages will never free, skip them. */
>  		if (PageReserved(page))
>  			continue;
>
> -- 
> Adam Litke - (agl at us.ibm.com)
> IBM Linux Technology Center
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [BUG] [PATCH v2] Make setup_zone_migrate_reserve() aware of overlapping nodes
From: Andy Whitcroft @ 2008-08-26 9:29 UTC
To: Mel Gorman
Cc: Adam Litke, Dave Hansen, linux-mm, linux-kernel, Andrew Morton, nacc, agl

On Thu, Aug 21, 2008 at 12:33:39PM +0100, Mel Gorman wrote:
> On (20/08/08 14:55), Adam Litke didst pronounce:
> > Changes since V1
> > - Fix build for !NUMA
> > - Add VM_BUG_ON() to catch this problem at the source
> >
> > I have gotten to the root cause of the hugetlb badness I reported back on
> > August 15th. My system has the following memory topology (note the
> > overlapping node):
> >
> > Node 0 Memory: 0x8000000-0x44000000
> > Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
> >
> > setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
> > for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
> > candidates, it happily continues the scan into 0x8000000-0x44000000. When
> > a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
> > the wrong zone. Oops.
> >
> > (Andrew: once the proper fix is agreed upon, this should also be a
> > candidate for -stable.)
> >
> > setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.
> >
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> >
>
> zone_to_nid(zone) is called every time in the loop even though it will never
> change. This is less than optimal but setup_zone_migrate_reserve() is only
> called during init and when min_free_kbytes is adjusted so it's not worth
> worrying about. Otherwise it looks good.
>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index af982f7..feb7916 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -694,6 +694,9 @@ static int move_freepages(struct zone *zone,
> >  #endif
> >  
> >  	for (page = start_page; page <= end_page;) {
> > +		/* Make sure we are not inadvertently changing nodes */
> > +		VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone));
> > +
> >  		if (!pfn_valid_within(page_to_pfn(page))) {
> >  			page++;
> >  			continue;
> > @@ -2516,6 +2519,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> >  			continue;
> >  		page = pfn_to_page(pfn);
> >  
> > +		/* Watch out for overlapping nodes */
> > +		if (page_to_nid(page) != zone_to_nid(zone))
> > +			continue;
> > +
> >  		/* Blocks with reserved pages will never free, skip them. */
> >  		if (PageReserved(page))
> >  			continue;

This patch looks sane. I do note that we have a config option to tell us
whether we have any possibility of overlapping nodes, and we have an early
version of this check, early_pfn_in_nid(), in mm.h. You might consider
having a non-early variant of this which could be optimised away for those
arches which do not have CONFIG_NODES_SPAN_OTHER_NODES.

In 'unearlifying' this to pfn_in_nid() I think we have a small naming issue
with these functions, as they are only valid for use with pfns within an
existing node. They should probably both be *pfn_in_nid_within() or
something in line with pfn_valid_within().

-apw
end of thread, other threads:[~2008-08-26 9:29 UTC | newest]

Thread overview: 23+ messages
2008-08-15 22:01 [BUG] __GFP_THISNODE is not always honored Adam Litke
2008-08-15 22:01 ` Adam Litke
2008-08-18 10:59 ` Mel Gorman
2008-08-18 10:59 ` Mel Gorman
2008-08-18 18:16 ` Adam Litke
2008-08-18 19:57 ` Mel Gorman
2008-08-18 19:57 ` Mel Gorman
2008-08-18 19:14 ` Christoph Lameter
2008-08-18 19:14 ` Christoph Lameter
2008-08-18 19:21 ` Christoph Lameter
2008-08-18 19:21 ` Christoph Lameter
2008-08-18 19:52 ` Mel Gorman
2008-08-18 19:52 ` Mel Gorman
2008-08-20 17:08 ` [BUG] Make setup_zone_migrate_reserve() aware of overlapping nodes Adam Litke
2008-08-20 17:08 ` Adam Litke
2008-08-20 18:11 ` Dave Hansen
2008-08-20 18:11 ` Dave Hansen
2008-08-20 19:55 ` [BUG] [PATCH v2] " Adam Litke
2008-08-20 19:55 ` Adam Litke
2008-08-21 11:33 ` Mel Gorman
2008-08-21 11:33 ` Mel Gorman
2008-08-26 9:29 ` Andy Whitcroft
2008-08-26 9:29 ` Andy Whitcroft