* Re: Oops in 3.7-rc8 isolate_free_pages_block() [not found] <20121206091744.GA1397@polaris.bitmath.org> @ 2012-12-06 14:48 ` Jan Kara 2012-12-06 15:22 ` Henrik Rydberg 2012-12-06 16:19 ` Mel Gorman 0 siblings, 2 replies; 18+ messages in thread From: Jan Kara @ 2012-12-06 14:48 UTC (permalink / raw) To: Henrik Rydberg Cc: Linus Torvalds, mgorman, linux-mm, Linux Kernel Mailing List On Thu 06-12-12 10:17:44, Henrik Rydberg wrote: > Hi Linus, > > This is the third time I encounter this oops in 3.7, but the first > time I managed to get a decent screenshot: > > http://bitmath.org/test/oops-3.7-rc8.jpg > > It seems to have to do with page migration. I run with transparent > hugepages configured, just for the fun of it. > > I am happy to test any suggestions. Adding linux-mm and Mel as an author of compaction in particular to CC... It seems that while traversing struct page structures, we entered into a new huge page (note that RBX is 0xffffea0001c00000 - just the beginning of a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in struct page). It might be useful if you provide disassembly of isolate_freepages_block() function in your kernel so that we can guess more from other register contents... Honza Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 14:48 ` Oops in 3.7-rc8 isolate_free_pages_block() Jan Kara @ 2012-12-06 15:22 ` Henrik Rydberg 2012-12-06 16:10 ` Linus Torvalds 2012-12-06 16:19 ` Mel Gorman 1 sibling, 1 reply; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 15:22 UTC (permalink / raw) To: Jan Kara; +Cc: Linus Torvalds, mgorman, linux-mm, Linux Kernel Mailing List Hi Jan, > > http://bitmath.org/test/oops-3.7-rc8.jpg > > > > It seems to have to do with page migration. I run with transparent > > hugepages configured, just for the fun of it. > > > > I am happy to test any suggestions. > > Adding linux-mm and Mel as an author of compaction in particular to CC... > > It seems that while traversing struct page structures, we entered into a new > huge page (note that RBX is 0xffffea0001c00000 - just the beginning of > a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in > struct page). It might be useful if you provide disassembly of > isolate_freepages_block() function in your kernel so that we can guess more > from other register contents... I had to recreate the vmlinux file, but it seems be at the right address, so here we go: ffffffff810a6d00 <isolate_freepages_block>: ffffffff810a6d00: 48 b8 00 00 00 00 00 movabs $0xffffea0000000000,%rax ffffffff810a6d07: ea ff ff ffffffff810a6d0a: 41 57 push %r15 ffffffff810a6d0c: 41 56 push %r14 ffffffff810a6d0e: 49 89 fe mov %rdi,%r14 ffffffff810a6d11: 41 55 push %r13 ffffffff810a6d13: 49 89 d5 mov %rdx,%r13 ffffffff810a6d16: 41 54 push %r12 ffffffff810a6d18: 55 push %rbp ffffffff810a6d19: 53 push %rbx ffffffff810a6d1a: 48 89 f3 mov %rsi,%rbx ffffffff810a6d1d: 48 c1 e3 06 shl $0x6,%rbx ffffffff810a6d21: 48 83 ec 58 sub $0x58,%rsp ffffffff810a6d25: 48 01 c3 add %rax,%rbx ffffffff810a6d28: 48 39 f2 cmp %rsi,%rdx ffffffff810a6d2b: 48 89 74 24 30 mov %rsi,0x30(%rsp) ffffffff810a6d30: 44 88 44 24 3b mov %r8b,0x3b(%rsp) ffffffff810a6d35: 0f 86 15 02 00 00 jbe ffffffff810a6f50 <isolate_freepages_block+0x250> ffffffff810a6d3b: 48 8d 47 58 lea 0x58(%rdi),%rax ffffffff810a6d3f: 31 d2 xor %edx,%edx ffffffff810a6d41: 48 8b 6c 24 30 mov 0x30(%rsp),%rbp ffffffff810a6d46: 48 89 44 24 20 mov %rax,0x20(%rsp) ffffffff810a6d4b: 48 8d 47 40 lea 0x40(%rdi),%rax ffffffff810a6d4f: 49 89 dc mov %rbx,%r12 ffffffff810a6d52: c7 44 24 3c 00 00 00 movl $0x0,0x3c(%rsp) ffffffff810a6d59: 00 ffffffff810a6d5a: 49 89 ce mov %rcx,%r14 ffffffff810a6d5d: 41 89 d7 mov %edx,%r15d ffffffff810a6d60: 48 89 44 24 28 mov %rax,0x28(%rsp) ffffffff810a6d65: 48 89 7c 24 18 mov %rdi,0x18(%rsp) ffffffff810a6d6a: eb 1c jmp ffffffff810a6d88 <isolate_freepages_block+0x88> ffffffff810a6d6c: 0f 1f 40 00 nopl 0x0(%rax) ffffffff810a6d70: 48 83 c5 01 add $0x1,%rbp ffffffff810a6d74: 48 83 c3 40 add $0x40,%rbx ffffffff810a6d78: 49 39 ed cmp %rbp,%r13 ffffffff810a6d7b: 0f 86 cf 00 00 00 jbe ffffffff810a6e50 <isolate_freepages_block+0x150> ffffffff810a6d81: 4d 85 e4 test %r12,%r12 ffffffff810a6d84: 4c 0f 44 e3 cmove %rbx,%r12 ffffffff810a6d88: 8b 43 18 mov 0x18(%rbx),%eax ffffffff810a6d8b: 83 f8 80 cmp $0xffffff80,%eax ffffffff810a6d8e: 75 e0 jne ffffffff810a6d70 <isolate_freepages_block+0x70> ffffffff810a6d90: 48 8b 44 24 18 mov 0x18(%rsp),%rax ffffffff810a6d95: 48 8d 74 24 48 lea 0x48(%rsp),%rsi ffffffff810a6d9a: 41 0f b6 d7 movzbl %r15b,%edx ffffffff810a6d9e: 4c 8b 44 24 20 mov 0x20(%rsp),%r8 ffffffff810a6da3: 48 8b 4c 24 28 mov 0x28(%rsp),%rcx ffffffff810a6da8: 48 8b 40 50 mov 0x50(%rax),%rax ffffffff810a6dac: 48 89 c7 mov %rax,%rdi ffffffff810a6daf: 48 83 c7 50 add $0x50,%rdi ffffffff810a6db3: e8 a8 fe ff ff callq ffffffff810a6c60 <compact_checklock_irqsave.isra.13> ffffffff810a6db8: 84 c0 test %al,%al ffffffff810a6dba: 41 89 c7 mov %eax,%r15d ffffffff810a6dbd: 0f 84 8d 00 00 00 je ffffffff810a6e50 <isolate_freepages_block+0x150> ffffffff810a6dc3: 80 7c 24 3b 00 cmpb $0x0,0x3b(%rsp) ffffffff810a6dc8: 0f 84 c2 00 00 00 je ffffffff810a6e90 <isolate_freepages_block+0x190> ffffffff810a6dce: 8b 43 18 mov 0x18(%rbx),%eax ffffffff810a6dd1: 83 f8 80 cmp $0xffffff80,%eax ffffffff810a6dd4: 75 9a jne ffffffff810a6d70 <isolate_freepages_block+0x70> ffffffff810a6dd6: 48 89 df mov %rbx,%rdi ffffffff810a6dd9: e8 32 db fe ff callq ffffffff81094910 <split_free_page> ffffffff810a6dde: 85 c0 test %eax,%eax ffffffff810a6de0: 0f 84 81 01 00 00 je ffffffff810a6f67 <isolate_freepages_block+0x267> ffffffff810a6de6: 01 44 24 3c add %eax,0x3c(%rsp) ffffffff810a6dea: 83 f8 00 cmp $0x0,%eax ffffffff810a6ded: 0f 8e 48 01 00 00 jle ffffffff810a6f3b <isolate_freepages_block+0x23b> ffffffff810a6df3: 4d 8b 06 mov (%r14),%r8 ffffffff810a6df6: 48 89 d9 mov %rbx,%rcx ffffffff810a6df9: 31 ff xor %edi,%edi ffffffff810a6dfb: 4c 8d 5b 20 lea 0x20(%rbx),%r11 ffffffff810a6dff: 90 nop ffffffff810a6e00: 48 8d 51 20 lea 0x20(%rcx),%rdx ffffffff810a6e04: 48 89 ce mov %rcx,%rsi ffffffff810a6e07: 83 c7 01 add $0x1,%edi ffffffff810a6e0a: 48 29 de sub %rbx,%rsi ffffffff810a6e0d: 48 83 c1 40 add $0x40,%rcx ffffffff810a6e11: 39 c7 cmp %eax,%edi ffffffff810a6e13: 49 89 50 08 mov %rdx,0x8(%r8) ffffffff810a6e17: 4e 89 04 1e mov %r8,(%rsi,%r11,1) ffffffff810a6e1b: 49 89 d0 mov %rdx,%r8 ffffffff810a6e1e: 4e 89 74 1e 08 mov %r14,0x8(%rsi,%r11,1) ffffffff810a6e23: 49 89 16 mov %rdx,(%r14) ffffffff810a6e26: 75 d8 jne ffffffff810a6e00 <isolate_freepages_block+0x100> ffffffff810a6e28: 8d 48 ff lea -0x1(%rax),%ecx ffffffff810a6e2b: 48 98 cltq ffffffff810a6e2d: 48 83 e8 01 sub $0x1,%rax ffffffff810a6e31: 48 63 c9 movslq %ecx,%rcx ffffffff810a6e34: 48 01 cd add %rcx,%rbp ffffffff810a6e37: 48 c1 e0 06 shl $0x6,%rax ffffffff810a6e3b: 48 01 c3 add %rax,%rbx ffffffff810a6e3e: 48 83 c5 01 add $0x1,%rbp ffffffff810a6e42: 48 83 c3 40 add $0x40,%rbx ffffffff810a6e46: 49 39 ed cmp %rbp,%r13 ffffffff810a6e49: 0f 87 32 ff ff ff ja ffffffff810a6d81 <isolate_freepages_block+0x81> ffffffff810a6e4f: 90 nop ffffffff810a6e50: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff810a6e55: 44 89 fa mov %r15d,%edx ffffffff810a6e58: 48 63 44 24 3c movslq 0x3c(%rsp),%rax ffffffff810a6e5d: 80 7c 24 3b 00 cmpb $0x0,0x3b(%rsp) ffffffff810a6e62: 74 11 je ffffffff810a6e75 <isolate_freepages_block+0x175> ffffffff810a6e64: 4c 89 e9 mov %r13,%rcx ffffffff810a6e67: 48 2b 4c 24 30 sub 0x30(%rsp),%rcx ffffffff810a6e6c: 48 39 c1 cmp %rax,%rcx ffffffff810a6e6f: 0f 87 b7 00 00 00 ja ffffffff810a6f2c <isolate_freepages_block+0x22c> ffffffff810a6e75: 84 d2 test %dl,%dl ffffffff810a6e77: 75 31 jne ffffffff810a6eaa <isolate_freepages_block+0x1aa> ffffffff810a6e79: 4c 39 ed cmp %r13,%rbp ffffffff810a6e7c: 74 4d je ffffffff810a6ecb <isolate_freepages_block+0x1cb> ffffffff810a6e7e: 48 83 c4 58 add $0x58,%rsp ffffffff810a6e82: 5b pop %rbx ffffffff810a6e83: 5d pop %rbp ffffffff810a6e84: 41 5c pop %r12 ffffffff810a6e86: 41 5d pop %r13 ffffffff810a6e88: 41 5e pop %r14 ffffffff810a6e8a: 41 5f pop %r15 ffffffff810a6e8c: c3 retq ffffffff810a6e8d: 0f 1f 00 nopl (%rax) ffffffff810a6e90: 48 89 df mov %rbx,%rdi ffffffff810a6e93: e8 78 fd ff ff callq ffffffff810a6c10 <suitable_migration_target> ffffffff810a6e98: 84 c0 test %al,%al ffffffff810a6e9a: 0f 85 2e ff ff ff jne ffffffff810a6dce <isolate_freepages_block+0xce> ffffffff810a6ea0: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff810a6ea5: 48 63 44 24 3c movslq 0x3c(%rsp),%rax ffffffff810a6eaa: 49 8b 7e 50 mov 0x50(%r14),%rdi ffffffff810a6eae: 48 89 44 24 08 mov %rax,0x8(%rsp) ffffffff810a6eb3: 48 8b 74 24 48 mov 0x48(%rsp),%rsi ffffffff810a6eb8: 48 83 c7 50 add $0x50,%rdi ffffffff810a6ebc: e8 1f 14 60 00 callq ffffffff816a82e0 <_raw_spin_unlock_irqrestore> ffffffff810a6ec1: 4c 39 ed cmp %r13,%rbp ffffffff810a6ec4: 48 8b 44 24 08 mov 0x8(%rsp),%rax ffffffff810a6ec9: 75 b3 jne ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6ecb: 4d 85 e4 test %r12,%r12 ffffffff810a6ece: 49 8b 5e 50 mov 0x50(%r14),%rbx ffffffff810a6ed2: 74 aa je ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6ed4: 8b 4c 24 3c mov 0x3c(%rsp),%ecx ffffffff810a6ed8: 85 c9 test %ecx,%ecx ffffffff810a6eda: 75 a2 jne ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6edc: b9 03 00 00 00 mov $0x3,%ecx ffffffff810a6ee1: ba 03 00 00 00 mov $0x3,%edx ffffffff810a6ee6: be 01 00 00 00 mov $0x1,%esi ffffffff810a6eeb: 4c 89 e7 mov %r12,%rdi ffffffff810a6eee: 48 89 44 24 08 mov %rax,0x8(%rsp) ffffffff810a6ef3: e8 18 d1 fe ff callq ffffffff81094010 <set_pageblock_flags_group> ffffffff810a6ef8: 41 80 7e 42 00 cmpb $0x0,0x42(%r14) ffffffff810a6efd: 48 8b 44 24 08 mov 0x8(%rsp),%rax ffffffff810a6f02: 0f 85 76 ff ff ff jne ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6f08: 48 ba 00 00 00 00 00 movabs $0x160000000000,%rdx ffffffff810a6f0f: 16 00 00 ffffffff810a6f12: 49 01 d4 add %rdx,%r12 ffffffff810a6f15: 49 c1 fc 06 sar $0x6,%r12 ffffffff810a6f19: 4c 3b 63 60 cmp 0x60(%rbx),%r12 ffffffff810a6f1d: 0f 83 5b ff ff ff jae ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6f23: 4c 89 63 60 mov %r12,0x60(%rbx) ffffffff810a6f27: e9 52 ff ff ff jmpq ffffffff810a6e7e <isolate_freepages_block+0x17e> ffffffff810a6f2c: 31 c0 xor %eax,%eax ffffffff810a6f2e: c7 44 24 3c 00 00 00 movl $0x0,0x3c(%rsp) ffffffff810a6f35: 00 ffffffff810a6f36: e9 3a ff ff ff jmpq ffffffff810a6e75 <isolate_freepages_block+0x175> ffffffff810a6f3b: 0f 84 2f fe ff ff je ffffffff810a6d70 <isolate_freepages_block+0x70> ffffffff810a6f41: e9 e2 fe ff ff jmpq ffffffff810a6e28 <isolate_freepages_block+0x128> ffffffff810a6f46: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) ffffffff810a6f4d: 00 00 00 ffffffff810a6f50: 48 89 f5 mov %rsi,%rbp ffffffff810a6f53: 31 c0 xor %eax,%eax ffffffff810a6f55: c7 44 24 3c 00 00 00 movl $0x0,0x3c(%rsp) ffffffff810a6f5c: 00 ffffffff810a6f5d: 31 d2 xor %edx,%edx ffffffff810a6f5f: 45 31 e4 xor %r12d,%r12d ffffffff810a6f62: e9 f6 fe ff ff jmpq ffffffff810a6e5d <isolate_freepages_block+0x15d> ffffffff810a6f67: 80 7c 24 3b 00 cmpb $0x0,0x3b(%rsp) ffffffff810a6f6c: 0f 84 74 fe ff ff je ffffffff810a6de6 <isolate_freepages_block+0xe6> ffffffff810a6f72: 44 89 fa mov %r15d,%edx ffffffff810a6f75: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff810a6f7a: 48 63 44 24 3c movslq 0x3c(%rsp),%rax ffffffff810a6f7f: e9 e0 fe ff ff jmpq ffffffff810a6e64 <isolate_freepages_block+0x164> ffffffff810a6f84: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff810a6f8b: 00 00 00 00 00 Thanks, Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 15:22 ` Henrik Rydberg @ 2012-12-06 16:10 ` Linus Torvalds 2012-12-06 16:35 ` Mel Gorman 0 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2012-12-06 16:10 UTC (permalink / raw) To: Henrik Rydberg Cc: Jan Kara, Mel Gorman, linux-mm, Linux Kernel Mailing List, Andrew Morton Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already guessed from just the offset, that is indeed likely the PageBuddy() test. On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > > http://bitmath.org/test/oops-3.7-rc8.jpg > > ffffffff810a6d6a: eb 1c jmp ffffffff810a6d88 <isolate_freepages_block+0x88> > ffffffff810a6d6c: 0f 1f 40 00 nopl 0x0(%rax) On the first entry to the loop, we jump *into* the loop, over the end condition (the compiler has basically turned. And we jump directly to the faulting instruction. Looking at the register state, though, we're not at the first iteration of the loop, so we don't have to worry about that case. The loop itself then starts with: > ffffffff810a6d70: 48 83 c5 01 add $0x1,%rbp > ffffffff810a6d74: 48 83 c3 40 add $0x40,%rbx The above is the "blockpfn++, cursor++" part of the loop, while the test below is the loop condition ("blockpfn < end_pfn"): > ffffffff810a6d78: 49 39 ed cmp %rbp,%r13 > ffffffff810a6d7b: 0f 86 cf 00 00 00 jbe ffffffff810a6e50 <isolate_freepages_block+0x150> >From your image, %rbp is 0x070000 and %r13 is 0x0702f9. The "pfn_valid_within()" test is a no-op because we don't have holes in zones on x86, so then we have if (!valid_page) valid_page = page; which generates a test+cmove: > ffffffff810a6d81: 4d 85 e4 test %r12,%r12 > ffffffff810a6d84: 4c 0f 44 e3 cmove %rbx,%r12 (which is how we can tell we're not at the beginning: 'valid_page' is 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000). .. and finally the oopsing instruction from PageBuddy(), which is the read of the 'page->_mapcount' > ffffffff810a6d88: 8b 43 18 mov 0x18(%rbx),%eax > ffffffff810a6d8b: 83 f8 80 cmp $0xffffff80,%eax > ffffffff810a6d8e: 75 e0 jne ffffffff810a6d70 <isolate_freepages_block+0x70> So yeah, that loop has apparently wandered into la-la-land. end_pfn must be somehow wrong. Mel, does any of this ring a bell (Andrew also added to the cc, since the patches came through him). Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 16:10 ` Linus Torvalds @ 2012-12-06 16:35 ` Mel Gorman 0 siblings, 0 replies; 18+ messages in thread From: Mel Gorman @ 2012-12-06 16:35 UTC (permalink / raw) To: Linus Torvalds Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List, Andrew Morton On Thu, Dec 06, 2012 at 08:10:13AM -0800, Linus Torvalds wrote: > Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already > guessed from just the offset, that is indeed likely the PageBuddy() > test. > > On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > > > > http://bitmath.org/test/oops-3.7-rc8.jpg > > > > ffffffff810a6d6a: eb 1c jmp ffffffff810a6d88 <isolate_freepages_block+0x88> > > ffffffff810a6d6c: 0f 1f 40 00 nopl 0x0(%rax) > > On the first entry to the loop, we jump *into* the loop, over the end > condition (the compiler has basically turned. And we jump directly to > the faulting instruction. Looking at the register state, though, we're > not at the first iteration of the loop, so we don't have to worry > about that case. The loop itself then starts with: > > > ffffffff810a6d70: 48 83 c5 01 add $0x1,%rbp > > ffffffff810a6d74: 48 83 c3 40 add $0x40,%rbx > > The above is the "blockpfn++, cursor++" part of the loop, while the > test below is the loop condition ("blockpfn < end_pfn"): > > > ffffffff810a6d78: 49 39 ed cmp %rbp,%r13 > > ffffffff810a6d7b: 0f 86 cf 00 00 00 jbe ffffffff810a6e50 <isolate_freepages_block+0x150> > > From your image, %rbp is 0x070000 and %r13 is 0x0702f9. > > The "pfn_valid_within()" test is a no-op because we don't have holes > in zones on x86, so then we have > That thing is not about holes in zones, it's about holes within a MAX_ORDER_NR_PAGES block but either way it's a no-op x86 and we're not doing a pfn_valid check in this loop. I didn't look back in time but I have a vague recollection that this used to be always start with an aligned PFN but with large amounts of churn since, it's no longer true. > if (!valid_page) > valid_page = page; > > which generates a test+cmove: > > > ffffffff810a6d81: 4d 85 e4 test %r12,%r12 > > ffffffff810a6d84: 4c 0f 44 e3 cmove %rbx,%r12 > > (which is how we can tell we're not at the beginning: 'valid_page' is > 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000). > > .. and finally the oopsing instruction from PageBuddy(), which is the > read of the 'page->_mapcount' > > > ffffffff810a6d88: 8b 43 18 mov 0x18(%rbx),%eax > > ffffffff810a6d8b: 83 f8 80 cmp $0xffffff80,%eax > > ffffffff810a6d8e: 75 e0 jne ffffffff810a6d70 <isolate_freepages_block+0x70> > > So yeah, that loop has apparently wandered into la-la-land. end_pfn > must be somehow wrong. > I think we wandered into a hole where there is no valid struct page. > Mel, does any of this ring a bell (Andrew also added to the cc, since > the patches came through him). > It reminded me of a similar bug in the migration scanner which I mentioned in the patch elsewhere in the thread but carelessly failed to cc Andrew. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 14:48 ` Oops in 3.7-rc8 isolate_free_pages_block() Jan Kara 2012-12-06 15:22 ` Henrik Rydberg @ 2012-12-06 16:19 ` Mel Gorman 2012-12-06 16:50 ` Linus Torvalds ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: Mel Gorman @ 2012-12-06 16:19 UTC (permalink / raw) To: Jan Kara Cc: Henrik Rydberg, Linus Torvalds, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 03:48:21PM +0100, Jan Kara wrote: > On Thu 06-12-12 10:17:44, Henrik Rydberg wrote: > > Hi Linus, > > > > This is the third time I encounter this oops in 3.7, but the first > > time I managed to get a decent screenshot: > > > > http://bitmath.org/test/oops-3.7-rc8.jpg > > > > It seems to have to do with page migration. I run with transparent > > hugepages configured, just for the fun of it. > > > > I am happy to test any suggestions. > Adding linux-mm and Mel as an author of compaction in particular to CC... > It seems that while traversing struct page structures, we entered into a new > huge page (note that RBX is 0xffffea0001c00000 - just the beginning of > a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in > struct page). It might be useful if you provide disassembly of > isolate_freepages_block() function in your kernel so that we can guess more > from other register contents... > Still travelling and am not in a position to test this properly :(. However, this bug feels very similar to a bug in the migration scanner where a pfn_valid check is missed because the start is not aligned. Henrik, when did this start happening? I would be a little surprised if it started between 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How reproducible is this? Is there anything in particular you do to trigger the oops? Does the following patch help any? It's only compile tested I'm afraid. ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. However, the free scanner has the same problem. If it encounters a hole, it can also trigger an oops when is calls PageBuddy(page) on a page that is within an hole. Reported-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: stable@vger.kernel.org --- mm/compaction.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..7d85ad485 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, continue; if (!valid_page) valid_page = page; + + /* + * As blockpfn may not start aligned, blockpfn->end_pfn + * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid + * check is necessary. If the pfn is not valid, stop + * isolation. + */ + if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && + !pfn_valid(blockpfn)) + break; if (!PageBuddy(page)) continue; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 16:19 ` Mel Gorman @ 2012-12-06 16:50 ` Linus Torvalds 2012-12-06 17:55 ` Mel Gorman 2012-12-06 16:58 ` Henrik Rydberg 2012-12-06 17:22 ` Henrik Rydberg 2 siblings, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2012-12-06 16:50 UTC (permalink / raw) To: Mel Gorman; +Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman <mgorman@suse.de> wrote: > > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Ugh. This patch makes my eyes bleed. Is there no way to do this nicely in the caller? IOW, fix the 'end_pfn' logic way upstream where it is computed, and just cap it at the MAX_ORDER_NR_PAGES boundary? For example, isolate_freepages_range() seems to have this *other* end-point alignment thing going on, and does it in a loop. Wouldn't it be much better to have a separate loop that looped up to the next MAX_ORDER_NR_PAGES boundary instead of having this kind of very random test in the middle of a loop. Even the name ("isolate_freepages_block") implies that we have a "block" of pages. Having to have a random "oops, this block can have other blocks inside of it that aren't mapped" test in the middle of that function really makes me go "Uhh, no". Plus, is it even guaranteed that the *first* pfn (that we get called with) is pfnvalid to begin with? So I guess this patch fixes things, but it does make me go "That's really *really* ugly". Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 16:50 ` Linus Torvalds @ 2012-12-06 17:55 ` Mel Gorman 2012-12-06 18:19 ` Linus Torvalds 0 siblings, 1 reply; 18+ messages in thread From: Mel Gorman @ 2012-12-06 17:55 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 08:50:54AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman <mgorman@suse.de> wrote: > > > > Still travelling and am not in a position to test this properly :(. > > However, this bug feels very similar to a bug in the migration scanner where > > a pfn_valid check is missed because the start is not aligned. > > Ugh. This patch makes my eyes bleed. > Yeah. I was listening to a talk while I was writing it, a bit cranky and didn't see why I should suffer alone. > Is there no way to do this nicely in the caller? IOW, fix the > 'end_pfn' logic way upstream where it is computed, and just cap it at > the MAX_ORDER_NR_PAGES boundary? > Easily done in the caller, but not on the MAX_ORDER_NR_PAGES boundary. The caller is striding by pageblock so a MAX_ORDER_NR_PAGES alignment will not work out. > For example, isolate_freepages_range() seems to have this *other* > end-point alignment thing going on, and does it in a loop. Wouldn't it > be much better to have a separate loop that looped up to the next > MAX_ORDER_NR_PAGES boundary instead of having this kind of very random > test in the middle of a loop. > > Even the name ("isolate_freepages_block") implies that we have a > "block" of pages. Having to have a random "oops, this block can have > other blocks inside of it that aren't mapped" test in the middle of > that function really makes me go "Uhh, no". > The block in the name is related to pageblocks. > Plus, is it even guaranteed that the *first* pfn (that we get called > with) is pfnvalid to begin with? > Yes, the caller has already checked pfn_valid() and it used to be the case that this pfn was pageblock-aligned but not since commit c89511ab (mm: compaction: Restart compaction from near where it left off). > So I guess this patch fixes things, but it does make me go "That's > really *really* ugly". > Quasimoto strikes again ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction: Restart compaction from near where it left off), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary. Reported-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Mel Gorman <mgorman@suse.de> diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..c23fa55 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock. + */ + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); + end_pfn = min(end_pfn, end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 17:55 ` Mel Gorman @ 2012-12-06 18:19 ` Linus Torvalds 2012-12-06 18:21 ` Mel Gorman 2012-12-06 18:32 ` Henrik Rydberg 0 siblings, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2012-12-06 18:19 UTC (permalink / raw) To: Mel Gorman; +Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote: > > Yeah. I was listening to a talk while I was writing it, a bit cranky and > didn't see why I should suffer alone. Makes sense. > Quasimoto strikes again Is that Quasimodo's Japanese cousin? > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > + > + /* > + * As pfn may not start aligned, pfn+pageblock_nr_page > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > + * a pfn_valid check. Ensure isolate_freepages_block() > + * only scans within a pageblock. > + */ > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > + end_pfn = min(end_pfn, end_pfn); Ok, this looks much nicer, except it's obviously buggy. The min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that line to be + end_pfn = min(end_pfn, zone_end_pfn); Henrik, does that - corrected - patch (*instead* of the previous one, not in addition to) also fix your issue? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 18:19 ` Linus Torvalds @ 2012-12-06 18:21 ` Mel Gorman 2012-12-06 18:32 ` Henrik Rydberg 1 sibling, 0 replies; 18+ messages in thread From: Mel Gorman @ 2012-12-06 18:21 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote: > > > > Yeah. I was listening to a talk while I was writing it, a bit cranky and > > didn't see why I should suffer alone. > > Makes sense. > > > Quasimoto strikes again > > Is that Quasimodo's Japanese cousin? > Yes, he's tried to escape his terrible legacy with a name change. > > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > > + > > + /* > > + * As pfn may not start aligned, pfn+pageblock_nr_page > > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > > + * a pfn_valid check. Ensure isolate_freepages_block() > > + * only scans within a pageblock. > > + */ > > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > + end_pfn = min(end_pfn, end_pfn); > > Ok, this looks much nicer, except it's obviously buggy. The > min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that > line to be > > + end_pfn = min(end_pfn, zone_end_pfn); > *sigh* Yes, I did. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 18:19 ` Linus Torvalds 2012-12-06 18:21 ` Mel Gorman @ 2012-12-06 18:32 ` Henrik Rydberg 2012-12-06 18:41 ` Linus Torvalds 1 sibling, 1 reply; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 18:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote: > > > > Yeah. I was listening to a talk while I was writing it, a bit cranky and > > didn't see why I should suffer alone. > > Makes sense. > > > Quasimoto strikes again > > Is that Quasimodo's Japanese cousin? > > > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > > + > > + /* > > + * As pfn may not start aligned, pfn+pageblock_nr_page > > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > > + * a pfn_valid check. Ensure isolate_freepages_block() > > + * only scans within a pageblock. > > + */ > > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > + end_pfn = min(end_pfn, end_pfn); > > Ok, this looks much nicer, except it's obviously buggy. The > min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that > line to be > > + end_pfn = min(end_pfn, zone_end_pfn); > > Henrik, does that - corrected - patch (*instead* of the previous one, > not in addition to) also fix your issue? Yes - I can no longer trigger the failpath, so it seems to work. Mel, enjoy the rest of the talk. ;-) Generally, I am a bit surprised that noone hit this before, given that it was quite easy to trigger. I will check 3.6 as well. Thanks, Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 18:32 ` Henrik Rydberg @ 2012-12-06 18:41 ` Linus Torvalds 2012-12-06 19:01 ` Mel Gorman 2012-12-06 19:28 ` Henrik Rydberg 0 siblings, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2012-12-06 18:41 UTC (permalink / raw) To: Henrik Rydberg; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >> >> Henrik, does that - corrected - patch (*instead* of the previous one, >> not in addition to) also fix your issue? > > Yes - I can no longer trigger the failpath, so it seems to work. Mel, > enjoy the rest of the talk. ;-) > > Generally, I am a bit surprised that noone hit this before, given that > it was quite easy to trigger. I will check 3.6 as well. Actually, looking at it some more, I think that two-liner patch had *ANOTHER* bug. Because the other line seems buggy as well. Instead of end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); I think it should be end_pfn = ALIGN(pfn+1, pageblock_nr_pages); instead. ALIGN() already aligns upwards (but the "+1" is needed in case pfn is already at a pageblock_nr_pages boundary, at which point ALIGN() would have just returned that same boundary. Hmm? Mel, please confirm. And Henrik, it might be good to test that doubly-fixed patch. Because reading the patch and trying to fix bugs in it that way is *not* the same as actually verifying it ;) Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 18:41 ` Linus Torvalds @ 2012-12-06 19:01 ` Mel Gorman 2012-12-06 19:28 ` Henrik Rydberg 1 sibling, 0 replies; 18+ messages in thread From: Mel Gorman @ 2012-12-06 19:01 UTC (permalink / raw) To: Linus Torvalds Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 10:41:14AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >> > >> Henrik, does that - corrected - patch (*instead* of the previous one, > >> not in addition to) also fix your issue? > > > > Yes - I can no longer trigger the failpath, so it seems to work. Mel, > > enjoy the rest of the talk. ;-) > > > > Generally, I am a bit surprised that noone hit this before, given that > > it was quite easy to trigger. I will check 3.6 as well. > > Actually, looking at it some more, I think that two-liner patch had > *ANOTHER* bug. > > Because the other line seems buggy as well. > > Instead of > > end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > I think it should be > > end_pfn = ALIGN(pfn+1, pageblock_nr_pages); > > instead. ALIGN() already aligns upwards (but the "+1" is needed in > case pfn is already at a pageblock_nr_pages boundary, at which point > ALIGN() would have just returned that same boundary. > > Hmm? Mel, please confirm. FFS. Yes, confirmed. In answer to Henrik's wondering why others have reported this -- reproducing this requires a large enough hole with the right aligment to have compaction walk into a PFN range with no memmap. Size and alignment depends in the memory model - 4M for FLATMEM and 128M for SPARSEMEM on x86. It needs a "lucky" machine. ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction: Restart compaction from near where it left off), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary. Reported-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/compaction.c | 10 +++++++++- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..694eaab 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock + */ + end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); + end_pfn = min(end_pfn, zone_end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 18:41 ` Linus Torvalds 2012-12-06 19:01 ` Mel Gorman @ 2012-12-06 19:28 ` Henrik Rydberg 2012-12-06 19:38 ` Linus Torvalds 1 sibling, 1 reply; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 19:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List > Actually, looking at it some more, I think that two-liner patch had > *ANOTHER* bug. > > Because the other line seems buggy as well. > > Instead of > > end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > I think it should be > > end_pfn = ALIGN(pfn+1, pageblock_nr_pages); > > instead. ALIGN() already aligns upwards (but the "+1" is needed in > case pfn is already at a pageblock_nr_pages boundary, at which point > ALIGN() would have just returned that same boundary. Ah, and now the two callers treat the pointers the same way. > Hmm? Mel, please confirm. And Henrik, it might be good to test that > doubly-fixed patch. Because reading the patch and trying to fix bugs > in it that way is *not* the same as actually verifying it ;) Confirmed, working. I also checked 3.6, but could not trigger the original problem there. The code also looks different, so it makes sense. To be explicit, this is what I tested on top of v3.7-rc8: --- mm/compaction.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..ff1c483 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock. + */ + end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); + end_pfn = min(end_pfn, zone_end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; -- 1.8.0.1 Hopefully, that's a wrap. :-) Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 19:28 ` Henrik Rydberg @ 2012-12-06 19:38 ` Linus Torvalds 2012-12-06 21:39 ` Henrik Rydberg 2012-12-07 8:32 ` Mel Gorman 0 siblings, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2012-12-06 19:38 UTC (permalink / raw) To: Henrik Rydberg; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List Ok, I've applied the patch. Mel, some grepping shows that there is an old line that does end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); which looks bogus. That should probably also use "+ 1" instead. But I'll consider that an independent issue, so I applied the one patch regardless. There is also a low_pfn += pageblock_nr_pages; low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; that looks suspicious for similar reasons. Maybe low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; instead? Although that *can* result in the same low_pfn in the end, so maybe that one was correct after all? I just did some grepping, no actual semantic analysis... Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 19:38 ` Linus Torvalds @ 2012-12-06 21:39 ` Henrik Rydberg 2012-12-07 8:32 ` Mel Gorman 1 sibling, 0 replies; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 21:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List > There is also a > > low_pfn += pageblock_nr_pages; > low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; > > that looks suspicious for similar reasons. Maybe > > low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; > > instead? Although that *can* result in the same low_pfn in the end, so > maybe that one was correct after all? I just did some grepping, no > actual semantic analysis... Here is a totally obscure version: low_pfn |= pageblock_nr_pages - 1; It simply moves to the very end of the block, which seems to be what was intended. Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 19:38 ` Linus Torvalds 2012-12-06 21:39 ` Henrik Rydberg @ 2012-12-07 8:32 ` Mel Gorman 1 sibling, 0 replies; 18+ messages in thread From: Mel Gorman @ 2012-12-07 8:32 UTC (permalink / raw) To: Linus Torvalds Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List On Thu, Dec 06, 2012 at 11:38:47AM -0800, Linus Torvalds wrote: > Ok, I've applied the patch. > Thanks. > Mel, some grepping shows that there is an old line that does > > end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); > > which looks bogus. It's bogus. The impact is that multiple compaction attempts may be needed to clear a particular block for allocation. THP allocation success rate under stress will be lower and the latency before a range of pages is collapsed by khugepaged to a huge page will be higher. The impact of this is less and it should not result in a bug like Henrik's An attentive reviewer is going to exclaim that GFP_ATOMIC allocations for jumbo frames is impacted by this but it isn't. Even with this bogus walk, compaction will be clearing SWAP_CLUSTER_MAX contiguous chunks which is enough for jumbo frames. > That should probably also use "+ 1" instead. But > I'll consider that an independent issue, so I applied the one patch > regardless. > > There is also a > > low_pfn += pageblock_nr_pages; > low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; > > that looks suspicious for similar reasons. Maybe > > low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; > This one is working by co-incidence because the low_pfn will be aligned in most cases. If it was outright broken then CMA would never work either. > instead? Although that *can* result in the same low_pfn in the end, so > maybe that one was correct after all? I just did some grepping, no > actual semantic analysis... > They need fixing but the impact is much less severe and does not justify delaying 3.8 over unlike the other last-minute fixes. My performance writing patches during talks was less than stellar yesterday so I'll avoid a repeat performance and follow up with Andrew early next week with a cc to -stable. It'll also give me a chance to run the patches through the highalloc stress tests and confirm that allocation success rate is higher and latency lower as would be expected by such a fix. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 16:19 ` Mel Gorman 2012-12-06 16:50 ` Linus Torvalds @ 2012-12-06 16:58 ` Henrik Rydberg 2012-12-06 17:22 ` Henrik Rydberg 2 siblings, 0 replies; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 16:58 UTC (permalink / raw) To: Mel Gorman; +Cc: Jan Kara, Linus Torvalds, linux-mm, Linux Kernel Mailing List Hi Mel, > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Henrik, when > did this start happening? I would be a little surprised if it started between > 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. I started using transparent hugepages when moving to 3.7-rc1, so it is quite possible that the problem was there already in 3.6. > How reproducible is this? Is there anything in particular you do to > trigger the oops? Unfortunately nothing special, and it is rare. IIRC, it has happened after a long uptime, but I guess that only means the probability of the oops is higher then. > Does the following patch help any? It's only compile tested I'm afraid. > > ---8<--- > mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free > > Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new > MAX_ORDER_NR_PAGES block during isolation for migration) added a check > for pfn_valid() when isolating pages for migration as the scanner does > not necessarily start pageblock-aligned. However, the free scanner has > the same problem. If it encounters a hole, it can also trigger an oops > when is calls PageBuddy(page) on a page that is within an hole. > > Reported-by: Henrik Rydberg <rydberg@euromail.se> > Signed-off-by: Mel Gorman <mgorman@suse.de> > Cc: stable@vger.kernel.org > --- > mm/compaction.c | 10 ++++++++++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 9eef558..7d85ad485 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, > continue; > if (!valid_page) > valid_page = page; > + > + /* > + * As blockpfn may not start aligned, blockpfn->end_pfn > + * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid > + * check is necessary. If the pfn is not valid, stop > + * isolation. > + */ > + if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && > + !pfn_valid(blockpfn)) > + break; > if (!PageBuddy(page)) > continue; > I am running with it now, adding a printout to see if the case happens at all. Might take a while, will try to stress the machine a bit. Thanks, Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Oops in 3.7-rc8 isolate_free_pages_block() 2012-12-06 16:19 ` Mel Gorman 2012-12-06 16:50 ` Linus Torvalds 2012-12-06 16:58 ` Henrik Rydberg @ 2012-12-06 17:22 ` Henrik Rydberg 2 siblings, 0 replies; 18+ messages in thread From: Henrik Rydberg @ 2012-12-06 17:22 UTC (permalink / raw) To: Mel Gorman; +Cc: Jan Kara, Linus Torvalds, linux-mm, Linux Kernel Mailing List > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Henrik, when > did this start happening? I would be a little surprised if it started between > 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How > reproducible is this? Is there anything in particular you do to trigger the > oops? Does the following patch help any? It's only compile tested I'm afraid. I managed to trigger the path several times with a small memory-intensive program, and since I am still here, Tested-by: Henrik Rydberg <rydberg@euromail.se> Thanks! Henrik -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2012-12-07 8:41 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20121206091744.GA1397@polaris.bitmath.org> 2012-12-06 14:48 ` Oops in 3.7-rc8 isolate_free_pages_block() Jan Kara 2012-12-06 15:22 ` Henrik Rydberg 2012-12-06 16:10 ` Linus Torvalds 2012-12-06 16:35 ` Mel Gorman 2012-12-06 16:19 ` Mel Gorman 2012-12-06 16:50 ` Linus Torvalds 2012-12-06 17:55 ` Mel Gorman 2012-12-06 18:19 ` Linus Torvalds 2012-12-06 18:21 ` Mel Gorman 2012-12-06 18:32 ` Henrik Rydberg 2012-12-06 18:41 ` Linus Torvalds 2012-12-06 19:01 ` Mel Gorman 2012-12-06 19:28 ` Henrik Rydberg 2012-12-06 19:38 ` Linus Torvalds 2012-12-06 21:39 ` Henrik Rydberg 2012-12-07 8:32 ` Mel Gorman 2012-12-06 16:58 ` Henrik Rydberg 2012-12-06 17:22 ` Henrik Rydberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).