From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx152.postini.com [74.125.245.152]) by kanga.kvack.org (Postfix) with SMTP id A46726B0081 for ; Thu, 6 Dec 2012 09:48:24 -0500 (EST)
Date: Thu, 6 Dec 2012 15:48:21 +0100
From: Jan Kara
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206144821.GC18547@quack.suse.cz>
References: <20121206091744.GA1397@polaris.bitmath.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121206091744.GA1397@polaris.bitmath.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Henrik Rydberg
Cc: Linus Torvalds , mgorman@suse.de, linux-mm@kvack.org, Linux Kernel Mailing List

On Thu 06-12-12 10:17:44, Henrik Rydberg wrote:
> Hi Linus,
>
> This is the third time I encounter this oops in 3.7, but the first
> time I managed to get a decent screenshot:
>
> http://bitmath.org/test/oops-3.7-rc8.jpg
>
> It seems to have to do with page migration. I run with transparent
> hugepages configured, just for the fun of it.
>
> I am happy to test any suggestions.

Adding linux-mm and Mel as an author of compaction in particular to CC...

It seems that while traversing struct page structures, we entered into a new huge page (note that RBX is 0xffffea0001c00000 - just the beginning of a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in struct page). It might be useful if you provide disassembly of isolate_freepages_block() function in your kernel so that we can guess more from other register contents...

Honza
-- 
Jan Kara
SUSE Labs, CR
-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id 564DE6B0062 for ; Thu, 6 Dec 2012 10:20:30 -0500 (EST)
Received: from ipb4.telenor.se (ipb4.telenor.se [195.54.127.167]) by smtprelay-h21.telenor.se (Postfix) with ESMTP id 7E8E0E9EC9 for ; Thu, 6 Dec 2012 16:20:28 +0100 (CET)
From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 16:22:34 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206152234.GA5309@polaris.bitmath.org>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121206144821.GC18547@quack.suse.cz>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jan Kara
Cc: Linus Torvalds , mgorman@suse.de, linux-mm@kvack.org, Linux Kernel Mailing List

Hi Jan,

> > http://bitmath.org/test/oops-3.7-rc8.jpg
> >
> > It seems to have to do with page migration. I run with transparent
> > hugepages configured, just for the fun of it.
> >
> > I am happy to test any suggestions.
>
> Adding linux-mm and Mel as an author of compaction in particular to CC...
>
> It seems that while traversing struct page structures, we entered into a new
> huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
> a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
> struct page). It might be useful if you provide disassembly of
> isolate_freepages_block() function in your kernel so that we can guess more
> from other register contents...
I had to recreate the vmlinux file, but it seems to be at the right address, so here we go:

ffffffff810a6d00 :
ffffffff810a6d00:	48 b8 00 00 00 00 00	movabs $0xffffea0000000000,%rax
ffffffff810a6d07:	ea ff ff
ffffffff810a6d0a:	41 57	push %r15
ffffffff810a6d0c:	41 56	push %r14
ffffffff810a6d0e:	49 89 fe	mov %rdi,%r14
ffffffff810a6d11:	41 55	push %r13
ffffffff810a6d13:	49 89 d5	mov %rdx,%r13
ffffffff810a6d16:	41 54	push %r12
ffffffff810a6d18:	55	push %rbp
ffffffff810a6d19:	53	push %rbx
ffffffff810a6d1a:	48 89 f3	mov %rsi,%rbx
ffffffff810a6d1d:	48 c1 e3 06	shl $0x6,%rbx
ffffffff810a6d21:	48 83 ec 58	sub $0x58,%rsp
ffffffff810a6d25:	48 01 c3	add %rax,%rbx
ffffffff810a6d28:	48 39 f2	cmp %rsi,%rdx
ffffffff810a6d2b:	48 89 74 24 30	mov %rsi,0x30(%rsp)
ffffffff810a6d30:	44 88 44 24 3b	mov %r8b,0x3b(%rsp)
ffffffff810a6d35:	0f 86 15 02 00 00	jbe ffffffff810a6f50
ffffffff810a6d3b:	48 8d 47 58	lea 0x58(%rdi),%rax
ffffffff810a6d3f:	31 d2	xor %edx,%edx
ffffffff810a6d41:	48 8b 6c 24 30	mov 0x30(%rsp),%rbp
ffffffff810a6d46:	48 89 44 24 20	mov %rax,0x20(%rsp)
ffffffff810a6d4b:	48 8d 47 40	lea 0x40(%rdi),%rax
ffffffff810a6d4f:	49 89 dc	mov %rbx,%r12
ffffffff810a6d52:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6d59:	00
ffffffff810a6d5a:	49 89 ce	mov %rcx,%r14
ffffffff810a6d5d:	41 89 d7	mov %edx,%r15d
ffffffff810a6d60:	48 89 44 24 28	mov %rax,0x28(%rsp)
ffffffff810a6d65:	48 89 7c 24 18	mov %rdi,0x18(%rsp)
ffffffff810a6d6a:	eb 1c	jmp ffffffff810a6d88
ffffffff810a6d6c:	0f 1f 40 00	nopl 0x0(%rax)
ffffffff810a6d70:	48 83 c5 01	add $0x1,%rbp
ffffffff810a6d74:	48 83 c3 40	add $0x40,%rbx
ffffffff810a6d78:	49 39 ed	cmp %rbp,%r13
ffffffff810a6d7b:	0f 86 cf 00 00 00	jbe ffffffff810a6e50
ffffffff810a6d81:	4d 85 e4	test %r12,%r12
ffffffff810a6d84:	4c 0f 44 e3	cmove %rbx,%r12
ffffffff810a6d88:	8b 43 18	mov 0x18(%rbx),%eax
ffffffff810a6d8b:	83 f8 80	cmp $0xffffff80,%eax
ffffffff810a6d8e:	75 e0	jne ffffffff810a6d70
ffffffff810a6d90:	48 8b 44 24 18	mov 0x18(%rsp),%rax
ffffffff810a6d95:	48 8d 74 24 48	lea 0x48(%rsp),%rsi
ffffffff810a6d9a:	41 0f b6 d7	movzbl %r15b,%edx
ffffffff810a6d9e:	4c 8b 44 24 20	mov 0x20(%rsp),%r8
ffffffff810a6da3:	48 8b 4c 24 28	mov 0x28(%rsp),%rcx
ffffffff810a6da8:	48 8b 40 50	mov 0x50(%rax),%rax
ffffffff810a6dac:	48 89 c7	mov %rax,%rdi
ffffffff810a6daf:	48 83 c7 50	add $0x50,%rdi
ffffffff810a6db3:	e8 a8 fe ff ff	callq ffffffff810a6c60
ffffffff810a6db8:	84 c0	test %al,%al
ffffffff810a6dba:	41 89 c7	mov %eax,%r15d
ffffffff810a6dbd:	0f 84 8d 00 00 00	je ffffffff810a6e50
ffffffff810a6dc3:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6dc8:	0f 84 c2 00 00 00	je ffffffff810a6e90
ffffffff810a6dce:	8b 43 18	mov 0x18(%rbx),%eax
ffffffff810a6dd1:	83 f8 80	cmp $0xffffff80,%eax
ffffffff810a6dd4:	75 9a	jne ffffffff810a6d70
ffffffff810a6dd6:	48 89 df	mov %rbx,%rdi
ffffffff810a6dd9:	e8 32 db fe ff	callq ffffffff81094910
ffffffff810a6dde:	85 c0	test %eax,%eax
ffffffff810a6de0:	0f 84 81 01 00 00	je ffffffff810a6f67
ffffffff810a6de6:	01 44 24 3c	add %eax,0x3c(%rsp)
ffffffff810a6dea:	83 f8 00	cmp $0x0,%eax
ffffffff810a6ded:	0f 8e 48 01 00 00	jle ffffffff810a6f3b
ffffffff810a6df3:	4d 8b 06	mov (%r14),%r8
ffffffff810a6df6:	48 89 d9	mov %rbx,%rcx
ffffffff810a6df9:	31 ff	xor %edi,%edi
ffffffff810a6dfb:	4c 8d 5b 20	lea 0x20(%rbx),%r11
ffffffff810a6dff:	90	nop
ffffffff810a6e00:	48 8d 51 20	lea 0x20(%rcx),%rdx
ffffffff810a6e04:	48 89 ce	mov %rcx,%rsi
ffffffff810a6e07:	83 c7 01	add $0x1,%edi
ffffffff810a6e0a:	48 29 de	sub %rbx,%rsi
ffffffff810a6e0d:	48 83 c1 40	add $0x40,%rcx
ffffffff810a6e11:	39 c7	cmp %eax,%edi
ffffffff810a6e13:	49 89 50 08	mov %rdx,0x8(%r8)
ffffffff810a6e17:	4e 89 04 1e	mov %r8,(%rsi,%r11,1)
ffffffff810a6e1b:	49 89 d0	mov %rdx,%r8
ffffffff810a6e1e:	4e 89 74 1e 08	mov %r14,0x8(%rsi,%r11,1)
ffffffff810a6e23:	49 89 16	mov %rdx,(%r14)
ffffffff810a6e26:	75 d8	jne ffffffff810a6e00
ffffffff810a6e28:	8d 48 ff	lea -0x1(%rax),%ecx
ffffffff810a6e2b:	48 98	cltq
ffffffff810a6e2d:	48 83 e8 01	sub $0x1,%rax
ffffffff810a6e31:	48 63 c9	movslq %ecx,%rcx
ffffffff810a6e34:	48 01 cd	add %rcx,%rbp
ffffffff810a6e37:	48 c1 e0 06	shl $0x6,%rax
ffffffff810a6e3b:	48 01 c3	add %rax,%rbx
ffffffff810a6e3e:	48 83 c5 01	add $0x1,%rbp
ffffffff810a6e42:	48 83 c3 40	add $0x40,%rbx
ffffffff810a6e46:	49 39 ed	cmp %rbp,%r13
ffffffff810a6e49:	0f 87 32 ff ff ff	ja ffffffff810a6d81
ffffffff810a6e4f:	90	nop
ffffffff810a6e50:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6e55:	44 89 fa	mov %r15d,%edx
ffffffff810a6e58:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6e5d:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6e62:	74 11	je ffffffff810a6e75
ffffffff810a6e64:	4c 89 e9	mov %r13,%rcx
ffffffff810a6e67:	48 2b 4c 24 30	sub 0x30(%rsp),%rcx
ffffffff810a6e6c:	48 39 c1	cmp %rax,%rcx
ffffffff810a6e6f:	0f 87 b7 00 00 00	ja ffffffff810a6f2c
ffffffff810a6e75:	84 d2	test %dl,%dl
ffffffff810a6e77:	75 31	jne ffffffff810a6eaa
ffffffff810a6e79:	4c 39 ed	cmp %r13,%rbp
ffffffff810a6e7c:	74 4d	je ffffffff810a6ecb
ffffffff810a6e7e:	48 83 c4 58	add $0x58,%rsp
ffffffff810a6e82:	5b	pop %rbx
ffffffff810a6e83:	5d	pop %rbp
ffffffff810a6e84:	41 5c	pop %r12
ffffffff810a6e86:	41 5d	pop %r13
ffffffff810a6e88:	41 5e	pop %r14
ffffffff810a6e8a:	41 5f	pop %r15
ffffffff810a6e8c:	c3	retq
ffffffff810a6e8d:	0f 1f 00	nopl (%rax)
ffffffff810a6e90:	48 89 df	mov %rbx,%rdi
ffffffff810a6e93:	e8 78 fd ff ff	callq ffffffff810a6c10
ffffffff810a6e98:	84 c0	test %al,%al
ffffffff810a6e9a:	0f 85 2e ff ff ff	jne ffffffff810a6dce
ffffffff810a6ea0:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6ea5:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6eaa:	49 8b 7e 50	mov 0x50(%r14),%rdi
ffffffff810a6eae:	48 89 44 24 08	mov %rax,0x8(%rsp)
ffffffff810a6eb3:	48 8b 74 24 48	mov 0x48(%rsp),%rsi
ffffffff810a6eb8:	48 83 c7 50	add $0x50,%rdi
ffffffff810a6ebc:	e8 1f 14 60 00	callq ffffffff816a82e0 <_raw_spin_unlock_irqrestore>
ffffffff810a6ec1:	4c 39 ed	cmp %r13,%rbp
ffffffff810a6ec4:	48 8b 44 24 08	mov 0x8(%rsp),%rax
ffffffff810a6ec9:	75 b3	jne ffffffff810a6e7e
ffffffff810a6ecb:	4d 85 e4	test %r12,%r12
ffffffff810a6ece:	49 8b 5e 50	mov 0x50(%r14),%rbx
ffffffff810a6ed2:	74 aa	je ffffffff810a6e7e
ffffffff810a6ed4:	8b 4c 24 3c	mov 0x3c(%rsp),%ecx
ffffffff810a6ed8:	85 c9	test %ecx,%ecx
ffffffff810a6eda:	75 a2	jne ffffffff810a6e7e
ffffffff810a6edc:	b9 03 00 00 00	mov $0x3,%ecx
ffffffff810a6ee1:	ba 03 00 00 00	mov $0x3,%edx
ffffffff810a6ee6:	be 01 00 00 00	mov $0x1,%esi
ffffffff810a6eeb:	4c 89 e7	mov %r12,%rdi
ffffffff810a6eee:	48 89 44 24 08	mov %rax,0x8(%rsp)
ffffffff810a6ef3:	e8 18 d1 fe ff	callq ffffffff81094010
ffffffff810a6ef8:	41 80 7e 42 00	cmpb $0x0,0x42(%r14)
ffffffff810a6efd:	48 8b 44 24 08	mov 0x8(%rsp),%rax
ffffffff810a6f02:	0f 85 76 ff ff ff	jne ffffffff810a6e7e
ffffffff810a6f08:	48 ba 00 00 00 00 00	movabs $0x160000000000,%rdx
ffffffff810a6f0f:	16 00 00
ffffffff810a6f12:	49 01 d4	add %rdx,%r12
ffffffff810a6f15:	49 c1 fc 06	sar $0x6,%r12
ffffffff810a6f19:	4c 3b 63 60	cmp 0x60(%rbx),%r12
ffffffff810a6f1d:	0f 83 5b ff ff ff	jae ffffffff810a6e7e
ffffffff810a6f23:	4c 89 63 60	mov %r12,0x60(%rbx)
ffffffff810a6f27:	e9 52 ff ff ff	jmpq ffffffff810a6e7e
ffffffff810a6f2c:	31 c0	xor %eax,%eax
ffffffff810a6f2e:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6f35:	00
ffffffff810a6f36:	e9 3a ff ff ff	jmpq ffffffff810a6e75
ffffffff810a6f3b:	0f 84 2f fe ff ff	je ffffffff810a6d70
ffffffff810a6f41:	e9 e2 fe ff ff	jmpq ffffffff810a6e28
ffffffff810a6f46:	66 2e 0f 1f 84 00 00	nopw %cs:0x0(%rax,%rax,1)
ffffffff810a6f4d:	00 00 00
ffffffff810a6f50:	48 89 f5	mov %rsi,%rbp
ffffffff810a6f53:	31 c0	xor %eax,%eax
ffffffff810a6f55:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6f5c:	00
ffffffff810a6f5d:	31 d2	xor %edx,%edx
ffffffff810a6f5f:	45 31 e4	xor %r12d,%r12d
ffffffff810a6f62:	e9 f6 fe ff ff	jmpq ffffffff810a6e5d
ffffffff810a6f67:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6f6c:	0f 84 74 fe ff ff	je ffffffff810a6de6
ffffffff810a6f72:	44 89 fa	mov %r15d,%edx
ffffffff810a6f75:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6f7a:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6f7f:	e9 e0 fe ff ff	jmpq ffffffff810a6e64
ffffffff810a6f84:	66 66 66 2e 0f 1f 84	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff810a6f8b:	00 00 00 00 00

Thanks,
Henrik

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx111.postini.com [74.125.245.111]) by kanga.kvack.org (Postfix) with SMTP id 037136B006C for ; Thu, 6 Dec 2012 11:10:35 -0500 (EST)
Received: by mail-wg0-f47.google.com with SMTP id dq11so3248118wgb.26 for ; Thu, 06 Dec 2012 08:10:34 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20121206152234.GA5309@polaris.bitmath.org>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206152234.GA5309@polaris.bitmath.org>
From: Linus Torvalds
Date: Thu, 6 Dec 2012 08:10:13 -0800
Message-ID:
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Henrik Rydberg
Cc: Jan Kara , Mel Gorman , linux-mm , Linux Kernel Mailing List , Andrew Morton

Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already guessed from just the offset, that is indeed likely the PageBuddy() test.

On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg wrote:
>
> http://bitmath.org/test/oops-3.7-rc8.jpg
>
> ffffffff810a6d6a:	eb 1c	jmp ffffffff810a6d88
> ffffffff810a6d6c:	0f 1f 40 00	nopl 0x0(%rax)

On the first entry to the loop, we jump *into* the loop, over the end condition (the compiler has basically turned. And we jump directly to the faulting instruction. Looking at the register state, though, we're not at the first iteration of the loop, so we don't have to worry about that case.
The loop itself then starts with:

> ffffffff810a6d70:	48 83 c5 01	add $0x1,%rbp
> ffffffff810a6d74:	48 83 c3 40	add $0x40,%rbx

The above is the "blockpfn++, cursor++" part of the loop, while the test below is the loop condition ("blockpfn < end_pfn"):

> ffffffff810a6d78:	49 39 ed	cmp %rbp,%r13
> ffffffff810a6d7b:	0f 86 cf 00 00 00	jbe ffffffff810a6e50

From your image, %rbp is 0x070000 and %r13 is 0x0702f9.

The "pfn_valid_within()" test is a no-op because we don't have holes in zones on x86, so then we have

	if (!valid_page)
		valid_page = page;

which generates a test+cmove:

> ffffffff810a6d81:	4d 85 e4	test %r12,%r12
> ffffffff810a6d84:	4c 0f 44 e3	cmove %rbx,%r12

(which is how we can tell we're not at the beginning: 'valid_page' is 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000).

.. and finally the oopsing instruction from PageBuddy(), which is the read of the 'page->_mapcount'

> ffffffff810a6d88:	8b 43 18	mov 0x18(%rbx),%eax
> ffffffff810a6d8b:	83 f8 80	cmp $0xffffff80,%eax
> ffffffff810a6d8e:	75 e0	jne ffffffff810a6d70

So yeah, that loop has apparently wandered into la-la-land. end_pfn must be somehow wrong.

Mel, does any of this ring a bell (Andrew also added to the cc, since the patches came through him).

Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx165.postini.com [74.125.245.165]) by kanga.kvack.org (Postfix) with SMTP id D78046B00A5 for ; Thu, 6 Dec 2012 11:27:56 -0500 (EST)
Date: Thu, 6 Dec 2012 16:19:34 +0000
From: Mel Gorman
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206161934.GA17258@suse.de>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <20121206144821.GC18547@quack.suse.cz>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jan Kara
Cc: Henrik Rydberg , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 03:48:21PM +0100, Jan Kara wrote:
> On Thu 06-12-12 10:17:44, Henrik Rydberg wrote:
> > Hi Linus,
> >
> > This is the third time I encounter this oops in 3.7, but the first
> > time I managed to get a decent screenshot:
> >
> > http://bitmath.org/test/oops-3.7-rc8.jpg
> >
> > It seems to have to do with page migration. I run with transparent
> > hugepages configured, just for the fun of it.
> >
> > I am happy to test any suggestions.
> Adding linux-mm and Mel as an author of compaction in particular to CC...
> It seems that while traversing struct page structures, we entered into a new
> huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
> a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
> struct page). It might be useful if you provide disassembly of
> isolate_freepages_block() function in your kernel so that we can guess more
> from other register contents...
>

Still travelling and am not in a position to test this properly :(. However, this bug feels very similar to a bug in the migration scanner where a pfn_valid check is missed because the start is not aligned. Henrik, when did this start happening?
I would be a little surprised if it started between 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How reproducible is this? Is there anything in particular you do to trigger the oops? Does the following patch help any? It's only compile tested I'm afraid.

---8<---
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free

Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. However, the free scanner has the same problem. If it encounters a hole, it can also trigger an oops when it calls PageBuddy(page) on a page that is within a hole.

Reported-by: Henrik Rydberg
Signed-off-by: Mel Gorman
Cc: stable@vger.kernel.org
---
 mm/compaction.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..7d85ad485 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 			continue;
 		if (!valid_page)
 			valid_page = page;
+
+		/*
+		 * As blockpfn may not start aligned, blockpfn->end_pfn
+		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
+		 * check is necessary. If the pfn is not valid, stop
+		 * isolation.
+		 */
+		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
+				!pfn_valid(blockpfn))
+			break;
 		if (!PageBuddy(page))
 			continue;

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx105.postini.com [74.125.245.105]) by kanga.kvack.org (Postfix) with SMTP id 1E0848D0001 for ; Thu, 6 Dec 2012 11:43:34 -0500 (EST)
Date: Thu, 6 Dec 2012 16:35:11 +0000
From: Mel Gorman
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206163511.GB17258@suse.de>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206152234.GA5309@polaris.bitmath.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Henrik Rydberg , Jan Kara , linux-mm , Linux Kernel Mailing List , Andrew Morton

On Thu, Dec 06, 2012 at 08:10:13AM -0800, Linus Torvalds wrote:
> Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already
> guessed from just the offset, that is indeed likely the PageBuddy()
> test.
>
> On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg wrote:
> >
> > http://bitmath.org/test/oops-3.7-rc8.jpg
> >
> > ffffffff810a6d6a:	eb 1c	jmp ffffffff810a6d88
> > ffffffff810a6d6c:	0f 1f 40 00	nopl 0x0(%rax)
>
> On the first entry to the loop, we jump *into* the loop, over the end
> condition (the compiler has basically turned. And we jump directly to
> the faulting instruction. Looking at the register state, though, we're
> not at the first iteration of the loop, so we don't have to worry
> about that case. The loop itself then starts with:
>
> > ffffffff810a6d70:	48 83 c5 01	add $0x1,%rbp
> > ffffffff810a6d74:	48 83 c3 40	add $0x40,%rbx
>
> The above is the "blockpfn++, cursor++" part of the loop, while the
> test below is the loop condition ("blockpfn < end_pfn"):
>
> > ffffffff810a6d78:	49 39 ed	cmp %rbp,%r13
> > ffffffff810a6d7b:	0f 86 cf 00 00 00	jbe ffffffff810a6e50
>
> From your image, %rbp is 0x070000 and %r13 is 0x0702f9.
>
> The "pfn_valid_within()" test is a no-op because we don't have holes
> in zones on x86, so then we have
>

That thing is not about holes in zones, it's about holes within a MAX_ORDER_NR_PAGES block, but either way it's a no-op on x86 and we're not doing a pfn_valid check in this loop. I didn't look back in time, but I have a vague recollection that this used to always start with an aligned PFN; with large amounts of churn since, that is no longer true.

>	if (!valid_page)
>		valid_page = page;
>
> which generates a test+cmove:
>
> > ffffffff810a6d81:	4d 85 e4	test %r12,%r12
> > ffffffff810a6d84:	4c 0f 44 e3	cmove %rbx,%r12
>
> (which is how we can tell we're not at the beginning: 'valid_page' is
> 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000).
>
> .. and finally the oopsing instruction from PageBuddy(), which is the
> read of the 'page->_mapcount'
>
> > ffffffff810a6d88:	8b 43 18	mov 0x18(%rbx),%eax
> > ffffffff810a6d8b:	83 f8 80	cmp $0xffffff80,%eax
> > ffffffff810a6d8e:	75 e0	jne ffffffff810a6d70
>
> So yeah, that loop has apparently wandered into la-la-land. end_pfn
> must be somehow wrong.
>

I think we wandered into a hole where there is no valid struct page.

> Mel, does any of this ring a bell (Andrew also added to the cc, since
> the patches came through him).
>

It reminded me of a similar bug in the migration scanner, which I mentioned in the patch elsewhere in the thread, but I carelessly failed to cc Andrew.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159]) by kanga.kvack.org (Postfix) with SMTP id C77BB6B00A7 for ; Thu, 6 Dec 2012 11:51:16 -0500 (EST)
Received: by mail-wi0-f169.google.com with SMTP id hq12so682414wib.2 for ; Thu, 06 Dec 2012 08:51:15 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20121206161934.GA17258@suse.de>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de>
From: Linus Torvalds
Date: Thu, 6 Dec 2012 08:50:54 -0800
Message-ID:
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List

On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman wrote:
>
> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned.

Ugh. This patch makes my eyes bleed.

Is there no way to do this nicely in the caller? IOW, fix the 'end_pfn' logic way upstream where it is computed, and just cap it at the MAX_ORDER_NR_PAGES boundary?

For example, isolate_freepages_range() seems to have this *other* end-point alignment thing going on, and does it in a loop. Wouldn't it be much better to have a separate loop that looped up to the next MAX_ORDER_NR_PAGES boundary instead of having this kind of very random test in the middle of a loop?

Even the name ("isolate_freepages_block") implies that we have a "block" of pages. Having to have a random "oops, this block can have other blocks inside of it that aren't mapped" test in the middle of that function really makes me go "Uhh, no".

Plus, is it even guaranteed that the *first* pfn (that we get called with) is pfn_valid to begin with?
So I guess this patch fixes things, but it does make me go "That's really *really* ugly".

Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx140.postini.com [74.125.245.140]) by kanga.kvack.org (Postfix) with SMTP id 109028D0001 for ; Thu, 6 Dec 2012 11:56:25 -0500 (EST)
Received: from ipb4.telenor.se (ipb4.telenor.se [195.54.127.167]) by smtprelay-h22.telenor.se (Postfix) with ESMTP id 45D96E9D18 for ; Thu, 6 Dec 2012 17:56:23 +0100 (CET)
From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 17:58:29 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206165829.GA392@polaris.bitmath.org>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121206161934.GA17258@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Jan Kara , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List

Hi Mel,

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned. Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason.

I started using transparent hugepages when moving to 3.7-rc1, so it is quite possible that the problem was there already in 3.6.

> How reproducible is this? Is there anything in particular you do to
> trigger the oops?

Unfortunately nothing special, and it is rare.
IIRC, it has happened after a long uptime, but I guess that only means the probability of the oops is higher then.

> Does the following patch help any? It's only compile tested I'm afraid.
>
> ---8<---
> mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free
>
> Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
> MAX_ORDER_NR_PAGES block during isolation for migration) added a check
> for pfn_valid() when isolating pages for migration as the scanner does
> not necessarily start pageblock-aligned. However, the free scanner has
> the same problem. If it encounters a hole, it can also trigger an oops
> when it calls PageBuddy(page) on a page that is within a hole.
>
> Reported-by: Henrik Rydberg
> Signed-off-by: Mel Gorman
> Cc: stable@vger.kernel.org
> ---
>  mm/compaction.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9eef558..7d85ad485 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  			continue;
>  		if (!valid_page)
>  			valid_page = page;
> +
> +		/*
> +		 * As blockpfn may not start aligned, blockpfn->end_pfn
> +		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
> +		 * check is necessary. If the pfn is not valid, stop
> +		 * isolation.
> +		 */
> +		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
> +				!pfn_valid(blockpfn))
> +			break;
>  		if (!PageBuddy(page))
>  			continue;
>

I am running with it now, adding a printout to see if the case happens at all. Might take a while, will try to stress the machine a bit.

Thanks,
Henrik

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id 4B3AE6B00B8 for ; Thu, 6 Dec 2012 12:20:23 -0500 (EST)
Received: from ipb2.telenor.se (ipb2.telenor.se [195.54.127.165]) by smtprelay-h22.telenor.se (Postfix) with ESMTP id 6ABF5E8CAE for ; Thu, 6 Dec 2012 18:20:21 +0100 (CET)
From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 18:22:25 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206172225.GA978@polaris.bitmath.org>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121206161934.GA17258@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Jan Kara , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned. Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How
> reproducible is this? Is there anything in particular you do to trigger the
> oops? Does the following patch help any? It's only compile tested I'm afraid.

I managed to trigger the path several times with a small memory-intensive program, and since I am still here,

Tested-by: Henrik Rydberg

Thanks!
Henrik

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id 2B1448D0006 for ; Thu, 6 Dec 2012 13:03:40 -0500 (EST)
Date: Thu, 6 Dec 2012 17:55:15 +0000
From: Mel Gorman
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Message-ID: <20121206175451.GC17258@suse.de>
References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 08:50:54AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman wrote:
> >
> > Still travelling and am not in a position to test this properly :(.
> > However, this bug feels very similar to a bug in the migration scanner where
> > a pfn_valid check is missed because the start is not aligned.
>
> Ugh. This patch makes my eyes bleed.
>

Yeah. I was listening to a talk while I was writing it, a bit cranky and didn't see why I should suffer alone.

> Is there no way to do this nicely in the caller? IOW, fix the
> 'end_pfn' logic way upstream where it is computed, and just cap it at
> the MAX_ORDER_NR_PAGES boundary?
>

Easily done in the caller, but not on the MAX_ORDER_NR_PAGES boundary. The caller is striding by pageblock so a MAX_ORDER_NR_PAGES alignment will not work out.

> For example, isolate_freepages_range() seems to have this *other*
> end-point alignment thing going on, and does it in a loop. Wouldn't it
> be much better to have a separate loop that looped up to the next
> MAX_ORDER_NR_PAGES boundary instead of having this kind of very random
> test in the middle of a loop?
>
> Even the name ("isolate_freepages_block") implies that we have a
> "block" of pages.
Having to have a random "oops, this block can have > other blocks inside of it that aren't mapped" test in the middle of > that function really makes me go "Uhh, no". > The block in the name is related to pageblocks. > Plus, is it even guaranteed that the *first* pfn (that we get called > with) is pfnvalid to begin with? > Yes, the caller has already checked pfn_valid() and it used to be the case that this pfn was pageblock-aligned but not since commit c89511ab (mm: compaction: Restart compaction from near where it left off). > So I guess this patch fixes things, but it does make me go "That's > really *really* ugly". > Quasimoto strikes again ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction: Restart compaction from near where it left off), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary. Reported-by: Henrik Rydberg Signed-off-by: Mel Gorman diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..c23fa55 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock. 
+		 */
+		end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
+		end_pfn = min(end_pfn, end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;

From: Linus Torvalds
Date: Thu, 6 Dec 2012 10:19:35 -0800
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote:
>
> Yeah. I was listening to a talk while I was writing it, a bit cranky and
> didn't see why I should suffer alone.

Makes sense.

> Quasimoto strikes again

Is that Quasimodo's Japanese cousin?

> -		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> +
> +		/*
> +		 * As pfn may not start aligned, pfn+pageblock_nr_page
> +		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
> +		 * a pfn_valid check. Ensure isolate_freepages_block()
> +		 * only scans within a pageblock.
> +		 */
> +		end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> +		end_pfn = min(end_pfn, end_pfn);

Ok, this looks much nicer, except it's obviously buggy.
The min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
line to be

+		end_pfn = min(end_pfn, zone_end_pfn);

Henrik, does that - corrected - patch (*instead* of the previous one,
not in addition to) also fix your issue?

               Linus

From: Mel Gorman
Date: Thu, 6 Dec 2012 18:21:03 +0000
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote:
> >
> > Yeah. I was listening to a talk while I was writing it, a bit cranky and
> > didn't see why I should suffer alone.
>
> Makes sense.
>
> > Quasimoto strikes again
>
> Is that Quasimodo's Japanese cousin?
>

Yes, he's tried to escape his terrible legacy with a name change.

> > -		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> > +
> > +		/*
> > +		 * As pfn may not start aligned, pfn+pageblock_nr_page
> > +		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
> > +		 * a pfn_valid check. Ensure isolate_freepages_block()
> > +		 * only scans within a pageblock.
> > +		 */
> > +		end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> > +		end_pfn = min(end_pfn, end_pfn);
>
> Ok, this looks much nicer, except it's obviously buggy. The
> min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
> line to be
>
> +		end_pfn = min(end_pfn, zone_end_pfn);
>

*sigh* Yes, I did. Thanks.

--
Mel Gorman
SUSE Labs

From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 19:32:59 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote:
> >
> > Yeah. I was listening to a talk while I was writing it, a bit cranky and
> > didn't see why I should suffer alone.
>
> Makes sense.
>
> > Quasimoto strikes again
>
> Is that Quasimodo's Japanese cousin?
>
> > -		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> > +
> > +		/*
> > +		 * As pfn may not start aligned, pfn+pageblock_nr_page
> > +		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
> > +		 * a pfn_valid check. Ensure isolate_freepages_block()
> > +		 * only scans within a pageblock.
> > +		 */
> > +		end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> > +		end_pfn = min(end_pfn, end_pfn);
>
> Ok, this looks much nicer, except it's obviously buggy. The
> min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
> line to be
>
> +		end_pfn = min(end_pfn, zone_end_pfn);
>
> Henrik, does that - corrected - patch (*instead* of the previous one,
> not in addition to) also fix your issue?

Yes - I can no longer trigger the failpath, so it seems to work. Mel,
enjoy the rest of the talk. ;-)

Generally, I am a bit surprised that no one hit this before, given that
it was quite easy to trigger. I will check 3.6 as well.

Thanks,
Henrik
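The failure mode described in Mel's patch comment can be modelled with plain integer arithmetic. The following is a user-space sketch, not kernel code: it assumes pageblock_nr_pages = 512 and MAX_ORDER_NR_PAGES = 1024 (common x86-64 values with transparent hugepages), and old_end_pfn() and crosses_max_order_block() are hypothetical helpers. It suggests why the original min(pfn + pageblock_nr_pages, zone_end_pfn) is only safe when pfn starts pageblock-aligned.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only: pageblock_order 9, MAX_ORDER 11. */
#define PAGEBLOCK_NR_PAGES 512UL
#define MAX_ORDER_NR_PAGES 1024UL

/* Model of the 3.7-rc8 line:
 *   end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); */
static unsigned long old_end_pfn(unsigned long pfn, unsigned long zone_end_pfn)
{
	unsigned long end = pfn + PAGEBLOCK_NR_PAGES;

	return end < zone_end_pfn ? end : zone_end_pfn;
}

/* True if the half-open scan range [start, end) (end > start) reaches a
 * later MAX_ORDER_NR_PAGES block than the one holding start, i.e. the
 * scanner would enter a new block without a pfn_valid() check. */
static bool crosses_max_order_block(unsigned long start, unsigned long end)
{
	return start / MAX_ORDER_NR_PAGES != (end - 1) / MAX_ORDER_NR_PAGES;
}
```

With an aligned pfn of 1024 the range [1024, 1536) stays inside one MAX_ORDER_NR_PAGES block. With pfn 1000, an unaligned start that Mel says became possible after commit c89511ab, the range [1000, 1512) runs past pfn 1024 into the next block.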
From: Linus Torvalds
Date: Thu, 6 Dec 2012 10:41:14 -0800
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg wrote:
>>
>> Henrik, does that - corrected - patch (*instead* of the previous one,
>> not in addition to) also fix your issue?
>
> Yes - I can no longer trigger the failpath, so it seems to work. Mel,
> enjoy the rest of the talk. ;-)
>
> Generally, I am a bit surprised that no one hit this before, given that
> it was quite easy to trigger. I will check 3.6 as well.

Actually, looking at it some more, I think that two-liner patch had
*ANOTHER* bug.

Because the other line seems buggy as well.

Instead of

        end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);

I think it should be

        end_pfn = ALIGN(pfn+1, pageblock_nr_pages);

instead. ALIGN() already aligns upwards (but the "+1" is needed in
case pfn is already at a pageblock_nr_pages boundary, at which point
ALIGN() would have just returned that same boundary).

Hmm? Mel, please confirm. And Henrik, it might be good to test that
doubly-fixed patch.
Because reading the patch and trying to fix bugs in it that way is
*not* the same as actually verifying it ;)

                  Linus

From: Mel Gorman
Date: Thu, 6 Dec 2012 19:01:14 +0000
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 06, 2012 at 10:41:14AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg wrote:
> >>
> >> Henrik, does that - corrected - patch (*instead* of the previous one,
> >> not in addition to) also fix your issue?
> >
> > Yes - I can no longer trigger the failpath, so it seems to work. Mel,
> > enjoy the rest of the talk. ;-)
> >
> > Generally, I am a bit surprised that no one hit this before, given that
> > it was quite easy to trigger. I will check 3.6 as well.
>
> Actually, looking at it some more, I think that two-liner patch had
> *ANOTHER* bug.
>
> Because the other line seems buggy as well.
>
> Instead of
>
>         end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
>
> I think it should be
>
>         end_pfn = ALIGN(pfn+1, pageblock_nr_pages);
>
> instead.
> ALIGN() already aligns upwards (but the "+1" is needed in
> case pfn is already at a pageblock_nr_pages boundary, at which point
> ALIGN() would have just returned that same boundary).
>
> Hmm? Mel, please confirm.

FFS. Yes, confirmed.

In answer to Henrik's wondering why others have not reported this --
reproducing this requires a large enough hole with the right alignment
to have compaction walk into a PFN range with no memmap. Size and
alignment depend on the memory model - 4M for FLATMEM and 128M for
SPARSEMEM on x86. It needs a "lucky" machine.

---8<---
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free

Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
MAX_ORDER_NR_PAGES block during isolation for migration) added a check
for pfn_valid() when isolating pages for migration as the scanner does
not necessarily start pageblock-aligned. Since commit c89511ab (mm:
compaction: Restart compaction from near where it left off), the free
scanner has the same problem. This patch makes sure that the pfn range
passed to isolate_freepages_block() is within the same block so that
pfn_valid() checks are unnecessary.

Reported-by: Henrik Rydberg
Signed-off-by: Mel Gorman
---
 mm/compaction.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..694eaab 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
 
 		/* Found a block suitable for isolating free pages from */
 		isolated = 0;
-		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+
+		/*
+		 * As pfn may not start aligned, pfn+pageblock_nr_page
+		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
+		 * a pfn_valid check. Ensure isolate_freepages_block()
+		 * only scans within a pageblock
+		 */
+		end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+		end_pfn = min(end_pfn, zone_end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;

From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 20:28:45 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

> Actually, looking at it some more, I think that two-liner patch had
> *ANOTHER* bug.
>
> Because the other line seems buggy as well.
>
> Instead of
>
>         end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
>
> I think it should be
>
>         end_pfn = ALIGN(pfn+1, pageblock_nr_pages);
>
> instead. ALIGN() already aligns upwards (but the "+1" is needed in
> case pfn is already at a pageblock_nr_pages boundary, at which point
> ALIGN() would have just returned that same boundary).

Ah, and now the two callers treat the pointers the same way.

> Hmm?
> Mel, please confirm. And Henrik, it might be good to test that
> doubly-fixed patch. Because reading the patch and trying to fix bugs
> in it that way is *not* the same as actually verifying it ;)

Confirmed, working. I also checked 3.6, but could not trigger the
original problem there. The code also looks different, so it makes
sense.

To be explicit, this is what I tested on top of v3.7-rc8:

---
 mm/compaction.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..ff1c483 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
 
 		/* Found a block suitable for isolating free pages from */
 		isolated = 0;
-		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+
+		/*
+		 * As pfn may not start aligned, pfn+pageblock_nr_page
+		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
+		 * a pfn_valid check. Ensure isolate_freepages_block()
+		 * only scans within a pageblock.
+		 */
+		end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+		end_pfn = min(end_pfn, zone_end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;
--
1.8.0.1

Hopefully, that's a wrap. :-)

Henrik
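Linus's second correction can be checked numerically as well. In this sketch, align_up() is a hypothetical stand-in for the kernel's ALIGN() (rounding up to a power-of-two multiple), end_pfn_v1() models Mel's first two-liner with the zone cap already corrected, and end_pfn_fixed() models the version Henrik tested; pageblock_nr_pages = 512 is an assumption, not a value taken from the thread.

```c
#include <assert.h>

/* Assumed pageblock size for illustration (pageblock_order 9). */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round x up to a multiple of a; models the kernel's ALIGN() for a
 * power-of-two a. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* First version: ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages),
 * capped at the zone end. */
static unsigned long end_pfn_v1(unsigned long pfn, unsigned long zone_end_pfn)
{
	return min_ul(align_up(pfn + PAGEBLOCK_NR_PAGES, PAGEBLOCK_NR_PAGES),
		      zone_end_pfn);
}

/* Tested version: ALIGN(pfn + 1, pageblock_nr_pages), capped at the zone
 * end. The +1 makes an already aligned pfn advance to the next boundary
 * instead of returning itself. */
static unsigned long end_pfn_fixed(unsigned long pfn, unsigned long zone_end_pfn)
{
	return min_ul(align_up(pfn + 1, PAGEBLOCK_NR_PAGES), zone_end_pfn);
}
```

For the aligned pfn 1024 both return 1536. For pfn 1000, end_pfn_v1() still returns 1536, crossing the boundary at 1024, while end_pfn_fixed() stops at 1024. The +1 only matters for aligned input: align_up(1024, 512) is 1024 itself, which would give an empty scan range.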
From: Linus Torvalds
Date: Thu, 6 Dec 2012 11:38:47 -0800
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

Ok, I've applied the patch.

Mel, some grepping shows that there is an old line that does

    end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);

which looks bogus. That should probably also use "+ 1" instead. But
I'll consider that an independent issue, so I applied the one patch
regardless.

There is also a

    low_pfn += pageblock_nr_pages;
    low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;

that looks suspicious for similar reasons. Maybe

    low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;

instead? Although that *can* result in the same low_pfn in the end, so
maybe that one was correct after all? I just did some grepping, no
actual semantic analysis...

              Linus
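The two migration-scanner lines Linus quotes can be compared the same way. The sketch assumes, as the trailing - 1 suggests, that the enclosing loop increments low_pfn after the skip, so the returned value is the last pfn passed over. skip_old(), skip_new() and align_up() are hypothetical helpers, and pageblock_nr_pages = 512 is assumed.

```c
#include <assert.h>

/* Assumed pageblock size for illustration. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round x up to a multiple of a (power of two), like the kernel's ALIGN(). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Existing code:
 *   low_pfn += pageblock_nr_pages;
 *   low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; */
static unsigned long skip_old(unsigned long low_pfn)
{
	low_pfn += PAGEBLOCK_NR_PAGES;
	return align_up(low_pfn, PAGEBLOCK_NR_PAGES) - 1;
}

/* Suggested: low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; */
static unsigned long skip_new(unsigned long low_pfn)
{
	return align_up(low_pfn + 1, PAGEBLOCK_NR_PAGES) - 1;
}
```

For the aligned low_pfn 1024 both give 1535. For low_pfn 1000, inside the pageblock [512, 1024), the old form gives 1535, so under the loop assumption above the scan would resume at 1536 and also pass over [1024, 1536); the suggested form gives 1023.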
From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 22:39:09 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

> There is also a
>
>     low_pfn += pageblock_nr_pages;
>     low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>
> that looks suspicious for similar reasons. Maybe
>
>     low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
>
> instead? Although that *can* result in the same low_pfn in the end, so
> maybe that one was correct after all? I just did some grepping, no
> actual semantic analysis...

Here is a totally obscure version:

    low_pfn |= pageblock_nr_pages - 1;

It simply moves to the very end of the block, which seems to be what
was intended.

Henrik
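Henrik's version relies on pageblock_nr_pages being a power of two. Writing pfn = q*n + r with 0 <= r < n, ALIGN(pfn + 1, n) is (q + 1)*n, so ALIGN(pfn + 1, n) - 1 = q*n + n - 1, which is pfn with its low log2(n) bits set, i.e. pfn | (n - 1). The hypothetical helpers below check that identity exhaustively over a small range; they are a user-space sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Round x up to a multiple of a (power of two), like the kernel's ALIGN(). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Henrik's form: low_pfn |= pageblock_nr_pages - 1; */
static unsigned long last_pfn_or(unsigned long pfn, unsigned long nr)
{
	return pfn | (nr - 1);
}

/* Linus's form: ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; */
static unsigned long last_pfn_align(unsigned long pfn, unsigned long nr)
{
	return align_up(pfn + 1, nr) - 1;
}

/* Compare both forms for every pfn in [0, limit). */
static bool forms_agree(unsigned long nr, unsigned long limit)
{
	for (unsigned long pfn = 0; pfn < limit; pfn++)
		if (last_pfn_or(pfn, nr) != last_pfn_align(pfn, nr))
			return false;
	return true;
}
```

With n = 512, pfn 1000 maps to 1023 and pfn 1024 maps to 1535 under either form. For a block size that is not a power of two the OR form would not be valid.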
From: Mel Gorman
Date: Fri, 7 Dec 2012 08:32:48 +0000
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

On Thu, Dec 06, 2012 at 11:38:47AM -0800, Linus Torvalds wrote:
> Ok, I've applied the patch.
>

Thanks.

> Mel, some grepping shows that there is an old line that does
>
>     end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
>
> which looks bogus.

It's bogus. The impact is that multiple compaction attempts may be
needed to clear a particular block for allocation. THP allocation
success rate under stress will be lower and the latency before a range
of pages is collapsed by khugepaged to a huge page will be higher. The
impact of this is smaller and it should not result in a bug like
Henrik's.

An attentive reviewer is going to exclaim that GFP_ATOMIC allocations
for jumbo frames are impacted by this but they are not. Even with this
bogus walk, compaction will be clearing SWAP_CLUSTER_MAX contiguous
chunks which is enough for jumbo frames.

> That should probably also use "+ 1" instead. But
> I'll consider that an independent issue, so I applied the one patch
> regardless.
>
> There is also a
>
>     low_pfn += pageblock_nr_pages;
>     low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>
> that looks suspicious for similar reasons. Maybe
>
>     low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
>

This one is working by coincidence because the low_pfn will be aligned
in most cases. If it was outright broken then CMA would never work
either.

> instead? Although that *can* result in the same low_pfn in the end, so
> maybe that one was correct after all? I just did some grepping, no
> actual semantic analysis...
>

They need fixing but the impact is much less severe and does not
justify delaying 3.8 over them, unlike the other last-minute fixes. My
performance writing patches during talks was less than stellar
yesterday so I'll avoid a repeat performance and follow up with Andrew
early next week with a cc to -stable. It'll also give me a chance to
run the patches through the highalloc stress tests and confirm that
allocation success rate is higher and latency lower as would be
expected by such a fix.

Thanks.

--
Mel Gorman
SUSE Labs
From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 10:17:44 +0100
Subject: Oops in 3.7-rc8 isolate_free_pages_block()

Hi Linus,

This is the third time I have encountered this oops in 3.7, but the
first time I managed to get a decent screenshot:

http://bitmath.org/test/oops-3.7-rc8.jpg

It seems to have to do with page migration. I run with transparent
hugepages configured, just for the fun of it.

I am happy to test any suggestions.
Thanks,
Henrik

From: "Henrik Rydberg"
Date: Thu, 6 Dec 2012 16:22:34 +0100
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

Hi Jan,

> > http://bitmath.org/test/oops-3.7-rc8.jpg
> >
> > It seems to have to do with page migration. I run with transparent
> > hugepages configured, just for the fun of it.
> >
> > I am happy to test any suggestions.
>
> Adding linux-mm and Mel as an author of compaction in particular to CC...
>
> It seems that while traversing struct page structures, we entered into a new
> huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
> a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
> struct page).
> It might be useful if you provide disassembly of
> isolate_freepages_block() function in your kernel so that we can guess more
> from other register contents...

I had to recreate the vmlinux file, but it seems to be at the right
address, so here we go:

ffffffff810a6d00 :
ffffffff810a6d00:	48 b8 00 00 00 00 00	movabs $0xffffea0000000000,%rax
ffffffff810a6d07:	ea ff ff
ffffffff810a6d0a:	41 57	push %r15
ffffffff810a6d0c:	41 56	push %r14
ffffffff810a6d0e:	49 89 fe	mov %rdi,%r14
ffffffff810a6d11:	41 55	push %r13
ffffffff810a6d13:	49 89 d5	mov %rdx,%r13
ffffffff810a6d16:	41 54	push %r12
ffffffff810a6d18:	55	push %rbp
ffffffff810a6d19:	53	push %rbx
ffffffff810a6d1a:	48 89 f3	mov %rsi,%rbx
ffffffff810a6d1d:	48 c1 e3 06	shl $0x6,%rbx
ffffffff810a6d21:	48 83 ec 58	sub $0x58,%rsp
ffffffff810a6d25:	48 01 c3	add %rax,%rbx
ffffffff810a6d28:	48 39 f2	cmp %rsi,%rdx
ffffffff810a6d2b:	48 89 74 24 30	mov %rsi,0x30(%rsp)
ffffffff810a6d30:	44 88 44 24 3b	mov %r8b,0x3b(%rsp)
ffffffff810a6d35:	0f 86 15 02 00 00	jbe ffffffff810a6f50
ffffffff810a6d3b:	48 8d 47 58	lea 0x58(%rdi),%rax
ffffffff810a6d3f:	31 d2	xor %edx,%edx
ffffffff810a6d41:	48 8b 6c 24 30	mov 0x30(%rsp),%rbp
ffffffff810a6d46:	48 89 44 24 20	mov %rax,0x20(%rsp)
ffffffff810a6d4b:	48 8d 47 40	lea 0x40(%rdi),%rax
ffffffff810a6d4f:	49 89 dc	mov %rbx,%r12
ffffffff810a6d52:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6d59:	00
ffffffff810a6d5a:	49 89 ce	mov %rcx,%r14
ffffffff810a6d5d:	41 89 d7	mov %edx,%r15d
ffffffff810a6d60:	48 89 44 24 28	mov %rax,0x28(%rsp)
ffffffff810a6d65:	48 89 7c 24 18	mov %rdi,0x18(%rsp)
ffffffff810a6d6a:	eb 1c	jmp ffffffff810a6d88
ffffffff810a6d6c:	0f 1f 40 00	nopl 0x0(%rax)
ffffffff810a6d70:	48 83 c5 01	add $0x1,%rbp
ffffffff810a6d74:	48 83 c3 40	add $0x40,%rbx
ffffffff810a6d78:	49 39 ed	cmp %rbp,%r13
ffffffff810a6d7b:	0f 86 cf 00 00 00	jbe ffffffff810a6e50
ffffffff810a6d81:	4d 85 e4	test %r12,%r12
ffffffff810a6d84:	4c 0f 44 e3	cmove %rbx,%r12
ffffffff810a6d88:	8b 43 18	mov 0x18(%rbx),%eax
ffffffff810a6d8b:	83 f8 80	cmp $0xffffff80,%eax
ffffffff810a6d8e:	75 e0	jne ffffffff810a6d70
ffffffff810a6d90:	48 8b 44 24 18	mov 0x18(%rsp),%rax
ffffffff810a6d95:	48 8d 74 24 48	lea 0x48(%rsp),%rsi
ffffffff810a6d9a:	41 0f b6 d7	movzbl %r15b,%edx
ffffffff810a6d9e:	4c 8b 44 24 20	mov 0x20(%rsp),%r8
ffffffff810a6da3:	48 8b 4c 24 28	mov 0x28(%rsp),%rcx
ffffffff810a6da8:	48 8b 40 50	mov 0x50(%rax),%rax
ffffffff810a6dac:	48 89 c7	mov %rax,%rdi
ffffffff810a6daf:	48 83 c7 50	add $0x50,%rdi
ffffffff810a6db3:	e8 a8 fe ff ff	callq ffffffff810a6c60
ffffffff810a6db8:	84 c0	test %al,%al
ffffffff810a6dba:	41 89 c7	mov %eax,%r15d
ffffffff810a6dbd:	0f 84 8d 00 00 00	je ffffffff810a6e50
ffffffff810a6dc3:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6dc8:	0f 84 c2 00 00 00	je ffffffff810a6e90
ffffffff810a6dce:	8b 43 18	mov 0x18(%rbx),%eax
ffffffff810a6dd1:	83 f8 80	cmp $0xffffff80,%eax
ffffffff810a6dd4:	75 9a	jne ffffffff810a6d70
ffffffff810a6dd6:	48 89 df	mov %rbx,%rdi
ffffffff810a6dd9:	e8 32 db fe ff	callq ffffffff81094910
ffffffff810a6dde:	85 c0	test %eax,%eax
ffffffff810a6de0:	0f 84 81 01 00 00	je ffffffff810a6f67
ffffffff810a6de6:	01 44 24 3c	add %eax,0x3c(%rsp)
ffffffff810a6dea:	83 f8 00	cmp $0x0,%eax
ffffffff810a6ded:	0f 8e 48 01 00 00	jle ffffffff810a6f3b
ffffffff810a6df3:	4d 8b 06	mov (%r14),%r8
ffffffff810a6df6:	48 89 d9	mov %rbx,%rcx
ffffffff810a6df9:	31 ff	xor %edi,%edi
ffffffff810a6dfb:	4c 8d 5b 20	lea 0x20(%rbx),%r11
ffffffff810a6dff:	90	nop
ffffffff810a6e00:	48 8d 51 20	lea 0x20(%rcx),%rdx
ffffffff810a6e04:	48 89 ce	mov %rcx,%rsi
ffffffff810a6e07:	83 c7 01	add $0x1,%edi
ffffffff810a6e0a:	48 29 de	sub %rbx,%rsi
ffffffff810a6e0d:	48 83 c1 40	add $0x40,%rcx
ffffffff810a6e11:	39 c7	cmp %eax,%edi
ffffffff810a6e13:	49 89 50 08	mov %rdx,0x8(%r8)
ffffffff810a6e17:	4e 89 04 1e	mov %r8,(%rsi,%r11,1)
ffffffff810a6e1b:	49 89 d0	mov %rdx,%r8
ffffffff810a6e1e:	4e 89 74 1e 08	mov %r14,0x8(%rsi,%r11,1)
ffffffff810a6e23:	49 89 16	mov %rdx,(%r14)
ffffffff810a6e26:	75 d8	jne ffffffff810a6e00
ffffffff810a6e28:	8d 48 ff	lea -0x1(%rax),%ecx
ffffffff810a6e2b:	48 98	cltq
ffffffff810a6e2d:	48 83 e8 01	sub $0x1,%rax
ffffffff810a6e31:	48 63 c9	movslq %ecx,%rcx
ffffffff810a6e34:	48 01 cd	add %rcx,%rbp
ffffffff810a6e37:	48 c1 e0 06	shl $0x6,%rax
ffffffff810a6e3b:	48 01 c3	add %rax,%rbx
ffffffff810a6e3e:	48 83 c5 01	add $0x1,%rbp
ffffffff810a6e42:	48 83 c3 40	add $0x40,%rbx
ffffffff810a6e46:	49 39 ed	cmp %rbp,%r13
ffffffff810a6e49:	0f 87 32 ff ff ff	ja ffffffff810a6d81
ffffffff810a6e4f:	90	nop
ffffffff810a6e50:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6e55:	44 89 fa	mov %r15d,%edx
ffffffff810a6e58:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6e5d:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6e62:	74 11	je ffffffff810a6e75
ffffffff810a6e64:	4c 89 e9	mov %r13,%rcx
ffffffff810a6e67:	48 2b 4c 24 30	sub 0x30(%rsp),%rcx
ffffffff810a6e6c:	48 39 c1	cmp %rax,%rcx
ffffffff810a6e6f:	0f 87 b7 00 00 00	ja ffffffff810a6f2c
ffffffff810a6e75:	84 d2	test %dl,%dl
ffffffff810a6e77:	75 31	jne ffffffff810a6eaa
ffffffff810a6e79:	4c 39 ed	cmp %r13,%rbp
ffffffff810a6e7c:	74 4d	je ffffffff810a6ecb
ffffffff810a6e7e:	48 83 c4 58	add $0x58,%rsp
ffffffff810a6e82:	5b	pop %rbx
ffffffff810a6e83:	5d	pop %rbp
ffffffff810a6e84:	41 5c	pop %r12
ffffffff810a6e86:	41 5d	pop %r13
ffffffff810a6e88:	41 5e	pop %r14
ffffffff810a6e8a:	41 5f	pop %r15
ffffffff810a6e8c:	c3	retq
ffffffff810a6e8d:	0f 1f 00	nopl (%rax)
ffffffff810a6e90:	48 89 df	mov %rbx,%rdi
ffffffff810a6e93:	e8 78 fd ff ff	callq ffffffff810a6c10
ffffffff810a6e98:	84 c0	test %al,%al
ffffffff810a6e9a:	0f 85 2e ff ff ff	jne ffffffff810a6dce
ffffffff810a6ea0:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6ea5:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6eaa:	49 8b 7e 50	mov 0x50(%r14),%rdi
ffffffff810a6eae:	48 89 44 24 08	mov %rax,0x8(%rsp)
ffffffff810a6eb3:	48 8b 74 24 48	mov 0x48(%rsp),%rsi
ffffffff810a6eb8:	48 83 c7 50	add $0x50,%rdi
ffffffff810a6ebc:	e8 1f 14 60 00	callq ffffffff816a82e0 <_raw_spin_unlock_irqrestore>
ffffffff810a6ec1:	4c 39 ed	cmp %r13,%rbp
ffffffff810a6ec4:	48 8b 44 24 08	mov 0x8(%rsp),%rax
ffffffff810a6ec9:	75 b3	jne ffffffff810a6e7e
ffffffff810a6ecb:	4d 85 e4	test %r12,%r12
ffffffff810a6ece:	49 8b 5e 50	mov 0x50(%r14),%rbx
ffffffff810a6ed2:	74 aa	je ffffffff810a6e7e
ffffffff810a6ed4:	8b 4c 24 3c	mov 0x3c(%rsp),%ecx
ffffffff810a6ed8:	85 c9	test %ecx,%ecx
ffffffff810a6eda:	75 a2	jne ffffffff810a6e7e
ffffffff810a6edc:	b9 03 00 00 00	mov $0x3,%ecx
ffffffff810a6ee1:	ba 03 00 00 00	mov $0x3,%edx
ffffffff810a6ee6:	be 01 00 00 00	mov $0x1,%esi
ffffffff810a6eeb:	4c 89 e7	mov %r12,%rdi
ffffffff810a6eee:	48 89 44 24 08	mov %rax,0x8(%rsp)
ffffffff810a6ef3:	e8 18 d1 fe ff	callq ffffffff81094010
ffffffff810a6ef8:	41 80 7e 42 00	cmpb $0x0,0x42(%r14)
ffffffff810a6efd:	48 8b 44 24 08	mov 0x8(%rsp),%rax
ffffffff810a6f02:	0f 85 76 ff ff ff	jne ffffffff810a6e7e
ffffffff810a6f08:	48 ba 00 00 00 00 00	movabs $0x160000000000,%rdx
ffffffff810a6f0f:	16 00 00
ffffffff810a6f12:	49 01 d4	add %rdx,%r12
ffffffff810a6f15:	49 c1 fc 06	sar $0x6,%r12
ffffffff810a6f19:	4c 3b 63 60	cmp 0x60(%rbx),%r12
ffffffff810a6f1d:	0f 83 5b ff ff ff	jae ffffffff810a6e7e
ffffffff810a6f23:	4c 89 63 60	mov %r12,0x60(%rbx)
ffffffff810a6f27:	e9 52 ff ff ff	jmpq ffffffff810a6e7e
ffffffff810a6f2c:	31 c0	xor %eax,%eax
ffffffff810a6f2e:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6f35:	00
ffffffff810a6f36:	e9 3a ff ff ff	jmpq ffffffff810a6e75
ffffffff810a6f3b:	0f 84 2f fe ff ff	je ffffffff810a6d70
ffffffff810a6f41:	e9 e2 fe ff ff	jmpq ffffffff810a6e28
ffffffff810a6f46:	66 2e 0f 1f 84 00 00	nopw %cs:0x0(%rax,%rax,1)
ffffffff810a6f4d:	00 00 00
ffffffff810a6f50:	48 89 f5	mov %rsi,%rbp
ffffffff810a6f53:	31 c0	xor %eax,%eax
ffffffff810a6f55:	c7 44 24 3c 00 00 00	movl $0x0,0x3c(%rsp)
ffffffff810a6f5c:	00
ffffffff810a6f5d:	31 d2	xor %edx,%edx
ffffffff810a6f5f:	45 31 e4	xor %r12d,%r12d
ffffffff810a6f62:	e9 f6 fe ff ff	jmpq ffffffff810a6e5d
ffffffff810a6f67:	80 7c 24 3b 00	cmpb $0x0,0x3b(%rsp)
ffffffff810a6f6c:	0f 84 74 fe ff ff	je ffffffff810a6de6
ffffffff810a6f72:	44 89 fa	mov %r15d,%edx
ffffffff810a6f75:	4c 8b 74 24 18	mov 0x18(%rsp),%r14
ffffffff810a6f7a:	48 63 44 24 3c	movslq 0x3c(%rsp),%rax
ffffffff810a6f7f:	e9 e0 fe ff ff	jmpq ffffffff810a6e64
ffffffff810a6f84:	66 66 66 2e 0f 1f 84	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff810a6f8b:	00 00 00 00 00

Thanks,
Henrik

From: Linus Torvalds
Date: Thu, 6 Dec 2012 08:10:13 -0800
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()

Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already
guessed from just the offset, that is indeed likely the PageBuddy()
test.

On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg wrote:
>
> http://bitmath.org/test/oops-3.7-rc8.jpg
>
> ffffffff810a6d6a:	eb 1c	jmp ffffffff810a6d88
> ffffffff810a6d6c:	0f 1f 40 00	nopl 0x0(%rax)

On the first entry to the loop, we jump *into* the loop, over the end
condition (the compiler has basically turned. And we jump directly to
the faulting instruction.
Looking at the register state, though, we're not at the first iteration of the loop, so we don't have to worry about that case. The loop itself then starts with: > ffffffff810a6d70: 48 83 c5 01 add $0x1,%rbp > ffffffff810a6d74: 48 83 c3 40 add $0x40,%rbx The above is the "blockpfn++, cursor++" part of the loop, while the test below is the loop condition ("blockpfn < end_pfn"): > ffffffff810a6d78: 49 39 ed cmp %rbp,%r13 > ffffffff810a6d7b: 0f 86 cf 00 00 00 jbe ffffffff810a6e50 From your image, %rbp is 0x070000 and %r13 is 0x0702f9. The "pfn_valid_within()" test is a no-op because we don't have holes in zones on x86, so then we have if (!valid_page) valid_page = page; which generates a test+cmove: > ffffffff810a6d81: 4d 85 e4 test %r12,%r12 > ffffffff810a6d84: 4c 0f 44 e3 cmove %rbx,%r12 (which is how we can tell we're not at the beginning: 'valid_page' is 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000). .. and finally the oopsing instruction from PageBuddy(), which is the read of the 'page->_mapcount' > ffffffff810a6d88: 8b 43 18 mov 0x18(%rbx),%eax > ffffffff810a6d8b: 83 f8 80 cmp $0xffffff80,%eax > ffffffff810a6d8e: 75 e0 jne ffffffff810a6d70 So yeah, that loop has apparently wandered into la-la-land. end_pfn must be somehow wrong. Mel, does any of this ring a bell (Andrew also added to the cc, since the patches came through him).
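The register values quoted above can be cross-checked with some simple arithmetic. What follows is a user-space C sketch, not kernel code. Its constants are inferred from this disassembly, not taken from the kernel headers:

- a vmemmap base of 0xffffea0000000000 (the "movabs $0x160000000000,%rdx" / "sar $0x6,%r12" pair looks like page_to_pfn() with that base folded in);
- a 64-byte struct page (the "add $0x40,%rbx" stride);
- MAX_ORDER_NR_PAGES of 1024 (assumed x86 default);
- a PageBuddy() test that compares _mapcount with -128 (the "cmp $0xffffff80,%eax").

All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 SPARSEMEM_VMEMMAP layout, inferred from the disassembly. */
#define VMEMMAP_BASE              0xffffea0000000000ULL
#define STRUCT_PAGE_SIZE          64ULL   /* "add $0x40,%rbx" per pfn */
#define MAX_ORDER_NR_PAGES        1024ULL /* assumed: MAX_ORDER == 11 */
#define PAGE_BUDDY_MAPCOUNT_VALUE (-128)  /* "cmp $0xffffff80,%eax" */

/* Hypothetical helper: page_to_pfn() for a vmemmap struct page address. */
static uint64_t vmemmap_addr_to_pfn(uint64_t page_addr)
{
	assert(page_addr >= VMEMMAP_BASE);
	assert((page_addr - VMEMMAP_BASE) % STRUCT_PAGE_SIZE == 0);
	return (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}

/* Hypothetical helper: does this pfn begin a new MAX_ORDER_NR_PAGES block? */
static bool starts_max_order_block(uint64_t pfn)
{
	return (pfn & (MAX_ORDER_NR_PAGES - 1)) == 0;
}

/* Hypothetical helper modelling the PageBuddy() comparison on _mapcount. */
static bool mapcount_means_buddy(int32_t mapcount)
{
	return mapcount == PAGE_BUDDY_MAPCOUNT_VALUE;
}
```

With the values from the oops, the faulting struct page (0xffffea0001c00000) decodes to pfn 0x70000. That matches %rbp and sits exactly on a MAX_ORDER_NR_PAGES boundary, still below end_pfn (0x702f9). valid_page (0xffffea0001bfbe40) decodes to pfn 0x6fef9, an unaligned starting point. Together these are consistent with the scan having crossed from one MAX_ORDER_NR_PAGES block into the next.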
Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424419Ab2LFQ15 (ORCPT ); Thu, 6 Dec 2012 11:27:57 -0500 Received: from cantor2.suse.de ([195.135.220.15]:38095 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422879Ab2LFQ14 (ORCPT ); Thu, 6 Dec 2012 11:27:56 -0500 Date: Thu, 6 Dec 2012 16:19:34 +0000 From: Mel Gorman To: Jan Kara Cc: Henrik Rydberg , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206161934.GA17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121206144821.GC18547@quack.suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 03:48:21PM +0100, Jan Kara wrote: > On Thu 06-12-12 10:17:44, Henrik Rydberg wrote: > > Hi Linus, > > > > This is the third time I encounter this oops in 3.7, but the first > > time I managed to get a decent screenshot: > > > > http://bitmath.org/test/oops-3.7-rc8.jpg > > > > It seems to have to do with page migration. I run with transparent > > hugepages configured, just for the fun of it. > > > > I am happy to test any suggestions. > Adding linux-mm and Mel as an author of compaction in particular to CC... > It seems that while traversing struct page structures, we entered into a new > huge page (note that RBX is 0xffffea0001c00000 - just the beginning of > a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in > struct page). It might be useful if you provide disassembly of > isolate_freepages_block() function in your kernel so that we can guess more > from other register contents... 
> Still travelling and am not in a position to test this properly :(. However, this bug feels very similar to a bug in the migration scanner where a pfn_valid check is missed because the start is not aligned. Henrik, when did this start happening? I would be a little surprised if it started between 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How reproducible is this? Is there anything in particular you do to trigger the oops? Does the following patch help any? It's only compile tested I'm afraid. ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. However, the free scanner has the same problem. If it encounters a hole, it can also trigger an oops when it calls PageBuddy(page) on a page that is within a hole. Reported-by: Henrik Rydberg Signed-off-by: Mel Gorman Cc: stable@vger.kernel.org --- mm/compaction.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..7d85ad485 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, continue; if (!valid_page) valid_page = page; + + /* + * As blockpfn may not start aligned, blockpfn->end_pfn + * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid + * check is necessary. If the pfn is not valid, stop + * isolation.
+ */ + if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && + !pfn_valid(blockpfn)) + break; if (!PageBuddy(page)) continue; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755188Ab2LFQnf (ORCPT ); Thu, 6 Dec 2012 11:43:35 -0500 Received: from cantor2.suse.de ([195.135.220.15]:38665 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964770Ab2LFQnd (ORCPT ); Thu, 6 Dec 2012 11:43:33 -0500 Date: Thu, 6 Dec 2012 16:35:11 +0000 From: Mel Gorman To: Linus Torvalds Cc: Henrik Rydberg , Jan Kara , linux-mm , Linux Kernel Mailing List , Andrew Morton Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206163511.GB17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206152234.GA5309@polaris.bitmath.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 08:10:13AM -0800, Linus Torvalds wrote: > Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already > guessed from just the offset, that is indeed likely the PageBuddy() > test. > > On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg wrote: > > > > http://bitmath.org/test/oops-3.7-rc8.jpg > > > > ffffffff810a6d6a: eb 1c jmp ffffffff810a6d88 > > ffffffff810a6d6c: 0f 1f 40 00 nopl 0x0(%rax) > > On the first entry to the loop, we jump *into* the loop, over the end > condition (the compiler has basically turned. And we jump directly to > the faulting instruction. Looking at the register state, though, we're > not at the first iteration of the loop, so we don't have to worry > about that case. 
The loop itself then starts with: > > > ffffffff810a6d70: 48 83 c5 01 add $0x1,%rbp > > ffffffff810a6d74: 48 83 c3 40 add $0x40,%rbx > > The above is the "blockpfn++, cursor++" part of the loop, while the > test below is the loop condition ("blockpfn < end_pfn"): > > > ffffffff810a6d78: 49 39 ed cmp %rbp,%r13 > > ffffffff810a6d7b: 0f 86 cf 00 00 00 jbe ffffffff810a6e50 > > From your image, %rbp is 0x070000 and %r13 is 0x0702f9. > > The "pfn_valid_within()" test is a no-op because we don't have holes > in zones on x86, so then we have > That thing is not about holes in zones, it's about holes within a MAX_ORDER_NR_PAGES block, but either way it's a no-op on x86 and we're not doing a pfn_valid check in this loop. I didn't look back in time, but I have a vague recollection that this always used to start with an aligned PFN; with the large amount of churn since, that is no longer true. > if (!valid_page) > valid_page = page; > > which generates a test+cmove: > > > ffffffff810a6d81: 4d 85 e4 test %r12,%r12 > > ffffffff810a6d84: 4c 0f 44 e3 cmove %rbx,%r12 > > (which is how we can tell we're not at the beginning: 'valid_page' is > 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000). > > .. and finally the oopsing instruction from PageBuddy(), which is the > read of the 'page->_mapcount' > > > ffffffff810a6d88: 8b 43 18 mov 0x18(%rbx),%eax > > ffffffff810a6d8b: 83 f8 80 cmp $0xffffff80,%eax > > ffffffff810a6d8e: 75 e0 jne ffffffff810a6d70 > > So yeah, that loop has apparently wandered into la-la-land. end_pfn > must be somehow wrong. > I think we wandered into a hole where there is no valid struct page. > Mel, does any of this ring a bell (Andrew also added to the cc, since > the patches came through him). > It reminded me of a similar bug in the migration scanner, which I mentioned in the patch posted elsewhere in the thread (on which I carelessly failed to cc Andrew).
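Mel's diagnosis can be illustrated with a toy model of the free scanner. This is a user-space C sketch under stated assumptions, not the kernel loop:

- Memmap validity is modelled as a hypothetical hole of one MAX_ORDER_NR_PAGES block starting at pfn 0x70000. That placement is chosen only to match the faulting pfn; the real hole size is unknown here.
- MAX_ORDER_NR_PAGES is assumed to be 1024.
- The scan runs from pfn 0x6fef9 to 0x702f9. The start is valid_page (0xffffea0001bfbe40) decoded with an assumed 0xffffea0000000000 vmemmap base and 64-byte struct page. The end is %r13.

The boundary test mirrors the one in Mel's first patch: pfn_valid() is consulted only when blockpfn enters a new MAX_ORDER_NR_PAGES block.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_NR_PAGES 1024UL /* assumed x86 default (MAX_ORDER == 11) */

/* Hypothetical memory model: no struct pages exist in [hole_start, hole_end). */
struct mem_model {
	unsigned long hole_start;
	unsigned long hole_end;
};

static bool model_pfn_valid(const struct mem_model *m, unsigned long pfn)
{
	return pfn < m->hole_start || pfn >= m->hole_end;
}

/*
 * Walk [start_pfn, end_pfn) the way the free scanner walks a block.
 * Returns how many pfns would have their struct page read, and sets
 * *hit_hole if any of them has no memmap (an oops in the real kernel).
 * With check_boundaries set, apply the test from Mel's first patch.
 */
static unsigned long model_scan(const struct mem_model *m,
				unsigned long start_pfn, unsigned long end_pfn,
				bool check_boundaries, bool *hit_hole)
{
	unsigned long blockpfn, scanned = 0;

	assert(start_pfn <= end_pfn);
	*hit_hole = false;
	for (blockpfn = start_pfn; blockpfn < end_pfn; blockpfn++) {
		if (check_boundaries &&
		    (blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !model_pfn_valid(m, blockpfn))
			break;
		if (!model_pfn_valid(m, blockpfn))
			*hit_hole = true;
		scanned++;
	}
	return scanned;
}
```

Without the check, the model reads all 0x400 struct pages in the range, including those from 0x70000 onwards inside the hole. With the check, it stops after the 0x107 pfns below the boundary. The check is only sufficient if holes begin on MAX_ORDER_NR_PAGES boundaries, which appears to be the assumption behind both 0bf380bc and this patch.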
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946054Ab2LFQvW (ORCPT ); Thu, 6 Dec 2012 11:51:22 -0500 Received: from mail-wi0-f170.google.com ([209.85.212.170]:35135 "EHLO mail-wi0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1424218Ab2LFQvT (ORCPT ); Thu, 6 Dec 2012 11:51:19 -0500 MIME-Version: 1.0 In-Reply-To: <20121206161934.GA17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> From: Linus Torvalds Date: Thu, 6 Dec 2012 08:50:54 -0800 X-Google-Sender-Auth: IkKARusN7uttb4oFtWu1FDXEsVk Message-ID: Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() To: Mel Gorman Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman wrote: > > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Ugh. This patch makes my eyes bleed. Is there no way to do this nicely in the caller? IOW, fix the 'end_pfn' logic way upstream where it is computed, and just cap it at the MAX_ORDER_NR_PAGES boundary? For example, isolate_freepages_range() seems to have this *other* end-point alignment thing going on, and does it in a loop. Wouldn't it be much better to have a separate loop that looped up to the next MAX_ORDER_NR_PAGES boundary instead of having this kind of very random test in the middle of a loop. Even the name ("isolate_freepages_block") implies that we have a "block" of pages. 
Having to have a random "oops, this block can have other blocks inside of it that aren't mapped" test in the middle of that function really makes me go "Uhh, no". Plus, is it even guaranteed that the *first* pfn (that we get called with) is pfn_valid to begin with? So I guess this patch fixes things, but it does make me go "That's really *really* ugly". Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424480Ab2LFQ40 (ORCPT ); Thu, 6 Dec 2012 11:56:26 -0500 Received: from smtprelay-h21.telenor.se ([195.54.99.196]:39924 "EHLO smtprelay-h21.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1424328Ab2LFQ4Z (ORCPT ); Thu, 6 Dec 2012 11:56:25 -0500 X-SENDER-IP: [85.230.168.206] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnhKAPHMwFBV5qjOPGdsb2JhbABEhUuFI7M7FwMBAQEBODSCHgEBBAEnExwTAQ8FCwgDDjgUJQoaE4gKCsI+FJAHYQOWAoV7g1OJbg X-IronPort-AV: E=Sophos;i="4.84,230,1355094000"; d="scan'208";a="165391500" From: "Henrik Rydberg" Date: Thu, 6 Dec 2012 17:58:29 +0100 To: Mel Gorman Cc: Jan Kara , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206165829.GA392@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121206161934.GA17258@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mel, > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Henrik, when > did this start happening?
I would be a little surprised if it started between > 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. I started using transparent hugepages when moving to 3.7-rc1, so it is quite possible that the problem was there already in 3.6. > How reproducible is this? Is there anything in particular you do to > trigger the oops? Unfortunately nothing special, and it is rare. IIRC, it has happened after a long uptime, but I guess that only means the probability of the oops is higher then. > Does the following patch help any? It's only compile tested I'm afraid. > > ---8<--- > mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free > > Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new > MAX_ORDER_NR_PAGES block during isolation for migration) added a check > for pfn_valid() when isolating pages for migration as the scanner does > not necessarily start pageblock-aligned. However, the free scanner has > the same problem. If it encounters a hole, it can also trigger an oops > when it calls PageBuddy(page) on a page that is within a hole. > > Reported-by: Henrik Rydberg > Signed-off-by: Mel Gorman > Cc: stable@vger.kernel.org > --- > mm/compaction.c | 10 ++++++++++ > 1 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 9eef558..7d85ad485 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, > continue; > if (!valid_page) > valid_page = page; > + > + /* > + * As blockpfn may not start aligned, blockpfn->end_pfn > + * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid > + * check is necessary. If the pfn is not valid, stop > + * isolation.
> + */ > + if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && > + !pfn_valid(blockpfn)) > + break; > if (!PageBuddy(page)) > continue; > I am running with it now, adding a printout to see if the case happens at all. Might take a while, will try to stress the machine a bit. Thanks, Henrik From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424632Ab2LFRUX (ORCPT ); Thu, 6 Dec 2012 12:20:23 -0500 Received: from smtprelay-h21.telenor.se ([195.54.99.196]:42577 "EHLO smtprelay-h21.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1423012Ab2LFRUW (ORCPT ); Thu, 6 Dec 2012 12:20:22 -0500 X-SENDER-IP: [85.230.168.206] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjlRAD/TwFBV5qjOPGdsb2JhbABEhUsiAYUAszsXAwEBAQE4NIIeAQEEATocEwEPBQsIAw44FCUKGhOICgrCURSQB2EDlgKFe4NTiW4 X-IronPort-AV: E=Sophos;i="4.84,230,1355094000"; d="scan'208";a="459423345" From: "Henrik Rydberg" Date: Thu, 6 Dec 2012 18:22:25 +0100 To: Mel Gorman Cc: Jan Kara , Linus Torvalds , linux-mm@kvack.org, Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206172225.GA978@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121206161934.GA17258@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Still travelling and am not in a position to test this properly :(. > However, this bug feels very similar to a bug in the migration scanner where > a pfn_valid check is missed because the start is not aligned. Henrik, when > did this start happening? 
I would be a little surprised if it started between > 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How > reproducible is this? Is there anything in particular you do to trigger the > oops? Does the following patch help any? It's only compile tested I'm afraid. I managed to trigger the path several times with a small memory-intensive program, and since I am still here, Tested-by: Henrik Rydberg Thanks! Henrik From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946463Ab2LFSDk (ORCPT ); Thu, 6 Dec 2012 13:03:40 -0500 Received: from cantor2.suse.de ([195.135.220.15]:41552 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1946322Ab2LFSDj (ORCPT ); Thu, 6 Dec 2012 13:03:39 -0500 Date: Thu, 6 Dec 2012 17:55:15 +0000 From: Mel Gorman To: Linus Torvalds Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206175451.GC17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 08:50:54AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman wrote: > > > > Still travelling and am not in a position to test this properly :(. > > However, this bug feels very similar to a bug in the migration scanner where > > a pfn_valid check is missed because the start is not aligned. > > Ugh. This patch makes my eyes bleed. > Yeah. I was listening to a talk while I was writing it, a bit cranky and didn't see why I should suffer alone. > Is there no way to do this nicely in the caller? 
IOW, fix the > 'end_pfn' logic way upstream where it is computed, and just cap it at > the MAX_ORDER_NR_PAGES boundary? > Easily done in the caller, but not on the MAX_ORDER_NR_PAGES boundary. The caller is striding by pageblock so a MAX_ORDER_NR_PAGES alignment will not work out. > For example, isolate_freepages_range() seems to have this *other* > end-point alignment thing going on, and does it in a loop. Wouldn't it > be much better to have a separate loop that looped up to the next > MAX_ORDER_NR_PAGES boundary instead of having this kind of very random > test in the middle of a loop. > > Even the name ("isolate_freepages_block") implies that we have a > "block" of pages. Having to have a random "oops, this block can have > other blocks inside of it that aren't mapped" test in the middle of > that function really makes me go "Uhh, no". > The block in the name is related to pageblocks. > Plus, is it even guaranteed that the *first* pfn (that we get called > with) is pfn_valid to begin with? > Yes, the caller has already checked pfn_valid(), and it used to be the case that this pfn was pageblock-aligned, but not since commit c89511ab (mm: compaction: Restart compaction from near where it left off). > So I guess this patch fixes things, but it does make me go "That's > really *really* ugly". > Quasimoto strikes again ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction: Restart compaction from near where it left off), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary.
Reported-by: Henrik Rydberg Signed-off-by: Mel Gorman diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..c23fa55 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock. + */ + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); + end_pfn = min(end_pfn, end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755274Ab2LFSUA (ORCPT ); Thu, 6 Dec 2012 13:20:00 -0500 Received: from mail-we0-f174.google.com ([74.125.82.174]:62743 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752922Ab2LFST6 (ORCPT ); Thu, 6 Dec 2012 13:19:58 -0500 MIME-Version: 1.0 In-Reply-To: <20121206175451.GC17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> From: Linus Torvalds Date: Thu, 6 Dec 2012 10:19:35 -0800 X-Google-Sender-Auth: q02krgAHQ6nPjjANrTMs5jyfkOU Message-ID: Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() To: Mel Gorman Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote: > > Yeah. I was listening to a talk while I was writing it, a bit cranky and > didn't see why I should suffer alone. Makes sense. 
> Quasimoto strikes again Is that Quasimodo's Japanese cousin? > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > + > + /* > + * As pfn may not start aligned, pfn+pageblock_nr_page > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > + * a pfn_valid check. Ensure isolate_freepages_block() > + * only scans within a pageblock. > + */ > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > + end_pfn = min(end_pfn, end_pfn); Ok, this looks much nicer, except it's obviously buggy. The min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that line to be + end_pfn = min(end_pfn, zone_end_pfn); Henrik, does that - corrected - patch (*instead* of the previous one, not in addition to) also fix your issue? Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946657Ab2LFS31 (ORCPT ); Thu, 6 Dec 2012 13:29:27 -0500 Received: from cantor2.suse.de ([195.135.220.15]:42381 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1946537Ab2LFS30 (ORCPT ); Thu, 6 Dec 2012 13:29:26 -0500 Date: Thu, 6 Dec 2012 18:21:03 +0000 From: Mel Gorman To: Linus Torvalds Cc: Jan Kara , Henrik Rydberg , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206182103.GD17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote: > > > > Yeah. 
I was listening to a talk while I was writing it, a bit cranky and > > didn't see why I should suffer alone. > > Makes sense. > > > Quasimoto strikes again > > Is that Quasimodo's Japanese cousin? > Yes, he's tried to escape his terrible legacy with a name change. > > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > > + > > + /* > > + * As pfn may not start aligned, pfn+pageblock_nr_page > > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > > + * a pfn_valid check. Ensure isolate_freepages_block() > > + * only scans within a pageblock. > > + */ > > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > + end_pfn = min(end_pfn, end_pfn); > > Ok, this looks much nicer, except it's obviously buggy. The > min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that > line to be > > + end_pfn = min(end_pfn, zone_end_pfn); > *sigh* Yes, I did. Thanks. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946670Ab2LFSaz (ORCPT ); Thu, 6 Dec 2012 13:30:55 -0500 Received: from smtprelay-h22.telenor.se ([195.54.99.197]:50670 "EHLO smtprelay-h22.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1946537Ab2LFSay (ORCPT ); Thu, 6 Dec 2012 13:30:54 -0500 X-SENDER-IP: [85.230.168.206] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnNKACHjwFBV5qjOPGdsb2JhbABEhUuFI7M6FwMBAQEBODSCHgEBBTocIxAIAxguFCUKGhOIFMJqFIwlg2JhA5YChXuDU4lu X-IronPort-AV: E=Sophos;i="4.84,231,1355094000"; d="scan'208";a="165437439" From: "Henrik Rydberg" Date: Thu, 6 Dec 2012 19:32:59 +0100 To: Linus Torvalds Cc: Mel Gorman , Jan Kara , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206183259.GA591@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> 
<20121206175451.GC17258@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman wrote: > > > > Yeah. I was listening to a talk while I was writing it, a bit cranky and > > didn't see why I should suffer alone. > > Makes sense. > > > Quasimoto strikes again > > Is that Quasimodo's Japanese cousin? > > > - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); > > + > > + /* > > + * As pfn may not start aligned, pfn+pageblock_nr_page > > + * may cross a MAX_ORDER_NR_PAGES boundary and miss > > + * a pfn_valid check. Ensure isolate_freepages_block() > > + * only scans within a pageblock. > > + */ > > + end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > + end_pfn = min(end_pfn, end_pfn); > > Ok, this looks much nicer, except it's obviously buggy. The > min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that > line to be > > + end_pfn = min(end_pfn, zone_end_pfn); > > Henrik, does that - corrected - patch (*instead* of the previous one, > not in addition to) also fix your issue? Yes - I can no longer trigger the failpath, so it seems to work. Mel, enjoy the rest of the talk. ;-) Generally, I am a bit surprised that no one hit this before, given that it was quite easy to trigger. I will check 3.6 as well.
Thanks, Henrik From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946734Ab2LFSlg (ORCPT ); Thu, 6 Dec 2012 13:41:36 -0500 Received: from mail-we0-f174.google.com ([74.125.82.174]:35280 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1946405Ab2LFSlf (ORCPT ); Thu, 6 Dec 2012 13:41:35 -0500 MIME-Version: 1.0 In-Reply-To: <20121206183259.GA591@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> From: Linus Torvalds Date: Thu, 6 Dec 2012 10:41:14 -0800 X-Google-Sender-Auth: p3Jpp9v6ClWtejObkzHRZWLOgzA Message-ID: Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() To: Henrik Rydberg Cc: Mel Gorman , Jan Kara , linux-mm , Linux Kernel Mailing List Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg wrote: >> >> Henrik, does that - corrected - patch (*instead* of the previous one, >> not in addition to) also fix your issue? > > Yes - I can no longer trigger the failpath, so it seems to work. Mel, > enjoy the rest of the talk. ;-) > > Generally, I am a bit surprised that no one hit this before, given that > it was quite easy to trigger. I will check 3.6 as well. Actually, looking at it some more, I think that two-liner patch had *ANOTHER* bug. Because the other line seems buggy as well. Instead of end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); I think it should be end_pfn = ALIGN(pfn+1, pageblock_nr_pages); instead. ALIGN() already aligns upwards (but the "+1" is needed in case pfn is already at a pageblock_nr_pages boundary, at which point ALIGN() would have just returned that same boundary). Hmm? Mel, please confirm.
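Linus's point about ALIGN() is easy to check numerically. The user-space C sketch below makes these assumptions:

- align_up() reimplements the round-up behaviour of the kernel's ALIGN() for power-of-two alignments.
- pageblock_nr_pages is 1024. The oops registers are consistent with this: 0x6fef9, the pfn of valid_page under an assumed 0xffffea0000000000 vmemmap base, plus 0x400 gives %r13 = 0x702f9. Other configurations may differ.

The three end_pfn functions are hypothetical labels for the formulas discussed in this thread.

```c
#include <assert.h>

/* Round x up to a multiple of a, like the kernel's ALIGN() for powers of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* 3.7 code: a full pageblock's worth of pfns starting at a possibly unaligned pfn. */
static unsigned long end_pfn_orig(unsigned long pfn, unsigned long nr,
				  unsigned long zone_end_pfn)
{
	return min_ul(pfn + nr, zone_end_pfn);
}

/* Second patch as posted, with the min(end_pfn, zone_end_pfn) typo fixed. */
static unsigned long end_pfn_v2(unsigned long pfn, unsigned long nr,
				unsigned long zone_end_pfn)
{
	return min_ul(align_up(pfn + nr, nr), zone_end_pfn);
}

/* Linus's correction: stop at the end of the pageblock containing pfn. */
static unsigned long end_pfn_v3(unsigned long pfn, unsigned long nr,
				unsigned long zone_end_pfn)
{
	return min_ul(align_up(pfn + 1, nr), zone_end_pfn);
}
```

The original formula reproduces %r13 (0x702f9). From the unaligned pfn 0x6fef9, the posted two-liner would scan up to 0x70400, straight across the 0x70000 boundary. ALIGN(pfn + 1, ...) stops at 0x70000. The "+1" matters when pfn is already aligned: ALIGN(0x70000, 0x400) is 0x70000 itself, which would give an empty range.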
And Henrik, it might be good to test that doubly-fixed patch. Because reading the patch and trying to fix bugs in it that way is *not* the same as actually verifying it ;) Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1946893Ab2LFTJk (ORCPT ); Thu, 6 Dec 2012 14:09:40 -0500 Received: from cantor2.suse.de ([195.135.220.15]:43799 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1424565Ab2LFTJi (ORCPT ); Thu, 6 Dec 2012 14:09:38 -0500 Date: Thu, 6 Dec 2012 19:01:14 +0000 From: Mel Gorman To: Linus Torvalds Cc: Henrik Rydberg , Jan Kara , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206190114.GE17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 10:41:14AM -0800, Linus Torvalds wrote: > On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg wrote: > >> > >> Henrik, does that - corrected - patch (*instead* of the previous one, > >> not in addition to) also fix your issue? > > > > Yes - I can no longer trigger the failpath, so it seems to work. Mel, > > enjoy the rest of the talk. ;-) > > > > Generally, I am a bit surprised that noone hit this before, given that > > it was quite easy to trigger. I will check 3.6 as well. > > Actually, looking at it some more, I think that two-liner patch had > *ANOTHER* bug. > > Because the other line seems buggy as well. 
> > Instead of > > end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > I think it should be > > end_pfn = ALIGN(pfn+1, pageblock_nr_pages); > > instead. ALIGN() already aligns upwards (but the "+1" is needed in > case pfn is already at a pageblock_nr_pages boundary, at which point > ALIGN() would have just returned that same boundary. > > Hmm? Mel, please confirm. FFS. Yes, confirmed. In answer to Henrik's wondering why others have not reported this -- reproducing this requires a large enough hole with the right alignment to have compaction walk into a PFN range with no memmap. Size and alignment depend on the memory model - 4M for FLATMEM and 128M for SPARSEMEM on x86. It needs a "lucky" machine. ---8<--- mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration) added a check for pfn_valid() when isolating pages for migration as the scanner does not necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction: Restart compaction from near where it left off), the free scanner has the same problem. This patch makes sure that the pfn range passed to isolate_freepages_block() is within the same block so that pfn_valid() checks are unnecessary. Reported-by: Henrik Rydberg Signed-off-by: Mel Gorman --- mm/compaction.c | 10 +++++++++- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..694eaab 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check.
Ensure isolate_freepages_block() + * only scans within a pageblock + */ + end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); + end_pfn = min(end_pfn, zone_end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424696Ab2LFT0l (ORCPT ); Thu, 6 Dec 2012 14:26:41 -0500 Received: from smtprelay-h22.telenor.se ([195.54.99.197]:57333 "EHLO smtprelay-h22.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1423240Ab2LFT0k (ORCPT ); Thu, 6 Dec 2012 14:26:40 -0500 X-SENDER-IP: [85.230.168.206] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnNKAAfxwFBV5qjOPGdsb2JhbABEhUuFI7M5FwMBAQEBODSCHgEBBAE6HCMFCwgDRhQlChqIHQrCbBSMLoNZYQOWAoV7g1OJboFj X-IronPort-AV: E=Sophos;i="4.84,231,1355094000"; d="scan'208";a="239084959" From: "Henrik Rydberg" Date: Thu, 6 Dec 2012 20:28:45 +0100 To: Linus Torvalds Cc: Mel Gorman , Jan Kara , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206192845.GA599@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Actually, looking at it some more, I think that two-liner patch had > *ANOTHER* bug. > > Because the other line seems buggy as well. > > Instead of > > end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages); > > I think it should be > > end_pfn = ALIGN(pfn+1, pageblock_nr_pages); > > instead. 
ALIGN() already aligns upwards (but the "+1" is needed in > case pfn is already at a pageblock_nr_pages boundary, at which point > ALIGN() would have just returned that same boundary. Ah, and now the two callers treat the pointers the same way. > Hmm? Mel, please confirm. And Henrik, it might be good to test that > doubly-fixed patch. Because reading the patch and trying to fix bugs > in it that way is *not* the same as actually verifying it ;) Confirmed, working. I also checked 3.6, but could not trigger the original problem there. The code also looks different, so it makes sense. To be explicit, this is what I tested on top of v3.7-rc8: --- mm/compaction.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9eef558..ff1c483 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone, /* Found a block suitable for isolating free pages from */ isolated = 0; - end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); + + /* + * As pfn may not start aligned, pfn+pageblock_nr_page + * may cross a MAX_ORDER_NR_PAGES boundary and miss + * a pfn_valid check. Ensure isolate_freepages_block() + * only scans within a pageblock. + */ + end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); + end_pfn = min(end_pfn, zone_end_pfn); isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; -- 1.8.0.1 Hopefully, that's a wrap. 
:-) Henrik From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424770Ab2LFTjK (ORCPT ); Thu, 6 Dec 2012 14:39:10 -0500 Received: from mail-wg0-f46.google.com ([74.125.82.46]:38663 "EHLO mail-wg0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1424743Ab2LFTjJ (ORCPT ); Thu, 6 Dec 2012 14:39:09 -0500 MIME-Version: 1.0 In-Reply-To: <20121206192845.GA599@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> <20121206192845.GA599@polaris.bitmath.org> From: Linus Torvalds Date: Thu, 6 Dec 2012 11:38:47 -0800 X-Google-Sender-Auth: lv3cXVkfSLdkDPg8NzbtaUha7f0 Message-ID: Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() To: Henrik Rydberg Cc: Mel Gorman , Jan Kara , linux-mm , Linux Kernel Mailing List Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Ok, I've applied the patch. Mel, some grepping shows that there is an old line that does end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); which looks bogus. That should probably also use "+ 1" instead. But I'll consider that an independent issue, so I applied the one patch regardless. There is also a low_pfn += pageblock_nr_pages; low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; that looks suspicious for similar reasons. Maybe low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; instead? Although that *can* result in the same low_pfn in the end, so maybe that one was correct after all? I just did some grepping, no actual semantic analysis... 
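Linus's question about the low_pfn skip can be explored the same way. The sketch below compares the existing two-line form with the suggested ALIGN(low_pfn + 1, ...) - 1 form. The function names are hypothetical, pageblock_nr_pages is passed in as n, and 512 is used in the note below as an assumed value. ALIGN() is re-implemented for power-of-two alignments only.

```c
#include <assert.h>

/* Userspace stand-in for ALIGN(x, a); a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Existing code:
 *     low_pfn += pageblock_nr_pages;
 *     low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; */
static unsigned long skip_existing(unsigned long low_pfn, unsigned long n)
{
	low_pfn += n;
	return align_up(low_pfn, n) - 1;
}

/* Suggested replacement:
 *     low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; */
static unsigned long skip_suggested(unsigned long low_pfn, unsigned long n)
{
	return align_up(low_pfn + 1, n) - 1;
}
```

With n = 512, the two forms agree when low_pfn is already block-aligned (1024 gives 1535 in both). For 1100 or 1535, however, the existing form lands on 2047, the last pfn of the following block, while the suggested form stays at 1535. Which inputs the real caller actually sees is not analysed here.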
Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1424917Ab2LFVhO (ORCPT ); Thu, 6 Dec 2012 16:37:14 -0500 Received: from smtprelay-b21.telenor.se ([195.54.99.212]:36377 "EHLO smtprelay-b21.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1424897Ab2LFVhL (ORCPT ); Thu, 6 Dec 2012 16:37:11 -0500 X-SENDER-IP: [85.230.168.206] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: At9NAMMPwVBV5qjOPGdsb2JhbABEim6zQBcDAQEBATg0gh4BAQQBOhwjBQsIA0YUJQoaiB0KwmsUjEiDP2EDlgKFe4NTiW4 X-IronPort-AV: E=Sophos;i="4.84,233,1355094000"; d="scan'208";a="462904698" From: "Henrik Rydberg" Date: Thu, 6 Dec 2012 22:39:09 +0100 To: Linus Torvalds Cc: Mel Gorman , Jan Kara , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121206213909.GA625@polaris.bitmath.org> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> <20121206192845.GA599@polaris.bitmath.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > There is also a > > low_pfn += pageblock_nr_pages; > low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; > > that looks suspicious for similar reasons. Maybe > > low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; > > instead? Although that *can* result in the same low_pfn in the end, so > maybe that one was correct after all? I just did some grepping, no > actual semantic analysis... Here is a totally obscure version: low_pfn |= pageblock_nr_pages - 1; It simply moves to the very end of the block, which seems to be what was intended. 
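Henrik's one-liner relies on pageblock_nr_pages being a power of two. OR-ing in n - 1 sets all the low bits, which lands on the last pfn of the block containing low_pfn. The userspace check below compares it against ALIGN(low_pfn + 1, n) - 1 over a range of pfns. The names are hypothetical, and ALIGN() is re-implemented for power-of-two alignments only.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for ALIGN(x, a); a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Henrik's form: low_pfn |= pageblock_nr_pages - 1; */
static unsigned long last_pfn_or(unsigned long low_pfn, unsigned long n)
{
	assert(n != 0 && (n & (n - 1)) == 0);
	return low_pfn | (n - 1);
}

/* The ALIGN()-based form: ALIGN(low_pfn + 1, n) - 1 */
static unsigned long last_pfn_align(unsigned long low_pfn, unsigned long n)
{
	return align_up(low_pfn + 1, n) - 1;
}

/* Check that both forms agree for every pfn in [0, limit). */
static bool forms_agree(unsigned long limit, unsigned long n)
{
	for (unsigned long pfn = 0; pfn < limit; pfn++)
		if (last_pfn_or(pfn, n) != last_pfn_align(pfn, n))
			return false;
	return true;
}
```

Both forms compute (floor(low_pfn / n) + 1) * n - 1. The OR version avoids the addition and subtraction, but it silently gives wrong answers if n is ever not a power of two.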
Henrik From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754729Ab2LGIle (ORCPT ); Fri, 7 Dec 2012 03:41:34 -0500 Received: from cantor2.suse.de ([195.135.220.15]:37097 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752724Ab2LGIlc (ORCPT ); Fri, 7 Dec 2012 03:41:32 -0500 Date: Fri, 7 Dec 2012 08:32:48 +0000 From: Mel Gorman To: Linus Torvalds Cc: Henrik Rydberg , Jan Kara , linux-mm , Linux Kernel Mailing List Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block() Message-ID: <20121207083248.GF17258@suse.de> References: <20121206091744.GA1397@polaris.bitmath.org> <20121206144821.GC18547@quack.suse.cz> <20121206161934.GA17258@suse.de> <20121206175451.GC17258@suse.de> <20121206183259.GA591@polaris.bitmath.org> <20121206192845.GA599@polaris.bitmath.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 06, 2012 at 11:38:47AM -0800, Linus Torvalds wrote: > Ok, I've applied the patch. > Thanks. > Mel, some grepping shows that there is an old line that does > > end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); > > which looks bogus. It's bogus. The impact is that multiple compaction attempts may be needed to clear a particular block for allocation. THP allocation success rate under stress will be lower and the latency before a range of pages is collapsed by khugepaged to a huge page will be higher. The impact of this is less and it should not result in a bug like Henrik's. An attentive reviewer is going to exclaim that GFP_ATOMIC allocations for jumbo frames are impacted by this, but they are not. Even with this bogus walk, compaction will be clearing SWAP_CLUSTER_MAX contiguous chunks which is enough for jumbo frames.
> That should probably also use "+ 1" instead. But > I'll consider that an independent issue, so I applied the one patch > regardless. > > There is also a > > low_pfn += pageblock_nr_pages; > low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; > > that looks suspicious for similar reasons. Maybe > > low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1; > This one is working by coincidence because the low_pfn will be aligned in most cases. If it were outright broken then CMA would never work either. > instead? Although that *can* result in the same low_pfn in the end, so > maybe that one was correct after all? I just did some grepping, no > actual semantic analysis... > They need fixing, but the impact is much less severe and does not justify delaying 3.8 over them, unlike the other last-minute fixes. My performance writing patches during talks was less than stellar yesterday, so I'll avoid a repeat performance and follow up with Andrew early next week with a cc to -stable. It'll also give me a chance to run the patches through the highalloc stress tests and confirm that allocation success rate is higher and latency lower, as would be expected from such a fix. Thanks. -- Mel Gorman SUSE Labs