* kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 @ 2014-06-19 21:56 Naoya Horiguchi 2014-06-20 4:35 ` Hugh Dickins 2014-06-20 14:24 ` Christoph Lameter 0 siblings, 2 replies; 9+ messages in thread From: Naoya Horiguchi @ 2014-06-19 21:56 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, Andrew Morton, Hugh Dickins, Christoph Lameter, KOSAKI Motohiro, Naoya Horiguchi Hi, I triggered the following bug on v3.16-rc1 when I did mbind() testing where multiple processes repeat calling mbind() for a shared mapped file (causing pingpong of page migration.) In my investigation, it seems that some vma accidentally has vma->vm_start = 0, which makes new_vma_page() choose a wrong vma and results in breaking the assumption that the address passed to alloc_pages_vma() should be inside a given vma. I'm suspecting that mbind_range() do something wrong around vma handling, but I don't have enough luck yet. Anyone has an idea? Thanks, Naoya Horiguchi [ 339.133960] ------------[ cut here ]------------ [ 339.134893] kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! [ 339.134893] invalid opcode: 0000 [#1] SMP [ 339.134893] Modules linked in: stap_2acbad8c3ba47062dbdc6f227d00f8f4__1958(O) bnep bluetooth cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ppdev microcode i2c_piix4 pcspkr i2c_core virtio_balloon parport_pc parport serio_raw nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc virtio_blk virtio_net floppy ata_generic pata_acpi [ 339.134893] CPU: 2 PID: 2840 Comm: mbind_fuzz Tainted: G O 3.16.0-rc1-140619-1205-00003-g80aa6b64a44e #157 [ 339.134893] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 339.134893] task: ffff88007c133b60 ti: ffff88007dd28000 task.ti: ffff88007dd28000 [ 339.134893] RIP: 0010:[<ffffffff811ebfe0>] [<ffffffff811ebfe0>] policy_zonelist+0x50/0xb0 [ 339.134893] RSP: 0000:ffff88007dd2bcf8 EFLAGS: 00010293 [ 339.134893] RAX: 0000000000000000 RBX: ffff88007c133b60 RCX: 0000000000000000 [ 339.134893] RDX: 0000000000000002 RSI: ffff88011bd3fad0 RDI: 00000000000200da [ 339.134893] RBP: ffff88007dd2bd00 R08: 0000000000000002 R09: 0000000000000002 [ 339.134893] R10: ffff88007d8f3958 R11: 0000000000000001 R12: 00000000000200da [ 339.134893] R13: 0000000000000000 R14: ffff88011bd3fad0 R15: 0000000000000000 [ 339.134893] FS: 00007f457cf90740(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000 [ 339.134893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 339.134893] CR2: 0000700000184000 CR3: 000000007959b000 CR4: 00000000000006e0 [ 339.134893] Stack: [ 339.134893] ffff88007c133b60 ffff88007dd2bd68 ffffffff811eeac8 ffff88007c133b60 [ 339.134893] ffff88007c133b60 0000000000000000 000000020000000c 0000000000000000 [ 339.134893] ffff88007c387e60 ffff88007c387e60 ffffea0000e19340 ffff88007d8f3958 [ 339.134893] Call Trace: [ 339.134893] [<ffffffff811eeac8>] alloc_pages_vma+0x88/0x1a0 [ 339.134893] [<ffffffff811eec7b>] new_vma_page+0x9b/0xb0 [ 339.134893] [<ffffffff811fee4d>] unmap_and_move+0x3d/0x200 [ 339.134893] [<ffffffff811ff235>] migrate_pages+0xe5/0x1e0 [ 339.134893] [<ffffffff811eebe0>] ? alloc_pages_vma+0x1a0/0x1a0 [ 339.134893] [<ffffffff811ef3c2>] do_mbind+0x1f2/0x3a0 [ 339.134893] [<ffffffff811ef60b>] SyS_mbind+0x9b/0xb0 [ 339.134893] [<ffffffff8174798b>] tracesys+0xdd/0xe2 [ 339.134893] Code: 63 d2 31 c0 85 db 48 8b 14 d5 00 2d d6 81 0f 95 c0 48 69 c0 20 22 01 00 5b 5d 48 8d 84 02 00 1d 00 00 c3 0f 1f 84 00 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00 f6 46 06 02 75 12 89 fb 48 0f bf 56 08 [ 339.134893] RIP [<ffffffff811ebfe0>] policy_zonelist+0x50/0xb0 [ 339.134893] RSP <ffff88007dd2bcf8> [ 339.178924] ---[ end trace 37c12438b6936769 ]--- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-19 21:56 kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 Naoya Horiguchi @ 2014-06-20 4:35 ` Hugh Dickins 2014-06-20 15:40 ` Naoya Horiguchi 2014-06-20 14:24 ` Christoph Lameter 1 sibling, 1 reply; 9+ messages in thread From: Hugh Dickins @ 2014-06-20 4:35 UTC (permalink / raw) To: Naoya Horiguchi Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, Christoph Lameter, KOSAKI Motohiro, Naoya Horiguchi On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > Hi, > > I triggered the following bug on v3.16-rc1 when I did mbind() testing > where multiple processes repeat calling mbind() for a shared mapped file > (causing pingpong of page migration.) The shared mapped file on shmem/tmpfs? So involving shared policy stuff? > > In my investigation, it seems that some vma accidentally has vma->vm_start > = 0, which makes new_vma_page() choose a wrong vma and results in breaking > the assumption that the address passed to alloc_pages_vma() should be > inside a given vma. I've not heard of that before. What evidence led you there? > I'm suspecting that mbind_range() do something wrong around vma handling, > but I don't have enough luck yet. Anyone has an idea? No idea at present. Please send disassembly (objdump -d, or objdump -ld if you had DEBUG_INFO) of policy_zonelist() - the Code line isn't enough to go on, since it just shows where the BUG jumped to out-of-line, with no clue as to what might be in the registers - thanks. Hugh > > Thanks, > Naoya Horiguchi > > [ 339.133960] ------------[ cut here ]------------ > [ 339.134893] kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! > [ 339.134893] invalid opcode: 0000 [#1] SMP > [ 339.134893] Modules linked in: stap_2acbad8c3ba47062dbdc6f227d00f8f4__1958(O) bnep bluetooth cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ppdev microcode i2c_piix4 pcspkr i2c_core virtio_balloon parport_pc parport serio_raw nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc virtio_blk virtio_net floppy ata_generic pata_acpi > [ 339.134893] CPU: 2 PID: 2840 Comm: mbind_fuzz Tainted: G O 3.16.0-rc1-140619-1205-00003-g80aa6b64a44e #157 > [ 339.134893] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 339.134893] task: ffff88007c133b60 ti: ffff88007dd28000 task.ti: ffff88007dd28000 > [ 339.134893] RIP: 0010:[<ffffffff811ebfe0>] [<ffffffff811ebfe0>] policy_zonelist+0x50/0xb0 > [ 339.134893] RSP: 0000:ffff88007dd2bcf8 EFLAGS: 00010293 > [ 339.134893] RAX: 0000000000000000 RBX: ffff88007c133b60 RCX: 0000000000000000 > [ 339.134893] RDX: 0000000000000002 RSI: ffff88011bd3fad0 RDI: 00000000000200da > [ 339.134893] RBP: ffff88007dd2bd00 R08: 0000000000000002 R09: 0000000000000002 > [ 339.134893] R10: ffff88007d8f3958 R11: 0000000000000001 R12: 00000000000200da > [ 339.134893] R13: 0000000000000000 R14: ffff88011bd3fad0 R15: 0000000000000000 > [ 339.134893] FS: 00007f457cf90740(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000 > [ 339.134893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 339.134893] CR2: 0000700000184000 CR3: 000000007959b000 CR4: 00000000000006e0 > [ 339.134893] Stack: > [ 339.134893] ffff88007c133b60 ffff88007dd2bd68 ffffffff811eeac8 ffff88007c133b60 > [ 339.134893] ffff88007c133b60 0000000000000000 000000020000000c 0000000000000000 > [ 339.134893] ffff88007c387e60 ffff88007c387e60 ffffea0000e19340 ffff88007d8f3958 > [ 339.134893] Call Trace: > [ 339.134893] [<ffffffff811eeac8>] alloc_pages_vma+0x88/0x1a0 > [ 339.134893] [<ffffffff811eec7b>] new_vma_page+0x9b/0xb0 > [ 339.134893] [<ffffffff811fee4d>] unmap_and_move+0x3d/0x200 > [ 339.134893] [<ffffffff811ff235>] migrate_pages+0xe5/0x1e0 > [ 339.134893] [<ffffffff811eebe0>] ? alloc_pages_vma+0x1a0/0x1a0 > [ 339.134893] [<ffffffff811ef3c2>] do_mbind+0x1f2/0x3a0 > [ 339.134893] [<ffffffff811ef60b>] SyS_mbind+0x9b/0xb0 > [ 339.134893] [<ffffffff8174798b>] tracesys+0xdd/0xe2 > [ 339.134893] Code: 63 d2 31 c0 85 db 48 8b 14 d5 00 2d d6 81 0f 95 c0 48 69 c0 20 22 01 00 5b 5d 48 8d 84 02 00 1d 00 00 c3 0f 1f 84 00 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00 f6 46 06 02 75 12 89 fb 48 0f bf 56 08 > [ 339.134893] RIP [<ffffffff811ebfe0>] policy_zonelist+0x50/0xb0 > [ 339.134893] RSP <ffff88007dd2bcf8> > [ 339.178924] ---[ end trace 37c12438b6936769 ]--- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 4:35 ` Hugh Dickins @ 2014-06-20 15:40 ` Naoya Horiguchi 0 siblings, 0 replies; 9+ messages in thread From: Naoya Horiguchi @ 2014-06-20 15:40 UTC (permalink / raw) To: Hugh Dickins Cc: linux-mm, linux-kernel, Andrew Morton, Christoph Lameter, KOSAKI Motohiro, Naoya Horiguchi On Thu, Jun 19, 2014 at 09:35:48PM -0700, Hugh Dickins wrote: > On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > > Hi, > > > > I triggered the following bug on v3.16-rc1 when I did mbind() testing > > where multiple processes repeat calling mbind() for a shared mapped file > > (causing pingpong of page migration.) > > The shared mapped file on shmem/tmpfs? So involving shared policy stuff? Sorry if it was confusing, I used an ext4 file and mapped it to multiple processes, and then let the processes call mbind() with arguments of random address, random length, and random node. And yes, I think it's memory policy thing too. > > > > In my investigation, it seems that some vma accidentally has vma->vm_start > > = 0, which makes new_vma_page() choose a wrong vma and results in breaking > > the assumption that the address passed to alloc_pages_vma() should be > > inside a given vma. > > I've not heard of that before. What evidence led you there? First of all, I checked the return value of get_vma_policy() when BUG() happens (the BUG() was triggered when policy->mode is not MPOL_PREFERRED or MPOL_BIND, so I thought that something happened on struct mempolicy.) The result was weird, I got something like this: policy->flags 0xffff, policy->mode 0x8801, policy->refcnt 0x1d2cd5a9 So I tried to see why this happened, and checked the argument vma of page_address_in_vma(), then got vma->vm_start == 0 when the BUG() was triggered. page_address_in_vma() returns the address of a given page if the address is within the vma, so if vma's range is corrupted, the result is corrupted too. Unfortunately, I'm not sure why this wrong vma happened, but when I check the vma->vm_prev and vma->vm_next, I got something like this: corrupted vma, vm_start:0, vm_end:700000083000 vma->vm_prev, vm_start:70000000d000, vm_end:7000003e5000 # mempolicy associated with this vma was like this: # policy->flags 0, policy->mode 2, policy->refcnt 1 # so this vma seems fine. vma->vm_next was NULL so these vmas seems to be partially overlapped, that's why I suspected vma might be split or merged in the wrong way. > > I'm suspecting that mbind_range() do something wrong around vma handling, > > but I don't have enough luck yet. Anyone has an idea? > > No idea at present. > > Please send disassembly (objdump -d, or objdump -ld if you had DEBUG_INFO) > of policy_zonelist() - the Code line isn't enough to go on, since it just > shows where the BUG jumped to out-of-line, with no clue as to what might > be in the registers - thanks. OK, it's like below: ffffffff811ebf90 <policy_zonelist>: policy_zonelist(): /src/linux-dev/mm/mempolicy.c:1720 ffffffff811ebf90: e8 9b d5 55 00 callq ffffffff81749530 <__fentry__> ffffffff811ebf95: 55 push %rbp ffffffff811ebf96: 48 89 e5 mov %rsp,%rbp ffffffff811ebf99: 53 push %rbx /src/linux-dev/mm/mempolicy.c:1721 ffffffff811ebf9a: 0f b7 46 04 movzwl 0x4(%rsi),%eax ffffffff811ebf9e: 66 83 f8 01 cmp $0x1,%ax ffffffff811ebfa2: 74 44 je ffffffff811ebfe8 <policy_zonelist+0x58> ffffffff811ebfa4: 66 83 f8 02 cmp $0x2,%ax ffffffff811ebfa8: 75 36 jne ffffffff811ebfe0 <policy_zonelist+0x50> /src/linux-dev/mm/mempolicy.c:1733 ffffffff811ebfaa: 89 fb mov %edi,%ebx ffffffff811ebfac: 81 e3 00 00 04 00 and $0x40000,%ebx ffffffff811ebfb2: 75 56 jne ffffffff811ec00a <policy_zonelist+0x7a> ffffffff811ebfb4: 48 63 d2 movslq %edx,%rdx gfp_zonelist(): /src/linux-dev/include/linux/gfp.h:274 ffffffff811ebfb7: 31 c0 xor %eax,%eax ffffffff811ebfb9: 85 db test %ebx,%ebx policy_zonelist(): /src/linux-dev/mm/mempolicy.c:1740 ffffffff811ebfbb: 48 8b 14 d5 00 2d d6 mov -0x7e29d300(,%rdx,8),%rdx ffffffff811ebfc2: 81 node_zonelist(): /src/linux-dev/include/linux/gfp.h:274 ffffffff811ebfc3: 0f 95 c0 setne %al policy_zonelist(): /src/linux-dev/mm/mempolicy.c:1740 ffffffff811ebfc6: 48 69 c0 20 22 01 00 imul $0x12220,%rax,%rax /src/linux-dev/mm/mempolicy.c:1741 ffffffff811ebfcd: 5b pop %rbx ffffffff811ebfce: 5d pop %rbp /src/linux-dev/mm/mempolicy.c:1740 ffffffff811ebfcf: 48 8d 84 02 00 1d 00 lea 0x1d00(%rdx,%rax,1),%rax ffffffff811ebfd6: 00 /src/linux-dev/mm/mempolicy.c:1741 ffffffff811ebfd7: c3 retq ffffffff811ebfd8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) ffffffff811ebfdf: 00 /src/linux-dev/mm/mempolicy.c:1738 ffffffff811ebfe0: 0f 0b ud2 ffffffff811ebfe2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) /src/linux-dev/mm/mempolicy.c:1723 ffffffff811ebfe8: f6 46 06 02 testb $0x2,0x6(%rsi) ffffffff811ebfec: 75 12 jne ffffffff811ec000 <policy_zonelist+0x70> ffffffff811ebfee: 89 fb mov %edi,%ebx ffffffff811ebff0: 48 0f bf 56 08 movswq 0x8(%rsi),%rdx ffffffff811ebff5: 81 e3 00 00 04 00 and $0x40000,%ebx ffffffff811ebffb: eb ba jmp ffffffff811ebfb7 <policy_zonelist+0x27> ffffffff811ebffd: 0f 1f 00 nopl (%rax) ffffffff811ec000: 81 e7 00 00 04 00 and $0x40000,%edi ffffffff811ec006: 89 fb mov %edi,%ebx ffffffff811ec008: eb aa jmp ffffffff811ebfb4 <policy_zonelist+0x24> /src/linux-dev/mm/mempolicy.c:1734 ffffffff811ec00a: 48 8d 7e 08 lea 0x8(%rsi),%rdi ffffffff811ec00e: 48 63 d2 movslq %edx,%rdx variable_test_bit(): /src/linux-dev/arch/x86/include/asm/bitops.h:318 ffffffff811ec011: 48 0f a3 56 08 bt %rdx,0x8(%rsi) ffffffff811ec016: 19 c0 sbb %eax,%eax policy_zonelist(): /src/linux-dev/mm/mempolicy.c:1733 ffffffff811ec018: 85 c0 test %eax,%eax ffffffff811ec01a: 75 9b jne ffffffff811ebfb7 <policy_zonelist+0x27> __first_node(): /src/linux-dev/include/linux/nodemask.h:248 ffffffff811ec01c: be 00 04 00 00 mov $0x400,%esi ffffffff811ec021: e8 9a ae 1b 00 callq ffffffff813a6ec0 <find_first_bit> ffffffff811ec026: ba 00 04 00 00 mov $0x400,%edx ffffffff811ec02b: 3d 00 04 00 00 cmp $0x400,%eax ffffffff811ec030: 0f 4e d0 cmovle %eax,%edx ffffffff811ec033: 48 63 d2 movslq %edx,%rdx ffffffff811ec036: e9 7c ff ff ff jmpq ffffffff811ebfb7 <policy_zonelist+0x27> policy_zonelist(): ffffffff811ec03b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) Thanks, Naoya Horiguchi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-19 21:56 kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 Naoya Horiguchi 2014-06-20 4:35 ` Hugh Dickins @ 2014-06-20 14:24 ` Christoph Lameter 2014-06-20 19:46 ` Naoya Horiguchi 1 sibling, 1 reply; 9+ messages in thread From: Christoph Lameter @ 2014-06-20 14:24 UTC (permalink / raw) To: Naoya Horiguchi Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, KOSAKI Motohiro, Naoya Horiguchi On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > I'm suspecting that mbind_range() do something wrong around vma handling, > but I don't have enough luck yet. Anyone has an idea? Well memory policy data corrupted. This looks like you were trying to do page migration via mbind()? Could we get some more details as to what is going on here? Specifically the parameters passed to mbind would be interesting. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 14:24 ` Christoph Lameter @ 2014-06-20 19:46 ` Naoya Horiguchi 2014-06-20 20:03 ` Hugh Dickins 0 siblings, 1 reply; 9+ messages in thread From: Naoya Horiguchi @ 2014-06-20 19:46 UTC (permalink / raw) To: Christoph Lameter Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, KOSAKI Motohiro, Naoya Horiguchi On Fri, Jun 20, 2014 at 09:24:36AM -0500, Christoph Lameter wrote: > On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > > > I'm suspecting that mbind_range() do something wrong around vma handling, > > but I don't have enough luck yet. Anyone has an idea? > > Well memory policy data corrupted. This looks like you were trying to do > page migration via mbind()? Right. > Could we get some more details as to what is > going on here? Specifically the parameters passed to mbind would be > interesting. My view about the kernel behavior was in another email a few hours ago. And as for what userspace did, I attach the reproducer below. It's simply doing mbind(mode=MPOL_BIND, flags=MPOL_MF_MOVE_ALL) on random address/length/node. What I did to trigger the bug is like below: while true ; do dd if=/dev/urandom of=testfile bs=4096 count=1000 for i in $(seq 10) ; do ./mbind_bug_reproducer testfile > /dev/null & done sleep 3 pkill -SIGUSR1 -f mbind_bug_reproducer done mbind_bug_reproducer.c --- #include <stdio.h> #include <signal.h> #include <sys/mman.h> #include <numa.h> #include <numaif.h> #include <fcntl.h> #define ADDR_INPUT 0x700000000000 #define PS 4096 #define err(x) perror(x),exit(EXIT_FAILURE) #define errmsg(x, ...) fprintf(stderr, x, ##__VA_ARGS__),exit(EXIT_FAILURE) int flag = 1; void sig_handle_flag(int signo) { flag = 0; } void set_new_nodes(struct bitmask *mask, unsigned long node) { numa_bitmask_clearall(mask); numa_bitmask_setbit(mask, node); } int main(int argc, char *argv[]) { int nr = 1000; int fd = -1; char *pfile; struct timeval tv; struct bitmask *nodes; unsigned long nr_nodes; unsigned long memsize = nr * PS; nr_nodes = numa_max_node() + 1; /* numa_num_possible_nodes(); */ nodes = numa_bitmask_alloc(nr_nodes); if (nr_nodes < 2) errmsg("A minimum of 2 nodes is required for this test.\n"); gettimeofday(&tv, NULL); srandom(tv.tv_usec); fd = open(argv[1], O_RDWR, S_IRWXU); if (fd < 0) err("open"); pfile = mmap((void *)ADDR_INPUT, memsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (pfile == (void*)-1L) err("mmap"); signal(SIGUSR1, sig_handle_flag); while (flag) { int node; unsigned long offset; unsigned long length; memset(pfile, 'a', memsize); node = random() % nr_nodes; set_new_nodes(nodes, random() & nr_nodes); offset = (random() % nr) * PS; length = (random() % (nr - offset/PS)) * PS; printf("[%d] node:%x, offset:%x, length:%x\n", getpid(), node, offset, length); mbind(pfile + offset, length, MPOL_BIND, nodes->maskp, nodes->size + 1, MPOL_MF_MOVE_ALL); } munmap(pfile, memsize); return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 19:46 ` Naoya Horiguchi @ 2014-06-20 20:03 ` Hugh Dickins 2014-06-20 20:32 ` Christoph Lameter 2014-06-20 21:12 ` Naoya Horiguchi 0 siblings, 2 replies; 9+ messages in thread From: Hugh Dickins @ 2014-06-20 20:03 UTC (permalink / raw) To: Naoya Horiguchi Cc: Christoph Lameter, linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, KOSAKI Motohiro, Naoya Horiguchi On Fri, 20 Jun 2014, Naoya Horiguchi wrote: > On Fri, Jun 20, 2014 at 09:24:36AM -0500, Christoph Lameter wrote: > > On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > > > > > I'm suspecting that mbind_range() do something wrong around vma handling, > > > but I don't have enough luck yet. Anyone has an idea? > > > > Well memory policy data corrupted. This looks like you were trying to do > > page migration via mbind()? > > Right. > > > Could we get some more details as to what is > > going on here? Specifically the parameters passed to mbind would be > > interesting. > > My view about the kernel behavior was in another email a few hours ago. > And as for what userspace did, I attach the reproducer below. It's simply > doing mbind(mode=MPOL_BIND, flags=MPOL_MF_MOVE_ALL) on random address/length/node. Thanks for the additional information earlier. ext4, so no shmem shared mempolicy involved: that cuts down the bugspace considerably. I agree from what you said that it looked like corrupt vm_area_struct and hence corrupt policy. Here's an obvious patch to try, entirely untested - thanks for the reproducer, but I'd rather leave the testing to you. Sounds like you have a useful fuzzer there: good catch. [PATCH] mm: fix crashes from mbind() merging vmas v2.6.34's 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced vma merging to mbind(), but it should have also changed the convention of passing start vma from queue_pages_range() (formerly check_range()) to new_vma_page(): vma merging may have already freed that structure, resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # 2.6.34+ --- mm/mempolicy.c | 46 ++++++++++++++++++++-------------------------- 1 file changed, 20 insertions(+), 26 deletions(-) --- 3.16-rc1/mm/mempolicy.c 2014-06-16 00:28:55.116076530 -0700 +++ linux/mm/mempolicy.c 2014-06-20 12:40:00.000204558 -0700 @@ -656,19 +656,18 @@ static unsigned long change_prot_numa(st * @nodes and @flags,) it's isolated and queued to the pagelist which is * passed via @private.) */ -static struct vm_area_struct * +static int queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, const nodemask_t *nodes, unsigned long flags, void *private) { - int err; - struct vm_area_struct *first, *vma, *prev; - + int err = 0; + struct vm_area_struct *vma, *prev; - first = find_vma(mm, start); - if (!first) - return ERR_PTR(-EFAULT); + vma = find_vma(mm, start); + if (!vma) + return -EFAULT; prev = NULL; - for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) { + for (; vma && vma->vm_start < end; vma = vma->vm_next) { unsigned long endvma = vma->vm_end; if (endvma > end) @@ -678,9 +677,9 @@ queue_pages_range(struct mm_struct *mm, if (!(flags & MPOL_MF_DISCONTIG_OK)) { if (!vma->vm_next && vma->vm_end < end) - return ERR_PTR(-EFAULT); + return -EFAULT; if (prev && prev->vm_end < vma->vm_start) - return ERR_PTR(-EFAULT); + return -EFAULT; } if (flags & MPOL_MF_LAZY) { @@ -694,15 +693,13 @@ queue_pages_range(struct mm_struct *mm, err = queue_pages_pgd_range(vma, start, endvma, nodes, flags, private); - if (err) { - first = ERR_PTR(err); + if (err) break; - } } next: prev = vma; } - return first; + return err; } /* @@ -1156,16 +1153,17 @@ out: /* * Allocate a new page for page migration based on vma policy. - * Start assuming that page is mapped by vma pointed to by @private. + * Start by assuming the page is mapped by the same vma as contains @start. * Search forward from there, if not. N.B., this assumes that the * list of pages handed to migrate_pages()--which is how we get here-- * is in virtual address order. */ -static struct page *new_vma_page(struct page *page, unsigned long private, int **x) +static struct page *new_page(struct page *page, unsigned long start, int **x) { - struct vm_area_struct *vma = (struct vm_area_struct *)private; + struct vm_area_struct *vma; unsigned long uninitialized_var(address); + vma = find_vma(current->mm, start); while (vma) { address = page_address_in_vma(page, vma); if (address != -EFAULT) @@ -1195,7 +1193,7 @@ int do_migrate_pages(struct mm_struct *m return -ENOSYS; } -static struct page *new_vma_page(struct page *page, unsigned long private, int **x) +static struct page *new_page(struct page *page, unsigned long start, int **x) { return NULL; } @@ -1205,7 +1203,6 @@ static long do_mbind(unsigned long start unsigned short mode, unsigned short mode_flags, nodemask_t *nmask, unsigned long flags) { - struct vm_area_struct *vma; struct mm_struct *mm = current->mm; struct mempolicy *new; unsigned long end; @@ -1271,11 +1268,9 @@ static long do_mbind(unsigned long start if (err) goto mpol_out; - vma = queue_pages_range(mm, start, end, nmask, + err = queue_pages_range(mm, start, end, nmask, flags | MPOL_MF_INVERT, &pagelist); - - err = PTR_ERR(vma); /* maybe ... */ - if (!IS_ERR(vma)) + if (!err) err = mbind_range(mm, start, end, new); if (!err) { @@ -1283,9 +1278,8 @@ static long do_mbind(unsigned long start if (!list_empty(&pagelist)) { WARN_ON_ONCE(flags & MPOL_MF_LAZY); - nr_failed = migrate_pages(&pagelist, new_vma_page, - NULL, (unsigned long)vma, - MIGRATE_SYNC, MR_MEMPOLICY_MBIND); + nr_failed = migrate_pages(&pagelist, new_page, NULL, + start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); if (nr_failed) putback_movable_pages(&pagelist); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 20:03 ` Hugh Dickins @ 2014-06-20 20:32 ` Christoph Lameter 2014-06-20 21:12 ` Naoya Horiguchi 1 sibling, 0 replies; 9+ messages in thread From: Christoph Lameter @ 2014-06-20 20:32 UTC (permalink / raw) To: Hugh Dickins Cc: Naoya Horiguchi, linux-mm, linux-kernel, Andrew Morton, KOSAKI Motohiro, Naoya Horiguchi On Fri, 20 Jun 2014, Hugh Dickins wrote: > [PATCH] mm: fix crashes from mbind() merging vmas > > v2.6.34's 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced > vma merging to mbind(), but it should have also changed the convention > of passing start vma from queue_pages_range() (formerly check_range()) > to new_vma_page(): vma merging may have already freed that structure, > resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. Good catch. Cannot find fault with what I see. Acked-by: Christoph Lameter <cl@linux.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 20:03 ` Hugh Dickins 2014-06-20 20:32 ` Christoph Lameter @ 2014-06-20 21:12 ` Naoya Horiguchi 2014-06-20 21:42 ` Hugh Dickins 1 sibling, 1 reply; 9+ messages in thread From: Naoya Horiguchi @ 2014-06-20 21:12 UTC (permalink / raw) To: Hugh Dickins Cc: Christoph Lameter, linux-mm, linux-kernel, Andrew Morton, KOSAKI Motohiro, Naoya Horiguchi On Fri, Jun 20, 2014 at 01:03:58PM -0700, Hugh Dickins wrote: > On Fri, 20 Jun 2014, Naoya Horiguchi wrote: > > On Fri, Jun 20, 2014 at 09:24:36AM -0500, Christoph Lameter wrote: > > > On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > > > > > > > I'm suspecting that mbind_range() do something wrong around vma handling, > > > > but I don't have enough luck yet. Anyone has an idea? > > > > > > Well memory policy data corrupted. This looks like you were trying to do > > > page migration via mbind()? > > > > Right. > > > > > Could we get some more details as to what is > > > going on here? Specifically the parameters passed to mbind would be > > > interesting. > > > > My view about the kernel behavior was in another email a few hours ago. > > And as for what userspace did, I attach the reproducer below. It's simply > > doing mbind(mode=MPOL_BIND, flags=MPOL_MF_MOVE_ALL) on random address/length/node. > > Thanks for the additional information earlier. ext4, so no shmem > shared mempolicy involved: that cuts down the bugspace considerably. > > I agree from what you said that it looked like corrupt vm_area_struct > and hence corrupt policy. > > Here's an obvious patch to try, entirely untested - thanks for the > reproducer, but I'd rather leave the testing to you. Sounds like > you have a useful fuzzer there: good catch. > > > [PATCH] mm: fix crashes from mbind() merging vmas > > v2.6.34's 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced > vma merging to mbind(), but it should have also changed the convention > of passing start vma from queue_pages_range() (formerly check_range()) > to new_vma_page(): vma merging may have already freed that structure, > resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. > > Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") > Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Signed-off-by: Hugh Dickins <hughd@google.com> > Cc: stable@vger.kernel.org # 2.6.34+ With your patch, the bug doesn't reproduce in one hour testing. I think it's long enough because it took only a few minutes until the reproducer triggered the bug without your patch. So I think the problem was gone, thank you very much! Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > --- > > mm/mempolicy.c | 46 ++++++++++++++++++++-------------------------- > 1 file changed, 20 insertions(+), 26 deletions(-) > > --- 3.16-rc1/mm/mempolicy.c 2014-06-16 00:28:55.116076530 -0700 > +++ linux/mm/mempolicy.c 2014-06-20 12:40:00.000204558 -0700 > @@ -656,19 +656,18 @@ static unsigned long change_prot_numa(st > * @nodes and @flags,) it's isolated and queued to the pagelist which is > * passed via @private.) > */ > -static struct vm_area_struct * > +static int > queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, > const nodemask_t *nodes, unsigned long flags, void *private) > { > - int err; > - struct vm_area_struct *first, *vma, *prev; > - > + int err = 0; > + struct vm_area_struct *vma, *prev; > > - first = find_vma(mm, start); > - if (!first) > - return ERR_PTR(-EFAULT); > + vma = find_vma(mm, start); > + if (!vma) > + return -EFAULT; > prev = NULL; > - for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) { > + for (; vma && vma->vm_start < end; vma = vma->vm_next) { > unsigned long endvma = vma->vm_end; > > if (endvma > end) > @@ -678,9 +677,9 @@ queue_pages_range(struct mm_struct *mm, > > if (!(flags & MPOL_MF_DISCONTIG_OK)) { > if (!vma->vm_next && vma->vm_end < end) > - return ERR_PTR(-EFAULT); > + return -EFAULT; > if (prev && prev->vm_end < vma->vm_start) > - return ERR_PTR(-EFAULT); > + return -EFAULT; > } > > if (flags & MPOL_MF_LAZY) { > @@ -694,15 +693,13 @@ queue_pages_range(struct mm_struct *mm, > > err = queue_pages_pgd_range(vma, start, endvma, nodes, > flags, private); > - if (err) { > - first = ERR_PTR(err); > + if (err) > break; > - } > } > next: > prev = vma; > } > - return first; > + return err; > } > > /* > @@ -1156,16 +1153,17 @@ out: > > /* > * Allocate a new page for page migration based on vma policy. > - * Start assuming that page is mapped by vma pointed to by @private. > + * Start by assuming the page is mapped by the same vma as contains @start. > * Search forward from there, if not. N.B., this assumes that the > * list of pages handed to migrate_pages()--which is how we get here-- > * is in virtual address order. > */ > -static struct page *new_vma_page(struct page *page, unsigned long private, int **x) > +static struct page *new_page(struct page *page, unsigned long start, int **x) > { > - struct vm_area_struct *vma = (struct vm_area_struct *)private; > + struct vm_area_struct *vma; > unsigned long uninitialized_var(address); > > + vma = find_vma(current->mm, start); > while (vma) { > address = page_address_in_vma(page, vma); > if (address != -EFAULT) > @@ -1195,7 +1193,7 @@ int do_migrate_pages(struct mm_struct *m > return -ENOSYS; > } > > -static struct page *new_vma_page(struct page *page, unsigned long private, int **x) > +static struct page *new_page(struct page *page, unsigned long start, int **x) > { > return NULL; > } > @@ -1205,7 +1203,6 @@ static long do_mbind(unsigned long start > unsigned short mode, unsigned short mode_flags, > nodemask_t *nmask, unsigned long flags) > { > - struct vm_area_struct *vma; > struct mm_struct *mm = current->mm; > struct mempolicy *new; > unsigned long end; > @@ -1271,11 +1268,9 @@ static long do_mbind(unsigned long start > if (err) > goto mpol_out; > > - vma = queue_pages_range(mm, start, end, nmask, > + err = queue_pages_range(mm, start, end, nmask, > flags | MPOL_MF_INVERT, &pagelist); > - > - err = PTR_ERR(vma); /* maybe ... */ > - if (!IS_ERR(vma)) > + if (!err) > err = mbind_range(mm, start, end, new); > > if (!err) { > @@ -1283,9 +1278,8 @@ static long do_mbind(unsigned long start > > if (!list_empty(&pagelist)) { > WARN_ON_ONCE(flags & MPOL_MF_LAZY); > - nr_failed = migrate_pages(&pagelist, new_vma_page, > - NULL, (unsigned long)vma, > - MIGRATE_SYNC, MR_MEMPOLICY_MBIND); > + nr_failed = migrate_pages(&pagelist, new_page, NULL, > + start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND); > if (nr_failed) > putback_movable_pages(&pagelist); > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 2014-06-20 21:12 ` Naoya Horiguchi @ 2014-06-20 21:42 ` Hugh Dickins 0 siblings, 0 replies; 9+ messages in thread From: Hugh Dickins @ 2014-06-20 21:42 UTC (permalink / raw) To: Naoya Horiguchi Cc: Hugh Dickins, Christoph Lameter, linux-mm, linux-kernel, Andrew Morton, KOSAKI Motohiro, Naoya Horiguchi On Fri, 20 Jun 2014, Naoya Horiguchi wrote: > On Fri, Jun 20, 2014 at 01:03:58PM -0700, Hugh Dickins wrote: > > On Fri, 20 Jun 2014, Naoya Horiguchi wrote: > > > On Fri, Jun 20, 2014 at 09:24:36AM -0500, Christoph Lameter wrote: > > > > On Thu, 19 Jun 2014, Naoya Horiguchi wrote: > > > > > > > > > I'm suspecting that mbind_range() do something wrong around vma handling, > > > > > but I don't have enough luck yet. Anyone has an idea? > > > > > > > > Well memory policy data corrupted. This looks like you were trying to do > > > > page migration via mbind()? > > > > > > Right. > > > > > > > Could we get some more details as to what is > > > > going on here? Specifically the parameters passed to mbind would be > > > > interesting. > > > > > > My view about the kernel behavior was in another email a few hours ago. > > > And as for what userspace did, I attach the reproducer below. It's simply > > > doing mbind(mode=MPOL_BIND, flags=MPOL_MF_MOVE_ALL) on random address/length/node. > > > > Thanks for the additional information earlier. ext4, so no shmem > > shared mempolicy involved: that cuts down the bugspace considerably. > > > > I agree from what you said that it looked like corrupt vm_area_struct > > and hence corrupt policy. > > > > Here's an obvious patch to try, entirely untested - thanks for the > > reproducer, but I'd rather leave the testing to you. Sounds like > > you have a useful fuzzer there: good catch. > > > > > > [PATCH] mm: fix crashes from mbind() merging vmas > > > > v2.6.34's 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced > > vma merging to mbind(), but it should have also changed the convention > > of passing start vma from queue_pages_range() (formerly check_range()) > > to new_vma_page(): vma merging may have already freed that structure, > > resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. > > > > Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") > > Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > > Signed-off-by: Hugh Dickins <hughd@google.com> > > Cc: stable@vger.kernel.org # 2.6.34+ > > With your patch, the bug doesn't reproduce in one hour testing. I think > it's long enough because it took only a few minutes until the reproducer > triggered the bug without your patch. > So I think the problem was gone, thank you very much! > > Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> & Acked-by: Christoph Lameter <cl@linux.com> Great, thank you both. I was afraid you might hit something else immediately. I'll send the patch to Andrew later - unless it magically appears in his tree before. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-06-20 21:43 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-19 21:56 kernel BUG at /src/linux-dev/mm/mempolicy.c:1738! on v3.16-rc1 Naoya Horiguchi 2014-06-20 4:35 ` Hugh Dickins 2014-06-20 15:40 ` Naoya Horiguchi 2014-06-20 14:24 ` Christoph Lameter 2014-06-20 19:46 ` Naoya Horiguchi 2014-06-20 20:03 ` Hugh Dickins 2014-06-20 20:32 ` Christoph Lameter 2014-06-20 21:12 ` Naoya Horiguchi 2014-06-20 21:42 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).