From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dmitry Vyukov <dvyukov@google.com>,
	Doug Gilbert <dgilbert@interlog.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Shiraz Hashim <shashim@codeaurora.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	syzkaller <syzkaller@googlegroups.com>,
	Kostya Serebryany <kcc@google.com>,
	Alexander Potapenko <glider@google.com>,
	linux-scsi@vger.kernel.org
Subject: Re: mm: VM_BUG_ON_PAGE(PageTail(page)) in mbind
Date: Tue, 26 Jan 2016 22:28:29 +0200	[thread overview]
Message-ID: <20160126202829.GA21250@node.shutemov.name> (raw)
In-Reply-To: <CACT4Y+YK7or=W4RGpv1k1T5-xDHu3_PPVZWqsQU6nWoArsV5vA@mail.gmail.com>
On Tue, Jan 26, 2016 at 01:52:31PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> The following program triggers the following bug:
> 
> page:ffffea0000b82240 count:0 mapcount:1 mapping:dead0000ffffffff
> index:0x0 compound_mapcount: 0
> flags: 0x1fffc0000000000()
> page dumped because: VM_BUG_ON_PAGE(PageTail(page))
> ------------[ cut here ]------------
> kernel BUG at mm/vmscan.c:1446!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> Modules linked in:
> CPU: 1 PID: 6868 Comm: a.out Not tainted 4.5.0-rc1+ #287
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88003e24af80 ti: ffff88002e808000 task.ti: ffff88002e808000
> RIP: 0010:[<ffffffff816a4b7a>]  [<ffffffff816a4b7a>]
> isolate_lru_page+0x4ea/0x6d0
> RSP: 0018:ffff88002e80fa50  EFLAGS: 00010282
> RAX: ffff88003e24af80 RBX: ffffea0000b82240 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffea0000b82278
> RBP: ffff88002e80fa88 R08: 0000000000000001 R09: 0000000000000000
> R10: ffff88003e24af80 R11: 0000000000000001 R12: ffffea0000b82260
> R13: ffffea0000b82200 R14: ffffea0000b82201 R15: 0000000020004000
> FS:  0000000000c1f880(0063) GS:ffff88003ed00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000020005ff8 CR3: 000000002e324000 CR4: 00000000000006e0
> Stack:
>  dffffc0000000000 ffffffff816f5e89 ffff88003124b010 0000000020002000
>  dffffc0000000000 ffffea0000b82240 0000000020004000 ffff88002e80fb10
>  ffffffff817612bd ffffea0000000001 ffff88002e80fc70 ffff88002e80fde8
> Call Trace:
>  [<     inline     >] migrate_page_add mm/mempolicy.c:966
>  [<ffffffff817612bd>] queue_pages_pte_range+0x4ad/0x10b0 mm/mempolicy.c:552
>  [<     inline     >] walk_pmd_range mm/pagewalk.c:50
>  [<     inline     >] walk_pud_range mm/pagewalk.c:90
>  [<     inline     >] walk_pgd_range mm/pagewalk.c:116
>  [<ffffffff81732713>] __walk_page_range+0x653/0xcd0 mm/pagewalk.c:204
>  [<ffffffff81732ec4>] walk_page_range+0x134/0x300 mm/pagewalk.c:281
>  [<ffffffff8175f07b>] queue_pages_range+0xfb/0x130 mm/mempolicy.c:687
>  [<ffffffff817678c1>] do_mbind+0x2c1/0xdc0 mm/mempolicy.c:1239
>  [<     inline     >] SYSC_mbind mm/mempolicy.c:1351
>  [<ffffffff8176871d>] SyS_mbind+0x13d/0x150 mm/mempolicy.c:1333
>  [<ffffffff8646ed76>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> Code: 89 df e8 aa 64 04 00 0f 0b e8 63 6d ed ff 4d 8d 6e ff e9 73 fb
> ff ff e8 55 6d ed ff 48 c7 c6 60 7b 5b 86 48 89 df e8 86 64 04 00 <0f>
> 0b e8 3f 6d ed ff 4d 8d 6e ff e9 eb fb ff ff c7 45 d0 f0 ff
> RIP  [<ffffffff816a4b7a>] isolate_lru_page+0x4ea/0x6d0 mm/vmscan.c:1446
>  RSP <ffff88002e80fa50>
> ---[ end trace 310d844ac0b69c5b ]---
> BUG: sleeping function called from invalid context at include/linux/sched.h:2805
> in_atomic(): 1, irqs_disabled(): 0, pid: 6868, name: a.out
> INFO: lockdep is turned off.
> CPU: 1 PID: 6868 Comm: a.out Tainted: G      D         4.5.0-rc1+ #287
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88002e80f548 ffffffff829f9d0d ffff88003e24af80
>  0000000000001ad4 0000000000000000 ffff88002e80f570 ffffffff813cba2b
>  ffff88003e24af80 ffffffff865527a0 0000000000000af5 ffff88002e80f5b0
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff829f9d0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff813cba2b>] ___might_sleep+0x27b/0x3a0 kernel/sched/core.c:7703
>  [<ffffffff813cbbe0>] __might_sleep+0x90/0x1a0 kernel/sched/core.c:7665
>  [<     inline     >] threadgroup_change_begin include/linux/sched.h:2805
>  [<ffffffff81383221>] exit_signals+0x81/0x430 kernel/signal.c:2392
>  [<ffffffff8135c55c>] do_exit+0x23c/0x2cb0 kernel/exit.c:701
>  [<ffffffff811aa28f>] oops_end+0x9f/0xd0 arch/x86/kernel/dumpstack.c:250
>  [<ffffffff811aa686>] die+0x46/0x60 arch/x86/kernel/dumpstack.c:316
>  [<     inline     >] do_trap_no_signal arch/x86/kernel/traps.c:205
>  [<ffffffff811a3b9f>] do_trap+0x18f/0x380 arch/x86/kernel/traps.c:251
>  [<ffffffff811a400e>] do_error_trap+0x11e/0x280 arch/x86/kernel/traps.c:290
>  [<ffffffff811a527b>] do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:303
>  [<ffffffff86470a8e>] invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:830
>  [<     inline     >] migrate_page_add mm/mempolicy.c:966
>  [<ffffffff817612bd>] queue_pages_pte_range+0x4ad/0x10b0 mm/mempolicy.c:552
>  [<     inline     >] walk_pmd_range mm/pagewalk.c:50
>  [<     inline     >] walk_pud_range mm/pagewalk.c:90
>  [<     inline     >] walk_pgd_range mm/pagewalk.c:116
>  [<ffffffff81732713>] __walk_page_range+0x653/0xcd0 mm/pagewalk.c:204
>  [<ffffffff81732ec4>] walk_page_range+0x134/0x300 mm/pagewalk.c:281
>  [<ffffffff8175f07b>] queue_pages_range+0xfb/0x130 mm/mempolicy.c:687
>  [<ffffffff817678c1>] do_mbind+0x2c1/0xdc0 mm/mempolicy.c:1239
>  [<     inline     >] SYSC_mbind mm/mempolicy.c:1351
>  [<ffffffff8176871d>] SyS_mbind+0x13d/0x150 mm/mempolicy.c:1333
>  [<ffffffff8646ed76>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> note: a.out[6868] exited with preempt_count 1
> 
> 
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <pthread.h>
> #include <stdint.h>
> #include <string.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <fcntl.h>
> 
> #ifndef SYS_mlock2
> #define SYS_mlock2 325
> #endif
> 
> int main()
> {
>   long r[8];
>   memset(r, -1, sizeof(r));
>   r[0] = syscall(SYS_mmap, 0x20000000ul, 0x1000ul, 0x3ul, 0x32ul,
>                  0xfffffffffffffffful, 0x0ul);
>   memcpy((void*)0x20000f33, "\x2f\x64\x65\x76\x2f\x73\x67\x23", 8);
>   r[2] = syscall(SYS_open, "/dev/sg0",O_RDWR);
>   r[3] = syscall(SYS_mmap, 0x20001000ul, 0x4000ul, 0x4ul, 0x12ul, r[2],
>                  0x0ul);
>   r[4] = syscall(SYS_mlock2, 0x20001000ul, 0x3000ul, 0x1ul, 0, 0, 0);
>   r[5] = syscall(SYS_mmap, 0x20005000ul, 0x1000ul, 0x3ul, 0x32ul,
>                  0xfffffffffffffffful, 0x0ul);
>   *(uint64_t*)0x20005ff8 = (uint64_t)0x80000000;
>   r[7] = syscall(SYS_mbind, 0x20000000ul, 0x4000ul, 0x8000ul,
>                  0x20005ff8ul, 0x5ul, 0x2ul);
>   return 0;
> }
> 
> 
> On commit 92e963f50fc74041b5e9e744c330dca48e04f08d.
The patch below fixes the issue for me, but this bug makes me wounder how
many bugs like this we have in kernel... :-/
Looks like we are too permissive about which VMA is migratable:
vma_migratable() filters out VMA by VM_IO and VM_PFNMAP.
I think VM_DONTEXPAND also correlate with VMA which cannot be migrated.
$ git grep VM_DONTEXPAND drivers | grep -v '\(VM_IO\|VM_PFNMAN\)' | wc -l 
33
Hm.. :-|
It worth looking on them closely... And I wouldn't be surprised if some
VMAs without all of these flags are not migratable too.
Sigh.. Any thoughts?
next prev parent reply	other threads:[~2016-01-26 20:28 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-26 12:52 mm: VM_BUG_ON_PAGE(PageTail(page)) in mbind Dmitry Vyukov
2016-01-26 20:28 ` Kirill A. Shutemov [this message]
2016-01-26 20:48   ` Andrew Morton
2016-01-26 21:07     ` Kirill A. Shutemov
2016-01-26 20:49   ` Andrew Morton
2016-01-26 21:05     ` Kirill A. Shutemov
2016-01-26 21:37   ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=20160126202829.GA21250@node.shutemov.name \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=dgilbert@interlog.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=hughd@google.com \
    --cc=kcc@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=rientjes@google.com \
    --cc=sasha.levin@oracle.com \
    --cc=shashim@codeaurora.org \
    --cc=syzkaller@googlegroups.com \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).