* kernel BUG at mm/shmem.c:LINE!
From: syzbot @ 2018-07-07 1:19 UTC
To: hughd, linux-kernel, linux-mm, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    526674536360 Add linux-next specific files for 20180706
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=116d16fc400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8d1cfc0cb798e48
dashboard link: https://syzkaller.appspot.com/bug?extid=b8e0dfee3fd8c9012771
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=170e462c400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15f1ba2c400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b8e0dfee3fd8c9012771@syzkaller.appspotmail.com

raw: 02fffc0000001028 ffffea0007011dc8 ffffea0007058b48 ffff8801a7576ab8
raw: 000000000000016e ffff8801a7588930 00000003ffffffff ffff8801d9a44c80
page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != index)
page->mem_cgroup:ffff8801d9a44c80
------------[ cut here ]------------
kernel BUG at mm/shmem.c:815!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 4429 Comm: syz-executor697 Not tainted 4.18.0-rc3-next-20180706+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815
Code: 00 0f 85 bd 19 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 a5 f0 d6 ff 48 c7 c6 e0 32 f1 87 4c 89 e7 e8 16 10 05 00 <0f> 0b e8 8f f0 d6 ff 49 8d 7c 24 20 48 89 f8 48 c1 e8 03 80 3c 18
RSP: 0018:ffff8801ab88e158 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81aaab95 RDI: ffffed0035711c18
RBP: ffff8801ab88e8d0 R08: ffff8801a7af04c0 R09: ffffed003b5c4fc0
R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: ffffea0007058b00
R13: ffff8801ab88e8a8 R14: 0000000000000001 R15: 000000000000016e
FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b625c CR3: 0000000008e6a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 shmem_truncate_range+0x27/0xa0 mm/shmem.c:971
 shmem_evict_inode+0x3b2/0xcb0 mm/shmem.c:1071
 evict+0x4ae/0x990 fs/inode.c:558
 iput_final fs/inode.c:1508 [inline]
 iput+0x635/0xaa0 fs/inode.c:1534
 dentry_unlink_inode+0x4ae/0x640 fs/dcache.c:377
 __dentry_kill+0x44c/0x7a0 fs/dcache.c:569
 dentry_kill+0xc9/0x5a0 fs/dcache.c:688
 dput.part.26+0x66b/0x7a0 fs/dcache.c:849
 dput+0x15/0x20 fs/dcache.c:831
 __fput+0x558/0x930 fs/file_table.c:235
 ____fput+0x15/0x20 fs/file_table.c:251
 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1b08/0x2750 kernel/exit.c:869
 do_group_exit+0x177/0x440 kernel/exit.c:972
 get_signal+0x88e/0x1970 kernel/signal.c:2467
 do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x2e0/0x370 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441c29
Code: Bad RIP value.
RSP: 002b:00007fff6e973338 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffe0 RBX: 0000000000000000 RCX: 0000000000441c29
RDX: 0000000020000180 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 00007fff6e973350 R08: 0000000000000001 R09: 0000000000000000
R10: 0a00004000000002 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 68c2f261fd3bbf54 ]---
RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815
Code: 00 0f 85 bd 19 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 a5 f0 d6 ff 48 c7 c6 e0 32 f1 87 4c 89 e7 e8 16 10 05 00 <0f> 0b e8 8f f0 d6 ff 49 8d 7c 24 20 48 89 f8 48 c1 e8 03 80 3c 18
RSP: 0018:ffff8801ab88e158 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81aaab95 RDI: ffffed0035711c18
RBP: ffff8801ab88e8d0 R08: ffff8801a7af04c0 R09: ffffed003b5c4fc0
R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: ffffea0007058b00
R13: ffff8801ab88e8a8 R14: 0000000000000001 R15: 000000000000016e
FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000441bff CR3: 0000000008e6a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
See: https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-07 2:57 UTC
To: syzbot; +Cc: hughd, linux-kernel, linux-mm, syzkaller-bugs

On Fri, Jul 06, 2018 at 06:19:02PM -0700, syzbot wrote:
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+b8e0dfee3fd8c9012771@syzkaller.appspotmail.com
>
> raw: 02fffc0000001028 ffffea0007011dc8 ffffea0007058b48 ffff8801a7576ab8
> raw: 000000000000016e ffff8801a7588930 00000003ffffffff ffff8801d9a44c80
> page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != index)
> page->mem_cgroup:ffff8801d9a44c80
> ------------[ cut here ]------------
> kernel BUG at mm/shmem.c:815!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 0 PID: 4429 Comm: syz-executor697 Not tainted 4.18.0-rc3-next-20180706+
> #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:shmem_undo_range+0xdaa/0x29a0 mm/shmem.c:815

Pretty sure this one's mine. At least I spotted a codepath earlier today
which could lead to it. I'll fix it in the morning.
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-09 14:36 UTC
To: syzbot; +Cc: hughd, linux-kernel, linux-mm, syzkaller-bugs

On Fri, Jul 06, 2018 at 06:19:02PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    526674536360 Add linux-next specific files for 20180706
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=116d16fc400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8d1cfc0cb798e48
> dashboard link: https://syzkaller.appspot.com/bug?extid=b8e0dfee3fd8c9012771
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=170e462c400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15f1ba2c400000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+b8e0dfee3fd8c9012771@syzkaller.appspotmail.com

#syz fix: shmem: Convert shmem_add_to_page_cache to XArray
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-23 2:28 UTC
To: Matthew Wilcox
Cc: syzbot, Hugh Dickins, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, 9 Jul 2018, Matthew Wilcox wrote:
> On Fri, Jul 06, 2018 at 06:19:02PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    526674536360 Add linux-next specific files for 20180706
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=116d16fc400000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=c8d1cfc0cb798e48
> > dashboard link: https://syzkaller.appspot.com/bug?extid=b8e0dfee3fd8c9012771
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=170e462c400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15f1ba2c400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+b8e0dfee3fd8c9012771@syzkaller.appspotmail.com
>
> #syz fix: shmem: Convert shmem_add_to_page_cache to XArray

I don't see the patch, but I do see a diff in shmem_add_to_page_cache()
between mmotm 4.18.0-rc3-mm1 and current mmotm 4.18.0-rc5-mm1,
relating to use of xas_create_range().

Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
I don't know, but I'm afraid it has not fixed linux-next breakage of
huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!

Please try something like
mount -o remount,huge=always /dev/shm
cp /dev/zero /dev/shm

Writing soon crashes in find_lock_entry(), looking up offset 0x201
but getting the page for offset 0x3c1 instead.
I've spent a while on it, but better turn over to you, Matthew:
my guess is that xas_create_range() does not create the layout
you expect from it.

Thanks,
Hugh
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-23 14:01 UTC
To: Hugh Dickins
Cc: syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> I don't know, but I'm afraid it has not fixed linux-next breakage of
> huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
>
> Please try something like
> mount -o remount,huge=always /dev/shm
> cp /dev/zero /dev/shm
>
> Writing soon crashes in find_lock_entry(), looking up offset 0x201
> but getting the page for offset 0x3c1 instead.

Hmm. I don't see a crash while running that command, but I do see an RCU
stall in find_get_entries() called from shmem_undo_range() when running
'cp' the second time, i.e. while truncating the /dev/shm/zero file.
Maybe I'm seeing the same bug as you, and maybe I'm seeing a different
one. Do we have a shmem test suite somewhere?

> I've spent a while on it, but better turn over to you, Matthew:
> my guess is that xas_create_range() does not create the layout
> you expect from it.

I've dumped the XArray tree on my machine and it actually looks fine
*except* that the pages pointed to are free! That indicates to me I
screwed up somebody's reference count somewhere.
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-23 19:14 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> >
> > Please try something like
> > mount -o remount,huge=always /dev/shm
> > cp /dev/zero /dev/shm
> >
> > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > but getting the page for offset 0x3c1 instead.
>
> Hmm. I don't see a crash while running that command,

Thanks for looking.

It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
on this occasion? Or you don't think of an oops as a kernel crash,
and didn't notice it in dmesg? I see now that I've arranged for oops
to crash, since I don't like to miss them myself; but it is a very
clean oops, no locks held, so can just kill the process and continue.

I recommend CONFIG_DEBUG_VM=y (for developers, not for distros), but if
you'd prefer to avoid it for now, just edit that VM_BUG_ON_PAGE() in
find_lock_entry() to a BUG_ON().

Or is there something more mysterious stopping it from showing up for
you? It's repeatable for me. When not crashing, that "cp" should fill
up about half of RAM before it hits the implicit tmpfs volume limit;
but I am assuming a not entirely fragmented machine - it does need
to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().
If you still can't see the crash, look to see how long /dev/shm/zero is
after the "cp": mine crashes a page or two over 2MB (I'm being vague
because I'm typing from the laptop I'd prefer not to reproduce it on
at the moment: I think it would be 1 page over, i_size not yet updated
for the page of index 0x201). But the xarray should by that stage have
been populated for two 2MB pages (by your "goto next" loop in
shmem_add_to_page_cache()).

> but I do see an RCU
> stall in find_get_entries() called from shmem_undo_range() when running
> 'cp' the second time, i.e. while truncating the /dev/shm/zero file.

When I stopped oops crashing, I did indeed hang on that second attempt:
no "RCU stall" seen, but I've probably missed the relevant config option.
I wouldn't like to predict what happens if find_get_entry() returns the
wrong page when that VM_BUG_ON_PAGE() is compiled out: very confusing.
If it's compiled in, but just killed the process and dmesg was missed,
then there's a page lock that is never unlocked, which will indeed hang
a subsequent truncate (if the xarray yields the same wrong page again),
though I don't know if that would amount to an RCU stall.

> Maybe I'm seeing the same bug as you, and maybe I'm seeing a different
> one. Do we have a shmem test suite somewhere?

Not as such. xfstests works on tmpfs, huge or not, but I'd have to
write up a few instructions, note one or two "-g auto" tests to patch
out since they take forever on tmpfs, and the few failures expected;
and update my snapshot of the tree to check that over first (I pulled
it last mid-May). I'd rather not get into that at present: a working
"cp" will be a great step forward, then I can easily run xfstests on
the fixed kernel.

>
> > I've spent a while on it, but better turn over to you, Matthew:
> > my guess is that xas_create_range() does not create the layout
> > you expect from it.
>
> I've dumped the XArray tree on my machine and it actually looks fine
> *except* that the pages pointed to are free! That indicates to me I
> screwed up somebody's reference count somewhere.

I don't actually know what a good xarray for two 2MB pages should look
like, since the best I can find seems to be a bad one!

Are you sure that those pages are free, rather than most of them tails
of one of the two compound pages involved? I think it's the same in your
rewrite of struct page: the compound_head field (lru.next), with its low
bit set, is how to recognize a tail page.

Hugh
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-23 20:36 UTC
To: Hugh Dickins
Cc: syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Jul 23, 2018 at 12:14:41PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> > >
> > > Please try something like
> > > mount -o remount,huge=always /dev/shm
> > > cp /dev/zero /dev/shm
> > >
> > > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > > but getting the page for offset 0x3c1 instead.
> >
> > Hmm. I don't see a crash while running that command,
>
> Thanks for looking.
>
> It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
> in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
> on this occasion? Or you don't think of an oops as a kernel crash,
> and didn't notice it in dmesg? I see now that I've arranged for oops
> to crash, since I don't like to miss them myself; but it is a very
> clean oops, no locks held, so can just kill the process and continue.

Usually I run with that turned on, but somehow in my recent messing
with my test system, that got turned off. Once I turned it back on,
it spots the bug instantly.

> Or is there something more mysterious stopping it from showing up for
> you? It's repeatable for me. When not crashing, that "cp" should fill
> up about half of RAM before it hits the implicit tmpfs volume limit;
> but I am assuming a not entirely fragmented machine - it does need
> to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().
I tried that too, before noticing that DEBUG_VM was off; raised my test
VM's memory from 2GB to 8GB.

> Are you sure that those pages are free, rather than most of them tails
> of one of the two compound pages involved? I think it's the same in your
> rewrite of struct page: the compound_head field (lru.next), with its low
> bit set, is how to recognize a tail page.

Yes, PageTail was set, and so was TAIL_MAPPING (0xdead000000000400).
What was going on was the first 2MB page was being stored at indices
0-511, then the second 2MB page was being stored at indices 64-575
instead of 512-1023.

I figured out a fix and pushed it to the 'ida' branch in
git://git.infradead.org/users/willy/linux-dax.git

It won't be in linux-next tomorrow because the nvdimm people have
just dumped a pile of patches into their tree that conflict with
the XArray-DAX rewrite, so Stephen has pulled the XArray tree out
of linux-next temporarily. I didn't have time to sort out the merge
conflict today because I judged your bug report more important.
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-23 22:42 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> On Mon, Jul 23, 2018 at 12:14:41PM -0700, Hugh Dickins wrote:
> > On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > > On Sun, Jul 22, 2018 at 07:28:01PM -0700, Hugh Dickins wrote:
> > > > Whether or not that fixed syzbot's kernel BUG at mm/shmem.c:815!
> > > > I don't know, but I'm afraid it has not fixed linux-next breakage of
> > > > huge tmpfs: I get a similar page_to_pgoff BUG at mm/filemap.c:1466!
> > > >
> > > > Please try something like
> > > > mount -o remount,huge=always /dev/shm
> > > > cp /dev/zero /dev/shm
> > > >
> > > > Writing soon crashes in find_lock_entry(), looking up offset 0x201
> > > > but getting the page for offset 0x3c1 instead.
> > >
> > > Hmm. I don't see a crash while running that command,
> >
> > Thanks for looking.
> >
> > It is the VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page)
> > in find_lock_entry(). Perhaps you didn't have CONFIG_DEBUG_VM=y
> > on this occasion? Or you don't think of an oops as a kernel crash,
> > and didn't notice it in dmesg? I see now that I've arranged for oops
> > to crash, since I don't like to miss them myself; but it is a very
> > clean oops, no locks held, so can just kill the process and continue.
>
> Usually I run with that turned on, but somehow in my recent messing
> with my test system, that got turned off. Once I turned it back on,
> it spots the bug instantly.
>
> > Or is there something more mysterious stopping it from showing up for
> > you? It's repeatable for me. When not crashing, that "cp" should fill
> > up about half of RAM before it hits the implicit tmpfs volume limit;
> > but I am assuming a not entirely fragmented machine - it does need
> > to allocate two 2MB pages before hitting the VM_BUG_ON_PAGE().
>
> I tried that too, before noticing that DEBUG_VM was off; raised my test
> VM's memory from 2GB to 8GB.
>
> > Are you sure that those pages are free, rather than most of them tails
> > of one of the two compound pages involved? I think it's the same in your
> > rewrite of struct page: the compound_head field (lru.next), with its low
> > bit set, is how to recognize a tail page.
>
> Yes, PageTail was set, and so was TAIL_MAPPING (0xdead000000000400).
> What was going on was the first 2MB page was being stored at indices
> 0-511, then the second 2MB page was being stored at indices 64-575
> instead of 512-1023.
>
> I figured out a fix and pushed it to the 'ida' branch in
> git://git.infradead.org/users/willy/linux-dax.git

Great, thanks a lot for sorting that out so quickly. But I've cloned
the tree and don't see today's patch, so assume you've folded the fix
into an existing commit? If possible, please append the diff of today's
fix to this thread so that we can try it out. Or if that's difficult,
please at least tell which files were modified, then I can probably
work it out from the diff of those files against mmotm.

Thanks,
Hugh

>
> It won't be in linux-next tomorrow because the nvdimm people have
> just dumped a pile of patches into their tree that conflict with
> the XArray-DAX rewrite, so Stephen has pulled the XArray tree out
> of linux-next temporarily. I didn't have time to sort out the merge
> conflict today because I judged your bug report more important.
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-23 22:54 UTC
To: Hugh Dickins
Cc: syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, Jul 23, 2018 at 03:42:22PM -0700, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > I figured out a fix and pushed it to the 'ida' branch in
> > git://git.infradead.org/users/willy/linux-dax.git
>
> Great, thanks a lot for sorting that out so quickly. But I've cloned
> the tree and don't see today's patch, so assume you've folded the fix
> into an existing commit? If possible, please append the diff of today's
> fix to this thread so that we can try it out. Or if that's difficult,
> please at least tell which files were modified, then I can probably
> work it out from the diff of those files against mmotm.

Sure! It's just this:

diff --git a/lib/xarray.c b/lib/xarray.c
index 32a9c2a6a9e9..383c410997eb 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
 	unsigned char sibs = xas->xa_sibs;
 
 	xas->xa_index |= ((sibs + 1) << shift) - 1;
+	if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
+		xas->xa_offset |= sibs;
 	xas->xa_shift = 0;
 	xas->xa_sibs = 0;
 

The only other things changed are the test suite, and removing an
unnecessary change, so they can be ignored:

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 8a67d4bb1788..ec06c3ca19e9 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -695,19 +695,20 @@ static noinline void check_move(struct xarray *xa)
 		check_move_small(xa, (1UL << i) - 1);
 }
 
-static noinline void check_create_range_1(struct xarray *xa,
+static noinline void xa_store_many_order(struct xarray *xa,
 		unsigned long index, unsigned order)
 {
 	XA_STATE_ORDER(xas, xa, index, order);
-	unsigned int i;
+	unsigned int i = 0;
 
 	do {
 		xas_lock(&xas);
+		XA_BUG_ON(xa, xas_find_conflict(&xas));
 		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
 		for (i = 0; i < (1U << order); i++) {
-			xas_store(&xas, xa + i);
+			XA_BUG_ON(xa, xas_store(&xas, xa_mk_value(index + i)));
 			xas_next(&xas);
 		}
 unlock:
@@ -715,7 +716,29 @@ static noinline void check_create_range_1(struct xarray *xa,
 	} while (xas_nomem(&xas, GFP_KERNEL));
 
 	XA_BUG_ON(xa, xas_error(&xas));
-	xa_destroy(xa);
+}
+
+static noinline void check_create_range_1(struct xarray *xa,
+		unsigned long index, unsigned order)
+{
+	unsigned long i;
+
+	xa_store_many_order(xa, index, order);
+	for (i = index; i < index + (1UL << order); i++)
+		xa_erase_value(xa, i);
+	XA_BUG_ON(xa, !xa_empty(xa));
+}
+
+static noinline void check_create_range_2(struct xarray *xa, unsigned order)
+{
+	unsigned long i;
+	unsigned long nr = 1UL << order;
+
+	for (i = 0; i < nr * nr; i += nr)
+		xa_store_many_order(xa, i, order);
+	for (i = 0; i < nr * nr; i++)
+		xa_erase_value(xa, i);
+	XA_BUG_ON(xa, !xa_empty(xa));
 }
 
 static noinline void check_create_range(struct xarray *xa)
@@ -729,6 +752,8 @@ static noinline void check_create_range(struct xarray *xa)
 		check_create_range_1(xa, 2U << order, order);
 		check_create_range_1(xa, 3U << order, order);
 		check_create_range_1(xa, 1U << 24, order);
+		if (order < 10)
+			check_create_range_2(xa, order);
 	}
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index af2d7fa05af7..3ac507803787 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -589,8 +589,8 @@ static int shmem_add_to_page_cache(struct page *page,
 	VM_BUG_ON(expected && PageTransHuge(page));
 
 	page_ref_add(page, nr);
-	page->index = index;
 	page->mapping = mapping;
+	page->index = index;
 
 	do {
 		void *entry;
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-24 9:12 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> On Mon, Jul 23, 2018 at 03:42:22PM -0700, Hugh Dickins wrote:
> > On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > > I figured out a fix and pushed it to the 'ida' branch in
> > > git://git.infradead.org/users/willy/linux-dax.git
> >
> > Great, thanks a lot for sorting that out so quickly. But I've cloned
> > the tree and don't see today's patch, so assume you've folded the fix
> > into an existing commit? If possible, please append the diff of today's
> > fix to this thread so that we can try it out. Or if that's difficult,
> > please at least tell which files were modified, then I can probably
> > work it out from the diff of those files against mmotm.
>
> Sure! It's just this:
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 32a9c2a6a9e9..383c410997eb 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
>  	unsigned char sibs = xas->xa_sibs;
>
>  	xas->xa_index |= ((sibs + 1) << shift) - 1;
> +	if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
> +		xas->xa_offset |= sibs;
>  	xas->xa_shift = 0;
>  	xas->xa_sibs = 0;

Yes, that's a big improvement, the huge "cp" is now fine, thank you.

I've updated my xfstests tree, and tried that on mmotm with this patch.
The few failures are exactly the same as on 4.18-rc6, whether mounting
tmpfs as huge or not. But four of the tests, generic/{340,345,346,354}
crash (oops) on 4.18-rc5-mm1 + your patch above, but pass on 4.18-rc6.
That was simply with non-huge tmpfs: I just patched them out and didn't
try to see whether they crash with huge tmpfs too: probably they do, but
that won't be very interesting until the non-huge crashes are fixed.
I paid no attention to where the crashes were, I was just pressing on
to skip the problem tests to get as full a run as possible, with that
list of what's problematic and needs further investigation.

To test non-huge tmpfs (as root), I wrap xfstests' check script as
follows (you'll want to mkdir or substitute somewhere else for /xft):

export FSTYP=tmpfs
export DISABLE_UDF_TEST=1
export TEST_DEV=tmpfs1:
export TEST_DIR=/xft
export SCRATCH_DEV=tmpfs2:
export SCRATCH_MNT=/mnt
mount -t $FSTYP -o size=1088M $TEST_DEV $TEST_DIR || exit $?
./check "$@"		# typically "-g auto"
umount /xft /mnt 2>/dev/null

But don't bother with "-g auto" for the moment: I have workarounds in
for a few of them, generic/{027,213,449}, which we need not get into
right now (without them, two of those tests can take close to forever).

To test huge tmpfs (as root), I wrap xfstests' check script as:

export FSTYP=tmpfs
export DISABLE_UDF_TEST=1
export TEST_DEV=tmpfs1:
export TEST_DIR=/xft
export SCRATCH_DEV=tmpfs2:
export SCRATCH_MNT=/mnt
export TMPFS_MOUNT_OPTIONS="-o size=1088M,huge=always"
mount -t $FSTYP $TMPFS_MOUNT_OPTIONS $TEST_DEV $TEST_DIR || exit $?
./check "$@"		# typically "-g auto"
umount /xft /mnt 2>/dev/null

Hugh
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-26 6:53 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Tue, 24 Jul 2018, Hugh Dickins wrote:
> On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > On Mon, Jul 23, 2018 at 03:42:22PM -0700, Hugh Dickins wrote:
> > > On Mon, 23 Jul 2018, Matthew Wilcox wrote:
> > > > I figured out a fix and pushed it to the 'ida' branch in
> > > > git://git.infradead.org/users/willy/linux-dax.git
> > >
> > > Great, thanks a lot for sorting that out so quickly. But I've cloned
> > > the tree and don't see today's patch, so assume you've folded the fix
> > > into an existing commit? If possible, please append the diff of today's
> > > fix to this thread so that we can try it out. Or if that's difficult,
> > > please at least tell which files were modified, then I can probably
> > > work it out from the diff of those files against mmotm.
> >
> > Sure! It's just this:
> >
> > diff --git a/lib/xarray.c b/lib/xarray.c
> > index 32a9c2a6a9e9..383c410997eb 100644
> > --- a/lib/xarray.c
> > +++ b/lib/xarray.c
> > @@ -660,6 +660,8 @@ void xas_create_range(struct xa_state *xas)
> >  	unsigned char sibs = xas->xa_sibs;
> >
> >  	xas->xa_index |= ((sibs + 1) << shift) - 1;
> > +	if (!xas_top(xas->xa_node) && xas->xa_node->shift == xas->xa_shift)
> > +		xas->xa_offset |= sibs;
> >  	xas->xa_shift = 0;
> >  	xas->xa_sibs = 0;
>
> Yes, that's a big improvement, the huge "cp" is now fine, thank you.
>
> I've updated my xfstests tree, and tried that on mmotm with this patch.
> The few failures are exactly the same as on 4.18-rc6, whether mounting
> tmpfs as huge or not. But four of the tests, generic/{340,345,346,354}
> crash (oops) on 4.18-rc5-mm1 + your patch above, but pass on 4.18-rc6.
Now I've learnt that an oops on 0xffffffffffffffbe points to EEXIST,
not to EREMOTE, it's easy: patch below fixes those four xfstests
(and no doubt a similar oops I've seen occasionally under swapping
load): so gives clean xfstests runs for non-huge and huge tmpfs.

I can reproduce a kernel BUG at mm/khugepaged.c:1358! - that's the
VM_BUG_ON(index != xas.xa_index) in collapse_shmem() - but it will
take too long to describe how to reproduce that one, so I'm running it
past you just in case you have a quick idea on it, otherwise I'll try
harder. I did just try an xas_set(&xas, index) before the loop, in case
the xas_create_range(&xas) had interfered with initial state; but if
that made any difference at all, it only delayed the crash.

Hugh

--- mmotm/mm/shmem.c	2018-07-20 17:54:42.002805461 -0700
+++ linux/mm/shmem.c	2018-07-25 23:32:39.170892551 -0700
@@ -597,8 +597,10 @@ static int shmem_add_to_page_cache(struc
 		void *entry;
 		xas_lock_irq(&xas);
 		entry = xas_find_conflict(&xas);
-		if (entry != expected)
+		if (entry != expected) {
 			xas_set_err(&xas, -EEXIST);
+			goto unlock;
+		}
 		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-26 14:33 UTC
To: Hugh Dickins
Cc: syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
> Now I've learnt that an oops on 0xffffffffffffffbe points to EEXIST,
> not to EREMOTE, it's easy: patch below fixes those four xfstests
> (and no doubt a similar oops I've seen occasionally under swapping
> load): so gives clean xfstests runs for non-huge and huge tmpfs.

Excellent! I'm adding this:

+++ b/lib/test_xarray.c
@@ -741,6 +741,13 @@ static noinline void check_create_range_2(struct xarray *xa, unsigned order)
 	XA_BUG_ON(xa, !xa_empty(xa));
 }
 
+static noinline void check_create_range_3(void)
+{
+	XA_STATE(xas, NULL, 0);
+	xas_set_err(&xas, -EEXIST);
+	xas_create_range(&xas);
+}
+
 static noinline void check_create_range(struct xarray *xa)
 {
 	unsigned int order;
@@ -755,6 +762,8 @@ static noinline void check_create_range(struct xarray *xa)
 		if (order < 10)
 			check_create_range_2(xa, order);
 	}
+
+	check_create_range_3();
 }
 
 static LIST_HEAD(shadow_nodes);

and fixing the bug differently ;-) But many thanks for spotting it!

I'll look into the next bug you reported ...
* Re: kernel BUG at mm/shmem.c:LINE!
From: Hugh Dickins @ 2018-07-26 16:40 UTC
To: Matthew Wilcox
Cc: Hugh Dickins, syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Thu, 26 Jul 2018, Matthew Wilcox wrote:
> On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
>
> and fixing the bug differently ;-) But many thanks for spotting it!

I thought you might :)

>
> I'll look into the next bug you reported ...

No need: that idea now works a lot better when I use the initialized
"start", instead of the uninitialized "index".

Hugh

--- mmotm/mm/khugepaged.c	2018-07-20 17:54:41.978805312 -0700
+++ linux/mm/khugepaged.c	2018-07-26 09:20:22.416949014 -0700
@@ -1352,6 +1352,7 @@ static void collapse_shmem(struct mm_str
 			goto out;
 	} while (1);
 
+	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
 		struct page *page = xas_next(&xas);
* Re: kernel BUG at mm/shmem.c:LINE!
From: Matthew Wilcox @ 2018-07-26 19:32 UTC
To: Hugh Dickins
Cc: syzbot, Kirill A. Shutemov, Andrew Morton, linux-kernel, linux-mm, syzkaller-bugs

On Thu, Jul 26, 2018 at 09:40:20AM -0700, Hugh Dickins wrote:
> On Thu, 26 Jul 2018, Matthew Wilcox wrote:
> > On Wed, Jul 25, 2018 at 11:53:15PM -0700, Hugh Dickins wrote:
> >
> > and fixing the bug differently ;-) But many thanks for spotting it!
>
> I thought you might :)

The xas_* functions are all _expected_ to behave the same way when passed
an XA_STATE containing an error -- do nothing. xas_create_range() behaved
that way initially, then I fixed a bug and broke that invariant. Now the
test suite checks it so I won't break it again.

> > I'll look into the next bug you reported ...
>
> No need: that idea now works a lot better when I use the initialized
> "start", instead of the uninitialized "index".

Ugh. xas_create_range() is _supposed_ to return with xas pointing to the
first index in the range. I wonder what I messed up. I've had a go at
producing a test-case for this and haven't provoked a bug yet.

Still, I don't want to keep xas_create_range() around long-term. I want
to transition all the places that currently use it to use multi-index
entries. So I'm going to put your workaround in and then work on deleting
xas_create_range() altogether.

Thanks so much for all your work on this!
Thread overview: 14+ messages
2018-07-07  1:19 kernel BUG at mm/shmem.c:LINE! syzbot
2018-07-07  2:57 ` Matthew Wilcox
2018-07-09 14:36 ` Matthew Wilcox
2018-07-23  2:28 ` Hugh Dickins
2018-07-23 14:01 ` Matthew Wilcox
2018-07-23 19:14 ` Hugh Dickins
2018-07-23 20:36 ` Matthew Wilcox
2018-07-23 22:42 ` Hugh Dickins
2018-07-23 22:54 ` Matthew Wilcox
2018-07-24  9:12 ` Hugh Dickins
2018-07-26  6:53 ` Hugh Dickins
2018-07-26 14:33 ` Matthew Wilcox
2018-07-26 16:40 ` Hugh Dickins
2018-07-26 19:32 ` Matthew Wilcox