From: Minchan Kim <minchan@kernel.org>
To: syzbot <syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com>
Cc: akpm@linux-foundation.org, andreyknvl@google.com,
	hannes@cmpxchg.org, khalid.aziz@oracle.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mhocko@suse.com, rppt@linux.ibm.com,
	syzkaller-bugs@googlegroups.com, torvalds@linux-foundation.org,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Subject: Re: general protection fault in madvise_cold_or_pageout_pte_range
Date: Tue, 15 Sep 2020 09:33:49 -0700
Message-ID: <20200915163349.GA2868856@google.com>
In-Reply-To: <00000000000002a86f05af42ab27@google.com>

On Mon, Sep 14, 2020 at 02:29:15AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    729e3d09 Merge tag 'ceph-for-5.9-rc5' of git://github.com/..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1482b99e900000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8f5c353182ed6199
> dashboard link: https://syzkaller.appspot.com/bug?extid=ecf80462cb7d5d552bc7
> compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16e2a255900000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=164afdb3900000
> 
> The issue was bisected to:
> 
> commit 1a4e58cce84ee88129d5d49c064bd2852b481357
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Wed Sep 25 23:49:15 2019 +0000
> 
>     mm: introduce MADV_PAGEOUT
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=127f973e900000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=117f973e900000
> console output: https://syzkaller.appspot.com/x/log.txt?x=167f973e900000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
> Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
> 
> general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
> CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
> Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
> RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
> RAX: 0000000000000003 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
> RBP: ffffc90004b9f9a8 R08: 0000000000000001 R09: 0000000000000000
> R10: fffffbfff131e2e6 R11: 0000000000000000 R12: ffff8880937161c0
> R13: 0000000000000018 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000002638880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000002100003f CR3: 00000000a49a2000 CR4: 00000000001506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
>  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>  spin_lock include/linux/spinlock.h:354 [inline]
>  madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
>  walk_pmd_range mm/pagewalk.c:89 [inline]
>  walk_pud_range mm/pagewalk.c:160 [inline]
>  walk_p4d_range mm/pagewalk.c:193 [inline]
>  walk_pgd_range mm/pagewalk.c:229 [inline]
>  __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
>  walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
>  madvise_pageout_page_range mm/madvise.c:521 [inline]
>  madvise_pageout mm/madvise.c:557 [inline]
>  madvise_vma mm/madvise.c:946 [inline]
>  do_madvise+0x12d0/0x2090 mm/madvise.c:1145
>  __do_sys_madvise mm/madvise.c:1171 [inline]
>  __se_sys_madvise mm/madvise.c:1169 [inline]
>  __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
>  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x4440e9
> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db d7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007ffed62d6668 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
> RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 00000000004440e9
> RDX: 0000000000000015 RSI: 0000000000600003 RDI: 0000000020000000
> RBP: 00000000006ce018 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000401d50
> R13: 0000000000401de0 R14: 0000000000000000 R15: 0000000000000000
> Modules linked in:
> ---[ end trace 0453ba4a30f03f10 ]---
> 


The backing VMA was shmem. Looking at the implementation of __split_huge_pmd,
it seems to zap the pmd when the VMA is not anonymous (!vma_is_anonymous(vma)).
For an anonymous VMA, by contrast, it remaps the huge page to PTEs.

commit d21b9e57c74c (HEAD)
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Tue Jul 26 15:25:37 2016 -0700

    thp: handle file pages in split_huge_pmd()

    Splitting THP PMD is simple: just unmap it as in DAX case.  This way we
    can avoid memory overhead on page table allocation to deposit.

    It's probably a good idea to try to allocation page table with
    GFP_ATOMIC in __split_huge_pmd_locked() to avoid refaulting the area,
    but clearing pmd should be good enough for now.

    Unlike DAX, we also remove the page from rmap and drop reference.
    pmd_young() is transfered to PageReferenced().

If so, we need to re-validate the pmd after splitting.
Cc'ing Kirill to double-check.

From 26e804a0723f92862aa1ee9cc2c9e5d4691cb11d Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 14 Sep 2020 23:32:15 -0700
Subject: [PATCH] mm: validate pmd after splitting

syzbot reported the following:

general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
RAX: 0000000000000003 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffffc90004b9f9a8 R08: 0000000000000001 R09: 0000000000000000
R10: fffffbfff131e2e6 R11: 0000000000000000 R12: ffff8880937161c0
R13: 0000000000000018 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000002638880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002100003f CR3: 00000000a49a2000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:354 [inline]
 madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
 walk_pmd_range mm/pagewalk.c:89 [inline]
 walk_pud_range mm/pagewalk.c:160 [inline]
 walk_p4d_range mm/pagewalk.c:193 [inline]
 walk_pgd_range mm/pagewalk.c:229 [inline]
 __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
 walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
 madvise_pageout_page_range mm/madvise.c:521 [inline]
 madvise_pageout mm/madvise.c:557 [inline]
 madvise_vma mm/madvise.c:946 [inline]
 do_madvise+0x12d0/0x2090 mm/madvise.c:1145
 __do_sys_madvise mm/madvise.c:1171 [inline]
 __se_sys_madvise mm/madvise.c:1169 [inline]
 __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

When a file-backed THP is split, the pmd is zapped instead of the sub-pages
being remapped to PTEs, so the pmd must be validated again after the split.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index d4aa5f776543..0e0d61003fc6 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -381,9 +381,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
+regular_page:
 	if (pmd_trans_unstable(pmd))
 		return 0;
-regular_page:
 #endif
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-- 
2.28.0.618.gf4bc123cb7-goog



Thread overview: 4+ messages
2020-09-14  9:29 general protection fault in madvise_cold_or_pageout_pte_range syzbot
2020-09-14 20:38 ` Minchan Kim
2020-09-15 16:33 ` Minchan Kim [this message]
2020-09-26  8:36   ` Kirill A. Shutemov
