Re: [syzbot] kernel BUG in free_huge_page

From: Mike Kravetz <mike.kravetz@oracle.com>
To: syzbot <syzbot+83cc82a0254bc0c17b52@syzkaller.appspotmail.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, llvm@lists.linux.dev, muchun.song@linux.dev,
	nathan@kernel.org, ndesaulniers@google.com,
	syzkaller-bugs@googlegroups.com, trix@redhat.com,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Subject: Re: [syzbot] kernel BUG in free_huge_page
Date: Thu, 26 Jan 2023 15:42:29 -0800
Message-ID: <Y9MP5fmQ28nceDjx@monkey>
In-Reply-To: <000000000000a9dc0705f3105a26@google.com>

On 01/24/23 22:00, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    691781f561e9 Add linux-next specific files for 20230123
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1393d0ac480000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=804cddf7ddbc6c64
> dashboard link: https://syzkaller.appspot.com/bug?extid=83cc82a0254bc0c17b52
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
<snip>
> ------------[ cut here ]------------
> kernel BUG at mm/hugetlb.c:1865!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 8927 Comm: syz-executor.5 Not tainted 6.2.0-rc5-next-20230123-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
> RIP: 0010:free_huge_page+0xa5b/0xe80 mm/hugetlb.c:1865
> Code: 0f 0b e8 08 98 b7 ff 4c 89 e7 e8 00 3e f7 ff 89 c3 e9 a0 f9 ff ff e8 f4 97 b7 ff 48 c7 c6 a0 6f 59 8a 4c 89 e7 e8 55 8b ef ff <0f> 0b e8 de 97 b7 ff 48 8d 7b 17 48 b8 00 00 00 00 00 fc ff df 4c
> RSP: 0018:ffffc9000557f908 EFLAGS: 00010246
> RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc900062ea000
> RDX: 0000000000040000 RSI: ffffffff81ca564b RDI: 0000000000000000
> RBP: ffffffff91c45bf8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 1ffffffff21798de R12: ffffea0002580000
> R13: ffffea0002580090 R14: 0000000000000000 R15: ffffea0002580034
> FS:  00007f42bf3be700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2349dabf84 CR3: 0000000021513000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __folio_put_large mm/swap.c:119 [inline]
>  __folio_put+0x109/0x140 mm/swap.c:127
>  folio_put include/linux/mm.h:1203 [inline]
>  put_page+0x21b/0x280 include/linux/mm.h:1272
>  hugetlb_fault+0x153e/0x23f0 mm/hugetlb.c:6130
>  follow_hugetlb_page+0x6ab/0x1e40 mm/hugetlb.c:6524
>  __get_user_pages+0x29b/0xfc0 mm/gup.c:1125
>  populate_vma_page_range+0x241/0x320 mm/gup.c:1526
>  __mm_populate+0x105/0x3b0 mm/gup.c:1640
>  do_mlock+0x370/0x6d0 mm/mlock.c:608
>  __do_sys_mlock mm/mlock.c:616 [inline]
>  __se_sys_mlock mm/mlock.c:614 [inline]
>  __x64_sys_mlock+0x59/0x80 mm/mlock.c:614
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f42be68c0c9

I believe this has the same root cause as the problem reported here:

https://lore.kernel.org/linux-mm/20230124162346.404985e8@thinkpad/

With the code segment,
[...]
>  	page = pte_page(entry);
> -	if (page != pagecache_page)
> +	if (page_folio(page) != pagecache_folio)
>  		if (!trylock_page(page)) {
>  			need_wait_lock = 1;
>  			goto out_ptl;
>  		}
>  
> -	get_page(page);
> +	folio_get(pagecache_folio);
>  

In Gerald's case, pagecache_folio == NULL, so folio_get() is called on a
NULL pointer.

In addition, note that page_folio(page) and pagecache_folio can refer to
two different folios.  We already hold a ref on pagecache_folio from the
earlier 'pagecache_folio = filemap_lock_folio()' call.  The code above
incorrectly takes another ref on pagecache_folio instead of on page.  So,
page is short one ref but still mapped.  At the end of hugetlb_fault(), we
do a put_page(page).  Since we never took a ref on page, its ref count
drops to zero and we end up calling free_huge_page() while the page is
still mapped.  That triggers the BUG. :(
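
To make the ref count imbalance concrete, here is a minimal userspace
sketch.  It is not kernel code: struct toy_folio, toy_get(), toy_put() and
toy_fault() are made-up names, and the starting ref counts are assumptions
chosen only for illustration.  It models just the tail of the fault path
described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a folio; only what the illustration needs. */
struct toy_folio {
	int refcount;
	bool mapped;             /* still referenced by a page table entry */
	bool freed_while_mapped; /* set where the kernel would hit the BUG */
};

static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	/* Dropping the last ref "frees" the folio. */
	if (--f->refcount == 0 && f->mapped)
		f->freed_while_mapped = true;
}

/*
 * Model the end of the fault path.  With buggy == true the extra ref is
 * taken on the page cache folio; otherwise it is taken on the folio that
 * the pte maps.  Returns true if the mapped folio would be freed while
 * it is still mapped.
 */
static bool toy_fault(bool buggy)
{
	/* Assumed starting state: one ref held on behalf of the mapping. */
	struct toy_folio mapped_folio = { .refcount = 1, .mapped = true };
	/* Assumed: page cache ref plus the ref from the lookup. */
	struct toy_folio pagecache_folio = { .refcount = 2 };

	if (buggy)
		toy_get(&pagecache_folio); /* wrong object gets the ref */
	else
		toy_get(&mapped_folio);

	/* The put_page(page) at the end of the fault path. */
	toy_put(&mapped_folio);

	return mapped_folio.freed_while_mapped;
}
```

Calling toy_fault(true) reports a free of a still-mapped folio, while
toy_fault(false) leaves the mapped folio with its original ref intact.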

This looks to be fixed in v2 of the patch.
-- 
Mike Kravetz
