Re: [PATCH 6.6.y] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()

From: Greg KH <gregkh@linuxfoundation.org>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org, Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 6.6.y] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
Date: Tue, 30 Apr 2024 10:19:38 +0200
Message-ID: <2024043020-dropper-create-2cf5@gregkh>
In-Reply-To: <20240430074331.2500025-1-linmiaohe@huawei.com>

On Tue, Apr 30, 2024 at 03:43:31PM +0800, Miaohe Lin wrote:
> When I ran memory failure tests recently, the warning below occurred:
> 
> DEBUG_LOCKS_WARN_ON(1)
> WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
> Modules linked in: mce_inject hwpoison_inject
> CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> RIP: 0010:__lock_acquire+0xccb/0x1ca0
> RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
> RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
> RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
> RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
> R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
> R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
> FS:  00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
> Call Trace:
>  <TASK>
>  lock_acquire+0xbe/0x2d0
>  _raw_spin_lock_irqsave+0x3a/0x60
>  hugepage_subpool_put_pages.part.0+0xe/0xc0
>  free_huge_folio+0x253/0x3f0
>  dissolve_free_huge_page+0x147/0x210
>  __page_handle_poison+0x9/0x70
>  memory_failure+0x4e6/0x8c0
>  hard_offline_page_store+0x55/0xa0
>  kernfs_fop_write_iter+0x12c/0x1d0
>  vfs_write+0x380/0x540
>  ksys_write+0x64/0xe0
>  do_syscall_64+0xbc/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7ff9f3114887
> RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
> RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
> RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
> R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
>  </TASK>
> Kernel panic - not syncing: kernel: panic_on_warn set ...
> CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  panic+0x326/0x350
>  check_panic_on_warn+0x4f/0x50
>  __warn+0x98/0x190
>  report_bug+0x18e/0x1a0
>  handle_bug+0x3d/0x70
>  exc_invalid_op+0x18/0x70
>  asm_exc_invalid_op+0x1a/0x20
> RIP: 0010:__lock_acquire+0xccb/0x1ca0
> RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
> RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
> RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
> RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
> R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
> R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
>  lock_acquire+0xbe/0x2d0
>  _raw_spin_lock_irqsave+0x3a/0x60
>  hugepage_subpool_put_pages.part.0+0xe/0xc0
>  free_huge_folio+0x253/0x3f0
>  dissolve_free_huge_page+0x147/0x210
>  __page_handle_poison+0x9/0x70
>  memory_failure+0x4e6/0x8c0
>  hard_offline_page_store+0x55/0xa0
>  kernfs_fop_write_iter+0x12c/0x1d0
>  vfs_write+0x380/0x540
>  ksys_write+0x64/0xe0
>  do_syscall_64+0xbc/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7ff9f3114887
> RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
> RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
> RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
> R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
>  </TASK>
> 
> After git bisecting and digging into the code, I believe the root cause is
> that the _deferred_list field of the folio is unioned with the
> _hugetlb_subpool field.  In __update_and_free_hugetlb_folio(),
> folio->_deferred_list is initialized, which corrupts
> folio->_hugetlb_subpool when the folio is still hugetlb.  Later,
> free_huge_folio() uses the corrupted _hugetlb_subpool and the above
> warning is triggered.
> 
> But it is assumed that the hugetlb flag must have been cleared when
> calling folio_put() in update_and_free_hugetlb_folio().  This assumption
> is broken by the race below:
> 
> CPU1					CPU2
> dissolve_free_huge_page			update_and_free_pages_bulk
>  update_and_free_hugetlb_folio		 hugetlb_vmemmap_restore_folios
> 					  folio_clear_hugetlb_vmemmap_optimized
>   clear_flag = folio_test_hugetlb_vmemmap_optimized
>   if (clear_flag) <-- False, it's already cleared.
>    __folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
>   folio_put
>    free_huge_folio <-- free_the_page is expected.
> 					 list_for_each_entry()
> 					  __folio_clear_hugetlb <-- Too late.
> 
> Fix this issue by checking directly whether the folio is hugetlb, instead
> of checking clear_flag, to close the race window.
> 
> Link: https://lkml.kernel.org/r/20240419085819.1901645-1-linmiaohe@huawei.com
> Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> (cherry picked from commit 52ccdde16b6540abe43b6f8d8e1e1ec90b0983af)
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a17950160395..3a0f6b78f925 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1782,7 +1782,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>  	 * If vmemmap pages were allocated above, then we need to clear the
>  	 * hugetlb destructor under the hugetlb lock.
>  	 */
> -	if (clear_dtor) {
> +	if (folio_test_hugetlb(folio)) {
>  		spin_lock_irq(&hugetlb_lock);
>  		__clear_hugetlb_destructor(h, folio);
>  		spin_unlock_irq(&hugetlb_lock);
> -- 
> 2.33.0
> 
> 

Again, this breaks the build.  Did you not test it?  Always do so before
sending it out :(

thanks,

greg k-h

Thread overview: 4+ messages
2024-04-29 11:34 FAILED: patch "[PATCH] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when" failed to apply to 6.6-stable tree gregkh
2024-04-30  7:43 ` [PATCH 6.6.y] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio() Miaohe Lin
2024-04-30  8:19   ` Greg KH [this message]
2024-05-05  6:21 ` Miaohe Lin
