From: "Oscar Salvador (SUSE)" <osalvador@kernel.org>
To: mawupeng <mawupeng1@huawei.com>
Cc: muchun.song@linux.dev, osalvador@suse.de, david@kernel.org,
akpm@linux-foundation.org, ljs@kernel.org,
Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linmiaohe@huawei.com,
nao.horiguchi@gmail.com, mike.kravetz@oracle.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
Date: Thu, 21 May 2026 10:48:04 +0200
Message-ID: <ag7GxM02D92LUrLd@localhost.localdomain>
In-Reply-To: <4ffe8dd7-86b6-42aa-a979-a9ae941e068e@huawei.com>
On Wed, May 20, 2026 at 07:24:28PM +0800, mawupeng wrote:
> You are correct. The refcount dropping logic in the `unmap` path was indeed flawed.
> This issue was originally uncovered by fuzzing. Based on the initial stack trace,
> we diagnosed it as a recursive locking (AA) deadlock on `hugetlb_lock`.
>
> We initially suspected that `unmap` had prematurely released the folio reference
> count, triggering the free path. However, after a thorough analysis of the refcount
> state machine and the actual execution context, we confirmed that this hypothesis
> is impossible. The root cause lies elsewhere in the locking hierarchy, and we are
> currently tracing the exact call path that leads to the nested `hugetlb_lock`
> acquisition.
>
> The deadlock can be triggered by injecting hardware poison errors on a hugetlb
> page while concurrent unmapping activity occurs. The following minimal userspace
> test case demonstrates the race condition by spawning multiple processes to
> widen the timing window for the lock contention.
After staring at it, it is obvious the code is wrong.
We __should__ not be calling folio_put under the lock, as recursion will
happen if we are the last user holding a reference.
Thinking about it, I cannot think of a way we would need nesting here.
Anyway, this is a genuine bug, so thanks for that, but it all got very
confusing because the traces were pointing to the wrong places.
The thing is quite simple:
- We start with the assumption that a hugetlb folio is mapped to
userspace and that madvise
 thread#0                                   thread#1
 madvise(folio, MADV_HWPOISON)
  (we poisoned the page)
                                            madvise(folio, MADV_HWPOISON) (second call)
 unmap(folio)
                                            try_memory_failure_hugetlb
                                             get_huge_page_for_hwpoison (takes lock)
                                              __get_huge_page_for_hwpoison
                                               hugetlb_update_hwpoison
                                                - we get MF_HUGETLB_FOLIO_PRE_POISONED
                                               we jump to out which does
                                                folio_put
                                                 free_huge_page (takes lock.. yikes)
So yes, the fix is to do the folio_put outside of the lock.
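
To make the shape of the problem concrete, here is a rough userspace
sketch of the pattern. It is not the actual patch: every model_* name
and both put_* helpers are made up for illustration, and an
error-checking pthread mutex stands in for hugetlb_lock so that the
recursive acquisition reports EDEADLK instead of hanging the way a
spinlock would.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of this is kernel API. */
static pthread_mutex_t model_lock;   /* plays the role of hugetlb_lock */
static int model_free_err;           /* result of the last lock attempt in the free path */

struct model_folio {
	int refcount;
	bool freed;
};

static void model_lock_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	/* ERRORCHECK turns a recursive lock attempt into EDEADLK. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&model_lock, &attr);
	assert(ret == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Like free_huge_page in the trace: the free path takes the lock itself. */
static void model_free(struct model_folio *f)
{
	model_free_err = pthread_mutex_lock(&model_lock);
	if (model_free_err == 0) {
		f->freed = true;
		pthread_mutex_unlock(&model_lock);
	}
}

/* Drop one reference; the last reference triggers the free path. */
static void model_put(struct model_folio *f)
{
	if (--f->refcount == 0)
		model_free(f);
}

/* Buggy shape: the reference is dropped while the lock is held. */
static int put_under_lock(struct model_folio *f)
{
	model_free_err = 0;
	pthread_mutex_lock(&model_lock);
	model_put(f);                    /* last ref -> model_free -> relock */
	pthread_mutex_unlock(&model_lock);
	return model_free_err;
}

/* Fixed shape: only decide under the lock, drop the reference after unlock. */
static int put_after_unlock(struct model_folio *f)
{
	bool need_put;

	model_free_err = 0;
	pthread_mutex_lock(&model_lock);
	need_put = true;                 /* e.g. the already-poisoned case */
	pthread_mutex_unlock(&model_lock);
	if (need_put)
		model_put(f);
	return model_free_err;
}
```

After model_lock_init(), calling put_under_lock() on a folio with a
single reference hits the recursive acquisition and leaves the folio
unfreed, while put_after_unlock() on the same kind of folio frees it
cleanly.
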
Please send the patch with the right changelog (and no version) and I will ack it.
--
Oscar Salvador
SUSE Labs