From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 455D12F0673
	for <linux-kernel@vger.kernel.org>; Thu, 21 May 2026 08:48:11 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779353293; cv=none; b=OD03U5V9MTmJUC7paKW/8t8cpiztdlIBIHYI6bz+lMREJFkJcx24GD21grL1yefPhzjL0q8asaQ+RkiJihZRF6Bvl/hvAa+DPFIbxJcE9TMlr2O/8ziggHiRV5Vlr8TVVb5d0siOxO3YaEuTdfjQ2n7Bnb07aj9fOT5KT7Pvggc=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1779353293; c=relaxed/simple;
	bh=3M8RA0zimcmSLmZMrdOSifHlfLO2K2P9UKmP4flf84g=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=bytZxL54jMwFUtRE89E18qbIBfQjcGmoQd9g9P3DEqLz7uIpWGp7od7+zvuEP0n7c+YmsoTwv/KVtKvz6UoxMKmDKSrlOs7EpXxEMGIBDmWKPQmQfTbCbL5xkBoA1A9qWeNj1qmfQ/V+/c8/7F6nzP3f6pV4Apyov4MOrLUxWJ0=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=cOpbSrFz; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cOpbSrFz"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C53151F00A3B;
	Thu, 21 May 2026 08:48:07 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1779353291;
	bh=pNVsGGEw95+5JljDAToyIzMIwooJ11i63IvGijfxMd0=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To;
	b=cOpbSrFzsblkU4j9n0Wp5hApJp6HHtqth4BLp5mqQylFoFVpBQLyizxeuFSskWeFa
	 unbfwWF+CMHyHZhuzLn07kjn/Kc14bQKGonAKQhRQlNZO0+zt0sCYryD0vA8D1KDF4
	 ZyhOtsb169teI+Iz9hwTMqpp/q+/RP7snvnPdV7b20VLoVUV7Y1IkQOwPEk29Ck9WZ
	 RghNDZcyHkWWTdEbhavRfM2fd/DTjZIxOxrizTIZD7iF60xEwLYOw7cJ7iRZyJti/l
	 Gt06s5FGglq3UBqraRava0PtZhQFOVQF2Z/Qqg3MA19wwLVy4DPWekWmLw9kxcn8Jq
	 lcXVUxWmCYYuA==
Date: Thu, 21 May 2026 10:48:04 +0200
From: "Oscar Salvador (SUSE)" <osalvador@kernel.org>
To: mawupeng <mawupeng1@huawei.com>
Cc: muchun.song@linux.dev, osalvador@suse.de, david@kernel.org,
	akpm@linux-foundation.org, ljs@kernel.org, Liam.Howlett@oracle.com,
	vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, linmiaohe@huawei.com, nao.horiguchi@gmail.com,
	mike.kravetz@oracle.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory-failure: fix hugetlb_lock AA deadlock in
 get_huge_page_for_hwpoison
Message-ID: <ag7GxM02D92LUrLd@localhost.localdomain>
References: <20260520020128.3506168-1-mawupeng1@huawei.com>
 <ag1tNtzmjNlrj4Xm@localhost.localdomain>
 <4ffe8dd7-86b6-42aa-a979-a9ae941e068e@huawei.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4ffe8dd7-86b6-42aa-a979-a9ae941e068e@huawei.com>

On Wed, May 20, 2026 at 07:24:28PM +0800, mawupeng wrote:
 
> You are correct. The refcount dropping logic in the `unmap` path was indeed flawed. 
> This issue was originally uncovered by fuzzing. Based on the initial stack trace, 
> we diagnosed it as a recursive locking (AA) deadlock on `hugetlb_lock`.
> 
> We initially suspected that `unmap` had prematurely released the folio reference 
> count, triggering the free path. However, after a thorough analysis of the refcount 
> state machine and the actual execution context, we confirmed that this hypothesis 
> is impossible. The root cause lies elsewhere in the locking hierarchy, and we are 
> currently tracing the exact call path that leads to the nested `hugetlb_lock` 
> acquisition.
> 
> The deadlock can be triggered by injecting hardware poison errors on a hugetlb
> page while concurrent unmapping activity occurs. The following minimal userspace
> test case demonstrates the race condition by spawning multiple processes to
> widen the timing window for the lock contention.


After staring at it, it is obvious the code is wrong.
We __should__ not be calling folio_put under the lock, as recursion will
happen if we are the last user holding a reference.
I cannot think of a case where we would need nesting here.

Anyway, this is a genuine bug, so thanks for that, but it all got very
confusing because of the traces pointing to the wrong places.
The thing is quite simple:

- We start with the assumption that a hugetlb folio is mapped to
  userspace and that madvise(MADV_HWPOISON) is called on it twice:

 thread#0                                     thread#1
  madvise(folio, MADV_HWPOISON) (we poisoned the page)
  madvise(folio, MADV_HWPOISON) (second call)
                                              unmap(folio)
   try_memory_failure_hugetlb
    get_huge_page_for_hwpoison (takes lock)
     __get_huge_page_for_hwpoison
       hugetlb_update_hwpoison
        - we get MF_HUGETLB_FOLIO_PRE_POISONED
          we jump to out which does
          folio_put
           free_huge_page      (takes lock.. yikes)


So yes, the fix is to call folio_put outside of the lock.
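
To make the ordering concrete, here is a toy userspace model of the
pattern, not the kernel code. Every name in it (toy_folio, toy_lock,
toy_handle_pre_poisoned, ...) is made up for illustration, and the "lock"
is just a flag that records a second acquisition instead of spinning
forever. It assumes thread#1 has already dropped the mapping reference,
so the poison path holds the last one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none of this is kernel API. */
struct toy_folio {
	int refcount;
	bool poisoned;
	bool freed;
};

static bool toy_lock_held;	/* stands in for hugetlb_lock */
static bool toy_aa_deadlock;	/* set when the lock is taken while held */

static void toy_lock(void)
{
	if (toy_lock_held)
		toy_aa_deadlock = true;	/* a real spinlock would never return */
	toy_lock_held = true;
}

static void toy_unlock(void)
{
	toy_lock_held = false;
}

/* The free path takes the lock, as in the trace above. */
static void toy_free_huge_folio(struct toy_folio *f)
{
	toy_lock();
	f->freed = true;
	toy_unlock();
}

/* Dropping the last reference ends up in the free path. */
static void toy_folio_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		toy_free_huge_folio(f);
}

/*
 * Second poison attempt on a folio. f->refcount includes the reference
 * held by this path. put_under_lock == true models the buggy ordering,
 * false models the ordering with the put moved after the unlock.
 */
static void toy_handle_pre_poisoned(struct toy_folio *f, bool put_under_lock)
{
	bool pre_poisoned;

	toy_lock();
	pre_poisoned = f->poisoned;
	if (pre_poisoned && put_under_lock)
		toy_folio_put(f);	/* last ref: free path relocks */
	toy_unlock();

	if (pre_poisoned && !put_under_lock)
		toy_folio_put(f);	/* lock already released: safe */
}
```

Calling toy_handle_pre_poisoned() on an already poisoned folio with
refcount == 1 sets toy_aa_deadlock when put_under_lock is true, and frees
the folio cleanly when it is false.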

Please, send the patch with the right changelog (and no version) and I will ack it.


-- 
Oscar Salvador
SUSE Labs