Date: Thu, 21 May 2026 10:48:04 +0200
From: "Oscar Salvador (SUSE)"
To: mawupeng
Cc: muchun.song@linux.dev, osalvador@suse.de, david@kernel.org, akpm@linux-foundation.org, ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org, surenb@google.com, mhocko@suse.com, linmiaohe@huawei.com, nao.horiguchi@gmail.com, mike.kravetz@oracle.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
References: <20260520020128.3506168-1-mawupeng1@huawei.com> <4ffe8dd7-86b6-42aa-a979-a9ae941e068e@huawei.com>
In-Reply-To: <4ffe8dd7-86b6-42aa-a979-a9ae941e068e@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 20, 2026 at 07:24:28PM +0800, mawupeng wrote:
> You are correct. The refcount dropping logic in the `unmap` path was indeed flawed.
> This issue was originally uncovered by fuzzing. Based on the initial stack trace,
> we diagnosed it as a recursive locking (AA) deadlock on `hugetlb_lock`.
>
> We initially suspected that `unmap` had prematurely released the folio reference
> count, triggering the free path. However, after a thorough analysis of the refcount
> state machine and the actual execution context, we confirmed that this hypothesis
> is impossible. The root cause lies elsewhere in the locking hierarchy, and we are
> currently tracing the exact call path that leads to the nested `hugetlb_lock`
> acquisition.
>
> The deadlock can be triggered by injecting hardware poison errors on a hugetlb
> page while concurrent unmapping activity occurs.
> The following minimal userspace
> test case demonstrates the race condition by spawning multiple processes to
> widen the timing window for the lock contention.

After staring at it, I think it is obvious that the code is wrong.
We __should__ not be calling folio_put under the lock, as recursion will
happen if we are the last user holding a reference: free_huge_page takes
hugetlb_lock again.
Thinking about it, I cannot think of a case where we would need nesting here.

Anyway, this is a genuine bug, so thanks for that, but it all got very
confusing because the traces pointed to the wrong places.

The thing is quite simple:

- We start with the assumption that a hugetlb folio is mapped to userspace,
  and that thread#0 calls madvise(MADV_HWPOISON) on it twice while thread#1
  unmaps it:

  thread#0                                      thread#1
  madvise(folio, MADV_HWPOISON)
   (we poisoned the page)
  madvise(folio, MADV_HWPOISON)
   (second call)                                unmap(folio)
   try_memory_failure_hugetlb
    get_huge_page_for_hwpoison (takes lock)
     __get_huge_page_for_hwpoison
      hugetlb_update_hwpoison
       - we get MF_HUGETLB_FOLIO_PRE_POISONED
       we jump to out, which does folio_put
        free_huge_page (takes lock.. yaiks)

So yes, the fix is to do the folio_put outside the lock.

Please send the patch with the right changelog (and no version), and I will
ack it.

-- 
Oscar Salvador
SUSE Labs