From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59C7D38D401
	for ; Thu, 14 May 2026 08:58:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1778749120; cv=none;
	b=gkBP4o5jsJ3o5+Ly1e92x4D+dU6uYgtn2ArKYB1bMcKGH3CjQx538qxBNmpAOwzHPkWr9kfEBkXfFIP+1/9SywsiaklEk0+lPMBsl9UM4fy4hKF8zrRRiW3ck42CWv66ClS9XbuEda099s6QXSA15JKZMev+o19rxLhvEu9G2iA=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1778749120; c=relaxed/simple;
	bh=vMuOerukQKVUrXP/xJO3P4ndfjQjkdwxRRT4lFlTNbE=;
	h=From:To:Cc:Subject:Date:Message-ID:MIME-Version;
	b=f3v2q+wpLmLoA3uvNXQ66YO8hm8CTtzQaMX7Evp1X83eDLEO0edrd9UEmjh5NulVHtk/XAf0zB172vjcP7fLSFDpYnS/6zpjnHAYWoLXlJdalueRVJeRNsIOsDr/8sVxm1oio/WCV/jMroanZLwJ0szuVLV/Ij6bxP3SYeCXngI=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=linux.dev;
	spf=pass smtp.mailfrom=linux.dev;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=nVIpRwXF;
	arc=none smtp.client-ip=91.218.175.172
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="nVIpRwXF"
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1778749116;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding;
	bh=71daxRLec4jyhYBAvxsKNAYEmlfTLs3LqROItb0Etew=;
	b=nVIpRwXF2bayTanxg/F32tyPngv3o1/mCGuBtHPUjfYatIIn2szaOvrl6AIaOy5TwH7J6G
	 Qy2ISDfGZAx8nIZBtScls34NVnaSb83jFva/oyZMPo01TV24mGOZNP6PyM0R0Hzn61K+G7
	 bnUuyYdzVrzkZS2tUP462of/uGAAdbg=
From: Kaitao Cheng
To: linmiaohe@huawei.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kaitao Cheng
Subject: [PATCH] mm/memory-failure: Use zone_pcp_disable() for poison handling
Date: Thu, 14 May 2026 16:57:54 +0800
Message-ID: <20260514085754.84097-1-kaitao.cheng@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Migadu-Flow: FLOW_OUT

From: Kaitao Cheng

__page_handle_poison() used drain_all_pages() instead of
zone_pcp_disable() because dissolve_free_hugetlb_folio() could restore
HVO vmemmap pages and decrement hugetlb_optimize_vmemmap_key. That
static key update took cpu_hotplug_lock through static_key_slow_dec(),
while zone_pcp_disable() holds pcp_batch_high_lock. CPU hotplug takes
the locks in the opposite order through page_alloc_cpu_online/dead(),
so the combination could deadlock.

That dependency no longer exists. Commit da3e2d1ca43d ("mm/hugetlb:
remove hugetlb_optimize_vmemmap_key static key") removed the HVO static
key and the static_branch_dec() from hugetlb_vmemmap_restore_folio().
The dissolve_free_hugetlb_folio() path no longer reaches
static_key_slow_dec().

Use zone_pcp_disable() again while dissolving the hugetlb folio and
taking the target page off the buddy allocator. This prevents the
drained PCP lists from being refilled before take_page_off_buddy()
runs, making the page isolation deterministic.

Signed-off-by: Kaitao Cheng
---
 mm/memory-failure.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 866c4428ac7e..b9619d43173b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -172,23 +172,11 @@ static int __page_handle_poison(struct page *page)
 {
 	int ret;
 
-	/*
-	 * zone_pcp_disable() can't be used here. It will
-	 * hold pcp_batch_high_lock and dissolve_free_hugetlb_folio() might hold
-	 * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap
-	 * optimization is enabled. This will break current lock dependency
-	 * chain and leads to deadlock.
-	 * Disabling pcp before dissolving the page was a deterministic
-	 * approach because we made sure that those pages cannot end up in any
-	 * PCP list. Draining PCP lists expels those pages to the buddy system,
-	 * but nothing guarantees that those pages do not get back to a PCP
-	 * queue if we need to refill those.
-	 */
+	zone_pcp_disable(page_zone(page));
 	ret = dissolve_free_hugetlb_folio(page_folio(page));
-	if (!ret) {
-		drain_all_pages(page_zone(page));
+	if (!ret)
 		ret = take_page_off_buddy(page);
-	}
+	zone_pcp_enable(page_zone(page));
 
 	return ret;
 }
-- 
2.50.1 (Apple Git-155)
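
Illustration only, not part of the patch: a minimal user-space sketch of
the refill window described in the changelog. Every name below (toy_zone,
toy_drain, toy_refill and so on) is hypothetical and is not a kernel API.
The model assumes that disabling the PCP lists also drains them and that a
racing refill is a no-op while they are disabled. The real locking and
per-CPU details are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
enum toy_where { TOY_IN_BUDDY, TOY_IN_PCP, TOY_ISOLATED };

struct toy_zone {
	bool pcp_enabled;       /* models the PCP lists being usable */
	enum toy_where target;  /* where the poisoned page currently sits */
};

/* Drain: push the page from a PCP list back to the buddy free lists. */
static void toy_drain(struct toy_zone *z)
{
	if (z->target == TOY_IN_PCP)
		z->target = TOY_IN_BUDDY;
}

/* A racing refill can pull the page into a PCP list, but only while
 * the PCP lists are enabled. */
static void toy_refill(struct toy_zone *z)
{
	if (z->pcp_enabled && z->target == TOY_IN_BUDDY)
		z->target = TOY_IN_PCP;
}

/* Assumption of this model: disabling also drains the PCP lists. */
static void toy_pcp_disable(struct toy_zone *z)
{
	z->pcp_enabled = false;
	toy_drain(z);
}

static void toy_pcp_enable(struct toy_zone *z)
{
	assert(!z->pcp_enabled);  /* must pair with toy_pcp_disable() */
	z->pcp_enabled = true;
}

/* Isolation succeeds only if the page is on the buddy free lists. */
static bool toy_take_off_buddy(struct toy_zone *z)
{
	if (z->target != TOY_IN_BUDDY)
		return false;
	z->target = TOY_ISOLATED;
	return true;
}

/* Old flow: drain, then take. A refill can slip in between. */
static bool toy_old_flow(bool racing_refill)
{
	struct toy_zone z = { .pcp_enabled = true, .target = TOY_IN_PCP };

	toy_drain(&z);
	if (racing_refill)
		toy_refill(&z);
	return toy_take_off_buddy(&z);
}

/* New flow: disable, take, enable. The refill has no effect. */
static bool toy_new_flow(bool racing_refill)
{
	struct toy_zone z = { .pcp_enabled = true, .target = TOY_IN_PCP };
	bool ok;

	toy_pcp_disable(&z);
	if (racing_refill)
		toy_refill(&z);
	ok = toy_take_off_buddy(&z);
	toy_pcp_enable(&z);
	return ok;
}
```

In this model toy_old_flow(true) fails to isolate the page while
toy_new_flow(true) succeeds, which is the determinism argument made in
the changelog. It says nothing about the lock ordering itself.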