From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc:
 Greg Kroah-Hartman, patches@lists.linux.dev,
 Tianyang Zhang, Suren Baghdasaryan, Vlastimil Babka, Michal Hocko,
 Brendan Jackman, Johannes Weiner, Zi Yan, Andrew Morton
Subject: [PATCH 6.1 264/325] mm/page_alloc.c: avoid infinite retries caused by cpuset race
Date: Mon, 2 Jun 2025 15:49:00 +0200
Message-ID: <20250602134330.498435748@linuxfoundation.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250602134319.723650984@linuxfoundation.org>
References: <20250602134319.723650984@linuxfoundation.org>
User-Agent: quilt/0.68
X-stable: review
X-Patchwork-Hint: ignore
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tianyang Zhang

commit e05741fb10c38d70bbd7ec12b23c197b6355d519 upstream.

__alloc_pages_slowpath has no change detection for ac->nodemask in its
retry path, while cpuset can modify it in parallel.  For processes that
set their mempolicy to MPOL_BIND, this means ac->nodemask can change
underneath the allocator.  should_reclaim_retry then judges based on the
latest nodemask and jumps to retry, while get_page_from_freelist only
traverses the zonelist starting from ac->preferred_zoneref, which was
selected using the expired nodemask.  This may cause infinite retries in
some cases.

cpu 64:
__alloc_pages_slowpath {
        /* ..... */
retry:
        /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
        if (alloc_flags & ALLOC_KSWAPD)
                wake_all_kswapds(order, gfp_mask, ac);
        /* cpu 1:
        cpuset_write_resmask
            update_nodemask
                update_nodemasks_hier
                    update_tasks_nodemask
                        mpol_rebind_task
                            mpol_rebind_policy
                                mpol_rebind_nodemask
        // mempolicy->nodes, which ac->nodemask points to,
        // has been modified
        */
        /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
        if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
                                 did_some_progress > 0, &no_progress_loops))
                goto retry;
}

Starting multiple instances of the LTP cpuset01 test simultaneously
quickly reproduces this issue on a multi-node server once maximum memory
pressure is reached and swap is enabled.

Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Tianyang Zhang
Reviewed-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Cc: Michal Hocko
Cc: Brendan Jackman
Cc: Johannes Weiner
Cc: Zi Yan
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 mm/page_alloc.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5195,6 +5195,14 @@ restart:
 	}
 
 retry:
+	/*
+	 * Deal with possible cpuset update races or zonelist updates to avoid
+	 * infinite retries.
+	 */
+	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+	    check_retry_zonelist(zonelist_iter_cookie))
+		goto restart;
+
 	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
 	if (alloc_flags & ALLOC_KSWAPD)
 		wake_all_kswapds(order, gfp_mask, ac);
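
For readers who want to see the race in isolation, below is a minimal
single-threaded userspace sketch of the retry loop described above.  It is
not kernel code: struct sim_state and every sim_* helper are hypothetical
stand-ins invented for illustration, the "zonelist" is simply nodes 0..31 in
ascending order, and the concurrent cpuset write is simulated by one update
injected after the first failed attempt.  The cookie check only loosely
models what check_retry_cpuset() does in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for kernel state (assumptions). */
struct sim_state {
	unsigned int nodemask;   /* bit n set: node n allowed (models ac->nodemask) */
	unsigned int mems_seq;   /* bumped on every nodemask update */
	unsigned int free_nodes; /* bit n set: node n has usable memory */
};

static unsigned int sim_read_begin(const struct sim_state *s)
{
	return s->mems_seq;
}

/* True if the nodemask changed since the cookie was taken. */
static bool sim_check_retry(const struct sim_state *s, unsigned int cookie)
{
	return s->mems_seq != cookie;
}

/* Models the racing cpuset write that rebinds the mempolicy nodemask. */
static void sim_cpuset_update(struct sim_state *s, unsigned int new_mask)
{
	s->nodemask = new_mask;
	s->mems_seq++;
}

/* Lowest allowed node, standing in for the preferred zoneref lookup. */
static int sim_first_node(unsigned int mask)
{
	for (int n = 0; n < 32; n++)
		if (mask & (1u << n))
			return n;
	return -1;
}

/* Only walks nodes from 'preferred' onwards, like the freelist scan. */
static bool sim_try_alloc(const struct sim_state *s, int preferred)
{
	if (preferred < 0)
		return false;
	for (int n = preferred; n < 32; n++)
		if (s->nodemask & s->free_nodes & (1u << n))
			return true;
	return false;
}

/* Judges on the latest nodemask, over the whole zonelist. */
static bool sim_should_retry(const struct sim_state *s)
{
	return (s->nodemask & s->free_nodes) != 0;
}

/*
 * Returns the number of attempts on success, 0 if retrying is not
 * warranted, or -1 if max_loops was exceeded (stand-in for "infinite").
 */
static int sim_slowpath(struct sim_state *s, bool with_check,
			unsigned int racing_mask, int max_loops)
{
	bool raced = false;
	unsigned int cookie;
	int preferred;
	int loops = 0;

	assert(max_loops > 0);
restart:
	cookie = sim_read_begin(s);
	preferred = sim_first_node(s->nodemask);
retry:
	/* The check added by the patch: re-evaluate after a racing update. */
	if (with_check && sim_check_retry(s, cookie))
		goto restart;
	if (++loops > max_loops)
		return -1;
	if (sim_try_alloc(s, preferred))
		return loops;
	if (!raced) {
		raced = true;
		sim_cpuset_update(s, racing_mask); /* simulated parallel write */
	}
	if (sim_should_retry(s))
		goto retry;
	return 0;
}
```

In this model, starting with nodemask 0x2 (node 1 only, no free memory) and a
racing update to 0x3 where node 0 has memory, the loop without the check keeps
scanning from the stale preferred node 1 until it hits the cap, while the loop
with the check restarts, picks node 0 as preferred and succeeds.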