From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD322C71153 for ; Mon, 28 Aug 2023 10:28:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231436AbjH1K2F (ORCPT ); Mon, 28 Aug 2023 06:28:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231240AbjH1KWj (ORCPT ); Mon, 28 Aug 2023 06:22:39 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13C97103 for ; Mon, 28 Aug 2023 03:22:24 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A788462F9C for ; Mon, 28 Aug 2023 10:22:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 913F1C433CA; Mon, 28 Aug 2023 10:22:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1693218143; bh=wYW0UW+5XKSt9w3MqXDxKc7I/RqpX/gPWzKrH8fwTKA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=u9MLgix15mQGAsQm4CzGSox4cz9cybf9fCgKv2Q0BQSmxy9KlesFXu4tRA9z7iIyd 2k8GeDLBdkeGcM4Fxm8cooRW0/H1Y89aL8Koh06hsfMQvedyVTIGlrggN9A1b1f78G KAQbAJ/0jDVQMU7VNeUwg+DeHW3Z6NveJZupiFlE= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Yin Fengwei , Yu Zhao , Ryan Roberts , David Hildenbrand , Kefeng Wang , Matthew Wilcox , Minchan Kim , "Vishal Moola (Oracle)" , Yang Shi , Andrew Morton Subject: [PATCH 6.4 111/129] madvise:madvise_cold_or_pageout_pte_range(): dont use mapcount() against large folio for sharing check Date: Mon, 28 Aug 2023 12:13:10 +0200 Message-ID: <20230828101201.059923563@linuxfoundation.org> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20230828101157.383363777@linuxfoundation.org> References: <20230828101157.383363777@linuxfoundation.org> User-Agent: quilt/0.67 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org 6.4-stable review patch. If anyone has any objections, please let me know. ------------------ From: Yin Fengwei commit 2f406263e3e954aa24c1248edcfa9be0c1bb30fa upstream. Patch series "don't use mapcount() to check large folio sharing", v2. In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(), folio_mapcount() is used to check whether the folio is shared. But it's not correct as folio_mapcount() returns total mapcount of large folio. Use folio_estimated_sharers() here as the estimated number is enough. This patchset will fix the cases: User space application call madvise() with MADV_FREE, MADV_COLD and MADV_PAGEOUT for specific address range. There are THP mapped to the range. Without the patchset, the THP is skipped. With the patch, the THP will be split and handled accordingly. David reported the cow self test skip some cases because of MADV_PAGEOUT skip THP: https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel.com/T/#mbf0f2ec7fbe45da47526de1d7036183981691e81 and I confirmed this patchset make it work again. This patch (of 3): Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") replaced the page_mapcount() with folio_mapcount() to check whether the folio is shared by other mapping. It's not correct for large folio. folio_mapcount() returns the total mapcount of large folio which is not suitable to detect whether the folio is shared. Use folio_estimated_sharers() which returns a estimated number of shares. That means it's not 100% correct. It should be OK for madvise case here. User-visible effects is that the THP is skipped when user call madvise. But the correct behavior is THP should be split and processed then. NOTE: this change is a temporary fix to reduce the user-visible effects before the long term fix from David is ready. Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios") Signed-off-by: Yin Fengwei Reviewed-by: Yu Zhao Reviewed-by: Ryan Roberts Cc: David Hildenbrand Cc: Kefeng Wang Cc: Matthew Wilcox Cc: Minchan Kim Cc: Vishal Moola (Oracle) Cc: Yang Shi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/madvise.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/madvise.c +++ b/mm/madvise.c @@ -376,7 +376,7 @@ static int madvise_cold_or_pageout_pte_r folio = pfn_folio(pmd_pfn(orig_pmd)); /* Do not interfere with other mappings of this folio */ - if (folio_mapcount(folio) != 1) + if (folio_estimated_sharers(folio) != 1) goto huge_unlock; if (pageout_anon_only_filter && !folio_test_anon(folio)) @@ -448,7 +448,7 @@ regular_folio: * are sure it's worth. Split it if we are only owner. */ if (folio_test_large(folio)) { - if (folio_mapcount(folio) != 1) + if (folio_estimated_sharers(folio) != 1) break; if (pageout_anon_only_filter && !folio_test_anon(folio)) break;