From: Chih-En Lin
To: Andrew Morton, Qi Zheng, David Hildenbrand, Matthew Wilcox, Christophe Leroy, John Hubbard, Nadav Amit
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steven Rostedt, Masami Hiramatsu, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Yang Shi, Peter Xu, Zach O'Keefe, Liam R. Howlett, Alex Sierra, Xianting Tian, Colin Cross, Suren Baghdasaryan, Barry Song, Pasha Tatashin, Suleiman Souhlal, Brian Geffon, Yu Zhao, Tong Tiangen, Liu Shixin, Li kunyu, Anshuman Khandual, Vlastimil Babka, Hugh Dickins, Minchan Kim, Miaohe Lin, Gautam Menghani, Catalin Marinas, Mark Brown, Will Deacon, Eric W. Biederman, Thomas Gleixner, Sebastian Andrzej Siewior, Andy Lutomirski, Fenghua Yu, Barret Rhoden, Davidlohr Bueso, Jason A. Donenfeld, Dinglan Peng, Pedro Fonseca, Jim Huang, Huichun Feng, Chih-En Lin
Subject: [PATCH v3 05/14] mm/khugepaged: Break COW PTE before scanning pte
Date: Tue, 20 Dec 2022 15:27:34 +0800
Message-Id: <20221220072743.3039060-6-shiyn.lin@gmail.com>
In-Reply-To: <20221220072743.3039060-1-shiyn.lin@gmail.com>
References: <20221220072743.3039060-1-shiyn.lin@gmail.com>
We should not allow THP to collapse a COW-ed PTE table. So, break COW PTE
before collapse_pte_mapped_thp() collapses the range into a THP. Also, break
COW PTE before hpage_collapse_scan_pmd() scans the PTEs, and report the new
SCAN_COW_PTE result if breaking COW fails. collapse_huge_page() only warns
(VM_WARN_ON) if it still finds a COW-ed PTE table, since the scan should
already have broken COW.

Signed-off-by: Chih-En Lin
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 760455dfa8600..881553aa0f2f2 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -13,6 +13,7 @@
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
 	EM( SCAN_PMD_NONE,		"pmd_none")			\
 	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
+	EM( SCAN_COW_PTE,		"cowed_pte")			\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a8d5ef2a77d24..106e1ce3931f7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -31,6 +31,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_PMD_NONE,
 	SCAN_PMD_MAPPED,
+	SCAN_COW_PTE,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -1030,6 +1031,9 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
+	/* We should have already handled the COW-ed PTE table. */
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	anon_vma_lock_write(vma->anon_vma);
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
@@ -1139,6 +1143,16 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
+	/*
+	 * Before scanning each PTE entry, make sure the PTE table can be
+	 * modified: break COW if the PTE table is COW-ed.
+	 */
+	if (break_cow_pte(vma, pmd, address) < 0) {
+		result = SCAN_COW_PTE;
+		goto out;
+	}
+
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1197,6 +1211,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
+		/*
+		 * Breaking COW on the PTE table alone usually leaves the page
+		 * itself COW-mapped, so it may still be shared.
+		 */
 		if (page_mapcount(page) > 1) {
 			++shared;
 			if (cc->is_khugepaged &&
@@ -1472,6 +1490,11 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	/* Don't let a COW-ed PTE table collapse. */
+	if (break_cow_pte(vma, pmd, haddr) < 0)
+		goto drop_hpage;
+	VM_WARN_ON(test_bit(MMF_COW_PTE, &mm->flags) && !pmd_write(*pmd));
+
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 	result = SCAN_FAIL;
 
-- 
2.37.3
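To make the ordering in hpage_collapse_scan_pmd() concrete, here is a hypothetical user-space model in C, not kernel code. The names struct pte_table_model, model_break_cow_pte() and model_scan() are invented for illustration, NR_PTES stands in for HPAGE_PMD_NR, and we assume that breaking COW on a shared table copies it and bumps the mapcount of every page the copy maps (the old table still maps them for the other process). The sketch mirrors two points of the patch: COW is broken before any entry is examined, with a dedicated result when that fails, and pages can still count as shared afterwards. It simplifies the shared-page limit by dropping the cc->is_khugepaged condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for HPAGE_PMD_NR (512 with 4 KiB pages on x86-64); kept small. */
#define NR_PTES 8

/* Hypothetical model of a page: only its mapcount matters here. */
struct page_model {
	int mapcount;		/* number of page-table entries mapping it */
};

/* Hypothetical model of a PTE table that may itself be COW-shared. */
struct pte_table_model {
	int refcount;				/* > 1 means COW-shared */
	struct page_model *ptes[NR_PTES];	/* NULL models pte_none() */
};

enum scan_result_model {
	MODEL_SCAN_SUCCEED,
	MODEL_SCAN_COW_PTE,
	MODEL_SCAN_EXCEED_SHARED_PTE,
};

/*
 * Model of breaking COW on a PTE table. A private table (refcount 1) is
 * left alone. A shared one is copied and *slot is pointed at the copy.
 * Assumption: each page the copy maps gains one mapping, because the old
 * table (still used by the other process) and the copy both map it.
 */
static int model_break_cow_pte(struct pte_table_model **slot)
{
	struct pte_table_model *old = *slot, *copy;

	assert(old->refcount >= 1);
	if (old->refcount == 1)
		return 0;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -ENOMEM;
	memcpy(copy->ptes, old->ptes, sizeof(old->ptes));
	copy->refcount = 1;
	for (int i = 0; i < NR_PTES; i++)
		if (copy->ptes[i])
			copy->ptes[i]->mapcount++;
	old->refcount--;
	*slot = copy;
	return 0;
}

/*
 * Model of the scan order: break COW first, bail out with a dedicated
 * result if that fails, then count pages that are still shared.
 */
static enum scan_result_model model_scan(struct pte_table_model **slot,
					 int max_ptes_shared)
{
	int shared = 0;

	if (model_break_cow_pte(slot) < 0)
		return MODEL_SCAN_COW_PTE;

	for (int i = 0; i < NR_PTES; i++) {
		struct page_model *page = (*slot)->ptes[i];

		if (page && page->mapcount > 1 && ++shared > max_ptes_shared)
			return MODEL_SCAN_EXCEED_SHARED_PTE;
	}
	return MODEL_SCAN_SUCCEED;
}
```

For example, with a table shared by two processes (refcount 2) whose eight pages each have mapcount 1, model_scan() gives the caller a private copy (which the caller must free), but every page now has mapcount 2, so with max_ptes_shared = 4 the scan stops with MODEL_SCAN_EXCEED_SHARED_PTE. That is the situation described by the comment added above the page_mapcount() check. The real break_cow_pte() comes from earlier patches in this series and is not reproduced here.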