Date: Tue, 27 Dec 2022 10:40:22 +0800
From: Chih-En Lin
To: Barry Song
Cc: Andrew Morton, Qi Zheng, David Hildenbrand, Matthew Wilcox,
	Christophe Leroy, John Hubbard, Nadav Amit,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steven Rostedt,
	Masami Hiramatsu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Yang Shi, Peter Xu, Zach O'Keefe,
	"Liam R . Howlett", Alex Sierra, Xianting Tian, Colin Cross,
	Suren Baghdasaryan, Pasha Tatashin, Suleiman Souhlal, Brian Geffon,
	Yu Zhao, Tong Tiangen, Liu Shixin, Li kunyu, Anshuman Khandual,
	Vlastimil Babka, Hugh Dickins, Minchan Kim, Miaohe Lin,
	Gautam Menghani, Catalin Marinas, Mark Brown, Will Deacon,
	"Eric W . Biederman", Thomas Gleixner, Sebastian Andrzej Siewior,
	Andy Lutomirski, Fenghua Yu, Barret Rhoden, Davidlohr Bueso,
	"Jason A . Donenfeld", Dinglan Peng, Pedro Fonseca, Jim Huang,
	Huichun Feng
Subject: Re: [PATCH v3 04/14] mm/rmap: Break COW PTE in rmap walking
References: <20221220072743.3039060-1-shiyn.lin@gmail.com>
	<20221220072743.3039060-5-shiyn.lin@gmail.com>

On Tue, Dec 27, 2022 at 02:15:00PM +1300, Barry Song wrote:
> On Mon, Dec 26, 2022 at 11:56 PM Chih-En Lin wrote:
> >
> > On Mon, Dec 26, 2022 at 10:40:49PM +1300, Barry Song wrote:
> > > On Tue, Dec 20, 2022 at 8:25 PM Chih-En Lin wrote:
> > > >
> > > > Some of the features (unmap, migrate, device exclusive, mkclean, etc)
> > > > might modify the pte entry via rmap. Add a new page vma mapped walk
> > > > flag, PVMW_BREAK_COW_PTE, to indicate the rmap walking to break COW PTE.
> > > >
> > > > Signed-off-by: Chih-En Lin
> > > > ---
> > > >  include/linux/rmap.h |  2 ++
> > > >  mm/migrate.c         |  3 ++-
> > > >  mm/page_vma_mapped.c |  2 ++
> > > >  mm/rmap.c            | 12 +++++++-----
> > > >  mm/vmscan.c          |  7 ++++++-
> > > >  5 files changed, 19 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > > index bd3504d11b155..d0f07e5519736 100644
> > > > --- a/include/linux/rmap.h
> > > > +++ b/include/linux/rmap.h
> > > > @@ -368,6 +368,8 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > > >  #define PVMW_SYNC		(1 << 0)
> > > >  /* Look for migration entries rather than present PTEs */
> > > >  #define PVMW_MIGRATION		(1 << 1)
> > > > +/* Break COW-ed PTE during walking */
> > > > +#define PVMW_BREAK_COW_PTE	(1 << 2)
> > > >
> > > >  struct page_vma_mapped_walk {
> > > >  	unsigned long pfn;
> > > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > > index dff333593a8ae..a4be7e04c9b09 100644
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -174,7 +174,8 @@ void putback_movable_pages(struct list_head *l)
> > > >  static bool remove_migration_pte(struct folio *folio,
> > > >  		struct vm_area_struct *vma, unsigned long addr, void *old)
> > > >  {
> > > > -	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
> > > > +	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr,
> > > > +			PVMW_SYNC | PVMW_MIGRATION | PVMW_BREAK_COW_PTE);
> > > >
> > > >  	while (page_vma_mapped_walk(&pvmw)) {
> > > >  		rmap_t rmap_flags = RMAP_NONE;
> > > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > > index 93e13fc17d3cb..5dfc9236dc505 100644
> > > > --- a/mm/page_vma_mapped.c
> > > > +++ b/mm/page_vma_mapped.c
> > > > @@ -251,6 +251,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > > >  			step_forward(pvmw, PMD_SIZE);
> > > >  			continue;
> > > >  		}
> > > > +		if (pvmw->flags & PVMW_BREAK_COW_PTE)
> > > > +			break_cow_pte(vma, pvmw->pmd, pvmw->address);
> > > >  		if (!map_pte(pvmw))
> > > >  			goto next_pte;
> > > >  this_pte:
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 2ec925e5fa6a9..b1b7dcbd498be 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -807,7 +807,8 @@ static bool folio_referenced_one(struct folio *folio,
> > > >  		struct vm_area_struct *vma, unsigned long address, void *arg)
> > > >  {
> > > >  	struct folio_referenced_arg *pra = arg;
> > > > -	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> > > > +	/* it will clear the entry, so we should break COW PTE. */
> > > > +	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
> > >
> > > what do you mean by breaking cow pte? in memory reclamation case, we are only
> > > checking and clearing page referenced bit in pte, do we really need to
> > > break cow?
> >
> > Since we might clear page referenced bit, it will modify the write
> > protection shared page table (COW-ed PTE). We should duplicate it.
> >
> > Actually, I didn’t break COW at first because it will conditionally
> > modify the table and only clear the referenced bit.
> > So, if clearing page referenced bit is fine to the COW-ed PTE table
> > and the break COW PTE is unnecessary here, we can remove it.
>
> if a page is mapped by 100 processes and anyone of these 100 processes
> access this page, we will get a reference bit in the PTE. Otherwise, we will
> have to scan 100 PTEs to figure out if a page is accessed and should be
> kept in LRU.
> i don't see the fundamental necessity to duplicate PTE only because of clearing
> the reference bit. as keeping the pte shared will help save a lot of cost for
> memory reclamation for those CPUs which have hardware reference bits
> in PTE.
>

As far as I know, if the page's referenced bit is unset and the accessed
bit of the PTE entry is set, reclamation will clear the accessed bit and
set the referenced bit of the page. So, in such a case with a COW PTE,
the logic is the same as for a normal PTE.

It makes sense. Thanks for helping me clear up this point.

Thanks,
Chih-En Lin
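
A minimal userspace sketch of the accessed-bit handling discussed above, for illustration only. It assumes a heavily simplified model: `struct model_pte`, `struct model_page` and `model_scan_referenced()` are hypothetical names, not kernel APIs, and the real reclaim path in mm/rmap.c and mm/vmscan.c does considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not kernel structures. */
struct model_pte {
	bool accessed;		/* stands in for the hardware accessed bit */
};

struct model_page {
	bool referenced;	/* stands in for the page's referenced flag */
};

/*
 * Visit every PTE that maps the page. If a PTE has its accessed bit
 * set, clear it and mark the page referenced. Returns the number of
 * PTEs that had to be visited.
 */
static size_t model_scan_referenced(struct model_page *page,
				    struct model_pte **ptes, size_t nr_ptes)
{
	size_t visited = 0;

	for (size_t i = 0; i < nr_ptes; i++) {
		assert(ptes[i] != NULL);
		visited++;
		if (ptes[i]->accessed) {
			ptes[i]->accessed = false;
			page->referenced = true;
		}
	}
	return visited;
}
```

In this model, 100 processes sharing one COW-ed PTE table contribute a single entry to `ptes`, so the scan visits one entry. If the table were duplicated per process, the same scan would have to visit 100 entries to reach the same answer.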