From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Rapoport, Mike Kravetz, peterx@redhat.com, Jerome Glisse,
    "Kirill A . Shutemov", Hugh Dickins, Axel Rasmussen, Matthew Wilcox,
    Andrew Morton, Andrea Arcangeli, Nadav Amit
Subject: [PATCH RFC 07/30] mm/swap: Introduce the idea of special swap ptes
Date: Fri, 15 Jan 2021 12:08:44 -0500
Message-Id: <20210115170907.24498-8-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210115170907.24498-1-peterx@redhat.com>
References: <20210115170907.24498-1-peterx@redhat.com>
MIME-Version: 1.0

We already have several kinds of special swap entries, like migration
entries, hw-poison entries, device private entries, etc.  Those "special
swap entries" must still be swap entries first (is_swap_pte() returns
true for them), and their specific type is decided by swp_type(entry).

This patch introduces another idea called "special swap ptes".  They are
easy to confuse with "special swap entries", but a special swap pte never
contains a swap entry at all, which means it is illegal to call
pte_to_swp_entry() upon a special swap pte.

Make the uffd-wp special pte the first special swap pte.

Before this patch, is_swap_pte()==true means one of the below:

  (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
        example, when an anonymous page got swapped out.

  (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
        example, a migration entry, a hw-poison entry, etc.

After this patch, is_swap_pte()==true means one of the below, where case
(b) is added:

 (a) The pte contains a swap entry.

   (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
         example, when an anonymous page got swapped out.

   (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
         example, a migration entry, a hw-poison entry, etc.

 (b) The pte does not contain a swap entry at all (so it cannot be passed
     into pte_to_swp_entry()).  For example, the uffd-wp special swap pte.
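To make the case split concrete, the sketch below (illustration only, not
part of this patch; classify_pte() is a made-up name) shows how a generic
page table walker is expected to carve up a pte using the helpers this
patch introduces:

	static void classify_pte(pte_t pte)
	{
		swp_entry_t entry;

		if (pte_present(pte) || !is_swap_pte(pte))
			return;	/* a mapped page, or pte_none() */

		if (is_swap_special_pte(pte)) {
			/*
			 * Case (b), e.g. the uffd-wp special pte.
			 * pte_to_swp_entry() must not be called on it.
			 */
			return;
		}

		/* Here pte_has_swap_entry() holds: cases (a.1) and (a.2) */
		entry = pte_to_swp_entry(pte);
		if (non_swap_entry(entry)) {
			/* (a.2): migration, hw-poison, device private, ... */
		} else {
			/* (a.1): swapped out; swp_type()/swp_offset() valid */
		}
	}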
Teach the whole mm core about this new idea.  It's done by introducing
another helper called pte_has_swap_entry() which covers cases (a.1) and
(a.2) above.  Before this patch it is identical to is_swap_pte(), because
there's no special swap pte yet.  Most previous users of is_swap_pte() in
mm core now need the new helper pte_has_swap_entry() instead, to make sure
we won't try to parse a swap entry from a swap special pte (which does not
contain a swap entry at all!).  Call sites either handle the swap special
pte explicitly, or let it naturally fall through to the default "else"
paths.

Warn properly (e.g., in do_swap_page()) when we see a special swap pte: we
should never call do_swap_page() upon those ptes, so just bail out early
if it happens.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
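As a reviewing aid (illustration only, not part of the diff; handle_swap()
is a hypothetical stand-in for whatever a call site does with the parsed
entry), the conversion applied throughout follows this pattern:

	/* Before: a special swap pte would wrongly reach pte_to_swp_entry() */
	if (pte_present(pte)) {
		/* ... */
	} else if (is_swap_pte(pte)) {
		handle_swap(pte_to_swp_entry(pte));
	}

	/* After: special swap ptes skip the swap path, like pte_none() */
	if (pte_present(pte)) {
		/* ... */
	} else if (pte_has_swap_entry(pte)) {
		handle_swap(pte_to_swp_entry(pte));
	} else {
		/* pte_none() or a special swap pte (e.g. uffd-wp special) */
	}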
 fs/proc/task_mmu.c      | 14 ++++++++------
 include/linux/swapops.h | 39 ++++++++++++++++++++++++++++++++++++++-
 mm/khugepaged.c         | 11 ++++++++++-
 mm/memcontrol.c         |  2 +-
 mm/memory.c             |  7 +++++++
 mm/migrate.c            |  2 +-
 mm/mprotect.c           |  2 +-
 mm/mremap.c             |  2 +-
 mm/page_vma_mapped.c    |  6 +++---
 9 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ee5a235b3056..5286fd23bbf4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -498,7 +498,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 
 	if (pte_present(*pte)) {
 		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
+	} else if (pte_has_swap_entry(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
 		if (!non_swap_entry(swpent)) {
@@ -518,8 +518,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 			page = migration_entry_to_page(swpent);
 		else if (is_device_private_entry(swpent))
 			page = device_private_entry_to_page(swpent);
-	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
-							&& pte_none(*pte))) {
+	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) &&
+			    mss->check_shmem_swap &&
+			    /* Here swap special pte is the same as none pte */
+			    (pte_none(*pte) || is_swap_special_pte(*pte)))) {
 		page = xa_load(&vma->vm_file->f_mapping->i_pages,
 			linear_page_index(vma, addr));
 		if (xa_is_value(page))
@@ -688,7 +690,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 
 	if (pte_present(*pte)) {
 		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
+	} else if (pte_has_swap_entry(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
 		if (is_migration_entry(swpent))
@@ -1053,7 +1055,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		ptent = pte_wrprotect(old_pte);
 		ptent = pte_clear_soft_dirty(ptent);
 		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
-	} else if (is_swap_pte(ptent)) {
+	} else if (pte_has_swap_entry(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
 	}
@@ -1366,7 +1368,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
-	} else if (is_swap_pte(pte)) {
+	} else if (pte_has_swap_entry(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 7dd57303bb0c..7b7387d2892f 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -5,6 +5,7 @@
 #include <linux/radix-tree.h>
 #include <linux/bug.h>
 #include <linux/mm_types.h>
+#include
 
 #ifdef CONFIG_MMU
 
@@ -52,12 +53,48 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
 	return entry.val & SWP_OFFSET_MASK;
 }
 
-/* check whether a pte points to a swap entry */
+/*
+ * is_swap_pte() returns true for three cases:
+ *
+ * (a) The pte contains a swap entry.
+ *
+ *   (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
+ *         example, when an anonymous page got swapped out.
+ *
+ *   (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
+ *         example, a migration entry, a hw-poison entry, etc.
+ *
+ * (b) The pte does not contain a swap entry at all (so it cannot be passed
+ *     into pte_to_swp_entry()).  For example, uffd-wp special swap pte.
+ */
 static inline int is_swap_pte(pte_t pte)
 {
 	return !pte_none(pte) && !pte_present(pte);
 }
 
+/*
+ * A swap-like special pte should only be used as a special marker to trigger
+ * a page fault.  We should treat it similarly to pte_none() in most cases,
+ * except that it may contain some special information that can persist within
+ * the pte.  Currently the only special swap pte is UFFD_WP_SWP_PTE_SPECIAL.
+ *
+ * Note: we should never call pte_to_swp_entry() upon a special swap pte,
+ * because a swap special pte does not contain a swap entry!
+ */
+static inline bool is_swap_special_pte(pte_t pte)
+{
+	return pte_swp_uffd_wp_special(pte);
+}
+
+/*
+ * Returns true if the pte contains a swap entry.  This includes not only the
+ * normal swp entry case, but also migration entries, etc.
+ */
+static inline bool pte_has_swap_entry(pte_t pte)
+{
+	return is_swap_pte(pte) && !is_swap_special_pte(pte);
+}
+
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an
  * arch-independent swp_entry_t.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4e3dff13eb70..20807163a25f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1006,7 +1006,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
 	     vmf.pte++, vmf.address += PAGE_SIZE) {
 		vmf.orig_pte = *vmf.pte;
-		if (!is_swap_pte(vmf.orig_pte))
+		if (!pte_has_swap_entry(vmf.orig_pte))
 			continue;
 		swapped_in++;
 		ret = do_swap_page(&vmf);
@@ -1238,6 +1238,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
+			if (is_swap_special_pte(pteval)) {
+				/*
+				 * Reuse SCAN_PTE_UFFD_WP.  If there will be
+				 * new users of is_swap_special_pte(), we'd
+				 * better introduce a new result type.
+				 */
+				result = SCAN_PTE_UFFD_WP;
+				goto out_unmap;
+			}
 			if (++unmapped <= khugepaged_max_ptes_swap) {
 				/*
 				 * Always be strict with uffd-wp
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 29459a6ce1c7..3af43a218b8b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5776,7 +5776,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
-	else if (is_swap_pte(ptent))
+	else if (pte_has_swap_entry(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
 	else if (pte_none(ptent))
 		page = mc_handle_file_pte(vma, addr, ptent, &ent);
diff --git a/mm/memory.c b/mm/memory.c
index 5ab3106cdd35..394c2602dce7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3255,6 +3255,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
+	/*
+	 * We should never call do_swap_page() upon a swap special pte; just
+	 * bail out early to be safe if it happens.
+	 */
+	if (WARN_ON_ONCE(is_swap_special_pte(vmf->orig_pte)))
+		goto out;
+
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
diff --git a/mm/migrate.c b/mm/migrate.c
index 5795cb82e27c..8a5459859e17 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -318,7 +318,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 
 	spin_lock(ptl);
 	pte = *ptep;
-	if (!is_swap_pte(pte))
+	if (!pte_has_swap_entry(pte))
 		goto out;
 
 	entry = pte_to_swp_entry(pte);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 56c02beb6041..e75bfe43cedd 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -139,7 +139,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 		pages++;
-	} else if (is_swap_pte(oldpte)) {
+	} else if (pte_has_swap_entry(oldpte)) {
 		swp_entry_t entry = pte_to_swp_entry(oldpte);
 		pte_t newpte;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..f736fcbe1247 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -106,7 +106,7 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 #ifdef CONFIG_MEM_SOFT_DIRTY
 	if (pte_present(pte))
 		pte = pte_mksoft_dirty(pte);
-	else if (is_swap_pte(pte))
+	else if (pte_has_swap_entry(pte))
 		pte = pte_swp_mksoft_dirty(pte);
 #endif
 	return pte;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 5e77b269c330..c97884007232 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -36,7 +36,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		 * For more details on device private memory see HMM
 		 * (include/linux/hmm.h or mm/hmm.c).
 		 */
-		if (is_swap_pte(*pvmw->pte)) {
+		if (pte_has_swap_entry(*pvmw->pte)) {
 			swp_entry_t entry;
 
 			/* Handle un-addressable ZONE_DEVICE memory */
@@ -88,7 +88,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 
 	if (pvmw->flags & PVMW_MIGRATION) {
 		swp_entry_t entry;
-		if (!is_swap_pte(*pvmw->pte))
+		if (!pte_has_swap_entry(*pvmw->pte))
 			return false;
 		entry = pte_to_swp_entry(*pvmw->pte);
 
@@ -96,7 +96,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 			return false;
 
 		pfn = migration_entry_to_pfn(entry);
-	} else if (is_swap_pte(*pvmw->pte)) {
+	} else if (pte_has_swap_entry(*pvmw->pte)) {
 		swp_entry_t entry;
 
 		/* Handle un-addressable ZONE_DEVICE memory */
-- 
2.26.2