From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Rapoport, Mike Kravetz, peterx@redhat.com, Jerome Glisse,
    "Kirill A . Shutemov", Hugh Dickins, Axel Rasmussen, Matthew Wilcox,
    Andrew Morton, Andrea Arcangeli, Nadav Amit
Subject: [PATCH RFC 02/30] mm/userfaultfd: Fix uffd-wp special cases for fork()
Date: Fri, 15 Jan 2021 12:08:39 -0500
Message-Id: <20210115170907.24498-3-peterx@redhat.com>
In-Reply-To: <20210115170907.24498-1-peterx@redhat.com>
References: <20210115170907.24498-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"

We tried to do something similar in b569a1760782 ("userfaultfd: wp: drop
_PAGE_UFFD_WP properly when fork") previously, but it didn't get everything
right.  A few fixes around the code path:

1. We were referencing VM_UFFD_WP vm_flags on the _old_ vma rather than the
   new vma.  That was overlooked in b569a1760782, so it won't work as
   expected.  Thanks to the recent rework of the fork code
   (7a4830c380f3a8b3), we can easily get the new vma now, so switch the
   checks to that.

2. Dropping the uffd-wp bit in copy_huge_pmd() could be wrong if the huge
   pmd is a migration huge pmd.  When that happens, we should use
   pmd_swp_uffd_wp() instead of pmd_uffd_wp().  The fix is simply to handle
   the two cases separately.

3. We forgot to carry over the uffd-wp bit for a write migration huge pmd
   entry.  This also happens in copy_huge_pmd(), where we convert a write
   huge migration entry into a read one.

4. In copy_nonpresent_pte(), drop the uffd-wp bit if necessary for swap
   ptes.

5. In copy_present_page(), when COW is enforced during fork(), we also need
   to pass over the uffd-wp bit, if VM_UFFD_WP is armed on the new vma and
   the pte to be copied has the uffd-wp bit set.

Remove the comment in copy_present_pte() about this.  Commenting only there
won't help much, and commenting everywhere would be overkill.  Let's assume
the commit messages will help.
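For context only (this is an illustrative sketch, not part of the patch):
below is roughly how a range ends up uffd-wp armed from userspace before
fork() walks the copy paths fixed here.  It assumes a uffd-wp capable
kernel (v5.7+ on x86_64) and omits all error handling; the structs and
ioctls are the ones from <linux/userfaultfd.h>.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
	};
	ioctl(uffd, UFFDIO_API, &api);

	char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memset(buf, 1, page);		/* fault the page in before protecting */

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)buf, .len = page },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);	/* arms VM_UFFD_WP on the vma */

	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)buf, .len = page },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);	/* sets the pte uffd-wp bit */

	/*
	 * fork() now runs copy_page_range()/copy_huge_pmd() over this vma.
	 * Without UFFD_FEATURE_EVENT_FORK the child's vma is not uffd-wp
	 * armed, so the fixes below must *clear* the bit for the child;
	 * with it, the bit must be carried over intact.
	 */
	if (fork() == 0) {
		buf[0] = 2;	/* child: behavior depends on the copied bit */
		_exit(0);
	}
	return 0;
}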
Cc: Jerome Glisse
Cc: Mike Rapoport
Fixes: b569a1760782f3da03ff718d61f74163dea599ff
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/huge_mm.h |  3 ++-
 mm/huge_memory.c        | 23 ++++++++++-------------
 mm/memory.c             | 24 +++++++++++++-----------
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0365aa97f8e7..77b8d2132c3a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -10,7 +10,8 @@ extern vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			 pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-			 struct vm_area_struct *vma);
+			 struct vm_area_struct *src_vma,
+			 struct vm_area_struct *dst_vma);
 extern void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
 extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			 pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
 			 struct vm_area_struct *vma);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b64ad1947900..35d4acac6874 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -996,7 +996,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma)
+		  struct vm_area_struct *src_vma, struct vm_area_struct *dst_vma)
 {
 	spinlock_t *dst_ptl, *src_ptl;
 	struct page *src_page;
@@ -1005,7 +1005,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	int ret = -ENOMEM;
 
 	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(dst_vma))
 		return 0;
 
 	pgtable = pte_alloc_one(dst_mm);
@@ -1019,14 +1019,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vma->vm_flags & VM_UFFD_WP))
-		pmd = pmd_clear_uffd_wp(pmd);
-
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	if (unlikely(is_swap_pmd(pmd))) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
@@ -1037,11 +1029,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			pmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*src_pmd))
 				pmd = pmd_swp_mksoft_dirty(pmd);
+			if (pmd_swp_uffd_wp(*src_pmd))
+				pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
 		}
 		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+		if (!(dst_vma->vm_flags & VM_UFFD_WP))
+			pmd = pmd_swp_clear_uffd_wp(pmd);
 		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 		ret = 0;
 		goto out_unlock;
@@ -1077,13 +1073,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * best effort that the pinned pages won't be replaced by another
 	 * random page during the coming copy-on-write.
 	 */
-	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+	if (unlikely(is_cow_mapping(src_vma->vm_flags) &&
 		     atomic_read(&src_mm->has_pinned) &&
 		     page_maybe_dma_pinned(src_page))) {
 		pte_free(dst_mm, pgtable);
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
-		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		__split_huge_pmd(src_vma, src_pmd, addr, false, NULL);
 		return -EAGAIN;
 	}
 
@@ -1093,8 +1089,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
+	if (!(dst_vma->vm_flags & VM_UFFD_WP))
+		pmd = pmd_clear_uffd_wp(pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
diff --git a/mm/memory.c b/mm/memory.c
index c48f8df6e502..d6d2873368e1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -696,10 +696,10 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 static unsigned long
 copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *src_vma,
+		struct vm_area_struct *dst_vma, unsigned long addr, int *rss)
 {
-	unsigned long vm_flags = vma->vm_flags;
+	unsigned long vm_flags = dst_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(pte);
@@ -768,6 +768,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			set_pte_at(src_mm, addr, src_pte, pte);
 		}
 	}
+	if (!userfaultfd_wp(dst_vma))
+		pte = pte_swp_clear_uffd_wp(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -839,6 +841,9 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/* All done, just insert the new page copy in the child */
 	pte = mk_pte(new_page, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
+	if (userfaultfd_wp(dst_vma) && pte_uffd_wp(*src_pte))
+		/* Uffd-wp needs to be delivered to dest pte as well */
+		pte = pte_wrprotect(pte_mkuffd_wp(pte));
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -888,12 +893,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vm_flags & VM_UFFD_WP))
+	if (!(dst_vma->vm_flags & VM_UFFD_WP))
 		pte = pte_clear_uffd_wp(pte);
 
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
@@ -968,7 +968,8 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		if (unlikely(!pte_present(*src_pte))) {
 			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
 							dst_pte, src_pte,
-							src_vma, addr, rss);
+							src_vma, dst_vma,
+							addr, rss);
 			if (entry.val)
 				break;
 			progress += 8;
@@ -1046,7 +1047,8 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm,
-					    dst_pmd, src_pmd, addr, src_vma);
+					    dst_pmd, src_pmd, addr, src_vma,
+					    dst_vma);
 			if (err == -ENOMEM)
 				return -ENOMEM;
 			if (!err)
-- 
2.26.2
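
P.S. As a reading aid, and again not part of the patch: the rule the copy
paths above implement can be distilled into the standalone C sketch below.
The fake_pte type and copy_uffd_wp_bit() helper are hypothetical stand-ins,
not the kernel's real pte_t API; the point is only that (a) the check must
consult the *destination* vma, and (b) present entries and swap/migration
entries store the bit in different layouts, hence the separate helpers.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a page table entry; NOT the kernel's pte_t. */
struct fake_pte {
	bool present;		/* false => swap/migration entry */
	bool hw_uffd_wp;	/* bit layout used by present ptes */
	bool swp_uffd_wp;	/* bit layout used by swap/migration entries */
};

/*
 * Keep the uffd-wp bit only if VM_UFFD_WP is still armed on the
 * destination (child) vma; clear it otherwise.  Clearing must use the
 * accessor that matches the entry type -- using the present-pte helper
 * on a migration entry touches the wrong bit, which is bug #2 above.
 */
static struct fake_pte copy_uffd_wp_bit(struct fake_pte src, bool dst_armed)
{
	struct fake_pte dst = src;

	if (!dst_armed) {
		if (src.present)
			dst.hw_uffd_wp = false;		/* ~ pte_clear_uffd_wp() */
		else
			dst.swp_uffd_wp = false;	/* ~ pte_swp_clear_uffd_wp() */
	}
	return dst;		/* armed: the bit is carried over intact */
}

int main(void)
{
	struct fake_pte mig = { .present = false, .swp_uffd_wp = true };

	printf("child armed:   wp=%d\n", copy_uffd_wp_bit(mig, true).swp_uffd_wp);
	printf("child unarmed: wp=%d\n", copy_uffd_wp_bit(mig, false).swp_uffd_wp);
	return 0;
}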