From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36233C7EE29 for ; Fri, 9 Jun 2023 23:35:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232563AbjFIXfT (ORCPT ); Fri, 9 Jun 2023 19:35:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232651AbjFIXd3 (ORCPT ); Fri, 9 Jun 2023 19:33:29 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F10A649F0 for ; Fri, 9 Jun 2023 16:30:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D1F9C65D0B for ; Fri, 9 Jun 2023 23:30:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2DB2DC433EF; Fri, 9 Jun 2023 23:30:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1686353433; bh=pXfQFxxCvszw/gYOnhoLZLD3s94mWWv0f4cTALfhi7I=; h=Date:To:From:Subject:From; b=rp2eUHqOOQ98EzKWX78LcgrREW9s8WhGjM+/PbLf9w9MqhD9W5JqbDCmi8udz8BLz 3ODNSCho4e18XgvKUah6SEc4qRcDCWDKHXm8sx7/ZxGO3DbUZcib6bl/NWEGI78a6F TdKDoM0SQb09OShoc+aJAJ1NcS38N6CSFe8LJdCE= Date: Fri, 09 Jun 2023 16:30:32 -0700 To: mm-commits@vger.kernel.org, peterz@infradead.org, mpenttil@redhat.com, kirill@shutemov.name, jhubbard@nvidia.com, jgg@nvidia.com, jack@suse.cz, david@redhat.com, lstoakes@gmail.com, akpm@linux-foundation.org From: Andrew Morton Subject: [merged mm-stable] mm-mmap-separate-writenotify-and-dirty-tracking-logic.patch removed from -mm tree Message-Id: <20230609233033.2DB2DC433EF@smtp.kernel.org> Precedence: bulk Reply-To: linux-kernel@vger.kernel.org List-ID: X-Mailing-List: mm-commits@vger.kernel.org The quilt patch titled Subject: mm/mmap: separate writenotify and dirty tracking logic has been removed from the -mm tree. Its filename was mm-mmap-separate-writenotify-and-dirty-tracking-logic.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes Subject: mm/mmap: separate writenotify and dirty tracking logic Date: Thu, 4 May 2023 22:27:51 +0100 Patch series "mm/gup: disallow GUP writing to file-backed mappings by default", v9. Writing to file-backed mappings which require folio dirty tracking using GUP is a fundamentally broken operation, as kernel write access to GUP mappings do not adhere to the semantics expected by a file system. A GUP caller uses the direct mapping to access the folio, which does not cause write notify to trigger, nor does it enforce that the caller marks the folio dirty. The problem arises when, after an initial write to the folio, writeback results in the folio being cleaned and then the caller, via the GUP interface, writes to the folio again. As a result of the use of this secondary, direct, mapping to the folio no write notify will occur, and if the caller does mark the folio dirty, this will be done so unexpectedly. For example, consider the following scenario:- 1. A folio is written to via GUP which write-faults the memory, notifying the file system and dirtying the folio. 2. Later, writeback is triggered, resulting in the folio being cleaned and the PTE being marked read-only. 3. The GUP caller writes to the folio, as it is mapped read/write via the direct mapping. 4. The GUP caller, now done with the page, unpins it and sets it dirty (though it does not have to). This change updates both the PUP FOLL_LONGTERM slow and fast APIs. As pin_user_pages_fast_only() does not exist, we can rely on a slightly imperfect whitelisting in the PUP-fast case and fall back to the slow case should this fail. This patch (of 3): vma_wants_writenotify() is specifically intended for setting PTE page table flags, accounting for existing page table flag state and whether the underlying filesystem performs dirty tracking for a file-backed mapping. Everything is predicated firstly on whether the mapping is shared writable, as this is the only instance where dirty tracking is pertinent - MAP_PRIVATE mappings will always be CoW'd and unshared, and read-only file-backed shared mappings cannot be written to, even with FOLL_FORCE. All other checks are in line with existing logic, though now separated into checks eplicitily for dirty tracking and those for determining how to set page table flags. We make this change so we can perform checks in the GUP logic to determine which mappings might be problematic when written to. Link: https://lkml.kernel.org/r/cover.1683235180.git.lstoakes@gmail.com Link: https://lkml.kernel.org/r/0f218370bd49b4e6bbfbb499f7c7b92c26ba1ceb.1683235180.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes Reviewed-by: John Hubbard Reviewed-by: Mika Penttilä Reviewed-by: Jan Kara Reviewed-by: Jason Gunthorpe Acked-by: David Hildenbrand Cc: Kirill A . Shutemov Cc: Peter Zijlstra Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 mm/mmap.c | 58 ++++++++++++++++++++++++++++++++++--------- 2 files changed, 47 insertions(+), 12 deletions(-) --- a/include/linux/mm.h~mm-mmap-separate-writenotify-and-dirty-tracking-logic +++ a/include/linux/mm.h @@ -2461,6 +2461,7 @@ extern unsigned long move_page_tables(st #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +bool vma_needs_dirty_tracking(struct vm_area_struct *vma); int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma) { --- a/mm/mmap.c~mm-mmap-separate-writenotify-and-dirty-tracking-logic +++ a/mm/mmap.c @@ -1454,6 +1454,48 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_ar } #endif /* __ARCH_WANT_SYS_OLD_MMAP */ +static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops) +{ + return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite); +} + +static bool vma_is_shared_writable(struct vm_area_struct *vma) +{ + return (vma->vm_flags & (VM_WRITE | VM_SHARED)) == + (VM_WRITE | VM_SHARED); +} + +static bool vma_fs_can_writeback(struct vm_area_struct *vma) +{ + /* No managed pages to writeback. */ + if (vma->vm_flags & VM_PFNMAP) + return false; + + return vma->vm_file && vma->vm_file->f_mapping && + mapping_can_writeback(vma->vm_file->f_mapping); +} + +/* + * Does this VMA require the underlying folios to have their dirty state + * tracked? + */ +bool vma_needs_dirty_tracking(struct vm_area_struct *vma) +{ + /* Only shared, writable VMAs require dirty tracking. */ + if (!vma_is_shared_writable(vma)) + return false; + + /* Does the filesystem need to be notified? */ + if (vm_ops_needs_writenotify(vma->vm_ops)) + return true; + + /* + * Even if the filesystem doesn't indicate a need for writenotify, if it + * can writeback, dirty tracking is still required. + */ + return vma_fs_can_writeback(vma); +} + /* * Some shared mappings will want the pages marked read-only * to track write events. If so, we'll downgrade vm_page_prot @@ -1462,21 +1504,18 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_ar */ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) { - vm_flags_t vm_flags = vma->vm_flags; - const struct vm_operations_struct *vm_ops = vma->vm_ops; - /* If it was private or non-writable, the write bit is already clear */ - if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) + if (!vma_is_shared_writable(vma)) return 0; /* The backer wishes to know when pages are first written to? */ - if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite)) + if (vm_ops_needs_writenotify(vma->vm_ops)) return 1; /* The open routine did something to the protections that pgprot_modify * won't preserve? */ if (pgprot_val(vm_page_prot) != - pgprot_val(vm_pgprot_modify(vm_page_prot, vm_flags))) + pgprot_val(vm_pgprot_modify(vm_page_prot, vma->vm_flags))) return 0; /* @@ -1490,13 +1529,8 @@ int vma_wants_writenotify(struct vm_area if (userfaultfd_wp(vma)) return 1; - /* Specialty mapping? */ - if (vm_flags & VM_PFNMAP) - return 0; - /* Can the mapping track the dirty pages? */ - return vma->vm_file && vma->vm_file->f_mapping && - mapping_can_writeback(vma->vm_file->f_mapping); + return vma_fs_can_writeback(vma); } /* _ Patches currently in -mm which might be from lstoakes@gmail.com are mm-vmalloc-do-not-output-a-spurious-warning-when-huge-vmalloc-fails.patch