From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
	Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
	Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
	Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 17/23] mm/hugetlb: Only drop uffd-wp special pte if required
Date: Mon, 15 Nov 2021 16:02:43 +0800
Message-Id: <20211115080243.75040-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte
if unmapping an entire vma or synchronized such that faults cannot race
with the unmap operation.  This requires passing zap_flags all the way to
the lowest level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the
ZAP_FLAG_DROP_MARKER flag, as synchronization is in place to prevent
faults.  The exception is hole punch, which will first unmap without any
synchronization.  Later, when hole punch actually removes the page from
the file, it will check to see if there was a subsequent fault and, if so,
take the hugetlb fault mutex while unmapping again.  This second unmap
will pass in ZAP_FLAG_DROP_MARKER.

The justification for "whether to apply the ZAP_FLAG_DROP_MARKER flag when
unmapping a hugetlb range" is (IMHO): we should never reach a state in
which a page fault could erroneously fault in a wr-protected page-cache
page as writable, even for an extremely short period.  That could happen
if e.g. we passed ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls
hugetlb_vmdelete_list(): if a page faults after that call and before
remove_inode_hugepages() is executed, the page cache can be mapped
writable again in that small racy window, which can cause unexpected data
to be overwritten.
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 33 +++++++++++++++++++++++++--------
 mm/memory.c             |  5 ++++-
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 49d2e686be74..92c8d1a47404 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -437,7 +438,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -515,7 +516,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 				index * pages_per_huge_page(h),
-				(index + 1) * pages_per_huge_page(h));
+				(index + 1) * pages_per_huge_page(h),
+				ZAP_FLAG_DROP_MARKER);
 			i_mmap_unlock_write(mapping);
 		}
 
@@ -581,7 +583,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -614,8 +617,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-						hole_start >> PAGE_SHIFT,
-						hole_end >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a46011510e49..4c3ea7ee8ce8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -400,7 +401,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bba2ede5f6dc..16fb9cd8d9c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4926,7 +4926,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
-				   struct page *ref_page)
+				   struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4983,7 +4983,18 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		 * unmapped and its refcount is dropped, so just clear pte here.
 		 */
 		if (unlikely(!pte_present(pte))) {
-			huge_pte_clear(mm, address, ptep, sz);
+			/*
+			 * If the pte was wr-protected by uffd-wp in any of the
+			 * swap forms, meanwhile the caller does not want to
+			 * drop the uffd-wp bit in this zap, then replace the
+			 * pte with a marker.
+			 */
+			if (pte_swp_uffd_wp_any(pte) &&
+			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+				set_huge_pte_at(mm, address, ptep,
+						make_pte_marker(PTE_MARKER_UFFD_WP));
+			else
+				huge_pte_clear(mm, address, ptep, sz);
 			spin_unlock(ptl);
 			continue;
 		}
@@ -5011,7 +5022,11 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
 		if (huge_pte_dirty(pte))
 			set_page_dirty(page);
-
+		/* Leave a uffd-wp pte marker if needed */
+		if (huge_pte_uffd_wp(pte) &&
+		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+			set_huge_pte_at(mm, address, ptep,
+					make_pte_marker(PTE_MARKER_UFFD_WP));
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		page_remove_rmap(page, true);
 
@@ -5029,9 +5044,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -5047,12 +5063,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -5107,7 +5124,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index cc625c616645..69a73d47513b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1631,8 +1631,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+					details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
-- 
2.32.0
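
For readers unfamiliar with the userspace side, below is a minimal, hypothetical
C sketch (not part of the patch) of the setup whose guarantee the marker
preserves: a hugetlbfs mapping registered with userfaultfd in write-protect
mode, which this series enables.  The hugetlbfs path, the 2MB page size and the
omitted error handling are assumptions for illustration only.

	/*
	 * Hypothetical userspace sketch: register a hugetlbfs mapping for
	 * userfaultfd write-protect.  Assumes a hugetlbfs mount at
	 * /dev/hugepages and 2MB hugepages; error handling is minimal.
	 */
	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		long len = 2UL << 20;		/* one 2MB hugepage */
		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		int fd = open("/dev/hugepages/wp-test", O_CREAT | O_RDWR, 0600);
		char *addr;

		if (uffd < 0 || fd < 0)
			return 1;
		ftruncate(fd, len);
		addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (addr == MAP_FAILED)
			return 1;

		struct uffdio_api api = { .api = UFFD_API };
		ioctl(uffd, UFFDIO_API, &api);

		struct uffdio_register reg = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode = UFFDIO_REGISTER_MODE_WP,
		};
		ioctl(uffd, UFFDIO_REGISTER, &reg);

		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode = UFFDIO_WRITEPROTECT_MODE_WP,
		};
		ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

		/*
		 * From here on, writes to addr must raise write-protect fault
		 * events.  An unsynchronized unmap that cleared the pte outright,
		 * without leaving PTE_MARKER_UFFD_WP behind, would let a racing
		 * fault map the page cache writable and break that promise,
		 * which is what the ZAP_FLAG_DROP_MARKER logic above avoids.
		 */
		return 0;
	}

Whether the marker survives the first, unsynchronized unmap in
hugetlbfs_punch_hole() is exactly what zap_flags now controls.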