linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/25] fs/dax: Fix ZONE_DEVICE page reference counts
@ 2024-12-17  5:12 Alistair Popple
  2024-12-17  5:12 ` [PATCH v4 01/25] fuse: Fix dax truncate/punch_hole fault path Alistair Popple
                   ` (24 more replies)
  0 siblings, 25 replies; 47+ messages in thread
From: Alistair Popple @ 2024-12-17  5:12 UTC
  To: akpm, dan.j.williams, linux-mm
  Cc: Alistair Popple, lina, zhang.lyra, gerald.schaefer,
	vishal.l.verma, dave.jiang, logang, bhelgaas, jack, jgg,
	catalin.marinas, will, mpe, npiggin, dave.hansen, ira.weiny,
	willy, djwong, tytso, linmiaohe, david, peterx, linux-doc,
	linux-kernel, linux-arm-kernel, linuxppc-dev, nvdimm, linux-cxl,
	linux-fsdevel, linux-ext4, linux-xfs, jhubbard, hch, david

Main updates since v3:
 - Rebased onto next-20241216

 - Fixed a bunch of build breakages reported by John Hubbard and the
   kernel test robot due to various combinations of CONFIG options.

 - Split the rmap changes into a separate patch as suggested by David H.

 - Reworded the description for the P2PDMA change.

Main updates since v2:

 - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
   and have them pass the vmf struct.

 - Separate out the device DAX changes.

 - Restore the page share mapping counting and associated warnings.

 - Rework truncate to require file-systems to have previously called
   dax_break_layout() to remove the address space mapping for a
   page. This found several bugs which are fixed by the first half of
   the series. The motivation for this was initially to allow the FS
   DAX page-cache mappings to hold a reference on the page.

   However that turned out to be a dead-end (see the comments on patch
   21), but it found several bugs and I think overall it is an
   improvement so I have left it here.

Device and FS DAX pages have always maintained their own page
reference counts without following the normal rules for page reference
counting. In particular pages are considered free when the refcount
hits one rather than zero and refcounts are not added when mapping the
page.
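
To illustrate the difference described above, here is a minimal userspace
sketch. It is a toy model only: struct toy_page and every toy_* helper are
hypothetical names invented for this example and are not kernel APIs. It
contrasts the legacy rule (page is free at refcount one) with the normal
rule (page is free at refcount zero).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct toy_page {
	int refcount;
};

/* Legacy DAX-style rule from the text: the page is free at refcount 1. */
static inline bool toy_legacy_dax_page_is_free(const struct toy_page *p)
{
	return p->refcount == 1;
}

/* Normal rule: the page is free once the refcount drops to 0. */
static inline bool toy_normal_page_is_free(const struct toy_page *p)
{
	return p->refcount == 0;
}

/* Take a reference, e.g. on behalf of a mapping or a GUP user. */
static inline void toy_page_get(struct toy_page *p)
{
	p->refcount++;
}

/* Drop a reference; dropping below zero would be a bug. */
static inline void toy_page_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}
```

Under the legacy rule every "is this page idle?" check has to compare
against one, which is the kind of special case the series aims to remove.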

Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
mechanism for allowing GUP to hold references on the page (see
get_dev_pagemap). However there doesn't seem to be any reason why FS
DAX pages need their own reference counting scheme.

By treating the refcounts on these pages the same way as normal pages
we can remove a lot of special checks. In particular pXd_trans_huge()
becomes the same as pXd_leaf(), although I haven't made that change
here. It also frees up a valuable software-defined PTE bit on
architectures that have devmap PTE bits defined.
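
As a rough illustration of why the two predicates converge, consider the
toy model below. The bit positions and all toy_* names are made up for
this example; it is only loosely modelled on how an architecture might
tell a transparent huge page apart from a devmap leaf entry and does not
reproduce any architecture's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define TOY_LEAF   (UINT64_C(1) << 0)	/* entry maps a huge page directly */
#define TOY_DEVMAP (UINT64_C(1) << 1)	/* software bit marking devmap entries */

static_assert((TOY_LEAF & TOY_DEVMAP) == 0, "toy bits must not overlap");

/* A leaf entry is any entry with the leaf bit set. */
static inline bool toy_pmd_leaf(uint64_t pmd)
{
	return (pmd & TOY_LEAF) != 0;
}

/*
 * While a devmap bit exists, a "trans huge" test has to exclude devmap
 * entries explicitly, so it differs from the plain leaf test.
 */
static inline bool toy_pmd_trans_huge_with_devmap(uint64_t pmd)
{
	return (pmd & (TOY_LEAF | TOY_DEVMAP)) == TOY_LEAF;
}

/*
 * Once no entry ever carries a devmap bit, the exclusion is a no-op and
 * the test collapses to the leaf test.
 */
static inline bool toy_pmd_trans_huge_without_devmap(uint64_t pmd)
{
	return toy_pmd_leaf(pmd);
}
```

In this model the two predicates disagree only for entries that have the
devmap bit set, so removing the bit removes the distinction.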

It also almost certainly allows further clean-up of the devmap managed
functions, but I have left that as a future improvement. It also
enables support for compound ZONE_DEVICE pages which is one of my
primary motivators for doing this work.

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

Cc: lina@asahilina.net
Cc: zhang.lyra@gmail.com
Cc: gerald.schaefer@linux.ibm.com
Cc: dan.j.williams@intel.com
Cc: vishal.l.verma@intel.com
Cc: dave.jiang@intel.com
Cc: logang@deltatee.com
Cc: bhelgaas@google.com
Cc: jack@suse.cz
Cc: jgg@ziepe.ca
Cc: catalin.marinas@arm.com
Cc: will@kernel.org
Cc: mpe@ellerman.id.au
Cc: npiggin@gmail.com
Cc: dave.hansen@linux.intel.com
Cc: ira.weiny@intel.com
Cc: willy@infradead.org
Cc: djwong@kernel.org
Cc: tytso@mit.edu
Cc: linmiaohe@huawei.com
Cc: david@redhat.com
Cc: peterx@redhat.com
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: nvdimm@lists.linux.dev
Cc: linux-cxl@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-ext4@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: jhubbard@nvidia.com
Cc: hch@lst.de
Cc: david@fromorbit.com

Alistair Popple (25):
  fuse: Fix dax truncate/punch_hole fault path
  fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
  fs/dax: Don't skip locked entries when scanning entries
  fs/dax: Refactor wait for dax idle page
  fs/dax: Create a common implementation to break DAX layouts
  fs/dax: Always remove DAX page-cache entries when breaking layouts
  fs/dax: Ensure all pages are idle prior to filesystem unmount
  fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
  mm/gup.c: Remove redundant check for PCI P2PDMA page
  mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
  mm: Allow compound zone device pages
  mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
  mm/memory: Add vmf_insert_page_mkwrite()
  rmap: Add support for PUD sized mappings to rmap
  huge_memory: Add vmf_insert_folio_pud()
  huge_memory: Add vmf_insert_folio_pmd()
  memremap: Add is_device_dax_page() and is_fsdax_page() helpers
  gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
  proc/task_mmu: Ignore ZONE_DEVICE pages
  mm/mlock: Skip ZONE_DEVICE PMDs during mlock
  fs/dax: Properly refcount fs dax pages
  device/dax: Properly refcount device dax pages when mapping
  mm: Remove pXX_devmap callers
  mm: Remove devmap related functions and page table bits
  Revert "riscv: mm: Add support for ZONE_DEVICE"

 Documentation/mm/arch_pgtable_helpers.rst     |   6 +-
 arch/arm64/Kconfig                            |   1 +-
 arch/arm64/include/asm/pgtable-prot.h         |   1 +-
 arch/arm64/include/asm/pgtable.h              |  24 +-
 arch/powerpc/Kconfig                          |   1 +-
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   6 +-
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   7 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  52 +---
 arch/powerpc/include/asm/book3s/64/radix.h    |  14 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c       |   3 +-
 arch/powerpc/mm/book3s64/pgtable.c            |   8 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c      |   5 +-
 arch/powerpc/mm/pgtable.c                     |   2 +-
 arch/riscv/Kconfig                            |   1 +-
 arch/riscv/include/asm/pgtable-64.h           |  20 +-
 arch/riscv/include/asm/pgtable-bits.h         |   1 +-
 arch/riscv/include/asm/pgtable.h              |  17 +-
 arch/x86/Kconfig                              |   1 +-
 arch/x86/include/asm/pgtable.h                |  51 +---
 arch/x86/include/asm/pgtable_types.h          |   5 +-
 drivers/dax/device.c                          |  15 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c        |   3 +-
 drivers/nvdimm/pmem.c                         |   4 +-
 drivers/pci/p2pdma.c                          |  19 +-
 fs/dax.c                                      | 357 ++++++++++++++-----
 fs/ext4/inode.c                               |  43 +--
 fs/fuse/dax.c                                 |  35 +--
 fs/fuse/virtio_fs.c                           |   3 +-
 fs/proc/task_mmu.c                            |  18 +-
 fs/userfaultfd.c                              |   2 +-
 fs/xfs/xfs_inode.c                            |  40 +-
 fs/xfs/xfs_inode.h                            |   3 +-
 fs/xfs/xfs_super.c                            |  18 +-
 include/linux/dax.h                           |  37 ++-
 include/linux/huge_mm.h                       |  22 +-
 include/linux/memremap.h                      |  28 +-
 include/linux/migrate.h                       |   4 +-
 include/linux/mm.h                            |  40 +--
 include/linux/mm_types.h                      |  14 +-
 include/linux/mmzone.h                        |  12 +-
 include/linux/page-flags.h                    |   6 +-
 include/linux/pfn_t.h                         |  20 +-
 include/linux/pgtable.h                       |  21 +-
 include/linux/rmap.h                          |  15 +-
 lib/test_hmm.c                                |   3 +-
 mm/Kconfig                                    |   4 +-
 mm/debug_vm_pgtable.c                         |  59 +---
 mm/gup.c                                      | 176 +---------
 mm/hmm.c                                      |  12 +-
 mm/huge_memory.c                              | 233 +++++++-----
 mm/internal.h                                 |   2 +-
 mm/khugepaged.c                               |   2 +-
 mm/madvise.c                                  |   8 +-
 mm/mapping_dirty_helpers.c                    |   4 +-
 mm/memory-failure.c                           |   6 +-
 mm/memory.c                                   | 126 ++++---
 mm/memremap.c                                 |  59 +--
 mm/migrate_device.c                           |   9 +-
 mm/mlock.c                                    |   2 +-
 mm/mm_init.c                                  |  23 +-
 mm/mprotect.c                                 |   2 +-
 mm/mremap.c                                   |   5 +-
 mm/page_vma_mapped.c                          |   5 +-
 mm/pagewalk.c                                 |  14 +-
 mm/pgtable-generic.c                          |   7 +-
 mm/rmap.c                                     |  56 +++-
 mm/swap.c                                     |   2 +-
 mm/truncate.c                                 |  16 +-
 mm/userfaultfd.c                              |   5 +-
 mm/vmscan.c                                   |   5 +-
 70 files changed, 922 insertions(+), 928 deletions(-)

base-commit: e25c8d66f6786300b680866c0e0139981273feba
-- 
git-series 0.9.1


end of thread, other threads:[~2025-01-07 11:29 UTC | newest]

Thread overview: 47+ messages
2024-12-17  5:12 [PATCH v4 00/25] fs/dax: Fix ZONE_DEVICE page reference counts Alistair Popple
2024-12-17  5:12 ` [PATCH v4 01/25] fuse: Fix dax truncate/punch_hole fault path Alistair Popple
2024-12-17  5:12 ` [PATCH v4 02/25] fs/dax: Return unmapped busy pages from dax_layout_busy_page_range() Alistair Popple
2024-12-17  5:12 ` [PATCH v4 03/25] fs/dax: Don't skip locked entries when scanning entries Alistair Popple
2024-12-17  5:12 ` [PATCH v4 04/25] fs/dax: Refactor wait for dax idle page Alistair Popple
2024-12-17  5:12 ` [PATCH v4 05/25] fs/dax: Create a common implementation to break DAX layouts Alistair Popple
2024-12-17  5:12 ` [PATCH v4 06/25] fs/dax: Always remove DAX page-cache entries when breaking layouts Alistair Popple
2024-12-17  5:12 ` [PATCH v4 07/25] fs/dax: Ensure all pages are idle prior to filesystem unmount Alistair Popple
2024-12-17  5:12 ` [PATCH v4 08/25] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag Alistair Popple
2024-12-17  5:12 ` [PATCH v4 09/25] mm/gup.c: Remove redundant check for PCI P2PDMA page Alistair Popple
2024-12-17 22:06   ` David Hildenbrand
2024-12-17  5:12 ` [PATCH v4 10/25] mm/mm_init: Move p2pdma page refcount initialisation to p2pdma Alistair Popple
2024-12-17 22:14   ` David Hildenbrand
2024-12-18 22:49     ` Alistair Popple
2024-12-20 18:29       ` David Hildenbrand
2024-12-17  5:12 ` [PATCH v4 11/25] mm: Allow compound zone device pages Alistair Popple
2024-12-17  5:12 ` [PATCH v4 12/25] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings Alistair Popple
2024-12-20 19:01   ` David Hildenbrand
2024-12-20 19:06     ` David Hildenbrand
2025-01-06  2:07       ` Alistair Popple
2025-01-07 11:29         ` David Hildenbrand
2024-12-17  5:12 ` [PATCH v4 13/25] mm/memory: Add vmf_insert_page_mkwrite() Alistair Popple
2024-12-17  5:12 ` [PATCH v4 14/25] rmap: Add support for PUD sized mappings to rmap Alistair Popple
2024-12-17 22:27   ` David Hildenbrand
2024-12-18 22:55     ` Alistair Popple
2024-12-20 18:31       ` David Hildenbrand
2024-12-17  5:12 ` [PATCH v4 15/25] huge_memory: Add vmf_insert_folio_pud() Alistair Popple
2024-12-20 18:52   ` David Hildenbrand
2025-01-06  6:39     ` Alistair Popple
2024-12-17  5:12 ` [PATCH v4 16/25] huge_memory: Add vmf_insert_folio_pmd() Alistair Popple
2024-12-20 18:54   ` David Hildenbrand
2024-12-17  5:13 ` [PATCH v4 17/25] memremap: Add is_device_dax_page() and is_fsdax_page() helpers Alistair Popple
2024-12-20 18:39   ` David Hildenbrand
2024-12-17  5:13 ` [PATCH v4 18/25] gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages Alistair Popple
2024-12-17 22:33   ` David Hildenbrand
2024-12-17  5:13 ` [PATCH v4 19/25] proc/task_mmu: Ignore ZONE_DEVICE pages Alistair Popple
2024-12-17 22:31   ` David Hildenbrand
2024-12-18 23:11     ` Alistair Popple
2024-12-20 18:32       ` David Hildenbrand
2025-01-06  6:43         ` Alistair Popple
2024-12-17  5:13 ` [PATCH v4 20/25] mm/mlock: Skip ZONE_DEVICE PMDs during mlock Alistair Popple
2024-12-17 22:28   ` David Hildenbrand
2024-12-17  5:13 ` [PATCH v4 21/25] fs/dax: Properly refcount fs dax pages Alistair Popple
2024-12-17  5:13 ` [PATCH v4 22/25] device/dax: Properly refcount device dax pages when mapping Alistair Popple
2024-12-17  5:13 ` [PATCH v4 23/25] mm: Remove pXX_devmap callers Alistair Popple
2024-12-17  5:13 ` [PATCH v4 24/25] mm: Remove devmap related functions and page table bits Alistair Popple
2024-12-17  5:13 ` [PATCH v4 25/25] Revert "riscv: mm: Add support for ZONE_DEVICE" Alistair Popple
