public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Alistair Popple <apopple@nvidia.com>, <dan.j.williams@intel.com>,
	<linux-mm@kvack.org>
Cc: <lina@asahilina.net>, <zhang.lyra@gmail.com>,
	<gerald.schaefer@linux.ibm.com>, <vishal.l.verma@intel.com>,
	<dave.jiang@intel.com>, <logang@deltatee.com>,
	<bhelgaas@google.com>, <jack@suse.cz>, <jgg@ziepe.ca>,
	<catalin.marinas@arm.com>, <will@kernel.org>,
	<mpe@ellerman.id.au>, <npiggin@gmail.com>,
	<dave.hansen@linux.intel.com>, <ira.weiny@intel.com>,
	<willy@infradead.org>, <djwong@kernel.org>, <tytso@mit.edu>,
	<linmiaohe@huawei.com>, <david@redhat.com>, <peterx@redhat.com>,
	<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linuxppc-dev@lists.ozlabs.org>, <nvdimm@lists.linux.dev>,
	<linux-cxl@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<linux-ext4@vger.kernel.org>, <linux-xfs@vger.kernel.org>,
	<hch@lst.de>, <david@fromorbit.com>
Subject: Re: [PATCH v3 05/25] fs/dax: Create a common implementation to break DAX layouts
Date: Thu, 21 Nov 2024 18:57:16 -0800	[thread overview]
Message-ID: <61ff7fe6-9a79-4dd3-8076-7106fe08be5c@nvidia.com> (raw)
In-Reply-To: <31b18e4f813bf9e661d5d98d928c79556c8c2c57.1732239628.git-series.apopple@nvidia.com>

On 11/21/24 5:40 PM, Alistair Popple wrote:
> Prior to freeing a block file systems supporting FS DAX must check
> that the associated pages are both unmapped from user-space and not
> undergoing DMA or other access from eg. get_user_pages(). This is
> achieved by unmapping the file range and scanning the FS DAX
> page-cache to see if any pages within the mapping have an elevated
> refcount.
> 
> This is done using two functions - dax_layout_busy_page_range() which
> returns a page to wait for the refcount to become idle on. Rather than
> open-code this introduce a common implementation to both unmap and
> wait for the page to become idle.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> ---
>   fs/dax.c            | 29 +++++++++++++++++++++++++++++
>   fs/ext4/inode.c     | 10 +---------
>   fs/fuse/dax.c       | 29 +++++------------------------
>   fs/xfs/xfs_inode.c  | 23 +++++------------------
>   fs/xfs/xfs_inode.h  |  2 +-
>   include/linux/dax.h |  7 +++++++
>   6 files changed, 48 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index efc1d56..b1ad813 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -845,6 +845,35 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
>   	return ret;
>   }
>   
> +static int wait_page_idle(struct page *page,
> +			void (cb)(struct inode *),
> +			struct inode *inode)
> +{
> +	return ___wait_var_event(page, page_ref_count(page) == 1,
> +				TASK_INTERRUPTIBLE, 0, 0, cb(inode));
> +}
> +
> +/*
> + * Unmaps the inode and waits for any DMA to complete prior to deleting the
> + * DAX mapping entries for the range.
> + */
> +int dax_break_mapping(struct inode *inode, loff_t start, loff_t end,
> +		void (cb)(struct inode *))
> +{
> +	struct page *page;
> +	int error;
> +
> +	do {
> +		page = dax_layout_busy_page_range(inode->i_mapping, start, end);
> +		if (!page)
> +			break;
> +
> +		error = wait_page_idle(page, cb, inode);
> +	} while (error == 0);
> +
> +	return error;
> +}

Hi Alistair!

This needs to be EXPORT'ed. In fact so do two others, but I thought I'd
reply at the exact point that the first fix needs to be inserted, which
is here. And let you sprinkle the remaining two into the right patches.

The overall diff required for me to build the kernel with this series is:

diff --git a/fs/dax.c b/fs/dax.c
index 0169b356b975..35e3c41eb517 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -921,6 +921,7 @@ void dax_delete_mapping_range(struct address_space *mapping,
  	}
  	xas_unlock_irq(&xas);
  }
+EXPORT_SYMBOL_GPL(dax_delete_mapping_range);
  
  static int wait_page_idle(struct page *page,
  			void (cb)(struct inode *),
@@ -961,6 +962,7 @@ int dax_break_mapping(struct inode *inode, loff_t start, loff_t end,
  
  	return error;
  }
+EXPORT_SYMBOL_GPL(dax_break_mapping);
  
  void dax_break_mapping_uninterruptible(struct inode *inode,
  				void (cb)(struct inode *))
@@ -979,6 +981,7 @@ void dax_break_mapping_uninterruptible(struct inode *inode,
  	if (!page)
  		dax_delete_mapping_range(inode->i_mapping, 0, LLONG_MAX);
  }
+EXPORT_SYMBOL_GPL(dax_break_mapping_uninterruptible);
  
  /*
   * Invalidate DAX entry if it is clean.


thanks,
John Hubbard


> +
>   /*
>    * Invalidate DAX entry if it is clean.
>    */
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cf87c5b..d42c011 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3885,15 +3885,7 @@ int ext4_break_layouts(struct inode *inode)
>   	if (WARN_ON_ONCE(!rwsem_is_locked(&inode->i_mapping->invalidate_lock)))
>   		return -EINVAL;
>   
> -	do {
> -		page = dax_layout_busy_page(inode->i_mapping);
> -		if (!page)
> -			return 0;
> -
> -		error = dax_wait_page_idle(page, ext4_wait_dax_page, inode);
> -	} while (error == 0);
> -
> -	return error;
> +	return dax_break_mapping_inode(inode, ext4_wait_dax_page);
>   }
>   
>   /*
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index af436b5..2493f9c 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -665,38 +665,19 @@ static void fuse_wait_dax_page(struct inode *inode)
>   	filemap_invalidate_lock(inode->i_mapping);
>   }
>   
> -/* Should be called with mapping->invalidate_lock held exclusively */
> -static int __fuse_dax_break_layouts(struct inode *inode, bool *retry,
> -				    loff_t start, loff_t end)
> -{
> -	struct page *page;
> -
> -	page = dax_layout_busy_page_range(inode->i_mapping, start, end);
> -	if (!page)
> -		return 0;
> -
> -	*retry = true;
> -	return dax_wait_page_idle(page, fuse_wait_dax_page, inode);
> -}
> -
> -/* dmap_end == 0 leads to unmapping of whole file */
> +/* Should be called with mapping->invalidate_lock held exclusively.
> + * dmap_end == 0 leads to unmapping of whole file.
> + */
>   int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start,
>   				  u64 dmap_end)
>   {
> -	bool	retry;
> -	int	ret;
> -
> -	do {
> -		retry = false;
> -		ret = __fuse_dax_break_layouts(inode, &retry, dmap_start,
> -					       dmap_end);
> -	} while (ret == 0 && retry);
>   	if (!dmap_end) {
>   		dmap_start = 0;
>   		dmap_end = LLONG_MAX;
>   	}
>   
> -	return ret;
> +	return dax_break_mapping(inode, dmap_start, dmap_end,
> +				fuse_wait_dax_page);
>   }
>   
>   ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index eb12123..120597a 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2704,21 +2704,17 @@ xfs_mmaplock_two_inodes_and_break_dax_layout(
>   	struct xfs_inode	*ip2)
>   {
>   	int			error;
> -	bool			retry;
>   	struct page		*page;
>   
>   	if (ip1->i_ino > ip2->i_ino)
>   		swap(ip1, ip2);
>   
>   again:
> -	retry = false;
>   	/* Lock the first inode */
>   	xfs_ilock(ip1, XFS_MMAPLOCK_EXCL);
> -	error = xfs_break_dax_layouts(VFS_I(ip1), &retry);
> -	if (error || retry) {
> +	error = xfs_break_dax_layouts(VFS_I(ip1));
> +	if (error) {
>   		xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL);
> -		if (error == 0 && retry)
> -			goto again;
>   		return error;
>   	}
>   
> @@ -2977,19 +2973,11 @@ xfs_wait_dax_page(
>   
>   int
>   xfs_break_dax_layouts(
> -	struct inode		*inode,
> -	bool			*retry)
> +	struct inode		*inode)
>   {
> -	struct page		*page;
> -
>   	xfs_assert_ilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL);
>   
> -	page = dax_layout_busy_page(inode->i_mapping);
> -	if (!page)
> -		return 0;
> -
> -	*retry = true;
> -	return dax_wait_page_idle(page, xfs_wait_dax_page, inode);
> +	return dax_break_mapping_inode(inode, xfs_wait_dax_page);
>   }
>   
>   int
> @@ -3007,8 +2995,7 @@ xfs_break_layouts(
>   		retry = false;
>   		switch (reason) {
>   		case BREAK_UNMAP:
> -			error = xfs_break_dax_layouts(inode, &retry);
> -			if (error || retry)
> +			if (xfs_break_dax_layouts(inode))
>   				break;
>   			fallthrough;
>   		case BREAK_WRITE:
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 97ed912..0db27ba 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -564,7 +564,7 @@ xfs_itruncate_extents(
>   	return xfs_itruncate_extents_flags(tpp, ip, whichfork, new_size, 0);
>   }
>   
> -int	xfs_break_dax_layouts(struct inode *inode, bool *retry);
> +int	xfs_break_dax_layouts(struct inode *inode);
>   int	xfs_break_layouts(struct inode *inode, uint *iolock,
>   		enum layout_break_reason reason);
>   
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 773dfc4..7419c88 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -257,6 +257,13 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
>   int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
>   int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
>   				      pgoff_t index);
> +int __must_check dax_break_mapping(struct inode *inode, loff_t start,
> +				loff_t end, void (cb)(struct inode *));
> +static inline int __must_check dax_break_mapping_inode(struct inode *inode,
> +						void (cb)(struct inode *))
> +{
> +	return dax_break_mapping(inode, 0, LLONG_MAX, cb);
> +}
>   int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>   				  struct inode *dest, loff_t destoff,
>   				  loff_t len, bool *is_same,




  reply	other threads:[~2024-11-22  2:57 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-22  1:40 [PATCH v3 00/25] fs/dax: Fix ZONE_DEVICE page reference counts Alistair Popple
2024-11-22  1:40 ` [PATCH v3 01/25] fuse: Fix dax truncate/punch_hole fault path Alistair Popple
2024-11-22  1:40 ` [PATCH v3 02/25] fs/dax: Return unmapped busy pages from dax_layout_busy_page_range() Alistair Popple
2024-11-22  1:40 ` [PATCH v3 03/25] fs/dax: Don't skip locked entries when scanning entries Alistair Popple
2024-11-22  1:40 ` [PATCH v3 04/25] fs/dax: Refactor wait for dax idle page Alistair Popple
2024-11-22  1:40 ` [PATCH v3 05/25] fs/dax: Create a common implementation to break DAX layouts Alistair Popple
2024-11-22  2:57   ` John Hubbard [this message]
2024-11-22  3:37     ` Alistair Popple
2024-11-25 13:27   ` kernel test robot
2024-11-22  1:40 ` [PATCH v3 06/25] fs/dax: Always remove DAX page-cache entries when breaking layouts Alistair Popple
2024-11-22  1:40 ` [PATCH v3 07/25] fs/dax: Ensure all pages are idle prior to filesystem unmount Alistair Popple
2024-11-22  1:40 ` [PATCH v3 08/25] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag Alistair Popple
2024-11-22  1:40 ` [PATCH v3 09/25] mm/gup.c: Remove redundant check for PCI P2PDMA page Alistair Popple
2024-11-22  1:40 ` [PATCH v3 10/25] pci/p2pdma: Don't initialise page refcount to one Alistair Popple
2024-11-22 18:20   ` Bjorn Helgaas
2024-11-24 22:39     ` Alistair Popple
2024-11-22  1:40 ` [PATCH v3 11/25] mm: Allow compound zone device pages Alistair Popple
2024-11-25  5:05   ` kernel test robot
2024-11-25 15:51   ` kernel test robot
2024-11-22  1:40 ` [PATCH v3 12/25] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings Alistair Popple
2024-11-22  1:40 ` [PATCH v3 13/25] mm/memory: Add vmf_insert_page_mkwrite() Alistair Popple
2024-11-22  1:40 ` [PATCH v3 14/25] huge_memory: Allow mappings of PUD sized pages Alistair Popple
2024-12-14 15:32   ` David Hildenbrand
2024-11-22  1:40 ` [PATCH v3 15/25] huge_memory: Allow mappings of PMD " Alistair Popple
2024-11-22  1:40 ` [PATCH v3 16/25] memremap: Add is_device_dax_page() and is_fsdax_page() helpers Alistair Popple
2024-11-22  1:40 ` [PATCH v3 17/25] gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages Alistair Popple
2024-11-22  3:23   ` John Hubbard
2024-11-22  1:40 ` [PATCH v3 18/25] proc/task_mmu: Ignore ZONE_DEVICE pages Alistair Popple
2024-11-22  1:40 ` [PATCH v3 19/25] memcontrol-v1: " Alistair Popple
2024-11-22  1:40 ` [PATCH v3 20/25] mm/mlock: Skip ZONE_DEVICE PMDs during mlock Alistair Popple
2024-11-22  1:40 ` [PATCH v3 21/25] fs/dax: Properly refcount fs dax pages Alistair Popple
2024-11-22  1:40 ` [PATCH v3 22/25] device/dax: Properly refcount device dax pages when mapping Alistair Popple
2024-11-22  1:40 ` [PATCH v3 23/25] mm: Remove pXX_devmap callers Alistair Popple
2024-11-22  1:40 ` [PATCH v3 24/25] mm: Remove devmap related functions and page table bits Alistair Popple
2024-11-22  1:40 ` [PATCH v3 25/25] Revert "riscv: mm: Add support for ZONE_DEVICE" Alistair Popple
2024-11-25 13:13   ` Björn Töpel
2024-12-14  1:39 ` [PATCH v3 00/25] fs/dax: Fix ZONE_DEVICE page reference counts Dan Williams
2024-12-14 15:22   ` David Hildenbrand
2024-12-16  0:55     ` Alistair Popple
2024-12-16  6:26       ` Andrew Morton
2024-12-17  5:20         ` Alistair Popple

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61ff7fe6-9a79-4dd3-8076-7106fe08be5c@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=lina@asahilina.net \
    --cc=linmiaohe@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=peterx@redhat.com \
    --cc=tytso@mit.edu \
    --cc=vishal.l.verma@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox