All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: David Howells <dhowells@redhat.com>, Steve French <smfrench@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Marc Dionne <marc.dionne@auristor.com>,
	 Paulo Alcantara <pc@manguebit.com>,
	Shyam Prasad N <sprasad@microsoft.com>,
	Tom Talpey <tom@talpey.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Christian Brauner <christian@brauner.io>,
	linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
	 linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	 ceph-devel@vger.kernel.org, v9fs@lists.linux.dev,
	linux-fsdevel@vger.kernel.org,  linux-mm@kvack.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 11/39] netfs: Implement unbuffered/DIO vs buffered I/O locking
Date: Wed, 13 Dec 2023 11:08:23 -0500	[thread overview]
Message-ID: <debede02dbf2701043c2a2b8b9fc05665bc59030.camel@kernel.org> (raw)
In-Reply-To: <20231213152350.431591-12-dhowells@redhat.com>

On Wed, 2023-12-13 at 15:23 +0000, David Howells wrote:
> Borrow NFS's direct-vs-buffered I/O locking into netfslib.  Similar code is
> also used in ceph.
> 
> Modify it to have the correct checker annotations for i_rwsem lock
> acquisition/release and to return -ERESTARTSYS if waits are interrupted.
> 

This is just adding new infrastructure. It'd be nice to go ahead and
convert a filesystem to use this at the same time. Ceph would be a good
candidate. Otherwise, I'm not sure how this shakes out as far as
cleanliness in the callers.


> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-cachefs@redhat.com
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>  fs/netfs/Makefile     |   1 +
>  fs/netfs/locking.c    | 215 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/netfs.h |  10 ++
>  3 files changed, 226 insertions(+)
>  create mode 100644 fs/netfs/locking.c
> 
> diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
> index a84fe9bbd3c4..cf3fc847b8ac 100644
> --- a/fs/netfs/Makefile
> +++ b/fs/netfs/Makefile
> @@ -4,6 +4,7 @@ netfs-y := \
>  	buffered_read.o \
>  	io.o \
>  	iterator.o \
> +	locking.o \
>  	main.o \
>  	misc.o \
>  	objects.o
> diff --git a/fs/netfs/locking.c b/fs/netfs/locking.c
> new file mode 100644
> index 000000000000..58e0f48394c5
> --- /dev/null
> +++ b/fs/netfs/locking.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O and data path helper functionality.
> + *
> + * Borrowed from NFS Copyright (c) 2016 Trond Myklebust
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/netfs.h>
> +
> +/*
> + * inode_dio_wait_interruptible - wait for outstanding DIO requests to finish
> + * @inode: inode to wait for
> + *
> + * Waits for all pending direct I/O requests to finish so that we can
> + * proceed with a truncate or equivalent operation.
> + *
> + * Must be called under a lock that serializes taking new references
> + * to i_dio_count, usually by inode->i_mutex.
> + */
> +static int inode_dio_wait_interruptible(struct inode *inode)
> +{
> +	if (!atomic_read(&inode->i_dio_count))
> +		return 0;
> +
> +	wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
> +	DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
> +
> +	for (;;) {
> +		prepare_to_wait(wq, &q.wq_entry, TASK_INTERRUPTIBLE);
> +		if (!atomic_read(&inode->i_dio_count))
> +			break;
> +		if (signal_pending(current))
> +			break;
> +		schedule();
> +	}
> +	finish_wait(wq, &q.wq_entry);
> +
> +	return atomic_read(&inode->i_dio_count) ? -ERESTARTSYS : 0;
> +}
> +
> +/* Call with exclusively locked inode->i_rwsem */
> +static int netfs_block_o_direct(struct netfs_inode *ictx)
> +{
> +	if (!test_bit(NETFS_ICTX_ODIRECT, &ictx->flags))
> +		return 0;
> +	clear_bit(NETFS_ICTX_ODIRECT, &ictx->flags);
> +	return inode_dio_wait_interruptible(&ictx->inode);
> +}
> +
> +/**
> + * netfs_start_io_read - declare the file is being used for buffered reads
> + * @inode: file inode
> + *
> + * Declare that a buffered read operation is about to start, and ensure
> + * that we block all direct I/O.
> + * On exit, the function ensures that the NETFS_ICTX_ODIRECT flag is unset,
> + * and holds a shared lock on inode->i_rwsem to ensure that the flag
> + * cannot be changed.
> + * In practice, this means that buffered read operations are allowed to
> + * execute in parallel, thanks to the shared lock, whereas direct I/O
> + * operations need to wait to grab an exclusive lock in order to set
> + * NETFS_ICTX_ODIRECT.
> + * Note that buffered writes and truncates both take a write lock on
> + * inode->i_rwsem, meaning that those are serialised w.r.t. the reads.
> + */
> +int netfs_start_io_read(struct inode *inode)
> +	__acquires(inode->i_rwsem)
> +{
> +	struct netfs_inode *ictx = netfs_inode(inode);
> +
> +	/* Be an optimist! */
> +	if (down_read_interruptible(&inode->i_rwsem) < 0)
> +		return -ERESTARTSYS;
> +	if (test_bit(NETFS_ICTX_ODIRECT, &ictx->flags) == 0)
> +		return 0;
> +	up_read(&inode->i_rwsem);
> +
> +	/* Slow path.... */
> +	if (down_write_killable(&inode->i_rwsem) < 0)
> +		return -ERESTARTSYS;
> +	if (netfs_block_o_direct(ictx) < 0) {
> +		up_write(&inode->i_rwsem);
> +		return -ERESTARTSYS;
> +	}
> +	downgrade_write(&inode->i_rwsem);
> +	return 0;
> +}
> +EXPORT_SYMBOL(netfs_start_io_read);
> +
> +/**
> + * netfs_end_io_read - declare that the buffered read operation is done
> + * @inode: file inode
> + *
> + * Declare that a buffered read operation is done, and release the shared
> + * lock on inode->i_rwsem.
> + */
> +void netfs_end_io_read(struct inode *inode)
> +	__releases(inode->i_rwsem)
> +{
> +	up_read(&inode->i_rwsem);
> +}
> +EXPORT_SYMBOL(netfs_end_io_read);
> +
> +/**
> + * netfs_start_io_write - declare the file is being used for buffered writes
> + * @inode: file inode
> + *
> + * Declare that a buffered read operation is about to start, and ensure
> + * that we block all direct I/O.
> + */
> +int netfs_start_io_write(struct inode *inode)
> +	__acquires(inode->i_rwsem)
> +{
> +	struct netfs_inode *ictx = netfs_inode(inode);
> +
> +	if (down_write_killable(&inode->i_rwsem) < 0)
> +		return -ERESTARTSYS;
> +	if (netfs_block_o_direct(ictx) < 0) {
> +		up_write(&inode->i_rwsem);
> +		return -ERESTARTSYS;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL(netfs_start_io_write);
> +
> +/**
> + * netfs_end_io_write - declare that the buffered write operation is done
> + * @inode: file inode
> + *
> + * Declare that a buffered write operation is done, and release the
> + * lock on inode->i_rwsem.
> + */
> +void netfs_end_io_write(struct inode *inode)
> +	__releases(inode->i_rwsem)
> +{
> +	up_write(&inode->i_rwsem);
> +}
> +EXPORT_SYMBOL(netfs_end_io_write);
> +
> +/* Call with exclusively locked inode->i_rwsem */
> +static int netfs_block_buffered(struct inode *inode)
> +{
> +	struct netfs_inode *ictx = netfs_inode(inode);
> +	int ret;
> +
> +	if (!test_bit(NETFS_ICTX_ODIRECT, &ictx->flags)) {
> +		set_bit(NETFS_ICTX_ODIRECT, &ictx->flags);
> +		if (inode->i_mapping->nrpages != 0) {
> +			unmap_mapping_range(inode->i_mapping, 0, 0, 0);
> +			ret = filemap_fdatawait(inode->i_mapping);
> +			if (ret < 0) {
> +				clear_bit(NETFS_ICTX_ODIRECT, &ictx->flags);
> +				return ret;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
> + * netfs_start_io_direct - declare the file is being used for direct i/o
> + * @inode: file inode
> + *
> + * Declare that a direct I/O operation is about to start, and ensure
> + * that we block all buffered I/O.
> + * On exit, the function ensures that the NETFS_ICTX_ODIRECT flag is set,
> + * and holds a shared lock on inode->i_rwsem to ensure that the flag
> + * cannot be changed.
> + * In practice, this means that direct I/O operations are allowed to
> + * execute in parallel, thanks to the shared lock, whereas buffered I/O
> + * operations need to wait to grab an exclusive lock in order to clear
> + * NETFS_ICTX_ODIRECT.
> + * Note that buffered writes and truncates both take a write lock on
> + * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT.
> + */
> +int netfs_start_io_direct(struct inode *inode)
> +	__acquires(inode->i_rwsem)
> +{
> +	struct netfs_inode *ictx = netfs_inode(inode);
> +	int ret;
> +
> +	/* Be an optimist! */
> +	if (down_read_interruptible(&inode->i_rwsem) < 0)
> +		return -ERESTARTSYS;
> +	if (test_bit(NETFS_ICTX_ODIRECT, &ictx->flags) != 0)
> +		return 0;
> +	up_read(&inode->i_rwsem);
> +
> +	/* Slow path.... */
> +	if (down_write_killable(&inode->i_rwsem) < 0)
> +		return -ERESTARTSYS;
> +	ret = netfs_block_buffered(inode);
> +	if (ret < 0) {
> +		up_write(&inode->i_rwsem);
> +		return ret;
> +	}
> +	downgrade_write(&inode->i_rwsem);
> +	return 0;
> +}
> +EXPORT_SYMBOL(netfs_start_io_direct);
> +
> +/**
> + * netfs_end_io_direct - declare that the direct i/o operation is done
> + * @inode: file inode
> + *
> + * Declare that a direct I/O operation is done, and release the shared
> + * lock on inode->i_rwsem.
> + */
> +void netfs_end_io_direct(struct inode *inode)
> +	__releases(inode->i_rwsem)
> +{
> +	up_read(&inode->i_rwsem);
> +}
> +EXPORT_SYMBOL(netfs_end_io_direct);
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 8efbfd3b2820..fc6d9756a029 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -129,6 +129,8 @@ struct netfs_inode {
>  	struct fscache_cookie	*cache;
>  #endif
>  	loff_t			remote_i_size;	/* Size of the remote file */
> +	unsigned long		flags;
> +#define NETFS_ICTX_ODIRECT	0		/* The file has DIO in progress */
>  };
>  
>  /*
> @@ -310,6 +312,13 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
>  				struct iov_iter *new,
>  				iov_iter_extraction_t extraction_flags);
>  
> +int netfs_start_io_read(struct inode *inode);
> +void netfs_end_io_read(struct inode *inode);
> +int netfs_start_io_write(struct inode *inode);
> +void netfs_end_io_write(struct inode *inode);
> +int netfs_start_io_direct(struct inode *inode);
> +void netfs_end_io_direct(struct inode *inode);
> +
>  /**
>   * netfs_inode - Get the netfs inode context from the inode
>   * @inode: The inode to query
> @@ -335,6 +344,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
>  {
>  	ctx->ops = ops;
>  	ctx->remote_i_size = i_size_read(&ctx->inode);
> +	ctx->flags = 0;
>  #if IS_ENABLED(CONFIG_FSCACHE)
>  	ctx->cache = NULL;
>  #endif
> 
-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2023-12-13 16:08 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-13 15:23 [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib David Howells
2023-12-13 15:23 ` [PATCH v4 01/39] netfs, fscache: Move fs/fscache/* into fs/netfs/ David Howells
2023-12-13 15:23 ` [PATCH v4 02/39] netfs, fscache: Combine fscache with netfs David Howells
2023-12-13 15:23 ` [PATCH v4 03/39] netfs, fscache: Remove ->begin_cache_operation David Howells
2023-12-13 15:23 ` [PATCH v4 04/39] netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink David Howells
2023-12-13 15:23 ` [PATCH v4 05/39] netfs: Move pinning-for-writeback from fscache to netfs David Howells
2023-12-13 15:23 ` [PATCH v4 06/39] netfs: Add a procfile to list in-progress requests David Howells
2023-12-13 15:59   ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 07/39] netfs: Allow the netfs to make the io (sub)request alloc larger David Howells
2023-12-13 15:23 ` [PATCH v4 08/39] netfs: Add a ->free_subrequest() op David Howells
2023-12-13 15:23 ` [PATCH v4 09/39] afs: Don't use folio->private to record partial modification David Howells
2023-12-13 15:23 ` [PATCH v4 10/39] netfs: Provide invalidate_folio and release_folio calls David Howells
2023-12-13 16:05   ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 11/39] netfs: Implement unbuffered/DIO vs buffered I/O locking David Howells
2023-12-13 16:08   ` Jeff Layton [this message]
2023-12-13 16:30     ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 12/39] netfs: Add iov_iters to (sub)requests to describe various buffers David Howells
2023-12-13 16:37   ` Jeff Layton
2023-12-19 14:31     ` David Howells
2023-12-19 14:40       ` David Howells
2023-12-13 15:23 ` [PATCH v4 13/39] netfs: Add support for DIO buffering David Howells
2023-12-13 15:23 ` [PATCH v4 14/39] netfs: Provide tools to create a buffer in an xarray David Howells
2023-12-13 15:23 ` [PATCH v4 15/39] netfs: Add bounce buffering support David Howells
2023-12-13 15:23 ` [PATCH v4 16/39] netfs: Add func to calculate pagecount/size-limited span of an iterator David Howells
2023-12-13 15:23 ` [PATCH v4 17/39] netfs: Limit subrequest by size or number of segments David Howells
2023-12-13 15:23 ` [PATCH v4 18/39] netfs: Export netfs_put_subrequest() and some tracepoints David Howells
2023-12-13 18:01   ` Jeff Layton
2023-12-19 14:42     ` David Howells
2023-12-19 14:48       ` David Howells
2023-12-13 15:23 ` [PATCH v4 19/39] netfs: Extend the netfs_io_*request structs to handle writes David Howells
2023-12-13 15:23 ` [PATCH v4 20/39] netfs: Add a hook to allow tell the netfs to update its i_size David Howells
2023-12-13 15:23 ` [PATCH v4 21/39] netfs: Make netfs_put_request() handle a NULL pointer David Howells
2023-12-13 15:23 ` [PATCH v4 22/39] netfs: Make the refcounting of netfs_begin_read() easier to use David Howells
2023-12-13 15:23 ` [PATCH v4 23/39] netfs: Prep to use folio->private for write grouping and streaming write David Howells
2023-12-13 15:23 ` [PATCH v4 24/39] netfs: Dispatch write requests to process a writeback slice David Howells
2023-12-13 15:23 ` [PATCH v4 25/39] netfs: Provide func to copy data to pagecache for buffered write David Howells
2023-12-13 15:23 ` [PATCH v4 26/39] netfs: Make netfs_read_folio() handle streaming-write pages David Howells
2023-12-13 15:23 ` [PATCH v4 27/39] netfs: Allocate multipage folios in the writepath David Howells
2023-12-13 15:23 ` [PATCH v4 28/39] netfs: Implement support for unbuffered/DIO read David Howells
2023-12-14 12:43   ` Jeff Layton
2023-12-19 15:46     ` David Howells
2023-12-13 15:23 ` [PATCH v4 29/39] netfs: Implement unbuffered/DIO write support David Howells
2023-12-13 15:23 ` [PATCH v4 30/39] netfs: Implement buffered write API David Howells
2023-12-13 15:23 ` [PATCH v4 31/39] netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite() David Howells
2023-12-13 15:23 ` [PATCH v4 32/39] netfs: Provide netfs_file_read_iter() David Howells
2023-12-13 15:23 ` [PATCH v4 33/39] netfs, cachefiles: Pass upper bound length to allow expansion David Howells
2023-12-13 15:23 ` [PATCH v4 34/39] netfs: Provide a writepages implementation David Howells
2023-12-13 15:23 ` [PATCH v4 35/39] netfs: Provide a launder_folio implementation David Howells
2023-12-13 15:23 ` [PATCH v4 36/39] netfs: Implement a write-through caching option David Howells
2023-12-14 13:49   ` Jeff Layton
2023-12-19 16:51     ` David Howells
2023-12-19 17:19       ` Jeff Layton
2023-12-13 15:23 ` [PATCH v4 37/39] netfs: Optimise away reads above the point at which there can be no data David Howells
2023-12-14 14:07   ` Jeff Layton
2023-12-19 16:56     ` David Howells
2023-12-13 15:23 ` [PATCH v4 38/39] afs: Use the netfs write helpers David Howells
2023-12-13 15:23 ` [PATCH v4 39/39] 9p: Use netfslib read/write_iter David Howells
2023-12-13 15:39   ` Christian Schoenebeck
2023-12-14 14:11 ` [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib Jeff Layton
2023-12-15 12:03 ` Christian Brauner
2023-12-15 13:29   ` Dominique Martinet
2023-12-18 11:05     ` Christian Brauner
2023-12-20 10:04     ` David Howells
2023-12-20 13:26       ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=debede02dbf2701043c2a2b8b9fc05665bc59030.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=asmadeus@codewreck.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=idryomov@gmail.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    --cc=netdev@vger.kernel.org \
    --cc=pc@manguebit.com \
    --cc=smfrench@gmail.com \
    --cc=sprasad@microsoft.com \
    --cc=tom@talpey.com \
    --cc=v9fs@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.