From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: Alex Markuze <amarkuze@redhat.com>,
"slava@dubeyko.com" <slava@dubeyko.com>,
David Howells <dhowells@redhat.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"idryomov@gmail.com" <idryomov@gmail.com>,
"jlayton@kernel.org" <jlayton@kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"dongsheng.yang@easystack.cn" <dongsheng.yang@easystack.cn>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 32/35] netfs: Add some more RMW support for ceph
Date: Wed, 19 Mar 2025 19:14:24 +0000
Message-ID: <b31f451e2949e7c07535accda067178238f7e1bb.camel@ibm.com>
In-Reply-To: <20250313233341.1675324-33-dhowells@redhat.com>
On Thu, 2025-03-13 at 23:33 +0000, David Howells wrote:
> Add some support for RMW in ceph:
>
> (1) Add netfs_unbuffered_read_from_inode() to allow reading from an inode
> without having a file pointer so that truncate can modify a
> now-partial tail block of a content-encrypted file.
>
> This takes an additional argument to cause it to fail or give a short
> read if a hole is encountered. This is noted on the request with
> NETFS_RREQ_NO_READ_HOLE for the filesystem to pick up.
>
> (2) Set NETFS_RREQ_RMW when doing an RMW as part of a request.
>
> (3) Provide a ->rmw_read_done() op for netfslib to tell the filesystem
> that it has completed the read required for RMW.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: Viacheslav Dubeyko <slava@dubeyko.com>
> cc: Alex Markuze <amarkuze@redhat.com>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: ceph-devel@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/netfs/direct_read.c | 75 ++++++++++++++++++++++++++++++++++++
> fs/netfs/direct_write.c | 1 +
> fs/netfs/main.c | 1 +
> fs/netfs/objects.c | 1 +
> fs/netfs/read_collect.c | 2 +
> fs/netfs/write_retry.c | 3 ++
> include/linux/netfs.h | 7 ++++
> include/trace/events/netfs.h | 3 ++
> 8 files changed, 93 insertions(+)
>
> diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
> index 5e4bd1e5a378..4061f934dfe6 100644
> --- a/fs/netfs/direct_read.c
> +++ b/fs/netfs/direct_read.c
> @@ -373,3 +373,78 @@ ssize_t netfs_unbuffered_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> return ret;
> }
> EXPORT_SYMBOL(netfs_unbuffered_read_iter);
> +
> +/**
> + * netfs_unbuffered_read_from_inode - Perform an unbuffered sync I/O read
> + * @inode: The inode being accessed
> + * @pos: The file position to read from
> + * @iter: The output buffer (also specifies read length)
> + * @nohole: True to return short/ENODATA if hole encountered
> + *
> + * Perform a synchronous unbuffered I/O from the inode to the output buffer.
> + * No use is made of the pagecache. The output buffer must be suitably aligned
> + * if content encryption is to be used. If @nohole is true then the read will
> + * stop short if a hole is encountered and return -ENODATA if the read begins
> + * with a hole.
> + *
> + * The caller must hold any appropriate locks.
> + */
> +ssize_t netfs_unbuffered_read_from_inode(struct inode *inode, loff_t pos,
> + struct iov_iter *iter, bool nohole)
> +{
> + struct netfs_io_request *rreq;
> + ssize_t ret;
> + size_t orig_count = iov_iter_count(iter);
> +
> + _enter("");
> +
> + if (WARN_ON(user_backed_iter(iter)))
> + return -EIO;
> +
> + if (!orig_count)
> + return 0; /* Don't update atime */
> +
> + ret = filemap_write_and_wait_range(inode->i_mapping, pos, orig_count);
> + if (ret < 0)
> + return ret;
> + inode_update_time(inode, S_ATIME);
> +
> + rreq = netfs_alloc_request(inode->i_mapping, NULL, pos, orig_count,
> + NULL, NETFS_UNBUFFERED_READ);
> + if (IS_ERR(rreq))
> + return PTR_ERR(rreq);
> +
> + ret = -EIO;
> + if (test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &rreq->flags) &&
> + WARN_ON(!netfs_is_crypto_aligned(rreq, iter)))
> + goto out;
> +
> + netfs_stat(&netfs_n_rh_dio_read);
> + trace_netfs_read(rreq, rreq->start, rreq->len,
> + netfs_read_trace_unbuffered_read_from_inode);
> +
> + rreq->buffer.iter = *iter;
The struct iov_iter is a fairly complex structure, and here it is assigned by
value to rreq->buffer.iter. Any state changes made through the copy will
therefore not be reflected in the caller's original iterator. Is that the
desired behavior here?
Thanks,
Slava.
> + rreq->len = orig_count;
> + rreq->direct_bv_unpin = false;
> + iov_iter_advance(iter, orig_count);
> +
> + if (nohole)
> + __set_bit(NETFS_RREQ_NO_READ_HOLE, &rreq->flags);
> +
> + /* We're going to do the crypto in place in the destination buffer. */
> + if (test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &rreq->flags))
> + __set_bit(NETFS_RREQ_CRYPT_IN_PLACE, &rreq->flags);
> +
> + ret = netfs_dispatch_unbuffered_reads(rreq);
> +
> + if (!rreq->submitted) {
> + netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
> + goto out;
> + }
> +
> + ret = netfs_wait_for_read(rreq);
> +out:
> + netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
> + return ret;
> +}
> +EXPORT_SYMBOL(netfs_unbuffered_read_from_inode);
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index 83c5c06c4710..a99722f90c71 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -145,6 +145,7 @@ static ssize_t netfs_write_through_bounce_buffer(struct netfs_io_request *wreq,
> wreq->start = gstart;
> wreq->len = gend - gstart;
>
> + __set_bit(NETFS_RREQ_RMW, &ictx->flags);
> if (gstart >= end) {
> /* At or after EOF, nothing to read. */
> } else {
> diff --git a/fs/netfs/main.c b/fs/netfs/main.c
> index 07f8cffbda8c..0900dea53e4a 100644
> --- a/fs/netfs/main.c
> +++ b/fs/netfs/main.c
> @@ -39,6 +39,7 @@ static const char *netfs_origins[nr__netfs_io_origin] = {
> [NETFS_READ_GAPS] = "RG",
> [NETFS_READ_SINGLE] = "R1",
> [NETFS_READ_FOR_WRITE] = "RW",
> + [NETFS_UNBUFFERED_READ] = "UR",
> [NETFS_DIO_READ] = "DR",
> [NETFS_WRITEBACK] = "WB",
> [NETFS_WRITEBACK_SINGLE] = "W1",
> diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
> index 4606e830c116..958c4d460d07 100644
> --- a/fs/netfs/objects.c
> +++ b/fs/netfs/objects.c
> @@ -60,6 +60,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
> origin == NETFS_READ_GAPS ||
> origin == NETFS_READ_SINGLE ||
> origin == NETFS_READ_FOR_WRITE ||
> + origin == NETFS_UNBUFFERED_READ ||
> origin == NETFS_DIO_READ) {
> INIT_WORK(&rreq->work, netfs_read_collection_worker);
> rreq->io_streams[0].avail = true;
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index 0a0bff90ca9e..013a90738dcd 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -462,6 +462,7 @@ static void netfs_read_collection(struct netfs_io_request *rreq)
> //netfs_rreq_is_still_valid(rreq);
>
> switch (rreq->origin) {
> + case NETFS_UNBUFFERED_READ:
> case NETFS_DIO_READ:
> case NETFS_READ_GAPS:
> case NETFS_RMW_READ:
> @@ -681,6 +682,7 @@ ssize_t netfs_wait_for_read(struct netfs_io_request *rreq)
> if (ret == 0) {
> ret = rreq->transferred;
> switch (rreq->origin) {
> + case NETFS_UNBUFFERED_READ:
> case NETFS_DIO_READ:
> case NETFS_READ_SINGLE:
> ret = rreq->transferred;
> diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
> index f727b48e2bfe..9e4e79d5a403 100644
> --- a/fs/netfs/write_retry.c
> +++ b/fs/netfs/write_retry.c
> @@ -386,6 +386,9 @@ ssize_t netfs_rmw_read(struct netfs_io_request *wreq, struct file *file,
> ret = 0;
> }
>
> + if (ret == 0 && rreq->netfs_ops->rmw_read_done)
> + rreq->netfs_ops->rmw_read_done(wreq, rreq);
> +
> error:
> netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
> return ret;
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 9d17d4bd9753..4049c985b9b4 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -220,6 +220,7 @@ enum netfs_io_origin {
> NETFS_READ_GAPS, /* This read is a synchronous read to fill gaps */
> NETFS_READ_SINGLE, /* This read should be treated as a single object */
> NETFS_READ_FOR_WRITE, /* This read is to prepare a write */
> + NETFS_UNBUFFERED_READ, /* This is an unbuffered I/O read */
> NETFS_DIO_READ, /* This is a direct I/O read */
> NETFS_WRITEBACK, /* This write was triggered by writepages */
> NETFS_WRITEBACK_SINGLE, /* This monolithic write was triggered by writepages */
> @@ -308,6 +309,9 @@ struct netfs_io_request {
> #define NETFS_RREQ_CONTENT_ENCRYPTION 16 /* Content encryption is in use */
> #define NETFS_RREQ_CRYPT_IN_PLACE 17 /* Do decryption in place */
> #define NETFS_RREQ_PUT_RMW_TAIL 18 /* Need to put ->rmw_tail */
> +#define NETFS_RREQ_RMW 19 /* Performing RMW cycle */
> +#define NETFS_RREQ_REPEAT_RMW 20 /* Need to perform an RMW cycle */
> +#define NETFS_RREQ_NO_READ_HOLE 21 /* Give short read/error if hole encountered */
> #define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
> * write to cache on read */
> const struct netfs_request_ops *netfs_ops;
> @@ -336,6 +340,7 @@ struct netfs_request_ops {
> /* Modification handling */
> void (*update_i_size)(struct inode *inode, loff_t i_size);
> void (*post_modify)(struct inode *inode, void *fs_priv);
> + void (*rmw_read_done)(struct netfs_io_request *wreq, struct netfs_io_request *rreq);
>
> /* Write request handling */
> void (*begin_writeback)(struct netfs_io_request *wreq);
> @@ -432,6 +437,8 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i
> ssize_t netfs_unbuffered_read_iter(struct kiocb *iocb, struct iov_iter *iter);
> ssize_t netfs_buffered_read_iter(struct kiocb *iocb, struct iov_iter *iter);
> ssize_t netfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter);
> +ssize_t netfs_unbuffered_read_from_inode(struct inode *inode, loff_t pos,
> + struct iov_iter *iter, bool nohole);
>
> /* High-level write API */
> ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
> diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
> index 74af82d773bd..9254c6f0e604 100644
> --- a/include/trace/events/netfs.h
> +++ b/include/trace/events/netfs.h
> @@ -23,6 +23,7 @@
> EM(netfs_read_trace_read_gaps, "READ-GAPS") \
> EM(netfs_read_trace_read_single, "READ-SNGL") \
> EM(netfs_read_trace_prefetch_for_write, "PREFETCHW") \
> + EM(netfs_read_trace_unbuffered_read_from_inode, "READ-INOD") \
> E_(netfs_read_trace_write_begin, "WRITEBEGN")
>
> #define netfs_write_traces \
> @@ -38,6 +39,7 @@
> EM(NETFS_READ_GAPS, "RG") \
> EM(NETFS_READ_SINGLE, "R1") \
> EM(NETFS_READ_FOR_WRITE, "RW") \
> + EM(NETFS_UNBUFFERED_READ, "UR") \
> EM(NETFS_DIO_READ, "DR") \
> EM(NETFS_WRITEBACK, "WB") \
> EM(NETFS_WRITEBACK_SINGLE, "W1") \
> @@ -104,6 +106,7 @@
> EM(netfs_sreq_trace_io_progress, "IO ") \
> EM(netfs_sreq_trace_limited, "LIMIT") \
> EM(netfs_sreq_trace_need_clear, "N-CLR") \
> + EM(netfs_sreq_trace_need_rmw, "N-RMW") \
> EM(netfs_sreq_trace_partial_read, "PARTR") \
> EM(netfs_sreq_trace_need_retry, "ND-RT") \
> EM(netfs_sreq_trace_pending, "PEND ") \
>
>