From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/8] iomap: optionally use ioends for direct I/O
Date: Thu, 12 Dec 2024 08:29:36 -0500 [thread overview]
Message-ID: <Z1rlQA6N8tCfRlLi@bfoster> (raw)
In-Reply-To: <20241211085420.1380396-6-hch@lst.de>
On Wed, Dec 11, 2024 at 09:53:45AM +0100, Christoph Hellwig wrote:
> struct iomap_ioend currently tracks outstanding buffered writes and has
> some really nice code in core iomap and XFS to merge contiguous I/Os
> an defer them to userspace for completion in a very efficient way.
>
> For zoned writes we'll also need a per-bio user context completion to
> record the written blocks, and the infrastructure for that would look
> basically like the ioend handling for buffered I/O.
>
> So instead of reinventing the wheel, reuse the existing infrastructure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/iomap/buffered-io.c | 3 +++
> fs/iomap/direct-io.c | 50 +++++++++++++++++++++++++++++++++++++++++-
> fs/iomap/internal.h | 7 ++++++
> include/linux/iomap.h | 4 +++-
> 4 files changed, 62 insertions(+), 2 deletions(-)
> create mode 100644 fs/iomap/internal.h
>
...
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b521eb15759e..b5466361cafe 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
...
> @@ -163,6 +166,51 @@ static inline void iomap_dio_set_error(struct iomap_dio *dio, int ret)
> cmpxchg(&dio->error, 0, ret);
> }
>
> +u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend)
> +{
> + struct iomap_dio *dio = ioend->io_bio.bi_private;
> + bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
> + struct kiocb *iocb = dio->iocb;
> + u32 vec_count = ioend->io_bio.bi_vcnt;
> +
> + if (ioend->io_error)
> + iomap_dio_set_error(dio, ioend->io_error);
> +
> + if (atomic_dec_and_test(&dio->ref)) {
> + struct inode *inode = file_inode(iocb->ki_filp);
> +
> + if (dio->wait_for_completion) {
> + struct task_struct *waiter = dio->submit.waiter;
> +
> + WRITE_ONCE(dio->submit.waiter, NULL);
> + blk_wake_io_task(waiter);
> + } else if (!inode->i_mapping->nrpages) {
> + WRITE_ONCE(iocb->private, NULL);
> +
> + /*
> + * We must never invalidate pages from this thread to
> + * avoid deadlocks with buffered I/O completions.
> + * Tough luck if you hit the tiny race with someone
> + * dirtying the range now.
> + */
> + dio->flags |= IOMAP_DIO_NO_INVALIDATE;
> + iomap_dio_complete_work(&dio->aio.work);
> + } else {
> + INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
> + queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
> + }
> + }
> +
> + if (should_dirty) {
> + bio_check_pages_dirty(&ioend->io_bio);
> + } else {
> + bio_release_pages(&ioend->io_bio, false);
> + bio_put(&ioend->io_bio);
> + }
> +
Not that it matters all that much, but I'm a little curious about the
reasoning for using vec_count here. AFAICS this correlates to per-folio
writeback completions for buffered I/O, but that doesn't seem to apply
to direct I/O. Is there a reason to have the caller throttle based on
vec_counts, or are we just pulling some non-zero value for consistency
sake?
Brian
> + return vec_count;
> +}
> +
> void iomap_dio_bio_end_io(struct bio *bio)
> {
> struct iomap_dio *dio = bio->bi_private;
> diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
> new file mode 100644
> index 000000000000..20cccfc3bb13
> --- /dev/null
> +++ b/fs/iomap/internal.h
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _IOMAP_INTERNAL_H
> +#define _IOMAP_INTERNAL_H 1
> +
> +u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
> +
> +#endif /* _IOMAP_INTERNAL_H */
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index eaa8cb9083eb..f6943c80e5fd 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -343,9 +343,11 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
> #define IOMAP_IOEND_UNWRITTEN (1U << 1)
> /* don't merge into previous ioend */
> #define IOMAP_IOEND_BOUNDARY (1U << 2)
> +/* is direct I/O */
> +#define IOMAP_IOEND_DIRECT (1U << 3)
>
> #define IOMAP_IOEND_NOMERGE_FLAGS \
> - (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN)
> + (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT)
>
> /*
> * Structure for writeback I/O completions.
> --
> 2.45.2
>
>
next prev parent reply other threads:[~2024-12-12 13:27 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 8:53 RFC: iomap patches for zoned XFS Christoph Hellwig
2024-12-11 8:53 ` [PATCH 1/8] iomap: allow the file system to submit the writeback bios Christoph Hellwig
2024-12-12 17:48 ` Darrick J. Wong
2024-12-11 8:53 ` [PATCH 2/8] iomap: simplify io_flags and io_type in struct iomap_ioend Christoph Hellwig
2024-12-12 17:55 ` Darrick J. Wong
2024-12-13 4:53 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 3/8] iomap: add a IOMAP_F_ZONE_APPEND flag Christoph Hellwig
2024-12-12 18:05 ` Darrick J. Wong
2024-12-13 4:55 ` Christoph Hellwig
2024-12-16 4:55 ` Christoph Hellwig
2024-12-19 17:30 ` Darrick J. Wong
2024-12-19 17:35 ` Christoph Hellwig
2024-12-19 17:36 ` Darrick J. Wong
2024-12-11 8:53 ` [PATCH 4/8] iomap: split bios to zone append limits in the submission handlers Christoph Hellwig
2024-12-12 13:28 ` Brian Foster
2024-12-12 15:05 ` Christoph Hellwig
2024-12-12 14:21 ` John Garry
2024-12-12 15:07 ` Christoph Hellwig
2024-12-12 19:51 ` Darrick J. Wong
2024-12-13 4:50 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 5/8] iomap: optionally use ioends for direct I/O Christoph Hellwig
2024-12-12 13:29 ` Brian Foster [this message]
2024-12-12 15:12 ` Christoph Hellwig
2024-12-12 19:56 ` Darrick J. Wong
2024-12-13 4:51 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 6/8] iomap: pass private data to iomap_page_mkwrite Christoph Hellwig
2024-12-11 8:53 ` [PATCH 7/8] iomap: pass private data to iomap_zero_range Christoph Hellwig
2024-12-11 8:53 ` [PATCH 8/8] iomap: pass private data to iomap_truncate_page Christoph Hellwig
2024-12-12 19:56 ` Darrick J. Wong
2024-12-12 13:29 ` RFC: iomap patches for zoned XFS Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z1rlQA6N8tCfRlLi@bfoster \
--to=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.