From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/8] iomap: optionally use ioends for direct I/O
Date: Thu, 12 Dec 2024 08:29:36 -0500 [thread overview]
Message-ID: <Z1rlQA6N8tCfRlLi@bfoster> (raw)
In-Reply-To: <20241211085420.1380396-6-hch@lst.de>
On Wed, Dec 11, 2024 at 09:53:45AM +0100, Christoph Hellwig wrote:
> struct iomap_ioend currently tracks outstanding buffered writes and has
> some really nice code in core iomap and XFS to merge contiguous I/Os
> an defer them to userspace for completion in a very efficient way.
>
> For zoned writes we'll also need a per-bio user context completion to
> record the written blocks, and the infrastructure for that would look
> basically like the ioend handling for buffered I/O.
>
> So instead of reinventing the wheel, reuse the existing infrastructure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/iomap/buffered-io.c | 3 +++
> fs/iomap/direct-io.c | 50 +++++++++++++++++++++++++++++++++++++++++-
> fs/iomap/internal.h | 7 ++++++
> include/linux/iomap.h | 4 +++-
> 4 files changed, 62 insertions(+), 2 deletions(-)
> create mode 100644 fs/iomap/internal.h
>
...
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b521eb15759e..b5466361cafe 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
...
> @@ -163,6 +166,51 @@ static inline void iomap_dio_set_error(struct iomap_dio *dio, int ret)
> cmpxchg(&dio->error, 0, ret);
> }
>
> +u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend)
> +{
> + struct iomap_dio *dio = ioend->io_bio.bi_private;
> + bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
> + struct kiocb *iocb = dio->iocb;
> + u32 vec_count = ioend->io_bio.bi_vcnt;
> +
> + if (ioend->io_error)
> + iomap_dio_set_error(dio, ioend->io_error);
> +
> + if (atomic_dec_and_test(&dio->ref)) {
> + struct inode *inode = file_inode(iocb->ki_filp);
> +
> + if (dio->wait_for_completion) {
> + struct task_struct *waiter = dio->submit.waiter;
> +
> + WRITE_ONCE(dio->submit.waiter, NULL);
> + blk_wake_io_task(waiter);
> + } else if (!inode->i_mapping->nrpages) {
> + WRITE_ONCE(iocb->private, NULL);
> +
> + /*
> + * We must never invalidate pages from this thread to
> + * avoid deadlocks with buffered I/O completions.
> + * Tough luck if you hit the tiny race with someone
> + * dirtying the range now.
> + */
> + dio->flags |= IOMAP_DIO_NO_INVALIDATE;
> + iomap_dio_complete_work(&dio->aio.work);
> + } else {
> + INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
> + queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
> + }
> + }
> +
> + if (should_dirty) {
> + bio_check_pages_dirty(&ioend->io_bio);
> + } else {
> + bio_release_pages(&ioend->io_bio, false);
> + bio_put(&ioend->io_bio);
> + }
> +
Not that it matters all that much, but I'm a little curious about the
reasoning for using vec_count here. AFAICS this correlates to per-folio
writeback completions for buffered I/O, but that doesn't seem to apply
to direct I/O. Is there a reason to have the caller throttle based on
vec_counts, or are we just pulling some non-zero value for consistency
sake?
Brian
> + return vec_count;
> +}
> +
> void iomap_dio_bio_end_io(struct bio *bio)
> {
> struct iomap_dio *dio = bio->bi_private;
> diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
> new file mode 100644
> index 000000000000..20cccfc3bb13
> --- /dev/null
> +++ b/fs/iomap/internal.h
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _IOMAP_INTERNAL_H
> +#define _IOMAP_INTERNAL_H 1
> +
> +u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
> +
> +#endif /* _IOMAP_INTERNAL_H */
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index eaa8cb9083eb..f6943c80e5fd 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -343,9 +343,11 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
> #define IOMAP_IOEND_UNWRITTEN (1U << 1)
> /* don't merge into previous ioend */
> #define IOMAP_IOEND_BOUNDARY (1U << 2)
> +/* is direct I/O */
> +#define IOMAP_IOEND_DIRECT (1U << 3)
>
> #define IOMAP_IOEND_NOMERGE_FLAGS \
> - (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN)
> + (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT)
>
> /*
> * Structure for writeback I/O completions.
> --
> 2.45.2
>
>
next prev parent reply other threads:[~2024-12-12 13:27 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-11 8:53 RFC: iomap patches for zoned XFS Christoph Hellwig
2024-12-11 8:53 ` [PATCH 1/8] iomap: allow the file system to submit the writeback bios Christoph Hellwig
2024-12-12 17:48 ` Darrick J. Wong
2024-12-11 8:53 ` [PATCH 2/8] iomap: simplify io_flags and io_type in struct iomap_ioend Christoph Hellwig
2024-12-12 17:55 ` Darrick J. Wong
2024-12-13 4:53 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 3/8] iomap: add a IOMAP_F_ZONE_APPEND flag Christoph Hellwig
2024-12-12 18:05 ` Darrick J. Wong
2024-12-13 4:55 ` Christoph Hellwig
2024-12-16 4:55 ` Christoph Hellwig
2024-12-19 17:30 ` Darrick J. Wong
2024-12-19 17:35 ` Christoph Hellwig
2024-12-19 17:36 ` Darrick J. Wong
2024-12-11 8:53 ` [PATCH 4/8] iomap: split bios to zone append limits in the submission handlers Christoph Hellwig
2024-12-12 13:28 ` Brian Foster
2024-12-12 15:05 ` Christoph Hellwig
2024-12-12 14:21 ` John Garry
2024-12-12 15:07 ` Christoph Hellwig
2024-12-12 19:51 ` Darrick J. Wong
2024-12-13 4:50 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 5/8] iomap: optionally use ioends for direct I/O Christoph Hellwig
2024-12-12 13:29 ` Brian Foster [this message]
2024-12-12 15:12 ` Christoph Hellwig
2024-12-12 19:56 ` Darrick J. Wong
2024-12-13 4:51 ` Christoph Hellwig
2024-12-11 8:53 ` [PATCH 6/8] iomap: pass private data to iomap_page_mkwrite Christoph Hellwig
2024-12-11 8:53 ` [PATCH 7/8] iomap: pass private data to iomap_zero_range Christoph Hellwig
2024-12-11 8:53 ` [PATCH 8/8] iomap: pass private data to iomap_truncate_page Christoph Hellwig
2024-12-12 19:56 ` Darrick J. Wong
2024-12-12 13:29 ` RFC: iomap patches for zoned XFS Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z1rlQA6N8tCfRlLi@bfoster \
--to=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox