From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
lsf-pc@lists.linux-foundation.org
Subject: Re: [RFCv1][WIP] ext2: Move direct-io to use iomap
Date: Mon, 20 Mar 2023 21:41:25 +0530
Message-ID: <87ttyfz9le.fsf@doe.com>
In-Reply-To: <20230316154143.GA11351@frogsfrogsfrogs>

"Darrick J. Wong" <djwong@kernel.org> writes:
> On Thu, Mar 16, 2023 at 08:10:29PM +0530, Ritesh Harjani (IBM) wrote:
>> [DO NOT MERGE] [WORK-IN-PROGRESS]
>>
>> Hello Jan,
>>
>> This is an initial version of the patch set which I wanted to share
>> before today's call. This is still work in progress but at least passes
>> the set of test cases which I had kept for dio testing (except 1 from my
>> list).
>>
>> Looks like there won't be much/any changes required from iomap side to
>> support ext2 moving to iomap apis.
>>
>> I will be doing some more testing, specifically of test generic/083, which
>> is occasionally failing in my testing.
>> Also, once this is stabilized, I can do some performance testing too if you
>> think it is worthwhile. As far as I remember, we saw some performance
>> regressions when ext4 moved to iomap for dio.
>>
>> PS: Please ignore if there are some silly mistakes. As I said, I wanted
>> to get this out before today's discussion. :)
>>
>> Thanks for your help!!
>>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> fs/ext2/ext2.h | 1 +
>> fs/ext2/file.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/ext2/inode.c | 20 +--------
>> 3 files changed, 117 insertions(+), 18 deletions(-)
>>
>> diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
>> index cb78d7dcfb95..cb5e309fe040 100644
>> --- a/fs/ext2/ext2.h
>> +++ b/fs/ext2/ext2.h
>> @@ -753,6 +753,7 @@ extern unsigned long ext2_count_free (struct buffer_head *, unsigned);
>> extern struct inode *ext2_iget (struct super_block *, unsigned long);
>> extern int ext2_write_inode (struct inode *, struct writeback_control *);
>> extern void ext2_evict_inode(struct inode *);
>> +extern void ext2_write_failed(struct address_space *mapping, loff_t to);
>> extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int);
>> extern int ext2_setattr (struct mnt_idmap *, struct dentry *, struct iattr *);
>> extern int ext2_getattr (struct mnt_idmap *, const struct path *,
>> diff --git a/fs/ext2/file.c b/fs/ext2/file.c
>> index 6b4bebe982ca..7a8561304559 100644
>> --- a/fs/ext2/file.c
>> +++ b/fs/ext2/file.c
>> @@ -161,12 +161,123 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>> return ret;
>> }
>>
>> +static ssize_t ext2_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> +{
>> + struct file *file = iocb->ki_filp;
>> + struct inode *inode = file->f_mapping->host;
>> + ssize_t ret;
>> +
>> + inode_lock_shared(inode);
>> + ret = iomap_dio_rw(iocb, to, &ext2_iomap_ops, NULL, 0, NULL, 0);
>> + inode_unlock_shared(inode);
>> +
>> + return ret;
>> +}
>> +
>> +static int ext2_dio_write_end_io(struct kiocb *iocb, ssize_t size,
>> + int error, unsigned int flags)
>> +{
>> + loff_t pos = iocb->ki_pos;
>> + struct inode *inode = file_inode(iocb->ki_filp);
>> +
>> + if (error)
>> + return error;
>> +
>> + pos += size;
>> + if (pos > i_size_read(inode))
>> + i_size_write(inode, pos);
>> +
>> + return 0;
>> +}
>> +
>> +static const struct iomap_dio_ops ext2_dio_write_ops = {
>> + .end_io = ext2_dio_write_end_io,
>> +};
>> +
>> +static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> +{
>> + struct file *file = iocb->ki_filp;
>> + struct inode *inode = file->f_mapping->host;
>> + ssize_t ret;
>> + unsigned int flags;
>> + unsigned long blocksize = inode->i_sb->s_blocksize;
>> + loff_t offset = iocb->ki_pos;
>> + loff_t count = iov_iter_count(from);
>> +
>> +
>> + inode_lock(inode);
>> + ret = generic_write_checks(iocb, from);
>> + if (ret <= 0)
>> + goto out_unlock;
>> + ret = file_remove_privs(file);
>> + if (ret)
>> + goto out_unlock;
>> + ret = file_update_time(file);
>> + if (ret)
>> + goto out_unlock;
>
> kiocb_modified() instead of calling file_remove_privs?
Yes, looks like it is a replacement for file_remove_privs() and
file_update_time().
>> +
>> + /*
>> + * We pass IOMAP_DIO_NOSYNC because otherwise iomap_dio_rw()
>> + * calls for generic_write_sync in iomap_dio_complete().
>> + * Since ext2_fsync must be called w/o inode lock,
>> + * hence we pass IOMAP_DIO_NOSYNC and handle generic_write_sync()
>> + * ourselves.
>> + */
>> + flags = IOMAP_DIO_NOSYNC;
>> +
>> + /* use IOMAP_DIO_FORCE_WAIT for unaligned or extending writes */
>> + if (iocb->ki_pos + iov_iter_count(from) > i_size_read(inode) ||
>> + (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(from), blocksize)))
>> + flags |= IOMAP_DIO_FORCE_WAIT;
>> +
>> + ret = iomap_dio_rw(iocb, from, &ext2_iomap_ops, &ext2_dio_write_ops,
>> + flags, NULL, 0);
>> +
>> + if (ret == -ENOTBLK)
>> + ret = 0;
>> +
>> + if (ret < 0 && ret != -EIOCBQUEUED)
>> + ext2_write_failed(inode->i_mapping, offset + count);
>> +
>> + /* handle case for partial write or fallback to buffered write */
>> + if (ret >= 0 && iov_iter_count(from)) {
>> + loff_t pos, endbyte;
>> + ssize_t status;
>> + ssize_t ret2;
>> +
>> + pos = iocb->ki_pos;
>> + status = generic_perform_write(iocb, from);
>> + if (unlikely(status < 0)) {
>> + ret = status;
>> + goto out_unlock;
>> + }
>> + endbyte = pos + status - 1;
>> + ret2 = filemap_write_and_wait_range(inode->i_mapping, pos,
>> + endbyte);
>> + if (ret2 == 0) {
>> + iocb->ki_pos = endbyte + 1;
>> + ret += status;
>> + invalidate_mapping_pages(inode->i_mapping,
>> + pos >> PAGE_SHIFT,
>> + endbyte >> PAGE_SHIFT);
>> + }
>> + }
>
> (Why not fall back to the actual buffered write path?)
>
Because then we can handle everything related to DIO in
ext2_dio_write_iter() itself. For example, DIO semantics require that the
page-cache pages touched by the buffered fallback are written to disk and
invalidated before returning (filemap_write_and_wait_range() and
invalidate_mapping_pages()).
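
To make the bookkeeping in that fallback branch concrete, here is a minimal
userspace sketch of the arithmetic only. It assumes 4 KiB pages, and the
struct and function names (fallback_result, account_fallback) are
hypothetical; they are not part of the patch or of any kernel API.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical record of what the fallback branch computes. */
struct fallback_result {
	long long new_pos;    /* models iocb->ki_pos = endbyte + 1 */
	long long total;      /* models ret += status */
	long long first_page; /* models pos >> PAGE_SHIFT */
	long long last_page;  /* models endbyte >> PAGE_SHIFT */
};

/*
 * dio_written: bytes the direct I/O part already wrote (ret >= 0)
 * pos:         file position where the buffered write starts
 * buffered:    bytes written by the buffered fallback (status > 0)
 */
static struct fallback_result account_fallback(long long dio_written,
					       long long pos,
					       long long buffered)
{
	struct fallback_result r;
	long long endbyte;

	assert(dio_written >= 0 && pos >= 0 && buffered > 0);

	/* Last byte covered by the buffered write (inclusive). */
	endbyte = pos + buffered - 1;

	r.new_pos = endbyte + 1;
	r.total = dio_written + buffered;
	/* Inclusive page-index range to flush and then invalidate. */
	r.first_page = pos >> PAGE_SHIFT;
	r.last_page = endbyte >> PAGE_SHIFT;
	return r;
}
```

For instance, if direct I/O wrote 4096 bytes and the buffered path then
wrote 6000 bytes starting at offset 4096, the sketch reports 10096 bytes in
total, a new position of 10096, and page indexes 1 through 2 as the range
to invalidate.
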
> Otherwise this looks like a reasonable first start.
Thanks!
-ritesh