From: Dave Chinner <david@fromorbit.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end
Date: Thu, 20 Jun 2019 14:47:08 +1000 [thread overview]
Message-ID: <20190620044708.GT14363@dread.disaster.area> (raw)
In-Reply-To: <20190618144716.8133-1-agruenba@redhat.com>
On Tue, Jun 18, 2019 at 04:47:16PM +0200, Andreas Gruenbacher wrote:
> Remove the mark_inode_dirty call from __generic_write_end and add it to
> generic_write_end and the high-level iomap functions where necessary.
> That way, large writes will only dirty inodes at the end instead of
> dirtying them once per page. This fixes a performance bottleneck on
> gfs2.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
> fs/buffer.c | 26 ++++++++++++++++++--------
> fs/iomap.c | 42 ++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 56 insertions(+), 12 deletions(-)
....
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 23ef63fd1669..9454568a7f5e 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
> {
> struct inode *inode = iocb->ki_filp->f_mapping->host;
> loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> + loff_t old_size;
> +
> + /*
> + * No need to use i_size_read() here, the i_size cannot change under us
> + * because we hold i_rwsem.
> + */
> + old_size = inode->i_size;
>
> while (iov_iter_count(iter)) {
> ret = iomap_apply(inode, pos, iov_iter_count(iter),
> @@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
> written += ret;
> }
>
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
> +
> return written ? written : ret;
> }
> EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> @@ -961,18 +971,30 @@ int
> iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
> const struct iomap_ops *ops)
> {
> + loff_t old_size;
> loff_t ret;
>
> + /*
> + * No need to use i_size_read() here, the i_size cannot change under us
> + * because we hold i_rwsem.
> + */
> + old_size = inode->i_size;
> +
> while (len) {
> ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
> iomap_dirty_actor);
> if (ret <= 0)
> - return ret;
> + goto out;
> pos += ret;
> len -= ret;
> }
> + ret = 0;
>
> - return 0;
> +out:
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
I don't think we want to do this.
The patches I have that add range locking for XFS allow buffered
writes to run concurrently with operations that change the inode
size as long as the ranges don't overlap. To do this, XFS will not
hold the i_rwsem over any iomap call it makes in future - it will
hold a range lock instead. Hence we can have writes and other IO
operations occurring at the same time some other operation is
changing the size of the file, and that means this code no longer
does what you are intending it to do because the inode->i_size is no
longer constant across these operations...
Hence I think adding code that depends on i_rwsem to be held to
function correctly is the wrong direction to be taking the iomap
infrastructure.
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com
WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] fs: Move mark_inode_dirty out of __generic_write_end
Date: Thu, 20 Jun 2019 14:47:08 +1000 [thread overview]
Message-ID: <20190620044708.GT14363@dread.disaster.area> (raw)
In-Reply-To: <20190618144716.8133-1-agruenba@redhat.com>
On Tue, Jun 18, 2019 at 04:47:16PM +0200, Andreas Gruenbacher wrote:
> Remove the mark_inode_dirty call from __generic_write_end and add it to
> generic_write_end and the high-level iomap functions where necessary.
> That way, large writes will only dirty inodes at the end instead of
> dirtying them once per page. This fixes a performance bottleneck on
> gfs2.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
> fs/buffer.c | 26 ++++++++++++++++++--------
> fs/iomap.c | 42 ++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 56 insertions(+), 12 deletions(-)
....
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 23ef63fd1669..9454568a7f5e 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
> {
> struct inode *inode = iocb->ki_filp->f_mapping->host;
> loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> + loff_t old_size;
> +
> + /*
> + * No need to use i_size_read() here, the i_size cannot change under us
> + * because we hold i_rwsem.
> + */
> + old_size = inode->i_size;
>
> while (iov_iter_count(iter)) {
> ret = iomap_apply(inode, pos, iov_iter_count(iter),
> @@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
> written += ret;
> }
>
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
> +
> return written ? written : ret;
> }
> EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> @@ -961,18 +971,30 @@ int
> iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
> const struct iomap_ops *ops)
> {
> + loff_t old_size;
> loff_t ret;
>
> + /*
> + * No need to use i_size_read() here, the i_size cannot change under us
> + * because we hold i_rwsem.
> + */
> + old_size = inode->i_size;
> +
> while (len) {
> ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
> iomap_dirty_actor);
> if (ret <= 0)
> - return ret;
> + goto out;
> pos += ret;
> len -= ret;
> }
> + ret = 0;
>
> - return 0;
> +out:
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
I don't think we want to do this.
The patches I have that add range locking for XFS allow buffered
writes to run concurrently with operations that change the inode
size as long as the ranges don't overlap. To do this, XFS will not
hold the i_rwsem over any iomap call it makes in future - it will
hold a range lock instead. Hence we can have writes and other IO
operations occurring at the same time some other operation is
changing the size of the file, and that means this code no longer
does what you are intending it to do because the inode->i_size is no
longer constant across these operations...
Hence I think adding code that depends on i_rwsem to be held to
function correctly is the wrong direction to be taking the iomap
infrastructure.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2019-06-20 4:47 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-18 14:47 [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end Andreas Gruenbacher
2019-06-18 14:47 ` Andreas Gruenbacher
2019-06-19 16:01 ` [Cluster-devel] " Jan Kara
2019-06-19 16:01 ` Jan Kara
2019-06-20 4:47 ` Dave Chinner [this message]
2019-06-20 4:47 ` Dave Chinner
2019-06-24 6:54 ` [Cluster-devel] " Christoph Hellwig
2019-06-24 6:54 ` Christoph Hellwig
2019-06-24 18:22 ` [Cluster-devel] " Andreas Gruenbacher
2019-06-24 18:22 ` Andreas Gruenbacher
2019-06-25 9:57 ` [Cluster-devel] " Christoph Hellwig
2019-06-25 9:57 ` Christoph Hellwig
2019-06-25 10:50 ` [Cluster-devel] " Christoph Hellwig
2019-06-25 10:50 ` Christoph Hellwig
2019-06-25 18:13 ` [Cluster-devel] " Andreas Gruenbacher
2019-06-25 18:13 ` Andreas Gruenbacher
2019-06-26 6:03 ` [Cluster-devel] " Christoph Hellwig
2019-06-26 6:03 ` Christoph Hellwig
2019-06-26 12:07 ` [Cluster-devel] " Andreas Gruenbacher
2019-06-26 12:07 ` Andreas Gruenbacher
2019-06-25 15:00 ` [Cluster-devel] " Andreas Gruenbacher
2019-06-25 15:00 ` Andreas Gruenbacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190620044708.GT14363@dread.disaster.area \
--to=david@fromorbit.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.