From: Vivek Goyal <vgoyal@redhat.com>
To: Linux fsdevel mailing list <linux-fsdevel@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
virtio-fs-list <virtio-fs@redhat.com>
Cc: Brian Foster <bfoster@redhat.com>
Subject: Re: [Virtio-fs] [RFC PATCH] fuse: update attributes on read() only on timeout
Date: Tue, 29 Sep 2020 15:58:48 -0400 [thread overview]
Message-ID: <20200929195848.GH220516@redhat.com> (raw)
In-Reply-To: <20200929185015.GG220516@redhat.com>
On Tue, Sep 29, 2020 at 02:50:15PM -0400, Vivek Goyal wrote:
> Following commit added a flag to invalidate guest page cache automatically.
>
> 72d0d248ca823 fuse: add FUSE_AUTO_INVAL_DATA init flag
>
> Idea seemed to be that for network file systmes if client A modifies
> the file, then client B should be able to detect that mtime of file
> change and invalidate its own cache and fetch new data from server.
>
> There are few questions/issues with this method.
>
> How soon client B able to detect that file has changed. Should it
> first GETATTR from server for every READ and compare mtime. That
> will be much stronger cache coherency but very slow because every
> READ will first be preceeded by a GETATTR.
>
> Or should this be driven by inode timeout. That is if inode cached attrs
> (including mtime) have timed out, we fetch new mtime from server and
> invalidate cache based on that.
>
> Current logic calls fuse_update_attr() on every READ. But that method
> will result in GETATTR only if either attrs have timedout or if cached
> attrs have been invalidated.
>
> If client B is only doing READs (and not WRITEs), then attrs should be
> valid for inode timeout interval. And that means client B will detect
> mtime change only after timeout interval.
>
> But if client B is also doing WRITE, then once WRITE completes, we
> invalidate cached attrs. That means next READ will force GETATTR()
> and invalidate page cache. In this case client B will detect the
> change by client A much sooner but it can't differentiate between
> its own WRITEs and by another client WRITE. So every WRITE followed
> by READ will result in GETATTR, followed by page cache invalidation
> and performance suffers in mixed read/write workloads.
>
> I am assuming that intent of auto_inval_data is to detect changes
> by another client but it can take up to "inode timeout" seconds
> to detect that change. (And it does not guarantee an immidiate change
> detection).
>
> If above assumption is acceptable, then I am proposing this patch
> which will update attrs on READ only if attrs have timed out. This
> means every second we will do a GETATTR and invalidate page cache.
>
> This is also suboptimal because only if client B is writing, our
> cache is still valid but we will still invalidate it after 1 second.
> But we don't have a good mechanism to differentiate between our own
> changes and another client's changes. So this is probably second
> best method to reduce the extent of issue.
>
> I am running equivalent of following fio workload on virtiofs (cache=auto)
> and there I see a performance improvement of roughly 12%.
>
> fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio
> +--bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
> +--output=/output/fio.txt
>
> NAME WORKLOAD Bandwidth IOPS
> vtfs-auto-sh randrw-psync 43.3mb/14.4mb 10.8k/3709
> vtfs-auto-sh-invaltime randrw-psync 48.9mb/16.3mb 12.2k/4197
Also ran a kernel compilation test. With this change, there is roughly
10% improvement in build time.
- git clone linux
- cd linux
- sync; echo 3 > /proc/sys/vm/drop_cache
- make defconfig
- time make -j32
Without patch
-------------
real 4m57.819s
user 23m2.432s
sys 18m39.004s
With patch
----------
real 4m33.549s
user 23m4.168s
sys 18m36.515s
Thanks
Vivek
>
> Signee-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
> fs/fuse/dir.c | 6 ++++++
> fs/fuse/file.c | 21 +++++++++++++++------
> fs/fuse/fuse_i.h | 1 +
> 3 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 26f028bc760b..5c4eda0edd95 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1002,6 +1002,12 @@ int fuse_update_attributes(struct inode *inode, struct file *file)
> STATX_BASIC_STATS & ~STATX_ATIME, 0);
> }
>
> +int fuse_update_attributes_timeout(struct inode *inode, struct file *file)
> +{
> + /* Only refresh attrs if timeout has happened */
> + return fuse_update_get_attr(inode, file, NULL, 0, 0);
> +}
> +
> int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
> u64 child_nodeid, struct qstr *name)
> {
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 6611ef3269a8..dea001f5f4e9 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -972,19 +972,28 @@ static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
> {
> struct inode *inode = iocb->ki_filp->f_mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> + int err = 0;
>
> /*
> - * In auto invalidate mode, always update attributes on read.
> + * In auto invalidate mode, update attributes on read if previously
> + * stored attributes timed out. This is primarily done to detect
> + * writes by other clients and invalidate local cache. But we don't
> + * have a way to differentiate between our own writes and writes
> + * by other clients. So we end up updating attrs and invalidating
> + * cache on our own write. Stick to the theme of detecting changes
> + * after timeout. This is what happens already if we are not doing
> + * writes but other client is doing.
> + *
> * Otherwise, only update if we attempt to read past EOF (to ensure
> * i_size is up to date).
> */
> - if (fc->auto_inval_data ||
> - (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
> - int err;
> + if (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode)) {
> err = fuse_update_attributes(inode, iocb->ki_filp);
> - if (err)
> - return err;
> + } else if (fc->auto_inval_data) {
> + err = fuse_update_attributes_timeout(inode, iocb->ki_filp);
> }
> + if (err)
> + return err;
>
> return generic_file_read_iter(iocb, to);
> }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 740a8a7d7ae6..78f93129b60e 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1004,6 +1004,7 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id);
> void fuse_update_ctime(struct inode *inode);
>
> int fuse_update_attributes(struct inode *inode, struct file *file);
> +int fuse_update_attributes_timeout(struct inode *inode, struct file *file);
>
> void fuse_flush_writepages(struct inode *inode);
>
> --
> 2.25.4
>
WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: Linux fsdevel mailing list <linux-fsdevel@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
virtio-fs-list <virtio-fs@redhat.com>
Cc: Brian Foster <bfoster@redhat.com>
Subject: Re: [RFC PATCH] fuse: update attributes on read() only on timeout
Date: Tue, 29 Sep 2020 15:58:48 -0400 [thread overview]
Message-ID: <20200929195848.GH220516@redhat.com> (raw)
In-Reply-To: <20200929185015.GG220516@redhat.com>
On Tue, Sep 29, 2020 at 02:50:15PM -0400, Vivek Goyal wrote:
> Following commit added a flag to invalidate guest page cache automatically.
>
> 72d0d248ca823 fuse: add FUSE_AUTO_INVAL_DATA init flag
>
> Idea seemed to be that for network file systmes if client A modifies
> the file, then client B should be able to detect that mtime of file
> change and invalidate its own cache and fetch new data from server.
>
> There are few questions/issues with this method.
>
> How soon client B able to detect that file has changed. Should it
> first GETATTR from server for every READ and compare mtime. That
> will be much stronger cache coherency but very slow because every
> READ will first be preceeded by a GETATTR.
>
> Or should this be driven by inode timeout. That is if inode cached attrs
> (including mtime) have timed out, we fetch new mtime from server and
> invalidate cache based on that.
>
> Current logic calls fuse_update_attr() on every READ. But that method
> will result in GETATTR only if either attrs have timedout or if cached
> attrs have been invalidated.
>
> If client B is only doing READs (and not WRITEs), then attrs should be
> valid for inode timeout interval. And that means client B will detect
> mtime change only after timeout interval.
>
> But if client B is also doing WRITE, then once WRITE completes, we
> invalidate cached attrs. That means next READ will force GETATTR()
> and invalidate page cache. In this case client B will detect the
> change by client A much sooner but it can't differentiate between
> its own WRITEs and by another client WRITE. So every WRITE followed
> by READ will result in GETATTR, followed by page cache invalidation
> and performance suffers in mixed read/write workloads.
>
> I am assuming that intent of auto_inval_data is to detect changes
> by another client but it can take up to "inode timeout" seconds
> to detect that change. (And it does not guarantee an immidiate change
> detection).
>
> If above assumption is acceptable, then I am proposing this patch
> which will update attrs on READ only if attrs have timed out. This
> means every second we will do a GETATTR and invalidate page cache.
>
> This is also suboptimal because only if client B is writing, our
> cache is still valid but we will still invalidate it after 1 second.
> But we don't have a good mechanism to differentiate between our own
> changes and another client's changes. So this is probably second
> best method to reduce the extent of issue.
>
> I am running equivalent of following fio workload on virtiofs (cache=auto)
> and there I see a performance improvement of roughly 12%.
>
> fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio
> +--bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
> +--output=/output/fio.txt
>
> NAME WORKLOAD Bandwidth IOPS
> vtfs-auto-sh randrw-psync 43.3mb/14.4mb 10.8k/3709
> vtfs-auto-sh-invaltime randrw-psync 48.9mb/16.3mb 12.2k/4197
Also ran a kernel compilation test. With this change, there is roughly
10% improvement in build time.
- git clone linux
- cd linux
- sync; echo 3 > /proc/sys/vm/drop_cache
- make defconfig
- time make -j32
Without patch
-------------
real 4m57.819s
user 23m2.432s
sys 18m39.004s
With patch
----------
real 4m33.549s
user 23m4.168s
sys 18m36.515s
Thanks
Vivek
>
> Signee-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
> fs/fuse/dir.c | 6 ++++++
> fs/fuse/file.c | 21 +++++++++++++++------
> fs/fuse/fuse_i.h | 1 +
> 3 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 26f028bc760b..5c4eda0edd95 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1002,6 +1002,12 @@ int fuse_update_attributes(struct inode *inode, struct file *file)
> STATX_BASIC_STATS & ~STATX_ATIME, 0);
> }
>
> +int fuse_update_attributes_timeout(struct inode *inode, struct file *file)
> +{
> + /* Only refresh attrs if timeout has happened */
> + return fuse_update_get_attr(inode, file, NULL, 0, 0);
> +}
> +
> int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
> u64 child_nodeid, struct qstr *name)
> {
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 6611ef3269a8..dea001f5f4e9 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -972,19 +972,28 @@ static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
> {
> struct inode *inode = iocb->ki_filp->f_mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> + int err = 0;
>
> /*
> - * In auto invalidate mode, always update attributes on read.
> + * In auto invalidate mode, update attributes on read if previously
> + * stored attributes timed out. This is primarily done to detect
> + * writes by other clients and invalidate local cache. But we don't
> + * have a way to differentiate between our own writes and writes
> + * by other clients. So we end up updating attrs and invalidating
> + * cache on our own write. Stick to the theme of detecting changes
> + * after timeout. This is what happens already if we are not doing
> + * writes but other client is doing.
> + *
> * Otherwise, only update if we attempt to read past EOF (to ensure
> * i_size is up to date).
> */
> - if (fc->auto_inval_data ||
> - (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
> - int err;
> + if (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode)) {
> err = fuse_update_attributes(inode, iocb->ki_filp);
> - if (err)
> - return err;
> + } else if (fc->auto_inval_data) {
> + err = fuse_update_attributes_timeout(inode, iocb->ki_filp);
> }
> + if (err)
> + return err;
>
> return generic_file_read_iter(iocb, to);
> }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 740a8a7d7ae6..78f93129b60e 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1004,6 +1004,7 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id);
> void fuse_update_ctime(struct inode *inode);
>
> int fuse_update_attributes(struct inode *inode, struct file *file);
> +int fuse_update_attributes_timeout(struct inode *inode, struct file *file);
>
> void fuse_flush_writepages(struct inode *inode);
>
> --
> 2.25.4
>
next prev parent reply other threads:[~2020-09-29 19:58 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-29 18:50 [Virtio-fs] [RFC PATCH] fuse: update attributes on read() only on timeout Vivek Goyal
2020-09-29 18:50 ` Vivek Goyal
2020-09-29 19:58 ` Vivek Goyal [this message]
2020-09-29 19:58 ` Vivek Goyal
2020-09-30 4:35 ` [Virtio-fs] " Amir Goldstein
2020-09-30 4:35 ` Amir Goldstein
2020-09-30 6:40 ` [Virtio-fs] " Miklos Szeredi
2020-09-30 6:40 ` Miklos Szeredi
2020-09-30 11:57 ` [Virtio-fs] " Amir Goldstein
2020-09-30 11:57 ` Amir Goldstein
2020-10-01 13:47 ` [Virtio-fs] Kudos: productivity boost using virtio-fs Harry G. Coin
2020-10-01 13:51 ` Dr. David Alan Gilbert
2020-10-01 14:12 ` Harry G. Coin
2020-10-01 16:26 ` Dr. David Alan Gilbert
2020-10-02 14:47 ` Vivek Goyal
2020-09-30 13:02 ` [Virtio-fs] [RFC PATCH] fuse: update attributes on read() only on timeout Vivek Goyal
2020-09-30 13:02 ` Vivek Goyal
2020-09-30 13:19 ` [Virtio-fs] " Amir Goldstein
2020-09-30 13:19 ` Amir Goldstein
2020-09-30 13:38 ` [Virtio-fs] " Miklos Szeredi
2020-09-30 13:38 ` Miklos Szeredi
2020-09-30 9:03 ` [Virtio-fs] " Stefan Hajnoczi
2020-09-30 9:03 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200929195848.GH220516@redhat.com \
--to=vgoyal@redhat.com \
--cc=bfoster@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=virtio-fs@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.