From: Vivek Goyal <vgoyal@redhat.com>
To: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: virtio-fs@redhat.com, joseph.qi@linux.alibaba.com, miklos@szeredi.hu
Subject: Re: [Virtio-fs] [virtiofsd PATCH v6 5/6] virtiofsd: implement file size based dax policy
Date: Wed, 20 Oct 2021 13:54:25 -0400 [thread overview]
Message-ID: <YXBX0flS4CocXyDk@redhat.com> (raw)
In-Reply-To: <20211011030938.109206-6-jefflexu@linux.alibaba.com>
On Mon, Oct 11, 2021 at 11:09:37AM +0800, Jeffle Xu wrote:
> When DAX window is fully utilized and needs to be expanded to avoid the
> performance fluctuation triggered by DAX window recaliming, it may not
> be wise to allocate DAX window for files with size smaller than some
> specific point, considering from the perspective of reducing memory
> overhead.
>
> To maintain one DAX window chunk (e.g., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest. Thus it'd better disable DAX for those files smaller than
> 32KB, to reduce the demand for DAX window and thus avoid the unworthy
> memory overhead.
>
> Thus only flag the file with FUSE_ATTR_DAX when the file size is greater
> than 32 KB. The guest will enable DAX only for those files flagged with
> FUSE_ATTR_DAX, when virtiofs is mounted with '-o dax=inode'.
>
> To be noted that both FUSE_LOOKUP and FUSE_READDIRPLUS are affected, and
> will convey FUSE_ATTR_DAX flag to the guest.
>
> This policy will be used when '-o dax=server' option is specified for
> virtiofsd.
So enabling DAX based on size is just one type of policy. Tomorrow one
could think of enabling DAX on some other attribute. So only enable it
for non-executable files etc.
We could be more specific and define a policy say "-o dax=filesize". This
will allow defining other policies easily later.
But problem with this option is that defining more complex policies
will be hard. Say I want to enable DAX on non-executable files which
have size greater than 32K. Being able to combine multiple policies
will be easy if we implement separate options/knobs to control them
and that will allow turning on multiple policies together.
Say, -o dax_size_threshold=1MB -o dax_no_exec_file
So I guess a generic "-o dax=server" is not a bad idea. As of now this
will implement a default size based policy of 32K size. But one can add
options later to tweak this.
Vivek
>
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
> tools/virtiofsd/passthrough_ll.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index b4de86e317..a5a5061906 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -54,6 +54,7 @@
> #include <sys/syscall.h>
> #include <sys/wait.h>
> #include <sys/xattr.h>
> +#include <sys/user.h>
> #include <syslog.h>
> #include <linux/fs.h>
>
> @@ -61,6 +62,19 @@
> #include "passthrough_helpers.h"
> #include "passthrough_seccomp.h"
>
> +/*
> + * One page descriptor (64 bytes in size) needs to be maintained for every page
> + * in the DAX window chunk, i.e., there is certain guest memory overhead when
> + * DAX is enabled. Thus disable DAX for those files smaller than this certain
> + * memory overhead if virtiofs is mounted in per-file DAX mode, in which case
> + * the guest page cache will consume less memory when DAX is disabled.
> + */
> +#define FUSE_DAX_SHIFT 21
> +#define PAGE_DESC_SHIFT 6
> +#define FUSE_PERFILE_DAX_SHIFT \
> + (FUSE_DAX_SHIFT - PAGE_SHIFT + PAGE_DESC_SHIFT)
> +#define FUSE_PERFILE_DAX_POINT (1 << FUSE_PERFILE_DAX_SHIFT)
> +
> /* Keep track of inode posix locks for each owner. */
> struct lo_inode_plock {
> uint64_t lock_owner;
> @@ -986,6 +1000,20 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> return 0;
> }
>
> +static bool lo_should_enable_dax(struct lo_data *lo,
> + struct fuse_entry_param *e)
> +{
> + if (lo->dax == DAX_NONE || !S_ISREG(e->attr.st_mode)) {
> + return false;
> + }
> +
> + if (lo->dax & DAX_SERVER) {
> + return e->attr.st_size > FUSE_PERFILE_DAX_POINT;
> + }
> +
> + return false;
> +}
> +
> /*
> * Increments nlookup on the inode on success. unref_inode_lolocked() must be
> * called eventually to decrement nlookup again. If inodep is non-NULL, the
> @@ -1076,6 +1104,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> }
> e->ino = inode->fuse_ino;
>
> + if (lo_should_enable_dax(lo, e)) {
> + e->attr_flags |= FUSE_ATTR_DAX;
> + }
> +
> /* Transfer ownership of inode pointer to caller or drop it */
> if (inodep) {
> *inodep = inode;
> --
> 2.27.0
>
next prev parent reply other threads:[~2021-10-20 17:54 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-11 3:09 [Virtio-fs] [virtiofsd PATCH v6 0/6] virtiofsd: support per-file DAX Jeffle Xu
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 1/6] virtiofsd: add .ioctl() support Jeffle Xu
2021-10-15 2:49 ` [Virtio-fs] [virtiofsd PATCH v7 " Jeffle Xu
2021-10-20 17:36 ` [Virtio-fs] [virtiofsd PATCH v6 " Vivek Goyal
2021-10-21 2:32 ` JeffleXu
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 2/6] virtiofsd: support per-file DAX in fuse protocol Jeffle Xu
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 3/6] virtiofsd: negotiate per-file DAX in FUSE_INIT Jeffle Xu
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 4/6] virtiofsd: add 'dax=' option Jeffle Xu
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 5/6] virtiofsd: implement file size based dax policy Jeffle Xu
2021-10-20 17:54 ` Vivek Goyal [this message]
2021-11-01 8:59 ` JeffleXu
2021-11-01 13:18 ` Vivek Goyal
2021-10-11 3:09 ` [Virtio-fs] [virtiofsd PATCH v6 6/6] virtiofsd: implement persistent inode attribute " Jeffle Xu
2021-10-20 17:55 ` Vivek Goyal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YXBX0flS4CocXyDk@redhat.com \
--to=vgoyal@redhat.com \
--cc=jefflexu@linux.alibaba.com \
--cc=joseph.qi@linux.alibaba.com \
--cc=miklos@szeredi.hu \
--cc=virtio-fs@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.