From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
Date: Mon, 10 Oct 2011 14:36:33 +0530 [thread overview]
Message-ID: <87obxp9pwm.fsf@skywalker.in.ibm.com> (raw)
In-Reply-To: <CAJSP0QUenqf5RmgSDJKA4--6KGRQ2+Xcg0ZcwiZHWCrctUUF4w@mail.gmail.com>
On Mon, 10 Oct 2011 05:54:56 +0100, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V
> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> > On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V
> >> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >> > On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> >> On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
> >> >> <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >> >> > cache=writethrough implies the file are opened in the host with O_SYNC open flag
> >> >> >
> >> >> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> >> > ---
> >> >> > fsdev/file-op-9p.h | 1 +
> >> >> > fsdev/qemu-fsdev.c | 10 ++++++++--
> >> >> > fsdev/qemu-fsdev.h | 2 ++
> >> >> > hw/9pfs/virtio-9p-device.c | 5 +++++
> >> >> > hw/9pfs/virtio-9p.c | 24 ++++++++++++++++++------
> >> >> > qemu-config.c | 6 ++++++
> >> >> > qemu-options.hx | 17 ++++++++++++-----
> >> >> > vl.c | 6 ++++++
> >> >> > 8 files changed, 58 insertions(+), 13 deletions(-)
> >> >>
> >> >> When would this be used? For serving up vanilla 9P?
> >> >>
> >> >> I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.
> >> >>
> >> >
> >> > TFSYNC is added by 9p.L. So we would need this for 9p.u.
> >>
> >> I think 9p.u is covered by this wstat hack in
> >> http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32:
> >>
> >> "if all the elements of the directory entry in a Twstat message are
> >> ``don't touch'' val- ues, the server may interpret it as a request to
> >> guarantee that the contents of the associated file are committed to
> >> stable storage before the Rwstat message is returned."
> >>
> >> A real TFSYNC operation is nicer though and could be mandatory (the
> >> 9P2000 RFC only says "the server *may*").
> >>
> >> > Another use
> >> > case is to ensure that we don't leave pages on host as dirty. That would
> >> > ensure that large writeback from a guest don't result in large number of
> >> > dirty pages on the host, thereby resulting in writeback in the host. It
> >> > would be needed for predictable I/O behavior in a setup where we have
> >> > multiple guest.
> >>
> >> I see. I'm mostly curious about this change because the caching modes
> >> are a nightmare with block devices - a lot of time is spent discussing
> >> and benchmarking them, and they cause confusion when configuring KVM.
> >>
> >> It sounds like O_SYNC is being used in order to keep page cache clean.
> >> But keeping the page cache clean is a side-effect of O_SYNC's
> >> behavior: writing out each page and synchronizing the disk write
> >> cache. If you are attempting to bypass the page cache, just use
> >> O_DIRECT without O_SYNC.
> >
> > But how about reads. I want to make sure i get to use the page cache for
> > reads and also want to keep the page cache clean.
> >
> >> O_SYNC is doing the additional disk write
> >> cache synchronization which will slow down I/O and prevent the server
> >> from using disk write cache. O_SYNC is not the right flag to use.
> >
> > O_DIRECT have additional requirement on buffer alignment, and we don't
> > want to send every read to disk. VirtFS also support zero copy
> > read/write, so that buffer alignment will always not be possible.
> > We want to make sure writes don't leave the page cache dirty so
> > that host doesn't spent much time in write back of data dirtied by the guest.
>
> sync_file_range(2) kicks off the write-out but does not flush file
> system metadata or synchronize the disk write cache.
>
something like below ?
commit bc7c28877b30155264f351d26692b3cceca853bb
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Mon May 23 11:29:15 2011 +0530
hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
cache=writethrough implies the file are opened in the host with O_SYNC open flag
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 8de8abf..84ec9e0 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -53,12 +53,16 @@ struct xattr_operations;
/* FsContext flag values */
#define PATHNAME_FSCONTEXT 0x1
+/* cache flags */
+#define V9FS_WRITETHROUGH_CACHE 0x1
+
typedef struct FsContext
{
int flags;
char *fs_root;
SecModel fs_sm;
uid_t uid;
+ int cache_flags;
struct xattr_operations **xops;
/* fs driver specific data */
void *private;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 768819f..fce016b 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts)
const char *fstype = qemu_opt_get(opts, "fstype");
const char *path = qemu_opt_get(opts, "path");
const char *sec_model = qemu_opt_get(opts, "security_model");
+ const char *cache = qemu_opt_get(opts, "cache");
+
if (!fsdev_id) {
fprintf(stderr, "fsdev: No id specified\n");
@@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts)
fsle->fse.path = g_strdup(path);
fsle->fse.security_model = g_strdup(sec_model);
fsle->fse.ops = FsTypes[i].ops;
-
+ fsle->fse.cache_model = 0;
+ if (cache) {
+ if (!strcmp(cache, "writethrough")) {
+ fsle->fse.cache_model = V9FS_WRITETHROUGH_CACHE;
+ }
+ }
QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next);
return 0;
-
}
FsTypeEntry *get_fsdev_fsentry(char *id)
diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h
index e04931a..9c440f2 100644
--- a/fsdev/qemu-fsdev.h
+++ b/fsdev/qemu-fsdev.h
@@ -41,6 +41,7 @@ typedef struct FsTypeEntry {
char *fsdev_id;
char *path;
char *security_model;
+ int cache_flags;
FileOperations *ops;
} FsTypeEntry;
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index e5b68da..b23e8a4 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf)
exit(1);
}
+ s->ctx.cache_flags = fse->cache_flags;
s->ctx.fs_root = g_strdup(fse->path);
len = strlen(conf->tag);
if (len > MAX_TAG_LEN) {
diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
index 5c8b5ed..441a37f 100644
--- a/hw/9pfs/virtio-9p-handle.c
+++ b/hw/9pfs/virtio-9p-handle.c
@@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov,
return writev(fd, iov, iovcnt);
}
#endif
+ if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) {
+ /*
+ * Initiate a writeback. This is not a data integrity sync.
+ * We want to ensure that we don't leave dirty pages in the cache
+ * after write when cache=writethrough is sepcified.
+ */
+ sync_file_range(fd, offset, 0,
+ SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
+ }
}
static int handle_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c
index 9559ff6..5745d14 100644
--- a/hw/9pfs/virtio-9p-local.c
+++ b/hw/9pfs/virtio-9p-local.c
@@ -213,6 +213,15 @@ static ssize_t local_pwritev(FsContext *ctx, int fd, const struct iovec *iov,
return writev(fd, iov, iovcnt);
}
#endif
+ if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) {
+ /*
+ * Initiate a writeback. This is not a data integrity sync.
+ * We want to ensure that we don't leave dirty pages in the cache
+ * after write when cache=writethrough is sepcified.
+ */
+ sync_file_range(fd, offset, 0,
+ SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
+ }
}
static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index c01c31a..3958788 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -80,6 +80,20 @@ void cred_init(FsCred *credp)
credp->fc_rdev = -1;
}
+static int get_dotl_openflags(V9fsState *s, int oflags)
+{
+ int flags;
+ /*
+ * Filter the client open flags
+ */
+ flags = oflags & ~(O_NOCTTY | O_ASYNC | O_CREAT);
+ /*
+ * Ignore direct disk access hint until the server supports it.
+ */
+ flags &= ~O_DIRECT;
+ return flags;
+}
+
void v9fs_string_init(V9fsString *str)
{
str->data = NULL;
@@ -1598,10 +1612,7 @@ static void v9fs_open(void *opaque)
err = offset;
} else {
if (s->proto_version == V9FS_PROTO_2000L) {
- flags = mode;
- flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
- /* Ignore direct disk access hint until the server supports it. */
- flags &= ~O_DIRECT;
+ flags = get_dotl_openflags(s, mode);
} else {
flags = omode_to_uflags(mode);
}
@@ -1650,8 +1661,7 @@ static void v9fs_lcreate(void *opaque)
goto out_nofid;
}
- /* Ignore direct disk access hint until the server supports it. */
- flags &= ~O_DIRECT;
+ flags = get_dotl_openflags(pdu->s, flags);
err = v9fs_co_open2(pdu, fidp, &name, gid,
flags | O_CREAT, mode, &stbuf);
if (err < 0) {
diff --git a/qemu-config.c b/qemu-config.c
index 7a7854f..b2ab0b2 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = {
}, {
.name = "security_model",
.type = QEMU_OPT_STRING,
+ }, {
+ .name = "cache",
+ .type = QEMU_OPT_STRING,
},
{ /*End of list */ }
},
@@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = {
}, {
.name = "security_model",
.type = QEMU_OPT_STRING,
+ }, {
+ .name = "cache",
+ .type = QEMU_OPT_STRING,
},
{ /*End of list */ }
diff --git a/qemu-options.hx b/qemu-options.hx
index dfbabd0..38f0aef 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -525,7 +525,8 @@ ETEXI
DEFHEADING(File system options:)
DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev,
- "-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n",
+ "-fsdev local,id=id,path=path,security_model=[mapped|passthrough|none]\n"
+ " [,cache=writethrough]\n",
QEMU_ARCH_ALL)
STEXI
@@ -541,7 +542,7 @@ The specific Fstype will determine the applicable options.
Options to each backend are described below.
-@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model}
+@item -fsdev local ,id=@var{id} ,path=@var{path} ,security_model=@var{security_model}[,cache=@var{cache}]
Create a file-system-"device" for local-filesystem.
@@ -552,13 +553,17 @@ Create a file-system-"device" for local-filesystem.
@option{security_model} specifies the security model to be followed.
@option{security_model} is required.
+@option{cache} specifies whether to skip the host page cache.
+@option{cache} is an optional argument.
+
@end table
ETEXI
DEFHEADING(Virtual File system pass-through options:)
DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs,
- "-virtfs local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n",
+ "-virtfs local,path=path,mount_tag=tag,security_model=[mapped|passthrough|none]\n"
+ " [,cache=writethrough]\n",
QEMU_ARCH_ALL)
STEXI
@@ -574,7 +579,7 @@ The specific Fstype will determine the applicable options.
Options to each backend are described below.
-@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model}
+@item -virtfs local ,path=@var{path} ,mount_tag=@var{mount_tag} ,security_model=@var{security_model}[,cache=@var{cache}]
Create a Virtual file-system-pass through for local-filesystem.
@@ -585,10 +590,12 @@ Create a Virtual file-system-pass through for local-filesystem.
@option{security_model} specifies the security model to be followed.
@option{security_model} is required.
-
@option{mount_tag} specifies the tag with which the exported file is mounted.
@option{mount_tag} is required.
+@option{cache} specifies whether to skip the host page cache.
+@option{cache} is an optional argument.
+
@end table
ETEXI
diff --git a/vl.c b/vl.c
index bd4a5ce..6760e39 100644
--- a/vl.c
+++ b/vl.c
@@ -2794,6 +2794,7 @@ int main(int argc, char **argv, char **envp)
case QEMU_OPTION_virtfs: {
QemuOpts *fsdev;
QemuOpts *device;
+ const char *cache;
olist = qemu_find_opts("virtfs");
if (!olist) {
@@ -2823,6 +2824,11 @@ int main(int argc, char **argv, char **envp)
qemu_opt_get(opts, "mount_tag"));
exit(1);
}
+
+ cache = qemu_opt_get(opts, "cache");
+ if (cache) {
+ qemu_opt_set(fsdev, "cache", cache);
+ }
qemu_opt_set(fsdev, "fstype", qemu_opt_get(opts, "fstype"));
qemu_opt_set(fsdev, "path", qemu_opt_get(opts, "path"));
qemu_opt_set(fsdev, "security_model",
next prev parent reply other threads:[~2011-10-10 9:06 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-07 6:46 [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache Aneesh Kumar K.V
2011-10-07 6:46 ` [Qemu-devel] [PATCH 2/2] qemu-options.hx: Update virtfs command documentation Aneesh Kumar K.V
2011-10-08 11:24 ` [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache Stefan Hajnoczi
2011-10-09 15:34 ` Aneesh Kumar K.V
2011-10-09 16:16 ` Stefan Hajnoczi
2011-10-09 18:35 ` Aneesh Kumar K.V
2011-10-10 4:54 ` Stefan Hajnoczi
2011-10-10 9:06 ` Aneesh Kumar K.V [this message]
2011-10-12 9:55 ` Stefan Hajnoczi
2011-10-12 13:19 ` Aneesh Kumar K.V
2011-10-12 14:32 ` Stefan Hajnoczi
2011-10-12 15:15 ` Aneesh Kumar K.V
2011-10-13 14:45 ` Stefan Hajnoczi
-- strict thread matches above, loose matches on Subject: below --
2011-06-06 17:17 Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87obxp9pwm.fsf@skywalker.in.ibm.com \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=aliguori@us.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).