From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:52485) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RDBot-0000SX-Rq for qemu-devel@nongnu.org; Mon, 10 Oct 2011 05:06:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RDBor-0000bH-Ra for qemu-devel@nongnu.org; Mon, 10 Oct 2011 05:06:47 -0400 Received: from e2.ny.us.ibm.com ([32.97.182.142]:54615) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RDBor-0000b1-Ee for qemu-devel@nongnu.org; Mon, 10 Oct 2011 05:06:45 -0400 Received: from /spool/local by us.ibm.com with XMail ESMTP for from ; Mon, 10 Oct 2011 05:06:44 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay07.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p9A96cOX3031166 for ; Mon, 10 Oct 2011 05:06:40 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p9A96cCC020894 for ; Mon, 10 Oct 2011 06:06:38 -0300 From: "Aneesh Kumar K.V" In-Reply-To: References: <1317969967-8983-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <87ty7i9o1b.fsf@skywalker.in.ibm.com> <87r52m9fnt.fsf@skywalker.in.ibm.com> Date: Mon, 10 Oct 2011 14:36:33 +0530 Message-ID: <87obxp9pwm.fsf@skywalker.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org On Mon, 10 Oct 2011 05:54:56 +0100, Stefan Hajnoczi wr= ote: > On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V > wrote: > > On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi = wrote: > >> On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V > >> wrote: > >> > On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi wrote: > >> >> On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V > >> >> wrote: > >> >> > cache=3Dwritethrough implies the file are opened in the host with= O_SYNC open flag > >> >> > > >> >> > Signed-off-by: Aneesh Kumar K.V > >> >> > --- > >> >> > =C2=A0fsdev/file-op-9p.h =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2= =A01 + > >> >> > =C2=A0fsdev/qemu-fsdev.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 10 = ++++++++-- > >> >> > =C2=A0fsdev/qemu-fsdev.h =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2= =A02 ++ > >> >> > =C2=A0hw/9pfs/virtio-9p-device.c | =C2=A0 =C2=A05 +++++ > >> >> > =C2=A0hw/9pfs/virtio-9p.c =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 24 = ++++++++++++++++++------ > >> >> > =C2=A0qemu-config.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| =C2=A0 =C2=A06 ++++++ > >> >> > =C2=A0qemu-options.hx =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| = =C2=A0 17 ++++++++++++----- > >> >> > =C2=A0vl.c =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A06 ++++++ > >> >> > =C2=A08 files changed, 58 insertions(+), 13 deletions(-) > >> >> > >> >> When would this be used? =C2=A0For serving up vanilla 9P? > >> >> > >> >> I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P do= es not. > >> >> > >> > > >> > TFSYNC is added by 9p.L. So we would need this for 9p.u. > >> > >> I think 9p.u is covered by this wstat hack in > >> http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: > >> > >> "if all the elements of the directory entry in a Twstat message are > >> ``don't touch'' val- ues, the server may interpret it as a request to > >> guarantee that the contents of the associated file are committed to > >> stable storage before the Rwstat message is returned." > >> > >> A real TFSYNC operation is nicer though and could be mandatory (the > >> 9P2000 RFC only says "the server *may*"). > >> > >> > Another use > >> > case is to ensure that we don't leave pages on host as dirty. That w= ould > >> > ensure that large writeback from a guest don't result in large numbe= r of > >> > dirty pages on the host, thereby resulting in writeback in the host.= It > >> > would be needed for predictable I/O behavior in a setup where we have > >> > multiple guest. > >> > >> I see. =C2=A0I'm mostly curious about this change because the caching = modes > >> are a nightmare with block devices - a lot of time is spent discussing > >> and benchmarking them, and they cause confusion when configuring KVM. > >> > >> It sounds like O_SYNC is being used in order to keep page cache clean. > >> =C2=A0But keeping the page cache clean is a side-effect of O_SYNC's > >> behavior: writing out each page and synchronizing the disk write > >> cache. =C2=A0If you are attempting to bypass the page cache, just use > >> O_DIRECT without O_SYNC. > > > > But how about reads. I want to make sure i get to use the page cache for > > reads and also want to keep the page cache clean. > > > >> =C2=A0O_SYNC is doing the additional disk write > >> cache synchronization which will slow down I/O and prevent the server > >> from using disk write cache. =C2=A0O_SYNC is not the right flag to use. > > > > O_DIRECT have additional requirement on buffer alignment, and we don't > > want to send every read to disk. VirtFS also support zero copy > > read/write, so that buffer alignment will always not be possible. > > We want to make sure writes don't leave the page cache dirty so > > that host doesn't spent much time in write back of data dirtied by the = guest. >=20 > sync_file_range(2) kicks off the write-out but does not flush file > system metadata or synchronize the disk write cache. >=20 something like below ? commit bc7c28877b30155264f351d26692b3cceca853bb Author: Aneesh Kumar K.V Date: Mon May 23 11:29:15 2011 +0530 hw/9pfs: Add new virtfs option cache=3Dwritethrough to skip host page c= ache =20=20=20=20 cache=3Dwritethrough implies the file are opened in the host with O_SYN= C open flag =20=20=20=20 Signed-off-by: Aneesh Kumar K.V diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..84ec9e0 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 =20 +/* cache flags */ +#define V9FS_WRITETHROUGH_CACHE 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; + int cache_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..fce016b 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype =3D qemu_opt_get(opts, "fstype"); const char *path =3D qemu_opt_get(opts, "path"); const char *sec_model =3D qemu_opt_get(opts, "security_model"); + const char *cache =3D qemu_opt_get(opts, "cache"); + =20 if (!fsdev_id) { fprintf(stderr, "fsdev: No id specified\n"); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle->fse.path =3D g_strdup(path); fsle->fse.security_model =3D g_strdup(sec_model); fsle->fse.ops =3D FsTypes[i].ops; - + fsle->fse.cache_model =3D 0; + if (cache) { + if (!strcmp(cache, "writethrough")) { + fsle->fse.cache_model =3D V9FS_WRITETHROUGH_CACHE; + } + } QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next); return 0; - } =20 FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..9c440f2 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -41,6 +41,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; + int cache_flags; FileOperations *ops; } FsTypeEntry; =20 diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..b23e8a4 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf= *conf) exit(1); } =20 + s->ctx.cache_flags =3D fse->cache_flags; s->ctx.fs_root =3D g_strdup(fse->path); len =3D strlen(conf->tag); if (len > MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, = const struct iovec *iov, return writev(fd, iov, iovcnt); } #endif + if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=3Dwritethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRIT= E); + } } =20 static int handle_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *cred= p) diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c index 9559ff6..5745d14 100644 --- a/hw/9pfs/virtio-9p-local.c +++ b/hw/9pfs/virtio-9p-local.c @@ -213,6 +213,15 @@ static ssize_t local_pwritev(FsContext *ctx, int fd, c= onst struct iovec *iov, return writev(fd, iov, iovcnt); } #endif + if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=3Dwritethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRIT= E); + } } =20 static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp) diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index c01c31a..3958788 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,20 @@ void cred_init(FsCred *credp) credp->fc_rdev =3D -1; } =20 +static int get_dotl_openflags(V9fsState *s, int oflags) +{ + int flags; + /* + * Filter the client open flags + */ + flags =3D oflags & ~(O_NOCTTY | O_ASYNC | O_CREAT); + /* + * Ignore direct disk access hint until the server supports it. + */ + flags &=3D ~O_DIRECT; + return flags; +} + void v9fs_string_init(V9fsString *str) { str->data =3D NULL; @@ -1598,10 +1612,7 @@ static void v9fs_open(void *opaque) err =3D offset; } else { if (s->proto_version =3D=3D V9FS_PROTO_2000L) { - flags =3D mode; - flags &=3D ~(O_NOCTTY | O_ASYNC | O_CREAT); - /* Ignore direct disk access hint until the server supports it= . */ - flags &=3D ~O_DIRECT; + flags =3D get_dotl_openflags(s, mode); } else { flags =3D omode_to_uflags(mode); } @@ -1650,8 +1661,7 @@ static void v9fs_lcreate(void *opaque) goto out_nofid; } =20 - /* Ignore direct disk access hint until the server supports it. */ - flags &=3D ~O_DIRECT; + flags =3D get_dotl_openflags(pdu->s, flags); err =3D v9fs_co_open2(pdu, fidp, &name, gid, flags | O_CREAT, mode, &stbuf); if (err < 0) { diff --git a/qemu-config.c b/qemu-config.c index 7a7854f..b2ab0b2 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts =3D { }, { .name =3D "security_model", .type =3D QEMU_OPT_STRING, + }, { + .name =3D "cache", + .type =3D QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts =3D { }, { .name =3D "security_model", .type =3D QEMU_OPT_STRING, + }, { + .name =3D "cache", + .type =3D QEMU_OPT_STRING, }, =20 { /*End of list */ } diff --git a/qemu-options.hx b/qemu-options.hx index dfbabd0..38f0aef 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -525,7 +525,8 @@ ETEXI DEFHEADING(File system options:) =20 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev, - "-fsdev local,id=3Did,path=3Dpath,security_model=3D[mapped|passthrough= |none]\n", + "-fsdev local,id=3Did,path=3Dpath,security_model=3D[mapped|passthrough= |none]\n" + " [,cache=3Dwritethrough]\n", QEMU_ARCH_ALL) =20 STEXI @@ -541,7 +542,7 @@ The specific Fstype will determine the applicable optio= ns. =20 Options to each backend are described below. =20 -@item -fsdev local ,id=3D@var{id} ,path=3D@var{path} ,security_model=3D@va= r{security_model} +@item -fsdev local ,id=3D@var{id} ,path=3D@var{path} ,security_model=3D@va= r{security_model}[,cache=3D@var{cache}] =20 Create a file-system-"device" for local-filesystem. =20 @@ -552,13 +553,17 @@ Create a file-system-"device" for local-filesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. =20 +@option{cache} specifies whether to skip the host page cache. +@option{cache} is an optional argument. + @end table ETEXI =20 DEFHEADING(Virtual File system pass-through options:) =20 DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs, - "-virtfs local,path=3Dpath,mount_tag=3Dtag,security_model=3D[mapped|pa= ssthrough|none]\n", + "-virtfs local,path=3Dpath,mount_tag=3Dtag,security_model=3D[mapped|pa= ssthrough|none]\n" + " [,cache=3Dwritethrough]\n", QEMU_ARCH_ALL) =20 STEXI @@ -574,7 +579,7 @@ The specific Fstype will determine the applicable optio= ns. =20 Options to each backend are described below. =20 -@item -virtfs local ,path=3D@var{path} ,mount_tag=3D@var{mount_tag} ,secur= ity_model=3D@var{security_model} +@item -virtfs local ,path=3D@var{path} ,mount_tag=3D@var{mount_tag} ,secur= ity_model=3D@var{security_model}[,cache=3D@var{cache}] =20 Create a Virtual file-system-pass through for local-filesystem. =20 @@ -585,10 +590,12 @@ Create a Virtual file-system-pass through for local-f= ilesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. =20 - @option{mount_tag} specifies the tag with which the exported file is mount= ed. @option{mount_tag} is required. =20 +@option{cache} specifies whether to skip the host page cache. +@option{cache} is an optional argument. + @end table ETEXI =20 diff --git a/vl.c b/vl.c index bd4a5ce..6760e39 100644 --- a/vl.c +++ b/vl.c @@ -2794,6 +2794,7 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_virtfs: { QemuOpts *fsdev; QemuOpts *device; + const char *cache; =20 olist =3D qemu_find_opts("virtfs"); if (!olist) { @@ -2823,6 +2824,11 @@ int main(int argc, char **argv, char **envp) qemu_opt_get(opts, "mount_tag")); exit(1); } + + cache =3D qemu_opt_get(opts, "cache"); + if (cache) { + qemu_opt_set(fsdev, "cache", cache); + } qemu_opt_set(fsdev, "fstype", qemu_opt_get(opts, "fstype")= ); qemu_opt_set(fsdev, "path", qemu_opt_get(opts, "path")); qemu_opt_set(fsdev, "security_model",