From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:47011) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RE0nn-0007F2-8B for qemu-devel@nongnu.org; Wed, 12 Oct 2011 11:33:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RE0nl-0007Z9-7z for qemu-devel@nongnu.org; Wed, 12 Oct 2011 11:33:03 -0400 Received: from e2.ny.us.ibm.com ([32.97.182.142]:52850) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RE0nl-0007Yp-3q for qemu-devel@nongnu.org; Wed, 12 Oct 2011 11:33:01 -0400 Received: from /spool/local by us.ibm.com with XMail ESMTP for from ; Wed, 12 Oct 2011 11:17:49 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p9CFFixg113014 for ; Wed, 12 Oct 2011 11:15:44 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p9CFFhDJ004127 for ; Wed, 12 Oct 2011 12:15:43 -0300 From: "Aneesh Kumar K.V" In-Reply-To: References: <1317969967-8983-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <87ty7i9o1b.fsf@skywalker.in.ibm.com> <87r52m9fnt.fsf@skywalker.in.ibm.com> <87obxp9pwm.fsf@skywalker.in.ibm.com> <87r52iuz38.fsf@linux.vnet.ibm.com> Date: Wed, 12 Oct 2011 20:45:31 +0530 Message-ID: <878voqutpo.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org On Wed, 12 Oct 2011 15:32:36 +0100, Stefan Hajnoczi wr= ote: > On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V > wrote: > > On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi wrote: > >> On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V > >> wrote: > >> > diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c > >> > index 5c8b5ed..441a37f 100644 > >> > --- a/hw/9pfs/virtio-9p-handle.c > >> > +++ b/hw/9pfs/virtio-9p-handle.c > >> > @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, i= nt fd, const struct iovec *iov, > >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 return writev(fd, iov, iovcnt); > >> > >> The sync_file_range(2) call below is dead-code since we'll return > >> immediately after writev(2) completes. =C2=A0The writev(2) return value > >> needs to be saved temporarily and then returned after > >> sync_file_range(2). > > > > Missed that. Will fix in the next update > > > >> > >> > =C2=A0 =C2=A0 } > >> > =C2=A0#endif > >> > + =C2=A0 =C2=A0if (ctx->cache_flags & V9FS_WRITETHROUGH_CACHE) { > >> > >> -drive cache=3Dwritethrough means something different from 9pfs > >> "writethrough". =C2=A0This is confusing so I wonder if there is a bett= er > >> name like immediate write-out. > >> > > > > cache=3Dimmediate-write-out ? > > > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0/* > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Initiate a writeback. This is not a = data integrity sync. > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 * We want to ensure that we don't leav= e dirty pages in the cache > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 * after write when cache=3Dwritethroug= h is sepcified. > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 */ > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0sync_file_range(fd, offset, 0, > >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); > >> > + =C2=A0 =C2=A0} > >> > >> I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. =C2=A0A= s a > >> best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although > >> a client that rapidly rewrites may be able to leave dirty pages in the > >> host page cache > > > > Shouldn't we prevent this ? That is the reason for me to use that > > WAIT_BEFORE ? >=20 > The flag will cause sync_file_range(2) to wait on in-flight I/O. The > guest will notice slow I/O. You should at least specify a range > instead of nbytes=3D0 in the arguments. version 3 i have=20 commit db3cb2ac561d837176f9e0d02075a898c57ad309 Author: Aneesh Kumar K.V Date: Wed Oct 12 19:11:23 2011 +0530 hw/9pfs: Add new virtfs option writeout=3Dimmediate skip host page cache =20=20=20=20 writeout=3Dimmediate implies the after pwritev we do a sync_file_range. =20=20=20=20 Signed-off-by: Aneesh Kumar K.V diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..af3ecbe 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 =20 +/* export flags */ +#define V9FS_IMMEDIATE_WRITEOUT 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; + int export_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..946bad7 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype =3D qemu_opt_get(opts, "fstype"); const char *path =3D qemu_opt_get(opts, "path"); const char *sec_model =3D qemu_opt_get(opts, "security_model"); + const char *writeout =3D qemu_opt_get(opts, "writeout"); + =20 if (!fsdev_id) { fprintf(stderr, "fsdev: No id specified\n"); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle->fse.path =3D g_strdup(path); fsle->fse.security_model =3D g_strdup(sec_model); fsle->fse.ops =3D FsTypes[i].ops; - + fsle->fse.export_flags =3D 0; + if (writeout) { + if (!strcmp(writeout, "immediate")) { + fsle->fse.export_flags =3D V9FS_IMMEDIATE_WRITEOUT; + } + } QTAILQ_INSERT_TAIL(&fstype_entries, fsle, next); return 0; - } =20 FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..3b2feb4 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -41,6 +41,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; + int export_flags; FileOperations *ops; } FsTypeEntry; =20 diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..403eed0 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf= *conf) exit(1); } =20 + s->ctx.export_flags =3D fse->export_flags; s->ctx.fs_root =3D g_strdup(fse->path); len =3D strlen(conf->tag); if (len > MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..91d757a 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -192,16 +192,27 @@ static ssize_t handle_preadv(FsContext *ctx, int fd, = const struct iovec *iov, static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *= iov, int iovcnt, off_t offset) { + ssize_t ret; #ifdef CONFIG_PREADV - return pwritev(fd, iov, iovcnt, offset); + ret =3D pwritev(fd, iov, iovcnt, offset); #else int err =3D lseek(fd, offset, SEEK_SET); if (err =3D=3D -1) { return err; } else { - return writev(fd, iov, iovcnt); + ret =3D writev(fd, iov, iovcnt); } #endif + if (ret > 0 && ctx->export_flags & V9FS_IMMEDIATE_WRITEOUT) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when writeout=3Dimmediate is sepcified. + */ + sync_file_range(fd, offset, ret, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRIT= E); + } + return ret; } =20 static int handle_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *cred= p) diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c index 9559ff6..7f4681a 100644 --- a/hw/9pfs/virtio-9p-local.c +++ b/hw/9pfs/virtio-9p-local.c @@ -203,16 +203,28 @@ static ssize_t local_preadv(FsContext *ctx, int fd, c= onst struct iovec *iov, static ssize_t local_pwritev(FsContext *ctx, int fd, const struct iovec *i= ov, int iovcnt, off_t offset) { + ssize_t ret +; #ifdef CONFIG_PREADV - return pwritev(fd, iov, iovcnt, offset); + ret =3D pwritev(fd, iov, iovcnt, offset); #else int err =3D lseek(fd, offset, SEEK_SET); if (err =3D=3D -1) { return err; } else { - return writev(fd, iov, iovcnt); + ret =3D writev(fd, iov, iovcnt); } #endif + if (ret > 0 && ctx->export_flags & V9FS_IMMEDIATE_WRITEOUT) { + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when writeout=3Dimmediate is sepcified. + */ + sync_file_range(fd, offset, ret, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRIT= E); + } + return ret; } =20 static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp) diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index c01c31a..3958788 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,20 @@ void cred_init(FsCred *credp) credp->fc_rdev =3D -1; } =20 +static int get_dotl_openflags(V9fsState *s, int oflags) +{ + int flags; + /* + * Filter the client open flags + */ + flags =3D oflags & ~(O_NOCTTY | O_ASYNC | O_CREAT); + /* + * Ignore direct disk access hint until the server supports it. + */ + flags &=3D ~O_DIRECT; + return flags; +} + void v9fs_string_init(V9fsString *str) { str->data =3D NULL; @@ -1598,10 +1612,7 @@ static void v9fs_open(void *opaque) err =3D offset; } else { if (s->proto_version =3D=3D V9FS_PROTO_2000L) { - flags =3D mode; - flags &=3D ~(O_NOCTTY | O_ASYNC | O_CREAT); - /* Ignore direct disk access hint until the server supports it= . */ - flags &=3D ~O_DIRECT; + flags =3D get_dotl_openflags(s, mode); } else { flags =3D omode_to_uflags(mode); } @@ -1650,8 +1661,7 @@ static void v9fs_lcreate(void *opaque) goto out_nofid; } =20 - /* Ignore direct disk access hint until the server supports it. */ - flags &=3D ~O_DIRECT; + flags =3D get_dotl_openflags(pdu->s, flags); err =3D v9fs_co_open2(pdu, fidp, &name, gid, flags | O_CREAT, mode, &stbuf); if (err < 0) { diff --git a/qemu-config.c b/qemu-config.c index 7a7854f..4559236 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts =3D { }, { .name =3D "security_model", .type =3D QEMU_OPT_STRING, + }, { + .name =3D "writeout", + .type =3D QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts =3D { }, { .name =3D "security_model", .type =3D QEMU_OPT_STRING, + }, { + .name =3D "writeout", + .type =3D QEMU_OPT_STRING, }, =20 { /*End of list */ } diff --git a/qemu-options.hx b/qemu-options.hx index dfbabd0..20fd7b5 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -525,7 +525,8 @@ ETEXI DEFHEADING(File system options:) =20 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev, - "-fsdev local,id=3Did,path=3Dpath,security_model=3D[mapped|passthrough= |none]\n", + "-fsdev local,id=3Did,path=3Dpath,security_model=3D[mapped|passthrough= |none]\n" + " [,writeout=3Dimmediate]\n", QEMU_ARCH_ALL) =20 STEXI @@ -541,7 +542,7 @@ The specific Fstype will determine the applicable optio= ns. =20 Options to each backend are described below. =20 -@item -fsdev local ,id=3D@var{id} ,path=3D@var{path} ,security_model=3D@va= r{security_model} +@item -fsdev local ,id=3D@var{id} ,path=3D@var{path} ,security_model=3D@va= r{security_model}[,writeout=3D@var{writeout}] =20 Create a file-system-"device" for local-filesystem. =20 @@ -552,13 +553,17 @@ Create a file-system-"device" for local-filesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. =20 +@option{writeout} specifies whether to skip the host page cache. +@option{writeout} is an optional argument. + @end table ETEXI =20 DEFHEADING(Virtual File system pass-through options:) =20 DEF("virtfs", HAS_ARG, QEMU_OPTION_virtfs, - "-virtfs local,path=3Dpath,mount_tag=3Dtag,security_model=3D[mapped|pa= ssthrough|none]\n", + "-virtfs local,path=3Dpath,mount_tag=3Dtag,security_model=3D[mapped|pa= ssthrough|none]\n" + " [,writeout=3Dimmediate]\n", QEMU_ARCH_ALL) =20 STEXI @@ -574,7 +579,7 @@ The specific Fstype will determine the applicable optio= ns. =20 Options to each backend are described below. =20 -@item -virtfs local ,path=3D@var{path} ,mount_tag=3D@var{mount_tag} ,secur= ity_model=3D@var{security_model} +@item -virtfs local ,path=3D@var{path} ,mount_tag=3D@var{mount_tag} ,secur= ity_model=3D@var{security_model}[,writeout=3D@var{writeout}] =20 Create a Virtual file-system-pass through for local-filesystem. =20 @@ -585,10 +590,12 @@ Create a Virtual file-system-pass through for local-f= ilesystem. @option{security_model} specifies the security model to be followed. @option{security_model} is required. =20 - @option{mount_tag} specifies the tag with which the exported file is mount= ed. @option{mount_tag} is required. =20 +@option{writeout} specifies whether to skip the host page cache. +@option{writeout} is an optional argument. + @end table ETEXI =20 diff --git a/vl.c b/vl.c index dbf7778..08e1a95 100644 --- a/vl.c +++ b/vl.c @@ -2785,6 +2785,7 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_virtfs: { QemuOpts *fsdev; QemuOpts *device; + const char *writeout; =20 olist =3D qemu_find_opts("virtfs"); if (!olist) { @@ -2814,6 +2815,11 @@ int main(int argc, char **argv, char **envp) qemu_opt_get(opts, "mount_tag")); exit(1); } + + writeout =3D qemu_opt_get(opts, "writeout"); + if (writeout) { + qemu_opt_set(fsdev, "writeout", writeout); + } qemu_opt_set(fsdev, "fstype", qemu_opt_get(opts, "fstype")= ); qemu_opt_set(fsdev, "path", qemu_opt_get(opts, "path")); qemu_opt_set(fsdev, "security_model",