From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41080) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSe3A-0002N6-HQ for qemu-devel@nongnu.org; Thu, 28 Jul 2016 01:40:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bSe36-0001Gc-7L for qemu-devel@nongnu.org; Thu, 28 Jul 2016 01:40:03 -0400 Received: from lgeamrelo13.lge.com ([156.147.23.53]:36784) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSe35-0001GR-7i for qemu-devel@nongnu.org; Thu, 28 Jul 2016 01:40:00 -0400 Date: Thu, 28 Jul 2016 14:39:53 +0900 From: Namhyung Kim Message-ID: <20160728053953.GB4371@sejong> References: <1469632111-23260-1-git-send-email-namhyung@kernel.org> <1469632111-23260-7-git-send-email-namhyung@kernel.org> <20160728023254-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: <20160728023254-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 6/7] qemu: Implement virtio-pstore device List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, LKML , Paolo Bonzini , Radim =?utf-8?B?S3LEjW3DocWZ?= , Anthony Liguori , Anton Vorontsov , Colin Cross , Kees Cook , Tony Luck , Steven Rostedt , Ingo Molnar , Minchan Kim On Thu, Jul 28, 2016 at 03:02:54AM +0300, Michael S. Tsirkin wrote: > On Thu, Jul 28, 2016 at 12:08:30AM +0900, Namhyung Kim wrote: > > Add virtio pstore device to allow kernel log files saved on the host. > > It will save the log files on the directory given by pstore device > > option. > >=20 > > $ qemu-system-x86=5F64 -device virtio-pstore,directory=3Ddir-xx ... > >=20 > > (guest) # echo c > /proc/sysrq-trigger >=20 > So if the point is handling system crashes, I suspect > virtio is the wrong protocol to use. ATM it's rather > elaborate for performance, it's likely not to work when > linux is crashing. I think you want something very very > simple that will still work when you crash. Like maybe > a serial device? Well, I dont' know. As you know, the kernel oops dump is already sent to serial device but it's rather slow. As I wrote in the cover letter, enabling ftrace=5Fdump=5Fon=5Foops makes it even worse.. Also pstore saves the (compressed) binary data so I thought it'd be better to have a dedicated IO channel. I know that we can't rely on anything in kernel when the kernel is crashing. In the virtualization environment, I think it'd be great if it can dump the crash data in the host directly so I chose the virtio. I thought the virtio is a relatively thin layer and independent from other kernel features. Even if it doesn't guarantee to work 100%, it's worth trying IMHO. >=20 > > $ ls dir-xx > > dmesg-1.enc.z dmesg-2.enc.z > >=20 > > The log files are usually compressed using zlib. Users can see the log > > messages directly on the host or on the guest (using pstore filesystem). >=20 > So this lacks all management tools that a regular storage device > has, and these are necessary if people are to use this > in production. >=20 > For example, some kind of provision for limiting the amount of host disk > this can consume seems called for. Rate-limiting disk writes > on host also seems necessary. Handling host disk errors always > propagates error to guest but an option to e.g. stop vm might > be useful to avoid corruption. Yes, it needs that kind of management. I'd like to look at the existing implementation. But basically I thought it as a debugging feature and it won't produce much data for the default setting. Maybe I can limit the file size not to exceed the buffer size and keep up to pre-configured number of files for each type. Now host disk error will be propagated to guest. >=20 > >=20 > > The 'directory' property is required for virtio-pstore device to work. > > It also adds 'bufsize' and 'console' (boolean) properties. >=20 > No idea what these do. Seem to be RW by guest but have no effect > otherwise. The 'directory' is a pathname which it saves the data files. The 'bufsize' sets the size of buffer used by pstore. The 'console' enables saving pstore console type data. >=20 >=20 > > Cc: Paolo Bonzini > > Cc: Radim Kr=C4=8Dm=C3=A1=C5=99 > > Cc: "Michael S. Tsirkin" > > Cc: Anthony Liguori > > Cc: Anton Vorontsov > > Cc: Colin Cross > > Cc: Kees Cook > > Cc: Tony Luck > > Cc: Steven Rostedt > > Cc: Ingo Molnar > > Cc: Minchan Kim > > Cc: kvm@vger.kernel.org > > Cc: qemu-devel@nongnu.org > > Cc: virtualization@lists.linux-foundation.org > > Signed-off-by: Namhyung Kim > > --- [SNIP] > > diff --git a/hw/virtio/virtio-pstore.c b/hw/virtio/virtio-pstore.c > > new file mode 100644 > > index 0000000..2ca7786 > > --- /dev/null > > +++ b/hw/virtio/virtio-pstore.c > > @@ -0,0 +1,477 @@ > > +/* > > + * Virtio Pstore Device > > + * > > + * Copyright (C) 2016 LG Electronics > > + * > > + * Authors: > > + * Namhyung Kim > > + * > > + * This work is licensed under the terms of the GNU GPL, version 2. >=20 > We generally ask new code to be v2 or later. Ok, will update. >=20 > > See > > + * the COPYING file in the top-level directory. > > + * > > + */ > > + > > +#include > > + > > +#include "qemu/osdep.h" > > +#include "qemu/iov.h" > > +#include "qemu-common.h" > > +#include "qemu/cutils.h" > > +#include "qemu/error-report.h" > > +#include "sysemu/kvm.h" > > +#include "qapi/visitor.h" > > +#include "qapi-event.h" > > +#include "trace.h" > > + > > +#include "hw/virtio/virtio.h" > > +#include "hw/virtio/virtio-bus.h" > > +#include "hw/virtio/virtio-access.h" > > +#include "hw/virtio/virtio-pstore.h" > > + > > + > > +static void virtio=5Fpstore=5Fto=5Ffilename(VirtIOPstore *s, char *buf= , size=5Ft sz, > > + struct virtio=5Fpstore=5Freq *re= q) > > +{ > > + const char *basename; > > + unsigned long long id =3D 0; > > + unsigned int flags =3D le32=5Fto=5Fcpu(req->flags); > > + > > + switch (le16=5Fto=5Fcpu(req->type)) { > > + case VIRTIO=5FPSTORE=5FTYPE=5FDMESG: > > + basename =3D "dmesg"; > > + id =3D s->id++; > > + break; > > + case VIRTIO=5FPSTORE=5FTYPE=5FCONSOLE: > > + basename =3D "console"; > > + if (s->console=5Fid) { > > + id =3D s->console=5Fid; > > + } else { > > + id =3D s->console=5Fid =3D s->id++; > > + } > > + break; > > + default: > > + basename =3D "unknown"; > > + break; > > + } > > + > > + snprintf(buf, sz, "%s/%s-%llu%s", s->directory, basename, id, > > + flags & VIRTIO=5FPSTORE=5FFL=5FCOMPRESSED ? ".enc.z" : ""= ); > > +} > > + > > +static void virtio=5Fpstore=5Ffrom=5Ffilename(VirtIOPstore *s, char *n= ame, > > + char *buf, size=5Ft sz, > > + struct virtio=5Fpstore=5Ffilei= nfo *info) > > +{ > > + snprintf(buf, sz, "%s/%s", s->directory, name); >=20 > if this does not fit, buf will not be 0 terminated and can > generally be corrupted. Will change it to use malloc instead. >=20 >=20 > > + > > + if (g=5Fstr=5Fhas=5Fprefix(name, "dmesg-")) { > > + info->type =3D VIRTIO=5FPSTORE=5FTYPE=5FDMESG; > > + name +=3D strlen("dmesg-"); > > + } else if (g=5Fstr=5Fhas=5Fprefix(name, "console-")) { > > + info->type =3D VIRTIO=5FPSTORE=5FTYPE=5FCONSOLE; > > + name +=3D strlen("console-"); > > + } else if (g=5Fstr=5Fhas=5Fprefix(name, "unknown-")) { > > + info->type =3D VIRTIO=5FPSTORE=5FTYPE=5FUNKNOWN; > > + name +=3D strlen("unknown-"); > > + } > > + > > + qemu=5Fstrtoull(name, NULL, 0, &info->id); > > + > > + info->flags =3D 0; > > + if (g=5Fstr=5Fhas=5Fsuffix(name, ".enc.z")) { > > + info->flags |=3D VIRTIO=5FPSTORE=5FFL=5FCOMPRESSED; > > + } > > +} [SNIP] > > +static ssize=5Ft virtio=5Fpstore=5Fdo=5Fwrite(VirtIOPstore *s, struct = iovec *out=5Fsg, > > + unsigned int out=5Fnum, > > + struct virtio=5Fpstore=5Freq *re= q) > > +{ > > + char path[PATH=5FMAX]; > > + int fd; > > + ssize=5Ft len; > > + unsigned short type; > > + int flags =3D O=5FWRONLY | O=5FCREAT; > > + > > + /* we already consume the req */ > > + iov=5Fdiscard=5Ffront(&out=5Fsg, &out=5Fnum, sizeof(*req)); > > + > > + virtio=5Fpstore=5Fto=5Ffilename(s, path, sizeof(path), req); > > + > > + type =3D le16=5Fto=5Fcpu(req->type); > > + > > + if (type =3D=3D VIRTIO=5FPSTORE=5FTYPE=5FDMESG) { > > + flags |=3D O=5FTRUNC; > > + } else if (type =3D=3D VIRTIO=5FPSTORE=5FTYPE=5FCONSOLE) { > > + flags |=3D O=5FAPPEND; > > + } > > + > > + fd =3D open(path, flags, 0644); > > + if (fd < 0) { > > + error=5Freport("cannot open %s", path); > > + return -1; > > + } > > + len =3D writev(fd, out=5Fsg, out=5Fnum); > > + close(fd); > > + > > + return len; >=20 > All this is blocking VM until host io completes. Hmm.. I don't know about the internals of qemu. So does it make guest stop? If so, that's what I want to do for =5FDMESG. :) As it's called only on kernel oops I think it's admittable. But for =5FCONSOLE, it needs to do asynchronously. Maybe I can add a thread to do the work. >=20 >=20 > > +} > > + > > +static ssize=5Ft virtio=5Fpstore=5Fdo=5Fclose(VirtIOPstore *s) > > +{ > > + if (s->dirp =3D=3D NULL) { > > + return 0; > > + } > > + > > + closedir(s->dirp); > > + s->dirp =3D NULL; > > + > > + return 0; > > +} > > + > > +static ssize=5Ft virtio=5Fpstore=5Fdo=5Ferase(VirtIOPstore *s, > > + struct virtio=5Fpstore=5Freq *re= q) > > +{ > > + char path[PATH=5FMAX]; > > + > > + virtio=5Fpstore=5Fto=5Ffilename(s, path, sizeof(path), req); > > + > > + return unlink(path); > > +} > > + > > +static void virtio=5Fpstore=5Fhandle=5Fio(VirtIODevice *vdev, VirtQueu= e *vq) > > +{ > > + VirtIOPstore *s =3D VIRTIO=5FPSTORE(vdev); > > + VirtQueueElement *elem; > > + struct virtio=5Fpstore=5Freq req; > > + struct virtio=5Fpstore=5Fres res; > > + ssize=5Ft len =3D 0; > > + int ret; > > + > > + for (;;) { > > + elem =3D virtqueue=5Fpop(vq, sizeof(VirtQueueElement)); > > + if (!elem) { > > + return; > > + } > > + > > + if (elem->out=5Fnum < 1 || elem->in=5Fnum < 1) { > > + error=5Freport("request or response buffer is missing"); > > + exit(1); > > + } > > + > > + len =3D iov=5Fto=5Fbuf(elem->out=5Fsg, elem->out=5Fnum, 0, &re= q, sizeof(req)); > > + if (len !=3D (ssize=5Ft)sizeof(req)) { > > + error=5Freport("invalid request size: %ld", (long)len); > > + exit(1); > > + } > > + res.cmd =3D req.cmd; > > + res.type =3D req.type; > > + > > + switch (le16=5Fto=5Fcpu(req.cmd)) { > > + case VIRTIO=5FPSTORE=5FCMD=5FOPEN: > > + ret =3D virtio=5Fpstore=5Fdo=5Fopen(s); > > + break; > > + case VIRTIO=5FPSTORE=5FCMD=5FREAD: > > + ret =3D virtio=5Fpstore=5Fdo=5Fread(s, elem->in=5Fsg, elem= ->in=5Fnum, &res); > > + if (ret > 0) { > > + len =3D ret; > > + ret =3D 0; > > + } > > + break; > > + case VIRTIO=5FPSTORE=5FCMD=5FWRITE: > > + ret =3D virtio=5Fpstore=5Fdo=5Fwrite(s, elem->out=5Fsg, el= em->out=5Fnum, &req); > > + break; > > + case VIRTIO=5FPSTORE=5FCMD=5FCLOSE: > > + ret =3D virtio=5Fpstore=5Fdo=5Fclose(s); > > + break; > > + case VIRTIO=5FPSTORE=5FCMD=5FERASE: > > + ret =3D virtio=5Fpstore=5Fdo=5Ferase(s, &req); > > + break; > > + default: > > + ret =3D -1; > > + break; > > + } > > + > > + res.ret =3D ret; > > + > > + iov=5Ffrom=5Fbuf(elem->in=5Fsg, elem->in=5Fnum, 0, &res, sizeo= f(res)); > > + virtqueue=5Fpush(vq, elem, sizeof(res) + len); >=20 > this is wrong - len should be # of bytes written into guest memory. ??? All command except READ only write the virtio=5Fpstore=5Fret so len is 0. For READ, virtio=5Fpstore=5Fdo=5Fread() returns the actual bytes it wrote (minus sizeof(res) of course), so I think it's fine, no? Anyway, thanks for your review! Thanks, Namhyung >=20 > > + > > + virtio=5Fnotify(vdev, vq); > > + g=5Ffree(elem); > > + > > + if (ret < 0) { > > + return; > > + } > > + } > > +}