Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Namhyung Kim @ 2016-07-18 14:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160718100353.GA15163@stefanha-x1.localdomain>

Hello,

On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > From: Namhyung Kim <namhyung@gmail.com>
> > 
> > Add virtio pstore device to allow kernel log files saved on the host.
> > It will save the log files on the directory given by pstore device
> > option.
> > 
> >   $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...
> > 
> >   (guest) # echo c > /proc/sysrq-trigger
> > 
> >   $ ls dir-xx
> >   dmesg-0.enc.z  dmesg-1.enc.z
> > 
> > The log files are usually compressed using zlib.  Users can see the log
> > messages directly on the host or on the guest (using pstore filesystem).
> 
> The implementation is synchronous (i.e. can pause guest code execution),
> does not handle write errors, and does not limit the amount of data the
> guest can write.  This is sufficient for ad-hoc debugging and usage with
> trusted guests.
> 
> If you want this to be available in environments where the guest isn't
> trusted then there must be a limit on how much the guest can write or
> some kind of log rotation.

Right.  The synchronous IO is required by the pstore subsystem
implementation AFAIK (it uses a single psinfo->buf in the loop).  And
I agree that it should have a way to handle write errors and to limit
amount of data.

> 
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Kr??m???? <rkrcmar@redhat.com>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Cc: Anton Vorontsov <anton@enomsg.org>
> > Cc: Colin Cross <ccross@android.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: kvm@vger.kernel.org
> > Cc: qemu-devel@nongnu.org
> > Cc: virtualization@lists.linux-foundation.org
> > Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> > ---

[SNIP]
> > +
> > +static void virtio_pstore_to_filename(VirtIOPstore *s, char *buf, size_t sz,
> > +                                      struct virtio_pstore_hdr *hdr)
> > +{
> > +    const char *basename;
> > +
> > +    switch (hdr->type) {
> 
> Missing le16_to_cpu()?
> 
> > +    case VIRTIO_PSTORE_TYPE_DMESG:
> > +        basename = "dmesg";
> > +        break;
> > +    default:
> > +        basename = "unknown";
> > +        break;
> > +    }
> > +
> > +    snprintf(buf, sz, "%s/%s-%llu%s", s->directory, basename,
> > +             (unsigned long long) hdr->id,
> 
> Missing le64_to_cpu()?
> 
> > +             hdr->flags & VIRTIO_PSTORE_FL_COMPRESSED ? ".enc.z" : "");
> 
> Missing le32_to_cpu()?

Oops, will fix.

> 
> > +}
> > +
> > +static void virtio_pstore_from_filename(VirtIOPstore *s, char *name,
> > +                                        char *buf, size_t sz,
> > +                                        struct virtio_pstore_hdr *hdr)
> > +{
> > +    size_t len = strlen(name);
> > +
> > +    hdr->flags = 0;
> > +    if (!strncmp(name + len - 6, ".enc.z", 6)) {
> 
> Please use g_str_has_suffix(name, ".enc.z") to avoid accessing before
> the beginning of the string if the filename is shorter than 6
> characters.

Ah, ok.

> 
> > +        hdr->flags |= VIRTIO_PSTORE_FL_COMPRESSED;
> > +    }
> > +
> > +    snprintf(buf, sz, "%s/%s", s->directory, name);
> > +
> > +    if (!strncmp(name, "dmesg-", 6)) {
> 
> g_str_has_prefix(name, "dmesg-")
> 
> > +        hdr->type = cpu_to_le16(VIRTIO_PSTORE_TYPE_DMESG);
> > +        name += 6;
> > +    } else if (!strncmp(name, "unknown-", 8)) {
> 
> g_str_has_prefix(name, "unknown-")

Will change.

> 
> > +        hdr->type = cpu_to_le16(VIRTIO_PSTORE_TYPE_UNKNOWN);
> > +        name += 8;
> > +    }
> > +
> > +    qemu_strtoull(name, NULL, 0, &hdr->id);
> > +}
> > +
> > +static ssize_t virtio_pstore_do_open(VirtIOPstore *s)
> > +{
> > +    s->dir = opendir(s->directory);
> > +    if (s->dir == NULL) {
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static ssize_t virtio_pstore_do_read(VirtIOPstore *s, void *buf, size_t sz,
> > +                                      struct virtio_pstore_hdr *hdr)
> > +{
> > +    char path[PATH_MAX];
> > +    FILE *fp;
> > +    ssize_t len;
> > +    struct stat stbuf;
> > +    struct dirent *dent;
> > +
> > +    if (s->dir == NULL) {
> > +        return -1;
> > +    }
> > +
> > +    dent = readdir(s->dir);
> > +    while (dent) {
> > +        if (dent->d_name[0] != '.') {
> > +            break;
> > +        }
> > +        dent = readdir(s->dir);
> > +    }
> > +
> > +    if (dent == NULL) {
> > +        return 0;
> > +    }
> > +
> > +    virtio_pstore_from_filename(s, dent->d_name, path, sizeof(path), hdr);
> > +    if (stat(path, &stbuf) < 0) {
> > +        return -1;
> > +    }
> 
> Please use fstat(fileno(fp), &stbuf) after opening the file instead.
> The race condition doesn't matter in this case but the race-free code is
> just as simple so it's one less thing someone reading the code has to
> worry about.

Fair enough.

> 
> > +
> > +    fp = fopen(path, "r");
> > +    if (fp == NULL) {
> > +        error_report("cannot open %s (%p %p)", path, s, s->directory);
> > +        return -1;
> > +    }
> > +
> > +    len = fread(buf, 1, sz, fp);
> > +    if (len < 0 && errno == EAGAIN) {
> > +        len = 0;
> > +    }
> > +
> > +    hdr->id = cpu_to_le64(hdr->id);
> > +    hdr->flags = cpu_to_le32(hdr->flags);
> > +    hdr->time_sec = cpu_to_le64(stbuf.st_ctim.tv_sec);
> > +    hdr->time_nsec = cpu_to_le32(stbuf.st_ctim.tv_nsec);
> > +
> > +    fclose(fp);
> > +    return len;
> > +}
> > +

[SNIP]
> > +static void virtio_pstore_handle_io(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIOPstore *s = VIRTIO_PSTORE(vdev);
> > +    VirtQueueElement *elem;
> > +    struct virtio_pstore_hdr *hdr;
> > +    ssize_t len;
> > +
> > +    for (;;) {
> > +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > +        if (!elem) {
> > +            return;
> > +        }
> > +
> > +        hdr = elem->out_sg[0].iov_base;
> > +        if (elem->out_sg[0].iov_len != sizeof(*hdr)) {
> > +            error_report("invalid header size: %u",
> > +                         (unsigned)elem->out_sg[0].iov_len);
> > +            exit(1);
> > +        }
> 
> Please use iov_to_buf() instead of directly accessing out_sg[].  Virtio
> devices are not supposed to assume a particular iovec layout.  In other
> words, virtio_pstore_hdr could be split across multiple out_sg[] iovecs.

I got it.

> 
> You must also copy in data (similar to Linux syscall implementations) to
> prevent the guest from modifying data while the command is processed.
> Such race conditions could lead to security bugs.

Ok, but this assumes the operation is synchronous.  I agree on your
opinion if I could make it async.

> 
> > +
> > +        switch (hdr->cmd) {
> > +        case VIRTIO_PSTORE_CMD_OPEN:
> > +            len = virtio_pstore_do_open(s);
> > +            break;
> > +        case VIRTIO_PSTORE_CMD_READ:
> > +            len = virtio_pstore_do_read(s, elem->in_sg[0].iov_base,
> > +                                        elem->in_sg[0].iov_len, hdr);
> 
> Same issue with iovec layout for in_sg[] here.  The guest driver must be
> able to submit any in_sg[] iovec array and the device cannot assume
> in_sg[0] is the only iovec to fill.

Ok.

> 
> > +            break;
> > +        case VIRTIO_PSTORE_CMD_WRITE:
> > +            len = virtio_pstore_do_write(s, elem->out_sg[1].iov_base,
> > +                                         elem->out_sg[1].iov_len, hdr);
> > +            break;
> > +        case VIRTIO_PSTORE_CMD_CLOSE:
> > +            len = virtio_pstore_do_close(s);
> > +            break;
> > +        case VIRTIO_PSTORE_CMD_ERASE:
> > +            len = virtio_pstore_do_erase(s, hdr);
> > +            break;
> > +        default:
> > +            len = -1;
> > +            break;
> > +        }
> > +
> > +        if (len < 0) {
> > +            return;
> > +        }
> > +
> > +        virtqueue_push(vq, elem, len);
> > +
> > +        virtio_notify(vdev, vq);
> > +        g_free(elem);
> > +    }
> > +}
> > +
> > +static void virtio_pstore_device_realize(DeviceState *dev, Error **errp)
> > +{
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +    VirtIOPstore *s = VIRTIO_PSTORE(dev);
> > +
> > +    virtio_init(vdev, "virtio-pstore", VIRTIO_ID_PSTORE, 0);
> > +
> > +    s->vq = virtio_add_queue(vdev, 128, virtio_pstore_handle_io);
> > +}
> > +
> > +static void virtio_pstore_device_unrealize(DeviceState *dev, Error **errp)
> > +{
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +
> > +    virtio_cleanup(vdev);
> > +}
> > +
> > +static uint64_t get_features(VirtIODevice *vdev, uint64_t f, Error **errp)
> > +{
> > +    return f;
> > +}
> > +
> > +static void pstore_get_directory(Object *obj, Visitor *v,
> > +                                 const char *name, void *opaque,
> > +                                 Error **errp)
> > +{
> > +    VirtIOPstore *s = opaque;
> > +
> > +    visit_type_str(v, name, &s->directory, errp);
> > +}
> > +
> > +static void pstore_set_directory(Object *obj, Visitor *v,
> > +                                 const char *name, void *opaque,
> > +                                 Error **errp)
> > +{
> > +    VirtIOPstore *s = opaque;
> > +    Error *local_err = NULL;
> > +    char *value;
> > +
> > +    visit_type_str(v, name, &value, &local_err);
> > +    if (local_err) {
> > +        error_propagate(errp, local_err);
> > +        return;
> > +    }
> > +
> > +    g_free(s->directory);
> > +    s->directory = strdup(value);
> 
> Please use g_strdup() since this is paired with g_free().
> 
> Or even simpler would be s->directory = value and do not g_free(value)
> below.

Ok, I was not sure whether I could use it without alloc/free pair.
Will do it simpler way then. :)


> 
> > +
> > +    g_free(value);
> > +}
> > +
> > +static void pstore_release_directory(Object *obj, const char *name,
> > +                                     void *opaque)
> > +{
> > +    VirtIOPstore *s = opaque;
> > +
> > +    g_free(s->directory);
> > +    s->directory = NULL;
> > +}
> > +
> > +static Property virtio_pstore_properties[] = {
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > +static void virtio_pstore_instance_init(Object *obj)
> > +{
> > +    VirtIOPstore *s = VIRTIO_PSTORE(obj);
> > +
> > +    object_property_add(obj, "directory", "str",
> > +                        pstore_get_directory, pstore_set_directory,
> > +                        pstore_release_directory, s, NULL);
> > +}
> > +
> > +static void virtio_pstore_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> > +
> > +    dc->props = virtio_pstore_properties;
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +    vdc->realize = virtio_pstore_device_realize;
> > +    vdc->unrealize = virtio_pstore_device_unrealize;
> > +    vdc->get_features = get_features;
> > +}
> > +
> > +static const TypeInfo virtio_pstore_info = {
> > +    .name = TYPE_VIRTIO_PSTORE,
> > +    .parent = TYPE_VIRTIO_DEVICE,
> > +    .instance_size = sizeof(VirtIOPstore),
> > +    .instance_init = virtio_pstore_instance_init,
> > +    .class_init = virtio_pstore_class_init,
> > +};
> > +
> > +static void virtio_register_types(void)
> > +{
> > +    type_register_static(&virtio_pstore_info);
> > +}
> > +
> > +type_init(virtio_register_types)
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index 9ed1624..5689c6f 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -79,6 +79,7 @@
> >  #define PCI_DEVICE_ID_VIRTIO_SCSI        0x1004
> >  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
> >  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> > +#define PCI_DEVICE_ID_VIRTIO_PSTORE      0x100a
> >  
> >  #define PCI_VENDOR_ID_REDHAT             0x1b36
> >  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> > diff --git a/include/hw/virtio/virtio-pstore.h b/include/hw/virtio/virtio-pstore.h
> > new file mode 100644
> > index 0000000..74cd1f6
> > --- /dev/null
> > +++ b/include/hw/virtio/virtio-pstore.h
> > @@ -0,0 +1,30 @@
> > +/*
> > + * Virtio Pstore Support
> > + *
> > + * Authors:
> > + *  Namhyung Kim      <namhyung@gmail.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef _QEMU_VIRTIO_PSTORE_H
> > +#define _QEMU_VIRTIO_PSTORE_H
> > +
> > +#include "standard-headers/linux/virtio_pstore.h"
> > +#include "hw/virtio/virtio.h"
> > +#include "hw/pci/pci.h"
> > +
> > +#define TYPE_VIRTIO_PSTORE "virtio-pstore-device"
> > +#define VIRTIO_PSTORE(obj) \
> > +        OBJECT_CHECK(VirtIOPstore, (obj), TYPE_VIRTIO_PSTORE)
> > +
> > +typedef struct VirtIOPstore {
> > +    VirtIODevice parent_obj;
> > +    VirtQueue *vq;
> > +    char *directory;
> > +    DIR *dir;
> > +} VirtIOPstore;
> > +
> > +#endif
> > diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
> > index 77925f5..cba6322 100644
> > --- a/include/standard-headers/linux/virtio_ids.h
> > +++ b/include/standard-headers/linux/virtio_ids.h
> > @@ -41,5 +41,6 @@
> >  #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
> >  #define VIRTIO_ID_GPU          16 /* virtio GPU */
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> > +#define VIRTIO_ID_PSTORE       19 /* virtio pstore */
> 
> 19 has already been reserved.  22 is the next free ID (vsock, crypto,
> and sdm are currently under review and already use 19, 20, and 21).

I wasn't aware of the ongoing works but Cornelia already told me about
it.  Will update.

> 
> Please send a VIRTIO draft specification to
> virtio-dev@lists.oasis-open.org.  You can find information on the VIRTIO
> standards process here:
> https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio

Thank you very much for this information and your detailed review!
I'll take a look at the virtio standards process too.

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH v2] virtio_blk: Fix a slient kernel panic
From: Cornelia Huck @ 2016-07-18 15:21 UTC (permalink / raw)
  To: Minfei Huang
  Cc: Minfei Huang, fanc.fnst, linux-kernel, virtualization, mst,
	Minfei Huang
In-Reply-To: <1468850489-40157-1-git-send-email-mnghuan@gmail.com>

On Mon, 18 Jul 2016 22:01:29 +0800
Minfei Huang <mnfhuang@gmail.com> wrote:

> We do a lot of memory allocation in function init_vq, and don't handle
> the allocation failure properly. Then this function will return 0,
> although initialization fails due to lacking memory. At that moment,
> kernel will panic in guest machine, if virtio is used to drive disk.
> 
> To fix this bug, we should take care of allocation failure, and return
> correct value to let caller know what happen.
> 
> Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
> Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
> Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> ---
> v1:
> - Refactor the patch to make code more readable
> ---
>  drivers/block/virtio_blk.c | 32 +++++++++++---------------------
>  1 file changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 42758b5..d920512 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -381,9 +381,9 @@ static int init_vq(struct virtio_blk *vblk)
>  {
>  	int err = 0;
>  	int i;
> -	vq_callback_t **callbacks;
> -	const char **names;
> -	struct virtqueue **vqs;
> +	vq_callback_t **callbacks = NULL;
> +	const char **names = NULL;
> +	struct virtqueue **vqs = NULL;

If you init the variables to NULL anyway...

>  	unsigned short num_vqs;
>  	struct virtio_device *vdev = vblk->vdev;
> 
> @@ -394,22 +394,16 @@ static int init_vq(struct virtio_blk *vblk)
>  		num_vqs = 1;
> 

...just do

err = -ENOMEM;

here and...

>  	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> -	if (!vblk->vqs) {
> -		err = -ENOMEM;
> -		goto out;
> -	}
> +	if (!vblk->vqs)
> +		return -ENOMEM;
> 
>  	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> -	if (!names)
> -		goto err_names;
> -
>  	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
> -	if (!callbacks)
> -		goto err_callbacks;
> -
>  	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
> -	if (!vqs)
> -		goto err_vqs;
> +	if (!names || !callbacks || !vqs) {
> +		err = -ENOMEM;
> +		goto out;
> +	}

...you could use the

foo = kmalloc(...);
if (!foo)
	goto out;

sequence in any case. This avoids trying again and again if e.g. the
names allocation already failed.

Alternatively, you should be fine if you don't init the variables to
NULL: The code is now either taking an early exit or setting all of the
variables anyway.

^ permalink raw reply

* Re: [PATCH v2] virtio_blk: Fix a slient kernel panic
From: Minfei Huang @ 2016-07-18 16:18 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Minfei Huang, fanc.fnst, linux-kernel, virtualization, mst,
	Minfei Huang
In-Reply-To: <20160718172128.5b8c10b1.cornelia.huck@de.ibm.com>

On 07/18/16 at 05:21P, Cornelia Huck wrote:
> On Mon, 18 Jul 2016 22:01:29 +0800
> Minfei Huang <mnfhuang@gmail.com> wrote:
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 42758b5..d920512 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -381,9 +381,9 @@ static int init_vq(struct virtio_blk *vblk)
> >  {
> >  	int err = 0;
> >  	int i;
> > -	vq_callback_t **callbacks;
> > -	const char **names;
> > -	struct virtqueue **vqs;
> > +	vq_callback_t **callbacks = NULL;
> > +	const char **names = NULL;
> > +	struct virtqueue **vqs = NULL;
> 
> If you init the variables to NULL anyway...

Hi, Cornelia.

Thanks for reviewing this patch.

Seems there is no need to init these variables to NULL. I will remove
them laster.

> 
> >  	unsigned short num_vqs;
> >  	struct virtio_device *vdev = vblk->vdev;
> > 
> > @@ -394,22 +394,16 @@ static int init_vq(struct virtio_blk *vblk)
> >  		num_vqs = 1;
> > 
> 
> ...just do
> 
> err = -ENOMEM;
> 
> here and...
> 
> >  	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> > -	if (!vblk->vqs) {
> > -		err = -ENOMEM;
> > -		goto out;
> > -	}
> > +	if (!vblk->vqs)
> > +		return -ENOMEM;
> > 
> >  	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> > -	if (!names)
> > -		goto err_names;
> > -
> >  	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
> > -	if (!callbacks)
> > -		goto err_callbacks;
> > -
> >  	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
> > -	if (!vqs)
> > -		goto err_vqs;
> > +	if (!names || !callbacks || !vqs) {
> > +		err = -ENOMEM;
> > +		goto out;
> > +	}
> 
> ...you could use the
> 
> foo = kmalloc(...);
> if (!foo)
> 	goto out;
> 
> sequence in any case. This avoids trying again and again if e.g. the
> names allocation already failed.

For this implementation, I have referred others which calls
vdev->config->find_vqs as well. Yes, this continues trying to allocate
memory, although memory allocation failed before.

> 
> Alternatively, you should be fine if you don't init the variables to
> NULL: The code is now either taking an early exit or setting all of the
> variables anyway.
> 

It's a big change if we refactor the helper ->find_vqs, since other
devices also call it.

Thanks
Minfei

^ permalink raw reply

* Re: [PATCH v2] virtio_blk: Fix a slient kernel panic
From: Cornelia Huck @ 2016-07-18 16:25 UTC (permalink / raw)
  To: Minfei Huang; +Cc: Minfei Huang, virtualization, linux-kernel, fanc.fnst, mst
In-Reply-To: <20160718160421.GA20047@huangminfeis-MacBook-Pro.local>

On Tue, 19 Jul 2016 00:18:32 +0800
Minfei Huang <mnfhuang@gmail.com> wrote:

> On 07/18/16 at 05:21P, Cornelia Huck wrote:
> > On Mon, 18 Jul 2016 22:01:29 +0800
> > Minfei Huang <mnfhuang@gmail.com> wrote:
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 42758b5..d920512 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -381,9 +381,9 @@ static int init_vq(struct virtio_blk *vblk)
> > >  {
> > >  	int err = 0;
> > >  	int i;
> > > -	vq_callback_t **callbacks;
> > > -	const char **names;
> > > -	struct virtqueue **vqs;
> > > +	vq_callback_t **callbacks = NULL;
> > > +	const char **names = NULL;
> > > +	struct virtqueue **vqs = NULL;
> > 
> > If you init the variables to NULL anyway...
> 
> Hi, Cornelia.
> 
> Thanks for reviewing this patch.
> 
> Seems there is no need to init these variables to NULL. I will remove
> them laster.

Fine with me.

> 
> > 
> > >  	unsigned short num_vqs;
> > >  	struct virtio_device *vdev = vblk->vdev;
> > > 
> > > @@ -394,22 +394,16 @@ static int init_vq(struct virtio_blk *vblk)
> > >  		num_vqs = 1;
> > > 
> > 
> > ...just do
> > 
> > err = -ENOMEM;
> > 
> > here and...
> > 
> > >  	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> > > -	if (!vblk->vqs) {
> > > -		err = -ENOMEM;
> > > -		goto out;
> > > -	}
> > > +	if (!vblk->vqs)
> > > +		return -ENOMEM;
> > > 
> > >  	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> > > -	if (!names)
> > > -		goto err_names;
> > > -
> > >  	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
> > > -	if (!callbacks)
> > > -		goto err_callbacks;
> > > -
> > >  	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
> > > -	if (!vqs)
> > > -		goto err_vqs;
> > > +	if (!names || !callbacks || !vqs) {
> > > +		err = -ENOMEM;
> > > +		goto out;
> > > +	}
> > 
> > ...you could use the
> > 
> > foo = kmalloc(...);
> > if (!foo)
> > 	goto out;
> > 
> > sequence in any case. This avoids trying again and again if e.g. the
> > names allocation already failed.
> 
> For this implementation, I have referred others which calls
> vdev->config->find_vqs as well. Yes, this continues trying to allocate
> memory, although memory allocation failed before.

It might not be the best idea, though; although it should hopefully be
a not-so-common occurrence.

> 
> > 
> > Alternatively, you should be fine if you don't init the variables to
> > NULL: The code is now either taking an early exit or setting all of the
> > variables anyway.
> > 
> 
> It's a big change if we refactor the helper ->find_vqs, since other
> devices also call it.

Actually, I was referring to not initializing the variables to NULL in
this function and keeping the rest of your changes: IOW, just what you
suggested above :)

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Kees Cook @ 2016-07-18 17:50 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Tony Luck, Radim Kr??m????, KVM, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini,
	virtualization@lists.linux-foundation.org, Ingo Molnar
In-Reply-To: <20160718055037.GA8051@danjae.aot.lge.com>

On Sun, Jul 17, 2016 at 10:50 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hello,
>
> On Sun, Jul 17, 2016 at 10:12:26PM -0700, Kees Cook wrote:
>> On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>> > The virtio pstore driver provides interface to the pstore subsystem so
>> > that the guest kernel's log/dump message can be saved on the host
>> > machine.  Users can access the log file directly on the host, or on the
>> > guest at the next boot using pstore filesystem.  It currently deals with
>> > kernel log (printk) buffer only, but we can extend it to have other
>> > information (like ftrace dump) later.
>> >
>> > It supports legacy PCI device using single order-2 page buffer.  As all
>> > operation of pstore is synchronous, it would be fine IMHO.  However I
>> > don't know how to make write operation synchronous since it's called
>> > with a spinlock held (from any context including NMI).
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Radim Kr??m???? <rkrcmar@redhat.com>
>> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> > Cc: Anthony Liguori <aliguori@amazon.com>
>> > Cc: Anton Vorontsov <anton@enomsg.org>
>> > Cc: Colin Cross <ccross@android.com>
>> > Cc: Kees Cook <keescook@chromium.org>
>> > Cc: Tony Luck <tony.luck@intel.com>
>> > Cc: Steven Rostedt <rostedt@goodmis.org>
>> > Cc: Ingo Molnar <mingo@kernel.org>
>> > Cc: Minchan Kim <minchan@kernel.org>
>> > Cc: kvm@vger.kernel.org
>> > Cc: qemu-devel@nongnu.org
>> > Cc: virtualization@lists.linux-foundation.org
>> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>>
>> This looks great to me! I'd love to use this in qemu. (Right now I go
>> through hoops to use the ramoops backend for testing.)
>>
>> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> Thank you!
>
>>
>> Notes below...
>>
>
> [SNIP]
>> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
>> > +{
>> > +       u16 ret;
>> > +
>> > +       switch (type) {
>> > +       case PSTORE_TYPE_DMESG:
>> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
>> > +               break;
>> > +       default:
>> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
>> > +               break;
>> > +       }
>>
>> I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
>> relatively easy to add: I think it'd just be another virtio command?
>
> Do you want to append the data to the host file as guest does
> printk()?  I think it needs some kind of buffer management, but it's
> not hard to add IMHO.

Well, with most pstore backends, the buffer size is limited, so it
tends to be a circular buffer of some sort. I think whatever you
choose to do is fine (I saw the various mentions of resource limits in
the qemu part of this thread), as long as the last N bytes of console
can be seen on the host side, where N is some portion of the memory
set aside for the log. (I don't mind the idea of an unlimited console
log either, but I suspect that will not be accepted on the qemu
side...)

> [SNIP]
>> > +static int notrace virt_pstore_write(enum pstore_type_id type,
>> > +                                    enum kmsg_dump_reason reason,
>> > +                                    u64 *id, unsigned int part, int count,
>> > +                                    bool compressed, size_t size,
>> > +                                    struct pstore_info *psi)
>> > +{
>> > +       struct virtio_pstore *vps = psi->data;
>> > +       struct virtio_pstore_hdr *hdr = &vps->hdr;
>> > +       struct scatterlist sg[2];
>> > +       unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
>> > +
>> > +       *id = vps->id++;
>> > +
>> > +       hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_WRITE);
>> > +       hdr->id    = cpu_to_virtio64(vps->vdev, *id);
>> > +       hdr->flags = cpu_to_virtio32(vps->vdev, flags);
>> > +       hdr->type  = to_virtio_type(vps, type);
>> > +
>> > +       sg_init_table(sg, 2);
>> > +       sg_set_buf(&sg[0], hdr, sizeof(*hdr));
>> > +       sg_set_buf(&sg[1], psi->buf, size);
>> > +       virtqueue_add_outbuf(vps->vq, sg, 2, vps, GFP_ATOMIC);
>> > +       virtqueue_kick(vps->vq);
>> > +
>> > +       /* TODO: make it synchronous */
>> > +       return 0;
>>
>> The down side to this being asynchronous is the lack of error
>> reporting. Perhaps this could check hdr->type before queuing and error
>> for any VIRTIO_PSTORE_TYPE_UNKNOWN message instead of trying to send
>> it?
>
> I cannot follow, sorry.  Could you please elaborate it more?

The mention you have here of "TODO: make it synchronous" made me think
about what effects that could have. If a pstore_write() were issued
for a type other than DMESG, the above code would send it through
virtio anyway. No error reporting is possible unless this is
synchronous, but the only error here would simply be "I don't know
what anything except DMESG is", so maybe this code could refuse to
forward anything with type UNKNOWN in the first place. (Just an idea:
I don't think there's anything very wrong here. It just seemed like a
potential improvement.)

>> > +}
>> > +
>> > +static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
>> > +                            struct timespec time, struct pstore_info *psi)
>> > +{
>> > +       struct virtio_pstore *vps = psi->data;
>> > +       struct virtio_pstore_hdr *hdr = &vps->hdr;
>> > +       struct scatterlist sg[1];
>> > +       unsigned int len;
>> > +
>> > +       hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_ERASE);
>> > +       hdr->id    = cpu_to_virtio64(vps->vdev, id);
>> > +       hdr->type  = to_virtio_type(vps, type);
>> > +
>> > +       sg_init_one(sg, hdr, sizeof(*hdr));
>> > +       virtqueue_add_outbuf(vps->vq, sg, 1, vps, GFP_KERNEL);
>> > +       virtqueue_kick(vps->vq);
>> > +
>> > +       wait_event(vps->acked, virtqueue_get_buf(vps->vq, &len));
>> > +       return 0;
>> > +}
>> > +
>> > +static int virt_pstore_init(struct virtio_pstore *vps)
>> > +{
>> > +       struct pstore_info *psinfo = &vps->pstore;
>> > +       int err;
>> > +
>> > +       vps->id = 0;
>> > +       vps->buflen = 0;
>> > +       psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
>> > +       psinfo->buf = (void *)__get_free_pages(GFP_KERNEL, VIRT_PSTORE_ORDER);
>> > +       if (!psinfo->buf) {
>> > +               pr_err("cannot allocate pstore buffer\n");
>> > +               return -ENOMEM;
>> > +       }
>> > +
>> > +       psinfo->owner = THIS_MODULE;
>> > +       psinfo->name  = "virtio";
>> > +       psinfo->open  = virt_pstore_open;
>> > +       psinfo->close = virt_pstore_close;
>> > +       psinfo->read  = virt_pstore_read;
>> > +       psinfo->erase = virt_pstore_erase;
>> > +       psinfo->write = virt_pstore_write;
>> > +       psinfo->flags = PSTORE_FLAGS_FRAGILE;
>>
>> For console support, this flag would need to be dropped -- though I
>> suspect you know that already.:)
>
> Yep, I intentionally support DMESG type only in this patchset for
> simplicity.  Others could be added later. :)

Cool by me. :)

>
>
>>
>> > +       psinfo->data  = vps;
>> > +       spin_lock_init(&psinfo->buf_lock);
>> > +
>> > +       err = pstore_register(psinfo);
>> > +       if (err)
>> > +               kfree(psinfo->buf);
>> > +
>> > +       return err;
>> > +}
>
> [SNIP]
>>
>> Awesome! Can't wait to use it. :)
>
> Thanks for your review! :)

You're welcome! :)

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply

* [PATCH] virtio-balloon: Delete an unnecessary check before the function call "iput"
From: SF Markus Elfring @ 2016-07-18 19:25 UTC (permalink / raw)
  To: virtualization, Michael S. Tsirkin; +Cc: Julia Lawall, kernel-janitors, LKML
In-Reply-To: <5317A59D.4@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 18 Jul 2016 21:17:18 +0200

The iput() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/virtio/virtio_balloon.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 888d5f8..81f7d6c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -611,8 +611,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
 	remove_common(vb);
-	if (vb->vb_dev_info.inode)
-		iput(vb->vb_dev_info.inode);
+	iput(vb->vb_dev_info.inode);
 	kfree(vb);
 }
 
-- 
2.9.2

^ permalink raw reply related

* Re: [PATCH v2] virtio_blk: Fix a slient kernel panic
From: Minfei Huang @ 2016-07-19  0:37 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Minfei Huang, fanc.fnst, linux-kernel, virtualization, mst,
	Minfei Huang
In-Reply-To: <20160718182534.7b0bdfe9.cornelia.huck@de.ibm.com>

On 07/18/16 at 06:25P, Cornelia Huck wrote:
> On Tue, 19 Jul 2016 00:18:32 +0800
> Minfei Huang <mnfhuang@gmail.com> wrote:
> 
> > On 07/18/16 at 05:21P, Cornelia Huck wrote:
> > > On Mon, 18 Jul 2016 22:01:29 +0800
> > > Minfei Huang <mnfhuang@gmail.com> wrote:
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index 42758b5..d920512 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -381,9 +381,9 @@ static int init_vq(struct virtio_blk *vblk)
> > > >  {
> > > >  	int err = 0;
> > > >  	int i;
> > > > -	vq_callback_t **callbacks;
> > > > -	const char **names;
> > > > -	struct virtqueue **vqs;
> > > > +	vq_callback_t **callbacks = NULL;
> > > > +	const char **names = NULL;
> > > > +	struct virtqueue **vqs = NULL;
> > > 
> > > If you init the variables to NULL anyway...
> > 
> > Hi, Cornelia.
> > 
> > Thanks for reviewing this patch.
> > 
> > Seems there is no need to init these variables to NULL. I will remove
> > them laster.
> 
> Fine with me.
> 
> > 
> > > 
> > > >  	unsigned short num_vqs;
> > > >  	struct virtio_device *vdev = vblk->vdev;
> > > > 
> > > > @@ -394,22 +394,16 @@ static int init_vq(struct virtio_blk *vblk)
> > > >  		num_vqs = 1;
> > > > 
> > > 
> > > ...just do
> > > 
> > > err = -ENOMEM;
> > > 
> > > here and...
> > > 
> > > >  	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> > > > -	if (!vblk->vqs) {
> > > > -		err = -ENOMEM;
> > > > -		goto out;
> > > > -	}
> > > > +	if (!vblk->vqs)
> > > > +		return -ENOMEM;
> > > > 
> > > >  	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> > > > -	if (!names)
> > > > -		goto err_names;
> > > > -
> > > >  	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
> > > > -	if (!callbacks)
> > > > -		goto err_callbacks;
> > > > -
> > > >  	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
> > > > -	if (!vqs)
> > > > -		goto err_vqs;
> > > > +	if (!names || !callbacks || !vqs) {
> > > > +		err = -ENOMEM;
> > > > +		goto out;
> > > > +	}
> > > 
> > > ...you could use the
> > > 
> > > foo = kmalloc(...);
> > > if (!foo)
> > > 	goto out;
> > > 
> > > sequence in any case. This avoids trying again and again if e.g. the
> > > names allocation already failed.
> > 
> > For this implementation, I have referred others which calls
> > vdev->config->find_vqs as well. Yes, this continues trying to allocate
> > memory, although memory allocation failed before.
> 
> It might not be the best idea, though; although it should hopefully be
> a not-so-common occurrence.

Yep, for that memont, there is enough memory to be allocated.

> 
> > 
> > > 
> > > Alternatively, you should be fine if you don't init the variables to
> > > NULL: The code is now either taking an early exit or setting all of the
> > > variables anyway.
> > > 
> > 
> > It's a big change if we refactor the helper ->find_vqs, since other
> > devices also call it.
> 
> Actually, I was referring to not initializing the variables to NULL in
> this function and keeping the rest of your changes: IOW, just what you
> suggested above :)

Ok. I will repost an update to fix it.

Thanks
Minfei

> 

^ permalink raw reply

* [PATCH v3] virtio: new feature to detect IOMMU device quirk
From: Michael S. Tsirkin @ 2016-07-19  2:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: virtualization

The interaction between virtio and IOMMUs is messy.

On most systems with virtio, physical addresses match bus addresses,
and it doesn't particularly matter which one we use to program
the device.

On some systems, including Xen and any system with a physical device
that speaks virtio behind a physical IOMMU, we must program the IOMMU
for virtio DMA to work at all.

On other systems, including SPARC and PPC64, virtio-pci devices are
enumerated as though they are behind an IOMMU, but the virtio host
ignores the IOMMU, so we must either pretend that the IOMMU isn't
there or somehow map everything as the identity.

Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM.

Any device with this feature bit set to 0 needs a quirk and has to be
passed physical addresses (as opposed to bus addresses) even though
the device is behind an IOMMU.

Note: it has to be a per-device quirk because for example, there could
be a mix of passed-through and virtual virtio devices. As another
example, some devices could be implemented by an out of process
hypervisor backend (in case of qemu vhost, or vhost-user) and so support
for an IOMMU needs to be coded up separately.

It would be cleanest to handle this in IOMMU core code, but that needs
per-device DMA ops. While we are waiting for that to be implemented, use
a work-around in virtio core.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

wanted to use per-device dma ops but that does not
seem to be ready. So let's put this in virtio
code for now, and move when it becomes possible.

 include/uapi/linux/virtio_config.h |  4 +++-
 drivers/virtio/virtio_ring.c       | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 4cb65bb..3f6195e 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		33
+#define VIRTIO_TRANSPORT_F_END		34
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -63,4 +63,6 @@
 /* v1.0 compliant. */
 #define VIRTIO_F_VERSION_1		32
 
+/* Do not bypass the IOMMU (if configured) */
+#define VIRTIO_F_IOMMU_PLATFORM		33
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ca6bfdd..2a0c8bf 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -117,7 +117,10 @@ struct vring_virtqueue {
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
 /*
- * The interaction between virtio and a possible IOMMU is a mess.
+ * Modern virtio devices might set feature bits to specify whether
+ * they use the platform IOMMU. If there, just use the DMA API.
+ *
+ * If not there, the interaction between virtio and DMA API is messy.
  *
  * On most systems with virtio, physical addresses match bus addresses,
  * and it doesn't particularly matter whether we use the DMA API.
@@ -137,6 +140,10 @@ struct vring_virtqueue {
 
 static bool vring_use_dma_api(struct virtio_device *vdev)
 {
+	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
+		return true;
+
+	/* Otherwise, we are left to guess. */
 	/*
 	 * In theory, it's possible to have a buggy QEMU-supposed
 	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
@@ -1099,6 +1106,12 @@ void vring_transport_features(struct virtio_device *vdev)
 			break;
 		case VIRTIO_F_VERSION_1:
 			break;
+		case VIRTIO_F_IOMMU_PASSTHROUGH:
+			break;
+		case VIRTIO_F_IOMMU_PLATFORM:
+			/* Ignore passthrough hint for now, obey kernel config. */
+			__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PASSTHROUGH);
+			break;
 		default:
 			/* We don't understand this bit. */
 			__virtio_clear_bit(vdev, i);
-- 
MST

^ permalink raw reply related

* [PATCH v3] virtio_blk: Fix a slient kernel panic
From: Minfei Huang @ 2016-07-19  4:32 UTC (permalink / raw)
  To: mst, cornelia.huck
  Cc: Minfei Huang, fanc.fnst, linux-kernel, Minfei Huang,
	virtualization

From: Minfei Huang <mnghuan@gmail.com>

We do a lot of memory allocation in function init_vq, and don't handle
the allocation failure properly. Then this function will return 0,
although initialization fails due to lacking memory. At that moment,
kernel will panic in guest machine, if virtio is used to drive disk.

To fix this bug, we should take care of allocation failure, and return
correct value to let caller know what happen.

Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
---
v2:
- Remove useless initialisation to NULL
v1:
- Refactor the patch to make code more readable
---
 drivers/block/virtio_blk.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 42758b5..4ee78c0 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -394,22 +394,16 @@ static int init_vq(struct virtio_blk *vblk)
 		num_vqs = 1;
 
 	vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
-	if (!vblk->vqs) {
-		err = -ENOMEM;
-		goto out;
-	}
+	if (!vblk->vqs)
+		return -ENOMEM;
 
 	names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
-	if (!names)
-		goto err_names;
-
 	callbacks = kmalloc(sizeof(*callbacks) * num_vqs, GFP_KERNEL);
-	if (!callbacks)
-		goto err_callbacks;
-
 	vqs = kmalloc(sizeof(*vqs) * num_vqs, GFP_KERNEL);
-	if (!vqs)
-		goto err_vqs;
+	if (!names || !callbacks || !vqs) {
+		err = -ENOMEM;
+		goto out;
+	}
 
 	for (i = 0; i < num_vqs; i++) {
 		callbacks[i] = virtblk_done;
@@ -420,7 +414,7 @@ static int init_vq(struct virtio_blk *vblk)
 	/* Discover virtqueues and write information to configuration.  */
 	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
 	if (err)
-		goto err_find_vqs;
+		goto out;
 
 	for (i = 0; i < num_vqs; i++) {
 		spin_lock_init(&vblk->vqs[i].lock);
@@ -428,16 +422,12 @@ static int init_vq(struct virtio_blk *vblk)
 	}
 	vblk->num_vqs = num_vqs;
 
- err_find_vqs:
+out:
 	kfree(vqs);
- err_vqs:
 	kfree(callbacks);
- err_callbacks:
 	kfree(names);
- err_names:
 	if (err)
 		kfree(vblk->vqs);
- out:
 	return err;
 }
 
-- 
2.7.4 (Apple Git-66)

^ permalink raw reply related

* Re: [PATCH v3] virtio_blk: Fix a slient kernel panic
From: Cornelia Huck @ 2016-07-19 12:22 UTC (permalink / raw)
  To: Minfei Huang
  Cc: Minfei Huang, fanc.fnst, linux-kernel, virtualization, mst,
	Minfei Huang
In-Reply-To: <1468902762-43268-1-git-send-email-mnfhuang@gmail.com>

On Tue, 19 Jul 2016 12:32:42 +0800
Minfei Huang <mnfhuang@gmail.com> wrote:

> From: Minfei Huang <mnghuan@gmail.com>
> 
> We do a lot of memory allocation in function init_vq, and don't handle
> the allocation failure properly. Then this function will return 0,
> although initialization fails due to lacking memory. At that moment,
> kernel will panic in guest machine, if virtio is used to drive disk.
> 
> To fix this bug, we should take care of allocation failure, and return
> correct value to let caller know what happen.
> 
> Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
> Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
> Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> ---
> v2:
> - Remove useless initialisation to NULL
> v1:
> - Refactor the patch to make code more readable
> ---
>  drivers/block/virtio_blk.c | 26 ++++++++------------------
>  1 file changed, 8 insertions(+), 18 deletions(-)

Your changes certainly make the function more compact.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* [PATCH -next] drm/virtio: Fix non static symbol warning
From: Wei Yongjun @ 2016-07-19 12:45 UTC (permalink / raw)
  To: David Airlie, Gerd Hoffmann
  Cc: Wei Yongjun, linux-kernel, dri-devel, virtualization

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Fixes the following sparse warning:

drivers/gpu/drm/virtio/virtgpu_display.c:349:37: warning:
  symbol 'virtio_mode_config_helpers' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 drivers/gpu/drm/virtio/virtgpu_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index fdfc711..4e192aa 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -346,7 +346,7 @@ static void vgdev_atomic_commit_tail(struct drm_atomic_state *state)
 	drm_atomic_helper_cleanup_planes(dev, state);
 }
 
-struct drm_mode_config_helper_funcs virtio_mode_config_helpers = {
+static struct drm_mode_config_helper_funcs virtio_mode_config_helpers = {
 	.atomic_commit_tail = vgdev_atomic_commit_tail,
 };

^ permalink raw reply related

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Namhyung Kim @ 2016-07-19 13:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tony Luck, Radim Kr??m????, KVM, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini,
	virtualization@lists.linux-foundation.org, Ingo Molnar
In-Reply-To: <CAGXu5jJBM6GLwR6B28US11v90pFsiU2yJddm4V6kx3CHdLcOqQ@mail.gmail.com>

Hi Kees,

On Mon, Jul 18, 2016 at 10:50:06AM -0700, Kees Cook wrote:
> On Sun, Jul 17, 2016 at 10:50 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > Hello,
> >
> > On Sun, Jul 17, 2016 at 10:12:26PM -0700, Kees Cook wrote:
> >> On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > [SNIP]
> >> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
> >> > +{
> >> > +       u16 ret;
> >> > +
> >> > +       switch (type) {
> >> > +       case PSTORE_TYPE_DMESG:
> >> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
> >> > +               break;
> >> > +       default:
> >> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
> >> > +               break;
> >> > +       }
> >>
> >> I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
> >> relatively easy to add: I think it'd just be another virtio command?
> >
> > Do you want to append the data to the host file as guest does
> > printk()?  I think it needs some kind of buffer management, but it's
> > not hard to add IMHO.
> 
> Well, with most pstore backends, the buffer size is limited, so it
> tends to be a circular buffer of some sort. I think whatever you
> choose to do is fine (I saw the various mentions of resource limits in
> the qemu part of this thread), as long as the last N bytes of console
> can be seen on the host side, where N is some portion of the memory
> set aside for the log. (I don't mind the idea of an unlimited console
> log either, but I suspect that will not be accepted on the qemu
> side...)

I think it needs two kinds of buffer management.

The first one is the psinfo->buf (or something similar).  IIUC the
PSTORE_TYPE_CONSOLE is different than PSTORE_TYPE_DMESG as it is
emitted every time printk() sends messages to console.  So I think the
it should remain in async mode due to performance reason.  To do that,
the message should be copied to psinfo->buf and then sent via virtio.
Then it needs to keep track of the available buffer position IMHO.

The other one is the file management on the host side.  I am thinking
of a simple way that the log file is splitted when it exceeds the half
of the allowed max size.  It would be configurable and might allow
unlimited logs if user requests it explicitly (if qemu guys say ok)..

Maybe we need to use 'part' or 'count' for filenames to identify
the splitted files.

> 
> > [SNIP]
> >> > +static int notrace virt_pstore_write(enum pstore_type_id type,
> >> > +                                    enum kmsg_dump_reason reason,
> >> > +                                    u64 *id, unsigned int part, int count,
> >> > +                                    bool compressed, size_t size,
> >> > +                                    struct pstore_info *psi)
> >> > +{
> >> > +       struct virtio_pstore *vps = psi->data;
> >> > +       struct virtio_pstore_hdr *hdr = &vps->hdr;
> >> > +       struct scatterlist sg[2];
> >> > +       unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
> >> > +
> >> > +       *id = vps->id++;
> >> > +
> >> > +       hdr->cmd   = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_CMD_WRITE);
> >> > +       hdr->id    = cpu_to_virtio64(vps->vdev, *id);
> >> > +       hdr->flags = cpu_to_virtio32(vps->vdev, flags);
> >> > +       hdr->type  = to_virtio_type(vps, type);
> >> > +
> >> > +       sg_init_table(sg, 2);
> >> > +       sg_set_buf(&sg[0], hdr, sizeof(*hdr));
> >> > +       sg_set_buf(&sg[1], psi->buf, size);
> >> > +       virtqueue_add_outbuf(vps->vq, sg, 2, vps, GFP_ATOMIC);
> >> > +       virtqueue_kick(vps->vq);
> >> > +
> >> > +       /* TODO: make it synchronous */
> >> > +       return 0;
> >>
> >> The down side to this being asynchronous is the lack of error
> >> reporting. Perhaps this could check hdr->type before queuing and error
> >> for any VIRTIO_PSTORE_TYPE_UNKNOWN message instead of trying to send
> >> it?
> >
> > I cannot follow, sorry.  Could you please elaborate it more?
> 
> The mention you have here of "TODO: make it synchronous" made me think
> about what effects that could have. If a pstore_write() were issued
> for a type other than DMESG, the above code would send it through
> virtio anyway. No error reporting is possible unless this is
> synchronous, but the only error here would simply be "I don't know
> what anything except DMESG is", so maybe this code could refuse to
> forward anything with type UNKNOWN in the first place. (Just an idea:
> I don't think there's anything very wrong here. It just seemed like a
> potential improvement.)

Yep, that kind of error handling should be easy.  My concern is when
write operation is failed on the host side.  We need a way to report
it back to the guest and might disallow further writes at least for
the same type.

Thanks,
Namhyung

^ permalink raw reply

* [PATCH] content: Reserve virtio device ID for pstore
From: Namhyung Kim @ 2016-07-19 15:13 UTC (permalink / raw)
  To: virtio-dev; +Cc: Namhyung Kim, qemu-devel, kvm, virtualization

From: Namhyung Kim <namhyung@kernel.org>

This patch just reserve next available device ID for pstore device
type.  The device specification for pstore will come later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 content.tex | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content.tex b/content.tex
index d989d98..10bacc5 100644
--- a/content.tex
+++ b/content.tex
@@ -2990,6 +2990,8 @@ Device ID  &  Virtio Device    \\
 \hline
 18         &   Input device \\
 \hline
+22         &   pstore device \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
-- 
2.8.0

^ permalink raw reply related

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Namhyung Kim @ 2016-07-19 15:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tony Luck, Radim Kr??m????, KVM, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini,
	virtualization@lists.linux-foundation.org, Ingo Molnar
In-Reply-To: <20160719134303.GA20047@danjae.aot.lge.com>

On Tue, Jul 19, 2016 at 10:43 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Kees,
>
> On Mon, Jul 18, 2016 at 10:50:06AM -0700, Kees Cook wrote:
>> On Sun, Jul 17, 2016 at 10:50 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>> > Hello,
>> >
>> > On Sun, Jul 17, 2016 at 10:12:26PM -0700, Kees Cook wrote:
>> >> On Sun, Jul 17, 2016 at 9:37 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>> > [SNIP]
>> >> > +static u16 to_virtio_type(struct virtio_pstore *vps, enum pstore_type_id type)
>> >> > +{
>> >> > +       u16 ret;
>> >> > +
>> >> > +       switch (type) {
>> >> > +       case PSTORE_TYPE_DMESG:
>> >> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_DMESG);
>> >> > +               break;
>> >> > +       default:
>> >> > +               ret = cpu_to_virtio16(vps->vdev, VIRTIO_PSTORE_TYPE_UNKNOWN);
>> >> > +               break;
>> >> > +       }
>> >>
>> >> I would love to see this support PSTORE_TYPE_CONSOLE too. It should be
>> >> relatively easy to add: I think it'd just be another virtio command?
>> >
>> > Do you want to append the data to the host file as guest does
>> > printk()?  I think it needs some kind of buffer management, but it's
>> > not hard to add IMHO.
>>
>> Well, with most pstore backends, the buffer size is limited, so it
>> tends to be a circular buffer of some sort. I think whatever you
>> choose to do is fine (I saw the various mentions of resource limits in
>> the qemu part of this thread), as long as the last N bytes of console
>> can be seen on the host side, where N is some portion of the memory
>> set aside for the log. (I don't mind the idea of an unlimited console
>> log either, but I suspect that will not be accepted on the qemu
>> side...)
>
> I think it needs two kinds of buffer management.
>
> The first one is the psinfo->buf (or something similar).  IIUC the
> PSTORE_TYPE_CONSOLE is different than PSTORE_TYPE_DMESG as it is
> emitted every time printk() sends messages to console.  So I think the
> it should remain in async mode due to performance reason.  To do that,
> the message should be copied to psinfo->buf and then sent via virtio.
> Then it needs to keep track of the available buffer position IMHO.

Looking at the code, I found myself confused with the PSTORE_TYPE_FTRACE
and PSTORE_TYPE_CONSOLE.  It seems that handling of
PSTORE_TYPE_CONSOLE is basically same as PSTORE_TYPE_DMESG..

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Namhyung Kim @ 2016-07-19 15:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160718100353.GA15163@stefanha-x1.localdomain>

Hello,

On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > +static void virtio_pstore_handle_io(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIOPstore *s = VIRTIO_PSTORE(vdev);
> > +    VirtQueueElement *elem;
> > +    struct virtio_pstore_hdr *hdr;
> > +    ssize_t len;
> > +
> > +    for (;;) {
> > +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > +        if (!elem) {
> > +            return;
> > +        }
> > +
> > +        hdr = elem->out_sg[0].iov_base;
> > +        if (elem->out_sg[0].iov_len != sizeof(*hdr)) {
> > +            error_report("invalid header size: %u",
> > +                         (unsigned)elem->out_sg[0].iov_len);
> > +            exit(1);
> > +        }
> 
> Please use iov_to_buf() instead of directly accessing out_sg[].  Virtio
> devices are not supposed to assume a particular iovec layout.  In other
> words, virtio_pstore_hdr could be split across multiple out_sg[] iovecs.
> 
> You must also copy in data (similar to Linux syscall implementations) to
> prevent the guest from modifying data while the command is processed.
> Such race conditions could lead to security bugs.

By accessing elem->out_sg[0].iov_base directly, I abused it as an
in-and-out buffer.  But it seems not allowed by the virtio spec, do I
have to use separate buffers for request and response?

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH -next] drm/virtio: Fix non static symbol warning
From: Sean Paul via Virtualization @ 2016-07-19 19:45 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: David Airlie, Linux Kernel Mailing List, dri-devel,
	virtualization, Wei Yongjun
In-Reply-To: <1468932356-26678-1-git-send-email-weiyj_lk@163.com>

On Tue, Jul 19, 2016 at 8:45 AM, Wei Yongjun <weiyj_lk@163.com> wrote:
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>
> Fixes the following sparse warning:
>
> drivers/gpu/drm/virtio/virtgpu_display.c:349:37: warning:
>   symbol 'virtio_mode_config_helpers' was not declared. Should it be static?
>
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Applied to drm-misc

> ---
>  drivers/gpu/drm/virtio/virtgpu_display.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
> index fdfc711..4e192aa 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_display.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_display.c
> @@ -346,7 +346,7 @@ static void vgdev_atomic_commit_tail(struct drm_atomic_state *state)
>         drm_atomic_helper_cleanup_planes(dev, state);
>  }
>
> -struct drm_mode_config_helper_funcs virtio_mode_config_helpers = {
> +static struct drm_mode_config_helper_funcs virtio_mode_config_helpers = {
>         .atomic_commit_tail = vgdev_atomic_commit_tail,
>  };
>
>

^ permalink raw reply

* Re: [Qemu-devel] [PATCH] content: Reserve virtio device ID for pstore
From: Cornelia Huck @ 2016-07-20  7:54 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: virtio-dev, virtualization, qemu-devel, kvm, Namhyung Kim
In-Reply-To: <1468941217-12904-1-git-send-email-namhyung@gmail.com>

On Wed, 20 Jul 2016 00:13:37 +0900
Namhyung Kim <namhyung@gmail.com> wrote:

> From: Namhyung Kim <namhyung@kernel.org>
> 
> This patch just reserve next available device ID for pstore device
> type.  The device specification for pstore will come later.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  content.tex | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index d989d98..10bacc5 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2990,6 +2990,8 @@ Device ID  &  Virtio Device    \\
>  \hline
>  18         &   Input device \\
>  \hline
> +22         &   pstore device \\
> +\hline
>  \end{tabular}
> 
>  Some of the devices above are unspecified by this document,

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Stefan Hajnoczi @ 2016-07-20  8:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160719154839.GB20047@danjae.aot.lge.com>


[-- Attachment #1.1: Type: text/plain, Size: 1697 bytes --]

On Wed, Jul 20, 2016 at 12:48:39AM +0900, Namhyung Kim wrote:
> Hello,
> 
> On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> > On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > > +static void virtio_pstore_handle_io(VirtIODevice *vdev, VirtQueue *vq)
> > > +{
> > > +    VirtIOPstore *s = VIRTIO_PSTORE(vdev);
> > > +    VirtQueueElement *elem;
> > > +    struct virtio_pstore_hdr *hdr;
> > > +    ssize_t len;
> > > +
> > > +    for (;;) {
> > > +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > +        if (!elem) {
> > > +            return;
> > > +        }
> > > +
> > > +        hdr = elem->out_sg[0].iov_base;
> > > +        if (elem->out_sg[0].iov_len != sizeof(*hdr)) {
> > > +            error_report("invalid header size: %u",
> > > +                         (unsigned)elem->out_sg[0].iov_len);
> > > +            exit(1);
> > > +        }
> > 
> > Please use iov_to_buf() instead of directly accessing out_sg[].  Virtio
> > devices are not supposed to assume a particular iovec layout.  In other
> > words, virtio_pstore_hdr could be split across multiple out_sg[] iovecs.
> > 
> > You must also copy in data (similar to Linux syscall implementations) to
> > prevent the guest from modifying data while the command is processed.
> > Such race conditions could lead to security bugs.
> 
> By accessing elem->out_sg[0].iov_base directly, I abused it as an
> in-and-out buffer.  But it seems not allowed by the virtio spec, do I
> have to use separate buffers for request and response?

Yes, a virtqueue element has (host read-only) out buffers followed by
(host write-only) in buffers.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Stefan Hajnoczi @ 2016-07-20  8:29 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160718142118.GA16575@danjae.aot.lge.com>


[-- Attachment #1.1: Type: text/plain, Size: 1995 bytes --]

On Mon, Jul 18, 2016 at 11:21:18PM +0900, Namhyung Kim wrote:
> On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> > On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > > From: Namhyung Kim <namhyung@gmail.com>
> > > 
> > > Add virtio pstore device to allow kernel log files saved on the host.
> > > It will save the log files on the directory given by pstore device
> > > option.
> > > 
> > >   $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...
> > > 
> > >   (guest) # echo c > /proc/sysrq-trigger
> > > 
> > >   $ ls dir-xx
> > >   dmesg-0.enc.z  dmesg-1.enc.z
> > > 
> > > The log files are usually compressed using zlib.  Users can see the log
> > > messages directly on the host or on the guest (using pstore filesystem).
> > 
> > The implementation is synchronous (i.e. can pause guest code execution),
> > does not handle write errors, and does not limit the amount of data the
> > guest can write.  This is sufficient for ad-hoc debugging and usage with
> > trusted guests.
> > 
> > If you want this to be available in environments where the guest isn't
> > trusted then there must be a limit on how much the guest can write or
> > some kind of log rotation.
> 
> Right.  The synchronous IO is required by the pstore subsystem
> implementation AFAIK (it uses a single psinfo->buf in the loop).

The pstore subsystem in Linux may be synchronous but the QEMU device
emulation does not have to be synchronous.

Synchronous device emulation means that no other vcpu or QEMU main loop
processing can occur while device emulation is blocked in a syscall.
This can make the QEMU monitor unavailable for libvirt and management
tools.  The guest can experience jitter since vcpus freeze if they
vmexit while device emulation is blocked (it holds the QEMU global
mutex and prevents other QEMU threads from making progress).

You could use include/io.h for asynchronous I/O (qio_channel_add_watch()).

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] content: Reserve virtio device ID for pstore
From: Stefan Hajnoczi @ 2016-07-20  9:16 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: virtio-dev, Namhyung Kim, qemu-devel, kvm, virtualization
In-Reply-To: <1468941217-12904-1-git-send-email-namhyung@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 421 bytes --]

On Wed, Jul 20, 2016 at 12:13:37AM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung@kernel.org>
> 
> This patch just reserve next available device ID for pstore device
> type.  The device specification for pstore will come later.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  content.tex | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Namhyung Kim @ 2016-07-20 12:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160720082108.GA13233@stefanha-x1.localdomain>

On Wed, Jul 20, 2016 at 09:21:08AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 20, 2016 at 12:48:39AM +0900, Namhyung Kim wrote:
> > Hello,
> > 
> > On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> > > On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > > > +static void virtio_pstore_handle_io(VirtIODevice *vdev, VirtQueue *vq)
> > > > +{
> > > > +    VirtIOPstore *s = VIRTIO_PSTORE(vdev);
> > > > +    VirtQueueElement *elem;
> > > > +    struct virtio_pstore_hdr *hdr;
> > > > +    ssize_t len;
> > > > +
> > > > +    for (;;) {
> > > > +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > > +        if (!elem) {
> > > > +            return;
> > > > +        }
> > > > +
> > > > +        hdr = elem->out_sg[0].iov_base;
> > > > +        if (elem->out_sg[0].iov_len != sizeof(*hdr)) {
> > > > +            error_report("invalid header size: %u",
> > > > +                         (unsigned)elem->out_sg[0].iov_len);
> > > > +            exit(1);
> > > > +        }
> > > 
> > > Please use iov_to_buf() instead of directly accessing out_sg[].  Virtio
> > > devices are not supposed to assume a particular iovec layout.  In other
> > > words, virtio_pstore_hdr could be split across multiple out_sg[] iovecs.
> > > 
> > > You must also copy in data (similar to Linux syscall implementations) to
> > > prevent the guest from modifying data while the command is processed.
> > > Such race conditions could lead to security bugs.
> > 
> > By accessing elem->out_sg[0].iov_base directly, I abused it as an
> > in-and-out buffer.  But it seems not allowed by the virtio spec, do I
> > have to use separate buffers for request and response?
> 
> Yes, a virtqueue element has (host read-only) out buffers followed by
> (host write-only) in buffers.

Thanks for clarification.  I'll split them.

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH 2/3] qemu: Implement virtio-pstore device
From: Namhyung Kim @ 2016-07-20 12:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tony Luck, Radim Kr??m????, Kees Cook, kvm, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini, virtualization,
	Ingo Molnar
In-Reply-To: <20160720082906.GB13233@stefanha-x1.localdomain>

On Wed, Jul 20, 2016 at 09:29:06AM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 18, 2016 at 11:21:18PM +0900, Namhyung Kim wrote:
> > On Mon, Jul 18, 2016 at 11:03:53AM +0100, Stefan Hajnoczi wrote:
> > > On Mon, Jul 18, 2016 at 01:37:40PM +0900, Namhyung Kim wrote:
> > > > From: Namhyung Kim <namhyung@gmail.com>
> > > > 
> > > > Add virtio pstore device to allow kernel log files saved on the host.
> > > > It will save the log files on the directory given by pstore device
> > > > option.
> > > > 
> > > >   $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...
> > > > 
> > > >   (guest) # echo c > /proc/sysrq-trigger
> > > > 
> > > >   $ ls dir-xx
> > > >   dmesg-0.enc.z  dmesg-1.enc.z
> > > > 
> > > > The log files are usually compressed using zlib.  Users can see the log
> > > > messages directly on the host or on the guest (using pstore filesystem).
> > > 
> > > The implementation is synchronous (i.e. can pause guest code execution),
> > > does not handle write errors, and does not limit the amount of data the
> > > guest can write.  This is sufficient for ad-hoc debugging and usage with
> > > trusted guests.
> > > 
> > > If you want this to be available in environments where the guest isn't
> > > trusted then there must be a limit on how much the guest can write or
> > > some kind of log rotation.
> > 
> > Right.  The synchronous IO is required by the pstore subsystem
> > implementation AFAIK (it uses a single psinfo->buf in the loop).
> 
> The pstore subsystem in Linux may be synchronous but the QEMU device
> emulation does not have to be synchronous.
> 
> Synchronous device emulation means that no other vcpu or QEMU main loop
> processing can occur while device emulation is blocked in a syscall.
> This can make the QEMU monitor unavailable for libvirt and management
> tools.  The guest can experience jitter since vcpus freeze if they
> vmexit while device emulation is blocked (it holds the QEMU global
> mutex and prevents other QEMU threads from making progress).

Thanks for your detailed explanation.  I'll try to change pstore
implementation to deal with async devices.

Thanks,
Namhyung

> 
> You could use include/io.h for asynchronous I/O (qio_channel_add_watch()).
> 
> Stefan

^ permalink raw reply

* Re: [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
From: Namhyung Kim @ 2016-07-20 12:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Tony Luck, Radim Kr??m????, KVM, Michael S. Tsirkin,
	Anton Vorontsov, LKML, Steven Rostedt, qemu-devel, Minchan Kim,
	Anthony Liguori, Colin Cross, Paolo Bonzini,
	virtualization@lists.linux-foundation.org, Ingo Molnar
In-Reply-To: <20160719134303.GA20047@danjae.aot.lge.com>

On Tue, Jul 19, 2016 at 10:43 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> The other one is the file management on the host side.  I am thinking
> of a simple way that the log file is splitted when it exceeds the half
> of the allowed max size.  It would be configurable and might allow
> unlimited logs if user requests it explicitly (if qemu guys say ok)..

On a second thought, I think that size of log file should not exceed
the pstore buffer size in order to be read back later.  When total size
of log files becomes larger then the limit, the oldest file of same
type can be deleted. This way looks simpler to manage IMHO.

Also I think the pstore id should be managed by host device rather
than guest driver to handle splitted files properly.

Thanks,
Namhyung

^ permalink raw reply

* RE: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Li, Liang Z @ 2016-07-21  8:14 UTC (permalink / raw)
  To: mst@redhat.com
  Cc: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	dgilbert@redhat.com, virtualization@lists.linux-foundation.org
In-Reply-To: <1467196340-22079-1-git-send-email-liang.z.li@intel.com>

Hi Michael,

If you have time, could you help to review this patch set?

Thanks!
Liang


> -----Original Message-----
> From: Li, Liang Z
> Sent: Wednesday, June 29, 2016 6:32 PM
> To: mst@redhat.com
> Cc: linux-kernel@vger.kernel.org; virtualization@lists.linux-foundation.org;
> kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio-dev@lists.oasis-
> open.org; dgilbert@redhat.com; quintela@redhat.com; Li, Liang Z
> Subject: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating &
> fast live migration
> 
> This patch set contains two parts of changes to the virtio-balloon.
> 
> One is the change for speeding up the inflating & deflating process, the main
> idea of this optimization is to use bitmap to send the page information to
> host instead of the PFNs, to reduce the overhead of virtio data transmission,
> address translation and madvise(). This can help to improve the performance
> by about 85%.
> 
> Another change is for speeding up live migration. By skipping process guest's
> free pages in the first round of data copy, to reduce needless data processing,
> this can help to save quite a lot of CPU cycles and network bandwidth. We
> put guest's free page information in bitmap and send it to host with the virt
> queue of virtio-balloon. For an idle 8GB guest, this can help to shorten the
> total live migration time from 2Sec to about 500ms in the 10Gbps network
> environment.
> 
> 
> Changes from v1 to v2:
>     * Abandon the patch for dropping page cache.
>     * Put some structures to uapi head file.
>     * Use a new way to determine the page bitmap size.
>     * Use a unified way to send the free page information with the bitmap
>     * Address the issues referred in MST's comments
> 
> Liang Li (7):
>   virtio-balloon: rework deflate to add page to a list
>   virtio-balloon: define new feature bit and page bitmap head
>   mm: add a function to get the max pfn
>   virtio-balloon: speed up inflate/deflate process
>   virtio-balloon: define feature bit and head for misc virt queue
>   mm: add the related functions to get free page info
>   virtio-balloon: tell host vm's free page info
> 
>  drivers/virtio/virtio_balloon.c     | 306
> +++++++++++++++++++++++++++++++-----
>  include/uapi/linux/virtio_balloon.h |  41 +++++
>  mm/page_alloc.c                     |  52 ++++++
>  3 files changed, 359 insertions(+), 40 deletions(-)
> 
> --
> 1.8.3.1

^ permalink raw reply

* [PATCH v3 0/4] implement vcpu preempted check
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: kernellwp, jgross, Pan Xinhui, peterz, benh, will.deacon, mingo,
	paulus, mpe, pbonzini, paulmck

change from v2:
	no code change, fix typos, update some comments

change from v1:
	a simplier definition of default vcpu_is_preempted
	skip mahcine type check on ppc, and add config. remove dedicated macro.
	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
	add more comments
	thanks boqun and Peter's suggestion.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_onwer variant also cause rcu stall before we apply this patch set

Pan Xinhui (4):
  kernel/sched: introduce vcpu preempted check interface
  powerpc/spinlock: support vcpu preempted check
  locking/osq: Drop the overhead of osq_lock()
  kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner

 arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
 include/linux/sched.h               | 12 ++++++++++++
 kernel/locking/mutex.c              | 15 +++++++++++++--
 kernel/locking/osq_lock.c           | 10 +++++++++-
 kernel/locking/rwsem-xadd.c         | 16 +++++++++++++---
 5 files changed, 65 insertions(+), 6 deletions(-)

-- 
2.4.11

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox