* Re: [PATCH] xen: do not disable netfront in dom0
From: Konrad Rzeszutek Wilk @ 2012-05-22 18:34 UTC (permalink / raw)
To: Marek Marczykowski, davem
Cc: netdev, Jeremy Fitzhardinge, virtualization, linux-kernel,
xen-devel
In-Reply-To: <20120522130558.D828E6C7@duch.mimuw.edu.pl>
On Sun, May 20, 2012 at 01:45:10PM +0200, Marek Marczykowski wrote:
> The netfront driver can also be useful in dom0, e.g. when all NICs are assigned to
> some domU (aka driver domain). Then using netback in domU and netfront in dom0
> is the only way to get network access in dom0.
>
> Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> drivers/net/xen-netfront.c | 6 ------
> 1 files changed, 0 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 698b905..e31ebff 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1953,9 +1953,6 @@ static int __init netif_init(void)
> if (!xen_domain())
> return -ENODEV;
>
> - if (xen_initial_domain())
> - return 0;
> -
> printk(KERN_INFO "Initialising Xen virtual ethernet driver.\n");
>
> return xenbus_register_frontend(&netfront_driver);
> @@ -1965,9 +1962,6 @@ module_init(netif_init);
>
> static void __exit netif_exit(void)
> {
> - if (xen_initial_domain())
> - return;
> -
> xenbus_unregister_driver(&netfront_driver);
> }
> module_exit(netif_exit);
> --
> 1.7.4.4
* Re: [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()
From: Tejun Heo @ 2012-05-22 15:14 UTC (permalink / raw)
To: Asias He; +Cc: Jens Axboe, kvm, Michael S. Tsirkin, virtualization,
linux-fsdevel
In-Reply-To: <4FBB409D.4070201@redhat.com>
Hello,
On Tue, May 22, 2012 at 03:30:37PM +0800, Asias He wrote:
> On 05/21/2012 11:42 PM, Tejun Heo wrote:
> 1) if the queue is stopped, q->request_fn() will never be called and
> we will be stuck in the loop forever. This can happen if the remove
> method is called after q->request_fn() calls blk_stop_queue() to
> stop the queue when the device is full, and before the device
> interrupt handler restarts the queue. This can be fixed by calling
> blk_start_queue() before __blk_run_queue(q).
>
> blk_drain_queue() {
> while(true) {
> ...
> if (!list_empty(&q->queue_head))
> __blk_run_queue(q);
> ...
> }
> }
Wouldn't that be properly fixed by making queue cleanup override
stopped state?
> 2) Since the device is gonna be removed, is it safe to rely on the
> device to finish the request before the DEAD marking? E.g., in
> virtio-blk, we reset the device and thus disable the interrupt
> before we call blk_cleanup_queue(). I also suspect that the real
> hardware can finish the pending requests when being hot-unplugged.
Yes, it should be safe (otherwise it's a driver bug). Device driver
already knows the state of the device it is driving. If the device
can't service requests for whatever reason, the device driver should
abort any in-flight and future requests. That's how other block
drivers behave and I don't see why virtio should be any different.
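The behaviour described above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not kernel code: struct toy_dev, toy_submit() and toy_device_gone() are hypothetical names, and plain counters stand in for real requests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state; not a real kernel structure. */
struct toy_dev {
	bool gone;      /* set once the driver knows the device is gone */
	int in_flight;  /* requests handed to the device, not yet completed */
	int failed;     /* requests completed with an error */
};

/* Submit one request: fail it at once if the device is already gone. */
static int toy_submit(struct toy_dev *d)
{
	if (d->gone) {
		d->failed++;
		return -EIO;
	}
	d->in_flight++;
	return 0;
}

/* The driver learns the device is gone: abort everything in flight. */
static void toy_device_gone(struct toy_dev *d)
{
	d->gone = true;
	d->failed += d->in_flight;
	d->in_flight = 0;
}
```

In this model nothing stays in flight once the device is marked gone, so a drain loop has nothing left to wait for.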
Also, blk_drain_queue() is used for other purposes too - elevator
switch and blkcg policy changes. You definitely don't want to be
aborting requests across those events.
So, NACK.
Thanks.
--
tejun
* Re: [RFC PATCH 5/5] virtio-blk: Use block layer provided spinlock
From: Asias He @ 2012-05-22 8:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, virtualization
In-Reply-To: <20120521205953.GC17031@redhat.com>
On 05/22/2012 04:59 AM, Michael S. Tsirkin wrote:
> On Mon, May 21, 2012 at 05:08:33PM +0800, Asias He wrote:
>> Block layer will allocate a spinlock for the queue if the driver does
>> not provide one in blk_init_queue().
>>
>> The reason to use the internal spinlock is that blk_cleanup_queue() will
>> switch to use the internal spinlock in the cleanup code path.
>> if (q->queue_lock !=&q->__queue_lock)
>> q->queue_lock =&q->__queue_lock;
>>
>> However, processes which are in D state might have taken the
>> driver-provided spinlock; when they wake up, they would release the
>> block-layer-provided spinlock.
>
> Are you saying any driver with its own spinlock is
> broken if hotunplugged under stress?
Hi, Michael
I cannot say that. It is very hard to find a real hardware device to try
this on. I tried on qemu with an LSI Logic / Symbios Logic 53c895a scsi disk
with hot-unplug. It is completely broken. And IDE does not support
hotplug at all.
Do you see any downside of using the block provided spinlock?
>
>> =====================================
>> [ BUG: bad unlock balance detected! ]
>> 3.4.0-rc7+ #238 Not tainted
>> -------------------------------------
>> fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
>> [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> 1 lock held by fio/3587:
>> #0: (&(&vblk->lock)->rlock){......}, at:
>> [<ffffffff8132661a>] get_request_wait+0x19a/0x250
>>
>> Cc: Rusty Russell<rusty@rustcorp.com.au>
>> Cc: "Michael S. Tsirkin"<mst@redhat.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: kvm@vger.kernel.org
>> Signed-off-by: Asias He<asias@redhat.com>
>> ---
>> drivers/block/virtio_blk.c | 9 +++------
>> 1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index ba35509..0c2f0e8 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -21,8 +21,6 @@ struct workqueue_struct *virtblk_wq;
>>
>> struct virtio_blk
>> {
>> - spinlock_t lock;
>> -
>> struct virtio_device *vdev;
>> struct virtqueue *vq;
>>
>> @@ -65,7 +63,7 @@ static void blk_done(struct virtqueue *vq)
>> unsigned int len;
>> unsigned long flags;
>>
>> - spin_lock_irqsave(&vblk->lock, flags);
>> + spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
>> while ((vbr = virtqueue_get_buf(vblk->vq,&len)) != NULL) {
>> int error;
>>
>> @@ -99,7 +97,7 @@ static void blk_done(struct virtqueue *vq)
>> }
>> /* In case queue is stopped waiting for more buffers. */
>> blk_start_queue(vblk->disk->queue);
>> - spin_unlock_irqrestore(&vblk->lock, flags);
>> + spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
>> }
>>
>> static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
>> @@ -456,7 +454,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>> goto out_free_index;
>> }
>>
>> - spin_lock_init(&vblk->lock);
>> vblk->vdev = vdev;
>> vblk->sg_elems = sg_elems;
>> sg_init_table(vblk->sg, vblk->sg_elems);
>> @@ -481,7 +478,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>> goto out_mempool;
>> }
>>
>> - q = vblk->disk->queue = blk_init_queue(do_virtblk_request,&vblk->lock);
>> + q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
>> if (!q) {
>> err = -ENOMEM;
>> goto out_put_disk;
>> --
>> 1.7.10.1
--
Asias
* Re: [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()
From: Asias He @ 2012-05-22 7:30 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, kvm, Michael S. Tsirkin, virtualization,
linux-fsdevel
In-Reply-To: <20120521154213.GB6549@google.com>
On 05/21/2012 11:42 PM, Tejun Heo wrote:
> On Mon, May 21, 2012 at 05:08:29PM +0800, Asias He wrote:
>> When a user hot-unplugs a disk which is busy serving I/O, __blk_run_queue
>> might be unable to drain all the requests. As a result,
>> blk_drain_queue() would loop forever and blk_cleanup_queue would not
>> return. So hot-unplug will fail.
>>
>> This patch adds a callback in blk_drain_queue() for the low-level driver to
>> abort requests.
>>
>> Currently, this is useful for virtio-blk to do cleanup in hot-unplug.
>
> Why is this necessary? virtio-blk should know that the device is gone
> and fail in-flight / new commands. That's what other drivers do.
> What makes virtio-blk different?
blk_cleanup_queue() relies on __blk_run_queue() to finish all the
requests before DEAD marking, right?
There are two problems:
1) if the queue is stopped, q->request_fn() will never be called and we
will be stuck in the loop forever. This can happen if the remove method
is called after q->request_fn() calls blk_stop_queue() to stop the
queue when the device is full, and before the device interrupt handler
restarts the queue. This can be fixed by calling blk_start_queue()
before __blk_run_queue(q).
blk_drain_queue() {
while(true) {
...
if (!list_empty(&q->queue_head))
__blk_run_queue(q);
...
}
}
2) Since the device is gonna be removed, is it safe to rely on the
device to finish the request before the DEAD marking? E.g., in
virtio-blk, we reset the device and thus disable the interrupt before we
call blk_cleanup_queue(). I also suspect that the real hardware can
finish the pending requests when being hot-unplugged.
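Problem 1) above can be illustrated with a minimal userspace sketch in C. It is not kernel code: struct toy_drain_q, toy_run_queue() and toy_drain_queue() are hypothetical stand-ins for the request queue, __blk_run_queue() and blk_drain_queue(), and the pass limit exists only so that the model terminates instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue. */
struct toy_drain_q {
	int pending;   /* requests still sitting on the queue */
	bool stopped;  /* set by a toy blk_stop_queue() */
};

/* Like __blk_run_queue() in this model: does nothing while stopped. */
static void toy_run_queue(struct toy_drain_q *q)
{
	if (q->stopped)
		return;
	if (q->pending > 0)
		q->pending--;
}

/*
 * Drain with a bounded number of passes.
 * Returns true if the queue emptied, false if the real loop would
 * keep spinning.
 */
static bool toy_drain_queue(struct toy_drain_q *q, bool restart_first,
			    int max_passes)
{
	assert(max_passes > 0);

	if (restart_first)
		q->stopped = false; /* models blk_start_queue() before the loop */

	for (int pass = 0; pass < max_passes; pass++) {
		if (q->pending == 0)
			return true;
		toy_run_queue(q);
	}
	return q->pending == 0;
}
```

With a stopped queue and restart_first set to false the model never makes progress, which corresponds to the hang described above. Clearing the stopped flag first lets the same loop finish.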
So I proposed the q->abort_queue_fn() callback in blk_drain_queue() for
the driver to abort the queue explicitly no matter how the device behaves.
BTW, do we have any infrastructure in the block layer to track the requests
already dispatched to the driver? This might be useful for a driver if it wants
to abort all of them. Otherwise the driver has to do it on its own.
--
Asias
* Re: [RFC PATCH 5/5] virtio-blk: Use block layer provided spinlock
From: Michael S. Tsirkin @ 2012-05-21 20:59 UTC (permalink / raw)
To: Asias He; +Cc: kvm, virtualization
In-Reply-To: <1337591313-26333-5-git-send-email-asias@redhat.com>
On Mon, May 21, 2012 at 05:08:33PM +0800, Asias He wrote:
> Block layer will allocate a spinlock for the queue if the driver does
> not provide one in blk_init_queue().
>
> The reason to use the internal spinlock is that blk_cleanup_queue() will
> switch to use the internal spinlock in the cleanup code path.
> if (q->queue_lock != &q->__queue_lock)
> q->queue_lock = &q->__queue_lock;
>
> However, processes which are in D state might have taken the
> driver-provided spinlock; when they wake up, they would release the
> block-layer-provided spinlock.
Are you saying any driver with its own spinlock is
broken if hotunplugged under stress?
> =====================================
> [ BUG: bad unlock balance detected! ]
> 3.4.0-rc7+ #238 Not tainted
> -------------------------------------
> fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
> [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
> but there are no more locks to release!
>
> other info that might help us debug this:
> 1 lock held by fio/3587:
> #0: (&(&vblk->lock)->rlock){......}, at:
> [<ffffffff8132661a>] get_request_wait+0x19a/0x250
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Asias He <asias@redhat.com>
> ---
> drivers/block/virtio_blk.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index ba35509..0c2f0e8 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -21,8 +21,6 @@ struct workqueue_struct *virtblk_wq;
>
> struct virtio_blk
> {
> - spinlock_t lock;
> -
> struct virtio_device *vdev;
> struct virtqueue *vq;
>
> @@ -65,7 +63,7 @@ static void blk_done(struct virtqueue *vq)
> unsigned int len;
> unsigned long flags;
>
> - spin_lock_irqsave(&vblk->lock, flags);
> + spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
> while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
> int error;
>
> @@ -99,7 +97,7 @@ static void blk_done(struct virtqueue *vq)
> }
> /* In case queue is stopped waiting for more buffers. */
> blk_start_queue(vblk->disk->queue);
> - spin_unlock_irqrestore(&vblk->lock, flags);
> + spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
> }
>
> static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
> @@ -456,7 +454,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
> goto out_free_index;
> }
>
> - spin_lock_init(&vblk->lock);
> vblk->vdev = vdev;
> vblk->sg_elems = sg_elems;
> sg_init_table(vblk->sg, vblk->sg_elems);
> @@ -481,7 +478,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
> goto out_mempool;
> }
>
> - q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
> + q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
> if (!q) {
> err = -ENOMEM;
> goto out_put_disk;
> --
> 1.7.10.1
* Re: [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()
From: Tejun Heo @ 2012-05-21 15:42 UTC (permalink / raw)
To: Asias He; +Cc: Jens Axboe, kvm, Michael S. Tsirkin, virtualization,
linux-fsdevel
In-Reply-To: <1337591313-26333-1-git-send-email-asias@redhat.com>
On Mon, May 21, 2012 at 05:08:29PM +0800, Asias He wrote:
> When a user hot-unplugs a disk which is busy serving I/O, __blk_run_queue
> might be unable to drain all the requests. As a result,
> blk_drain_queue() would loop forever and blk_cleanup_queue would not
> return. So hot-unplug will fail.
>
> This patch adds a callback in blk_drain_queue() for the low-level driver to
> abort requests.
>
> Currently, this is useful for virtio-blk to do cleanup in hot-unplug.
Why is this necessary? virtio-blk should know that the device is gone
and fail in-flight / new commands. That's what other drivers do.
What makes virtio-blk different?
Thanks.
--
tejun
* [RFC PATCH 5/5] virtio-blk: Use block layer provided spinlock
From: Asias He @ 2012-05-21 9:08 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin; +Cc: kvm, virtualization
In-Reply-To: <1337591313-26333-1-git-send-email-asias@redhat.com>
Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().
The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.
if (q->queue_lock != &q->__queue_lock)
q->queue_lock = &q->__queue_lock;
However, processes which are in D state might have taken the
driver-provided spinlock; when they wake up, they would release the
block-layer-provided spinlock.
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/3587:
#0: (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250
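The imbalance reported above can be reproduced in a small userspace model. This is a simplified sketch, not the kernel's locking: struct toy_lock only counts acquisitions, all toy_* names are hypothetical, and the exact sleep and wake-up sequence in get_request_wait() is not modelled. It only shows what happens when a lock is taken and released through a pointer that is switched in between.

```c
#include <stddef.h>

/* Hypothetical lock that only counts acquisitions; not a real spinlock. */
struct toy_lock {
	int held;
};

struct toy_lockq {
	struct toy_lock internal;    /* plays the role of q->__queue_lock */
	struct toy_lock *queue_lock; /* plays the role of q->queue_lock */
};

static void toy_acquire(struct toy_lock *l) { l->held++; }
static void toy_release(struct toy_lock *l) { l->held--; }

/* Models the pointer switch quoted from blk_cleanup_queue(). */
static void toy_cleanup_switch(struct toy_lockq *q)
{
	if (q->queue_lock != &q->internal)
		q->queue_lock = &q->internal;
}

/*
 * Take the lock through q->queue_lock, let cleanup switch the pointer,
 * then release through q->queue_lock again. Pass NULL as driver_lock
 * to use the internal lock from the start.
 * Returns the internal lock's count afterwards: 0 is balanced, -1 means
 * a lock was released that was never taken.
 */
static int toy_lock_across_cleanup(struct toy_lockq *q,
				   struct toy_lock *driver_lock)
{
	q->queue_lock = driver_lock ? driver_lock : &q->internal;

	toy_acquire(q->queue_lock);  /* lock taken through the pointer */
	toy_cleanup_switch(q);       /* cleanup switches the pointer */
	toy_release(q->queue_lock);  /* release hits a different lock */

	return q->internal.held;
}
```

With a driver-provided lock the model ends with the driver lock still held and the internal lock released once too often, which matches the shape of the report above. With no driver lock the pointer never changes and the count stays balanced.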
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
---
drivers/block/virtio_blk.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index ba35509..0c2f0e8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -21,8 +21,6 @@ struct workqueue_struct *virtblk_wq;
struct virtio_blk
{
- spinlock_t lock;
-
struct virtio_device *vdev;
struct virtqueue *vq;
@@ -65,7 +63,7 @@ static void blk_done(struct virtqueue *vq)
unsigned int len;
unsigned long flags;
- spin_lock_irqsave(&vblk->lock, flags);
+ spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
int error;
@@ -99,7 +97,7 @@ static void blk_done(struct virtqueue *vq)
}
/* In case queue is stopped waiting for more buffers. */
blk_start_queue(vblk->disk->queue);
- spin_unlock_irqrestore(&vblk->lock, flags);
+ spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
}
static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
@@ -456,7 +454,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
goto out_free_index;
}
- spin_lock_init(&vblk->lock);
vblk->vdev = vdev;
vblk->sg_elems = sg_elems;
sg_init_table(vblk->sg, vblk->sg_elems);
@@ -481,7 +478,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
goto out_mempool;
}
- q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
+ q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
if (!q) {
err = -ENOMEM;
goto out_put_disk;
--
1.7.10.1
* [RFC PATCH 4/5] virtio-blk: Use q->abort_queue_fn() to abort requests
From: Asias He @ 2012-05-21 9:08 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin
Cc: Jens Axboe, kvm, virtualization, linux-fsdevel, Tejun Heo
In-Reply-To: <1337591313-26333-1-git-send-email-asias@redhat.com>
virtblk_abort_queue() will be called by the block layer when it cleans up
the queue: blk_cleanup_queue() -> blk_drain_queue() -> q->abort_queue_fn(q)
virtblk_abort_queue()
1) Aborts requests in the block layer which have not been dispatched to the driver
2) Aborts requests already dispatched to the driver
3) Wakes up processes which are sleeping in get_request_wait()
This makes hot-unplugging a disk which is busy serving I/O succeed.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
---
drivers/block/virtio_blk.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 7d5f5b0..ba35509 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -182,6 +182,31 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
return true;
}
+void virtblk_abort_queue(struct request_queue *q)
+{
+ struct virtio_blk *vblk = q->queuedata;
+ struct virtblk_req *vbr;
+ int i;
+
+ /* Abort requests in block layer. */
+ elv_abort_queue(q);
+
+ /* Abort requests dispatched to driver. */
+ while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) {
+ vbr->req->cmd_flags |= REQ_QUIET;
+ __blk_end_request_all(vbr->req, -EIO);
+ mempool_free(vbr, vblk->pool);
+ }
+
+ /* Wake up threads sleeping on get_request_wait() */
+ for (i = 0; i < ARRAY_SIZE(q->rq.wait); i++) {
+ if (waitqueue_active(&q->rq.wait[i]))
+ wake_up_all(&q->rq.wait[i]);
+ }
+
+ return;
+}
+
static void do_virtblk_request(struct request_queue *q)
{
struct virtio_blk *vblk = q->queuedata;
@@ -462,6 +487,8 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
goto out_put_disk;
}
+ blk_queue_abort_queue(q, virtblk_abort_queue);
+
q->queuedata = vblk;
virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
@@ -576,8 +603,6 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
{
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
- struct virtblk_req *vbr;
- unsigned long flags;
/* Prevent config work handler from accessing the device. */
mutex_lock(&vblk->config_lock);
@@ -591,15 +616,6 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
flush_work(&vblk->config_work);
-
- /* Abort requests dispatched to driver. */
- spin_lock_irqsave(&vblk->lock, flags);
- while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) {
- __blk_end_request_all(vbr->req, -EIO);
- mempool_free(vbr, vblk->pool);
- }
- spin_unlock_irqrestore(&vblk->lock, flags);
-
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
--
1.7.10.1
* [RFC PATCH 3/5] virtio-blk: Call del_gendisk() before disable guest kick
From: Asias He @ 2012-05-21 9:08 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin; +Cc: kvm, virtualization
In-Reply-To: <1337591313-26333-1-git-send-email-asias@redhat.com>
del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.
virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
  del_gendisk()
    device_del()
      kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
  sysfs_get_active() : got sysfs entry ref count
  dev_attr_show()
    virtblk_serial_show()
      blk_execute_rq() : got stuck, interrupt is disabled
                         request cannot be finished
This patch fixes it by calling del_gendisk() before we disable the guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will succeed.
This fixes another race in the hot-unplug process.
It is safe to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).
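The ordering argument above can be checked with a tiny userspace model. It is a sketch under simplifying assumptions, not driver code: struct toy_vblk and the toy_* functions are hypothetical, a single flag stands in for the sysfs reader waiting in blk_execute_rq(), and "stuck" is reported as a false return value instead of an actual hang.

```c
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct toy_vblk {
	bool irq_enabled;      /* false after the toy reset */
	bool reader_in_flight; /* a sysfs reader is waiting on a request */
};

/* Models vdev->config->reset(): no more completion interrupts. */
static void toy_reset(struct toy_vblk *v)
{
	v->irq_enabled = false;
}

/* The reader's request can complete only while interrupts arrive. */
static void toy_complete_requests(struct toy_vblk *v)
{
	if (v->irq_enabled)
		v->reader_in_flight = false;
}

/*
 * Models del_gendisk(): returns true if it would finish, false if it
 * would stay stuck waiting for the reader to drop its reference.
 */
static bool toy_del_gendisk(struct toy_vblk *v)
{
	toy_complete_requests(v);
	return !v->reader_in_flight;
}

/* Runs the two steps in either order and reports whether removal finishes. */
static bool toy_remove(struct toy_vblk *v, bool del_disk_first)
{
	bool done;

	if (del_disk_first) {
		done = toy_del_gendisk(v);
		toy_reset(v);
	} else {
		toy_reset(v);
		done = toy_del_gendisk(v);
	}
	return done;
}
```

Resetting first leaves the reader's request without a completion, so the toy del_gendisk() reports that it is stuck. Calling it before the reset lets the request finish and removal complete.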
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
---
drivers/block/virtio_blk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 693187d..7d5f5b0 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -584,12 +584,13 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
vblk->config_enable = false;
mutex_unlock(&vblk->config_lock);
+ del_gendisk(vblk->disk);
+
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
flush_work(&vblk->config_work);
- del_gendisk(vblk->disk);
/* Abort requests dispatched to driver. */
spin_lock_irqsave(&vblk->lock, flags);
--
1.7.10.1
* [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()
From: Asias He @ 2012-05-21 9:08 UTC (permalink / raw)
To: Jens Axboe, Tejun Heo
Cc: kvm, Michael S. Tsirkin, virtualization, linux-fsdevel
When a user hot-unplugs a disk which is busy serving I/O, __blk_run_queue
might be unable to drain all the requests. As a result,
blk_drain_queue() would loop forever and blk_cleanup_queue would not
return. So hot-unplug will fail.
This patch adds a callback in blk_drain_queue() for the low-level driver to
abort requests.
Currently, this is useful for virtio-blk to do cleanup in hot-unplug.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
---
block/blk-core.c | 3 +++
block/blk-settings.c | 12 ++++++++++++
include/linux/blkdev.h | 3 +++
3 files changed, 18 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index 1f61b74..ca42fd7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -369,6 +369,9 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
if (drain_all)
blk_throtl_drain(q);
+ if (q->abort_queue_fn)
+ q->abort_queue_fn(q);
+
/*
* This function might be called on a queue which failed
* driver init after queue creation. Some drivers
diff --git a/block/blk-settings.c b/block/blk-settings.c
index d3234fc..83ccb48 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -100,6 +100,18 @@ void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn)
EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
/**
+ * blk_queue_abort_queue - set driver specific abort function
+ * @q: queue
+ * @afn: abort_queue_fn
+ */
+void blk_queue_abort_queue(struct request_queue *q, abort_queue_fn *afn)
+{
+ q->abort_queue_fn = afn;
+}
+EXPORT_SYMBOL(blk_queue_abort_queue);
+
+
+/**
* blk_set_default_limits - reset limits to default values
* @lim: the queue_limits structure to reset
*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2aa2466..e2d58bd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -200,6 +200,7 @@ struct request_pm_state
typedef void (request_fn_proc) (struct request_queue *q);
typedef void (make_request_fn) (struct request_queue *q, struct bio *bio);
+typedef void (abort_queue_fn) (struct request_queue *q);
typedef int (prep_rq_fn) (struct request_queue *, struct request *);
typedef void (unprep_rq_fn) (struct request_queue *, struct request *);
@@ -282,6 +283,7 @@ struct request_queue {
request_fn_proc *request_fn;
make_request_fn *make_request_fn;
+ abort_queue_fn *abort_queue_fn;
prep_rq_fn *prep_rq_fn;
unprep_rq_fn *unprep_rq_fn;
merge_bvec_fn *merge_bvec_fn;
@@ -821,6 +823,7 @@ extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
request_fn_proc *, spinlock_t *);
extern void blk_cleanup_queue(struct request_queue *);
extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
+extern void blk_queue_abort_queue(struct request_queue *, abort_queue_fn *);
extern void blk_queue_bounce_limit(struct request_queue *, u64);
extern void blk_limits_max_hw_sectors(struct queue_limits *, unsigned int);
extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
--
1.7.10.1
* Re: [PATCH] virtio: fix typo in comment
From: Rusty Russell @ 2012-05-21 1:10 UTC (permalink / raw)
To: Chen Baozi; +Cc: Chen Baozi, linux-kernel, virtualization
In-Reply-To: <1337481874-3472-1-git-send-email-baozich@gmail.com>
On Sun, 20 May 2012 10:44:34 +0800, Chen Baozi <baozich@gmail.com> wrote:
> From: Chen Baozi <chenbaozi@gmail.com>
>
> - Delete "@request_vqs" and "@free_vqs" comments, since
> they are no longer in struct virtio_config_ops.
> - According to the macro below, "@val" should be "@v".
>
> Signed-off-by: Chen Baozi <chenbaozi@gmail.com>
Thanks, applied!
Cheers,
Rusty.
* [PATCH RESENT] xen: do not disable netfront in dom0
From: Marek Marczykowski @ 2012-05-20 11:45 UTC (permalink / raw)
To: David Miller
Cc: Jeremy Fitzhardinge, Ian.Campbell, Konrad Rzeszutek Wilk, netdev,
Marek Marczykowski, xen-devel, virtualization, linux-kernel
In-Reply-To: <20120522194319.GA2691@phenom.dumpdata.com>
The netfront driver can also be useful in dom0, e.g. when all NICs are assigned to
some domU (aka driver domain). Then using netback in domU and netfront in dom0
is the only way to get network access in dom0.
Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
drivers/net/xen-netfront.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 0ebbb19..2027afe 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1962,9 +1962,6 @@ static int __init netif_init(void)
if (!xen_domain())
return -ENODEV;
- if (xen_initial_domain())
- return 0;
-
if (xen_hvm_domain() && !xen_platform_pci_unplug)
return -ENODEV;
@@ -1977,9 +1974,6 @@ module_init(netif_init);
static void __exit netif_exit(void)
{
- if (xen_initial_domain())
- return;
-
xenbus_unregister_driver(&netfront_driver);
}
module_exit(netif_exit);
--
1.7.4.4
* [PATCH] xen: do not disable netfront in dom0
From: Marek Marczykowski @ 2012-05-20 11:45 UTC (permalink / raw)
To: xen-devel
Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, netdev,
Marek Marczykowski, linux-kernel, virtualization
The netfront driver can also be useful in dom0, e.g. when all NICs are assigned to
some domU (aka driver domain). Then using netback in domU and netfront in dom0
is the only way to get network access in dom0.
Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com>
---
drivers/net/xen-netfront.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 698b905..e31ebff 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1953,9 +1953,6 @@ static int __init netif_init(void)
if (!xen_domain())
return -ENODEV;
- if (xen_initial_domain())
- return 0;
-
printk(KERN_INFO "Initialising Xen virtual ethernet driver.\n");
return xenbus_register_frontend(&netfront_driver);
@@ -1965,9 +1962,6 @@ module_init(netif_init);
static void __exit netif_exit(void)
{
- if (xen_initial_domain())
- return;
-
xenbus_unregister_driver(&netfront_driver);
}
module_exit(netif_exit);
--
1.7.4.4
* [PATCH] virtio: fix typo in comment
From: Chen Baozi @ 2012-05-20 2:44 UTC (permalink / raw)
To: rusty; +Cc: Chen Baozi, linux-kernel, virtualization
From: Chen Baozi <chenbaozi@gmail.com>
- Delete "@request_vqs" and "@free_vqs" comments, since
they are no longer in struct virtio_config_ops.
- According to the macro below, "@val" should be "@v".
Signed-off-by: Chen Baozi <chenbaozi@gmail.com>
---
include/linux/virtio_config.h | 11 +----------
1 files changed, 1 insertions(+), 10 deletions(-)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 7323a33..fc457f4 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -74,15 +74,6 @@
* @set_status: write the status byte
* vdev: the virtio_device
* status: the new status byte
- * @request_vqs: request the specified number of virtqueues
- * vdev: the virtio_device
- * max_vqs: the max number of virtqueues we want
- * If supplied, must call before any virtqueues are instantiated.
- * To modify the max number of virtqueues after request_vqs has been
- * called, call free_vqs and then request_vqs with a new value.
- * @free_vqs: cleanup resources allocated by request_vqs
- * vdev: the virtio_device
- * If supplied, must call after all virtqueues have been deleted.
* @reset: reset the device
* vdev: the virtio device
* After this, status and feature negotiation must be done again
@@ -156,7 +147,7 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
* @vdev: the virtio device
* @fbit: the feature bit
* @offset: the type to search for.
- * @val: a pointer to the value to fill in.
+ * @v: a pointer to the value to fill in.
*
* The return value is -ENOENT if the feature doesn't exist. Otherwise
* the config value is copied into whatever is pointed to by v. */
--
1.7.1
* Call for Participation: ACM HPDC 2012 -- Early registration deadline May 25th
From: Ioan Raicu @ 2012-05-19 13:47 UTC (permalink / raw)
To: virtualization
Call for Participation
http://www.hpdc.org/2012/
The organizing committee is delighted to invite you to *HPDC'12*, the
/21st International ACM Symposium on High-Performance Parallel and
Distributed Computing/, to be held in *Delft, the Netherlands*, which is
a historic, picturesque city that is less than one hour away from
Amsterdam-Schiphol airport.
HPDC <http://www.hpdc.org> is the premier annual conference on the
design, the implementation, the evaluation, and the use of parallel and
distributed systems for high-end computing. HPDC is sponsored by
SIGARCH, the Special Interest Group on Computer Architecture
<http://www.sigarch.org> of the Association for Computing Machinery
<http://www.acm.org>.
*HPDC'12* will be held at Delft University of Technology
<http://www.tudelft.nl>, with the main conference taking place on *June
20-22* (Wednesday to Friday 1 PM), and with affiliated workshops on
*June 18-19* (Monday and Tuesday).
Early registration closes on May 25th, so if you plan on attending,
please register now at http://www.hpdc.org/2012/registration/.
*Some highlights of the conference:*
* *Awards:*
o Achievement Award - Ian Foster of the University of Chicago and
Argonne National Laboratory, USA
* *Keynote Speakers:*
o Mihai Budiu of Microsoft Research, Mountain View, USA.
Title: Putting "Big-data" to Good Use: Building Kinect
o Ricardo Bianchini of Rutgers University, USA.
Title: "Leveraging Renewable Energy in Data Centers: Present and
Future"
* *Accepted Papers:*
1. vSlicer: Latency-aware Virtual Machine Scheduling via
Differentiated-frequency CPU Slicing, Cong Xu (Purdue
University), Sahan Gamage (Purdue University), Pawan N. Rao
(Purdue University), Ardalan Kangarlou (NetApp), Ramana Kompella
(Purdue University), Dongyan Xu (Purdue University)
2. Singleton: System-wide Page Deduplication in Virtual
Environments, Prateek Sharma, Purushottam Kulkarni (IIT Bombay)
3. Locality-aware Dynamic VM Reconfiguration on MapReduce
Clouds, Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh,
Seungryoul Maeng (KAIST)
4. Achieving Application-Centric Performance Targets via
Consolidation on Multicores: Myth or Reality?, Lydia Y. Chen
(IBM Research Zurich Lab), Danilo Ansaloni (University of
Lugano), Evgenia Smirni (College of William and Mary), Akira
Yokokawa (University of Lugano), Walter Binder (University of
Lugano)
5. Enabling Event Tracing at Leadership-Class Scale through
I/O Forwarding Middleware, Thomas Ilsche (Technische Universität
Dresden), Joseph Schuchart (Technische Universität Dresden),
Jason Cope (Argonne National Laboratory), Dries Kimpe (Argonne
National Laboratory), Terry Jones (Oak Ridge National
Laboratory), Andreas Knöpfer (Technische Universität Dresden),
Kamil Iskra (Argonne National Laboratory), Robert Ross (Argonne
National Laboratory), Wolfgang E. Nagel (Technische Universität
Dresden), Stephen Poole (Oak Ridge National Laboratory)
6. ISOBAR Hybrid Compression-I/O Interleaving for Large-scale
Parallel I/O Optimization, Eric R. Schendel (North Carolina
State University), Saurabh V. Pendse (North Carolina State
University), John Jenkins (North Carolina State University),
David A. Boyuka (North Carolina State University), Zhenhuan Gong
(North Carolina State University), Sriram Lakshminarasimhan
(North Carolina State University), Qing Liu (Oak Ridge National
Laboratory), Scott Klasky (Oak Ridge National Laboratory),
Robert Ross (Argonne National Laboratory), Nagiza F. Samatova
(North Carolina State University)
7. QBox: Guaranteeing I/O Performance on Black Box Storage
Systems, Dimitris Skourtis, Shinpei Kato, Scott Brandt
(University of California, Santa Cruz)
8. Towards Efficient Live Migration of I/O Intensive
Workloads: A Transparent Storage Transfer Propo, Bogdan Nicolae
(INRIA), Franck Cappello (INRIA/UIUC)
9. A Virtual Memory Based Runtime to Support Multi-tenancy in
Clusters with GPUs, Michela Becchi (University of Missouri),
Kittisak Sajjapongse (University of Missouri), Ian Graves
(University of Missouri), Adam Procter (University of Missouri),
Vignesh Ravi (Ohio State University), Srimat Chakradhar (NEC
Laboratories America)
10. Interference-driven Scheduling and Resource Management for
GPU-based Heterogeneous Clusters, Rajat Phull, Cheng-Hong Li,
Kunal Rao, Hari Cadambi, Srimat Chakradhar (NEC Laboratories
America)
11. Work Stealing and Persistence-based Load Balancers for
Iterative Overdecomposed Applications, Jonathan Lifflander
(UIUC), Sriram Krishnamoorthy (PNNL), Laxmikant V. Kale (UIUC)
12. Highly Scalable Graph Search for the Graph500 Benchmark,
Koji Ueno (Tokyo Institute of Technology/JST CREST), Toyotaro
Suzumura (Tokyo Institute of Technology/IBM Research Tokyo/JST
CREST)
13. PonD : Dynamic Creation of HTC Pool on Demand Using a
Decentralized Resource Discovery System, Kyungyong Lee
(University of Florida), David Wolinsky (Yale University),
Renato Figueiredo (University of Florida)
14. SpeQuloS: A QoS Service for BoT Applications Using Best
Effort Distributed Computing Infrastructures, Simon Delamare
(INRIA), Gilles Fedak (INRIA), Derrick Kondo (INRIA), Oleg
Lodygensky (IN2P3)
15. Understanding the Effects and Implications of Compute Node
Related Failures in Hadoop, Florin Dinu, T. S. Eugene Ng (Rice
University)
16. Optimizing MapReduce for GPUs with Effective Shared Memory
Usage, Linchuan Chen, Gagan Agrawal (The Ohio State University)
17. CAM: A Topology Aware Minimum Cost Flow Based Resource
Manager for MapReduce Applications in the Cloud, Min Li
(Virginia Tech), Dinesh Subhraveti (IBM Almaden Research
Center), Ali Butt (Virginia Tech), Aleksandr Khasymski (Virginia
Tech), Prasenjit Sarkar (IBM Almaden Research Center)
18. Distributed Approximate Spectral Clustering for
Large-Scale Datasets, Fei Gao (Simon Fraser University), Wael
Abd-Almageed (University of Maryland)
19. Exploring Cross-layer Power Management for PGAS
Applications on the SCC Platform, Marc Gamell (Rutgers
University), Ivan Rodero (Rutgers University), Manish Parashar
(Rutgers University), Rajeev Muralidhar (Intel India)
20. Dynamic Adaptive Virtual Core Mapping to Improve Power,
Energy, and Performance in Multi-socket Multicores, Chang Bae
(Northwestern University), Lei Xia (Northwestern University),
Peter Dinda (Northwestern University), John Lange (University of
Pittsburgh)
21. VNET/P: Bridging the Cloud and High Performance Computing
Through Fast Overlay Networking, Lei Xia (Northwestern
University), Zheng Cui (University of New Mexico), John Lange
(University of Pittsburgh), Yuan Tang (UESTC, China), Peter
Dinda (Northwestern University), Patrick Bridges (University of
New Mexico)
22. Massively-Parallel Stream Processing under QoS Constraints
with Nephele, Björn Lohrmann, Daniel Warneke, Odej Kao
(Technische Universität Berlin)
23. A Resiliency Model for High Performance Infrastructure
Based on Logical Encapsulation, James Moore (The University of
Southern California/EMC Corporation), Carl Kesselman (The
University of Southern California)
* *Workshops:*
o Astro-HPC: Workshop on High-Performance Computing for Astronomy,
Ana Lucia Varbanescu, Rob van Nieuwpoort, and Simon Portegies Zwart
o ECMLS: 3rd Int'l Emerging Computational Methods for the Life
Sciences Workshop, Carole Goble, Judy Qiu, and Ian Foster
o ScienceCloud: 3rd Workshop on Scientific Cloud Computing, Yogesh
Simmhan, Gabriel Antoniu, and Carole Goble
o DIDC: Fifth Int'l Workshop on Data-Intensive Distributed
Computing, Tevfik Kosar and Douglas Thain
o MapReduce: The Third Int'l Workshop on MapReduce and its
Applications, Gilles Fedak and Geoffrey Fox
o VTDC: 6th Int'l Workshop on Virtualization Technologies in
Distributed Computing, Frédéric Desprez and Adrien Lèbre
For more information on the full program, see
http://www.hpdc.org/2012/program/conference-program/.
Looking forward to seeing you in Delft!
Regards,
Ioan Raicu
--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu@cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
^ permalink raw reply
* MBDS paper submission due in one month (The International Workshop on Management of Big Data Systems)
From: Ming Zhao @ 2012-05-19 0:44 UTC (permalink / raw)
To: virtualization
CALL FOR PAPERS
International Workshop on Management of Big Data Systems (MBDS 2012)
http://www.cercs.gatech.edu/mbds12
In conjunction with ICAC 2012
http://icac2012.cs.fiu.edu
September 21, 2012, San Jose, CA
-----------------------------------------------------------------
IMPORTANT DATES
Paper submission due: June 15, 2012
Author notification: July 30, 2012
Workshop: Sep 21, 2012
-----------------------------------------------------------------
OVERVIEW
Data is growing at an exponential rate and several systems have emerged
to store and analyze such large amounts of data. These systems, termed
big data systems, are evolving fast. Examples include the NoSQL storage
systems, Hadoop Map-Reduce, data analytics platforms, search and indexing
platforms, and messaging infrastructures. These systems address needs for
structured and unstructured data across a wide spectrum of domains such
as web, social networks, enterprise, cloud, mobile, sensor networks,
multimedia/streaming, cyberphysical and high performance systems, and
for multiple application verticals such as biosciences, healthcare,
transportation, public sector, energy utilities, oil & gas, and scientific
computing.
With increasing scale and complexity, managing these big data systems to
cope with failures and performance problems is becoming non-trivial.
New resource management and scheduling mechanisms are also needed for such
systems, and so are mechanisms for tuning and support from platform layers.
Several open source and proprietary solutions have been proposed to address
these requirements, with extensive contributions from industry and academia.
However, there remain substantial challenges, including those that pertain
to such systems' autonomic and self-management capabilities.
The objective of the MBDS workshop is to bring together researchers,
practitioners, system administrators, system programmers, and others
interested in sharing and presenting their perspectives on the effective
management of big data systems. The focus of the workshop is on novel and
practical, systems-oriented work. MBDS offers an opportunity for researchers
and practitioners from industry, academia, and national labs to showcase the
latest advances in this area and to also discuss and identify future directions
and challenges in all aspects of autonomic management of big data systems.
Papers are solicited on all aspects of big data management. Specific topics
of interest include, but are not limited to, the following:
* Autonomic and self-managing techniques
* Application-level resource management and scheduling mechanisms
* System tuning/auto-tuning and configuration management
* Performance management, fault management, and power management
* Scalability challenges
* Complexity challenges, as for composite, cross-tier systems with multiple control loops
* Unified management of data in motion and data at rest
* Dealing with both structured and unstructured data
* Monitoring, diagnosis, and automated behaviour detection
* System-level principles and support for resource management
* Holistic management across hardware and software
* Implications of emerging hardware technologies such as non-volatile memory
* Domain specific challenges in web, cloud, social networks, mobile, sensor networks,
streaming analytics, cyber-physical systems
* System building and experience papers for specific industry verticals
-----------------------------------------------------------------
PAPER SUBMISSIONS
Full papers (a maximum of 6 pages in the two-column ACM proceedings
format) are invited on a wide variety of topics relating to management of big
data systems. Submitted papers must be original work, and may not be under
consideration for another conference or journal. Complete formatting and
submission instructions can be found on the workshop web site. Accepted
papers will appear in proceedings distributed at the conference and available
electronically.
-----------------------------------------------------------------
WORKSHOP ORGANIZERS
Karsten Schwan, Georgia Tech
Vanish Talwar, HP Labs
PUBLICITY CHAIR
Aravind Menon, Facebook
PROGRAM COMMITTEE
Amitanand Aiyer, Facebook
Adhyas Avasthi, Nokia Research
Milind Bhandarkar, Greenplum Labs, EMC
Randal Burns, Johns Hopkins University
Garth Gibson, Carnegie Mellon University and Panasas Inc.
Herodotos Herodotou, Duke University
Michael A Kozuch, Intel
Kai Li, Princeton University
Mohamed Mansour, Amazon
Aravind Menon, Facebook
Arif Merchant, Google
Beth Plale, Indiana University
Indrajit Roy, HP Labs
Gabor Szabo, Twitter
Craig Ulmer, Sandia National Lab
Kushagra Vaid, Microsoft
Weikuan Yu, Auburn University
Philip Zeyliger, Cloudera
^ permalink raw reply
* [PULL] virtio: last minute fixes for 3.4
From: Michael S. Tsirkin @ 2012-05-17 9:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: kvm, mst, netdev, linux-kernel, virtualization, uobergfe,
amit.shah, David Miller
The following changes since commit 0e93b4b304ae052ba1bc73f6d34a68556fe93429:
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2012-05-16 14:30:51 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_linus
for you to fetch changes up to ec13ee80145ccb95b00e6e610044bbd94a170051:
virtio_net: invoke softirqs after __napi_schedule (2012-05-17 12:16:38 +0300)
----------------------------------------------------------------
virtio: last minute fixes for 3.4
Here are a couple of last minute virtio fixes for 3.4.
Hope it's not too late yet - I might have tried too hard
to make sure the fix is well tested.
Fixes are by Amit and myself. One fixes module removal,
one fixes suspend of a VM, and the last one fixes the handling
of an out-of-memory condition.
They are thus very low risk as most people never hit these paths, but do fix
very annoying problems for people that do use the feature.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Amit Shah (2):
virtio: console: tell host of open ports after resume from s3/s4
virtio: balloon: let host know of updated balloon size before module removal
Michael S. Tsirkin (1):
virtio_net: invoke softirqs after __napi_schedule
drivers/char/virtio_console.c | 7 +++++++
drivers/net/virtio_net.c | 2 ++
drivers/virtio/virtio_balloon.c | 1 +
3 files changed, 10 insertions(+), 0 deletions(-)
^ permalink raw reply
* Re: [PATCH 1/1] Drivers: hid: hid-hyperv.c: Set the hid drvdata correctly
From: Jiri Kosina @ 2012-05-17 8:03 UTC (permalink / raw)
To: K. Y. Srinivasan; +Cc: gregkh, linux-kernel, devel, virtualization, ohering
In-Reply-To: <1337201413-9857-1-git-send-email-kys@microsoft.com>
On Wed, 16 May 2012, K. Y. Srinivasan wrote:
> Set the hid drvdata prior to invoking hid_add_device() as hid_add_device()
> expects this state to be set. This bug was introduced in the recent hid
> changes that were made in:
>
> commit 07d9ab4f0e52cb2a383596e5ebbbd20232501393
> HID: hid-hyperv: Do not use hid_parse_report() directly
>
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> drivers/hid/hid-hyperv.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index 032e6c0..3d62781 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -516,11 +516,12 @@ static int mousevsc_probe(struct hv_device *device,
>
> sprintf(hid_dev->name, "%s", "Microsoft Vmbus HID-compliant Mouse");
>
> + hid_set_drvdata(hid_dev, device);
> +
> ret = hid_add_device(hid_dev);
> if (ret)
> goto probe_err1;
>
> - hid_set_drvdata(hid_dev, device);
>
> ret = hid_parse(hid_dev);
> if (ret) {
Applied, thanks KY.
--
Jiri Kosina
SUSE Labs
^ permalink raw reply
* Re: [PATCH] virtio_net: invoke softirqs after __napi_schedule
From: David Miller @ 2012-05-17 3:40 UTC (permalink / raw)
To: rusty; +Cc: netdev, virtualization, linux-kernel, mst
In-Reply-To: <87vcjvzdlm.fsf@rustcorp.com.au>
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 17 May 2012 13:02:53 +0930
> On Wed, 16 May 2012 10:57:13 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> __napi_schedule might raise softirq but nothing
>> causes do_softirq to trigger, so it does not in fact
>> run. As a result,
>> the error message "NOHZ: local_softirq_pending 08"
>> sometimes occurs during boot of a KVM guest when the network service is
>> started and we are oom:
>>
>> ...
>> Bringing up loopback interface: [ OK ]
>> Bringing up interface eth0:
>> Determining IP information for eth0...NOHZ: local_softirq_pending 08
>> done.
>> [ OK ]
>> ...
>>
>> Further, receive queue processing might get delayed
>> indefinitely until some interrupt triggers:
>> virtio_net expected napi to be run immediately.
>>
>> One way to cause do_softirq to be executed is by
>> invoking local_bh_enable(). As __napi_schedule is
>> normally called from bh or irq context, this
>> seems to make sense: disable bh before __napi_schedule
>> and enable afterwards.
>>
>> Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
>> Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
...
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Michael, you're best to submit this directly to Linus as I just
made what I hope is my last push to him for 3.4 today.
^ permalink raw reply
* Re: [PATCH] virtio_net: invoke softirqs after __napi_schedule
From: Rusty Russell @ 2012-05-17 3:32 UTC (permalink / raw)
To: David Miller; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <20120516075712.GA2921@redhat.com>
On Wed, 16 May 2012 10:57:13 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> __napi_schedule might raise softirq but nothing
> causes do_softirq to trigger, so it does not in fact
> run. As a result,
> the error message "NOHZ: local_softirq_pending 08"
> sometimes occurs during boot of a KVM guest when the network service is
> started and we are oom:
>
> ...
> Bringing up loopback interface: [ OK ]
> Bringing up interface eth0:
> Determining IP information for eth0...NOHZ: local_softirq_pending 08
> done.
> [ OK ]
> ...
>
> Further, receive queue processing might get delayed
> indefinitely until some interrupt triggers:
> virtio_net expected napi to be run immediately.
>
> One way to cause do_softirq to be executed is by
> invoking local_bh_enable(). As __napi_schedule is
> normally called from bh or irq context, this
> seems to make sense: disable bh before __napi_schedule
> and enable afterwards.
>
> Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
> Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> To test, one can hack try_fill_recv to always report oom.
> I'm not sure it's not too late for 3.4, but we can try.
> Rusty, could you review ASAP pls?
It's missing a big comment: it's a very complicated way of calling
do_softirq().
Indeed, this function is only used when we are not in interrupt
context. It's not hot at all, in any ideal scenario.
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
^ permalink raw reply
* [PATCH 1/1] Drivers: hid: hid-hyperv.c: Set the hid drvdata correctly
From: K. Y. Srinivasan @ 2012-05-16 20:50 UTC (permalink / raw)
To: gregkh, linux-kernel, devel, virtualization, ohering, jkosina
Cc: K. Y. Srinivasan
Set the hid drvdata prior to invoking hid_add_device() as hid_add_device()
expects this state to be set. This bug was introduced in the recent hid
changes that were made in:
commit 07d9ab4f0e52cb2a383596e5ebbbd20232501393
HID: hid-hyperv: Do not use hid_parse_report() directly
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hid/hid-hyperv.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 032e6c0..3d62781 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -516,11 +516,12 @@ static int mousevsc_probe(struct hv_device *device,
sprintf(hid_dev->name, "%s", "Microsoft Vmbus HID-compliant Mouse");
+ hid_set_drvdata(hid_dev, device);
+
ret = hid_add_device(hid_dev);
if (ret)
goto probe_err1;
- hid_set_drvdata(hid_dev, device);
ret = hid_parse(hid_dev);
if (ret) {
--
1.7.4.1
^ permalink raw reply related
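The ordering problem described in the patch above can be illustrated outside
the kernel. The following is a minimal userspace sketch, not the hid-hyperv
code: struct fake_hid_device, fake_probe() and fake_add_device() are
hypothetical stand-ins, and the sketch assumes that registration synchronously
runs a callback that reads the driver data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a HID device; not the kernel's struct hid_device. */
struct fake_hid_device {
	void *drvdata;					/* driver-private pointer */
	int (*probe)(struct fake_hid_device *hid);	/* run during registration */
};

/* Callback that relies on drvdata; fails if the caller has not set it yet. */
static int fake_probe(struct fake_hid_device *hid)
{
	return hid->drvdata ? 0 : -1;
}

/* Assumption: registration invokes the callback synchronously. */
static int fake_add_device(struct fake_hid_device *hid)
{
	assert(hid != NULL && hid->probe != NULL);
	return hid->probe(hid);
}

/* Ordering before the patch: register first, set drvdata afterwards. */
static int register_then_set(struct fake_hid_device *hid, void *priv)
{
	int ret = fake_add_device(hid);

	hid->drvdata = priv;
	return ret;
}

/* Ordering after the patch: set drvdata first, then register. */
static int set_then_register(struct fake_hid_device *hid, void *priv)
{
	hid->drvdata = priv;
	return fake_add_device(hid);
}
```

In this model register_then_set() fails because the callback runs before the
pointer is stored, while set_then_register() succeeds; that is the same
reordering the two-line change makes around hid_add_device().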
* Re: [vmw_vmci RFC 01/11] Apply VMCI context code
From: Andrew Stiegmann @ 2012-05-16 18:34 UTC (permalink / raw)
To: Stephen Hemminger, gregkh
Cc: acking, dtor, linux-kernel, virtualization, dsouders, akpm,
cschamp
In-Reply-To: <20120516100121.3be6d0ca@nehalam.linuxnetplumber.net>
Both of your comments have been added to my "to do" list before the next time I publish. Thanks for the feedback.
----- Original Message -----
> From: "Stephen Hemminger" <shemminger@vyatta.com>
> To: "Andrew Stiegmann (stieg)" <astiegmann@vmware.com>
> Cc: linux-kernel@vger.kernel.org, acking@vmware.com, dtor@vmware.com, gregkh@linuxfoundation.org,
> virtualization@lists.linux-foundation.org, dsouders@vmware.com, akpm@linux-foundation.org, cschamp@vmware.com
> Sent: Wednesday, May 16, 2012 10:01:21 AM
> Subject: Re: [vmw_vmci RFC 01/11] Apply VMCI context code
>
> On Tue, 15 May 2012 08:06:58 -0700
> "Andrew Stiegmann (stieg)" <astiegmann@vmware.com> wrote:
>
> > Context code maintains state for vmci and allows the driver
> > to communicate with multiple VMs.
> >
> > Signed-off-by: Andrew Stiegmann (stieg) <astiegmann@vmware.com>
>
> Running checkpatch reveals the usual noise, and the following that
> should be addressed.
>
> ERROR: do not use C99 // comments
> #272: FILE: drivers/misc/vmw_vmci/vmci_context.c:183:
> +static bool ctx_exists_locked(uint32_t cid) // IN
>
> ERROR: "foo * bar" should be "foo *bar"
> #304: FILE: drivers/misc/vmw_vmci/vmci_context.c:215:
> + uid_t * user, struct vmci_ctx **outContext)
>
> I don't mind the C99 style comments, but the // IN convention
> is pretty useless and should be removed.
>
^ permalink raw reply
* Re: [vmw_vmci RFC 01/11] Apply VMCI context code
From: Stephen Hemminger @ 2012-05-16 17:01 UTC (permalink / raw)
To: Andrew Stiegmann (stieg)
Cc: acking, dtor, gregkh, linux-kernel, virtualization, dsouders,
akpm, cschamp
In-Reply-To: <1337094428-20453-2-git-send-email-astiegmann@vmware.com>
On Tue, 15 May 2012 08:06:58 -0700
"Andrew Stiegmann (stieg)" <astiegmann@vmware.com> wrote:
> Context code maintains state for vmci and allows the driver
> to communicate with multiple VMs.
>
> Signed-off-by: Andrew Stiegmann (stieg) <astiegmann@vmware.com>
Running checkpatch reveals the usual noise, and the following that
should be addressed.
ERROR: do not use C99 // comments
#272: FILE: drivers/misc/vmw_vmci/vmci_context.c:183:
+static bool ctx_exists_locked(uint32_t cid) // IN
ERROR: "foo * bar" should be "foo *bar"
#304: FILE: drivers/misc/vmw_vmci/vmci_context.c:215:
+ uid_t * user, struct vmci_ctx **outContext)
I don't mind the C99 style comments, but the // IN convention
is pretty useless and should be removed.
^ permalink raw reply
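As an illustration of the two checkpatch errors quoted above, here is a small
sketch of the same declarations in the style checkpatch accepts. Only the
names ctx_exists_locked and the "uid_t *user" parameter come from the quoted
output; the function bodies, the known_cids table and ctx_owner() are
hypothetical and exist only to make the example compile in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical table of known context IDs; the real driver keeps its own state. */
static const uint32_t known_cids[] = { 1, 2, 7 };

/* Was: static bool ctx_exists_locked(uint32_t cid) // IN */
static bool ctx_exists_locked(uint32_t cid)
{
	size_t i;

	for (i = 0; i < sizeof(known_cids) / sizeof(known_cids[0]); i++) {
		if (known_cids[i] == cid)
			return true;
	}
	return false;
}

/* Was: "uid_t * user"; checkpatch wants the '*' attached to the name. */
static bool ctx_owner(uint32_t cid, uid_t *user)
{
	assert(user != NULL);
	if (!ctx_exists_locked(cid))
		return false;
	*user = 0;	/* placeholder owner, not taken from the driver */
	return true;
}
```

The trailing "// IN" annotation is simply dropped; if the direction of a
parameter matters, a regular block comment above the function can say so.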
* Re: [vmw_vmci RFC 00/11] VMCI for Linux
From: Dor Laor @ 2012-05-16 8:55 UTC (permalink / raw)
To: Greg KH
Cc: acking, kvm-devel, dtor, linux-kernel, virtualization, dsouders,
Amit Shah, Andrew Stiegmann (stieg), akpm, cschamp
In-Reply-To: <20120515235024.GB1758@kroah.com>
On 05/16/2012 02:50 AM, Greg KH wrote:
> On Tue, May 15, 2012 at 08:06:57AM -0700, Andrew Stiegmann (stieg) wrote:
>> In an effort to improve the out-of-the-box experience with Linux
>> kernels for VMware users, VMware is working on readying the Virtual
>> Machine Communication Interface (vmw_vmci) and VMCI Sockets (vmw_vsock) kernel
>> modules for inclusion in the Linux kernel. The purpose of this post
>> is to acquire feedback on the vmw_vmci kernel module. The vmw_vsock
>> kernel module will be presented in a later post.
>>
>> VMCI allows virtual machines to communicate with host kernel modules
>> and the VMware hypervisors. User level applications both in a virtual
>> machine and on the host can use vmw_vmci through VMCI Sockets, a socket
>> address family designed to be compatible with UDP and TCP at the
>> interface level. Today, VMCI and VMCI Sockets are used by the VMware
>> shared folders (HGFS) and various VMware Tools components inside the
>> guest for zero-config, network-less access to VMware host services. In
>> addition to this, VMware's users are using VMCI Sockets for various
>> applications, where network access of the virtual machine is
>> restricted or non-existent. Examples of this are VMs communicating
>> with device proxies for proprietary hardware running as host
>> applications and automated testing of applications running within
>> virtual machines.
>>
>> In a virtual machine, VMCI is exposed as a regular PCI device. The
>> primary communication mechanisms supported are a point-to-point
>> bidirectional transport based on a pair of memory-mapped queues, and
>> asynchronous notifications in the form of datagrams and
>> doorbells. These features are available to kernel level components
>> such as HGFS and VMCI Sockets through the VMCI kernel API. In addition
>> to this, the VMCI kernel API provides support for receiving events
>> related to the state of the VMCI communication channels, and the
>> virtual machine itself.
>
> Don't we have something like this already for KVM and maybe Xen?
We have virtio-serial driver for guest-host communication:
http://fedoraproject.org/wiki/Features/VirtioSerial
http://www.linux-kvm.org/page/VMchannel_Requirements
Amit Shah, the author, is CCed, as is kvm-devel.
> virtio? Can't you use that code instead of a new block of code that is
> only used by vmware users? It has virtual pci devices which should give
> you what you want/need here, right?
>
> If not, why doesn't that work for you? Would it be easier to just
> extend it?
KVM uses virtio-serial as a pci device which has 'ports' on top of it.
The ports act like channels that can be created dynamically and allow
guest userspace <-> host userspace communication.
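For illustration, a minimal sketch of the guest side, assuming the usual udev
convention that a named port shows up as /dev/virtio-ports/<name>. The helper
names and the port name used in the usage note are hypothetical; a port is
then read and written like any other file descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Build the guest-side path for a named port; 0 on success, -1 if it does not fit. */
static int virtio_port_path(char *buf, size_t len, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);
	n = snprintf(buf, len, "/dev/virtio-ports/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Open the port read/write; returns a file descriptor, or -1 on failure. */
static int virtio_port_open(const char *name)
{
	char path[256];

	if (virtio_port_path(path, sizeof(path), name) != 0)
		return -1;
	return open(path, O_RDWR);
}
```

A guest application would call virtio_port_open("com.example.port0") and then
use read() and write() on the returned descriptor, with the host side attached
to the matching port.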
In theory, the kvm mechanism should be a good fit for other hypervisors.
Nevertheless, despite my biased love for KVM, I bet it would be 'tricky'
for VMW to change their hardware model and shift to virtio hardware
abstraction. In addition, they'll be required to change existing apps
that use their socket code.
One could insist on our rightful 'upstream first' requirement, but this
may delay or even eliminate the benefit of getting VMW code upstream: an
out-of-the-box experience for Linux users.
IMHO, let's be practical and include this pci device (pending standard
review) but _require_ that the VMCI sockets family will be a general
mechanism that may be used over virtio-serial as well.
Andrew, it would be best to work w/ Amit and various other KVM
hackers to get your (changed) code upstream.
Regards,
Dor
>
> thanks,
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply
* [PATCH] virtio_net: invoke softirqs after __napi_schedule
From: Michael S. Tsirkin @ 2012-05-16 7:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
__napi_schedule might raise softirq but nothing
causes do_softirq to trigger, so it does not in fact
run. As a result,
the error message "NOHZ: local_softirq_pending 08"
sometimes occurs during boot of a KVM guest when the network service is
started and we are oom:
...
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0...NOHZ: local_softirq_pending 08
done.
[ OK ]
...
Further, receive queue processing might get delayed
indefinitely until some interrupt triggers:
virtio_net expected napi to be run immediately.
One way to cause do_softirq to be executed is by
invoking local_bh_enable(). As __napi_schedule is
normally called from bh or irq context, this
seems to make sense: disable bh before __napi_schedule
and enable afterwards.
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
To test, one can hack try_fill_recv to always report oom.
I'm not sure it's not too late for 3.4, but we can try.
Rusty, could you review ASAP pls?
drivers/net/virtio_net.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index af8acc8..cbefe67 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -492,7 +492,9 @@ static void virtnet_napi_enable(struct virtnet_info *vi)
* We synchronize against interrupts via NAPI_STATE_SCHED */
if (napi_schedule_prep(&vi->napi)) {
virtqueue_disable_cb(vi->rvq);
+ local_bh_disable();
__napi_schedule(&vi->napi);
+ local_bh_enable();
}
}
--
MST
^ permalink raw reply related
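The effect of the two added lines can be modeled in userspace. This is a
simplified sketch, not kernel code: every model_* function is a hypothetical
stand-in, and it assumes only what the commit message states, namely that
__napi_schedule marks a softirq pending without running it and that
local_bh_enable() causes pending softirqs to run.

```c
#include <assert.h>
#include <stdbool.h>

static int bh_disable_count;	/* models the bottom-half disable nesting depth */
static bool softirq_pending;	/* models a raised but not yet run softirq */
static int polls_run;		/* how many times the modeled NAPI poll ran */

/* Models do_softirq(): run the pending work, if any. */
static void model_do_softirq(void)
{
	if (softirq_pending) {
		softirq_pending = false;
		polls_run++;
	}
}

static void model_local_bh_disable(void)
{
	bh_disable_count++;
}

/* Models local_bh_enable(): leaving the outermost section runs pending work. */
static void model_local_bh_enable(void)
{
	assert(bh_disable_count > 0);
	if (--bh_disable_count == 0)
		model_do_softirq();
}

/* Models __napi_schedule(): only marks the softirq as pending. */
static void model_napi_schedule(void)
{
	softirq_pending = true;
}

/* Call sequence before the patch, from process context. */
static void schedule_unpatched(void)
{
	model_napi_schedule();
}

/* Call sequence after the patch. */
static void schedule_patched(void)
{
	model_local_bh_disable();
	model_napi_schedule();
	model_local_bh_enable();
}
```

In the model, schedule_unpatched() leaves the work pending until something
else triggers it, which corresponds to the "local_softirq_pending 08" message,
while schedule_patched() runs the poll before returning.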