From: Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Cc: kvm-devel
<kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
Christian Borntraeger
<borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>,
virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: virtio_net and SMP guests
Date: Mon, 24 Dec 2007 01:19:19 +0200 [thread overview]
Message-ID: <476EECF7.9000204@qumranet.com> (raw)
In-Reply-To: <200712181751.24692.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Rusty Russell wrote:
> On Friday 14 December 2007 23:12:05 Christian Borntraeger wrote:
>
>> Rusty, Anthony, Dor,
>>
>> I need your brain power :-)
>>
>> On smp guests I have seen a problem with virtio (the version in curent
>> Avi's git) which do not occur on single processor guests:
>>
>> kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228!
>> illegal operation: 0001 [#1]
>> Modules linked in: ipv6
>> CPU: 2 Not tainted
>> Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70)
>> Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70)
>> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
>> Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000
>> 0000000010005800 000000000045ded0 000000000000192f 000000000eb21000
>> 000000000eb21000 000000000000000e 000000000eb21900 000000000eb21920
>> 000000000f867cb8 0700000000d9b058 0000000000000010 000000000045c06a
>> 000000000f867cb8 Krnl Code: 000000000045df1e: e3b0b0700004 lg
>> %r11,112(%r11) 000000000045df24: 07fe bcr 15,%r14
>> 000000000045df26: a7f40001 brc 15,45df28
>>
>> >000000000045df2a: a7f4ffe1 brc 15,45deec
>>
>> 000000000045df2e: e31020300004 lg %r1,48(%r2)
>> 000000000045df34: a7480000 lhi %r4,0
>> 000000000045df38: 96011001 oi 1(%r1),1
>> 000000000045df3c: a7f4ffef brc 15,45df1a
>> Call Trace:
>> ([<000000000045c016>] virtnet_poll+0x96/0x42c)
>> [<000000000048cda2>] net_rx_action+0xca/0x150
>> [<0000000000137f7a>] __do_softirq+0x9e/0x130
>> [<00000000001105d6>] do_softirq+0xae/0xb4
>> [<0000000000138182>] irq_exit+0x96/0x9c
>> [<000000000010d710>] do_extint+0xcc/0xf8
>> [<00000000001135d0>] ext_no_vtime+0x16/0x1a
>> [<000000000010a57e>] cpu_idle+0x216/0x238
>>
>>
>> I think there is a valid code path, triggering this bug:
>>
>> CPU1 CPU2
>> ----------------------- -----------------------
>> - virtnet_poll found no
>> more packets on queue
>> - netif_rx_complete allow
>> poll to be called
>> - vq_ops->restart is called
>> - vq Interrupts are enabled
>> . <new packets arrive>
>> <vcpu is scheduled away>
>> . - interrupt is delivered
>> . - poll is called
>> . - poll work is done
>> . - netif_rx_complete
>> . - vq_ops->restart is called
>> . - check if vq interrupts are
>> . enable --> BUG
>>
>
> Shouldn't this be BUG()ing in START_USE()? Or did you disable that because
> of previous false positives?
>
> virtnet_poll() should not be reentering, AFAICT. virtio relies on the caller
> to ensure it never calls two virtio functions on the same queue
> simultaneously, and virtio_net in turn relies on the core net code to enforce
> this.
>
> So, how did this happen?
>
> Looks like we can have an interrupt on one CPU, which does:
>
> if (vq->vq.callback && !vq->vq.callback(&vq->vq))
> vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
>
> The callback schedules the softirq to run, which usually runs on same CPU...
> but if somehow it runs somewhere else, it could hit restart before the flag
> is set. This means removing the BUG_ON() is bad, because we could restart,
> and then set the NO_INTERRUPT flag from the handler, and stall.
>
> Note that interrupt suppression is advisory, so we can be sloppy on the
> setting side.
>
> To me this points to doing interrupt suppression a different way. If we
> have a ->disable_cb() virtio function, and call it before we call
> netif_rx_schedule, does that fix it?
>
> How's this?
> Rusty.
>
>
Looks good to me. The only thing is the naming.. Maybe one can find
better name than [dis|en]able_cb since
it is more like disable interrupts than disable_cb and enable_cb is more
like run_callbacks (and enable interrupts).
Actually while looking at it some more, there's might be a performance
optimization using your new patch:
Up to now, if the tx/rx paths were running out of descriptors they
called enable_cb.
enable_cb told the host it should trigger an irq the next time it has
data for the guest.
From now on, enable_cb will run the callbacks inside shortening latency.
btw, I tried to apply this patch on my source tree without luck, after
doing a manual merge, the
guest opssed on the BUG_ON. I think it's because my sources are not
aligned with yours.
Can you please post a mercurial/git repo? Please specify the relatively
repository in case you choose mercurial.
Thanks,
Dor
> ---
> virtio: explicit enable_cb/disable_cb rather than callback return.
>
> It seems that virtio_net wants to disable callbacks (interrupts) before
> calling netif_rx_schedule(), so we can't use the return value to do so.
>
> Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
> now returns void, rather than a boolean.
>
> Signed-off-by: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
>
> diff -r 0eabf082c13a drivers/block/virtio_blk.c
> --- a/drivers/block/virtio_blk.c Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/block/virtio_blk.c Tue Dec 18 15:49:18 2007 +1100
> @@ -36,7 +36,7 @@ struct virtblk_req
> struct virtio_blk_inhdr in_hdr;
> };
>
> -static bool blk_done(struct virtqueue *vq)
> +static void blk_done(struct virtqueue *vq)
> {
> struct virtio_blk *vblk = vq->vdev->priv;
> struct virtblk_req *vbr;
> @@ -65,7 +65,6 @@ static bool blk_done(struct virtqueue *v
> /* In case queue is stopped waiting for more buffers. */
> blk_start_queue(vblk->disk->queue);
> spin_unlock_irqrestore(&vblk->lock, flags);
> - return true;
> }
>
> static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
> diff -r 0eabf082c13a drivers/lguest/lguest_device.c
> --- a/drivers/lguest/lguest_device.c Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/lguest/lguest_device.c Tue Dec 18 15:49:18 2007 +1100
> @@ -193,7 +193,7 @@ static void lg_notify(struct virtqueue *
> * So we provide devices with a "find virtqueue and set it up" function. */
> static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
> unsigned index,
> - bool (*callback)(struct virtqueue *vq))
> + void (*callback)(struct virtqueue *vq))
> {
> struct lguest_device *ldev = to_lgdev(vdev);
> struct lguest_vq_info *lvq;
> diff -r 0eabf082c13a drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/net/virtio_net.c Tue Dec 18 15:49:18 2007 +1100
> @@ -59,13 +59,12 @@ static inline void vnet_hdr_to_sg(struct
> sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr));
> }
>
> -static bool skb_xmit_done(struct virtqueue *rvq)
> +static void skb_xmit_done(struct virtqueue *rvq)
> {
> struct virtnet_info *vi = rvq->vdev->priv;
>
> /* In case we were waiting for output buffers. */
> netif_wake_queue(vi->dev);
> - return true;
> }
>
> static void receive_skb(struct net_device *dev, struct sk_buff *skb,
> @@ -177,12 +176,12 @@ static void try_fill_recv(struct virtnet
> vi->rvq->vq_ops->kick(vi->rvq);
> }
>
> -static bool skb_recv_done(struct virtqueue *rvq)
> +static void skb_recv_done(struct virtqueue *rvq)
> {
> struct virtnet_info *vi = rvq->vdev->priv;
> + /* Suppress further interrupts. */
> + rvq->vq_ops->disable_cb(rvq);
> netif_rx_schedule(vi->dev, &vi->napi);
> - /* Suppress further interrupts. */
> - return false;
> }
>
> static int virtnet_poll(struct napi_struct *napi, int budget)
> @@ -208,7 +207,7 @@ again:
> /* Out of packets? */
> if (received < budget) {
> netif_rx_complete(vi->dev, napi);
> - if (unlikely(!vi->rvq->vq_ops->restart(vi->rvq))
> + if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
> && netif_rx_reschedule(vi->dev, napi))
> goto again;
> }
> diff -r 0eabf082c13a drivers/virtio/virtio_pci.c
> --- a/drivers/virtio/virtio_pci.c Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/virtio/virtio_pci.c Tue Dec 18 15:49:18 2007 +1100
> @@ -190,7 +190,7 @@ static irqreturn_t vp_interrupt(int irq,
>
> /* the config->find_vq() implementation */
> static struct virtqueue *vp_find_vq(struct virtio_device *vdev, unsigned index,
> - bool (*callback)(struct virtqueue *vq))
> + void (*callback)(struct virtqueue *vq))
> {
> struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> struct virtio_pci_vq_info *info;
> diff -r 0eabf082c13a drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/virtio/virtio_ring.c Tue Dec 18 15:49:18 2007 +1100
> @@ -220,7 +220,17 @@ static void *vring_get_buf(struct virtqu
> return ret;
> }
>
> -static bool vring_restart(struct virtqueue *_vq)
> +static void vring_disable_cb(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + START_USE(vq);
> + BUG_ON(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> + vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
> + END_USE(vq);
> +}
> +
> +static bool vring_enable_cb(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> @@ -254,8 +264,8 @@ irqreturn_t vring_interrupt(int irq, voi
> return IRQ_HANDLED;
>
> pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> - if (vq->vq.callback && !vq->vq.callback(&vq->vq))
> - vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
> + if (vq->vq.callback)
> + vq->vq.callback(&vq->vq);
>
> return IRQ_HANDLED;
> }
> @@ -266,7 +276,8 @@ static struct virtqueue_ops vring_vq_ops
> .add_buf = vring_add_buf,
> .get_buf = vring_get_buf,
> .kick = vring_kick,
> - .restart = vring_restart,
> + .disable_cb = vring_disable_cb,
> + .enable_cb = vring_enable_cb,
> .shutdown = vring_shutdown,
> };
>
> @@ -274,7 +285,7 @@ struct virtqueue *vring_new_virtqueue(un
> struct virtio_device *vdev,
> void *pages,
> void (*notify)(struct virtqueue *),
> - bool (*callback)(struct virtqueue *))
> + void (*callback)(struct virtqueue *))
> {
> struct vring_virtqueue *vq;
> unsigned int i;
> diff -r 0eabf082c13a include/linux/virtio.h
> --- a/include/linux/virtio.h Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio.h Tue Dec 18 15:49:18 2007 +1100
> @@ -11,15 +11,13 @@
> /**
> * virtqueue - a queue to register buffers for sending or receiving.
> * @callback: the function to call when buffers are consumed (can be NULL).
> - * If this returns false, callbacks are suppressed until vq_ops->restart
> - * is called.
> * @vdev: the virtio device this queue was created for.
> * @vq_ops: the operations for this virtqueue (see below).
> * @priv: a pointer for the virtqueue implementation to use.
> */
> struct virtqueue
> {
> - bool (*callback)(struct virtqueue *vq);
> + void (*callback)(struct virtqueue *vq);
> struct virtio_device *vdev;
> struct virtqueue_ops *vq_ops;
> void *priv;
> @@ -41,7 +39,9 @@ struct virtqueue
> * vq: the struct virtqueue we're talking about.
> * len: the length written into the buffer
> * Returns NULL or the "data" token handed to add_buf.
> - * @restart: restart callbacks after callback returned false.
> + * @disable_cb: disable callbacks
> + * vq: the struct virtqueue we're talking about.
> + * @enable_cb: restart callbacks after disable_cb.
> * vq: the struct virtqueue we're talking about.
> * This returns "false" (and doesn't re-enable) if there are pending
> * buffers in the queue, to avoid a race.
> @@ -65,7 +65,8 @@ struct virtqueue_ops {
>
> void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
>
> - bool (*restart)(struct virtqueue *vq);
> + void (*disable_cb)(struct virtqueue *vq);
> + bool (*enable_cb)(struct virtqueue *vq);
>
> void (*shutdown)(struct virtqueue *vq);
> };
> diff -r 0eabf082c13a include/linux/virtio_config.h
> --- a/include/linux/virtio_config.h Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio_config.h Tue Dec 18 15:49:18 2007 +1100
> @@ -61,7 +61,7 @@ struct virtio_config_ops
> void (*set_status)(struct virtio_device *vdev, u8 status);
> struct virtqueue *(*find_vq)(struct virtio_device *vdev,
> unsigned index,
> - bool (*callback)(struct virtqueue *));
> + void (*callback)(struct virtqueue *));
> void (*del_vq)(struct virtqueue *vq);
> };
>
> diff -r 0eabf082c13a include/linux/virtio_ring.h
> --- a/include/linux/virtio_ring.h Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio_ring.h Tue Dec 18 15:49:18 2007 +1100
> @@ -114,7 +114,7 @@ struct virtqueue *vring_new_virtqueue(un
> struct virtio_device *vdev,
> void *pages,
> void (*notify)(struct virtqueue *vq),
> - bool (*callback)(struct virtqueue *vq));
> + void (*callback)(struct virtqueue *vq));
> void vring_del_virtqueue(struct virtqueue *vq);
>
> irqreturn_t vring_interrupt(int irq, void *_vq);
>
>
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
next prev parent reply other threads:[~2007-12-23 23:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-14 12:12 virtio_net and SMP guests Christian Borntraeger
[not found] ` <200712141312.05562.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-12-16 11:55 ` Dor Laor
2007-12-18 6:51 ` Rusty Russell
[not found] ` <200712181751.24692.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2007-12-23 23:19 ` Dor Laor [this message]
[not found] ` <476EECF7.9000204-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 0:54 ` Rusty Russell
[not found] ` <200712241154.20885.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2007-12-25 12:22 ` Dor Laor
[not found] ` <4770F60D.4010904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-26 0:48 ` Rusty Russell
2008-01-10 12:37 ` Christian Borntraeger
[not found] ` <200801101337.40433.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2008-01-10 15:39 ` Christian Borntraeger
2008-01-10 15:51 ` Christian Borntraeger
2008-01-11 9:53 ` Rusty Russell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=476EECF7.9000204@qumranet.com \
--to=dor.laor-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
--cc=borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org \
--cc=dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org \
--cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org \
--cc=virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox