public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
Cc: kvm-devel
	<kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
	Christian Borntraeger
	<borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: virtio_net and SMP guests
Date: Mon, 24 Dec 2007 01:19:19 +0200	[thread overview]
Message-ID: <476EECF7.9000204@qumranet.com> (raw)
In-Reply-To: <200712181751.24692.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

Rusty Russell wrote:
> On Friday 14 December 2007 23:12:05 Christian Borntraeger wrote:
>   
>> Rusty, Anthony, Dor,
>>
>> I need your brain power :-)
>>
>> On smp guests I have seen a problem with virtio (the version in curent
>> Avi's git) which do not occur on single processor guests:
>>
>> kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228!
>> illegal operation: 0001 [#1]
>> Modules linked in: ipv6
>> CPU:    2    Not tainted
>> Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70)
>> Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70)
>>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
>> Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000
>> 0000000010005800 000000000045ded0 000000000000192f 000000000eb21000
>> 000000000eb21000 000000000000000e 000000000eb21900 000000000eb21920
>> 000000000f867cb8 0700000000d9b058 0000000000000010 000000000045c06a
>> 000000000f867cb8 Krnl Code: 000000000045df1e: e3b0b0700004       lg     
>> %r11,112(%r11) 000000000045df24: 07fe               bcr     15,%r14
>>            000000000045df26: a7f40001           brc     15,45df28
>>
>>           >000000000045df2a: a7f4ffe1           brc     15,45deec
>>
>>            000000000045df2e: e31020300004       lg      %r1,48(%r2)
>>            000000000045df34: a7480000           lhi     %r4,0
>>            000000000045df38: 96011001           oi      1(%r1),1
>>            000000000045df3c: a7f4ffef           brc     15,45df1a
>> Call Trace:
>> ([<000000000045c016>] virtnet_poll+0x96/0x42c)
>>  [<000000000048cda2>] net_rx_action+0xca/0x150
>>  [<0000000000137f7a>] __do_softirq+0x9e/0x130
>>  [<00000000001105d6>] do_softirq+0xae/0xb4
>>  [<0000000000138182>] irq_exit+0x96/0x9c
>>  [<000000000010d710>] do_extint+0xcc/0xf8
>>  [<00000000001135d0>] ext_no_vtime+0x16/0x1a
>>  [<000000000010a57e>] cpu_idle+0x216/0x238
>>
>>
>> I think there is a valid code path, triggering this bug:
>>
>> 	CPU1						CPU2
>> -----------------------				-----------------------
>> - virtnet_poll found no
>>   more packets on queue
>> - netif_rx_complete allow
>>   poll to be called
>> - vq_ops->restart is called
>> - vq Interrupts are enabled
>> 	.		     <new packets arrive>
>> <vcpu is scheduled away>
>> 	.					- interrupt is delivered
>> 	.					- poll is called
>> 	.					- poll work is done
>> 	.					- netif_rx_complete
>> 	.					- vq_ops->restart is called
>> 	.					- check if vq interrupts are
>> 	.					  enable --> BUG
>>     
>
> Shouldn't this be BUG()ing in START_USE()?  Or did you disable that because
> of previous false positives?
>
> virtnet_poll() should not be reentering, AFAICT.  virtio relies on the caller
> to ensure it never calls two virtio functions on the same queue
> simultaneously, and virtio_net in turn relies on the core net code to enforce
> this.
>
> So, how did this happen?
>
> Looks like we can have an interrupt on one CPU, which does:
>
> 	if (vq->vq.callback && !vq->vq.callback(&vq->vq))
> 		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
>
> The callback schedules the softirq to run, which usually runs on same CPU...
> but if somehow it runs somewhere else, it could hit restart before the flag
> is set.  This means removing the BUG_ON() is bad, because we could restart,
> and then set the NO_INTERRUPT flag from the handler, and stall.
>
> Note that interrupt suppression is advisory, so we can be sloppy on the
> setting side.
>
> To me this points to doing interrupt suppression a different way.  If we
> have a ->disable_cb() virtio function, and call it before we call
> netif_rx_schedule, does that fix it?
>
> How's this?
> Rusty.
>
>   
Looks good to me. The only thing is the naming.. Maybe one can find 
better name than [dis|en]able_cb since
it is more like disable interrupts than disable_cb and enable_cb is more 
like run_callbacks (and enable interrupts).

Actually while looking at it some more, there's might be a performance 
optimization using your new patch:
Up to now, if the tx/rx paths were running out of descriptors they 
called enable_cb.
enable_cb told the host it should trigger an irq the next time it has 
data for the guest.
 From now on, enable_cb will run the callbacks inside shortening latency.

btw, I tried to apply this patch on my source tree without luck,  after 
doing a manual merge, the
guest opssed on the BUG_ON. I think it's because my sources are not 
aligned with yours.
Can you please post a mercurial/git repo? Please specify the relatively 
repository in case you choose mercurial.
Thanks,
Dor
> ---
> virtio: explicit enable_cb/disable_cb rather than callback return.
>
> It seems that virtio_net wants to disable callbacks (interrupts) before
> calling netif_rx_schedule(), so we can't use the return value to do so.
>
> Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
> now returns void, rather than a boolean.
>
> Signed-off-by: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
>
> diff -r 0eabf082c13a drivers/block/virtio_blk.c
> --- a/drivers/block/virtio_blk.c	Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/block/virtio_blk.c	Tue Dec 18 15:49:18 2007 +1100
> @@ -36,7 +36,7 @@ struct virtblk_req
>  	struct virtio_blk_inhdr in_hdr;
>  };
>  
> -static bool blk_done(struct virtqueue *vq)
> +static void blk_done(struct virtqueue *vq)
>  {
>  	struct virtio_blk *vblk = vq->vdev->priv;
>  	struct virtblk_req *vbr;
> @@ -65,7 +65,6 @@ static bool blk_done(struct virtqueue *v
>  	/* In case queue is stopped waiting for more buffers. */
>  	blk_start_queue(vblk->disk->queue);
>  	spin_unlock_irqrestore(&vblk->lock, flags);
> -	return true;
>  }
>  
>  static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
> diff -r 0eabf082c13a drivers/lguest/lguest_device.c
> --- a/drivers/lguest/lguest_device.c	Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/lguest/lguest_device.c	Tue Dec 18 15:49:18 2007 +1100
> @@ -193,7 +193,7 @@ static void lg_notify(struct virtqueue *
>   * So we provide devices with a "find virtqueue and set it up" function. */
>  static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
>  				    unsigned index,
> -				    bool (*callback)(struct virtqueue *vq))
> +				    void (*callback)(struct virtqueue *vq))
>  {
>  	struct lguest_device *ldev = to_lgdev(vdev);
>  	struct lguest_vq_info *lvq;
> diff -r 0eabf082c13a drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c	Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/net/virtio_net.c	Tue Dec 18 15:49:18 2007 +1100
> @@ -59,13 +59,12 @@ static inline void vnet_hdr_to_sg(struct
>  	sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr));
>  }
>  
> -static bool skb_xmit_done(struct virtqueue *rvq)
> +static void skb_xmit_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
>  
>  	/* In case we were waiting for output buffers. */
>  	netif_wake_queue(vi->dev);
> -	return true;
>  }
>  
>  static void receive_skb(struct net_device *dev, struct sk_buff *skb,
> @@ -177,12 +176,12 @@ static void try_fill_recv(struct virtnet
>  	vi->rvq->vq_ops->kick(vi->rvq);
>  }
>  
> -static bool skb_recv_done(struct virtqueue *rvq)
> +static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	/* Suppress further interrupts. */
> +	rvq->vq_ops->disable_cb(rvq);
>  	netif_rx_schedule(vi->dev, &vi->napi);
> -	/* Suppress further interrupts. */
> -	return false;
>  }
>  
>  static int virtnet_poll(struct napi_struct *napi, int budget)
> @@ -208,7 +207,7 @@ again:
>  	/* Out of packets? */
>  	if (received < budget) {
>  		netif_rx_complete(vi->dev, napi);
> -		if (unlikely(!vi->rvq->vq_ops->restart(vi->rvq))
> +		if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq))
>  		    && netif_rx_reschedule(vi->dev, napi))
>  			goto again;
>  	}
> diff -r 0eabf082c13a drivers/virtio/virtio_pci.c
> --- a/drivers/virtio/virtio_pci.c	Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/virtio/virtio_pci.c	Tue Dec 18 15:49:18 2007 +1100
> @@ -190,7 +190,7 @@ static irqreturn_t vp_interrupt(int irq,
>  
>  /* the config->find_vq() implementation */
>  static struct virtqueue *vp_find_vq(struct virtio_device *vdev, unsigned index,
> -				    bool (*callback)(struct virtqueue *vq))
> +				    void (*callback)(struct virtqueue *vq))
>  {
>  	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>  	struct virtio_pci_vq_info *info;
> diff -r 0eabf082c13a drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c	Tue Dec 18 13:51:13 2007 +1100
> +++ b/drivers/virtio/virtio_ring.c	Tue Dec 18 15:49:18 2007 +1100
> @@ -220,7 +220,17 @@ static void *vring_get_buf(struct virtqu
>  	return ret;
>  }
>  
> -static bool vring_restart(struct virtqueue *_vq)
> +static void vring_disable_cb(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +	BUG_ON(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> +	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
> +	END_USE(vq);
> +}
> +
> +static bool vring_enable_cb(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> @@ -254,8 +264,8 @@ irqreturn_t vring_interrupt(int irq, voi
>  		return IRQ_HANDLED;
>  
>  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> -	if (vq->vq.callback && !vq->vq.callback(&vq->vq))
> -		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
> +	if (vq->vq.callback)
> +		vq->vq.callback(&vq->vq);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -266,7 +276,8 @@ static struct virtqueue_ops vring_vq_ops
>  	.add_buf = vring_add_buf,
>  	.get_buf = vring_get_buf,
>  	.kick = vring_kick,
> -	.restart = vring_restart,
> +	.disable_cb = vring_disable_cb,
> +	.enable_cb = vring_enable_cb,
>  	.shutdown = vring_shutdown,
>  };
>  
> @@ -274,7 +285,7 @@ struct virtqueue *vring_new_virtqueue(un
>  				      struct virtio_device *vdev,
>  				      void *pages,
>  				      void (*notify)(struct virtqueue *),
> -				      bool (*callback)(struct virtqueue *))
> +				      void (*callback)(struct virtqueue *))
>  {
>  	struct vring_virtqueue *vq;
>  	unsigned int i;
> diff -r 0eabf082c13a include/linux/virtio.h
> --- a/include/linux/virtio.h	Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio.h	Tue Dec 18 15:49:18 2007 +1100
> @@ -11,15 +11,13 @@
>  /**
>   * virtqueue - a queue to register buffers for sending or receiving.
>   * @callback: the function to call when buffers are consumed (can be NULL).
> - *    If this returns false, callbacks are suppressed until vq_ops->restart
> - *    is called.
>   * @vdev: the virtio device this queue was created for.
>   * @vq_ops: the operations for this virtqueue (see below).
>   * @priv: a pointer for the virtqueue implementation to use.
>   */
>  struct virtqueue
>  {
> -	bool (*callback)(struct virtqueue *vq);
> +	void (*callback)(struct virtqueue *vq);
>  	struct virtio_device *vdev;
>  	struct virtqueue_ops *vq_ops;
>  	void *priv;
> @@ -41,7 +39,9 @@ struct virtqueue
>   *	vq: the struct virtqueue we're talking about.
>   *	len: the length written into the buffer
>   *	Returns NULL or the "data" token handed to add_buf.
> - * @restart: restart callbacks after callback returned false.
> + * @disable_cb: disable callbacks
> + *	vq: the struct virtqueue we're talking about.
> + * @enable_cb: restart callbacks after disable_cb.
>   *	vq: the struct virtqueue we're talking about.
>   *	This returns "false" (and doesn't re-enable) if there are pending
>   *	buffers in the queue, to avoid a race.
> @@ -65,7 +65,8 @@ struct virtqueue_ops {
>  
>  	void *(*get_buf)(struct virtqueue *vq, unsigned int *len);
>  
> -	bool (*restart)(struct virtqueue *vq);
> +	void (*disable_cb)(struct virtqueue *vq);
> +	bool (*enable_cb)(struct virtqueue *vq);
>  
>  	void (*shutdown)(struct virtqueue *vq);
>  };
> diff -r 0eabf082c13a include/linux/virtio_config.h
> --- a/include/linux/virtio_config.h	Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio_config.h	Tue Dec 18 15:49:18 2007 +1100
> @@ -61,7 +61,7 @@ struct virtio_config_ops
>  	void (*set_status)(struct virtio_device *vdev, u8 status);
>  	struct virtqueue *(*find_vq)(struct virtio_device *vdev,
>  				     unsigned index,
> -				     bool (*callback)(struct virtqueue *));
> +				     void (*callback)(struct virtqueue *));
>  	void (*del_vq)(struct virtqueue *vq);
>  };
>  
> diff -r 0eabf082c13a include/linux/virtio_ring.h
> --- a/include/linux/virtio_ring.h	Tue Dec 18 13:51:13 2007 +1100
> +++ b/include/linux/virtio_ring.h	Tue Dec 18 15:49:18 2007 +1100
> @@ -114,7 +114,7 @@ struct virtqueue *vring_new_virtqueue(un
>  				      struct virtio_device *vdev,
>  				      void *pages,
>  				      void (*notify)(struct virtqueue *vq),
> -				      bool (*callback)(struct virtqueue *vq));
> +				      void (*callback)(struct virtqueue *vq));
>  void vring_del_virtqueue(struct virtqueue *vq);
>  
>  irqreturn_t vring_interrupt(int irq, void *_vq);
>
>   


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

  parent reply	other threads:[~2007-12-23 23:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-14 12:12 virtio_net and SMP guests Christian Borntraeger
     [not found] ` <200712141312.05562.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-12-16 11:55   ` Dor Laor
2007-12-18  6:51   ` Rusty Russell
     [not found]     ` <200712181751.24692.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2007-12-23 23:19       ` Dor Laor [this message]
     [not found]         ` <476EECF7.9000204-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24  0:54           ` Rusty Russell
     [not found]             ` <200712241154.20885.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2007-12-25 12:22               ` Dor Laor
     [not found]                 ` <4770F60D.4010904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-26  0:48                   ` Rusty Russell
2008-01-10 12:37       ` Christian Borntraeger
     [not found]         ` <200801101337.40433.borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2008-01-10 15:39           ` Christian Borntraeger
2008-01-10 15:51             ` Christian Borntraeger
2008-01-11  9:53               ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=476EECF7.9000204@qumranet.com \
    --to=dor.laor-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
    --cc=borntraeger-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org \
    --cc=dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org \
    --cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
    --cc=rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org \
    --cc=virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox