netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] virtio_net: Fix open <-> interrupt race
@ 2008-02-05 13:36 Christian Borntraeger
  2008-02-06  4:00 ` Rusty Russell
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Borntraeger @ 2008-02-05 13:36 UTC (permalink / raw)
  To: netdev; +Cc: Rusty Russell, dor.laor, Anthony Liguori, virtualization

I got the following oops during interface ifup. Unfortunately its not 
easily reproducable so I cant say for sure that my fix fixes this
problem, but I am confident and I think its correct anyway:

   <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
    <4>illegal operation: 0001 [#1] PREEMPT SMP
    <4>Modules linked in:
    <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91
    <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
    <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
    <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
    <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
    <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
    <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
    <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
    <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
    <4>           000000000046636e: 07fe                bcr     15,%r14
    <4>           0000000000466370: a7f40001            brc     15,466372
    <4>          >0000000000466374: a7f4fff6            brc     15,466360
    <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
    <4>           000000000046637e: a7f13e00            tmll    %r15,15872
    <4>           0000000000466382: b90400ef            lgr     %r14,%r15
    <4>           0000000000466386: a7840001            brc     8,466388
    <4>Call Trace:
    <4>([<000201500f85c000>] 0x201500f85c000)
    <4> [<0000000000466556>] vring_interrupt+0x72/0x88
    <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
    <4> [<000000000010d22c>] do_extint+0xbc/0xf8
    <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
    <4> [<000000000010a182>] cpu_idle+0x216/0x238
    <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
    <4> [<0000000000568656>] rest_init+0xaa/0xb8
    <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
    <4> [<0000000000100020>] _stext+0x20/0x80
    <4>
    <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
    <4>

After looking at the code and the dump I think the following scenario
happened: Ifup was running on cpu2 and the interrupt arrived on cpu0.
Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
called by vring_interrupt, executed netif_rx_schedule_prep, which
succeeded and therefore called disable_cb. This triggered the BUG_ON,
as interrupts were already disabled by cpu 2.

I think the proper solution is to make the call to disable_cb depend on
the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
in the same way as skb_recv_done. 


Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 drivers/net/virtio_net.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: kvm/drivers/net/virtio_net.c
===================================================================
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -321,10 +321,12 @@ static int virtnet_open(struct net_devic
 
 	/* If all buffers were filled by other side before we napi_enabled, we
 	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here. */
-	vi->rvq->vq_ops->disable_cb(vi->rvq);
-	netif_rx_schedule(vi->dev, &vi->napi);
-
+	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
+	 * We synchronize against interrupts via NAPI_STATE_SCHED */
+	if (netif_rx_schedule_prep(dev, &vi->napi)) {
+		vi->rvq->vq_ops->disable_cb(vi->rvq);
+		__netif_rx_schedule(dev, &vi->napi);
+	}
 	return 0;
 }
-- 
IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] virtio_net: Fix open <-> interrupt race
  2008-02-05 13:36 [PATCH] virtio_net: Fix open <-> interrupt race Christian Borntraeger
@ 2008-02-06  4:00 ` Rusty Russell
  2008-02-06  7:50   ` Christian Borntraeger
  0 siblings, 1 reply; 4+ messages in thread
From: Rusty Russell @ 2008-02-06  4:00 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: netdev, dor.laor, Anthony Liguori, virtualization

On Wednesday 06 February 2008 00:36:10 Christian Borntraeger wrote:
> I got the following oops during interface ifup. Unfortunately its not
> easily reproducable so I cant say for sure that my fix fixes this
> problem, but I am confident and I think its correct anyway:

Looks fine, and certainly shouldn't hurt.  Please push this out.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] virtio_net: Fix open <-> interrupt race
  2008-02-06  4:00 ` Rusty Russell
@ 2008-02-06  7:50   ` Christian Borntraeger
  2008-02-06 11:44     ` Jeff Garzik
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Borntraeger @ 2008-02-06  7:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Rusty Russell, netdev, virtualization

Jeff,

Rusty is (supposed to be) on vacation. Can you send this fix against
virtio_net with your next network driver fixes for 2.6.25? 

Thank you

-

I got the following oops during interface ifup. Unfortunately its not 
easily reproducable so I cant say for sure that my fix fixes this
problem, but I am confident and I think its correct anyway:

   <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
    <4>illegal operation: 0001 [#1] PREEMPT SMP
    <4>Modules linked in:
    <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91
    <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
    <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
    <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
    <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
    <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
    <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
    <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
    <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
    <4>           000000000046636e: 07fe                bcr     15,%r14
    <4>           0000000000466370: a7f40001            brc     15,466372
    <4>          >0000000000466374: a7f4fff6            brc     15,466360
    <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
    <4>           000000000046637e: a7f13e00            tmll    %r15,15872
    <4>           0000000000466382: b90400ef            lgr     %r14,%r15
    <4>           0000000000466386: a7840001            brc     8,466388
    <4>Call Trace:
    <4>([<000201500f85c000>] 0x201500f85c000)
    <4> [<0000000000466556>] vring_interrupt+0x72/0x88
    <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
    <4> [<000000000010d22c>] do_extint+0xbc/0xf8
    <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
    <4> [<000000000010a182>] cpu_idle+0x216/0x238
    <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
    <4> [<0000000000568656>] rest_init+0xaa/0xb8
    <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
    <4> [<0000000000100020>] _stext+0x20/0x80
    <4>
    <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
    <4>

After looking at the code and the dump I think the following scenario
happened: Ifup was running on cpu2 and the interrupt arrived on cpu0.
Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
called by vring_interrupt, executed netif_rx_schedule_prep, which
succeeded and therefore called disable_cb. This triggered the BUG_ON,
as interrupts were already disabled by cpu 2.

I think the proper solution is to make the call to disable_cb depend on
the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
in the same way as skb_recv_done. 


Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>

---
 drivers/net/virtio_net.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: kvm/drivers/net/virtio_net.c
===================================================================
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -321,10 +321,12 @@ static int virtnet_open(struct net_devic
 
 	/* If all buffers were filled by other side before we napi_enabled, we
 	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here. */
-	vi->rvq->vq_ops->disable_cb(vi->rvq);
-	netif_rx_schedule(vi->dev, &vi->napi);
-
+	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
+	 * We synchronize against interrupts via NAPI_STATE_SCHED */
+	if (netif_rx_schedule_prep(dev, &vi->napi)) {
+		vi->rvq->vq_ops->disable_cb(vi->rvq);
+		__netif_rx_schedule(dev, &vi->napi);
+	}
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] virtio_net: Fix open <-> interrupt race
  2008-02-06  7:50   ` Christian Borntraeger
@ 2008-02-06 11:44     ` Jeff Garzik
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Garzik @ 2008-02-06 11:44 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: Rusty Russell, netdev, virtualization

Christian Borntraeger wrote:
> Jeff,
> 
> Rusty is (supposed to be) on vacation. Can you send this fix against
> virtio_net with your next network driver fixes for 2.6.25? 
> 
> Thank you
> 
> -
> 
> I got the following oops during interface ifup. Unfortunately its not 
> easily reproducable so I cant say for sure that my fix fixes this
> problem, but I am confident and I think its correct anyway:
> 
>    <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
>     <4>illegal operation: 0001 [#1] PREEMPT SMP
>     <4>Modules linked in:
>     <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91
>     <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
>     <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
>     <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
>     <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
>     <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
>     <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
>     <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
>     <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
>     <4>           000000000046636e: 07fe                bcr     15,%r14
>     <4>           0000000000466370: a7f40001            brc     15,466372
>     <4>          >0000000000466374: a7f4fff6            brc     15,466360
>     <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
>     <4>           000000000046637e: a7f13e00            tmll    %r15,15872
>     <4>           0000000000466382: b90400ef            lgr     %r14,%r15
>     <4>           0000000000466386: a7840001            brc     8,466388
>     <4>Call Trace:
>     <4>([<000201500f85c000>] 0x201500f85c000)
>     <4> [<0000000000466556>] vring_interrupt+0x72/0x88
>     <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
>     <4> [<000000000010d22c>] do_extint+0xbc/0xf8
>     <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
>     <4> [<000000000010a182>] cpu_idle+0x216/0x238
>     <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
>     <4> [<0000000000568656>] rest_init+0xaa/0xb8
>     <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
>     <4> [<0000000000100020>] _stext+0x20/0x80
>     <4>
>     <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
>     <4>
> 
> After looking at the code and the dump I think the following scenario
> happened: Ifup was running on cpu2 and the interrupt arrived on cpu0.
> Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
> but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
> called by vring_interrupt, executed netif_rx_schedule_prep, which
> succeeded and therefore called disable_cb. This triggered the BUG_ON,
> as interrupts were already disabled by cpu 2.
> 
> I think the proper solution is to make the call to disable_cb depend on
> the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
> in the same way as skb_recv_done. 
> 
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> ---
>  drivers/net/virtio_net.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: kvm/drivers/net/virtio_net.c
> ===================================================================
> --- kvm.orig/drivers/net/virtio_net.c
> +++ kvm/drivers/net/virtio_net.c
> @@ -321,10 +321,12 @@ static int virtnet_open(struct net_devic
>  
>  	/* If all buffers were filled by other side before we napi_enabled, we
>  	 * won't get another interrupt, so process any outstanding packets
> -	 * now.  virtnet_poll wants re-enable the queue, so we disable here. */
> -	vi->rvq->vq_ops->disable_cb(vi->rvq);
> -	netif_rx_schedule(vi->dev, &vi->napi);
> -
> +	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
> +	 * We synchronize against interrupts via NAPI_STATE_SCHED */
> +	if (netif_rx_schedule_prep(dev, &vi->napi)) {
> +		vi->rvq->vq_ops->disable_cb(vi->rvq);
> +		__netif_rx_schedule(dev, &vi->napi);
> +	}
>  	return 0;
>  }

applied



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-02-06 11:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-05 13:36 [PATCH] virtio_net: Fix open <-> interrupt race Christian Borntraeger
2008-02-06  4:00 ` Rusty Russell
2008-02-06  7:50   ` Christian Borntraeger
2008-02-06 11:44     ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).