xen-devel.lists.xenproject.org archive mirror
* Pls help: netfront tx ring frozen (any clues appreciated)
From: Vijay Chander @ 2012-02-23 16:29 UTC (permalink / raw)
  To: xen-devel


Hi,

    We are running into a situation where netback stops updating the
rsp_prod index in the shared ring for the netfront tx ring.

    We see that rsp_cons has the same value as rsp_prod, with req_prod 236
slots ahead (the tx ring is full).
From looking at the netfront driver code, it looks as if xennet_tx_buf_gc
processing only happens if rsp_prod is ahead of rsp_cons.

   Our understanding is that netfront sets rsp_cons to tell netback to
start processing transmits from the rsp_cons index onwards, up to req_prod.
Once netback is done processing X requests, it increments rsp_prod by X.
This causes netfront to look at the status of each individual response
for the slots from rsp_cons up to rsp_prod (with rsp_prod - rsp_cons = X
in this case).
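
   To make the state above concrete, here is a minimal stand-alone sketch
of the index arithmetic. It is only a model under stated assumptions:
struct tx_ring_model and the three helper functions are hypothetical names
made up for illustration, not the definitions in the Xen ring headers or in
netfront, and the counters are assumed to be free-running 32-bit values
that are allowed to wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the three counters discussed in this thread.
 * The field names mirror the thread; this is NOT the real shared-ring
 * layout, only an illustration of the arithmetic.
 */
struct tx_ring_model {
    uint32_t req_prod;  /* requests published by the frontend so far */
    uint32_t rsp_prod;  /* responses published by the backend so far */
    uint32_t rsp_cons;  /* responses already consumed by the frontend */
};

/*
 * Responses the frontend could reclaim right now.
 * Unsigned subtraction stays correct when the counters wrap.
 */
static uint32_t tx_unconsumed_responses(const struct tx_ring_model *r)
{
    return r->rsp_prod - r->rsp_cons;
}

/* Requests the backend has not answered yet. */
static uint32_t tx_outstanding_requests(const struct tx_ring_model *r)
{
    return r->req_prod - r->rsp_prod;
}

/*
 * The state reported above: nothing to reclaim (rsp_cons == rsp_prod)
 * while requests are still waiting for a response.
 */
static bool tx_ring_looks_stalled(const struct tx_ring_model *r)
{
    return tx_unconsumed_responses(r) == 0 &&
           tx_outstanding_requests(r) != 0;
}
```

   With rsp_cons == rsp_prod and req_prod 236 slots ahead, the model
reports 0 unconsumed responses and 236 outstanding requests. A single
snapshot like this is also what a busy but healthy ring can look like for
a moment, so the check only means something if the state persists.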

   Is there any way to work around this? Will calling
xennet_disconnect_backend() followed by xennet_connect() on the netfront
let us recover from this stuck situation? We are OK with pending TX
packets getting dropped, since we have TCP running on top.

   Thanks,
-vijay

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: Pls help: netfront tx ring frozen (any clues appreciated)
From: Vijay Chander @ 2012-02-25 15:46 UTC (permalink / raw)
  To: xen-devel


If anybody has encountered a situation similar to the one described in my
previous message, where the netfront TX ring is stuck, could you please
provide some pointers on how to get around the problem?

This typically happens after about 2 days of overnight traffic tests.

Thanks,
-vijay


* Re: Pls help: netfront tx ring frozen (any clues appreciated)
From: Konrad Rzeszutek Wilk @ 2012-04-06 20:31 UTC (permalink / raw)
  To: Vijay Chander; +Cc: xen-devel

On Sat, Feb 25, 2012 at 07:46:36AM -0800, Vijay Chander wrote:
> If anybody encountered a similar situation as below where the netfront TX
> ring is stuck ,
> can you pls provide some pointers on how to get around this problem ?
> 
> This typically happens after about 2days of overnight traffic tests.

What kind of traffic? As in netperf for 48 hours? Is this guest-to-guest
traffic, or traffic from an outside host to the guest?

* Re: Pls help: netfront tx ring frozen (any clues appreciated)
From: Steve Prochniak @ 2012-04-09 19:09 UTC (permalink / raw)
  To: Konrad Wilk; +Cc: xen-devel

I recall running into this problem while developing a network PV driver, though I don't recall whether it was the TX or the RX ring that would stall (maybe it was both or either).  During longevity testing, after days of nonstop traffic, something would go wrong and the interrupt would fail to clear.  This seemed to be an "after so many interrupts" bug, since halving the traffic would double the time necessary to reproduce it.  At the time, we figured that we never saw this with the disk because it would have taken weeks to reproduce.

Mainly because of the length of time required to reproduce this, we never found out whether the problem was on the Dom0 or DomU side.  I worked around the problem by adding code that would detect that the condition was occurring, and then would trigger a reset of the event channel or interrupt.
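
For anyone wanting to try the same kind of workaround, the detection half might look roughly like the sketch below. It is not the code we used: struct stall_watchdog, stall_watchdog_check() and the threshold are hypothetical, the caller is assumed to sample the ring counters periodically (for example from a timer), and the actual recovery step (resetting the event channel or interrupt) is left entirely to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; none of these names exist in Xen or Linux. */
struct stall_watchdog {
    uint32_t last_rsp_prod;  /* rsp_prod seen on the previous check */
    unsigned idle_checks;    /* consecutive checks without backend progress */
    unsigned threshold;      /* idle checks tolerated before reporting a stall */
};

/*
 * Call periodically with the current counters.
 * Returns true when requests have been pending with no backend progress
 * for `threshold` consecutive checks, i.e. when the caller should run
 * whatever recovery it has chosen.
 */
static bool stall_watchdog_check(struct stall_watchdog *w,
                                 uint32_t req_prod, uint32_t rsp_prod)
{
    bool pending = (uint32_t)(req_prod - rsp_prod) != 0; /* wrap-safe */
    bool progressed = rsp_prod != w->last_rsp_prod;

    w->last_rsp_prod = rsp_prod;

    if (!pending || progressed) {
        /* Idle ring or a live backend: start counting from scratch. */
        w->idle_checks = 0;
        return false;
    }

    if (++w->idle_checks < w->threshold)
        return false;

    /* Report once, then re-arm so recovery is not triggered on every tick. */
    w->idle_checks = 0;
    return true;
}
```

Requiring several consecutive idle checks is the important design choice here: a single sample with pending requests and no new responses is normal under load, so the threshold and check interval have to be tuned to how long the backend can legitimately take.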

Steve


* Re: Pls help: netfront tx ring frozen (any clues appreciated)
From: Steve Prochniak @ 2012-04-09 19:21 UTC (permalink / raw)
  To: xen-devel

After digging up the code: when we observed this issue, it was specific to the RX ring, and it took about 4 days of nonstop traffic to reproduce.  So perhaps the two issues are not related.

