* Xennet half die---netfront TX queue was stopped.
@ 2012-10-31 3:47 Yi, Shunli
2012-10-31 9:12 ` Ian Campbell
0 siblings, 1 reply; 5+ messages in thread
From: Yi, Shunli @ 2012-10-31 3:47 UTC (permalink / raw)
To: xen-devel@lists.xensource.com
Hi,
I encountered a strange issue: in some rare cases the xennet interface in a DomU stops sending anything at all.
I got the chance to collect more information during one occurrence; here are some findings.
The netfront driver found that no TX slot was available any more, so it stopped the TX queue.
static inline int netfront_tx_slot_available(struct netfront_info *np)
{
	return ((np->tx.req_prod_pvt - np->tx.rsp_cons) <
		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2));
}
Here is some runtime debugging information after that issue occurred:
[3833225.489956] tx.req_prod_pvt=0x210daaa tx.rsp_cons=0x210d9be
[3833225.489958] TX_MAX_TARGET = 0x100, MAX_SKB_FRAGS = 0x12 dev->state=0x7
[3833225.489961] np->tx.sring->rsp_prod = 0x210d9be np->tx.sring->req_prod=0x210daaa
[3833225.489964] np->tx.sring->req_event=0x210d9bf np->tx.sring->rsp_event=0x210da35
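Plugging the logged values into the slot check may make the stall easier to see. The following is only a minimal sketch: it models the ring indices as uint32_t and uses hypothetical helper names (tx_in_flight, tx_slot_limit, tx_slot_available) that are not part of the driver; only the comparison itself and the constants come from the code and log above.

```c
#include <stdint.h>

/* Constants as printed in the debug output above. */
#define TX_MAX_TARGET 0x100u
#define MAX_SKB_FRAGS 0x12u

/* Requests produced by the frontend that it has not yet seen responses for.
 * Indices are modelled as uint32_t (an assumption of this sketch). */
static uint32_t tx_in_flight(uint32_t req_prod_pvt, uint32_t rsp_cons)
{
    return req_prod_pvt - rsp_cons;
}

/* Right-hand side of the comparison in netfront_tx_slot_available(). */
static uint32_t tx_slot_limit(void)
{
    return TX_MAX_TARGET - MAX_SKB_FRAGS - 2;
}

/* Same comparison as the driver helper, on plain integers. */
static int tx_slot_available(uint32_t req_prod_pvt, uint32_t rsp_cons)
{
    return tx_in_flight(req_prod_pvt, rsp_cons) < tx_slot_limit();
}
```

With tx.req_prod_pvt=0x210daaa and tx.rsp_cons=0x210d9be there are 0xec (236) requests in flight, and the limit is 0x100 - 0x12 - 2 = 0xec (236) as well. Since 236 < 236 is false, the check reports no available slot, which is consistent with the queue being stopped.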
The "dev->state" of xennet interface in DomU:
[3833225.489968] __LINK_STATE_XOFF: yes
[3833225.489970] __LINK_STATE_START: yes
[3833225.489971] __LINK_STATE_PRESENT: yes
[3833225.489973] __LINK_STATE_SCHED: no
[3833225.489975] __LINK_STATE_NOCARRIER: no
[3833225.489976] __LINK_STATE_RX_SCHED: no
[3833225.489978] __LINK_STATE_LINKWATCH_PENDING: no
[3833225.489979] __LINK_STATE_DORMANT: no
[3833225.489981] __LINK_STATE_QDISC_RUNNING: no
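The yes/no list above can be cross-checked against the raw dev->state=0x7 value. This is a rough sketch that assumes the bit positions follow the order in which the flags are listed above (XOFF is bit 0, START is bit 1, and so on); that ordering should be verified against netdevice.h of the kernel in use. The helper names link_state_test and print_link_state are hypothetical.

```c
#include <stdio.h>

/* Flag names in assumed bit order (bit 0 first); verify against the
 * netdev state enum in the kernel actually running. */
static const char *const link_state_names[] = {
    "__LINK_STATE_XOFF",
    "__LINK_STATE_START",
    "__LINK_STATE_PRESENT",
    "__LINK_STATE_SCHED",
    "__LINK_STATE_NOCARRIER",
    "__LINK_STATE_RX_SCHED",
    "__LINK_STATE_LINKWATCH_PENDING",
    "__LINK_STATE_DORMANT",
    "__LINK_STATE_QDISC_RUNNING",
};

#define N_LINK_STATES \
    (sizeof(link_state_names) / sizeof(link_state_names[0]))

/* Returns 1 if the flag at position 'bit' is set in 'state', else 0. */
static int link_state_test(unsigned long state, unsigned int bit)
{
    return (int)((state >> bit) & 1UL);
}

/* Prints one "name: yes/no" line per flag, like the debug output above. */
static void print_link_state(unsigned long state)
{
    unsigned int bit;

    for (bit = 0; bit < N_LINK_STATES; bit++)
        printf("%s: %s\n", link_state_names[bit],
               link_state_test(state, bit) ? "yes" : "no");
}
```

Under that assumption, 0x7 sets exactly bits 0 to 2, i.e. XOFF, START and PRESENT, which agrees with the list: the device is up and present, but its TX queue is stopped.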
Because tx.rsp_cons == np->tx.sring->rsp_prod == 0x210d9be, network_tx_buf_gc() does nothing.
The problem is that the TX queue is never enabled again.
Could anybody help me understand this? Any input is appreciated.
The platform information:
Xen: 3.4.2 x86_64 8GB MEM + 8 CPU cores.
Dom0: 2.6.28.8(xenified kernel) x86 1 GB MEM + 1 CPU cores
DomU: 2.6.28.8(xenified kernel) x86 3 GB MEM + 2 CPU cores
<there are other two DomUs running without any problems>
Thanks,
-Shunli
Protected by Websense Hosted Email Security -- www.websense.com
* Re: Xennet half die---netfront TX queue was stopped.
2012-10-31 3:47 Xennet half die---netfront TX queue was stopped Yi, Shunli
@ 2012-10-31 9:12 ` Ian Campbell
2012-10-31 10:16 ` Yi, Shunli
0 siblings, 1 reply; 5+ messages in thread
From: Ian Campbell @ 2012-10-31 9:12 UTC (permalink / raw)
To: Yi, Shunli; +Cc: xen-devel@lists.xensource.com
On Wed, 2012-10-31 at 03:47 +0000, Yi, Shunli wrote:
>
> Dom0: 2.6.28.8(xenified kernel) x86 1 GB MEM + 1 CPU
> cores
>
> DomU: 2.6.28.8(xenified kernel) x86 3 GB MEM + 2 CPU
> cores
Those are both ancient and AFAIK not supported by any distro. I strongly
recommend you update to something more recent -- either a kernel
supported by your distro or an up to date mainline version.
I see a commit in mainline "xen-netfront: correct MAX_TX_TARGET
calculation" which might be related.
Ian.
* Re: Xennet half die---netfront TX queue was stopped.
2012-10-31 9:12 ` Ian Campbell
@ 2012-10-31 10:16 ` Yi, Shunli
2012-10-31 10:20 ` Ian Campbell
0 siblings, 1 reply; 5+ messages in thread
From: Yi, Shunli @ 2012-10-31 10:16 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com
Ian,
Thanks for the information.
Sorry, I wrote the wrong kernel version by mistake. We are using 2.6.18.8, which we downloaded from Xen.org together with the Xen-3.4.2 release.
I've seen that patch, but I don't think it affects this issue. (http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/ddb83aec5afc )
As you can see from "TX_MAX_TARGET = 0x100" in the log below, it is already 256.
Actually, we suspect an overflow issue; I'm reading the netfront and netback code to see whether an overflow is possible.
From one occurrence I collected details from both netfront and netback. I hope they help us find something:
On backend interface:
[3357955.991282] rx.rsp_prod_pvt=0xc11734a0 rx.req_cons=0xf4d0eb80
[3357955.991283] tx.rsp_prod_pvt=0xc0af64 tx.req_cons=0xc0af64
On frontend interface:
[3363909.017567] tx.req_prod_pvt=0x12ef8f3 tx.rsp_cons=0x12ef807
[3363909.017569] TX_MAX_TARGET = 0x100, MAX_SKB_FRAGS = 0x12 dev->state=0x7
[3363909.017572] np->tx.sring->rsp_prod = 0x12ef807 np->tx.sring->req_prod=0x12ef8f3
[3363909.017575] np->tx.sring->req_event=0x12ef800 np->tx.sring->rsp_event=0x12ef87e
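On the overflow question, the frontend numbers can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes ring indices are 32-bit unsigned, and it assumes that the TX garbage-collection path re-arms the response event as rsp_prod + ((req_prod - rsp_prod) >> 1) + 1, which is recalled from the netfront source rather than taken from this thread. The helper names ring_distance and expected_rsp_event are hypothetical.

```c
#include <stdint.h>

/* Distance between two free-running ring indices. Unsigned subtraction is
 * modulo 2^32, so the result stays correct when the indices wrap. */
static uint32_t ring_distance(uint32_t prod, uint32_t cons)
{
    return prod - cons;
}

/* Event threshold the frontend would publish after a GC pass, ASSUMING the
 * formula rsp_prod + ((req_prod - rsp_prod) >> 1) + 1 (not confirmed here). */
static uint32_t expected_rsp_event(uint32_t rsp_prod, uint32_t req_prod)
{
    return rsp_prod + (ring_distance(req_prod, rsp_prod) >> 1) + 1;
}
```

For this log, ring_distance(0x12ef8f3, 0x12ef807) is 0xec (236), the same value as in the first report, and a wrapped pair such as ring_distance(0x00000010, 0xfffffff0) still gives 0x20, so index wraparound alone should not break the slot check. Under the assumed formula, expected_rsp_event(0x12ef807, 0x12ef8f3) gives 0x12ef87e, which matches the logged rsp_event. If that assumption holds, the frontend state looks self-consistent and it is simply waiting for the backend to complete about half of the outstanding requests.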
Some of our rack servers in production have hit this, but only rarely, and so far we cannot reproduce it in the lab.
I set up two rack servers in the lab to reproduce it; both have run for about 20 days without hitting the issue.
Could somebody share their experience with this? Any input would be appreciated.
Many thanks.
-Shunli
* Re: Xennet half die---netfront TX queue was stopped.
2012-10-31 10:16 ` Yi, Shunli
@ 2012-10-31 10:20 ` Ian Campbell
2012-10-31 12:35 ` Re: " Yi, Shunli
0 siblings, 1 reply; 5+ messages in thread
From: Ian Campbell @ 2012-10-31 10:20 UTC (permalink / raw)
To: Yi, Shunli; +Cc: xen-devel@lists.xensource.com
On Wed, 2012-10-31 at 10:16 +0000, Yi, Shunli wrote:
> Ian,
> Thanks for your information.
> And sorry for writing a wrong kernel version by mistake, we are using
> 2.6.18.8,
2.6.18.8 is even more ancient.
> which was downloaded from Xen.org when got Xen-3.4.2 release.
Well, 3.4.2 is pretty old too.
In general there is no requirement to use the kernel which is supplied
with a given version of Xen (we don't even supply one any more). So
there is no real reason to stick with the 2.6.18 that happened to come
with 3.4.2.
You should upgrade at least your kernel as I suggested.
Ian.
* Re: Xennet half die---netfront TX queue was stopped.
2012-10-31 10:20 ` Ian Campbell
@ 2012-10-31 12:35 ` Yi, Shunli
0 siblings, 0 replies; 5+ messages in thread
From: Yi, Shunli @ 2012-10-31 12:35 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com
>> Ian,
>> Thanks for your information.
>> And sorry for writing a wrong kernel version by mistake, we are using
>> 2.6.18.8,
>2.6.18.8 is even more ancient.
>> which was downloaded from Xen.org when got Xen-3.4.2 release.
>Well, 3.4.2 is pretty old too.
>In general there is no requirement to use the kernel which is supplied
>with a given version of Xen (we don't even supply one any more). So
>there is no real reason to stick with the 2.6.18 that happened to come
>with 3.4.2.
>You should upgrade at least your kernel as I suggested.
Yes, I agree, and we are planning to migrate to newer versions of both Xen and the kernel.
But before that, I'm still trying to find a quick fix for this.
Thanks for your time. ^_^
-Shunli