* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
[not found] <E3A471111497714D953FBC6F551B7AA7B6BD43C9@USALMELXP004.LUXGROUP.NET>
@ 2014-12-09 22:39 ` Anthony Wright
0 siblings, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-09 22:39 UTC
To: Siegmann Joseph; +Cc: xen-devel@lists.xensource.com

On 09/12/2014 22:30, Siegmann Joseph wrote:
>
> Would you mind sharing what it would take to correct this issue… is
> there a file I could just replace until a patch is released?
>
We simply applied David Vrabel's patch from 8/12/14 to a stock 3.17.3
kernel we were running in the DomU and it fixed the problem.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
@ 2014-11-28 15:19 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Anthony Wright @ 2014-11-28 15:19 UTC
To: xen-devel@lists.xensource.com

We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.

Shortly after the upgrade we started to lose network connectivity to the
DomU a few times a day that required a reboot to fix. We see nothing in
the xen logs or xl dmesg, but when we looked at the dmesg output we saw
the following output for the two incidents we investigated in detail:

[69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
[69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
[69332.031069] br-default: port 2(vif4.0) entered disabled state

[824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
[824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
[824365.531191] br-default: port 2(vif9.0) entered disabled state

We have a very similar setup running on another machine with a 3.17.3
DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
machine. This is a test system rather than a production system so has a
different workload and fewer CPUs.

The piece of code that outputs the error is in
drivers/net/xen-netback/netback.c.

The DomU has 4000MB of RAM and 8 CPUs.

Any ideas?

Thanks,

Anthony.

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 Anthony Wright
@ 2014-11-28 15:23 ` Ian Campbell
2014-11-28 16:34 ` Anthony Wright
2014-12-01 14:22 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2014-11-28 15:23 UTC
To: Anthony Wright; +Cc: xen-devel@lists.xensource.com

On Fri, 2014-11-28 at 15:19 +0000, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3

Is this a Debian kernel? In which case you might be seeing
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be
fixed in the next upload of the kernel, test binaries with the fixes are
referenced in the bug log.

Even if not Debian then you'll probably want the same set of backports.

Ian.

> running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
>
> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>
> We have a very similar setup running on another machine with a 3.17.3
> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
> machine. This is a test system rather than a production system so has a
> different workload and fewer CPUs.
>
> The piece of code that outputs the error is in
> drivers/net/xen-netback/netback.c.
>
> The DomU has 4000MB of RAM and 8 CPUs.
>
> Any ideas?
>
> Thanks,
>
> Anthony.

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:23 ` Ian Campbell
@ 2014-11-28 16:34 ` Anthony Wright
0 siblings, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-11-28 16:34 UTC
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com

On 28/11/2014 15:23, Ian Campbell wrote:
> On Fri, 2014-11-28 at 15:19 +0000, Anthony Wright wrote:
>> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
>> 3.17.3
> Is this a Debian kernel? In which case you might be seeing
It's a stock kernel from kernel.org; we have a custom system with no
relation to Debian.
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be
> fixed in the next upload of the kernel, test binaries with the fixes are
> referenced in the bug log.
The error messages we're seeing are different from those reported: both
the Dom0 and DomU continue to run correctly, and the vif doesn't degrade
slowly. Instead, it fails the test in netback.c below, which disables the
interface:

    /* No crossing a page as the payload mustn't fragment. */
    if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
        netdev_err(queue->vif->dev,
                   "txreq.offset: %x, size: %u, end: %lu\n",
                   txreq.offset, txreq.size,
                   (txreq.offset&~PAGE_MASK) + txreq.size);
        xenvif_fatal_tx_err(queue->vif);
        break;
    }

> Even if not Debian then you'll probably want the same set of backports.
I'm happy to apply the backports if you think it's likely to fix the
problem despite the different symptoms, but from what I can see it looks
like a different problem.

thanks,

Anthony

> Ian.
>> running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>>
>> Shortly after the upgrade we started to lose network connectivity to the
>> DomU a few times a day that required a reboot to fix. We see nothing in
>> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
>> the following output for the two incidents we investigated in detail:
>>
>> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
>> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
>> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>>
>>
>> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
>> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
>> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>>
>> We have a very similar setup running on another machine with a 3.17.3
>> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
>> machine. This is a test system rather than a production system so has a
>> different workload and fewer CPUs.
>>
>> The piece of code that outputs the error is in
>> drivers/net/xen-netback/netback.c.
>>
>> The DomU has 4000MB of RAM and 8 CPUs.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Anthony.

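As a worked illustration of the netback check quoted in the message above, the following stand-alone C sketch replays its arithmetic with the values from the two logged incidents. It assumes a 4096-byte page (the usual x86-64 value; the thread does not state it), and `txreq_end` and `txreq_crosses_page` are hypothetical helper names used only here, not netback functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86-64 builds. */
#define SKETCH_PAGE_SIZE 4096u

/* Mirrors the "end:" value in the log line: the in-page part of the
 * offset plus the request size. */
uint32_t txreq_end(uint32_t offset, uint32_t size)
{
    assert(size > 0);
    return (offset & (SKETCH_PAGE_SIZE - 1u)) + size;
}

/* Mirrors the condition that triggers the fatal error: the request
 * would run past the end of the granted page. */
bool txreq_crosses_page(uint32_t offset, uint32_t size)
{
    return offset + size > SKETCH_PAGE_SIZE;
}
```

With the logged values, offset 0x85e (2142) plus size 4002 gives 6144, and offset 0xa5e (2654) plus size 4002 gives 6656. Both exceed 4096, which is consistent with the "end:" figures in the log and with the interface being disabled.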
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
@ 2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2 siblings, 2 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-01 14:22 UTC
To: Anthony Wright, xen-devel@lists.xensource.com

On 28/11/14 15:19, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state

The guest's frontend driver isn't putting valid requests onto the ring
(it crosses a page boundary) so this is a frontend bug.

What guest are you running?

David

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 14:22 ` David Vrabel
@ 2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
1 sibling, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-01 15:47 UTC
To: David Vrabel; +Cc: xen-devel

> On 28/11/14 15:19, Anthony Wright wrote:
> > We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> > 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
> >
> > Shortly after the upgrade we started to lose network connectivity to the
> > DomU a few times a day that required a reboot to fix. We see nothing in
> > the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> > the following output for the two incidents we investigated in detail:
> >
> > [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> > [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> > [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
> The guest's frontend driver isn't putting valid requests onto the ring
> (it crosses a page boundary) so this is a frontend bug.
>
> What guest are you running?

We're running a custom built 64 bit para-virtualised DomU with a stock
Linux 3.17.3 downloaded from kernel.org. The problem only started
happening when we upgraded the DomU Linux kernel from 3.3.2

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
@ 2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
1 sibling, 2 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-01 18:17 UTC
To: David Vrabel, Anthony Wright, xen-devel@lists.xensource.com
Cc: Wei Liu, Ian Campbell

On 01/12/14 14:22, David Vrabel wrote:
> On 28/11/14 15:19, Anthony Wright wrote:
>> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
>> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>>
>> Shortly after the upgrade we started to lose network connectivity to the
>> DomU a few times a day that required a reboot to fix. We see nothing in
>> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
>> the following output for the two incidents we investigated in detail:
>>
>> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
>> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
>> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
> The guest's frontend driver isn't putting valid requests onto the ring
> (it crosses a page boundary) so this is a frontend bug.

This VIF protocol is weird. The first slot contains a txreq with a size
for the total length of the packet, subsequent slots have sizes for that
fragment only.

netback then has to calculate how long the first slot is, by subtracting
all the size from the following slots.

So something has gone wrong but it's not obvious what it is. Any chance
you can dump the ring state when it happens?

> What guest are you running?

Sorry, I missed that you said 3.17.3 for the domU as well.

David

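The size convention described in the message above (the first slot carries the total packet length, later slots carry only their own fragment length) can be sketched in a few lines of C. This is a minimal illustration under stated assumptions rather than netback's actual code: `first_slot_len` is a hypothetical helper, and the slot sizes in the usage note are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizes[0] is the total packet length; sizes[1..nslots-1] are the
 * lengths of the following fragments. Returns the number of bytes that
 * must therefore live in the first slot, or -1 if the fragments claim
 * more data than the stated total. */
long first_slot_len(const uint16_t *sizes, size_t nslots)
{
    assert(sizes != NULL && nslots >= 1);

    long first = sizes[0];
    for (size_t i = 1; i < nslots; i++)
        first -= sizes[i];

    return first < 0 ? -1 : first;
}
```

For example, a three-slot packet with sizes 4002, 1500 and 548 would leave 4002 - 1500 - 548 = 1954 bytes for the first slot. If the subtraction goes wrong, or the first slot's offset is wrong, the derived first-slot extent can run past the end of its page.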
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 18:17 ` David Vrabel
@ 2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
1 sibling, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-02 15:40 UTC
To: David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell

----- Original Message -----
> On 01/12/14 14:22, David Vrabel wrote:
> > On 28/11/14 15:19, Anthony Wright wrote:
> > The guest's frontend driver isn't putting valid requests onto the ring
> > (it crosses a page boundary) so this is a frontend bug.
>
> This VIF protocol is weird. The first slot contains a txreq with a size
> for the total length of the packet, subsequent slots have sizes for that
> fragment only.
>
> netback then has to calculate how long the first slot is, by subtracting
> all the size from the following slots.
>
> So something has gone wrong but it's not obvious what it is. Any chance
> you can dump the ring state when it happens?

Really sorry, but how do I dump the ring state? I can get a root shell on
both the Dom0 & DomU, but I don't know the command to use to dump the
ring state.

Anthony.

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
@ 2014-12-04 15:36 ` Anthony Wright
2014-12-04 15:53 ` David Vrabel
1 sibling, 1 reply; 14+ messages in thread
From: Anthony Wright @ 2014-12-04 15:36 UTC
To: David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell

> On 01/12/14 14:22, David Vrabel wrote:
> This VIF protocol is weird. The first slot contains a txreq with a size
> for the total length of the packet, subsequent slots have sizes for that
> fragment only.
>
> netback then has to calculate how long the first slot is, by subtracting
> all the size from the following slots.
>
> So something has gone wrong but it's not obvious what it is. Any chance
> you can dump the ring state when it happens?

We think we've worked out how to dump the ring state, please see below.

dmesg output
============

[76571.687014] vif vif-6-0 vif6.0: txreq.offset: a5e, size: 4002, end: 6656
[76571.687035] vif vif-6-0 vif6.0: fatal error; disabling device
[76571.700304] br-primary-1: port 2(vif6.0) entered disabled state

/sys/kernel/debug/xen-netback/vif6.0/io_ring_q0
===============================================

Queue 0
TX: nr_ents 256
req prod 10164 (39) cons 10127 (2) event 10126 (1)
rsp prod 10125 (base) pvt 10125 (0) event 10145 (20)
pending prod 9589 pending cons 9333 nr_pending_reqs 0
dealloc prod 8501 dealloc cons 8501 dealloc_queue 0

RX: nr_ents 256
req prod 1321 (41) cons 1280 (0) event 1 (-1279)
rsp prod 1280 (base) pvt 1280 (0) event 1281 (1)

NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678062682, expires: 0, now: 4314107964

/sys/kernel/debug/xen-netback/vif6.0/io_ring_q1
===============================================

Queue 1
TX: nr_ents 256
req prod 10106 (0) cons 10106 (0) event 10107 (1)
rsp prod 10106 (base) pvt 10106 (0) event 10107 (1)
pending prod 9573 pending cons 9317 nr_pending_reqs 0
dealloc prod 8503 dealloc cons 8503 dealloc_queue 0

RX: nr_ents 256
req prod 594 (39) cons 555 (0) event 1 (-554)
rsp prod 555 (base) pvt 555 (0) event 556 (1)

NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678038030, expires: 0, now: 4314118667

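The numbers in parentheses in the dump above appear to be each index's distance from the "rsp prod" value marked "(base)". That reading is an inference from the figures themselves, not something stated in the thread. A minimal C sketch, using the hypothetical helpers `ring_delta` and `unconsumed_requests` and the ring size of 256 entries shown in the dump, reproduces them:

```c
#include <assert.h>
#include <stdint.h>

/* Ring size reported as "nr_ents 256" in the dump. */
#define SKETCH_RING_ENTS 256u

/* Signed distance of a free-running ring index from a base index.
 * The subtraction is done in unsigned arithmetic so that index
 * wrap-around is harmless; the cast back to a signed type relies on the
 * usual two's-complement behaviour of mainstream compilers. */
int32_t ring_delta(uint32_t idx, uint32_t base)
{
    return (int32_t)(idx - base);
}

/* Requests the frontend has produced that the backend has not yet
 * consumed. */
uint32_t unconsumed_requests(uint32_t req_prod, uint32_t req_cons)
{
    uint32_t n = req_prod - req_cons;
    assert(n <= SKETCH_RING_ENTS);
    return n;
}
```

On queue 0's TX ring, for instance, 10164 - 10125 = 39 and 10127 - 10125 = 2 match the printed "(39)" and "(2)", and 10164 - 10127 = 37 requests were still unconsumed when the device was disabled. These are only the ring indexes; they do not show the contents of the individual requests.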
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-04 15:36 ` Anthony Wright
@ 2014-12-04 15:53 ` David Vrabel
2014-12-05 12:48 ` Zoltan Kiss
0 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-12-04 15:53 UTC
To: Anthony Wright, David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell

On 04/12/14 15:36, Anthony Wright wrote:
>> On 01/12/14 14:22, David Vrabel wrote:
>> This VIF protocol is weird. The first slot contains a txreq with a size
>> for the total length of the packet, subsequent slots have sizes for that
>> fragment only.
>>
>> netback then has to calculate how long the first slot is, by subtracting
>> all the size from the following slots.
>>
>> So something has gone wrong but it's not obvious what it is. Any chance
>> you can dump the ring state when it happens?
>
> We think we've worked out how to dump the ring state, please see below.

We need the full contents of the ring which isn't currently available
via debugfs and I haven't had time to put together a debug patch to make
it available.

David

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-04 15:53 ` David Vrabel
@ 2014-12-05 12:48 ` Zoltan Kiss
2014-12-05 14:16 ` David Vrabel
0 siblings, 1 reply; 14+ messages in thread
From: Zoltan Kiss @ 2014-12-05 12:48 UTC
To: David Vrabel, Anthony Wright; +Cc: xen-devel, Wei Liu, Ian Campbell

Hi,

Maybe I'm misreading it, but it seems to me that netfront doesn't slice
up the linear buffer at all, just blindly sends it. In xennet_start_xmit:

    unsigned int offset = offset_in_page(data);
    unsigned int len = skb_headlen(skb);
    ...
    tx->offset = offset;
    tx->size = len;

Although in the slot counting it calculates it correctly:

    DIV_ROUND_UP(offset + len, PAGE_SIZE)

Am I missing something?

Zoli

On 04/12/14 15:53, David Vrabel wrote:
> On 04/12/14 15:36, Anthony Wright wrote:
>>> On 01/12/14 14:22, David Vrabel wrote:
>>> This VIF protocol is weird. The first slot contains a txreq with a size
>>> for the total length of the packet, subsequent slots have sizes for that
>>> fragment only.
>>>
>>> netback then has to calculate how long the first slot is, by subtracting
>>> all the size from the following slots.
>>>
>>> So something has gone wrong but it's not obvious what it is. Any chance
>>> you can dump the ring state when it happens?
>>
>> We think we've worked out how to dump the ring state, please see below.
>
> We need the full contents of the ring which isn't currently available
> via debugfs and I haven't had time to put together a debug patch to make
> it available.
>
> David

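The slot-counting expression quoted in the message above can be tried out in isolation. The sketch below assumes a 4096-byte page and re-creates the rounding-up division with a local macro. `linear_area_slots` is a hypothetical name, and applying the logged offset and size values to it is only an illustration, since the logged size may be the total packet length rather than the linear-area length.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Integer division rounding up, in the style of the kernel's
 * DIV_ROUND_UP macro. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1u) / (d))

/* Number of page-sized slots needed for a linear area that starts
 * `offset` bytes into a page and is `len` bytes long. */
unsigned int linear_area_slots(unsigned int offset, unsigned int len)
{
    assert(offset < SKETCH_PAGE_SIZE);
    return SKETCH_DIV_ROUND_UP(offset + len, SKETCH_PAGE_SIZE);
}
```

An area of 4002 bytes starting at offset 0x85e would need two slots by this count, so describing it with a single request using that offset and size is exactly what the backend's page-crossing check rejects.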
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-05 12:48 ` Zoltan Kiss
@ 2014-12-05 14:16 ` David Vrabel
0 siblings, 0 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-05 14:16 UTC
To: Zoltan Kiss, David Vrabel, Anthony Wright
Cc: xen-devel, Wei Liu, Ian Campbell

On 05/12/14 12:48, Zoltan Kiss wrote:
> Hi,
>
> Maybe I'm misreading it, but it seems to me that netfront doesn't slice
> up the linear buffer at all, just blindly sends it. In xennet_start_xmit:

This is handled in the beginning of xennet_make_frags() (which I would
agree isn't the obvious place for it).

David

* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
2014-12-01 14:22 ` David Vrabel
@ 2014-12-08 12:03 ` David Vrabel
2014-12-09 16:46 ` Anthony Wright
2 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-12-08 12:03 UTC
To: Anthony Wright, xen-devel@lists.xensource.com

On 28/11/14 15:19, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
>
> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>
> We have a very similar setup running on another machine with a 3.17.3
> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
> machine. This is a test system rather than a production system so has a
> different workload and fewer CPUs.
>
> The piece of code that outputs the error is in
> drivers/net/xen-netback/netback.c.

Does this patch to netfront fix it?

8<---------------------------------------------
xen-netfront: use correct linear area after linearizing an skb

Commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d (xen-netfront: Fix
handling packets on compound pages with skb_linearize) attempted to
fix a problem where an skb that would have required too many slots
would be dropped causing TCP connections to stall.

However, it filled in the first slot using the original buffer and not
the new one and would use the wrong offset and grant access to the
wrong page.

Netback would notice the malformed request and stop all traffic on the
VIF, reporting:

    vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
    vif vif-3-0 vif3.0: fatal error; disabling device

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netfront.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ece8d18..eeed0ce 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -627,6 +627,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 				    slots, skb->len);
 		if (skb_linearize(skb))
 			goto drop;
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
 	}
 
 	spin_lock_irqsave(&queue->tx_lock, flags);
-- 
1.7.10.4

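The failure mode described in the commit message above is a stale local variable: the offset is computed from the buffer address before linearizing, the buffer then moves, and the old offset is still used. The user-space C sketch below models only that aspect. Everything in it is hypothetical (`fake_skb`, `fake_linearize`, `build_first_slot`, and the addresses in the usage note), and the first slot's size is set to the total packet length following the protocol description earlier in the thread. It is not netfront code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for an skb: just the linear buffer address and
 * the total packet length. */
struct fake_skb {
    uintptr_t data;
    unsigned int len;
};

/* The fields of the first TX request that matter for the page check. */
struct first_slot {
    unsigned int offset;
    unsigned int size;
};

unsigned int fake_offset_in_page(uintptr_t addr)
{
    return (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1u));
}

/* Models linearizing: the packet data now lives in a new buffer. */
void fake_linearize(struct fake_skb *skb, uintptr_t new_data)
{
    skb->data = new_data;
}

/* Builds the first slot. With refresh == 0 the offset computed before
 * linearizing is reused (the bug); with refresh != 0 it is recomputed
 * from the new buffer, as the patch does. */
struct first_slot build_first_slot(struct fake_skb *skb,
                                   uintptr_t new_data, int refresh)
{
    assert(skb != NULL);

    unsigned int offset = fake_offset_in_page(skb->data);

    fake_linearize(skb, new_data);
    if (refresh)
        offset = fake_offset_in_page(skb->data);

    struct first_slot slot = { offset, skb->len };
    return slot;
}
```

With an invented original buffer at 0x1085e, a 4002-byte packet, and a new linear buffer at 0x20040, the stale path yields offset 0x85e and size 4002, which runs past the page just like the logged request. The refreshed path yields offset 0x40 and size 4002, which fits in one page.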
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-08 12:03 ` David Vrabel
@ 2014-12-09 16:46 ` Anthony Wright
0 siblings, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-09 16:46 UTC
To: David Vrabel, xen-devel@lists.xensource.com

On 08/12/2014 12:03, David Vrabel wrote:
> Does this patch to netfront fix it?
>
> 8<---------------------------------------------
> xen-netfront: use correct linear area after linearizing an skb
>
> Commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d (xen-netfront: Fix
> handling packets on compound pages with skb_linearize) attempted to
> fix a problem where an skb that would have required too many slots
> would be dropped causing TCP connections to stall.
>
> However, it filled in the first slot using the original buffer and not
> the new one and would use the wrong offset and grant access to the
> wrong page.
>
> Netback would notice the malformed request and stop all traffic on the
> VIF, reporting:
>
>     vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
>     vif vif-3-0 vif3.0: fatal error; disabling device
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/net/xen-netfront.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index ece8d18..eeed0ce 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -627,6 +627,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  				    slots, skb->len);
>  		if (skb_linearize(skb))
>  			goto drop;
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
>  	}
>
>  	spin_lock_irqsave(&queue->tx_lock, flags);

The patch seems to have worked. Before, we'd managed to reproduce the
problem in under 10 seconds; with the patch we haven't seen the problem
on the test or production systems.

Thank you.

Anthony.

Thread overview: 14+ messages
[not found] <E3A471111497714D953FBC6F551B7AA7B6BD43C9@USALMELXP004.LUXGROUP.NET>
2014-12-09 22:39 ` PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0 Anthony Wright
2014-11-28 15:19 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
2014-11-28 16:34 ` Anthony Wright
2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
2014-12-04 15:53 ` David Vrabel
2014-12-05 12:48 ` Zoltan Kiss
2014-12-05 14:16 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2014-12-09 16:46 ` Anthony Wright