* PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
@ 2014-11-28 15:19 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Anthony Wright @ 2014-11-28 15:19 UTC (permalink / raw)
To: xen-devel@lists.xensource.com
We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
Shortly after the upgrade we started to lose network connectivity to the
DomU a few times a day that required a reboot to fix. We see nothing in
the xen logs or xl dmesg, but when we looked at the dmesg output we saw
the following output for the two incidents we investigated in detail:
[69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
[69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
[69332.031069] br-default: port 2(vif4.0) entered disabled state
[824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
[824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
[824365.531191] br-default: port 2(vif9.0) entered disabled state
We have a very similar setup running on another machine with a 3.17.3
DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
machine. This is a test system rather than a production system so has a
different workload and fewer CPUs.
The piece of code that outputs the error is in
drivers/net/xen-netback/netback.c.
The DomU has 4000MB of RAM and 8 CPUs.
Any ideas?
Thanks,
Anthony.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0 Anthony Wright
@ 2014-11-28 15:23 ` Ian Campbell
2014-11-28 16:34 ` Anthony Wright
2014-12-01 14:22 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2014-11-28 15:23 UTC (permalink / raw)
To: Anthony Wright; +Cc: xen-devel@lists.xensource.com
On Fri, 2014-11-28 at 15:19 +0000, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3
Is this a Debian kernel? If so, you might be seeing
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 . This will be
fixed in the next upload of the kernel; test binaries with the fixes are
referenced in the bug log.
Even if not Debian then you'll probably want the same set of backports.
Ian.
> running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
>
> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>
> We have a very similar setup running on another machine with a 3.17.3
> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
> machine. This is a test system rather than a production system so has a
> different workload and fewer CPUs.
>
> The piece of code that outputs the error is in
> drivers/net/xen-netback/netback.c.
>
> The DomU has 4000MB of RAM and 8 CPUs.
>
> Any ideas?
>
> Thanks,
>
> Anthony.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:23 ` Ian Campbell
@ 2014-11-28 16:34 ` Anthony Wright
0 siblings, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-11-28 16:34 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com
On 28/11/2014 15:23, Ian Campbell wrote:
> On Fri, 2014-11-28 at 15:19 +0000, Anthony Wright wrote:
>> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
>> 3.17.3
> Is this a Debian kernel? In which case you might be seeing
It's a stock kernel from kernel.org, we have a custom system with no
relation to Debian.
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=767261 , this will be
> fixed in the next upload of the kernel, test binaries with the fixes are
> referenced in the bug log.
The error messages we're seeing are different from those reported. Both
the Dom0 and DomU continue to run correctly, and the vif doesn't degrade
slowly. Instead it fails the test in netback.c below, which disables the
interface:
	/* No crossing a page as the payload mustn't fragment. */
	if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
		netdev_err(queue->vif->dev,
			   "txreq.offset: %x, size: %u, end: %lu\n",
			   txreq.offset, txreq.size,
			   (txreq.offset&~PAGE_MASK) + txreq.size);
		xenvif_fatal_tx_err(queue->vif);
		break;
	}
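The logged values can be checked by hand against this test. The sketch
below is a userspace model of the check, assuming 4 KiB pages. The names
PAGE_SIZE_BYTES, txreq_end() and txreq_crosses_page() are made up for
illustration and are not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. This is not the kernel's PAGE_SIZE macro. */
#define PAGE_SIZE_BYTES 4096u

/* Value printed in the "end:" field: in-page offset plus request size. */
unsigned long txreq_end(uint16_t offset, uint16_t size)
{
	return (unsigned long)(offset & (PAGE_SIZE_BYTES - 1)) + size;
}

/* True when the request would run past the end of its page. */
bool txreq_crosses_page(uint16_t offset, uint16_t size)
{
	return (unsigned long)offset + size > PAGE_SIZE_BYTES;
}
```

For the first incident, offset 0x85e is 2142 and 2142 + 4002 = 6144, which
matches the logged "end" and exceeds 4096, so the request is rejected. For
the second incident, 0xa5e is 2654 and 2654 + 4002 = 6656.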
> Even if not Debian then you'll probably want the same set of backports.
I'm happy to apply the backports if you think it's likely to fix the
problem despite the different symptoms, but from what I can see it looks
like a different problem.
thanks,
Anthony
> Ian.
>> running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>>
>> Shortly after the upgrade we started to lose network connectivity to the
>> DomU a few times a day that required a reboot to fix. We see nothing in
>> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
>> the following output for the two incidents we investigated in detail:
>>
>> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
>> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
>> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>>
>>
>> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
>> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
>> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>>
>> We have a very similar setup running on another machine with a 3.17.3
>> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
>> machine. This is a test system rather than a production system so has a
>> different workload and fewer CPUs.
>>
>> The piece of code that outputs the error is in
>> drivers/net/xen-netback/netback.c.
>>
>> The DomU has 4000MB of RAM and 8 CPUs.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Anthony.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
@ 2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2 siblings, 2 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-01 14:22 UTC (permalink / raw)
To: Anthony Wright, xen-devel@lists.xensource.com
On 28/11/14 15:19, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
The guest's frontend driver isn't putting valid requests onto the ring
(it crosses a page boundary) so this is a frontend bug.
What guest are you running?
David
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 14:22 ` David Vrabel
@ 2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
1 sibling, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-01 15:47 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel
> On 28/11/14 15:19, Anthony Wright wrote:
> > We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2
> > to
> > 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
> >
> > Shortly after the upgrade we started to lose network connectivity to
> > the
> > DomU a few times a day that required a reboot to fix. We see nothing
> > in
> > the xen logs or xl dmesg, but when we looked at the dmesg output we
> > saw
> > the following output for the two incidents we investigated in
> > detail:
> >
> > [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002,
> > end: 6144
> > [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> > [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
> The guest's frontend driver isn't putting valid requests onto the ring
> (it crosses a page boundary) so this is a frontend bug.
>
> What guest are you running?
We're running a custom-built 64 bit para-virtualised DomU with a stock
Linux 3.17.3 downloaded from kernel.org. The problem only started
happening when we upgraded the DomU Linux kernel from 3.3.2.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
@ 2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
1 sibling, 2 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-01 18:17 UTC (permalink / raw)
To: David Vrabel, Anthony Wright, xen-devel@lists.xensource.com
Cc: Wei Liu, Ian Campbell
On 01/12/14 14:22, David Vrabel wrote:
> On 28/11/14 15:19, Anthony Wright wrote:
>> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
>> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>>
>> Shortly after the upgrade we started to lose network connectivity to the
>> DomU a few times a day that required a reboot to fix. We see nothing in
>> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
>> the following output for the two incidents we investigated in detail:
>>
>> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
>> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
>> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
> The guest's frontend driver isn't putting valid requests onto the ring
> (it crosses a page boundary) so this is a frontend bug.
This VIF protocol is weird. The first slot contains a txreq with a size
for the total length of the packet, subsequent slots have sizes for that
fragment only.
Netback then has to calculate how long the first slot is by subtracting
the sizes of all the following slots.
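The subtraction described above can be sketched as follows. This is a
userspace illustration only, not the netback code: first_slot_len() and
its array layout are hypothetical, with sizes[0] standing for the size
field of the first slot and the remaining entries for the later slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * sizes[0] is the size field of the first slot, i.e. the total packet
 * length. sizes[1..nslots-1] are the per-fragment sizes of later slots.
 * Returns the bytes carried by the first slot itself, or -1 if the
 * fragments add up to more than the stated total.
 */
int first_slot_len(const uint16_t *sizes, size_t nslots)
{
	int len;
	size_t i;

	assert(sizes != NULL && nslots >= 1);
	len = sizes[0];
	for (i = 1; i < nslots; i++) {
		if (sizes[i] > len)
			return -1;	/* malformed: fragments exceed total */
		len -= sizes[i];
	}
	return len;
}
```

With made-up sizes {6000, 2000, 1500}, the first slot carries
6000 - 2000 - 1500 = 2500 bytes. A single-slot packet carries its full
size in the first slot.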
So something has gone wrong but it's not obvious what it is. Any chance
you can dump the ring state when it happens?
> What guest are you running?
Sorry, I missed that you said 3.17.3 for the domU as well.
David
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 18:17 ` David Vrabel
@ 2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
1 sibling, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-02 15:40 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell
----- Original Message -----
> On 01/12/14 14:22, David Vrabel wrote:
> > On 28/11/14 15:19, Anthony Wright wrote:
> > The guest's frontend driver isn't putting valid requests onto the
> > ring
> > (it crosses a page boundary) so this is a frontend bug.
>
> This VIF protocol is weird. The first slot contains a txreq with a
> size
> for the total length of the packet, subsequent slots have sizes for
> that
> fragment only.
>
> netback then has to calculate how long the first slot is, by
> subtracting
> all the size from the following slots.
>
> So something has gone wrong but it's not obvious what it is. Any
> chance
> you can dump the ring state when it happens?
Really sorry, but how do I dump the ring state? I can get a root shell on
both the Dom0 & DomU, but I don't know the command to use to dump the
ring state.
Anthony.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
@ 2014-12-04 15:36 ` Anthony Wright
2014-12-04 15:53 ` David Vrabel
1 sibling, 1 reply; 14+ messages in thread
From: Anthony Wright @ 2014-12-04 15:36 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell
> On 01/12/14 14:22, David Vrabel wrote:
> This VIF protocol is weird. The first slot contains a txreq with a
> size
> for the total length of the packet, subsequent slots have sizes for
> that
> fragment only.
>
> netback then has to calculate how long the first slot is, by
> subtracting
> all the size from the following slots.
>
> So something has gone wrong but it's not obvious what it is. Any
> chance
> you can dump the ring state when it happens?
We think we've worked out how to dump the ring state, please see below.
dmesg output
============
[76571.687014] vif vif-6-0 vif6.0: txreq.offset: a5e, size: 4002, end: 6656
[76571.687035] vif vif-6-0 vif6.0: fatal error; disabling device
[76571.700304] br-primary-1: port 2(vif6.0) entered disabled state
/sys/kernel/debug/xen-netback/vif6.0/io_ring_q0
===============================================
Queue 0
TX: nr_ents 256
req prod 10164 (39) cons 10127 (2) event 10126 (1)
rsp prod 10125 (base) pvt 10125 (0) event 10145 (20)
pending prod 9589 pending cons 9333 nr_pending_reqs 0
dealloc prod 8501 dealloc cons 8501 dealloc_queue 0
RX: nr_ents 256
req prod 1321 (41) cons 1280 (0) event 1 (-1279)
rsp prod 1280 (base) pvt 1280 (0) event 1281 (1)
NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678062682, expires: 0, now: 4314107964
/sys/kernel/debug/xen-netback/vif6.0/io_ring_q1
===============================================
Queue 1
TX: nr_ents 256
req prod 10106 (0) cons 10106 (0) event 10107 (1)
rsp prod 10106 (base) pvt 10106 (0) event 10107 (1)
pending prod 9573 pending cons 9317 nr_pending_reqs 0
dealloc prod 8503 dealloc cons 8503 dealloc_queue 0
RX: nr_ents 256
req prod 594 (39) cons 555 (0) event 1 (-554)
rsp prod 555 (base) pvt 555 (0) event 556 (1)
NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678038030, expires: 0, now: 4314118667
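For reading the dump above: the bracketed numbers are consistent with each
index minus the "rsp prod" value on the same ring (marked "base"). The
sketch below reproduces that arithmetic in userspace. ring_delta() and
ring_slot() are hypothetical helper names, and treating the indices as
free-running 32-bit counters is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Signed distance of idx from base; the subtraction wraps modulo 2^32. */
int32_t ring_delta(uint32_t idx, uint32_t base)
{
	return (int32_t)(idx - base);
}

/* Slot that an index maps to in a ring of nr_ents entries (power of two). */
uint32_t ring_slot(uint32_t idx, uint32_t nr_ents)
{
	assert(nr_ents != 0 && (nr_ents & (nr_ents - 1)) == 0);
	return idx & (nr_ents - 1);
}
```

For queue 0 TX, req prod 10164 against base 10125 gives 39 and cons 10127
gives 2, as printed. For queue 0 RX, event 1 against base 1280 gives -1279.
The difference req prod - cons = 37 is the number of TX requests produced
but not yet consumed, and index 10127 maps to slot 143 of the 256 entries.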
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-04 15:36 ` Anthony Wright
@ 2014-12-04 15:53 ` David Vrabel
2014-12-05 12:48 ` Zoltan Kiss
0 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-12-04 15:53 UTC (permalink / raw)
To: Anthony Wright, David Vrabel; +Cc: xen-devel, Wei Liu, Ian Campbell
On 04/12/14 15:36, Anthony Wright wrote:
>> On 01/12/14 14:22, David Vrabel wrote:
>> This VIF protocol is weird. The first slot contains a txreq with a
>> size
>> for the total length of the packet, subsequent slots have sizes for
>> that
>> fragment only.
>>
>> netback then has to calculate how long the first slot is, by
>> subtracting
>> all the size from the following slots.
>>
>> So something has gone wrong but it's not obvious what it is. Any
>> chance
>> you can dump the ring state when it happens?
>
> We think we've worked out how to dump the ring state, please see below.
We need the full contents of the ring which isn't currently available
via debugfs and I haven't had time to put together a debug patch to make
it available.
David
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-04 15:53 ` David Vrabel
@ 2014-12-05 12:48 ` Zoltan Kiss
2014-12-05 14:16 ` David Vrabel
0 siblings, 1 reply; 14+ messages in thread
From: Zoltan Kiss @ 2014-12-05 12:48 UTC (permalink / raw)
To: David Vrabel, Anthony Wright; +Cc: xen-devel, Wei Liu, Ian Campbell
Hi,
Maybe I'm misreading it, but it seems to me that netfront doesn't slice
up the linear buffer at all, just blindly sends it. In xennet_start_xmit:
	unsigned int offset = offset_in_page(data);
	unsigned int len = skb_headlen(skb);
	...
	tx->offset = offset;
	tx->size = len;
Although in the slot counting it calculates it correctly:
	DIV_ROUND_UP(offset + len, PAGE_SIZE)
Am I missing something?
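To make the slot counting concrete, here is a small userspace sketch of
the same rounding, assuming 4 KiB pages. PAGE_SIZE_BYTES,
DIV_ROUND_UP_MODEL and linear_area_slots() are illustrative names, not
the netfront ones.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* Same rounding as the kernel's DIV_ROUND_UP, for unsigned values. */
#define DIV_ROUND_UP_MODEL(n, d) (((n) + (d) - 1) / (d))

/* Number of page-sized slots a linear area of len bytes at offset needs. */
unsigned int linear_area_slots(unsigned int offset, unsigned int len)
{
	assert(offset < PAGE_SIZE_BYTES);
	return DIV_ROUND_UP_MODEL(offset + len, PAGE_SIZE_BYTES);
}
```

With the logged values, offset 0x85e and len 4002 give
DIV_ROUND_UP(6144, 4096) = 2 slots, whereas a single request with that
offset and size describes a range that crosses a page boundary.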
Zoli
On 04/12/14 15:53, David Vrabel wrote:
> On 04/12/14 15:36, Anthony Wright wrote:
>>> On 01/12/14 14:22, David Vrabel wrote:
>>> This VIF protocol is weird. The first slot contains a txreq with a
>>> size
>>> for the total length of the packet, subsequent slots have sizes for
>>> that
>>> fragment only.
>>>
>>> netback then has to calculate how long the first slot is, by
>>> subtracting
>>> all the size from the following slots.
>>>
>>> So something has gone wrong but it's not obvious what it is. Any
>>> chance
>>> you can dump the ring state when it happens?
>>
>> We think we've worked out how to dump the ring state, please see below.
>
> We need the full contents of the ring which isn't currently available
> via debugfs and I haven't had time to put together a debug patch to make
> it available.
>
> David
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-05 12:48 ` Zoltan Kiss
@ 2014-12-05 14:16 ` David Vrabel
0 siblings, 0 replies; 14+ messages in thread
From: David Vrabel @ 2014-12-05 14:16 UTC (permalink / raw)
To: Zoltan Kiss, David Vrabel, Anthony Wright
Cc: xen-devel, Wei Liu, Ian Campbell
On 05/12/14 12:48, Zoltan Kiss wrote:
> Hi,
>
> Maybe I'm misreading it, but it seems to me that netfront doesn't slice
> up the linear buffer at all, just blindly sends it. In xennet_start_xmit:
This is handled at the beginning of xennet_make_frags() (which I would
agree isn't the obvious place for it).
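The kind of splitting meant here can be sketched as below. This is a
simplified userspace model under the assumption of 4 KiB pages, not the
netfront code; struct chunk and split_linear_area() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* One page-bounded piece of the linear area. */
struct chunk {
	unsigned int offset;	/* offset within its page */
	unsigned int size;	/* bytes in this piece */
};

/*
 * Split a linear area of len bytes, starting at offset within its first
 * page, into pieces that each stay inside one page. Stores at most max
 * pieces in out and returns the number of pieces required.
 */
size_t split_linear_area(unsigned int offset, unsigned int len,
			 struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(offset < PAGE_SIZE_BYTES);
	/* While the area overlaps a page boundary, cut at the boundary. */
	while (len > PAGE_SIZE_BYTES - offset) {
		unsigned int part = PAGE_SIZE_BYTES - offset;

		if (n < max)
			out[n] = (struct chunk){ offset, part };
		n++;
		len -= part;
		offset = 0;	/* later pieces start at the top of a page */
	}
	/* Whatever remains fits inside a single page. */
	if (n < max)
		out[n] = (struct chunk){ offset, len };
	return n + 1;
}
```

For offset 0x85e and len 4002 this yields two pieces: 1954 bytes at offset
0x85e, then 2048 bytes at offset 0, neither of which crosses a page.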
David
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-11-28 15:19 PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
2014-12-01 14:22 ` David Vrabel
@ 2014-12-08 12:03 ` David Vrabel
2014-12-09 16:46 ` Anthony Wright
2 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-12-08 12:03 UTC (permalink / raw)
To: Anthony Wright, xen-devel@lists.xensource.com
On 28/11/14 15:19, Anthony Wright wrote:
> We have a 64 bit PV DomU that we recently upgraded from linux 3.3.2 to
> 3.17.3 running on a 64 bit 3.17.3 Dom0 with Xen 4.4.0.
>
> Shortly after the upgrade we started to lose network connectivity to the
> DomU a few times a day that required a reboot to fix. We see nothing in
> the xen logs or xl dmesg, but when we looked at the dmesg output we saw
> the following output for the two incidents we investigated in detail:
>
> [69332.026586] vif vif-4-0 vif4.0: txreq.offset: 85e, size: 4002, end: 6144
> [69332.026607] vif vif-4-0 vif4.0: fatal error; disabling device
> [69332.031069] br-default: port 2(vif4.0) entered disabled state
>
>
> [824365.530740] vif vif-9-0 vif9.0: txreq.offset: a5e, size: 4002, end: 6656
> [824365.530748] vif vif-9-0 vif9.0: fatal error; disabling device
> [824365.531191] br-default: port 2(vif9.0) entered disabled state
>
> We have a very similar setup running on another machine with a 3.17.3
> DomU, 3.17.3 Dom0 and Xen 4.4.0 but we can't reproduce the issue on this
> machine. This is a test system rather than a production system so has a
> different workload and fewer CPUs.
>
> The piece of code that outputs the error is in
> drivers/net/xen-netback/netback.c.
Does this patch to netfront fix it?
8<---------------------------------------------
xen-netfront: use correct linear area after linearizing an skb
Commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d (xen-netfront: Fix
handling packets on compound pages with skb_linearize) attempted to
fix a problem where an skb that would have required too many slots
would be dropped causing TCP connections to stall.
However, it filled in the first slot using the original buffer and not
the new one and would use the wrong offset and grant access to the
wrong page.
Netback would notice the malformed request and stop all traffic on the
VIF, reporting:
vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
vif vif-3-0 vif3.0: fatal error; disabling device
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/net/xen-netfront.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ece8d18..eeed0ce 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -627,6 +627,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 				    slots, skb->len);
 		if (skb_linearize(skb))
 			goto drop;
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
 	}
 
 	spin_lock_irqsave(&queue->tx_lock, flags);
--
1.7.10.4
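The failure mode in the commit message, values cached before linearizing
and used afterwards, can be modelled in userspace as below. Everything
here is a stand-in: struct skb_model, linearize_model() and snapshot()
are hypothetical, addresses are plain integers, and 4 KiB pages are
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* Stand-in for the parts of an skb that matter here. */
struct skb_model {
	uintptr_t data;		/* address of the linear area */
	unsigned int headlen;	/* bytes in the linear area */
};

/* The three values the first slot is built from. */
struct linear_view {
	uintptr_t data;
	unsigned int offset;
	unsigned int len;
};

/* Models linearizing: the linear area moves and grows to the full packet. */
void linearize_model(struct skb_model *skb, uintptr_t new_data,
		     unsigned int total_len)
{
	assert(total_len >= skb->headlen);
	skb->data = new_data;
	skb->headlen = total_len;
}

/* Reads data, in-page offset and length from the skb as it is right now. */
struct linear_view snapshot(const struct skb_model *skb)
{
	struct linear_view v;

	v.data = skb->data;
	v.offset = (unsigned int)(skb->data & (PAGE_SIZE_BYTES - 1));
	v.len = skb->headlen;
	return v;
}
```

A snapshot taken before linearize_model() keeps describing the old buffer.
Taking it again afterwards, which is what the three added lines in the
patch do for data, offset and len, picks up the new buffer's offset and
length.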
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0
2014-12-08 12:03 ` David Vrabel
@ 2014-12-09 16:46 ` Anthony Wright
0 siblings, 0 replies; 14+ messages in thread
From: Anthony Wright @ 2014-12-09 16:46 UTC (permalink / raw)
To: David Vrabel, xen-devel@lists.xensource.com
On 08/12/2014 12:03, David Vrabel wrote:
> Does this patch to netfront fix it?
>
> 8<---------------------------------------------
> xen-netfront: use correct linear area after linearizing an skb
>
> Commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d (xen-netfront: Fix
> handling packets on compound pages with skb_linearize) attempted to
> fix a problem where an skb that would have required too many slots
> would be dropped causing TCP connections to stall.
>
> However, it filled in the first slot using the original buffer and not
> the new one and would use the wrong offset and grant access to the
> wrong page.
>
> Netback would notice the malformed request and stop all traffic on the
> VIF, reporting:
>
> vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
> vif vif-3-0 vif3.0: fatal error; disabling device
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/net/xen-netfront.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index ece8d18..eeed0ce 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -627,6 +627,9 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  				    slots, skb->len);
>  		if (skb_linearize(skb))
>  			goto drop;
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
>  	}
>
>  	spin_lock_irqsave(&queue->tx_lock, flags);
The patch seems to have worked. Before, we'd managed to reproduce the
problem in under 10 seconds; with the patch we haven't seen the problem
on the test or production systems.
Thank you.
Anthony.
^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <E3A471111497714D953FBC6F551B7AA7B6BD43C9@USALMELXP004.LUXGROUP.NET>]
end of thread, other threads:[~2014-12-09 22:39 UTC | newest]
Thread overview: 14+ messages
2014-11-28 15:19 PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0 Anthony Wright
2014-11-28 15:23 ` Ian Campbell
2014-11-28 16:34 ` Anthony Wright
2014-12-01 14:22 ` David Vrabel
2014-12-01 15:47 ` Anthony Wright
2014-12-01 18:17 ` David Vrabel
2014-12-02 15:40 ` Anthony Wright
2014-12-04 15:36 ` Anthony Wright
2014-12-04 15:53 ` David Vrabel
2014-12-05 12:48 ` Zoltan Kiss
2014-12-05 14:16 ` David Vrabel
2014-12-08 12:03 ` David Vrabel
2014-12-09 16:46 ` Anthony Wright
[not found] <E3A471111497714D953FBC6F551B7AA7B6BD43C9@USALMELXP004.LUXGROUP.NET>
2014-12-09 22:39 ` Anthony Wright