From: Zoltan Kiss
Subject: Re: [3.15-rc3] Bisected: xen-netback mangles packets between two guests on a bridge since merge of "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy" series.
Date: Tue, 13 May 2014 14:40:07 +0100
Message-ID: <537220B7.5080202@citrix.com>
References: <395225650.20140430124506@eikelenboom.it> <536D4282.9070309@citrix.com>
In-Reply-To: <536D4282.9070309@citrix.com>
To: Sander Eikelenboom
Cc: Ian Campbell, "David S. Miller"

Hi,

It seems I've fixed this: the receive side couldn't handle the case where the frags had been changed. I'll post a patch shortly.

Zoli

On 09/05/14 22:02, Zoltan Kiss wrote:
> Hi,
>
> Sorry for the long silence on this issue; I was busy trying to figure
> out what went wrong. Fun facts:
>
> - commenting out that __pskb_pull_tail in tx_submit, which
>   unconditionally pulls up the linear area to 128 bytes, seems to solve
>   the problem
> - I could reproduce the problem only when the sending guest had a 64-bit
>   kernel, but then even with 3.2. On the other hand, with a 32-bit
>   sending guest it works fine. More precisely, I think it boils down to
>   the actual config: I used the XenServer Dom0 config files, see them here:
>   https://github.com/xenserver/linux-3.x.pg/blob/master/master/kernel-configuration
> - with a 64-bit Debian 7 kernel as the sender it also works, so I guess
>   it's not about 32/64 bit, but something in the config
> - the receiving guest, where wget ran, doesn't matter
> - the "more than MAX_SKB_FRAGS slots" thing was a red herring.
>   A typical skb layout (on the sender's xenvif_start_xmit) which gets
>   corrupted:
>     linear area: 66 bytes
>     0. frag: 52 bytes
>     1. frag: 1200 bytes
> - so I guess the problem is when that pull_tail pulls the whole first
>   frag into the linear area
> - a corrupt packet on the receiver side looks like the following:
>   - linear buffer: 128 bytes, content is OK
>   - the content of the frag area is shifted back 4096 bytes in the
>     TCP stream, so instead of the Nth byte it starts with the
>     (N-4096)th byte
>   - the length is the same as on the sender side; I've checked this by
>     looking at the IP id fields
>   - otherwise the stream content looks OK (I used a continuously
>     incrementing pattern)
>   - the next packet starts at the right place
> - the pulling itself doesn't cause the corruption: I've printed out the
>   first frag after it, and it still looks OK
> - ftrace_printk("%*ph") seems to have problems when the pointer points
>   to a grant mapped page. I have the impression that it tries to
>   dereference it when I read the trace buffer, at which point the
>   mapping and the content are long gone.
>
> I'll continue to look into this next week.
>
> Zoli