From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: domU to domU networking issues in v3.7? (netserver/netperf failing to communicate)
Date: Wed, 14 Nov 2012 01:24:53 -0500
Message-ID: <20121114062451.GB24800@localhost.localdomain>
References: <20121110135931.GD23686@localhost.localdomain>
 <1352714084.27833.137.camel@zakaz.uk.xensource.com>
 <20121112142835.GG19860@phenom.dumpdata.com>
 <711956045.20121112155024@eikelenboom.it>
 <20121112163204.GB9575@phenom.dumpdata.com>
 <1945521669.20121112182055@eikelenboom.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1945521669.20121112182055@eikelenboom.it>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Sander Eikelenboom
Cc: "annie.li@oracle.com", Ian Campbell, "marcos.matsunaga@oracle.com", "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On Mon, Nov 12, 2012 at 06:20:55PM +0100, Sander Eikelenboom wrote:
>
> Monday, November 12, 2012, 5:32:04 PM, you wrote:
>
> > On Mon, Nov 12, 2012 at 03:50:24PM +0100, Sander Eikelenboom wrote:
> >>
> >> Monday, November 12, 2012, 3:28:35 PM, you wrote:
> >>
> >> > On Mon, Nov 12, 2012 at 09:54:44AM +0000, Ian Campbell wrote:
> >> >> On Sat, 2012-11-10 at 13:59 +0000, Konrad Rzeszutek Wilk wrote:
> >> >> > Hey Ian, Xen-devel mailing list,
> >> >> >
> >> >> > I think the issue of 70% traffic lost was actually introduced in v3.6 or
> >> >> > perhaps v3.5. Annie and Marcos (CC-ed here) are looking to see which of
> >> >> > the releases introduced this. The issue we are seeing is that a domU
> >> >> > to domU communication breaks - this is with netperf/netserver talking to
> >> >> > each other.
> >> >> >
> >> >> > Anyhow, I think the 3.7 compound page exacerbated the problem and also
> >> >> > (at least on some of my test hardware) exposed existing issues with
> >> >> > drivers. The issue I have is that the 'skge' driver has a bug that has
> >> >> > been there for ages (I tested way back to 3.0 and still saw it) where it
> >> >> > cannot work with SWIOTLB. It is probably missing a pci_dma_sync
> >> >> > somewhere.
> >> >> >
> >> >> > Anyhow the compound page got me to look at Xen-SWIOTLB and that looks
> >> >> > OK. Even with synthetic driver (the fake one I posted somewhere) it
> >> >> > dealt with compound pages properly (with debug or non-debug Xen
> >> >> > hypervisor).
> >> >>
> >> >> The debug build is probably most interesting since it deliberately
> >> >> allocates a non 1-1 p-to-m mapping so as to catch exactly these sorts of
> >> >> issues.
> >>
> >> > Right. My test env runs with that. And so far it only has issues
> >> > with the skge one.
> >> >>
> >> >> > So was wondering if you had looked at this in more detail? Any
> >> >> > ideas? Or would it be more prudent to ask that once we know for sure
> >> >> > which Linux release introduced the communication failures between
> >> >> > guests?
> >> >>
> >> >> I've not looked at it any further I'm afraid.
> >> >>
> >> >> If these changes (be they in 3.5 or later, or earlier) are exposing
> >> >> driver bugs then I suspect the netdev chaps would want to know about it.
> >>
> >> > Right. Annie (CC-ed here) mentioned to me that v3.5 looks to work ok.
> >> > And is off checking v3.6. v3.7 is definitely a no go.
> >> >>
> >> >> FWIW I see the issue with tg3.
> >>
> >> After the issues with netback were fixed, I'm seeing the issues with net_front; reverting the single commit 5640f7685831e088fe6c2e1f863a6805962f8e81 (that was pointed out for netback) also makes these disappear.
>
> > Were you ever able to trigger the BUG_ON in the patch that Ian posted?
>
> What exact patch (or any other patch that can help you)?
> (so i can try again to be sure)

This one:

http://lists.xen.org/archives/html/xen-devel/2012-10/msg00893.html