From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: huge tcp performance-regression on pvops - kernel (+Solution)
Date: Tue, 27 Oct 2009 10:54:07 -0400
Message-ID: <20091027145407.GD1193@phenom.dumpdata.com>
References: <733454931.1283414.1256600244808.JavaMail.tomcat55@mrmseu2.kundenserver.de>
In-Reply-To: <733454931.1283414.1256600244808.JavaMail.tomcat55@mrmseu2.kundenserver.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ronny.Hegewald@online.de, ian.campbell@citrix.com
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Tue, Oct 27, 2009 at 12:37:25AM +0100, Ronny.Hegewald@online.de wrote:
> Setup: pvops dom0 kernel 2.6.31.4 from Jeremy's git-repository from
> 2009-10-18
>
> Problem: Very slow network-performance (3-10 kbs) when tcp-packets are
> used (noticed when using scp, samba, nfs over tcp)
>
> But this occurs only in the following situations:
>
> 1.) domU to domU on same PC (domUs are paravirtualized linux-kernels)
> 2.) domU to another PC that doesn't use a pvops-kernel (packets sent
>     from another PC to domU work fine)
>
> domU to dom0 and the opposite way works without performance-regression.
>
> Reason: bigger tcp-packets get dropped from the domU the tcp-packets are
> sent from (netstat -s in domU shows many retransmitted tcp-segments)
>
> tcpdump shows that the bigger packets leave the vif of the domU they
> were sent from, but never arrive at the vif of the domU they are sent to.
>
> This is caused by these lines in drivers/xen/netback.c at line 1325:
>
>                 if (skb->data_len < skb_shinfo(skb)->gso_size) {
>                         skb_shinfo(skb)->gso_size = 0;
>                         skb_shinfo(skb)->gso_type = 0;
>                 }
>
> These lines were reverted from the linux-2.6.18-xen mercurial repository
> on 2009-01-13 in changeset 774:
> 107e10e0e07c: netfront/back: do not mark packets of length < MSS as GSO
> http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/107e10e0e07c
>
> I used the patch on the above mentioned pvops tree and the problem was
> gone.
>
> I never noticed such problems on the 2.6.18-xen kernel or the
> forward-ported xen-kernel 2.6.31.4 (from Andrew Lyon)
>

Kudos for discovering this. Did you verify whether the domU has the
corresponding patch:

diff -r 28acedb66302 -r 107e10e0e07c drivers/xen/netfront/netfront.c
--- a/drivers/xen/netfront/netfront.c	Wed Jan 07 12:21:54 2009 +0900
+++ b/drivers/xen/netfront/netfront.c	Tue Jan 13 15:17:54 2009 +0000
@@ -1439,6 +1439,14 @@
 		np->stats.rx_packets++;
 		np->stats.rx_bytes += skb->len;
 
+#if HAVE_TSO
+		if (skb->data_len < skb_shinfo(skb)->gso_size) {
+			skb_shinfo(skb)->gso_size = 0;
+#if HAVE_GSO
+			skb_shinfo(skb)->gso_type = 0;
+#endif
+		}
+#endif
 		__skb_queue_tail(&rxq, skb);
 
 		np->rx.rsp_cons = ++i;

Looking at the PV_OPs kernel it doesn't look to be there, but I was
wondering if the domU you are using has it?

Ian,

Would it make sense to remove this changeset if most of the DomUs don't
have this fix?
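
For readers who want to try the check outside the kernel, here is a minimal standalone sketch of the logic in the hunk above. It is only a model: `struct pkt_model` and `clear_gso_if_short` are hypothetical names made up for illustration, and the struct carries just the three fields the check touches rather than a real sk_buff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields the check reads.
 * Not a kernel type; invented for this sketch. */
struct pkt_model {
	unsigned int data_len;   /* bytes held in paged fragments */
	unsigned short gso_size; /* segment size requested for GSO */
	unsigned short gso_type; /* GSO type flags; 0 means not GSO */
};

/*
 * Same comparison as in the patch: if the paged data is shorter than
 * one GSO segment, clear the GSO marking on the packet.
 * Returns 1 if the marking was cleared, 0 if the packet was left alone.
 */
int clear_gso_if_short(struct pkt_model *p)
{
	assert(p != NULL);
	if (p->data_len < p->gso_size) {
		p->gso_size = 0;
		p->gso_type = 0;
		return 1;
	}
	return 0;
}
```

With a gso_size of 1448, a packet carrying 100 bytes of paged data would have both fields cleared, while one carrying 4000 bytes would be left untouched.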