From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: Jeremy's GIT-tree and network problems
Date: Mon, 29 Mar 2010 11:42:40 -0700
Message-ID: <4BB0F4A0.2000403@goop.org>
References: <20100329T1647.GA.2c139.stse@fsing.rootsland.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20100329T1647.GA.2c139.stse@fsing.rootsland.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
Cc: Adnan Misherfi, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 03/29/2010 08:40 AM, Stephan Seitz wrote:
> Hi!
>
> I have bridging problems with the Dom0 kernels from Jeremy’s tree. I
> wrote a mail to xen-user (MSG-ID
> <20091220T1944.GA.ab998.stse@fsing.rootsland.net>, 20 Dec 2009), but
> without solutions. So I try xen-devel this time.
>
>
> My hardware setup:
> A PC with two NICs (Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI
> Express Gigabit Ethernet controller) is used as working environment
> (Dom0) and as firewall/proxy/DSL-router (DomU).
> The two NICs are bridged between Dom0 and DomU.
> Bridge eth0 containing peth0 and vif1.0 has an IP address in Dom0 and
> DomU. The DomU IP address is the gateway address in Dom0.
> Bridge xenbr1 containing eth1 and vif1.1 has no IP address in Dom0 and
> DomU and is only used to connect the DSL modem to DomU. The IP address
> is given to the PPP interface in DomU.
> Linux distribution is Debian/Testing (64bit) with XEN version 3.4.2 in
> December and 3.4.3rc3 now. The kernels are always self-compiled.
>
>
> My working setup:
> Dom0 with kernel 2.6.29.5 with xen-patches-2.6.29-6.tar.bz2 and DomU
> with standard kernel 2.6.32.x (and the 2.6.29.5 xen kernel before).
> The hypervisor was 3.4.2 and is now 3.4.3rc3.
> Here everything works as expected. DomU acts as firewall and is using
> correct masquerading for all internet traffic.
>
>
> My non-working setup:
> Dom0 with the PV-Ops kernel from Jeremy’s tree (I tried the following
> kernels: 2.6.31.5-00500-g34013be, 2.6.31.6-00696-g41a0695 (tested in
> December) and now from xen/stable the versions
> 2.6.32.10-02792-gf112549 and 2.6.32.10-02798-gd945b01). DomU kernel
> and hypervisor are the same as in the working setup.
>
> What is working?
> IP connection between Dom0 and DomU is working and between DomU and
> the internet. Traffic from Dom0 to the internet is working if DomU is
> used as a proxy (e.g. HTTP traffic with a squid in DomU).
>
> What is not working?
> Direct IP connection between Dom0 and the internet (tested with ping
> and „telnet ”).
> If I trace in DomU I see the packets leaving the ppp0 interface
> (correctly masqueraded), but I see no answering packets.
> If I trace in Dom0 using the bridge interfaces between the DSL modem
> and DomU (xenbr1, eth1, vif1.1, see hardware setup above), I don’t see
> the packets anymore. I only see packets from traffic generated
> directly by DomU.
> The DomU configuration between the working and non-working setup is
> not changed, only the Dom0 kernel is changed.
>
>
> So if anyone has an idea, what this could be and how to fix it, I will
> be glad.
>
>
> Further information:
> The NIC and the bridge driver are the same in all kernels from
> 2.6.29.5 until 2.6.32.10:
>
> osgiliath:~# ethtool -i eth1
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version:
> bus-info: 0000:03:00.0
> osgiliath:~# ethtool -i xenbr1
> driver: bridge
> version: 2.3
> firmware-version: N/A
> bus-info: N/A
>
> The only difference in the output of „ethtool eth1” are additional
> information about „link partner advertised modes” in the 2.6.3x kernels.
>
> „ethtool -k eth1” shows the error message „Cannot get device flags:
> Operation not supported” in the working setup for the working Dom0
> kernel. All other output is identical in all kernel versions:
>
> osgiliath:~# ethtool -k eth1
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: off
> generic-receive-offload: off
> large-receive-offload: off
>
> Switching rx-checksumming off does not help.

Have you tried carpet-bombing the ethtools: turn off everything on all
the dom0 interfaces (both the bridge(s) and all the component
interfaces) and all the domU interfaces? It does look like some kind of
checksum problem (or perhaps other offload?).

Fortunately it looks like this is going to get some systematic
attention. I'd really like any reasonable (ie, not inherently broken for
other reasons) network setup to just work.

Thanks,
    J
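
For reference, a minimal sketch of the "turn off everything" experiment suggested above. The function name disable_offloads and the DRY_RUN switch are made up for illustration and are not part of ethtool. The feature list is an assumption: it maps the eight entries in the quoted "ethtool -k eth1" output to the short names that "ethtool -K" accepts. Not every driver supports every feature, so failures on individual settings are expected and are only reported.

```shell
#!/bin/sh
# Sketch only: switch off every offload feature on the given interfaces.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.

# Short names for "ethtool -K", one per line of the "ethtool -k" output
# quoted above (assumed mapping: rx/tx checksumming, scatter-gather,
# TSO, UFO, GSO, GRO, LRO).
FEATURES="rx tx sg tso ufo gso gro lro"

disable_offloads() {
    for ifname in "$@"; do
        for feature in $FEATURES; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                # Show what would be run without touching the interface.
                echo "ethtool -K $ifname $feature off"
            else
                # A driver may reject a feature it does not implement;
                # report it and carry on with the next one.
                ethtool -K "$ifname" "$feature" off ||
                    echo "could not change $feature on $ifname" >&2
            fi
        done
    done
}
```

In Dom0 this could be called with the interfaces named in the report, for example "disable_offloads eth0 peth0 vif1.0 xenbr1 eth1 vif1.1", first as a dry run and then with DRY_RUN=0 as root. The same function can be run inside the DomU with that guest's own interface names. If the problem disappears, re-enabling one feature at a time should narrow down which offload is responsible.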