Date: Fri, 6 May 2016 17:28:55 +0100
From: Stefan Hajnoczi
To: Ingo Krabbe
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com
Message-ID: <20160506162855.GC23075@stefanha-x1.localdomain>
References: <20160505174203.GC14181@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] TCP Segmentation Offloading

On Fri, May 06, 2016 at 06:34:33AM +0200, Ingo Krabbe wrote:
> > On Sun, May 01, 2016 at 02:31:57PM +0200, Ingo Krabbe wrote:
> >> Good Mayday Qemu Developers,
> >>
> >> today I tried to find a reference to a networking problem that seems to be
> >> of quite general nature: TCP Segmentation Offloading (TSO) in virtual
> >> environments.
> >>
> >> When I set up a TAP network adapter for a virtual machine and put it into
> >> a host bridge, the known best practice is to manually set "tso off gso off"
> >> with ethtool: for the guest driver if I use a hardware emulation such as
> >> e1000, and/or "tso off gso off" for the host driver and/or for the bridge
> >> adapter if I use the virtio driver, as otherwise you experience
> >> (sometimes?) performance problems or even lost packets.
> >
> > I can't parse this sentence.  In what cases do you think it's a "known
> > best practice" to disable tso and gso?  Maybe a table would be a clearer
> > way to communicate this.
> >
> > Can you provide a link to the source claiming tso and gso should be
> > disabled?
>
> Sorry for that long sentence. The consequence seems to be that it is most
> stable to turn off tso and gso for host bridges and for adapters in virtual
> machines.
>
> One of the most comprehensive collections of arguments is this article
>
> https://kris.io/2015/10/01/kvm-network-performance-tso-and-gso-turn-it-off/
>
> while I also found a documentation for CentOS 6
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch10s04.html

This documentation is about (ancient) RHEL 3.9 guests.  I would not
apply anything on that page to modern Linux distro releases without
re-checking.

>
> On Google Code this one is discussed
>
> https://code.google.com/p/ganeti/wiki/PerformanceTuning
>
> Of course the same is found for Xen machines
>
> http://cloudnull.io/2012/07/xenserver-network-tuning/
>
> You see there are several links on the internet, and my first question is:
> Why can't I find this discussion in the qemu-wiki space?
>
> I think the bug
>
> https://bugs.launchpad.net/bugs/1202289
>
> is related.

Thanks for posting all the links!  I hope Michael and/or Jason explain
the current status for RHEL 6/7 and other modern distros.
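For anyone who wants to experiment, the "turn it off" advice in those
articles boils down to a few ethtool invocations.  This is a sketch, not
a recommendation; the interface names (br0, vnet0, ens3) are placeholders
for whatever your own setup uses, and the commands need root:

```shell
# Inspect the current offload settings of an interface.
ethtool -k br0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'

# Disable TSO and GSO on the host bridge and on the tap device,
# as the linked articles suggest for problematic setups:
ethtool -K br0 tso off gso off
ethtool -K vnet0 tso off gso off

# Inside the guest, the same command applies to the emulated NIC,
# e.g. an e1000 device that shows up as ens3:
ethtool -K ens3 tso off gso off
```

Note that these settings do not survive a reboot; they would have to be
reapplied from an ifup hook or equivalent if they actually help.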
Maybe they can also follow up with the kris.io blog author if an update
to the post is necessary.

TSO/GSO is enabled by default on my Fedora and RHEL hosts/guests.  If it
was a best practice for those distros I'd expect the default settings to
reflect that.

Also, I would be surprised if the offload features were bad, since work
was put into supporting and extending them in virtio-net over the years.

> >> I haven't found a complete analysis of the background of these problems,
> >> but there seem to be some effects on MTU based fragmentation and UDP
> >> checksums.
> >>
> >> There is a tso related bug on launchpad, but the context of this bug is
> >> too narrow for the generality of the problem.
> >>
> >> Also it seems that there is a problem in LXC contexts too (I found such a
> >> reference, without detailed description, in a post about a Xen setup).
> >>
> >> My question now is: Is there a bug in the driver code, and shouldn't this
> >> be documented somewhere in wiki.qemu.org? Were there developments about
> >> this topic in the past, or is there any planned/ongoing work to do on the
> >> qemu drivers?
> >>
> >> Most problem reports found relate to deprecated CentOS 6 qemu-kvm
> >> packages.
> >>
> >> In our company we have similar or even worse problems with CentOS 7 hosts
> >> and guest machines.
> >
> > You haven't explained what problem you are experiencing.  If you want
> > help with your setup please include your QEMU command-line (ps aux |
> > grep qemu), the traffic pattern (ideally how to reproduce it with a
> > benchmarking tool), and what observation you are making (e.g. netstat
> > counters showing dropped packets).
>
> I was quite astonished about the many hints about virtio drivers, as we had
> this problem with the e1000 driver in a CentOS 7 guest on a CentOS 6 host.
>
> e1000 0000:00:03.0 ens3: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <42>
>   TDT                  <42>
>   next_to_use          <2e>
>   next_to_clean        <42>
> buffer_info[next_to_clean]
>   time_stamp           <104aff1b8>
>   next_to_watch        <44>
>   jiffies              <104b00ee9>
>   next_to_watch.status <0>
> Apr 25 21:08:48 db03 kernel: ------------[ cut here ]------------
> Apr 25 21:08:48 db03 kernel: WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x270/0x280()
> Apr 25 21:08:48 db03 kernel: NETDEV WATCHDOG: ens3 (e1000): transmit queue 0 timed out
> Apr 25 21:08:48 db03 kernel: Modules linked in: binfmt_misc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btrfs zlib_deflate raid6_pq xor ext4 mbcache jbd2 crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper i2c_piix4 ppdev cryptd pcspkr virtio_balloon parport_pc parport sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm drm crct10dif_pclmul crct10dif_common ata_piix crc32c_intel virtio_pci e1000 i2c_core virtio_ring libata serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
> Apr 25 21:08:48 db03 kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.13.1.el7.x86_64 #1
> Apr 25 21:08:48 db03 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> Apr 25 21:08:48 db03 kernel: ffff88126f483d88 685d892e8a452abb ffff88126f483d40 ffffffff8163571c
> Apr 25 21:08:48 db03 kernel: ffff88126f483d78 ffffffff8107b200 0000000000000000 ffff881203b9a000
> Apr 25 21:08:48 db03 kernel: ffff881201c3e080 0000000000000001 0000000000000002 ffff88126f483de0
> Apr 25 21:08:48 db03 kernel: Call Trace:
> Apr 25 21:08:48 db03 kernel: [] dump_stack+0x19/0x1b
> Apr 25 21:08:48 db03 kernel: [] warn_slowpath_common+0x70/0xb0
> Apr 25 21:08:48 db03 kernel: [] warn_slowpath_fmt+0x5c/0x80
> Apr 25 21:08:48 db03 kernel: [] dev_watchdog+0x270/0x280
> Apr 25 21:08:48 db03 kernel: [] ? dev_graft_qdisc+0x80/0x80
> Apr 25 21:08:48 db03 kernel: [] call_timer_fn+0x36/0x110
> Apr 25 21:08:48 db03 kernel: [] ? dev_graft_qdisc+0x80/0x80
> Apr 25 21:08:48 db03 kernel: [] run_timer_softirq+0x237/0x340
> Apr 25 21:08:48 db03 kernel: [] __do_softirq+0xef/0x280
> Apr 25 21:08:48 db03 kernel: [] call_softirq+0x1c/0x30
> Apr 25 21:08:48 db03 kernel: [] do_softirq+0x65/0xa0
> Apr 25 21:08:48 db03 kernel: [] irq_exit+0x115/0x120
> Apr 25 21:08:48 db03 kernel: [] smp_apic_timer_interrupt+0x45/0x60
> Apr 25 21:08:48 db03 kernel: [] apic_timer_interrupt+0x6d/0x80
> Apr 25 21:08:48 db03 kernel: [] ? native_safe_halt+0x6/0x10
> Apr 25 21:08:48 db03 kernel: [] default_idle+0x1f/0xc0
> Apr 25 21:08:48 db03 kernel: [] arch_cpu_idle+0x26/0x30
> Apr 25 21:08:48 db03 kernel: [] cpu_startup_entry+0x245/0x290
> Apr 25 21:08:48 db03 kernel: [] start_secondary+0x1ba/0x230
> Apr 25 21:08:48 db03 kernel: ---[ end trace 71ac4360272e207e ]---
> Apr 25 21:08:48 db03 kernel: e1000 0000:00:03.0 ens3: Reset adapter
>
>
> I'm still not sure why this happens on this host "db03", while db02 and db01
> are not affected. All guests are running on different hosts and the network
> is controlled by an openvswitch.

This looks interesting.  It could be a bug in QEMU's e1000 NIC
emulation.  Maybe it has already been fixed in qemu.git but I didn't see
any relevant commits.

Please post the RPM version numbers you are using (rpm -qa | grep qemu
in host, rpm -qa | grep kernel in host).

The e1000 driver can print additional information (to dump the contents
of the tx ring).  Please increase your kernel's log level to collect
that information:

# echo 8 >/proc/sys/kernel/printk

The tx ring dump may allow someone to figure out why the packet caused
tx to stall.
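The requested information can be gathered in one pass.  A sketch of such
a collection script, assuming an RPM-based distro and root access (the
output file path is just an example):

```shell
# Collect the version and log information requested above.

# QEMU and kernel package versions on the host:
rpm -qa | grep qemu
rpm -qa | grep kernel

# The running QEMU command line for the affected guest
# (the [q] trick keeps grep from matching itself):
ps aux | grep '[q]emu'

# Raise the console log level so the e1000 driver's tx ring dump
# is printed the next time the hang occurs:
echo 8 > /proc/sys/kernel/printk

# After reproducing the hang, capture the kernel messages:
dmesg > /tmp/e1000-hang-dmesg.txt
```

The printk setting resets on reboot, so it should be reapplied before
each attempt to reproduce the hang.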
Stefan