Subject: Re: [PATCH net-next] r8169: make use of xmit_more
To: Holger Hoffstätte, Realtek linux nic maintainers, David Miller
Cc: "netdev@vger.kernel.org", Sander Eikelenboom, Eric Dumazet
References: <2950b2f7-7460-cce0-d964-ad654d897295@gmail.com> <868a1f4c-5fba-c64b-ea31-30a3770e6a2f@applied-asynchrony.com>
From: Heiner Kallweit
Date: Thu, 8 Aug 2019 22:08:08 +0200
In-Reply-To: <868a1f4c-5fba-c64b-ea31-30a3770e6a2f@applied-asynchrony.com>
X-Mailing-List: netdev@vger.kernel.org

On 08.08.2019 21:52, Holger Hoffstätte wrote:
> On 8/8/19 8:17 PM, Heiner Kallweit wrote:
>> On 08.08.2019 17:53, Holger Hoffstätte wrote:
>>> On 8/8/19 4:37 PM, Holger Hoffstätte wrote:
>>>>
>>>> Hello Heiner -
>>>>
>>>> On 7/28/19 11:25 AM, Heiner Kallweit wrote:
>>>>> There was a previous attempt to use xmit_more, but the change had to be
>>>>> reverted because under load sometimes a transmit timeout occurred [0].
>>>>> Maybe this was caused by a missing memory barrier, the new attempt
>>>>> keeps the memory barrier before the call to netif_stop_queue like it
>>>>> is used by the driver as of today. The new attempt also changes the
>>>>> order of some calls as suggested by Eric.
>>>>>
>>>>> [0] https://lkml.org/lkml/2019/2/10/39
>>>>>
>>>>> Signed-off-by: Heiner Kallweit
>>>>
>>>> I decided to take one for the team and merged this into my 5.2.x tree (just
>>>> fixing up the path) and it has been working fine for the last 2 weeks in two
>>>> machines..until today, when for the first time in forever some random NFS traffic
>>>> made this old friend come out from under the couch:
>>>>
>>>> [Aug 8 14:13] ------------[ cut here ]------------
>>>> [  +0.000006] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
>>>> [  +0.000021] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x21f/0x230
>>>> [  +0.000001] Modules linked in: lz4 lz4_compress lz4_decompress nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel btrfs xor zstd_compress raid6_pq zstd_decompress bfq jitterentropy_rng nct6775 hwmon_vid coretemp hwmon x86_pkg_temp_thermal aesni_intel aes_x86_64 i915 glue_helper crypto_simd cryptd i2c_i801 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea usbhid sysfillrect r8169 sysimgblt fb_sys_fops realtek drm libphy drm_panel_orientation_quirks i2c_core video backlight mq_deadline
>>>> [  +0.000026] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.7 #1
>>>> [  +0.000001] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
>>>> [  +0.000004] RIP: 0010:dev_watchdog+0x21f/0x230
>>>> [  +0.000002] Code: 3b 00 75 ea eb ad 4c 89 ef c6 05 1c 45 bd 00 01 e8 66 35 fc ff 44 89 e1 4c 89 ee 48 c7 c7 e8 5e fc 81 48 89 c2 e8 90 df 92 ff <0f> 0b eb 8e 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 66 66 66 90
>>>> [  +0.000002] RSP: 0018:ffffc90000118e68 EFLAGS: 00010286
>>>> [  +0.000002] RAX: 0000000000000000 RBX: ffff8887f7837600 RCX: 0000000000000303
>>>> [  +0.000001] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff827a488c
>>>> [  +0.000001] RBP: ffff8887f9fbc440 R08: 0000000000000303 R09: 0000000000000003
>>>> [  +0.000001] R10: 000000000001004c R11: 0000000000000001 R12: 0000000000000000
>>>> [  +0.000009] R13: ffff8887f9fbc000 R14: ffffffff8173aa20 R15: dead000000000200
>>>> [  +0.000001] FS:  0000000000000000(0000) GS:ffff8887ff580000(0000) knlGS:0000000000000000
>>>> [  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  +0.000001] CR2: 00007f8d1c04d000 CR3: 0000000002209001 CR4: 00000000000606e0
>>>> [  +0.000000] Call Trace:
>>>> [  +0.000002]
>>>> [  +0.000005]  call_timer_fn+0x2b/0x120
>>>> [  +0.000002]  expire_timers+0xa4/0x100
>>>> [  +0.000001]  run_timer_softirq+0x8c/0x170
>>>> [  +0.000002]  ? __hrtimer_run_queues+0x13a/0x290
>>>> [  +0.000003]  ? sched_clock_cpu+0xe/0x130
>>>> [  +0.000003]  __do_softirq+0xeb/0x2de
>>>> [  +0.000003]  irq_exit+0x9d/0xe0
>>>> [  +0.000002]  smp_apic_timer_interrupt+0x60/0x110
>>>> [  +0.000003]  apic_timer_interrupt+0xf/0x20
>>>> [  +0.000001]
>>>> [  +0.000003] RIP: 0010:cpuidle_enter_state+0xad/0x930
>>>> [  +0.000001] Code: c5 66 66 66 66 90 31 ff e8 90 99 9e ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 39 08 00 00 31 ff e8 e7 26 a2 ff fb 45 85 e4 <0f> 88 34 02 00 00 49 63 cc 4c 2b 2c 24 48 8d 04 49 48 c1 e0 05 8b
>>>> [  +0.000000] RSP: 0018:ffffc9000008be50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>>>> [  +0.000001] RAX: ffff8887ff5a9180 RBX: ffffffff822b6c40 RCX: 000000000000001f
>>>> [  +0.000001] RDX: 0000000000000000 RSI: 0000000033087154 RDI: 0000000000000000
>>>> [  +0.000001] RBP: ffff8887ff5b1310 R08: 000030d021fae397 R09: ffff8887ff59c8c0
>>>> [  +0.000000] R10: ffff8887ff59c8c0 R11: 0000000000000006 R12: 0000000000000004
>>>> [  +0.000001] R13: 000030d021fae397 R14: 0000000000000004 R15: ffff8887fc281600
>>>> [  +0.000001]  cpuidle_enter+0x29/0x40
>>>> [  +0.000002]  do_idle+0x1e5/0x280
>>>> [  +0.000001]  cpu_startup_entry+0x19/0x20
>>>> [  +0.000002]  start_secondary+0x186/0x1c0
>>>> [  +0.000001]  secondary_startup_64+0xa4/0xb0
>>>> [  +0.000001] ---[ end trace 99493c768580f4fd ]---
>>>>
>>>> The device is:
>>>>
>>>> Aug  7 23:19:09 tux kernel: libphy: r8169: probed
>>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c9, IRQ 36
>>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>>>> Aug  7 23:19:12 tux kernel: RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
>>>> Aug  7 23:19:13 tux kernel: r8169 0000:04:00.0 eth0: No native access to PCI extended config space, falling back to CSI
>>>>
>>>> and using fq_codel, of course.
>>>>
>>>> This cpuidle hiccup used to be completely gone without xmit_more and this was
>>>> the first (and so far only) time since merging it (regardless of load).
>>>> Also, while I'm using BMQ as CPU scheduler, that hasn't made a difference for
>>>> this particular problem in the past (with MuQSS/PDS) either; way back when I had
>>>> Eric's previous attempt(s) it also hiccupped with CFS.
>>>>
>>>> Revert or wait for more reports when -next is merged in 5.4?
>>>
>>> Another question/data point: I've had the whole basket of offloads activated:
>>>
>>>    ethtool --offload eth0 rx on tx on gro on gso on sg on tso on
>>>
>>> and this caused zero problems without the xmit_more patch. However I just saw
>>> that net-next has a patch where TSO is disabled due to a known HW defect in
>>> RTL8168evl, which is of course what I have. Could this be the reason for the
>>> stall/hiccup when xmit_more has its fingers in the pie? I kind of know what
>>> xmit_more does, just not how it could interact with a possibly broken TSO that
>>> nevertheless seems to work fine otherwise..
>>>
>>
>> I was about to ask exactly that, whether you have TSO enabled. I don't know what
>> can trigger the HW issue, it was just confirmed by Realtek that this chip version
>> has a problem with TSO. So the logical conclusion is: test w/o TSO, ideally the
>> linux-next version.
>
> So disabling TSO alone didn't work - it leads to reduced throughput (~70 MB/s in iperf).
> Instead I decided to backport 93681cd7d94f ("r8169: enable HW csum and TSO"), which
> wasn't easy due to cleanups/renamings of dependencies, but I managed to backport
> it and .. got the same problem of reduced throughput. wat?!
>
> After lots of trial & error I started disabling all offloads and finally found
> that sg (Scatter-Gather) enabled alone - without TSO - will lead to the throughput
> drop. So the culprit seems to be 93681cd7d94f, which disabled TSO on my NIC, but left
> sg on by default. This was repeatable - switch on sg, throughput drop; turn it
> off - smooth sailing, now with reduced buffers.
>
> I modified the relevant bits to disable tso & sg like this:
>
>     /* RTL8168e-vl has a HW issue with TSO */
>     if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
> +        dev->vlan_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
> +        dev->hw_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
> +        dev->features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
>     }
>
> This seems to work since it restores performance without sg/tso by default
> and without any additional offloads, yet with xmit_more in the mix.
> We'll see whether that is stable over the next few days, but I strongly
> suspect it will be good and that the hiccups were due to xmit_more/TSO
> interaction.
>

Thanks a lot for the analysis and testing. Then I'll submit the disabling
of SG on RTL8168evl (on your behalf), independent of whether it fixes
the timeout issue.

> thanks,
> Holger
>

Heiner
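
For anyone following along, here is a minimal userspace sketch of what the three "&= ~(...)" lines in the quoted hunk do. It is not driver code: the SKETCH_F_* bit values, struct sketch_netdev and sketch_disable_sg_tso() are hypothetical stand-ins made up for illustration, and the real NETIF_F_* definitions (including the full set of bits behind NETIF_F_ALL_TSO) live in the kernel headers.

```c
#include <stdint.h>

/* Placeholder bit positions, chosen for illustration only. They are NOT
 * the kernel's NETIF_F_* values. */
#define SKETCH_F_SG       (UINT64_C(1) << 0)
#define SKETCH_F_HW_CSUM  (UINT64_C(1) << 1)
#define SKETCH_F_TSO      (UINT64_C(1) << 2)
#define SKETCH_F_TSO6     (UINT64_C(1) << 3)
#define SKETCH_F_ALL_TSO  (SKETCH_F_TSO | SKETCH_F_TSO6)

/* Hypothetical stand-in for the three feature masks of a net_device. */
struct sketch_netdev {
	uint64_t features;      /* currently active features */
	uint64_t hw_features;   /* features the user may toggle */
	uint64_t vlan_features; /* features inherited by VLAN devices */
};

/* Clear SG and every TSO bit in all three masks; other bits are untouched. */
static void sketch_disable_sg_tso(struct sketch_netdev *dev)
{
	const uint64_t mask = SKETCH_F_ALL_TSO | SKETCH_F_SG;

	dev->vlan_features &= ~mask;
	dev->hw_features &= ~mask;
	dev->features &= ~mask;
}
```

Starting from a device with SG, checksum offload and TSO set in all three masks, calling sketch_disable_sg_tso() leaves only the checksum bit. Clearing the bits in hw_features as well as features matters: hw_features is the set a user may toggle, so with the bits gone there the offloads can no longer be switched back on via ethtool.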