From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
Subject: Fw: [Bug 201137] New: using traffic control with sfq cause kernel crash
Date: Sat, 15 Sep 2018 16:26:41 -0700
Message-ID: <20180915162641.00dd1050@xeon-e3>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
To: netdev@vger.kernel.org
Return-path:
Received: from mail-pf1-f172.google.com ([209.85.210.172]:39237 "EHLO mail-pf1-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725199AbeIPEre (ORCPT ); Sun, 16 Sep 2018 00:47:34 -0400
Received: by mail-pf1-f172.google.com with SMTP id j8-v6so5895240pff.6 for ; Sat, 15 Sep 2018 16:26:49 -0700 (PDT)
Received: from xeon-e3 (204-195-22-127.wavecable.com. [204.195.22.127]) by smtp.gmail.com with ESMTPSA id p4-v6sm14416707pfb.180.2018.09.15.16.26.48 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 15 Sep 2018 16:26:48 -0700 (PDT)
Sender: netdev-owner@vger.kernel.org
List-ID:



Begin forwarded message:

Date: Sat, 15 Sep 2018 08:43:09 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 201137] New: using traffic control with sfq cause kernel crash


https://bugzilla.kernel.org/show_bug.cgi?id=201137

            Bug ID: 201137
           Summary: using traffic control with sfq cause kernel crash
           Product: Networking
           Version: 2.5
    Kernel Version: 4.18.5
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: grafgrimm77@gmx.de
        Regression: No

Created attachment 278555
  --> https://bugzilla.kernel.org/attachment.cgi?id=278555&action=edit
kernel config

Copying data from this machine to another server (the protocol does not matter) causes a kernel crash when traffic control (tc) is set up with SFQ.
The machine has a Qualcomm Killer NIC:

  lspci | grep Killer
  03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)

I use traffic control with SFQ:

  tc qdisc add dev enp3s0 root handle 1: sfq
  tc qdisc show dev enp3s0

Now I try to copy a big file (124 GB, an image of a partition) to an NFS share on another Linux server (same kernel version). It does not matter whether the share is NFS, Samba, or anything else, and it does not matter whether I use cp or rsync.

The target share is, for example:

  grep base /proc/mounts
  jaguar.grafnetz:/base /mnt/base nfs4 rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7 0 0

When mounted, df shows this NFS share, called base, as:

  jaguar.grafnetz:/base 11718572032 6012592128 5705979904 52% /mnt/base

Now I use a simple cp command:

  cp big-fime.dd.image /mnt/base/test_01

The machine crashes after 7833735168 bytes have reached the target server, about 7.8 GB (with G = 1000^3). I can reproduce this crash.

The good news: I found that no kernel crash happens when I do not use:

  tc qdisc add dev enp3s0 root handle 1: sfq
  tc qdisc show dev enp3s0

(So I commented these lines out of my local start script and rebooted the system.)

Result: no more crashes. Copying the big file (124 GB) completed without a kernel crash.

Additional information...
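As a quick cross-check of the crash point, the sketch below converts the reported byte count into decimal gigabytes (G = 1000^3) and binary gibibytes (Gi = 1024^3). The helper name bytes_to_units is hypothetical, and the script assumes only a POSIX sh and awk.

```sh
#!/bin/sh
# Convert a byte count to decimal GB (1000^3 bytes) and binary GiB (1024^3 bytes).
# bytes_to_units is a hypothetical helper used only for this illustration.
bytes_to_units() {
  awk -v b="$1" 'BEGIN { printf "%.2f GB / %.2f GiB\n", b / 1e9, b / 1073741824 }'
}

# Bytes that reached the target server before the crash, as stated in the report.
bytes_to_units 7833735168
# prints: 7.83 GB / 7.30 GiB
```

So the crash point is roughly 7.8 GB in decimal units, or about 7.3 GiB in binary units.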
The NIC is configured with IPv4:

  haswell ~ # ifconfig
  enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
          inet 192.168.0.9  netmask 255.255.255.0  broadcast 192.168.0.255
          ether d4:3d:7e:bd:89:44  txqueuelen 1000  (Ethernet)
          RX packets 7399483  bytes 511559908 (487.8 MiB)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 91781850  bytes 47176316774 (43.9 GiB)
          TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
          device interrupt 19

  ethtool enp3s0
  Settings for enp3s0:
          Supported ports: [ TP ]
          Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
          Supported pause frame use: Symmetric Receive-only
          Supports auto-negotiation: Yes
          Supported FEC modes: Not reported
          Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
          Advertised pause frame use: Symmetric
          Advertised auto-negotiation: Yes
          Advertised FEC modes: Not reported
          Speed: 1000Mb/s
          Duplex: Full
          Port: Twisted Pair
          PHYAD: 0
          Transceiver: internal
          Auto-negotiation: on
          MDI-X: Unknown
          Current message level: 0x000060e4 (24804)
                                 link ifup rx_err tx_err hw wol
          Link detected: yes

While copying over the gigabit network, the speed is close to the maximum:

  ifstat enp2s0
     KB/s in  KB/s out
        0.06      0.18
     8348.65     31.60
    117536.2    435.11
    118049.0    435.04
    119100.9    434.84
    118889.7    435.19
    119004.1    444.53
    119061.4    440.47
    119102.8    444.04
    119077.4    444.39
    119084.1    432.32
    119089.6    439.71
  [...]

So perhaps the sfq kernel module has a bug. I use the vanilla kernel from kernel.org, and sfq is compiled as a module:

  /usr/src/linux # grep SFQ .config
  CONFIG_NET_SCH_SFQ=m

Perhaps important: the server with the target share also uses sfq with the same settings without any problem. It runs stably.

-- 
You are receiving this mail because:
You are the assignee for the bug.
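To put the ifstat figures in context, the sketch below converts one steady-state "KB/s in" sample to megabits per second. The report does not say whether ifstat's KB means 1000 or 1024 bytes, so both interpretations are printed; the helper name rate_to_mbit is hypothetical.

```sh
#!/bin/sh
# Convert an ifstat "KB/s" reading to Mbit/s under both KB interpretations.
# Assumption: the unit of ifstat's KB is unknown here, so 1000 B and 1024 B are shown.
# rate_to_mbit is a hypothetical helper used only for this illustration.
rate_to_mbit() {
  awk -v kb="$1" 'BEGIN {
    printf "%.0f Mbit/s (KB = 1000 B), %.0f Mbit/s (KB = 1024 B)\n",
           kb * 8000 / 1e6, kb * 8192 / 1e6
  }'
}

# A steady-state "KB/s in" sample from the ifstat output in the report.
rate_to_mbit 119100.9
# prints: 953 Mbit/s (KB = 1000 B), 976 Mbit/s (KB = 1024 B)
```

Either way the transfer runs at roughly 95 to 98 percent of the 1000Mb/s link speed that ethtool reports, which matches the statement that the copy is close to the maximum speed.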