* Fw: [Bug 201137] New: using traffic control with sfq cause kernel crash
@ 2018-09-15 23:26 Stephen Hemminger
From: Stephen Hemminger @ 2018-09-15 23:26 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Sat, 15 Sep 2018 08:43:09 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 201137] New: using traffic control with sfq cause kernel crash


https://bugzilla.kernel.org/show_bug.cgi?id=201137

            Bug ID: 201137
           Summary: using traffic control with sfq cause kernel crash
           Product: Networking
           Version: 2.5
    Kernel Version: 4.18.5
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: grafgrimm77@gmx.de
        Regression: No

Created attachment 278555
  --> https://bugzilla.kernel.org/attachment.cgi?id=278555&action=edit  
kernel config

Copying data from this machine to another server (the protocol does not matter)
causes a kernel crash when SFQ is configured with tc.

The machine has a Qualcomm Killer NIC: lspci | grep Killer
03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet
Controller (rev 13)

I use traffic control with SFQ: 
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0

Now I try to copy a big file (124 GB, an image of a partition) to an NFS share
on another Linux server (same kernel version). It does not matter whether the
share is NFS, Samba, or anything else, nor whether I use cp or rsync.

The target-share is for example:
grep base /proc/mounts
jaguar.grafnetz:/base /mnt/base nfs4 rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7 0 0

df shows this NFS share, called base, when mounted:
jaguar.grafnetz:/base   11718572032 6012592128 5705979904   52% /mnt/base

Now I use a simple cp command:
cp big-fime.dd.image /mnt/base/test_01
The machine crashes after 7833735168 bytes have reached the target server,
about 7.8 GB (with G = 1000^3).
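
For reference, the reported byte count can be converted into decimal and binary
units. This is only an arithmetic sketch; the helper name to_units is made up
for illustration and is not part of the original report.

```python
# Convert the byte count from the report into decimal (GB) and binary (GiB) units.
CRASH_BYTES = 7833735168  # bytes that reached the target server before the crash

def to_units(n_bytes):
    """Return (decimal gigabytes, binary gibibytes) for a byte count."""
    return n_bytes / 1000**3, n_bytes / 1024**3

gb, gib = to_units(CRASH_BYTES)
print(f"{gb:.2f} GB, {gib:.2f} GiB")
```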

I can reproduce this crash. 

The good thing is: I figured out that no kernel crash happens when I do not
use:
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0
(So I commented it out of my local start script and rebooted the system.)
Result: no more crashes. Copying the big file (124 GB) completed without a
kernel crash.
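
The two configurations compared above (SFQ as root qdisc versus no explicit
qdisc) could also be toggled without a reboot. The following is a minimal
sketch, assuming root privileges and the iproute2 tc binary in PATH; the helper
names sfq_commands and apply are hypothetical, and by default the script only
prints the commands instead of running them.

```python
import shlex
import subprocess

DEV = "enp3s0"  # interface name taken from the report

def sfq_commands(dev, enable):
    """Build tc argv lists that add or remove SFQ as the root qdisc."""
    if enable:
        change = ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:", "sfq"]
    else:
        # Deleting the root qdisc puts the system default qdisc back in place.
        change = ["tc", "qdisc", "del", "dev", dev, "root"]
    show = ["tc", "-s", "qdisc", "show", "dev", dev]  # -s adds statistics
    return [change, show]

def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

apply(sfq_commands(DEV, enable=False))  # dry run: prints the workaround commands
```

Calling apply(sfq_commands(DEV, enable=True), dry_run=False) as root would
restore the setup that triggers the crash in this report.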

Additional Information...

NIC is configured with IPv4:
haswell ~ # ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.9  netmask 255.255.255.0  broadcast 192.168.0.255
        ether d4:3d:7e:bd:89:44  txqueuelen 1000  (Ethernet)
        RX packets 7399483  bytes 511559908 (487.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91781850  bytes 47176316774 (43.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  

ethtool enp3s0
Settings for enp3s0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Current message level: 0x000060e4 (24804)
                               link ifup rx_err tx_err hw wol
        Link detected: yes

While copying over the Gigabit-Network, speed is near maximum:

ifstat
      enp2s0      
 KB/s in  KB/s out
    0.06      0.18
 8348.65     31.60
117536.2    435.11
118049.0    435.04
119100.9    434.84
118889.7    435.19
119004.1    444.53
119061.4    440.47
119102.8    444.04
119077.4    444.39
119084.1    432.32
119089.6    439.71
[...]
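
As a rough check of "near maximum", the steady-state samples from the first
ifstat column above can be compared with the 1000Mb/s link speed reported by
ethtool. This is a sketch under the assumption that ifstat's KB means 1024
bytes; the 1000-byte interpretation is computed as well. The function name
utilization is hypothetical.

```python
LINK_BPS = 1_000_000_000  # 1000Mb/s, as reported by ethtool

# Steady-state samples (KB/s) copied from the first ifstat column above.
samples_kb_s = [117536.2, 118049.0, 119100.9, 118889.7, 119004.1,
                119061.4, 119102.8, 119077.4, 119084.1, 119089.6]

def utilization(kb_per_s, bytes_per_kb=1024):
    """Fraction of the link rate used by a throughput given in KB/s."""
    return kb_per_s * bytes_per_kb * 8 / LINK_BPS

mean_kb_s = sum(samples_kb_s) / len(samples_kb_s)
print(f"mean: {mean_kb_s:.1f} KB/s")
print(f"utilization (KB = 1024 B): {utilization(mean_kb_s):.1%}")
print(f"utilization (KB = 1000 B): {utilization(mean_kb_s, 1000):.1%}")
```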

So perhaps the sfq kernel module has a bug. I use the vanilla kernel from
kernel.org, and sfq is compiled as a module.

/usr/src/linux # grep SFQ .config
CONFIG_NET_SCH_SFQ=m

Perhaps important: the server with the target-share also uses sfq with the same
settings without a problem. It runs stable.

-- 
You are receiving this mail because:
You are the assignee for the bug.
