Netdev List
* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-08-29 18:13 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <adc2fae5-e22b-3a74-d531-01570e7970ee@virtuozzo.com>

Hi Kirill,

Thanks for the question!

On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> Hi, Christian,
> 
> On 29.08.2018 02:18, Christian Brauner wrote:
> > From: Christian Brauner <christian@brauner.io>
> > 
> > Hey,
> > 
> > A while back we introduced and enabled IFLA_IF_NETNSID in
> > RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> > to significant performance increases since it allows userspace to avoid
> > taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> > interfaces from the netns associated with the netns_fd. Especially when a
> > lot of network namespaces are in use, using setns() becomes increasingly
> > problematic when performance matters.
> 
> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> problems with the performance? You should do this only once on application
> startup, and then you have created netlink sockets in any net namespaces you
> need. What is the problem here?

So we have a daemon (LXD) that often runs thousands of containers.
When users issue an "lxc list" request against the daemon, it returns a
list of all containers, including all of the interfaces and addresses
for each container. To retrieve those addresses we currently rely on
setns() + getifaddrs() for each of those containers. That has horrible
performance.

The problem with what you're proposing is that the daemon would need to
cache a socket file descriptor for each container. Unfortunately we
cannot do that: caching file descriptors excessively means we can easily
hit the open file limit. We also refrain from caching file descriptors
for a long time for security reasons.

For the case where users just request a list of the interfaces we
can already use RTM_GETLINK + IFLA_IF_NETNSID, which has way better
performance. But we can't do the same with RTM_GETADDR requests, which
was an oversight on my part when I wrote the original patchset for the
RTM_*LINK requests. This series rectifies that and aligns RTM_GETLINK
and RTM_GETADDR.

Based on this patchset I have written a userspace POC that is basically
a network namespace aware getifaddrs() or - as I like to call it -
netns_getifaddr().
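
To make the request shape concrete, here is a minimal sketch of how such
an RTM_GETADDR dump request could be laid out in userspace. It is not the
POC itself. The struct and function names are made up for illustration,
the netnsid is assumed to be a signed 32-bit value (mirroring
IFLA_IF_NETNSID), and the fallback attribute number is only a
placeholder; the real value has to come from the patched
include/uapi/linux/if_addr.h.

```c
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: headers without this series do not define the attribute.
 * The number below is a placeholder for illustration only; use the value
 * from the patched uapi header in real code. */
#ifndef IFA_IF_NETNSID
#define IFA_IF_NETNSID 10
#endif

/* Hypothetical request layout: header, ifaddrmsg, one s32 attribute. */
struct getaddr_req {
	struct nlmsghdr nlh;
	struct ifaddrmsg ifa;
	char attrs[RTA_SPACE(sizeof(int32_t))];
};

/* Fill in an RTM_GETADDR dump request that targets the network namespace
 * identified by netnsid. Returns the total message length in bytes. */
static size_t build_getaddr_req(struct getaddr_req *req, int32_t netnsid,
				uint32_t seq)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifa));
	req->nlh.nlmsg_type = RTM_GETADDR;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifa.ifa_family = AF_UNSPEC; /* dump IPv4 and IPv6 addresses */

	/* Append the netnsid attribute right after the aligned ifaddrmsg. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = IFA_IF_NETNSID;
	rta->rta_len = RTA_LENGTH(sizeof(netnsid));
	memcpy(RTA_DATA(rta), &netnsid, sizeof(netnsid));

	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) +
			     RTA_ALIGN(rta->rta_len);
	return req->nlh.nlmsg_len;
}
```

The buffer would then be sent over a single NETLINK_ROUTE socket opened
in the daemon's own network namespace, so no setns() and no per-container
socket is needed.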

> 
> > Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf.
> > getifaddrs() style functions and friends). But currently, RTM_GETADDR
> > requests do not support a property similar to IFLA_IF_NETNSID for
> > RTM_*LINK requests.
> > This is problematic since userspace can retrieve interfaces from another
> > network namespace by sending an IFLA_IF_NETNSID property along with an
> > RTM_GETLINK request but is still forced to use the legacy setns() style of
> > retrieving interfaces in RTM_GETADDR requests.
> > 
> > The goal of this series is to make it possible to perform RTM_GETADDR
> > requests on different network namespaces. To this end a new IFA_IF_NETNSID
> > property for RTM_*ADDR requests is introduced. It can be used to send a
> > network namespace identifier along in RTM_*ADDR requests.  The network
> > namespace identifier will be used to retrieve the target network namespace
> > in which the request is supposed to be fulfilled.  This aligns the behavior
> > of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> > 
> > Security:
> > - The caller must have assigned a valid network namespace identifier for
> >   the target network namespace.
> > - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> >   target network namespace.
> > 
> > Thanks!
> > Christian
> > 
> > [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> > [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> > [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> > [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> > [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> > 
> > Christian Brauner (5):
> >   rtnetlink: add rtnl_get_net_ns_capable()
> >   if_addr: add IFA_IF_NETNSID
> >   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> >   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> >   rtnetlink: move type calculation out of loop
> > 
> >  include/net/rtnetlink.h      |  1 +
> >  include/uapi/linux/if_addr.h |  1 +
> >  net/core/rtnetlink.c         | 15 +++++---
> >  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> >  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> >  5 files changed, 97 insertions(+), 28 deletions(-)
> > 


* Re: [PATCH] Revert "net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit"
From: Martin Blumenstingl @ 2018-08-29 18:05 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: jbrunet, peppe.cavallaro, alexandre.torgue, netdev, linux-amlogic,
	Joao.Pinto, clabbe, linux-kernel, Vitor.Soares
In-Reply-To: <CAFBinCA+Q6vLzutdMLOgeTDg5GBkqvp3YzBQiUd_y-qyiz=dsQ@mail.gmail.com>

On Wed, Aug 29, 2018 at 8:03 PM Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
[snip]
> from that point on:
> # dmesg | tail -n 4
> [   47.782778] RTL8211F Gigabit Ethernet 0.2009087f:00: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=0.2009087f:00, irq=30)
> [   47.803540] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
> [   47.805505] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [   50.311975] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
I forgot to mention: once traffic stops flowing, no additional messages
appear in the kernel log, so the output of "dmesg | tail -n 4" is
identical before and after the test run


Regards
Martin


* Re: [PATCH] Revert "net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit"
From: Martin Blumenstingl @ 2018-08-29 18:03 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: jbrunet, peppe.cavallaro, alexandre.torgue, netdev, linux-amlogic,
	Joao.Pinto, clabbe, linux-kernel, Vitor.Soares
In-Reply-To: <d3858ef0-97c7-28b5-db4a-4ac71af52ba5@synopsys.com>

Hi Jose,

On Tue, Aug 28, 2018 at 10:12 AM Jose Abreu <Jose.Abreu@synopsys.com> wrote:
>
> Hi Jerome,
>
> On 24-08-2018 10:04, Jerome Brunet wrote:
> > This reverts commit 4ae0169fd1b3c792b66be58995b7e6b629919ecf.
> >
> > This change in the handling of the coalesce timer is causing regression on
> > (at least) amlogic platforms.
> >
> > Network will break down very quickly (a few seconds) after starting
> > a download. This can easily be reproduced using iperf3 for example.
> >
> > The problem has been reported on the S805, S905, S912 and A113 SoCs
> > (Realtek and Micrel PHYs) and it is likely impacting all Amlogics
> > platforms using Gbit ethernet
> >
> > No problem was seen with the platform using 10/100 only PHYs (GXL internal)
> >
> > Reverting change brings things back to normal and allows to use network
> > again until we better understand the problem with the coalesce timer.
> >
> >
>
> Apologies for the delayed answer but I was in FTO.
I hope you were able to enjoy your time off

> I'm not sure what can be causing this but I have some questions
> for you:
>     - What do you mean by "network will break down"? Do you see
> queue timeout?
in the case of iperf3, traffic just stops

>     - What do you see in ethtool/ifconfig stats? Can you send me
> the stats before and after network break?
see below for my exact steps. let me know if you need more information
(but be prepared for some delay until I respond)

>     - Is your setup multi-queue/channel?
we don't specify anything in the .dtsi files of our platform, so the
"default" is being used

>     - Can you point me to the DT bindings of your setup?
in this specific case I used
arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts which inherits
the platform settings from arch/arm64/boot/dts/amlogic/meson-gx.dtsi
("ethmac" node)

so here are my exact steps to reproduce:
- boot the board
- IP address is auto-assigned via DHCP
- iperf3 -c 192.168.1.100
- wait until network breaks, CTRL+C
- ifconfig eth0 down
- ifconfig eth0 up
- (now Ethernet is working again)

from that point on:
# dmesg | tail -n 4
[   47.782778] RTL8211F Gigabit Ethernet 0.2009087f:00: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=0.2009087f:00, irq=30)
[   47.803540] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
[   47.805505] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
[   50.311975] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

# ethtool -S eth0
NIC statistics:
     mmc_tx_octetcount_gb: 0
     mmc_tx_framecount_gb: 0
     mmc_tx_broadcastframe_g: 0
     mmc_tx_multicastframe_g: 0
     mmc_tx_64_octets_gb: 0
     mmc_tx_65_to_127_octets_gb: 0
     mmc_tx_128_to_255_octets_gb: 0
     mmc_tx_256_to_511_octets_gb: 0
     mmc_tx_512_to_1023_octets_gb: 0
     mmc_tx_1024_to_max_octets_gb: 0
     mmc_tx_unicast_gb: 0
     mmc_tx_multicast_gb: 0
     mmc_tx_broadcast_gb: 0
     mmc_tx_underflow_error: 0
     mmc_tx_singlecol_g: 0
     mmc_tx_multicol_g: 0
     mmc_tx_deferred: 0
     mmc_tx_latecol: 0
     mmc_tx_exesscol: 0
     mmc_tx_carrier_error: 0
     mmc_tx_octetcount_g: 0
     mmc_tx_framecount_g: 0
     mmc_tx_excessdef: 0
     mmc_tx_pause_frame: 0
     mmc_tx_vlan_frame_g: 0
     mmc_rx_framecount_gb: 43
     mmc_rx_octetcount_gb: 3096
     mmc_rx_octetcount_g: 3096
     mmc_rx_broadcastframe_g: 38
     mmc_rx_multicastframe_g: 0
     mmc_rx_crc_error: 0
     mmc_rx_align_error: 0
     mmc_rx_run_error: 0
     mmc_rx_jabber_error: 0
     mmc_rx_undersize_g: 0
     mmc_rx_oversize_g: 0
     mmc_rx_64_octets_gb: 29
     mmc_rx_65_to_127_octets_gb: 13
     mmc_rx_128_to_255_octets_gb: 0
     mmc_rx_256_to_511_octets_gb: 1
     mmc_rx_512_to_1023_octets_gb: 0
     mmc_rx_1024_to_max_octets_gb: 0
     mmc_rx_unicast_g: 5
     mmc_rx_length_error: 0
     mmc_rx_autofrangetype: 0
     mmc_rx_pause_frames: 0
     mmc_rx_fifo_overflow: 0
     mmc_rx_vlan_frames_gb: 0
     mmc_rx_watchdog_error: 0
     mmc_rx_ipc_intr_mask: 1073692671
     mmc_rx_ipc_intr: 0
     mmc_rx_ipv4_gd: 15
     mmc_rx_ipv4_hderr: 0
     mmc_rx_ipv4_nopay: 0
     mmc_rx_ipv4_frag: 0
     mmc_rx_ipv4_udsbl: 0
     mmc_rx_ipv4_gd_octets: 1028
     mmc_rx_ipv4_hderr_octets: 0
     mmc_rx_ipv4_nopay_octets: 0
     mmc_rx_ipv4_frag_octets: 0
     mmc_rx_ipv4_udsbl_octets: 0
     mmc_rx_ipv6_gd_octets: 0
     mmc_rx_ipv6_hderr_octets: 0
     mmc_rx_ipv6_nopay_octets: 0
     mmc_rx_ipv6_gd: 0
     mmc_rx_ipv6_hderr: 0
     mmc_rx_ipv6_nopay: 0
     mmc_rx_udp_gd: 11
     mmc_rx_udp_err: 0
     mmc_rx_tcp_gd: 4
     mmc_rx_tcp_err: 0
     mmc_rx_icmp_gd: 0
     mmc_rx_icmp_err: 0
     mmc_rx_udp_gd_octets: 592
     mmc_rx_udp_err_octets: 0
     mmc_rx_tcp_gd_octets: 136
     mmc_rx_tcp_err_octets: 0
     mmc_rx_icmp_gd_octets: 0
     mmc_rx_icmp_err_octets: 0
     tx_underflow: 0
     tx_carrier: 0
     tx_losscarrier: 0
     vlan_tag: 0
     tx_deferred: 0
     tx_vlan: 0
     tx_jabber: 0
     tx_frame_flushed: 0
     tx_payload_error: 0
     tx_ip_header_error: 0
     rx_desc: 0
     sa_filter_fail: 0
     overflow_error: 0
     ipc_csum_error: 0
     rx_collision: 0
     rx_crc_errors: 0
     dribbling_bit: 0
     rx_length: 0
     rx_mii: 0
     rx_multicast: 0
     rx_gmac_overflow: 0
     rx_watchdog: 0
     da_rx_filter_fail: 0
     sa_rx_filter_fail: 0
     rx_missed_cntr: 0
     rx_overflow_cntr: 0
     rx_vlan: 0
     tx_undeflow_irq: 0
     tx_process_stopped_irq: 0
     tx_jabber_irq: 0
     rx_overflow_irq: 0
     rx_buf_unav_irq: 0
     rx_process_stopped_irq: 0
     rx_watchdog_irq: 0
     tx_early_irq: 0
     fatal_bus_error_irq: 0
     rx_early_irq: 0
     threshold: 1
     tx_pkt_n: 8
     rx_pkt_n: 43
     normal_irq_n: 38
     rx_normal_irq_n: 34
     napi_poll: 38
     tx_normal_irq_n: 4
     tx_clean: 43
     tx_set_ic_bit: 4
     irq_receive_pmt_irq_n: 0
     mmc_tx_irq_n: 0
     mmc_rx_irq_n: 0
     mmc_rx_csum_offload_irq_n: 0
     irq_tx_path_in_lpi_mode_n: 0
     irq_tx_path_exit_lpi_mode_n: 0
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
     phy_eee_wakeup_error_n: 0
     ip_hdr_err: 0
     ip_payload_err: 0
     ip_csum_bypassed: 0
     ipv4_pkt_rcvd: 0
     ipv6_pkt_rcvd: 0
     no_ptp_rx_msg_type_ext: 0
     ptp_rx_msg_type_sync: 0
     ptp_rx_msg_type_follow_up: 0
     ptp_rx_msg_type_delay_req: 0
     ptp_rx_msg_type_delay_resp: 0
     ptp_rx_msg_type_pdelay_req: 0
     ptp_rx_msg_type_pdelay_resp: 0
     ptp_rx_msg_type_pdelay_follow_up: 0
     ptp_rx_msg_type_announce: 0
     ptp_rx_msg_type_management: 0
     ptp_rx_msg_pkt_reserved_type: 0
     ptp_frame_type: 0
     ptp_ver: 0
     timestamp_dropped: 0
     av_pkt_rcvd: 0
     av_tagged_pkt_rcvd: 0
     vlan_tag_priority_val: 0
     l3_filter_match: 0
     l4_filter_match: 0
     l3_l4_filter_no_match: 0
     irq_pcs_ane_n: 0
     irq_pcs_link_n: 0
     irq_rgmii_n: 0
     mtl_tx_status_fifo_full: 0
     mtl_tx_fifo_not_empty: 0
     mmtl_fifo_ctrl: 0
     mtl_tx_fifo_read_ctrl_write: 0
     mtl_tx_fifo_read_ctrl_wait: 0
     mtl_tx_fifo_read_ctrl_read: 0
     mtl_tx_fifo_read_ctrl_idle: 0
     mac_tx_in_pause: 0
     mac_tx_frame_ctrl_xfer: 0
     mac_tx_frame_ctrl_idle: 0
     mac_tx_frame_ctrl_wait: 0
     mac_tx_frame_ctrl_pause: 0
     mac_gmii_tx_proto_engine: 0
     mtl_rx_fifo_fill_level_full: 0
     mtl_rx_fifo_fill_above_thresh: 0
     mtl_rx_fifo_fill_below_thresh: 0
     mtl_rx_fifo_fill_level_empty: 0
     mtl_rx_fifo_read_ctrl_flush: 0
     mtl_rx_fifo_read_ctrl_read_data: 0
     mtl_rx_fifo_read_ctrl_status: 0
     mtl_rx_fifo_read_ctrl_idle: 0
     mtl_rx_fifo_ctrl_active: 0
     mac_rx_frame_ctrl_fifo: 0
     mac_gmii_rx_proto_engine: 0
     tx_tso_frames: 0
     tx_tso_nfrags: 0
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: ug
        Wake-on: d
        Current message level: 0x0000003f (63)
                               drv probe link timer ifdown ifup
        Link detected: yes
# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.116  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 66:d8:d0:34:f1:cb  txqueuelen 1000  (Ethernet)
        RX packets 4420  bytes 292864 (286.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19165  bytes 28975001 (27.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 21

# iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  5] local 192.168.1.116 port 50608 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   111 MBytes   932 Mbits/sec    1    338 KBytes
[  5]   1.00-2.00   sec   111 MBytes   930 Mbits/sec    0    351 KBytes
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec    0    375 KBytes
[  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    0    390 KBytes
[  5]   4.00-5.00   sec  50.3 MBytes   421 Mbits/sec    1   1.41 KBytes
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
[  5]  10.00-29.11  sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes
<CTRL+C after ~30 seconds>
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-29.11  sec   496 MBytes   143 Mbits/sec    5             sender
[  5]   0.00-29.11  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

# ethtool -S eth0
NIC statistics:
     mmc_tx_octetcount_gb: 0
     mmc_tx_framecount_gb: 0
     mmc_tx_broadcastframe_g: 0
     mmc_tx_multicastframe_g: 0
     mmc_tx_64_octets_gb: 0
     mmc_tx_65_to_127_octets_gb: 0
     mmc_tx_128_to_255_octets_gb: 0
     mmc_tx_256_to_511_octets_gb: 0
     mmc_tx_512_to_1023_octets_gb: 0
     mmc_tx_1024_to_max_octets_gb: 0
     mmc_tx_unicast_gb: 0
     mmc_tx_multicast_gb: 0
     mmc_tx_broadcast_gb: 0
     mmc_tx_underflow_error: 0
     mmc_tx_singlecol_g: 0
     mmc_tx_multicol_g: 0
     mmc_tx_deferred: 0
     mmc_tx_latecol: 0
     mmc_tx_exesscol: 0
     mmc_tx_carrier_error: 0
     mmc_tx_octetcount_g: 0
     mmc_tx_framecount_g: 0
     mmc_tx_excessdef: 0
     mmc_tx_pause_frame: 0
     mmc_tx_vlan_frame_g: 0
     mmc_rx_framecount_gb: 16766
     mmc_rx_octetcount_gb: 1173986
     mmc_rx_octetcount_g: 1173986
     mmc_rx_broadcastframe_g: 98
     mmc_rx_multicastframe_g: 0
     mmc_rx_crc_error: 0
     mmc_rx_align_error: 0
     mmc_rx_run_error: 0
     mmc_rx_jabber_error: 0
     mmc_rx_undersize_g: 0
     mmc_rx_oversize_g: 0
     mmc_rx_64_octets_gb: 96
     mmc_rx_65_to_127_octets_gb: 16669
     mmc_rx_128_to_255_octets_gb: 0
     mmc_rx_256_to_511_octets_gb: 1
     mmc_rx_512_to_1023_octets_gb: 0
     mmc_rx_1024_to_max_octets_gb: 0
     mmc_rx_unicast_g: 16668
     mmc_rx_length_error: 0
     mmc_rx_autofrangetype: 0
     mmc_rx_pause_frames: 0
     mmc_rx_fifo_overflow: 0
     mmc_rx_vlan_frames_gb: 0
     mmc_rx_watchdog_error: 0
     mmc_rx_ipc_intr_mask: 2147385342
     mmc_rx_ipc_intr: 0
     mmc_rx_ipv4_gd: 16671
     mmc_rx_ipv4_hderr: 0
     mmc_rx_ipv4_nopay: 0
     mmc_rx_ipv4_frag: 0
     mmc_rx_ipv4_udsbl: 0
     mmc_rx_ipv4_gd_octets: 867822
     mmc_rx_ipv4_hderr_octets: 0
     mmc_rx_ipv4_nopay_octets: 0
     mmc_rx_ipv4_frag_octets: 0
     mmc_rx_ipv4_udsbl_octets: 0
     mmc_rx_ipv6_gd_octets: 0
     mmc_rx_ipv6_hderr_octets: 0
     mmc_rx_ipv6_nopay_octets: 0
     mmc_rx_ipv6_gd: 0
     mmc_rx_ipv6_hderr: 0
     mmc_rx_ipv6_nopay: 0
     mmc_rx_udp_gd: 28
     mmc_rx_udp_err: 0
     mmc_rx_tcp_gd: 16643
     mmc_rx_tcp_err: 0
     mmc_rx_icmp_gd: 0
     mmc_rx_icmp_err: 0
     mmc_rx_udp_gd_octets: 1068
     mmc_rx_udp_err_octets: 0
     mmc_rx_tcp_gd_octets: 533334
     mmc_rx_tcp_err_octets: 0
     mmc_rx_icmp_gd_octets: 0
     mmc_rx_icmp_err_octets: 0
     tx_underflow: 0
     tx_carrier: 0
     tx_losscarrier: 0
     vlan_tag: 0
     tx_deferred: 0
     tx_vlan: 0
     tx_jabber: 0
     tx_frame_flushed: 0
     tx_payload_error: 0
     tx_ip_header_error: 0
     rx_desc: 0
     sa_filter_fail: 0
     overflow_error: 0
     ipc_csum_error: 0
     rx_collision: 0
     rx_crc_errors: 0
     dribbling_bit: 0
     rx_length: 0
     rx_mii: 0
     rx_multicast: 0
     rx_gmac_overflow: 0
     rx_watchdog: 0
     da_rx_filter_fail: 0
     sa_rx_filter_fail: 0
     rx_missed_cntr: 0
     rx_overflow_cntr: 0
     rx_vlan: 0
     tx_undeflow_irq: 0
     tx_process_stopped_irq: 0
     tx_jabber_irq: 0
     rx_overflow_irq: 0
     rx_buf_unav_irq: 0
     rx_process_stopped_irq: 0
     rx_watchdog_irq: 0
     tx_early_irq: 0
     fatal_bus_error_irq: 0
     rx_early_irq: 11717
     threshold: 1
     tx_pkt_n: 359107
     rx_pkt_n: 16660
     normal_irq_n: 108987
     rx_normal_irq_n: 7449
     napi_poll: 107154
     tx_normal_irq_n: 105918
     tx_clean: 107179
     tx_set_ic_bit: 179554
     irq_receive_pmt_irq_n: 0
     mmc_tx_irq_n: 0
     mmc_rx_irq_n: 0
     mmc_rx_csum_offload_irq_n: 0
     irq_tx_path_in_lpi_mode_n: 0
     irq_tx_path_exit_lpi_mode_n: 0
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
     phy_eee_wakeup_error_n: 0
     ip_hdr_err: 0
     ip_payload_err: 0
     ip_csum_bypassed: 0
     ipv4_pkt_rcvd: 0
     ipv6_pkt_rcvd: 0
     no_ptp_rx_msg_type_ext: 0
     ptp_rx_msg_type_sync: 0
     ptp_rx_msg_type_follow_up: 0
     ptp_rx_msg_type_delay_req: 0
     ptp_rx_msg_type_delay_resp: 0
     ptp_rx_msg_type_pdelay_req: 0
     ptp_rx_msg_type_pdelay_resp: 0
     ptp_rx_msg_type_pdelay_follow_up: 0
     ptp_rx_msg_type_announce: 0
     ptp_rx_msg_type_management: 0
     ptp_rx_msg_pkt_reserved_type: 0
     ptp_frame_type: 0
     ptp_ver: 0
     timestamp_dropped: 0
     av_pkt_rcvd: 0
     av_tagged_pkt_rcvd: 0
     vlan_tag_priority_val: 0
     l3_filter_match: 0
     l4_filter_match: 0
     l3_l4_filter_no_match: 0
     irq_pcs_ane_n: 0
     irq_pcs_link_n: 0
     irq_rgmii_n: 0
     mtl_tx_status_fifo_full: 0
     mtl_tx_fifo_not_empty: 0
     mmtl_fifo_ctrl: 0
     mtl_tx_fifo_read_ctrl_write: 0
     mtl_tx_fifo_read_ctrl_wait: 0
     mtl_tx_fifo_read_ctrl_read: 0
     mtl_tx_fifo_read_ctrl_idle: 0
     mac_tx_in_pause: 0
     mac_tx_frame_ctrl_xfer: 0
     mac_tx_frame_ctrl_idle: 0
     mac_tx_frame_ctrl_wait: 0
     mac_tx_frame_ctrl_pause: 0
     mac_gmii_tx_proto_engine: 0
     mtl_rx_fifo_fill_level_full: 0
     mtl_rx_fifo_fill_above_thresh: 0
     mtl_rx_fifo_fill_below_thresh: 0
     mtl_rx_fifo_fill_level_empty: 0
     mtl_rx_fifo_read_ctrl_flush: 0
     mtl_rx_fifo_read_ctrl_read_data: 0
     mtl_rx_fifo_read_ctrl_status: 0
     mtl_rx_fifo_read_ctrl_idle: 0
     mtl_rx_fifo_ctrl_active: 0
     mac_rx_frame_ctrl_fifo: 0
     mac_gmii_rx_proto_engine: 0
     tx_tso_frames: 0
     tx_tso_nfrags: 0
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: ug
        Wake-on: d
        Current message level: 0x0000003f (63)
                               drv probe link timer ifdown ifup
        Link detected: yes
# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.116  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 66:d8:d0:34:f1:cb  txqueuelen 1000  (Ethernet)
        RX packets 21021  bytes 1389130 (1.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 378272  bytes 572598814 (546.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 21


Regards
Martin


* [PATCH] wil6210: fix unsigned cid comparison with >= 0
From: Gustavo A. R. Silva @ 2018-08-29 17:50 UTC (permalink / raw)
  To: Maya Erez, Kalle Valo, David S. Miller
  Cc: linux-wireless, wil6210, netdev, linux-kernel,
	Gustavo A. R. Silva

The comparison of cid >= 0 is always true because cid is of type u8
(8 bits, unsigned).

Fix this by removing such comparison and updating the type of
variable cid to u8 in the caller function.

Addresses-Coverity-ID: 1473079 ("Unsigned compared against 0")
Fixes: b9010f105f21 ("wil6210: add FT roam support for AP and station")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/wireless/ath/wil6210/wil6210.h | 2 +-
 drivers/net/wireless/ath/wil6210/wmi.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/wil6210/wil6210.h b/drivers/net/wireless/ath/wil6210/wil6210.h
index cf6a691..abb8201 100644
--- a/drivers/net/wireless/ath/wil6210/wil6210.h
+++ b/drivers/net/wireless/ath/wil6210/wil6210.h
@@ -455,7 +455,7 @@ static inline void parse_cidxtid(u8 cidxtid, u8 *cid, u8 *tid)
  */
 static inline bool wil_cid_valid(u8 cid)
 {
-	return (cid >= 0 && cid < WIL6210_MAX_CID);
+	return cid < WIL6210_MAX_CID;
 }
 
 struct wil6210_mbox_ring {
diff --git a/drivers/net/wireless/ath/wil6210/wmi.c b/drivers/net/wireless/ath/wil6210/wmi.c
index c3ad8e4..4859f0e 100644
--- a/drivers/net/wireless/ath/wil6210/wmi.c
+++ b/drivers/net/wireless/ath/wil6210/wmi.c
@@ -1177,7 +1177,7 @@ static void wmi_evt_ring_en(struct wil6210_vif *vif, int id, void *d, int len)
 	u8 vri = evt->ring_index;
 	struct wireless_dev *wdev = vif_to_wdev(vif);
 	struct wil_sta_info *sta;
-	int cid;
+	u8 cid;
 	struct key_params params;
 
 	wil_dbg_wmi(wil, "Enable vring %d MID %d\n", vri, vif->mid);
-- 
2.7.4
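
For illustration only, a small userspace sketch of why the removed check
was dead code and why the remaining upper-bound check is sufficient. It
uses uint8_t in place of the kernel's u8, and EXAMPLE_MAX_CID is a
made-up stand-in for WIL6210_MAX_CID, whose real value is not shown in
this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit standing in for WIL6210_MAX_CID. */
#define EXAMPLE_MAX_CID 8

/* Same shape as the patched wil_cid_valid(): an unsigned 8-bit value can
 * never be negative, so only the upper bound needs checking. */
static bool example_cid_valid(uint8_t cid)
{
	return cid < EXAMPLE_MAX_CID;
}

/* A negative int converted to uint8_t wraps modulo 256 (for example,
 * -1 becomes 255), so it still fails the upper-bound check. */
static bool example_cid_valid_from_int(int raw)
{
	return example_cid_valid((uint8_t)raw);
}
```

With these definitions, example_cid_valid_from_int(-1) is rejected
because the converted value 255 is not below the limit, without any
"cid >= 0" test.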


* [iproute PATCH] iprule: Fix for incorrect space between dst and prefix
From: Phil Sutter @ 2018-08-29 13:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

This was added by accident when introducing JSON support.

Fixes: 0dd4ccc56c0e3 ("iprule: add json support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/iprule.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 8b9421431c26a..744d6d88e3433 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -239,7 +239,7 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 
 		print_string(PRINT_FP, NULL, "to ", NULL);
 		print_color_string(PRINT_ANY, ifa_family_color(frh->family),
-				   "dst", "%s ", dst);
+				   "dst", "%s", dst);
 		if (frh->dst_len != host_len)
 			print_uint(PRINT_ANY, "dstlen", "/%u ", frh->dst_len);
 		else
-- 
2.18.0
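
To show the visible effect, here is a rough userspace sketch that mimics
only the two format strings touched by the hunk above ("%s " or "%s" for
dst, followed by "/%u " for the prefix length). format_dst() is a made-up
helper, not an iproute2 function, and it ignores JSON output and color
handling entirely.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the dst/dstlen printing in print_rule().
 * stray_space selects the pre-patch format ("%s ") over the patched
 * one ("%s"). Only the branch that prints a prefix length is modeled. */
static void format_dst(char *out, size_t size, int stray_space,
		       const char *dst, unsigned int dst_len)
{
	int n = stray_space ? snprintf(out, size, "%s ", dst)
			    : snprintf(out, size, "%s", dst);

	if (n < 0 || (size_t)n >= size)
		return; /* output truncated or failed; nothing to append */

	snprintf(out + n, size - (size_t)n, "/%u ", dst_len);
}
```

For dst "10.0.0.0" and a prefix length of 8, the pre-patch format yields
"10.0.0.0 /8 " while the patched one yields "10.0.0.0/8 ".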


* [PATCH v2 5/6] net/wan/fsl_ucc_hdlc: GUMR for non tsa mode
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

The following bits in the GUMR are changed for non-TSA
mode: CDS, CTSP and CTSS are set to zero.

When they are set, there are no tx interrupts from the controller.

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 drivers/net/wan/fsl_ucc_hdlc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index be5b0096af3b..248f1f5bcd04 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -97,6 +97,13 @@ static int uhdlc_init(struct ucc_hdlc_private *priv)
 	if (priv->tsa) {
 		uf_info->tsa = 1;
 		uf_info->ctsp = 1;
+		uf_info->cds = 1;
+		uf_info->ctss = 1;
+	}
+	else {
+		uf_info->cds = 0;
+		uf_info->ctsp = 0;
+		uf_info->ctss = 0;
 	}
 
 	/* This sets HPM register in CMXUCR register which configures a
-- 
2.13.6


* [PATCH v2 6/6] net/wan/fsl_ucc_hdlc: tx timeout handler
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

Add a tx timeout handler. This helps
when troubleshooting.

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 drivers/net/wan/fsl_ucc_hdlc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 248f1f5bcd04..629ef5049d27 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1002,11 +1002,15 @@ static const struct dev_pm_ops uhdlc_pm_ops = {
 #define HDLC_PM_OPS NULL
 
 #endif
+static void uhdlc_tx_timeout(struct net_device *ndev) {
+	netdev_err(ndev, "%s\n", __FUNCTION__);
+}
 static const struct net_device_ops uhdlc_ops = {
 	.ndo_open       = uhdlc_open,
 	.ndo_stop       = uhdlc_close,
 	.ndo_start_xmit = hdlc_start_xmit,
 	.ndo_do_ioctl   = uhdlc_ioctl,
+	.ndo_tx_timeout	= uhdlc_tx_timeout,
 };
 
 static int ucc_hdlc_probe(struct platform_device *pdev)
@@ -1125,6 +1129,7 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
 	hdlc = dev_to_hdlc(dev);
 	dev->tx_queue_len = 16;
 	dev->netdev_ops = &uhdlc_ops;
+	dev->watchdog_timeo = 2*HZ;
 	hdlc->attach = ucc_hdlc_attach;
 	hdlc->xmit = ucc_hdlc_tx;
 	netif_napi_add(dev, &uhdlc_priv->napi, ucc_hdlc_poll, 32);
-- 
2.13.6


* [PATCH v2 4/6] net/wan/fsl_ucc_hdlc: hmask
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

Add the ability to set hmask in the device tree.
It can be used to change the address
filtering of packets.

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt | 6 ++++++
 drivers/net/wan/fsl_ucc_hdlc.c                               | 5 ++++-
 drivers/net/wan/fsl_ucc_hdlc.h                               | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt
index 03c741602c6d..6d2dd8a31482 100644
--- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/network.txt
@@ -98,6 +98,12 @@ The property below is dependent on fsl,tdm-interface:
 	usage: optional for tdm interface
 	value type: <empty>
 	Definition : Internal loopback connecting on TDM layer.
+- fsl,hmask
+	usage: optional
+	Value type: <u16>
+	Definition: HDLC address recognition. Set to zero to disable
+		    address filtering of packets:
+		    fsl,hmask = /bits/ 16 <0x0000>;
 
 Example for tdm interface:
 
diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 0f703d7be5e0..be5b0096af3b 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -263,7 +263,7 @@ static int uhdlc_init(struct ucc_hdlc_private *priv)
 	iowrite16be(MAX_FRAME_LENGTH, &priv->ucc_pram->mflr);
 	iowrite16be(DEFAULT_RFTHR, &priv->ucc_pram->rfthr);
 	iowrite16be(DEFAULT_RFTHR, &priv->ucc_pram->rfcnt);
-	iowrite16be(DEFAULT_ADDR_MASK, &priv->ucc_pram->hmask);
+	iowrite16be(priv->hmask, &priv->ucc_pram->hmask);
 	iowrite16be(DEFAULT_HDLC_ADDR, &priv->ucc_pram->haddr1);
 	iowrite16be(DEFAULT_HDLC_ADDR, &priv->ucc_pram->haddr2);
 	iowrite16be(DEFAULT_HDLC_ADDR, &priv->ucc_pram->haddr3);
@@ -1097,6 +1097,9 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
 		if (ret)
 			goto free_utdm;
 	}
+
+	if (of_property_read_u16(np, "fsl,hmask", &uhdlc_priv->hmask))
+		uhdlc_priv->hmask = DEFAULT_ADDR_MASK;
 
 	ret = uhdlc_init(uhdlc_priv);
 	if (ret) {
diff --git a/drivers/net/wan/fsl_ucc_hdlc.h b/drivers/net/wan/fsl_ucc_hdlc.h
index c21134c1f180..b99fa2f1cd99 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.h
+++ b/drivers/net/wan/fsl_ucc_hdlc.h
@@ -106,6 +106,7 @@ struct ucc_hdlc_private {
 
 	unsigned short encoding;
 	unsigned short parity;
+	unsigned short hmask;
 	u32 clocking;
 	spinlock_t lock;	/* lock for Tx BD and Tx buffer */
 #ifdef CONFIG_PM
-- 
2.13.6

^ permalink raw reply related

* [PATCH v2 3/6] net/wan/fsl_ucc_hdlc: Adding ARPHRD_ETHER
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

Avoid discarding Ethernet packets
when using the HDLC_ETH protocol.

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 drivers/net/wan/fsl_ucc_hdlc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index c8e526bf1130..0f703d7be5e0 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -376,6 +376,10 @@ static netdev_tx_t ucc_hdlc_tx(struct sk_buff *skb, struct net_device *dev)
 		dev->stats.tx_bytes += skb->len;
 		break;
 
+	case ARPHRD_ETHER:
+		dev->stats.tx_bytes += skb->len;
+		break;
+
 	default:
 		dev->stats.tx_dropped++;
 		dev_kfree_skb(skb);
@@ -513,6 +517,8 @@ static int hdlc_rx_done(struct ucc_hdlc_private *priv, int rx_work_limit)
 			break;
 
 		case ARPHRD_PPP:
+		case ARPHRD_ETHER:
+
 			length -= HDLC_CRC_SIZE;
 
 			skb = dev_alloc_skb(length);
-- 
2.13.6

^ permalink raw reply related

* [PATCH v2 2/6] net/wan/fsl_ucc_hdlc: allow PARITY_CRC16_PR0_CCITT parity
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 drivers/net/wan/fsl_ucc_hdlc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 5cf6dcba039c..c8e526bf1130 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -781,7 +781,8 @@ static int ucc_hdlc_attach(struct net_device *dev, unsigned short encoding,
 
 	if (parity != PARITY_NONE &&
 	    parity != PARITY_CRC32_PR1_CCITT &&
-	    parity != PARITY_CRC16_PR1_CCITT)
+	    parity != PARITY_CRC16_PR1_CCITT &&
+	    parity != PARITY_CRC16_PR0_CCITT)
 		return -EINVAL;
 
 	priv->encoding = encoding;
-- 
2.13.6

^ permalink raw reply related

* [PATCH v2 1/6] net/wan/fsl_ucc_hdlc: allow ucc index up to 7
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180829131328.27901-1-david.gounaris@infinera.com>

Allow higher UCC indexes so that MPC83xx
platforms (UCC1-UCC8) can be supported.

Signed-off-by: David Gounaris <david.gounaris@infinera.com>
---
 drivers/net/wan/fsl_ucc_hdlc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 33df76405b86..5cf6dcba039c 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1016,7 +1016,7 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
 	}
 
 	ucc_num = val - 1;
-	if ((ucc_num > 3) || (ucc_num < 0)) {
+	if ((ucc_num > UCC_MAX_NUM - 1) || (ucc_num < 0)) {
 		dev_err(&pdev->dev, ": Invalid UCC num\n");
 		return -EINVAL;
 	}
-- 
2.13.6

^ permalink raw reply related

* [PATCH v2 0/6] Ethernet over hdlc
From: David Gounaris @ 2018-08-29 13:13 UTC (permalink / raw)
  To: qiang.zhao, netdev, linuxppc-dev, robh+dt; +Cc: David Gounaris
In-Reply-To: <20180828110921.2542-2-david.gounaris@infinera.com>

Here is what has been changed in v2 after the review comments.

v2-0001: Using UCC_MAX_NUM
v2-0002: Unchanged
v2-0003: Changed commit message
v2-0004: Adding fsl,hmask into the dt instead of changing the default value.
v2-0005: Unchanged
v2-0006: Unchanged

Adding robh+dt@kernel.org for comments regarding dt.

Best Regards
David Gounaris

David Gounaris (6):
  net/wan/fsl_ucc_hdlc: allow ucc index up to 7
  net/wan/fsl_ucc_hdlc: allow PARITY_CRC16_PR0_CCITT parity
  net/wan/fsl_ucc_hdlc: Adding ARPHRD_ETHER
  net/wan/fsl_ucc_hdlc: hmask
  net/wan/fsl_ucc_hdlc: GUMR for non tsa mode
  net/wan/fsl_ucc_hdlc: tx timeout handler

 .../devicetree/bindings/soc/fsl/cpm_qe/network.txt |  6 +++++
 drivers/net/wan/fsl_ucc_hdlc.c                     | 28 +++++++++++++++++++---
 drivers/net/wan/fsl_ucc_hdlc.h                     |  1 +
 3 files changed, 32 insertions(+), 3 deletions(-)

-- 
2.13.6

^ permalink raw reply

* [PATCH net-next 3/3] net: socket: implement 64-bit timestamps
From: Arnd Bergmann @ 2018-08-29 13:11 UTC (permalink / raw)
  To: netdev, David S . Miller
  Cc: linux-arch, y2038, linux-alpha, linux-kernel, linux-mips,
	linux-sh, viro, tglx, edumazet, Arnd Bergmann
In-Reply-To: <20180829130308.3504560-1-arnd@arndb.de>

The 'timeval' and 'timespec' data structures used for socket timestamps
are going to be redefined in user space based on 64-bit time_t in future
versions of the C library to deal with the y2038 overflow problem,
which breaks the ABI definition.

Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
use the _IOR() macro to encode the size of the transferred data, so it
remains ambiguous whether the application uses the old or new layout.

The best workaround I could find is rather ugly: we redefine the command
code based on the size of the respective data structure with a ternary
operator. This lets it get evaluated as late as possible, hopefully after
that structure is visible to the caller. We cannot use an #ifdef here,
because linux/sockios.h might have been included before any libc header
that could determine the size of time_t.
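
To illustrate the idea outside of the kernel headers, here is a minimal
user-space sketch. Everything prefixed with demo_/DEMO_ is made up for
this example: the two structs stand in for the old and new timeval
layouts, and DEMO_IOR() is a simplified copy of the asm-generic _IOR()
encoding rather than the real macro.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two timeval layouts a 32-bit libc may use. */
struct demo_timeval32 { int32_t tv_sec; int32_t tv_usec; };	/* 8 bytes */
struct demo_timeval64 { int64_t tv_sec; int64_t tv_usec; };	/* 16 bytes */

/*
 * Simplified copy of the asm-generic _IOR() encoding: direction in bits
 * 31:30, argument size in bits 29:16, type in bits 15:8, number in bits 7:0.
 */
#define DEMO_IOR(type, nr, arg) \
	((2u << 30) | ((unsigned int)sizeof(arg) << 16) | ((type) << 8) | (nr))

#define DEMO_STAMP_OLD	0x8906u
#define DEMO_STAMP_NEW	DEMO_IOR(0x89u, 0x06u, long long[2])

/*
 * sizeof() is evaluated where the macro is expanded, so the choice is
 * made against whatever layout is visible at the call site.
 */
#define DEMO_STAMP(tv_type) \
	(sizeof(tv_type) == 8 ? DEMO_STAMP_OLD : DEMO_STAMP_NEW)
```

With these definitions DEMO_STAMP(struct demo_timeval32) yields the
legacy 0x8906, while DEMO_STAMP(struct demo_timeval64) yields the
size-encoded command.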

The ioctl implementation now interprets the new command codes as always
referring to the 64-bit structure on all architectures, while the old
architecture specific command code still refers to the old architecture
specific layout. The new command number is only used when they are
actually different.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/alpha/include/uapi/asm/sockios.h  |  4 ++--
 arch/mips/include/uapi/asm/sockios.h   |  4 ++--
 arch/sh/include/uapi/asm/sockios.h     |  5 +++--
 arch/xtensa/include/uapi/asm/sockios.h |  4 ++--
 include/uapi/asm-generic/sockios.h     |  4 ++--
 include/uapi/linux/sockios.h           | 21 +++++++++++++++++++++
 net/socket.c                           | 22 +++++++++++++++++-----
 7 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/sockios.h b/arch/alpha/include/uapi/asm/sockios.h
index ba287e4b01bf..af92bc27c3be 100644
--- a/arch/alpha/include/uapi/asm/sockios.h
+++ b/arch/alpha/include/uapi/asm/sockios.h
@@ -11,7 +11,7 @@
 #define SIOCSPGRP	_IOW('s', 8, pid_t)
 #define SIOCGPGRP	_IOR('s', 9, pid_t)
 
-#define SIOCGSTAMP	0x8906		/* Get stamp (timeval) */
-#define SIOCGSTAMPNS	0x8907		/* Get stamp (timespec) */
+#define SIOCGSTAMP_OLD	0x8906		/* Get stamp (timeval) */
+#define SIOCGSTAMPNS_OLD 0x8907		/* Get stamp (timespec) */
 
 #endif /* _ASM_ALPHA_SOCKIOS_H */
diff --git a/arch/mips/include/uapi/asm/sockios.h b/arch/mips/include/uapi/asm/sockios.h
index 5b40a88593fa..66f60234f290 100644
--- a/arch/mips/include/uapi/asm/sockios.h
+++ b/arch/mips/include/uapi/asm/sockios.h
@@ -21,7 +21,7 @@
 #define SIOCSPGRP	_IOW('s', 8, pid_t)
 #define SIOCGPGRP	_IOR('s', 9, pid_t)
 
-#define SIOCGSTAMP	0x8906		/* Get stamp (timeval) */
-#define SIOCGSTAMPNS	0x8907		/* Get stamp (timespec) */
+#define SIOCGSTAMP_OLD	0x8906		/* Get stamp (timeval) */
+#define SIOCGSTAMPNS_OLD 0x8907		/* Get stamp (timespec) */
 
 #endif /* _ASM_SOCKIOS_H */
diff --git a/arch/sh/include/uapi/asm/sockios.h b/arch/sh/include/uapi/asm/sockios.h
index 17313d2c3527..ef18a668456d 100644
--- a/arch/sh/include/uapi/asm/sockios.h
+++ b/arch/sh/include/uapi/asm/sockios.h
@@ -10,6 +10,7 @@
 #define SIOCSPGRP	_IOW('s', 8, pid_t)
 #define SIOCGPGRP	_IOR('s', 9, pid_t)
 
-#define SIOCGSTAMP	_IOR('s', 100, struct timeval) /* Get stamp (timeval) */
-#define SIOCGSTAMPNS	_IOR('s', 101, struct timespec) /* Get stamp (timespec) */
+#define SIOCGSTAMP_OLD	_IOR('s', 100, struct timeval) /* Get stamp (timeval) */
+#define SIOCGSTAMPNS_OLD _IOR('s', 101, struct timespec) /* Get stamp (timespec) */
+
 #endif /* __ASM_SH_SOCKIOS_H */
diff --git a/arch/xtensa/include/uapi/asm/sockios.h b/arch/xtensa/include/uapi/asm/sockios.h
index fb8ac3607189..1a1f58f4b75a 100644
--- a/arch/xtensa/include/uapi/asm/sockios.h
+++ b/arch/xtensa/include/uapi/asm/sockios.h
@@ -26,7 +26,7 @@
 #define SIOCSPGRP	_IOW('s', 8, pid_t)
 #define SIOCGPGRP	_IOR('s', 9, pid_t)
 
-#define SIOCGSTAMP	0x8906		/* Get stamp (timeval) */
-#define SIOCGSTAMPNS	0x8907		/* Get stamp (timespec) */
+#define SIOCGSTAMP_OLD	0x8906		/* Get stamp (timeval) */
+#define SIOCGSTAMPNS_OLD 0x8907		/* Get stamp (timespec) */
 
 #endif	/* _XTENSA_SOCKIOS_H */
diff --git a/include/uapi/asm-generic/sockios.h b/include/uapi/asm-generic/sockios.h
index 64f658c7cec2..44fa3ed70483 100644
--- a/include/uapi/asm-generic/sockios.h
+++ b/include/uapi/asm-generic/sockios.h
@@ -8,7 +8,7 @@
 #define FIOGETOWN	0x8903
 #define SIOCGPGRP	0x8904
 #define SIOCATMARK	0x8905
-#define SIOCGSTAMP	0x8906		/* Get stamp (timeval) */
-#define SIOCGSTAMPNS	0x8907		/* Get stamp (timespec) */
+#define SIOCGSTAMP_OLD	0x8906		/* Get stamp (timeval) */
+#define SIOCGSTAMPNS_OLD 0x8907		/* Get stamp (timespec) */
 
 #endif /* __ASM_GENERIC_SOCKIOS_H */
diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h
index d393e9ed3964..7d1bccbbef78 100644
--- a/include/uapi/linux/sockios.h
+++ b/include/uapi/linux/sockios.h
@@ -19,6 +19,7 @@
 #ifndef _LINUX_SOCKIOS_H
 #define _LINUX_SOCKIOS_H
 
+#include <asm/bitsperlong.h>
 #include <asm/sockios.h>
 
 /* Linux-specific socket ioctls */
@@ -27,6 +28,26 @@
 
 #define SOCK_IOC_TYPE	0x89
 
+/*
+ * the timeval/timespec data structure layout is defined by libc,
+ * so we need to cover both possible versions on 32-bit.
+ */
+/* Get stamp (timeval) */
+#define SIOCGSTAMP_NEW	 _IOR(SOCK_IOC_TYPE, 0x06, long long[2])
+/* Get stamp (timespec) */
+#define SIOCGSTAMPNS_NEW _IOR(SOCK_IOC_TYPE, 0x07, long long[2])
+
+#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
+/* on 64-bit and x32, avoid the ?: operator */
+#define SIOCGSTAMP	SIOCGSTAMP_OLD
+#define SIOCGSTAMPNS	SIOCGSTAMPNS_OLD
+#else
+#define SIOCGSTAMP	((sizeof(struct timeval))  == 8 ? \
+			 SIOCGSTAMP_OLD   : SIOCGSTAMP_NEW)
+#define SIOCGSTAMPNS	((sizeof(struct timespec)) == 8 ? \
+			 SIOCGSTAMPNS_OLD : SIOCGSTAMPNS_NEW)
+#endif
+
 /* Routing table calls. */
 #define SIOCADDRT	0x890B		/* add routing table entry	*/
 #define SIOCDELRT	0x890C		/* delete routing table entry	*/
diff --git a/net/socket.c b/net/socket.c
index 6814e8dc8af1..9762e7d5378b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1069,14 +1069,25 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 
 			err = open_related_ns(&net->ns, get_net_ns);
 			break;
-		case SIOCGSTAMP:
-		case SIOCGSTAMPNS:
+		case SIOCGSTAMP_OLD:
+		case SIOCGSTAMPNS_OLD:
 			if (!sock->ops->gettstamp) {
 				err = -ENOIOCTLCMD;
 				break;
 			}
 			err = sock->ops->gettstamp(sock, argp,
-						   cmd == SIOCGSTAMP, false);
+						   cmd == SIOCGSTAMP_OLD,
+						   !IS_ENABLED(CONFIG_64BIT));
+			break;
+		case SIOCGSTAMP_NEW:
+		case SIOCGSTAMPNS_NEW:
+			if (!sock->ops->gettstamp) {
+				err = -ENOIOCTLCMD;
+				break;
+			}
+			err = sock->ops->gettstamp(sock, argp,
+						   cmd == SIOCGSTAMP_NEW,
+						   false);
 			break;
 		default:
 			err = sock_do_ioctl(net, sock, cmd, arg);
@@ -3095,8 +3105,8 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCADDRT:
 	case SIOCDELRT:
 		return routing_ioctl(net, sock, cmd, argp);
-	case SIOCGSTAMP:
-	case SIOCGSTAMPNS:
+	case SIOCGSTAMP_OLD:
+	case SIOCGSTAMPNS_OLD:
 		if (!sock->ops->gettstamp)
 			return -ENOIOCTLCMD;
 		return sock->ops->gettstamp(sock, argp, cmd == SIOCGSTAMP,
@@ -3119,6 +3129,8 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock,
 	case SIOCADDDLCI:
 	case SIOCDELDLCI:
 	case SIOCGSKNS:
+	case SIOCGSTAMP_NEW:
+	case SIOCGSTAMPNS_NEW:
 		return sock_ioctl(file, cmd, arg);
 
 	case SIOCGIFFLAGS:
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH net-next] rds: store socket timestamps as ktime_t
From: Santosh Shilimkar @ 2018-08-29 17:00 UTC (permalink / raw)
  To: Arnd Bergmann, David S. Miller
  Cc: Sowmini Varadhan, Willem de Bruijn, Ka-Cheong Poon,
	Salvatore Mesoraca, Avinash Repaka, Eric Dumazet, netdev,
	linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20180829154732.844217-1-arnd@arndb.de>

On 8/29/2018 8:47 AM, Arnd Bergmann wrote:
> rds is the last in-kernel user of the old do_gettimeofday()
> function. Convert it over to ktime_get_real() to make it
> work more like the generic socket timestamps, and to let
> us kill off do_gettimeofday().
> 
> A follow-up patch will have to change the user space interface
> to deal better with 32-bit tasks, which may use an incompatible
> layout for 'struct timespec'.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
Thanks Arnd !!

FWIW,
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: force_napi_tx module param.
From: Willem de Bruijn @ 2018-08-29 13:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jon Olson (Google Drive), Michael S. Tsirkin, caleb.raitto,
	David Miller, Network Development, Caleb Raitto
In-Reply-To: <f7f8a848-6b63-a4c4-469e-9c019a4cfc91@redhat.com>

On Wed, Aug 29, 2018 at 3:56 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月29日 03:57, Willem de Bruijn wrote:
> > On Mon, Jul 30, 2018 at 2:06 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >>
> >> On 2018年07月25日 08:17, Jon Olson wrote:
> >>> On Tue, Jul 24, 2018 at 3:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>> On Tue, Jul 24, 2018 at 06:31:54PM -0400, Willem de Bruijn wrote:
> >>>>> On Tue, Jul 24, 2018 at 6:23 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>> On Tue, Jul 24, 2018 at 04:52:53PM -0400, Willem de Bruijn wrote:
> >>>>>>> From the above linked patch, I understand that there are yet
> >>>>>>> other special cases in production, such as a hard cap on #tx queues to
> >>>>>>> 32 regardless of number of vcpus.
> >>>>>> I don't think upstream kernels have this limit - we can
> >>>>>> now use vmalloc for higher number of queues.
> >>>>> Yes. that patch* mentioned it as a google compute engine imposed
> >>>>> limit. It is exactly such cloud provider imposed rules that I'm
> >>>>> concerned about working around in upstream drivers.
> >>>>>
> >>>>> * for reference, I mean https://patchwork.ozlabs.org/patch/725249/
> >>>> Yea. Why does GCE do it btw?
> >>> There are a few reasons for the limit, some historical, some current.
> >>>
> >>> Historically we did this because of a kernel limit on the number of
> >>> TAP queues (in Montreal I thought this limit was 32). To my chagrin,
> >>> the limit upstream at the time we did it was actually eight. We had
> >>> increased the limit from eight to 32 internally, and it appears that
> >>> upstream has subsequently increased it to 256. We no longer
> >>> use TAP for networking, so that constraint no longer applies for us,
> >>> but when looking at removing/raising the limit we discovered no
> >>> workloads that clearly benefited from lifting it, and it also placed
> >>> more pressure on our virtual networking stack particularly on the Tx
> >>> side. We left it as-is.
> >>>
> >>> In terms of current reasons there are really two. One is memory usage.
> >>> As you know, virtio-net uses rx/tx pairs, so there's an expectation
> >>> that the guest will have an Rx queue for every Tx queue. We run our
> >>> individual virtqueues fairly deep (4096 entries) to give guests a wide
> >>> time window for re-posting Rx buffers and avoiding starvation on
> >>> packet delivery. Filling an Rx vring with max-sized mergeable buffers
> >>> (4096 bytes) is 16MB of GFP_ATOMIC allocations. At 32 queues this can
> >>> be up to 512MB of memory posted for network buffers. Scaling this to
> >>> the largest VM GCE offers today (160 VCPUs -- n1-ultramem-160) keeping
> >>> all of the Rx rings full would (in the large average Rx packet size
> >>> case) consume up to 2.5 GB(!) of guest RAM. Now, those VMs have 3.8T
> >>> of RAM available, but I don't believe we've observed a situation where
> >>> they would have benefited from having 2.5 gigs of buffers posted for
> >>> incoming network traffic :)
> >> We can work to have async txq and rxq instead of pairs if there's a
> >> strong requirement.
> >>
> >>> The second reason is interrupt related -- as I mentioned above, we
> >>> have found no workloads that clearly benefit from so many queues, but
> >>> we have found workloads that degrade. In particular workloads that do
> >>> a lot of small packet processing but which aren't extremely latency
> >>> sensitive can achieve higher PPS by taking fewer interrupts across
> >>> fewer VCPUs due to better batching (this also incurs higher latency,
> >>> but at the limit the "busy" cores end up suppressing most interrupts
> >>> and spending most of their cycles farming out work). Memcache is a
> >>> good example here, particularly if the latency targets for request
> >>> completion are in the ~milliseconds range (rather than the
> >>> microseconds we typically strive for with TCP_RR-style workloads).
> >>>
> >>> All of that said, we haven't been forthcoming with data (and
> >>> unfortunately I don't have it handy in a useful form, otherwise I'd
> >>> simply post it here), so I understand the hesitation to simply run
> >>> with napi_tx across the board. As Willem said, this patch seemed like
> >>> the least disruptive way to allow us to continue down the road of
> >>> "universal" NAPI Tx and to hopefully get data across enough workloads
> >>> (with VMs small, large, and absurdly large :) to present a compelling
> >>> argument in one direction or another. As far as I know there aren't
> >>> currently any NAPI related ethtool commands (based on a quick perusal
> >>> of ethtool.h)
> >> As I suggested before, maybe we can (ab)use tx-frames-irq.
> > I forgot to respond to this originally, but I agree.
> >
> > How about something like the snippet below. It would be simpler to
> > reason about if we only allow switching while the device is down, but
> > napi does not strictly require that.
> >
> > +static int virtnet_set_coalesce(struct net_device *dev,
> > +                               struct ethtool_coalesce *ec)
> > +{
> > +       const u32 tx_coalesce_napi_mask = (1 << 16);
> > +       const struct ethtool_coalesce ec_default = {
> > +               .cmd = ETHTOOL_SCOALESCE,
> > +               .rx_max_coalesced_frames = 1,
> > +               .tx_max_coalesced_frames = 1,
> > +       };
> > +       struct virtnet_info *vi = netdev_priv(dev);
> > +       int napi_weight = 0;
> > +       bool running;
> > +       int i;
> > +
> > +       if (ec->tx_max_coalesced_frames & tx_coalesce_napi_mask) {
> > +               ec->tx_max_coalesced_frames &= ~tx_coalesce_napi_mask;
> > +               napi_weight = NAPI_POLL_WEIGHT;
> > +       }
> > +
> > +       /* disallow changes to fields not explicitly tested above */
> > +       if (memcmp(ec, &ec_default, sizeof(ec_default)))
> > +               return -EINVAL;
> > +
> > +       if (napi_weight ^ vi->sq[0].napi.weight) {
> > +               running = netif_running(vi->dev);
> > +
> > +               for (i = 0; i < vi->max_queue_pairs; i++) {
> > +                       vi->sq[i].napi.weight = napi_weight;
> > +
> > +                       if (!running)
> > +                               continue;
> > +
> > +                       if (napi_weight)
> > +                               virtnet_napi_tx_enable(vi, vi->sq[i].vq,
> > +                                                      &vi->sq[i].napi);
> > +                       else
> > +                               napi_disable(&vi->sq[i].napi);
> > +               }
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int virtnet_get_coalesce(struct net_device *dev,
> > +                               struct ethtool_coalesce *ec)
> > +{
> > +       const u32 tx_coalesce_napi_mask = (1 << 16);
> > +       const struct ethtool_coalesce ec_default = {
> > +               .cmd = ETHTOOL_GCOALESCE,
> > +               .rx_max_coalesced_frames = 1,
> > +               .tx_max_coalesced_frames = 1,
> > +       };
> > +       struct virtnet_info *vi = netdev_priv(dev);
> > +
> > +       memcpy(ec, &ec_default, sizeof(ec_default));
> > +
> > +       if (vi->sq[0].napi.weight)
> > +               ec->tx_max_coalesced_frames |= tx_coalesce_napi_mask;
> > +
> > +       return 0;
> > +}
>
> Looks good. Just one nit, maybe it's better to simply check against zero?

I wanted to avoid making napi and interrupt moderation mutually
exclusive. If the virtio-net driver ever gets true moderation support,
it should be able to work alongside napi.
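
To make that concrete, here is a small stand-alone sketch of the
encoding used in the snippet above: bit 16 of tx_max_coalesced_frames
carries the napi flag and the remaining bits stay available as a frame
count. The demo_* names are invented for this mail and are not part of
the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as tx_coalesce_napi_mask in the quoted snippet. */
#define DEMO_TX_NAPI_FLAG	(1u << 16)

struct demo_tx_coalesce {
	bool napi;		/* napi-tx requested */
	uint32_t frames;	/* frame count with the flag stripped */
};

/* Split a raw tx_max_coalesced_frames value into flag and count. */
static struct demo_tx_coalesce demo_decode(uint32_t raw)
{
	struct demo_tx_coalesce c = {
		.napi = (raw & DEMO_TX_NAPI_FLAG) != 0,
		.frames = raw & ~DEMO_TX_NAPI_FLAG,
	};

	return c;
}

/* Rebuild the raw value, as the get_coalesce side would report it. */
static uint32_t demo_encode(bool napi, uint32_t frames)
{
	return (frames & ~DEMO_TX_NAPI_FLAG) |
	       (napi ? DEMO_TX_NAPI_FLAG : 0);
}
```

Because the flag lives in its own bit, a future moderation value in
the low bits would not collide with the napi setting.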

But I can make no-napi be 0 and napi be 1. That is future proof, in
the sense that napi is enabled if there is any interrupt moderation.
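
As an aside, the Rx buffer figures quoted earlier in this thread follow
directly from queue count times ring depth times buffer size. A quick
sketch of that arithmetic (demo_rx_buffer_bytes is a name made up for
this mail):

```c
#include <stdint.h>

/*
 * Worst-case memory posted for receive when every Rx ring is kept full:
 * one buffer of buf_bytes per ring entry, for each queue.
 */
static uint64_t demo_rx_buffer_bytes(uint64_t queues, uint64_t ring_entries,
				     uint64_t buf_bytes)
{
	return queues * ring_entries * buf_bytes;
}
```

With 4096-entry rings and 4096-byte buffers this gives 16 MiB for one
queue, 512 MiB for 32 queues and 2560 MiB (2.5 GiB) for 160 queues.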

^ permalink raw reply

* Re: [PATCH bpf-next 11/11] samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock
From: Jesper Dangaard Brouer @ 2018-08-29 12:44 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, alexander.h.duyck,
	alexander.duyck, ast, daniel, netdev, jesse.brandeburg,
	anjali.singhai, peter.waskiewicz.jr, Björn Töpel,
	michael.lundkvist, willemdebruijn.kernel, john.fastabend,
	jakub.kicinski, neerav.parikh, mykyta.iziumtsev, francois.ozog,
	ilias.apalodimas, brian.brooks, u9012063, pavel, qi.z.zhang,
	brouer
In-Reply-To: <20180828124435.30578-12-bjorn.topel@gmail.com>

On Tue, 28 Aug 2018 14:44:35 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The -c/--copy -z/--zero-copy flags enforce either copy or zero-copy
> mode.

Nice, thanks for adding this.  It allows me to quickly test the
difference between normal-copy vs zero-copy modes.
(Kernel bpf-next without RETPOLINE).

AF_XDP RX-drop:
 Normal-copy mode: rx 13,070,318 pps - 76.5 ns
 Zero-copy   mode: rx 26,132,328 pps - 38.3 ns

Compare to XDP_DROP:  34,251,464 pps - 29.2 ns
   XDP_DROP + read :  30,756,664 pps - 32.5 ns

The normal-copy mode is surprisingly fast (and it works for every
driver implementing the regular XDP_REDIRECT action).  It is still
faster to do in-kernel XDP_DROP than AF_XDP zero-copy mode dropping,
which was expected given frames travel to a remote CPU before being
returned (I don't think the remote CPU reads the payload?).  The gap in
nanosec is
actually quite small, thus I'm impressed by the SPSC-queue
implementation working across these CPUs.
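
For anyone wanting to reproduce the ns column: it is just the average
per-packet time, 1e9 / pps. A minimal sketch (ns_per_packet is a name
made up for this mail, not something from the samples):

```c
/* Average time spent per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
	return 1e9 / pps;
}
```

Feeding in 13,070,318 pps gives roughly 76.5 ns and 26,132,328 pps
gives roughly 38.3 ns, matching the numbers above.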


AF_XDP layer2-fwd:
 Normal-copy mode: rx  3,200,885   tx  3,200,892
 Zero-copy   mode: rx 17,026,300   tx 17,026,269

Compare to XDP_TX: rx 14,529,079   tx 14,529,850  - 68.82 ns
     XDP_REDIRECT: rx 13,235,785   tx 13,235,784  - 75.55 ns

The copy-mode is slow because it allocates SKBs internally (I do
wonder if we could speed it up by using ndo_xdp_xmit + disable-BH).
More interesting is that the zero-copy is faster than XDP_TX and
XDP_REDIRECT. I think the speedup comes from avoiding some DMA mapping
calls with ZC.

Side-note: XDP_TX vs. REDIRECT: 75.55 - 68.82 = 6.73 ns.  The cost of
going through the xdp_do_redirect_map core is actually quite small :-)
(I have some micro optimizations that should help ~2ns).


AF_XDP TX-only:
 Normal-copy mode: tx  2,853,461 pps
 Zero-copy   mode: tx 22,255,311 pps

(There is no XDP mode that does TX to compare against)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: Kernel Panic on high bandwidth transfer over wifi
From: Nathaniel Munk @ 2018-08-29 12:12 UTC (permalink / raw)
  To: Willy Tarreau, netdev@vger.kernel.org
In-Reply-To: <20180829120928.GA21238@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 404 bytes --]

Of course I did, sorry.

-------------------
Nathaniel Munk
nathaniel@munk.com.au
0435 726 099

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On August 29, 2018 10:09 PM, Willy Tarreau <w@1wt.eu> wrote:

> On Wed, Aug 29, 2018 at 11:42:44AM +0000, Nathaniel Munk wrote:
>
> > As you can see from the attached log
>
> You apparently forgot to attach the log.
>
> Willy



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: dmesg.log --]
[-- Type: text/x-log; name="dmesg.log", Size: 14254 bytes --]

[ 1242.620637] TCP recvmsg seq # bug: copied 93359823, seq 1, rcvnxt 93359D8F, fl 80000000
[ 1242.620700] WARNING: CPU: 0 PID: 10255 at net/ipv4/tcp.c:2003 tcp_recvmsg+0x579/0xc70
[ 1242.620704] Modules linked in: ccm nf_tables_set nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat_ipv6 nft_chain_nat_ipv4 nf_tables ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter 8021q mrp ipheth btusb btrtl btbcm btintel uvcvideo bluetooth videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ecdh_generic joydev mousedev nls_iso8859_1 nls_cp437 vfat fat arc4 snd_hda_codec_hdmi
[ 1242.620797]  snd_soc_skl snd_soc_skl_ipc iwlmvm snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core mei_wdt mac80211 snd_hda_codec_realtek snd_compress iTCO_wdt ac97_bus snd_hda_codec_generic snd_pcm_dmaengine iTCO_vendor_support intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hda_core wmi_bmof iwlwifi kvm cfg80211 snd_hwdep input_leds snd_pcm thinkpad_acpi snd_timer psmouse irqbypass nvram mei_me intel_cstate rfkill intel_uncore e1000e intel_rapl_perf mei i2c_i801 snd intel_pch_thermal soundcore ac tpm_tis tpm_tis_core tpm rng_core led_class evdev battery wmi rtc_cmos mac_hid pcc_cpufreq ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul
[ 1242.620885]  crc32c_intel ghash_clmulni_intel pcbc serio_raw atkbd libps2 ahci libahci aesni_intel libata aes_x86_64 crypto_simd xhci_pci cryptd glue_helper scsi_mod xhci_hcd i8042 serio hid_generic usbhid usbcore usb_common hid intel_agp i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_gtt agpgart
[ 1242.620933] CPU: 0 PID: 10255 Comm: IOCP Thread 0 Tainted: G     U  W         4.18.5-arch1-1-ARCH #1
[ 1242.620937] Hardware name: LENOVO 20FN001CAU/20FN001CAU, BIOS R06ET59W (1.33 ) 02/27/2018
[ 1242.620949] RIP: 0010:tcp_recvmsg+0x579/0xc70
[ 1242.620951] Code: fb ff ff 4c 89 e0 41 8b 8d 38 05 00 00 44 8b 44 24 2c 89 de 48 c7 c7 28 f5 6f 9a 4c 89 54 24 08 48 89 44 24 10 e8 11 f8 9f ff <0f> 0b 48 8b 44 24 10 4c 8b 54 24 08 8b 4c 24 3c 39 4c 24 38 0f 8c 
[ 1242.621003] RSP: 0018:ffffb0de41223bb0 EFLAGS: 00010282
[ 1242.621007] RAX: 0000000000000000 RBX: 0000000093359823 RCX: 0000000000000001
[ 1242.621010] RDX: 0000000080000001 RSI: ffffffff9a6811ce RDI: 00000000ffffffff
[ 1242.621013] RBP: ffffb0de41223c70 R08: ffffffff99cddf10 R09: 00000000000003c4
[ 1242.621016] R10: 0000000000000008 R11: ffffffff9ae04f2d R12: ffff8a6a70e76a00
[ 1242.621018] R13: ffff8a6a39ea18c0 R14: 0000000000000000 R15: ffff8a6a39ea1dfc
[ 1242.621022] FS:  0000000000000000(0000) GS:ffff8a6a81400000(0063) knlGS:00000000ed455b40
[ 1242.621026] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1242.621028] CR2: 00000000e9f68000 CR3: 00000002347aa004 CR4: 00000000003606f0
[ 1242.621032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1242.621035] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1242.621038] Call Trace:
[ 1242.621049]  ? update_rq_clock+0x33/0x120
[ 1242.621053]  ? compat_import_iovec+0x37/0xcd
[ 1242.621058]  ? __switch_to_asm+0x40/0x70
[ 1242.621061]  inet_recvmsg+0x5b/0x100
[ 1242.621066]  ___sys_recvmsg+0xdd/0x1e0
[ 1242.621070]  ? __switch_to_asm+0x34/0x70
[ 1242.621074]  ? _raw_spin_unlock_irq+0x1d/0x30
[ 1242.621077]  ? finish_task_switch+0x83/0x2c0
[ 1242.621080]  ? tcp_keepalive_timer.cold.3+0x19/0x19
[ 1242.621084]  ? tcp_poll+0x12e/0x260
[ 1242.621086]  ? sock_poll+0x61/0xb0
[ 1242.621091]  ? ep_item_poll.isra.1+0x40/0xc0
[ 1242.621095]  ? ep_send_events_proc+0x7b/0x1a0
[ 1242.621098]  ? __ia32_sys_epoll_ctl+0x20/0x20
[ 1242.621101]  ? preempt_count_add+0x68/0xa0
[ 1242.621106]  ? _raw_spin_lock_irqsave+0x25/0x50
[ 1242.621112]  ? __fget+0x6e/0xa0
[ 1242.621116]  __sys_recvmsg+0x54/0xa0
[ 1242.621124]  __ia32_compat_sys_socketcall+0x174/0x300
[ 1242.621129]  ? do_epoll_wait+0x8b/0xd0
[ 1242.621135]  do_fast_syscall_32+0xa7/0x2a0
[ 1242.621139]  entry_SYSENTER_compat+0x7f/0x91
[ 1242.621144] ---[ end trace dc996496c7568a8f ]---
[ 1242.621232] BUG: unable to handle kernel paging request at 000000020024cb72
[ 1242.621238] PGD 80000001f9fe0067 P4D 80000001f9fe0067 PUD 0 
[ 1242.621244] Oops: 0000 [#1] PREEMPT SMP PTI
[ 1242.621250] CPU: 0 PID: 10255 Comm: IOCP Thread 0 Tainted: G     U  W         4.18.5-arch1-1-ARCH #1
[ 1242.621252] Hardware name: LENOVO 20FN001CAU/20FN001CAU, BIOS R06ET59W (1.33 ) 02/27/2018
[ 1242.621258] RIP: 0010:tcp_drop+0x17/0x40
[ 1242.621259] Code: 00 e9 5b ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 8b 96 cc 00 00 00 48 8b 8e d0 00 00 00 b8 01 00 00 00 <66> 83 7c 11 06 00 66 0f 45 44 11 06 0f b7 c0 f0 01 87 a8 00 00 00 
[ 1242.621299] RSP: 0018:ffffb0de41223b08 EFLAGS: 00010203
[ 1242.621303] RAX: 0000000000000001 RBX: ffff8a6a39ea18c0 RCX: 0000000100254102
[ 1242.621305] RDX: 00000000ffff8a6a RSI: ffff8a6a39ea1988 RDI: ffff8a6a39ea18c0
[ 1242.621306] RBP: ffff8a6a39ea1988 R08: 0000000000000000 R09: ffffffff99e8930e
[ 1242.621308] R10: ffff8a6a743f6310 R11: 0000000000000002 R12: ffff8a6a39ea1988
[ 1242.621310] R13: ffffffff99e92380 R14: 0000000000000000 R15: 00000000fffffff5
[ 1242.621312] FS:  0000000000000000(0000) GS:ffff8a6a81400000(0063) knlGS:00000000ed455b40
[ 1242.621315] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1242.621316] CR2: 000000020024cb72 CR3: 00000002347aa004 CR4: 00000000003606f0
[ 1242.621319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1242.621321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1242.621323] Call Trace:
[ 1242.621332]  tcp_rcv_established+0x10a/0x630
[ 1242.621339]  tcp_v4_do_rcv+0x11e/0x1c0
[ 1242.621348]  __release_sock+0x7c/0xc0
[ 1242.621354]  release_sock+0x2b/0x90
[ 1242.621357]  tcp_recvmsg+0x4dc/0xc70
[ 1242.621366]  ? update_rq_clock+0x33/0x120
[ 1242.621374]  ? compat_import_iovec+0x37/0xcd
[ 1242.621381]  ? __switch_to_asm+0x40/0x70
[ 1242.621386]  inet_recvmsg+0x5b/0x100
[ 1242.621394]  ___sys_recvmsg+0xdd/0x1e0
[ 1242.621399]  ? __switch_to_asm+0x34/0x70
[ 1242.621403]  ? _raw_spin_unlock_irq+0x1d/0x30
[ 1242.621407]  ? finish_task_switch+0x83/0x2c0
[ 1242.621412]  ? tcp_keepalive_timer.cold.3+0x19/0x19
[ 1242.621420]  ? tcp_poll+0x12e/0x260
[ 1242.621423]  ? sock_poll+0x61/0xb0
[ 1242.621434]  ? ep_item_poll.isra.1+0x40/0xc0
[ 1242.621438]  ? ep_send_events_proc+0x7b/0x1a0
[ 1242.621442]  ? __ia32_sys_epoll_ctl+0x20/0x20
[ 1242.621445]  ? preempt_count_add+0x68/0xa0
[ 1242.621450]  ? _raw_spin_lock_irqsave+0x25/0x50
[ 1242.621458]  ? __fget+0x6e/0xa0
[ 1242.621462]  __sys_recvmsg+0x54/0xa0
[ 1242.621469]  __ia32_compat_sys_socketcall+0x174/0x300
[ 1242.621473]  ? do_epoll_wait+0x8b/0xd0
[ 1242.621480]  do_fast_syscall_32+0xa7/0x2a0
[ 1242.621484]  entry_SYSENTER_compat+0x7f/0x91
[ 1242.621490] Modules linked in: ccm nf_tables_set nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat_ipv6 nft_chain_nat_ipv4 nf_tables ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter 8021q mrp ipheth btusb btrtl btbcm btintel uvcvideo bluetooth videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ecdh_generic joydev mousedev nls_iso8859_1 nls_cp437 vfat fat arc4 snd_hda_codec_hdmi
[ 1242.621570]  snd_soc_skl snd_soc_skl_ipc iwlmvm snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core mei_wdt mac80211 snd_hda_codec_realtek snd_compress iTCO_wdt ac97_bus snd_hda_codec_generic snd_pcm_dmaengine iTCO_vendor_support intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hda_core wmi_bmof iwlwifi kvm cfg80211 snd_hwdep input_leds snd_pcm thinkpad_acpi snd_timer psmouse irqbypass nvram mei_me intel_cstate rfkill intel_uncore e1000e intel_rapl_perf mei i2c_i801 snd intel_pch_thermal soundcore ac tpm_tis tpm_tis_core tpm rng_core led_class evdev battery wmi rtc_cmos mac_hid pcc_cpufreq ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul
[ 1242.621647]  crc32c_intel ghash_clmulni_intel pcbc serio_raw atkbd libps2 ahci libahci aesni_intel libata aes_x86_64 crypto_simd xhci_pci cryptd glue_helper scsi_mod xhci_hcd i8042 serio hid_generic usbhid usbcore usb_common hid intel_agp i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_gtt agpgart
[ 1242.621693] CR2: 000000020024cb72
[ 1242.621698] ---[ end trace dc996496c7568a90 ]---
[ 1242.621709] RIP: 0010:tcp_drop+0x17/0x40
[ 1242.621712] Code: 00 e9 5b ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 8b 96 cc 00 00 00 48 8b 8e d0 00 00 00 b8 01 00 00 00 <66> 83 7c 11 06 00 66 0f 45 44 11 06 0f b7 c0 f0 01 87 a8 00 00 00 
[ 1242.621758] RSP: 0018:ffffb0de41223b08 EFLAGS: 00010203
[ 1242.621763] RAX: 0000000000000001 RBX: ffff8a6a39ea18c0 RCX: 0000000100254102
[ 1242.621765] RDX: 00000000ffff8a6a RSI: ffff8a6a39ea1988 RDI: ffff8a6a39ea18c0
[ 1242.621767] RBP: ffff8a6a39ea1988 R08: 0000000000000000 R09: ffffffff99e8930e
[ 1242.621769] R10: ffff8a6a743f6310 R11: 0000000000000002 R12: ffff8a6a39ea1988
[ 1242.621771] R13: ffffffff99e92380 R14: 0000000000000000 R15: 00000000fffffff5
[ 1242.621773] FS:  0000000000000000(0000) GS:ffff8a6a81400000(0063) knlGS:00000000ed455b40
[ 1242.621775] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 1242.621777] CR2: 000000020024cb72 CR3: 00000002347aa004 CR4: 00000000003606f0
[ 1242.621780] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1242.621782] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

--- Separate Boot ---
[  158.591645] WARNING: CPU: 2 PID: 29 at net/core/dev.c:4279 net_tx_action+0x1fe/0x260
[  158.591651] Modules linked in: ccm nf_tables_set nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat_ipv6 nft_chain_nat_ipv4 nf_tables ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack libcrc32c arc4 iwlmvm mac80211 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter btusb snd_soc_skl btrtl btbcm btintel uvcvideo bluetooth ipheth snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp iwlwifi videobuf2_vmalloc snd_hda_ext_core videobuf2_memops videobuf2_v4l2 8021q mrp videobuf2_common
[  158.591799]  snd_soc_acpi intel_rapl snd_soc_core videodev x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi nls_iso8859_1 media ecdh_generic nls_cp437 snd_compress joydev ac97_bus snd_hda_codec_realtek vfat snd_hda_codec_generic fat snd_pcm_dmaengine mousedev kvm snd_hda_intel cfg80211 snd_hda_codec wmi_bmof mei_wdt snd_hda_core snd_hwdep snd_pcm irqbypass intel_cstate iTCO_wdt tpm_tis iTCO_vendor_support intel_uncore psmouse tpm_tis_core thinkpad_acpi tpm intel_rapl_perf nvram input_leds e1000e snd_timer rfkill rng_core wmi snd mei_me evdev mei battery mac_hid led_class ac intel_pch_thermal rtc_cmos i2c_i801 pcc_cpufreq soundcore ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul
[  158.591959]  crc32c_intel ghash_clmulni_intel pcbc serio_raw ahci atkbd libahci libps2 xhci_pci libata aesni_intel aes_x86_64 crypto_simd cryptd glue_helper xhci_hcd scsi_mod i8042 serio hid_generic usbhid usbcore usb_common hid intel_agp i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_gtt agpgart
[  158.592043] CPU: 2 PID: 29 Comm: ksoftirqd/2 Tainted: G     U            4.18.5 #1
[  158.592047] Hardware name: LENOVO 20FN001CAU/20FN001CAU, BIOS R06ET59W (1.33 ) 02/27/2018
[  158.592058] RIP: 0010:net_tx_action+0x1fe/0x260
[  158.592060] Code: 8d bb 04 01 00 00 e8 f1 d5 15 00 e9 2a ff ff ff 5b 4c 89 e7 5d 41 5c 41 5d 41 5e e9 fc 5c 0c 00 0f 1f 44 00 00 e9 76 fe ff ff <0f> 0b e9 5f fe ff ff 65 8b 05 24 6d bf 44 89 c0 48 0f a3 05 2a 38 
[  158.592195] RSP: 0018:ffffb5d4c0d9be50 EFLAGS: 00010286
[  158.592202] RAX: 00000000c0000000 RBX: ffff945bb4ae3800 RCX: 00000000000000a1
[  158.592206] RDX: 0000000080000101 RSI: ffffffffbbc81896 RDI: ffffffffbbc898b5
[  158.592211] RBP: 0000000000000002 R08: 000000694facc758 R09: 0000000000000100
[  158.592215] R10: 0000000000000000 R11: ffff945bc1520a68 R12: ffff945bc1522f00
[  158.592219] R13: ffff945bb4ae3900 R14: ffffffffbb4182a0 R15: ffffffffbaea1950
[  158.592226] FS:  0000000000000000(0000) GS:ffff945bc1500000(0000) knlGS:0000000000000000
[  158.592231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.592235] CR2: 00007ff03400dad8 CR3: 000000018de0a003 CR4: 00000000003606e0
[  158.592241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.592245] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  158.592248] Call Trace:
[  158.592270]  __do_softirq+0x10d/0x30d
[  158.592287]  ? sort_range+0x20/0x20
[  158.592295]  run_ksoftirqd+0x32/0x40
[  158.592306]  smpboot_thread_fn+0x198/0x230
[  158.592319]  kthread+0x112/0x130
[  158.592330]  ? kthread_flush_work_fn+0x10/0x10
[  158.592342]  ret_from_fork+0x35/0x40
[  158.592353] ---[ end trace 88207aade9b579ae ]---

* Re: Kernel Panic on high bandwidth transfer over wifi
From: Willy Tarreau @ 2018-08-29 12:09 UTC (permalink / raw)
  To: Nathaniel Munk; +Cc: netdev@vger.kernel.org
In-Reply-To: <porA6Oc9uVOmzI_nV2MhBU5RzBhBCq82gffuA1TBkrQxwTToIy03qc44DkN8BmdVKgQAGU6qDyrkCx7sDQI5SbSt9b9-HHf9ufHcIQlT-kg=@munk.com.au>

On Wed, Aug 29, 2018 at 11:42:44AM +0000, Nathaniel Munk wrote:
> As you can see from the attached log

You apparently forgot to attach the log.

Willy

* [PATCH net-next] rds: store socket timestamps as ktime_t
From: Arnd Bergmann @ 2018-08-29 15:47 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller
  Cc: Arnd Bergmann, Sowmini Varadhan, Willem de Bruijn, Ka-Cheong Poon,
	Salvatore Mesoraca, Avinash Repaka, Eric Dumazet, netdev,
	linux-rdma, rds-devel, linux-kernel

rds is the last in-kernel user of the old do_gettimeofday()
function. Convert it over to ktime_get_real() to make it
work more like the generic socket timestamps, and to let
us kill off do_gettimeofday().

A follow-up patch will have to change the user space interface
to deal better with 32-bit tasks, which may use an incompatible
layout for 'struct timespec'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/rds/rds.h  |  2 +-
 net/rds/recv.c | 14 ++++++--------
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/rds/rds.h b/net/rds/rds.h
index c4dcf654d8fe..6bfaf05b63b2 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -278,7 +278,7 @@ struct rds_incoming {
 	struct in6_addr		i_saddr;
 
 	rds_rdma_cookie_t	i_rdma_cookie;
-	struct timeval		i_rx_tstamp;
+	ktime_t			i_rx_tstamp;
 	u64			i_rx_lat_trace[RDS_RX_MAX_TRACES];
 };
 
diff --git a/net/rds/recv.c b/net/rds/recv.c
index 504cd6bcc54c..12719653188a 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -50,8 +50,7 @@ void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn,
 	inc->i_conn = conn;
 	inc->i_saddr = *saddr;
 	inc->i_rdma_cookie = 0;
-	inc->i_rx_tstamp.tv_sec = 0;
-	inc->i_rx_tstamp.tv_usec = 0;
+	inc->i_rx_tstamp = ktime_set(0, 0);
 
 	for (i = 0; i < RDS_RX_MAX_TRACES; i++)
 		inc->i_rx_lat_trace[i] = 0;
@@ -67,8 +66,7 @@ void rds_inc_path_init(struct rds_incoming *inc, struct rds_conn_path *cp,
 	inc->i_conn_path = cp;
 	inc->i_saddr = *saddr;
 	inc->i_rdma_cookie = 0;
-	inc->i_rx_tstamp.tv_sec = 0;
-	inc->i_rx_tstamp.tv_usec = 0;
+	inc->i_rx_tstamp = ktime_set(0, 0);
 }
 EXPORT_SYMBOL_GPL(rds_inc_path_init);
 
@@ -385,7 +383,7 @@ void rds_recv_incoming(struct rds_connection *conn, struct in6_addr *saddr,
 				      be32_to_cpu(inc->i_hdr.h_len),
 				      inc->i_hdr.h_dport);
 		if (sock_flag(sk, SOCK_RCVTSTAMP))
-			do_gettimeofday(&inc->i_rx_tstamp);
+			inc->i_rx_tstamp = ktime_get_real();
 		rds_inc_addref(inc);
 		inc->i_rx_lat_trace[RDS_MSG_RX_END] = local_clock();
 		list_add_tail(&inc->i_item, &rs->rs_recv_queue);
@@ -552,11 +550,11 @@ static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg,
 			goto out;
 	}
 
-	if ((inc->i_rx_tstamp.tv_sec != 0) &&
+	if ((inc->i_rx_tstamp != 0) &&
 	    sock_flag(rds_rs_to_sk(rs), SOCK_RCVTSTAMP)) {
+		struct timeval tv = ktime_to_timeval(inc->i_rx_tstamp);
 		ret = put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
-			       sizeof(struct timeval),
-			       &inc->i_rx_tstamp);
+			       sizeof(tv), &tv);
 		if (ret)
 			goto out;
 	}
-- 
2.18.0
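
For readers less familiar with ktime_t: in current kernels it is a signed 64-bit count of nanoseconds, which is why the patch can test "inc->i_rx_tstamp != 0" directly and why the conversion back to a struct timeval in rds_cmsg_recv() is essentially a division. A minimal userspace sketch of that arithmetic follows, assuming non-negative timestamps. The name ns_to_timeval_sketch() and the two constants are hypothetical stand-ins for illustration, not kernel symbols.

```c
#include <stdint.h>
#include <sys/time.h>

#define NSEC_PER_SEC_SKETCH  1000000000LL
#define NSEC_PER_USEC_SKETCH 1000LL

/*
 * Hypothetical userspace stand-in for what ktime_to_timeval() does.
 * Assumes ns >= 0, i.e. a wall-clock timestamp at or after the epoch.
 */
static struct timeval ns_to_timeval_sketch(int64_t ns)
{
	struct timeval tv;

	/* whole seconds */
	tv.tv_sec = (time_t)(ns / NSEC_PER_SEC_SKETCH);
	/* remaining nanoseconds, truncated to microseconds */
	tv.tv_usec = (suseconds_t)((ns % NSEC_PER_SEC_SKETCH) /
				   NSEC_PER_USEC_SKETCH);
	return tv;
}
```

Sub-microsecond precision is dropped at this step, which matches what a SCM_TIMESTAMP consumer sees in its struct timeval.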

* Kernel Panic on high bandwidth transfer over wifi
From: Nathaniel Munk @ 2018-08-29 11:42 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi all,
I'm running Arch Linux on kernel 4.18.5 (the same issue occurs on both the Arch-provided kernel and a mainline kernel built from source). The kernel crashes when transferring at high bandwidth (approx. 6 MB/s) over a specific wifi connection. I can only reproduce the issue when using the Personal Hotspot on my iPhone 6S+, but I can reproduce it very consistently on that connection.

More often than not, any download reaching this speed will cause a panic, but if the download is immediately terminated at the first error the system can recover (and doing this I have obtained the attached logs). Unfortunately, I have not had access to a second machine to obtain the netconsole printout of the panic.

As above, high-bandwidth transfers on other wifi networks do not cause the issue (nor on ethernet connections).

As you can see from the attached log, the issue appears at tcp_recvmsg+0x579 and net_tx_action+0x1fe. At both of these positions (net/ipv4/tcp.c:2000 and net/core/dev.c:4279 in mainline 4.18.5), a member of the skb struct is accessed.

Thank you for your time (and I apologize if this is spurious or badly worded; this is my first bug report), and please don't hesitate to let me know if there's anything else I can do to help work this out.

Regards,
-------------------
Nathaniel Munk
nathaniel@munk.com.au

* Re: [PATCH net] sctp: hold transport before accessing its asoc in sctp_transport_get_next
From: Neil Horman @ 2018-08-29 11:35 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner
In-Reply-To: <CADvbK_f7+UqMqm-ZzEwSz92Joxx_46a1kBrwNbpAi52Qkk=FuQ@mail.gmail.com>

On Wed, Aug 29, 2018 at 12:08:40AM +0800, Xin Long wrote:
> On Mon, Aug 27, 2018 at 9:08 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > On Mon, Aug 27, 2018 at 06:38:31PM +0800, Xin Long wrote:
> > > As Marcelo noticed, in sctp_transport_get_next, it is iterating over
> > > transports but then also accessing the association directly, without
> > > > checking any refcnts before that, which can cause a use-after-free
> > > > read.
> > >
> > > So fix it by holding transport before accessing the association. With
> > > that, sctp_transport_hold calls can be removed in the later places.
> > >
> > > Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
> > > Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > ---
> > >  net/sctp/proc.c   |  4 ----
> > >  net/sctp/socket.c | 22 +++++++++++++++-------
> > >  2 files changed, 15 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> > > index ef5c9a8..4d6f1c8 100644
> > > --- a/net/sctp/proc.c
> > > +++ b/net/sctp/proc.c
> > > @@ -264,8 +264,6 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
> > >       }
> > >
> > >       transport = (struct sctp_transport *)v;
> > > -     if (!sctp_transport_hold(transport))
> > > -             return 0;
> > >       assoc = transport->asoc;
> > >       epb = &assoc->base;
> > >       sk = epb->sk;
> > > @@ -322,8 +320,6 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
> > >       }
> > >
> > >       transport = (struct sctp_transport *)v;
> > > -     if (!sctp_transport_hold(transport))
> > > -             return 0;
> > >       assoc = transport->asoc;
> > >
> > >       list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
> > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > index e96b15a..aa76586 100644
> > > --- a/net/sctp/socket.c
> > > +++ b/net/sctp/socket.c
> > > @@ -5005,9 +5005,14 @@ struct sctp_transport *sctp_transport_get_next(struct net *net,
> > >                       break;
> > >               }
> > >
> > > +             if (!sctp_transport_hold(t))
> > > +                     continue;
> > > +
> > >               if (net_eq(sock_net(t->asoc->base.sk), net) &&
> > >                   t->asoc->peer.primary_path == t)
> > >                       break;
> > > +
> > > +             sctp_transport_put(t);
> > >       }
> > >
> > >       return t;
> > > @@ -5017,13 +5022,18 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
> > >                                             struct rhashtable_iter *iter,
> > >                                             int pos)
> > >  {
> > > -     void *obj = SEQ_START_TOKEN;
> > > +     struct sctp_transport *t;
> > >
> > > -     while (pos && (obj = sctp_transport_get_next(net, iter)) &&
> > > -            !IS_ERR(obj))
> > > -             pos--;
> > > +     if (!pos)
> > > +             return SEQ_START_TOKEN;
> > >
> > > -     return obj;
> > > +     while ((t = sctp_transport_get_next(net, iter)) && !IS_ERR(t)) {
> > > +             if (!--pos)
> > > +                     break;
> > > +             sctp_transport_put(t);
> > > +     }
> > > +
> > > +     return t;
> > >  }
> > >
> > >  int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
> > > @@ -5082,8 +5092,6 @@ int sctp_for_each_transport(int (*cb)(struct sctp_transport *, void *),
> > >
> > >       tsp = sctp_transport_get_idx(net, &hti, *pos + 1);
> > >       for (; !IS_ERR_OR_NULL(tsp); tsp = sctp_transport_get_next(net, &hti)) {
> > > -             if (!sctp_transport_hold(tsp))
> > > -                     continue;
> > >               ret = cb(tsp, p);
> > >               if (ret)
> > >                       break;
> > > --
> > > 2.1.0
> > >
> > >
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> >
> > Additionally, it's not germane to this particular fix, but why are we still
> > using that pos variable in sctp_transport_get_idx?  With the conversion to
> > rhashtables, it doesn't seem particularly useful anymore.
> For proc, it seems so: hti is saved into seq->private.
> But for diag, "hti" in sctp_for_each_transport() is a local variable.
> Where do you think we can save it?
> 
Sorry, I wasn't suggesting that it had to be removed from sctp_for_each_transport;
it's clearly used as both an input and output there.  All I was suggesting was
that, in sctp_transport_get_idx, the pos variable might no longer be needed
there specifically, as sctp_transport_get_next should terminate the loop on its
own.  Or is there another purpose for that positional variable that I am missing?

Neil
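
The rule the quoted patch enforces (take a reference on the transport before dereferencing anything reachable from it, and drop that reference again for every entry the iterator skips) can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: struct obj, obj_hold(), obj_put() and find_held() are hypothetical names, and the increment-unless-zero loop merely imitates the style of helper that sctp_transport_hold() relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an object that an iterator can still see
 * after its last reference has been dropped.
 */
struct obj {
	atomic_int refcnt;
	int owner_id;	/* stands in for the association being accessed */
};

/* Take a reference only if the count is still non-zero. */
static bool obj_hold(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

/*
 * Return the first live object owned by owner_id with a reference held,
 * or NULL. The caller is responsible for the final obj_put().
 */
static struct obj *find_held(struct obj *tbl, size_t n, int owner_id)
{
	for (size_t i = 0; i < n; i++) {
		struct obj *o = &tbl[i];

		if (!obj_hold(o))
			continue;	/* already dying: do not read its fields */
		if (o->owner_id == owner_id)
			return o;	/* match: keep the reference */
		obj_put(o);		/* no match: drop the reference */
	}
	return NULL;
}
```

Because the lookup now returns with a reference held, callers no longer take one themselves, which is the same reason the patch removes the sctp_transport_hold() calls from the proc and diag paths.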

* [PATCH] veth: add software timestamping
From: Michael Walle @ 2018-08-29 15:24 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, David S . Miller, Michael Walle

Provide a software TX timestamp as well as the ethtool query interface
and report the software timestamp capabilities.

Tested with "ethtool -T" and two linuxptp instances each bound to a
tunnel endpoint.

Signed-off-by: Michael Walle <michael@walle.cc>
---
 drivers/net/veth.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8d679c8b7f25..bc8faf13a731 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -24,6 +24,7 @@
 #include <linux/filter.h>
 #include <linux/ptr_ring.h>
 #include <linux/bpf_trace.h>
+#include <linux/net_tstamp.h>
 
 #define DRV_NAME	"veth"
 #define DRV_VERSION	"1.0"
@@ -114,6 +115,18 @@ static void veth_get_ethtool_stats(struct net_device *dev,
 	data[0] = peer ? peer->ifindex : 0;
 }
 
+static int veth_get_ts_info(struct net_device *dev,
+			    struct ethtool_ts_info *info)
+{
+	info->so_timestamping =
+		SOF_TIMESTAMPING_TX_SOFTWARE |
+		SOF_TIMESTAMPING_RX_SOFTWARE |
+		SOF_TIMESTAMPING_SOFTWARE;
+	info->phc_index = -1;
+
+	return 0;
+}
+
 static const struct ethtool_ops veth_ethtool_ops = {
 	.get_drvinfo		= veth_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
@@ -121,6 +134,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
 	.get_sset_count		= veth_get_sset_count,
 	.get_ethtool_stats	= veth_get_ethtool_stats,
 	.get_link_ksettings	= veth_get_link_ksettings,
+	.get_ts_info		= veth_get_ts_info,
 };
 
 /* general routines */
@@ -201,6 +215,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 			skb_record_rx_queue(skb, rxq);
 	}
 
+	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
-- 
2.11.0

* Re: [PATCH net 0/2] igmp: fix two incorrect unsolicit report count issues
From: Hangbin Liu @ 2018-08-29 11:08 UTC (permalink / raw)
  To: netdev; +Cc: David Miller
In-Reply-To: <1535537171-24533-1-git-send-email-liuhangbin@gmail.com>

Oops, I sent two duplicate mails by mistake.
Please ignore the duplicate messages. Sorry.

On Wed, Aug 29, 2018 at 06:06:07PM +0800, Hangbin Liu wrote:
> Just like the subject, fix two minor igmp unsolicit report count issues.
> 
> Hangbin Liu (2):
>   igmp: fix incorrect unsolicit report count when join group
>   igmp: fix incorrect unsolicit report count after link down and up
> 
>  net/ipv4/igmp.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> -- 
> 2.5.5
> 

* Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions
From: Paolo Abeni @ 2018-08-29 10:23 UTC (permalink / raw)
  To: Jakub Kicinski, Eelco Chaudron
  Cc: David Miller, netdev, jhs, xiyou.wangcong, jiri, simon.horman,
	Marcelo Ricardo Leitner, louis.peens
In-Reply-To: <20180823201446.3802e84b@cakuba.netronome.com>

On Thu, 2018-08-23 at 20:14 +0200, Jakub Kicinski wrote:
> I asked Louis to run some tests while I'm travelling, and he reports
> that my worry about reporting the extra stats was unfounded.  Update
> function does not show up in traces at all.  It seems under stress
> (generated with stress-ng) the thread dumping the stats in userspace
> (in OvS it would be the revalidator) actually consumes less CPU in
> __gnet_stats_copy_basic (0.4% less for ~2.0% total).
> 
> Would this match with your results?  I'm not sure why dumping would be
> faster with your change..

Wild guess on my side: the relevant patch slightly changes the binary
layout of the 'tc_action' struct, possibly (I still need to check with
pahole) moving the tcf_lock and the stats fields onto different
cachelines, reducing false sharing that could badly affect such a test.

Cheers,

Paolo
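
To make the false-sharing guess concrete, here is a small sketch of how a layout change can move a lock and the counters next to it onto different cachelines. Everything in it is an assumption for illustration: the two structs are hypothetical and are not the real 'tc_action' layout, the 64-byte line size is merely typical for x86-64, and the offsets are relative to the start of the object.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed line size, typical on x86-64 */

/* Hypothetical layout: lock and counters share the first line. */
struct packed_action {
	int lock;		/* stands in for tcf_lock */
	uint64_t bytes;		/* stands in for the stats fields */
	uint64_t packets;
};

/* Hypothetical layout: counters are pushed to the start of the next line. */
struct split_action {
	int lock;
	alignas(CACHELINE) uint64_t bytes;
	uint64_t packets;
};

/* Index of the cacheline (relative to the object start) holding a member. */
#define LINE_OF(type, member) (offsetof(type, member) / CACHELINE)
```

In the first layout a writer updating the counters and another CPU spinning on the lock keep invalidating the same line; in the second they do not. pahole reports the same kind of offset and cacheline-boundary information for the real structure.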

* [RFC net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: Lorenzo Bianconi @ 2018-08-29 10:09 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1535532739.git.lorenzo.bianconi@redhat.com>

When moving a veth device to another namespace, userspace receives an
RTM_DELLINK message indicating the device has been removed from the current
netns. However, the other peer does not receive a netlink event
containing new values for the IFLA_LINK_NETNSID and IFLA_LINK veth
attributes.
Fix that behaviour by sending userspace an RTM_NEWLINK message in the peer
namespace to report the new IFLA_LINK_NETNSID/IFLA_LINK values.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 drivers/net/veth.c | 60 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8d679c8b7f25..b27d46d8084a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1242,18 +1242,76 @@ static struct rtnl_link_ops veth_link_ops = {
 	.get_link_net	= veth_get_link_net,
 };
 
+static int veth_notify(struct notifier_block *this,
+		       unsigned long event, void *ptr)
+{
+	struct net_device *peer, *dev = netdev_notifier_info_to_dev(ptr);
+	struct net *peer_net, *net = dev_net(dev);
+	int nsid, ret = NOTIFY_DONE;
+	struct veth_priv *priv;
+
+	if (dev->netdev_ops != &veth_netdev_ops)
+		return NOTIFY_DONE;
+
+	if (event != NETDEV_REGISTER)
+		return NOTIFY_DONE;
+
+	priv = netdev_priv(dev);
+
+	rcu_read_lock();
+
+	peer = rcu_dereference(priv->peer);
+	if (!peer)
+		goto out;
+
+	peer_net = dev_net(peer);
+	/* do not forward events if both veth devices
+	 * are in the same namespace
+	 */
+	if (peer_net == net)
+		goto out;
+
+	/* notify on peer namespace new IFLA_LINK_NETNSID
+	 * and IFLA_LINK values
+	 */
+	nsid = peernet2id_alloc(peer_net, net);
+	rtmsg_ifinfo_newnet(RTM_NEWLINK, peer, ~0U, GFP_ATOMIC,
+			    &nsid, dev->ifindex);
+	ret = NOTIFY_OK;
+
+out:
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static struct notifier_block veth_notifier = {
+	.notifier_call = veth_notify,
+};
+
 /*
  * init/fini
  */
 
 static __init int veth_init(void)
 {
-	return rtnl_link_register(&veth_link_ops);
+	int err;
+
+	err = register_netdevice_notifier(&veth_notifier);
+	if (err < 0)
+		return err;
+
+	err = rtnl_link_register(&veth_link_ops);
+	if (err < 0)
+		unregister_netdevice_notifier(&veth_notifier);
+
+	return err;
 }
 
 static __exit void veth_exit(void)
 {
 	rtnl_link_unregister(&veth_link_ops);
+	unregister_netdevice_notifier(&veth_notifier);
 }
 
 module_init(veth_init);
-- 
2.17.1
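
On the receiving side, a monitor in the peer namespace would pick the new value out of the RTM_NEWLINK message by walking its attributes. The following is a minimal sketch of that parsing step, assuming the message has already been read from a NETLINK_ROUTE socket subscribed to link events. The function name newlink_peer_nsid() is hypothetical; the macros and types come from the standard rtnetlink UAPI headers.

```c
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 and stores the value in *nsid when nlh is
 * an RTM_NEWLINK message carrying IFLA_LINK_NETNSID, otherwise returns 0.
 */
static int newlink_peer_nsid(struct nlmsghdr *nlh, int *nsid)
{
	struct ifinfomsg *ifi;
	struct rtattr *rta;
	int len;

	if (nlh->nlmsg_type != RTM_NEWLINK ||
	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
		return 0;

	ifi = NLMSG_DATA(nlh);
	/* bytes of attributes that follow the ifinfomsg header */
	len = (int)(nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));

	for (rta = IFLA_RTA(ifi); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_LINK_NETNSID &&
		    RTA_PAYLOAD(rta) >= (int)sizeof(int)) {
			memcpy(nsid, RTA_DATA(rta), sizeof(int));
			return 1;
		}
	}
	return 0;
}
```

The IFLA_LINK attribute (the peer's ifindex) could be extracted in the same loop by matching on rta_type == IFLA_LINK.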
