* [GIT] Networking
From: David Miller @ 2017-04-26 19:21 UTC
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) MLX5 bug fixes from Saeed Mahameed et al.
a) Release wrong resources when firmware timeout happens
b) Wrong check for encapsulation size limits
c) UAR memory leak
d) ETHTOOL_GRXCLSRLALL fails to fill in info->data
2) Don't cache l3mdev dst on mismatched local route, which causes
net devices to leak refs. From Robert Shearman.
3) Handle fragmented SKBs properly in macsec driver; the problem
is that we were mis-sizing the sgvec table. From Jason A.
Donenfeld.
4) We cannot have checksum offload enabled for inner UDP tunneled
packets during IPsec, from Ansis Atteka.
5) Fix double SKB free in ravb driver, from Dan Carpenter.
6) Fix CPU port handling in b53 DSA driver, from Florian Fainelli.
7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
from Maksim Salau.
8) Fix device leak in macvlan driver, from Herbert Xu. We have to
purge the broadcast queue properly on port destroy.
9) Fix tx ring entry limit on EF10 devices in sfc driver. From
Bert Kenward.
10) Fix memory leaks in team driver, from Pan Bian.
11) Don't set up ipv6_stub before it can actually be used, from Paolo
Abeni.
12) Fix tipc socket flow control accounting, from Parthasarathy
Bhuvaragan.
13) Fix crash on module unload in hso driver, from Andreas Kemnade.
14) Fix purging of bridge multicast entries; the problem is that if
we don't defer it to ndo_uninit, it's possible for new entries to
get added after we purge. Fix from Xin Long.
15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
Potapenko.
16) Fix autoneg stall properly in PHY layer, and revert micrel driver
change that was papering over it. From Alexander Kochetkov.
17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnel
code, from Cong Wang.
18) Clear out the congestion control private data of the TCP socket in all
of the right places, from Wei Wang.
19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
Bainbridge.
Please pull, thanks a lot!
The following changes since commit 94836ecf1e7378b64d37624fbb81fe48fbd4c772:
Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux (2017-04-21 16:37:48 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to 105f5528b9bbaa08b526d3405a5bcd2ff0c953c8:
ipv6: check raw payload size correctly in ioctl (2017-04-26 14:59:35 -0400)
----------------------------------------------------------------
Alexander Kochetkov (1):
net: phy: fix auto-negotiation stall due to unavailable interrupt
Alexander Potapenko (1):
net/packet: check length in getsockopt() called with PACKET_HDRLEN
Andreas Kemnade (1):
net: hso: fix module unloading
Ansis Atteka (1):
udp: disable inner UDP checksum offloads in IPsec case
Bert Kenward (1):
sfc: tx ring can only have 2048 entries for all EF10 NICs
Dan Carpenter (2):
net: tc35815: move free after the dereference
ravb: Double free on error in ravb_start_xmit()
David Ahern (2):
net: ipv6: send unsolicited NA if enabled for all interfaces
net: ipv6: regenerate host route if moved to gc list
David S. Miller (4):
Merge tag 'mlx5-fixes-2017-04-22' of git://git.kernel.org/.../saeed/linux
Merge branch 'dsa-b53-58xx-fixes'
Merge tag 'linux-can-fixes-for-4.11-20170425' of git://git.kernel.org/.../mkl/linux-can
Revert "phy: micrel: Disable auto negotiation on startup"
Eugenia Emantayev (1):
net/mlx5e: Fix small packet threshold
Florian Fainelli (3):
net: dsa: b53: Include IMP/CPU port in dumb forwarding mode
net: dsa: b53: Implement software reset for 58xx devices
net: dsa: b53: Fix CPU port for 58xx devices
Herbert Xu (1):
macvlan: Fix device ref leak when purging bc_queue
Ilan Tayari (1):
net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling
Jamie Bainbridge (1):
ipv6: check raw payload size correctly in ioctl
Jason A. Donenfeld (2):
macsec: avoid heap overflow in skb_to_sgvec
macsec: dynamically allocate space for sglist
Maksim Salau (1):
net: can: usb: gs_usb: Fix buffer on stack
Maor Gottlieb (1):
net/mlx5: Fix UAR memory leak
Martin KaFai Lau (1):
net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats
Mohamad Haj Yahia (1):
net/mlx5: Fix driver load bad flow when having fw initializing timeout
Myungho Jung (1):
net: core: Prevent from dereferencing null pointer when releasing SKB
Or Gerlitz (3):
net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels
Pan Bian (1):
team: fix memory leaks
Paolo Abeni (1):
ipv6: move stub initialization after ipv6 setup completion
Parthasarathy Bhuvaragan (2):
tipc: fix socket flow control accounting error at tipc_send_stream
tipc: fix socket flow control accounting error at tipc_recv_stream
Robert Shearman (1):
ipv4: Avoid caching l3mdev dst on mismatched local route
Roman Spychała (1):
usb: plusb: Add support for PL-27A1
Sabrina Dubroca (1):
ipv6: fix source routing
Stephane Grosjean (2):
can: usb: Add support of PCAN-Chip USB stamp module
can: usb: Kconfig: Add PCAN-USB X6 device in help text
WANG Cong (1):
ipv6: check skb->protocol before lookup for nexthop
Wei Wang (1):
tcp: memset ca_priv data to 0 properly
Xin Long (1):
bridge: move bridge multicast cleanup to ndo_uninit
stephen hemminger (1):
netvsc: fix calculation of available send sections
sudarsana.kalluru@cavium.com (1):
qed: Fix error in the dcbx app meta data initialization.
drivers/net/can/usb/Kconfig | 2 ++
drivers/net/can/usb/gs_usb.c | 17 ++++++++++++-----
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 ++
drivers/net/can/usb/peak_usb/pcan_usb_core.h | 2 ++
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/net/dsa/b53/b53_common.c | 37 +++++++++++++++++++++++++++++++++++--
drivers/net/dsa/b53/b53_regs.h | 5 +++++
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 36 ++++++++++++++++++++++++------------
drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/uar.c | 1 +
drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 10 +++++-----
drivers/net/ethernet/renesas/ravb_main.c | 7 ++++---
drivers/net/ethernet/sfc/efx.h | 5 ++++-
drivers/net/ethernet/sfc/workarounds.h | 1 +
drivers/net/ethernet/toshiba/tc35815.c | 2 +-
drivers/net/hyperv/hyperv_net.h | 1 -
drivers/net/hyperv/netvsc.c | 9 ++++-----
drivers/net/macsec.c | 27 +++++++++++++++++++++------
drivers/net/macvlan.c | 11 ++++++++++-
drivers/net/phy/micrel.c | 11 -----------
drivers/net/phy/phy.c | 40 ++++++++++++++++++++++++++++++++++++----
drivers/net/team/team.c | 8 ++++++--
drivers/net/usb/Kconfig | 2 +-
drivers/net/usb/hso.c | 2 +-
drivers/net/usb/plusb.c | 15 +++++++++++++--
include/linux/phy.h | 1 +
net/bridge/br_device.c | 1 +
net/bridge/br_if.c | 1 -
net/core/dev.c | 3 +++
net/ipv4/route.c | 3 ++-
net/ipv4/tcp_cong.c | 11 +++--------
net/ipv4/udp_offload.c | 3 +++
net/ipv6/addrconf.c | 14 ++++++++++++--
net/ipv6/af_inet6.c | 6 ++++--
net/ipv6/exthdrs.c | 4 ++++
net/ipv6/ip6_tunnel.c | 34 ++++++++++++++++++----------------
net/ipv6/ndisc.c | 3 ++-
net/ipv6/raw.c | 3 +--
net/packet/af_packet.c | 2 ++
net/tipc/socket.c | 4 ++--
44 files changed, 373 insertions(+), 141 deletions(-)
* Re: [PATCH] ipv6: check raw payload size correctly in ioctl
From: David Miller @ 2017-04-26 19:00 UTC
To: jbainbri; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1493167407-27969-1-git-send-email-jbainbri@redhat.com>
From: Jamie Bainbridge <jbainbri@redhat.com>
Date: Wed, 26 Apr 2017 10:43:27 +1000
> In situations where an skb is paged, the transport header pointer and
> tail pointer can be the same because the skb contents are in frags.
>
> This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
> length of 0 when the length to receive is actually greater than zero.
>
> skb->len is already correctly set in ip6_input_finish() with
> pskb_pull(), so use skb->len as it always returns the correct result
> for both linear and paged data.
>
> Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Applied and queued up for -stable, thanks.
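Here is a userspace sketch, not kernel code, of why the two measurements differ. struct model_skb and both helpers are hypothetical. The old ioctl result is modelled as the distance from the transport header to the tail pointer. The description above says that distance collapses to zero when the payload sits in frags.

```c
/* Hypothetical, heavily simplified model of an sk_buff holding only
 * the fields needed to contrast the two SIOCINQ computations. */
struct model_skb {
	unsigned int len;               /* total bytes: linear + paged */
	unsigned int data_len;          /* bytes held in page frags */
	const unsigned char *transport; /* transport header (linear area) */
	const unsigned char *tail;      /* end of the linear area */
};

/* Old approach: counts only the linear bytes after the transport
 * header, so payload that lives in frags is invisible. */
static unsigned int model_inq_old(const struct model_skb *skb)
{
	return (unsigned int)(skb->tail - skb->transport);
}

/* Approach described in the patch: skb->len has already been trimmed
 * by pskb_pull() in ip6_input_finish(), so it is correct for linear
 * and paged skbs alike. */
static unsigned int model_inq_new(const struct model_skb *skb)
{
	return skb->len;
}
```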
* Re: [PATCH net-next] tcp: memset ca_priv data to 0 properly
From: David Miller @ 2017-04-26 18:59 UTC
To: weiwan; +Cc: netdev, edumazet, ycheng, ncardwell
In-Reply-To: <20170426003802.40091-1-tracywwnj@gmail.com>
From: Wei Wang <weiwan@google.com>
Date: Tue, 25 Apr 2017 17:38:02 -0700
> From: Wei Wang <weiwan@google.com>
>
> Always zero out ca_priv data in tcp_assign_congestion_control() so that
> ca_priv data is cleared out during socket creation.
> Also always zero out ca_priv data in tcp_reinit_congestion_control() so
> that when cc algorithm is changed, ca_priv data is cleared out as well.
> We should still zero out ca_priv data even in TCP_CLOSE state because
> user could call connect() on AF_UNSPEC to disconnect the socket and
> leave it in TCP_CLOSE state and later call setsockopt() to switch cc
> algorithm on this socket.
>
> Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
Applied to 'net' and queued up for -stable, thanks.
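For illustration only, here is a reduced model of the two places the patch clears ca_priv. MODEL_CA_PRIV_SIZE, struct model_sock and the helpers are hypothetical. They are much simpler than tcp_assign_congestion_control() and tcp_reinit_congestion_control(). The point is only that the memset no longer depends on socket state.

```c
#include <string.h>

/* Hypothetical size, not the kernel's ICSK_CA_PRIV_SIZE. */
#define MODEL_CA_PRIV_SIZE 64

struct model_cc_ops {
	void (*init)(unsigned char *priv); /* may be NULL */
};

struct model_sock {
	const struct model_cc_ops *ca_ops;
	unsigned char ca_priv[MODEL_CA_PRIV_SIZE];
};

/* Socket creation: always start from zeroed private state. */
static void model_assign_cc(struct model_sock *sk, const struct model_cc_ops *ops)
{
	sk->ca_ops = ops;
	memset(sk->ca_priv, 0, sizeof(sk->ca_priv));
}

/* Switching algorithm: clear again before the new init runs, even for a
 * socket in TCP_CLOSE (e.g. after connect() with AF_UNSPEC), so the new
 * algorithm never sees its predecessor's leftovers. */
static void model_reinit_cc(struct model_sock *sk, const struct model_cc_ops *ops)
{
	sk->ca_ops = ops;
	memset(sk->ca_priv, 0, sizeof(sk->ca_priv));
	if (ops->init)
		ops->init(sk->ca_priv);
}
```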
* Re: [Patch net] ipv6: check skb->protocol before lookup for nexthop
From: David Miller @ 2017-04-26 18:51 UTC
To: xiyou.wangcong; +Cc: netdev, andreyknvl, steffen.klassert
In-Reply-To: <1493156235-9823-1-git-send-email-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 25 Apr 2017 14:37:15 -0700
> Andrey reported an out-of-bounds access in ip6_tnl_xmit(). This
> is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
> neigh key as an IPv6 address:
>
> neigh = dst_neigh_lookup(skb_dst(skb),
> &ipv6_hdr(skb)->daddr);
> if (!neigh)
> goto tx_err_link_failure;
>
> addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
> addr_type = ipv6_addr_type(addr6);
>
> if (addr_type == IPV6_ADDR_ANY)
> addr6 = &ipv6_hdr(skb)->daddr;
>
> memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
>
> Also the network header of the skb at this point should still be IPv4
> for 4in6 tunnels, so we should not just use it as an IPv6 header.
>
> This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
> is, we are safe to do the nexthop lookup using skb_dst() and
> ipv6_hdr(skb)->daddr; if not (i.e. IPv4), we have no clue about which
> dest address we can pick here, so we have to rely on callers to fill it
> from tunnel config and just fall back to ip6_route_output() to make the
> decision.
>
> Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Applied and queued up for -stable, thanks Cong.
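The decision the patch adds boils down to a check on skb->protocol. This is a hedged userspace sketch. The MODEL_* names and the enum are made up for illustration. The real code also performs the neighbour and route lookups, which this sketch only names.

```c
#include <stdint.h>

/* EtherType values in host byte order here; the kernel compares
 * skb->protocol, which is kept in network byte order. */
#define MODEL_ETH_P_IP   0x0800
#define MODEL_ETH_P_IPV6 0x86DD

enum model_nh_source {
	MODEL_NH_INNER_IPV6_DST, /* inner packet really is IPv6 */
	MODEL_NH_ROUTE_OUTPUT,   /* use configured daddr + route lookup */
};

static enum model_nh_source model_pick_nexthop(uint16_t protocol)
{
	/* Only an IPv6 payload has an IPv6 header (and an IPv6 dst) that
	 * can be fed to the neighbour lookup. For 4in6 the network header
	 * is still IPv4, so rely on the tunnel config instead. */
	if (protocol == MODEL_ETH_P_IPV6)
		return MODEL_NH_INNER_IPV6_DST;
	return MODEL_NH_ROUTE_OUTPUT;
}
```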
* Re: [PATCH net-next] virtio-net: on tx, only call napi_disable if tx napi is on
From: David Miller @ 2017-04-26 18:50 UTC
To: willemdebruijn.kernel; +Cc: netdev, mst, jasowang, virtualization, willemb
In-Reply-To: <20170425195917.54209-1-willemdebruijn.kernel@gmail.com>
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 25 Apr 2017 15:59:17 -0400
> From: Willem de Bruijn <willemb@google.com>
>
> Since tx napi was introduced, taking the device down (`ip link set dev $dev down`)
> hangs unless tx napi is enabled. Otherwise napi_enable is not called, so
> napi_disable will spin on test_and_set_bit NAPI_STATE_SCHED.
>
> Only call napi_disable if tx napi is enabled.
>
> Fixes: 5a719c2552ca ("virtio-net: transmit napi")
> Reported-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
Applied, thanks.
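Here is a toy model of the hang. It assumes, as in the kernel, that NAPI_STATE_SCHED starts out set after netif_napi_add() and is only cleared by napi_enable(). All model_* names are hypothetical. The real napi_disable() spins instead of returning false.

```c
#include <stdbool.h>

/* Hypothetical model: 'sched' stands in for the NAPI_STATE_SCHED bit. */
struct model_napi {
	bool sched;
};

static void model_napi_add(struct model_napi *n)    { n->sched = true; }
static void model_napi_enable(struct model_napi *n) { n->sched = false; }

/* The real napi_disable() loops until it can set SCHED. Model a single
 * attempt: returning false means the real code would spin forever. */
static bool model_napi_disable(struct model_napi *n)
{
	if (n->sched)
		return false;
	n->sched = true;
	return true;
}

/* The fix: only disable tx napi if it was enabled in the first place. */
static bool model_close_tx(struct model_napi *tx, bool tx_napi_on)
{
	if (!tx_napi_on)
		return true;
	return model_napi_disable(tx);
}
```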
* Re: [PATCH net-next 0/2] Move sub crq init out of interrupt context
From: David Miller @ 2017-04-26 18:49 UTC
To: nfont; +Cc: netdev, jallen, tlfalcon
In-Reply-To: <20170425185704.41126.65738.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>
From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date: Tue, 25 Apr 2017 15:00:58 -0400
> The sub crqs are currently initialized in interrupt context when
> handling a crq response from the VIOS server. There is no reason
> they must be initialized there.
>
> Moving the initialization of the sub crqs to the ibmvnic_init routine
> allows us to do the initialization outside of interrupt context and
> make all of the allocations with GFP_KERNEL instead of GFP_ATOMIC.
Series applied, thanks.
* Re: [PATCH v3] net: core: Prevent from dereferencing null pointer when releasing SKB
From: David Miller @ 2017-04-26 18:47 UTC
To: mhjungk; +Cc: netdev
In-Reply-To: <1493146695-5387-1-git-send-email-mhjungk@gmail.com>
From: Myungho Jung <mhjungk@gmail.com>
Date: Tue, 25 Apr 2017 11:58:15 -0700
> Add a NULL check to make __dev_kfree_skb_irq() consistent with the kfree
> family of functions.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
>
> Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Applied, thank you.
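Here is a minimal sketch of the NULL-tolerant pattern the patch adopts. model_dev_kfree_skb_irq() and struct model_rc_skb are hypothetical. The real __dev_kfree_skb_irq() queues the skb for the TX softirq, whereas this sketch frees it immediately.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb with a reference count. */
struct model_rc_skb {
	int users;
};

static int model_frees; /* counts actual frees, for illustration */

/* NULL-tolerant like kfree(): callers need no guard of their own. */
static void model_dev_kfree_skb_irq(struct model_rc_skb *skb)
{
	if (!skb)
		return;
	if (--skb->users == 0) {
		free(skb);
		model_frees++;
	}
}
```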
* Re: [PATCH net-next] dt-bindings: mdio: Clarify binding document
From: David Miller @ 2017-04-26 18:46 UTC
To: f.fainelli-Re5JQEeQqe8AvxtiuMwx3w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, rogerq-l0cyMroinI0,
andrew-g2DYL2Zd6BY, tony-4v6yS6AI5VpBDgjK7y7TUQ,
nsekhar-l0cyMroinI0, jsarha-l0cyMroinI0,
linux-omap-u79uwXL29TY76Z2rM5mHXA, lars-Qo5EllUWu/uELgA04lAiVw,
robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20170425183308.26107-1-f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
From: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Tue, 25 Apr 2017 11:33:03 -0700
> The described GPIO reset property is applicable to *all* child PHYs. If
> we have one reset line per PHY present on the MDIO bus, these
> automatically become properties of the child PHY nodes.
>
> Finally, indicate how the RESET pulse width must be defined: it is
> the maximum of all the individual PHYs' RESET pulse widths, as determined
> by reading their datasheets.
>
> Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.")
> Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Applied, thanks Florian.
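The pulse-width rule is just a maximum over the PHYs that share the reset line. Below is a hypothetical helper. It assumes widths in microseconds, taken from each datasheet.

```c
#include <stddef.h>

/* One shared reset GPIO drives every PHY on the bus, so the pulse must
 * satisfy the slowest PHY: take the maximum of the per-PHY minimum
 * widths. 'phy_width_us' is hypothetical input from datasheets. */
static unsigned int model_bus_reset_width_us(const unsigned int *phy_width_us, size_t n)
{
	unsigned int max = 0;

	for (size_t i = 0; i < n; i++)
		if (phy_width_us[i] > max)
			max = phy_width_us[i];
	return max;
}
```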
* Re: [oss-drivers] Re: [RFC 3/4] nfp: make use of extended ack message reporting
From: Simon Horman @ 2017-04-26 18:44 UTC
To: David Miller
Cc: jhs, jakub.kicinski, netdev, johannes, dsa, daniel,
alexei.starovoitov, bblanco, john.fastabend, kubakici,
oss-drivers
In-Reply-To: <20170426.104416.270999555163740292.davem@davemloft.net>
On Wed, Apr 26, 2017 at 10:44:16AM -0400, David Miller wrote:
> From: Simon Horman <simon.horman@netronome.com>
> Date: Wed, 26 Apr 2017 13:13:16 +0200
>
> > On Tue, Apr 25, 2017 at 10:20:22AM -0400, David Miller wrote:
> >> From: Jamal Hadi Salim <jhs@mojatatu.com>
> >> Date: Tue, 25 Apr 2017 08:42:32 -0400
> >>
> >> > So are we going to standardize these strings?
> >>
> >> No.
> >>
> >> > i.e what if some user has written a bash script that depends on this
> >> > string and it gets changed later.
> >>
> >> They can't do that.
> >>
> >> It's free-form extra information an application may or may not provide
> >> to the user when the kernel emits it.
> >
> > I don't feel strongly about this and perhaps it can be revisited at some
> > point, but perhaps it would be worth documenting that the strings do not
> > form part of the UAPI, as my expectation would have been that they do, e.g. to
> > facilitate internationalisation.
>
> These two things are entirely separate.
>
> We can maintain up-to-date translations of the strings, yet document that
> they can change at any time and are thus not UAPI.
Thanks, I see that now.
* Re: [PATCH net-next 00/10] tcp: do not use tcp_time_stamp for rcv autotuning
From: David Miller @ 2017-04-26 18:44 UTC
To: edumazet; +Cc: netdev, soheil, eric.dumazet
In-Reply-To: <20170425171541.3417-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 25 Apr 2017 10:15:31 -0700
> Some devices or Linux distributions use HZ=100 or HZ=250.
>
> TCP receive buffer autotuning has poor behavior caused by this choice.
> Since autotuning happens after 4 ms or 10 ms, short distance flows
> get their receive buffer tuned to a very high value, but only after an initial
> period where it was frozen to a (too small) initial value.
>
> With BBR (or other CC allowing BDP to increase), we are willing to
> increase tcp_rmem[2], but this receive autotuning defect is a blocker
> for hosts dealing with gazillions of TCP flows in the data centers,
> since many of them have inflated RCVBUF. Risk of OOM is too high.
>
> Note that TSO autodefer, tcp cubic, and TCP TS options (RFC 7323)
> also suffer from our dependency on jiffies (via tcp_time_stamp).
>
> We have ongoing efforts to improve all that in the future.
Looks great, series applied, thanks Eric.
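The 4 ms and 10 ms figures in the cover letter are one jiffy at HZ=250 and HZ=100 respectively. Below is a small sketch of that granularity. The helpers are hypothetical and ignore tick phase, so a real measurement can be off by up to one tick.

```c
/* Length of one jiffy in microseconds for a given HZ. */
static unsigned int model_jiffy_us(unsigned int hz)
{
	return 1000000u / hz;
}

/* An interval measured with a jiffies-based clock such as
 * tcp_time_stamp only resolves whole ticks; ignoring tick phase,
 * anything shorter than one tick reads as 0. */
static unsigned int model_ticks(unsigned int elapsed_us, unsigned int hz)
{
	return elapsed_us / model_jiffy_us(hz);
}
```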
* Re: [PATCH v2] macsec: dynamically allocate space for sglist
From: David Miller @ 2017-04-26 18:42 UTC
To: Jason; +Cc: netdev, linux-kernel, stable, security, sd
In-Reply-To: <20170425170818.32661-1-Jason@zx2c4.com>
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Tue, 25 Apr 2017 19:08:18 +0200
> We call skb_cow_data, which is good anyway to ensure we can actually
> modify the skb as such (another error from prior). Now that we have the
> number of fragments required, we can safely allocate exactly that amount
> of memory.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Sabrina Dubroca <sd@queasysnail.net>
> Cc: security@kernel.org
> Cc: stable@vger.kernel.org
Applied, thanks.
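Here is a sketch of the sizing idea. It assumes nfrags is the positive value skb_cow_data() returned. struct model_sg stands in for struct scatterlist, and model_alloc_sglist() is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct scatterlist. */
struct model_sg {
	void *page;
	unsigned int offset;
	unsigned int length;
};

/* Size the table from the fragment count skb_cow_data() reported,
 * instead of a fixed-size array that a heavily fragmented skb could
 * overflow. */
static struct model_sg *model_alloc_sglist(int nfrags)
{
	if (nfrags <= 0)
		return NULL; /* negative errno from skb_cow_data(), or nothing to map */
	return calloc((size_t)nfrags, sizeof(struct model_sg));
}
```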
* Re: [PATCH net-next] rhashtable: remove insecure_max_entries param
From: David Miller @ 2017-04-26 18:39 UTC
To: fw; +Cc: netdev
In-Reply-To: <20170425094134.21885-1-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Tue, 25 Apr 2017 11:41:34 +0200
> No users in the tree; insecure_max_entries is always set to
> ht->p.max_size * 2 in rhashtable_init().
>
> Replace the only spot that uses it with a ht->p.max_size check.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied, thanks Florian.
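Because insecure_max_entries was always max_size * 2, the same limit can be expressed against max_size alone. This is a hedged sketch. The exact expression in the applied patch may differ, and both helpers are hypothetical.

```c
#include <stdbool.h>

/* Old style: a separate limit that rhashtable_init() always set to
 * max_size * 2 (0 means "no limit"). */
static bool model_above_max_old(unsigned int nelems, unsigned int max_size)
{
	unsigned int insecure_max_entries = max_size * 2;

	return insecure_max_entries && nelems >= insecure_max_entries;
}

/* Same threshold written against max_size directly. Halving nelems
 * instead of doubling max_size avoids unsigned wrap-around for very
 * large max_size values. */
static bool model_above_max_new(unsigned int nelems, unsigned int max_size)
{
	return max_size && (nelems / 2u) >= max_size;
}
```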
* Re: [RFC PATCH] netxen_nic: null-terminate serial number string in netxen_check_options()
From: David Miller @ 2017-04-26 18:38 UTC
To: jmarchan
Cc: manish.chopra, rahul.verma, Dept-GELinuxNICDev, netdev,
linux-kernel
In-Reply-To: <20170425074229.28267-1-jmarchan@redhat.com>
From: "Jerome Marchand" <jmarchan@redhat.com>
Date: Tue, 25 Apr 2017 09:42:29 +0200
> The serial_num string in netxen_check_options() is not always properly
> null-terminated. I couldn't find the documentation on the serial number
> format and I suspect a proper integer to string conversion is in
> order, but this patch at least prevents the out-of-bounds access.
>
> It solves the following kasan warning:
...
> @@ -842,7 +842,7 @@ netxen_check_options(struct netxen_adapter *adapter)
> {
> u32 fw_major, fw_minor, fw_build, prev_fw_version;
> char brd_name[NETXEN_MAX_SHORT_NAME];
> - char serial_num[32];
> + char serial_num[33];
> int i, offset, val, err;
> __le32 *ptr32;
> struct pci_dev *pdev = adapter->pdev;
Another problem is that the serial_num array is only 4-byte aligned by
accident. Steps are necessary to make sure the ptr32 assignments don't
take unaligned traps.
Something like:
union {
	char buf[33];
	__le32 dummy;
} serial_num;
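Here is a userspace rendering of that suggestion. It assumes eight 32-bit reads, as implied by the original 32-byte buffer. model_fill_serial() and its input array are hypothetical stand-ins for the driver's flash reads.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_SERIAL_WORDS 8 /* 8 x 32-bit reads = 32 bytes */

/* 'dummy' forces 4-byte alignment of buf, and buf[32] is the extra
 * byte for the NUL terminator (uint32_t replaces __le32 here). */
union model_serial_num {
	char buf[MODEL_SERIAL_WORDS * 4 + 1];
	uint32_t dummy;
};

/* memcpy keeps this userspace sketch clear of strict-aliasing issues;
 * in the driver it is the union's alignment that makes stores through
 * a __le32 pointer safe. */
static void model_fill_serial(union model_serial_num *sn, const uint32_t *words)
{
	for (int i = 0; i < MODEL_SERIAL_WORDS; i++)
		memcpy(&sn->buf[i * 4], &words[i], sizeof(words[i]));
	sn->buf[MODEL_SERIAL_WORDS * 4] = '\0'; /* always terminated */
}
```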
* [PATCH v2 net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option
From: Craig Gallek @ 2017-04-26 18:37 UTC
To: Hideaki YOSHIFUJI, Alexey Kuznetsov, David S . Miller; +Cc: netdev
In-Reply-To: <20170426170707.165201-1-kraigatgoog@gmail.com>
From: Craig Gallek <cgallek@google.com>
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
generated packets.
The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrag_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used. This caused the options to no longer be included in v6
encapsulated packets.
The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
Fixes: 1df64a8569c7 ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364d6 ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
---
v2: Change tunnel code to use dst1opt rather than making the checks for
dst0opt more permissive.
net/ipv6/ip6_tunnel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ad15d38b41e8..c81f9541f1f7 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -954,7 +954,7 @@ static void init_tel_txopt(struct ipv6_tel_txoption *opt, __u8 encap_limit)
opt->dst_opt[5] = IPV6_TLV_PADN;
opt->dst_opt[6] = 1;
- opt->ops.dst0opt = (struct ipv6_opt_hdr *) opt->dst_opt;
+ opt->ops.dst1opt = (struct ipv6_opt_hdr *) opt->dst_opt;
opt->ops.opt_nflen = 8;
}
@@ -1176,7 +1176,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
if (encap_limit >= 0) {
init_tel_txopt(&opt, encap_limit);
- ipv6_push_nfrag_opts(skb, &opt.ops, &proto, NULL, NULL);
+ ipv6_push_frag_opts(skb, &opt.ops, &proto);
}
/* Calculate max headroom for all the headers and adjust
--
2.13.0.rc0.306.g87b477812d-goog
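For reference, this shows the 8 bytes init_tel_txopt() builds, plus the emission rule the changelog relies on. It is a userspace sketch with hypothetical MODEL_* names. The option type values match the kernel's IPV6_TLV_PADN (1) and IPV6_TLV_TNL_ENCAP_LIMIT (4).

```c
#include <stdint.h>
#include <string.h>

#define MODEL_TLV_PADN            1
#define MODEL_TLV_TNL_ENCAP_LIMIT 4

/* Lay out the 8-byte destination options header the way
 * init_tel_txopt() does. Byte 0 (next header) is filled in later when
 * the header is pushed onto the packet. */
static void model_init_tel_opt(uint8_t opt[8], uint8_t encap_limit)
{
	memset(opt, 0, 8);
	/* opt[1] = 0: Hdr Ext Len in 8-octet units, excluding the first 8 */
	opt[2] = MODEL_TLV_TNL_ENCAP_LIMIT;
	opt[3] = 1;              /* option data length */
	opt[4] = encap_limit;
	opt[5] = MODEL_TLV_PADN; /* pad the header out to 8 bytes */
	opt[6] = 1;              /* PadN data length */
	/* opt[7] = 0: the single PadN data byte */
}

/* Simplified rule from the description: options in dst0opt are only
 * emitted together with a routing header, those in dst1opt always. */
static int model_opts_emitted(int in_dst1opt, int have_rthdr)
{
	return in_dst1opt || have_rthdr;
}
```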
* GSO + changing TX queues == crash?
From: Jakub Kicinski @ 2017-04-26 18:35 UTC
To: netdev@vger.kernel.org; +Cc: Eric Dumazet
Hi!
I'm seeing crashes with GSO on when changing the number of TX rings. I
initially thought it was an nfp driver problem, but I managed to
reproduce it with i40e now.
What I ran on the nfp was iperf sending two dozen streams. Then I ran
this little script while 40Gbps was being sent:
bl() {
	tc -s qdisc show dev p4p1 | \
		tail -$((MAX_TX*3)) | \
		grep backlog | \
		grep -v "backlog 0b"
}
while true
do
	ethtool -L p4p1 tx 0
	a=$(bl | wc -l)
	echo down $a
	# there are 8 combined queues, we shouldn't see more backlog
	[ $a -gt 8 ] && break
	sleep 2
	ethtool -L p4p1 tx 10
	echo up $(bl | wc -l)
	sleep 2
done
The idea is to catch the moment when, after a reconfig, more queues have
backlog than are configured. Right after this script exits, the driver gets
traffic on queues which are down. It usually reproduces within a minute. I ran
it with TSO on/GSO off and TSO off/GSO off for 15 minutes and neither
configuration crashed.
The i40e machine was running kernel 4.10; with the NFP driver I'm able to
reproduce on net-next all the way back to 3.16 (I haven't tried older).
FWIW the nfp driver is doing:
netif_carrier_off()
for enabled rings:
disable_irq()
napi_disable()
netif_tx_disable()
nfp's free_tx_bufs()
netif_set_real_num_tx_queues()
for enabled rings:
napi_enable()
enable_irq()
netif_tx_wake_all_queues()
nfp's read_link() # does netif_carrier_on()
I was entirely convinced that it's a driver problem, but the fact I
crashed the i40e made me worry :|
This is a crash from i40e, it takes a bit longer to kill it than the
nfp, maybe because it takes longer to reconfig:
[ 461.822381] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 461.831229] IP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e]
[ 461.837045] PGD 0
[ 461.837046]
[ 461.841089] Oops: 0002 [#1] SMP
[ 461.844665] Modules linked in: xfs nls_iso8859_1 ipmi_devintf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xtc
[ 461.924168] fscache tg3 i40e ptp ahci pps_core libahci fjes
[ 461.930595] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.11.0-041100rc1-generic #201703051731
[ 461.940340] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 461.948823] task: ffffa087db2b0000 task.stack: ffffb9b903350000
[ 461.955546] RIP: 0010:i40e_lan_xmit_frame+0xf1/0x1420 [i40e]
[ 461.961965] RSP: 0018:ffffa087df1c3d80 EFLAGS: 00010293
[ 461.967899] RAX: 0000000000000000 RBX: ffffa087c3077d00 RCX: 0000000000000000
[ 461.975971] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffa087b4e1d70c
[ 461.984044] RBP: ffffa087df1c3e20 R08: ffffa087d440349c R09: 0000000000000000
[ 461.992117] R10: ffffa087de821a08 R11: 0000000000000000 R12: ffffa087b7c60000
[ 462.000188] R13: 0000000000000002 R14: 00000000000005ea R15: ffffa087d97b3000
[ 462.008261] FS: 0000000000000000(0000) GS:ffffa087df1c0000(0000) knlGS:0000000000000000
[ 462.017423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 462.023941] CR2: 0000000000000008 CR3: 00000006efe09000 CR4: 00000000003406e0
[ 462.032014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 462.040083] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 462.048154] Call Trace:
[ 462.050969] <IRQ>
[ 462.053311] ? update_load_avg+0x79/0x520
[ 462.057876] ? sched_clock_cpu+0x11/0xb0
[ 462.062356] dev_hard_start_xmit+0xa3/0x1f0
[ 462.067127] sch_direct_xmit+0xfc/0x1c0
[ 462.071509] __qdisc_run+0x122/0x270
[ 462.075598] net_tx_action+0xfd/0x1e0
[ 462.079786] __do_softirq+0x104/0x2af
[ 462.083973] irq_exit+0xb6/0xc0
[ 462.087577] smp_apic_timer_interrupt+0x3d/0x50
[ 462.092732] apic_timer_interrupt+0x89/0x90
[ 462.097501] RIP: 0010:cpuidle_enter_state+0x122/0x2c0
[ 462.103240] RSP: 0018:ffffb9b903353e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 462.111818] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000000001f
[ 462.119890] RDX: 0000006b86c1bdc1 RSI: ffffa087df1d8998 RDI: 0000000000000000
[ 462.127962] RBP: ffffb9b903353e98 R08: 0000000000084c3f R09: 0000000000000018
[ 462.136032] R10: 00000000000310a2 R11: 0000000000055e38 R12: ffffa087df1e5800
[ 462.144094] R13: ffffffffb86ec998 R14: 0000000000000004 R15: ffffffffb86ec980
[ 462.152165] </IRQ>
[ 462.154601] ? cpuidle_enter_state+0x110/0x2c0
[ 462.159662] cpuidle_enter+0x17/0x20
[ 462.163748] call_cpuidle+0x23/0x40
[ 462.167740] do_idle+0x189/0x200
[ 462.171438] cpu_startup_entry+0x71/0x80
[ 462.175908] start_secondary+0x154/0x190
[ 462.180384] start_cpu+0x14/0x14
[ 462.184080] Code: 8d 75 05 66 39 c8 0f 86 e4 04 00 00 01 d0 0f b7 d1 29 d0 83 e8 01 39 c6 0f 8f ed 05 00 00 49 8b 44 24 20 48 8d 14 89 4c 8d 1c d0 <49> 89 5b 08 8b 83
[ 462.205347] RIP: i40e_lan_xmit_frame+0xf1/0x1420 [i40e] RSP: ffffa087df1c3d80
[ 462.213418] CR2: 0000000000000008
[ 462.217226] ---[ end trace 0ee2eefbe09283a8 ]---
[ 462.272123] Kernel panic - not syncing: Fatal exception in interrupt
[ 462.279393] Kernel Offset: 0x36800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 462.345422] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
* Re: [PATCH v1] net: phy: fix auto-negotiation stall due to unavailable interrupt
From: David Miller @ 2017-04-26 18:36 UTC
To: al.kochet
Cc: f.fainelli, netdev, linux-kernel, sergei.shtylyov, rogerq,
madalin.bucur
In-Reply-To: <1492686004-30527-2-git-send-email-al.kochet@gmail.com>
From: Alexander Kochetkov <al.kochet@gmail.com>
Date: Thu, 20 Apr 2017 14:00:04 +0300
> The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
> cable was plugged in before the Ethernet interface was brought up.
>
> This patch triggers the PHY state machine to update the link state if the PHY
> was requested to do auto-negotiation and the auto-negotiation complete flag
> is already set.
>
> During the power-up cycle the PHY does auto-negotiation, generates an interrupt
> and sets the auto-negotiation complete flag. The interrupt is handled by the PHY
> state machine, but it doesn't update the link state because the PHY is in the
> PHY_READY state. Some time later the MAC is brought up, started, and requests
> the PHY to do auto-negotiation. If there are no new settings to advertise,
> genphy_config_aneg() doesn't start PHY auto-negotiation. The PHY stays in the
> auto-negotiation complete state and doesn't fire an interrupt. At the same time
> the PHY state machine expects that the PHY started auto-negotiation and waits
> for an interrupt from the PHY, which it won't get.
>
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> Cc: stable <stable@vger.kernel.org> # v4.9+
Applied, and I reverted the micrel commit too.
Thanks.
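Here is a heavily reduced model of the stall and the fix described above. The states and helpers are hypothetical. They do not correspond one-to-one to phylib's PHY_* states or to phy_start_aneg().

```c
#include <stdbool.h>

enum model_phy_state { MODEL_PHY_READY, MODEL_PHY_AN, MODEL_PHY_RUNNING };

struct model_phy {
	enum model_phy_state state;
	bool aneg_done; /* hardware "autoneg complete" flag */
	bool link;
};

/* Stand-in for one run of the PHY state machine: read the link once
 * autoneg has completed. */
static void model_state_machine(struct model_phy *phy)
{
	if (phy->state == MODEL_PHY_AN && phy->aneg_done) {
		phy->link = true;
		phy->state = MODEL_PHY_RUNNING;
	}
}

/* Request autoneg. With nothing new to advertise the hardware is not
 * restarted, so no interrupt will follow; if the complete flag is
 * already set, kick the state machine instead of waiting. */
static void model_start_aneg(struct model_phy *phy, bool new_settings)
{
	if (new_settings)
		phy->aneg_done = false; /* restart: completion IRQ comes later */
	phy->state = MODEL_PHY_AN;
	if (phy->aneg_done)
		model_state_machine(phy);
}
```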
* Re: [PATCH] stmmac: Add support for SIMATIC IOT2000 platform
From: Andy Shevchenko @ 2017-04-26 18:29 UTC
To: Jan Kiszka
Cc: Giuseppe Cavallaro, Alexandre Torgue, netdev,
Linux Kernel Mailing List, Sascha Weisenberger
In-Reply-To: <5705c598-7721-fca5-d4e3-8a1a88907173@siemens.com>
On Tue, Apr 25, 2017 at 3:15 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2017-04-25 13:42, Andy Shevchenko wrote:
>> On Tue, Apr 25, 2017 at 1:09 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2017-04-25 12:07, Jan Kiszka wrote:
>>>> On 2017-04-25 11:46, Andy Shevchenko wrote:
>>>>> On Tue, Apr 25, 2017 at 12:00 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> {
>>>>> .name = "SIMATIC IOT2000",
>>>>> .func = 6,
>>>>> .phy_addr = 1,
>>>>> },
>>>>> {
>>>>> .name = "SIMATIC IOT2000",
>>>>> .func = 7,
>>>>> .phy_addr = 1,
>>>>> },
>>>>>
>>>>> That's all what you need.
>>>>
>>>> Nope. Again: the asset tag is the way to tell both apart AND to ensure
>>>> that we do not match on future devices.
>>
>>> To be more verbose: your version (which is our old one) would even
>>> enable the second, not connected port on the IOT2020. Incorrectly.
>>
>> So, name has 2000 for 2020 device? It's a clear bug in the DMI table you have. :-(
>>
>> What else do you have in DMI which can be used to distinguish those devices?
So, apart from the asset tag, is there anything else to use?
Whatever you choose, it would be nice to split this long conditional:
	if (!strcmp(dmi->name, name) && dmi->func == func) {
		/* ASSET_TAG is optional */
		if (dmi->asset_tag && strcmp(dmi->asset_tag, asset_tag))
			continue;
		return dmi->phy_addr;
	}
> Andy, there are devices out there in the field, whether we as engineers like
> it or not, that are all called "IOT2000" although they are slightly
> different inside. This patch accounts for that. And it does that even
> without adding "platform_data" hacks like in other patches of mine. ;)
Yes, which is good.
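One way to split that conditional is shown in this self-contained sketch. struct model_dmi_entry and model_dmi_phy_addr() are hypothetical, and -1 is an assumed "no match" value. A NULL asset tag from firmware is treated as not matching an entry that requires one.

```c
#include <stddef.h>
#include <string.h>

struct model_dmi_entry {
	const char *name;
	int func;              /* PCI function number */
	const char *asset_tag; /* NULL: any asset tag matches */
	int phy_addr;
};

/* The optional asset-tag constraint, pulled out of the main loop. */
static int model_asset_tag_ok(const struct model_dmi_entry *e, const char *asset_tag)
{
	if (!e->asset_tag)
		return 1;
	return asset_tag && !strcmp(e->asset_tag, asset_tag);
}

static int model_dmi_phy_addr(const struct model_dmi_entry *tbl, size_t n,
			      const char *name, int func, const char *asset_tag)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_dmi_entry *e = &tbl[i];

		if (strcmp(e->name, name) || e->func != func)
			continue;
		if (!model_asset_tag_ok(e, asset_tag))
			continue;
		return e->phy_addr;
	}
	return -1; /* assumed "no match" value */
}
```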
--
With Best Regards,
Andy Shevchenko
* [PATCH net-next 3/6] bpf: bpf_progs stores all loaded programs
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
We later want to give users a quick dump of what is possible with procfs,
so store a list of all currently loaded bpf programs. Later this list
will be printed in procfs.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/filter.h | 4 ++--
kernel/bpf/core.c | 51 +++++++++++++++++++++++---------------------------
kernel/bpf/syscall.c | 4 ++--
3 files changed, 27 insertions(+), 32 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa53..63624c619e371b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -753,8 +753,8 @@ bpf_address_lookup(unsigned long addr, unsigned long *size,
return ret;
}
-void bpf_prog_kallsyms_add(struct bpf_prog *fp);
-void bpf_prog_kallsyms_del(struct bpf_prog *fp);
+void bpf_prog_link(struct bpf_prog *fp);
+void bpf_prog_unlink(struct bpf_prog *fp);
#else /* CONFIG_BPF_JIT */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 043f634ff58d87..2139118258cdf8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,22 +365,6 @@ static struct latch_tree_root bpf_tree __cacheline_aligned;
int bpf_jit_kallsyms __read_mostly;
-static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux)
-{
- WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head));
- list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
- latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
-}
-
-static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux)
-{
- if (list_empty(&aux->bpf_progs_head))
- return;
-
- latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
- list_del_rcu(&aux->bpf_progs_head);
-}
-
static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
{
return fp->jited && !bpf_prog_was_classic(fp);
@@ -392,38 +376,45 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
fp->aux->bpf_progs_head.prev == LIST_POISON2;
}
-void bpf_prog_kallsyms_add(struct bpf_prog *fp)
+void bpf_prog_link(struct bpf_prog *fp)
{
- if (!bpf_prog_kallsyms_candidate(fp) ||
- !capable(CAP_SYS_ADMIN))
- return;
+ struct bpf_prog_aux *aux = fp->aux;
spin_lock_bh(&bpf_lock);
- bpf_prog_ksym_node_add(fp->aux);
+ list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
+ if (bpf_prog_kallsyms_candidate(fp))
+ latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
spin_unlock_bh(&bpf_lock);
}
-void bpf_prog_kallsyms_del(struct bpf_prog *fp)
+void bpf_prog_unlink(struct bpf_prog *fp)
{
- if (!bpf_prog_kallsyms_candidate(fp))
- return;
+ struct bpf_prog_aux *aux = fp->aux;
spin_lock_bh(&bpf_lock);
- bpf_prog_ksym_node_del(fp->aux);
+ list_del_rcu(&aux->bpf_progs_head);
+ if (bpf_prog_kallsyms_candidate(fp))
+ latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
spin_unlock_bh(&bpf_lock);
}
static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
{
struct latch_tree_node *n;
+ struct bpf_prog *prog;
if (!bpf_jit_kallsyms_enabled())
return NULL;
n = latch_tree_find((void *)addr, &bpf_tree, &bpf_tree_ops);
- return n ?
- container_of(n, struct bpf_prog_aux, ksym_tnode)->prog :
- NULL;
+ if (!n)
+ return NULL;
+
+ prog = container_of(n, struct bpf_prog_aux, ksym_tnode)->prog;
+ if (!prog->priv_cap_sys_admin)
+ return NULL;
+
+ return prog;
}
const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
@@ -474,6 +465,10 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
rcu_read_lock();
list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) {
+ if (!bpf_prog_kallsyms_candidate(aux->prog) ||
+ !aux->prog->priv_cap_sys_admin)
+ continue;
+
if (it++ != symnum)
continue;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13642c73dca0b4..d61d1bd3e6fee6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -664,7 +664,7 @@ void bpf_prog_put(struct bpf_prog *prog)
{
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
- bpf_prog_kallsyms_del(prog);
+ bpf_prog_unlink(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
}
@@ -858,7 +858,7 @@ static int bpf_prog_load(union bpf_attr *attr)
/* failed to allocate fd */
goto free_used_maps;
- bpf_prog_kallsyms_add(prog);
+ bpf_prog_link(prog);
trace_bpf_prog_load(prog, err);
return err;
--
2.9.3
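With this patch every loaded program sits on `bpf_progs`, while only JITed, non-classic programs go into the latch tree, so `bpf_get_kallsym()` has to skip ineligible entries before counting towards `symnum`. A simplified userspace model of that filtered walk, assuming a plain singly linked list in place of the RCU-protected `list_head` (all `demo_*` names are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a loaded program; not kernel code. */
struct demo_prog_entry {
	bool jited;
	bool was_classic;
	bool priv_cap_sys_admin;
	struct demo_prog_entry *next;	/* kernel: RCU list_head */
};

/* Mirrors bpf_prog_kallsyms_candidate(): JITed and not a classic filter. */
static bool demo_kallsyms_candidate(const struct demo_prog_entry *p)
{
	return p->jited && !p->was_classic;
}

/* Return the symnum-th program that may appear in kallsyms, or NULL.
 * Ineligible entries are skipped before the counter is advanced. */
static const struct demo_prog_entry *
demo_nth_symbol(const struct demo_prog_entry *head, unsigned int symnum)
{
	unsigned int it = 0;

	for (const struct demo_prog_entry *p = head; p; p = p->next) {
		if (!demo_kallsyms_candidate(p) || !p->priv_cap_sys_admin)
			continue;
		if (it++ != symnum)
			continue;
		return p;
	}
	return NULL;
}
```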
* [PATCH net-next 5/6] bpf: add skeleton for procfs printing of bpf_progs
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
kernel/bpf/core.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 90 insertions(+)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 048e2d79718a16..3ba175a24e971a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -32,6 +32,9 @@
#include <linux/kallsyms.h>
#include <linux/rcupdate.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+
#include <asm/unaligned.h>
/* Registers */
@@ -488,6 +491,93 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
return ret;
}
+/* ebpf procfs implementation */
+
+static struct proc_dir_entry *ebpf_proc_dir;
+
+static void *ebpf_proc_start(struct seq_file *s, loff_t *pos)
+ __acquires(RCU)
+{
+ struct bpf_prog_aux *aux;
+ loff_t off = 0;
+
+ rcu_read_lock();
+
+ if (*pos == 0)
+ return SEQ_START_TOKEN;
+
+ list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head)
+ if (++off == *pos)
+ return aux;
+
+ return NULL;
+}
+
+static void *ebpf_proc_next(struct seq_file *s, void *v, loff_t *pos)
+{
+ struct bpf_prog_aux *aux;
+
+ ++*pos;
+
+ if (v == SEQ_START_TOKEN)
+ return list_first_or_null_rcu(&bpf_progs, struct bpf_prog_aux,
+ bpf_progs_head);
+
+ aux = v;
+ return list_next_or_null_rcu(&bpf_progs,
+ &aux->bpf_progs_head,
+ struct bpf_prog_aux,
+ bpf_progs_head);
+}
+
+static void ebpf_proc_stop(struct seq_file *s, void *v)
+ __releases(RCU)
+{
+ rcu_read_unlock();
+}
+
+static int ebpf_proc_show(struct seq_file *s, void *v)
+{
+ if (v == SEQ_START_TOKEN) {
+ seq_printf(s, "# tag\n");
+ return 0;
+ }
+
+ return 0;
+}
+
+static const struct seq_operations ebpf_seq_ops = {
+ .start = ebpf_proc_start,
+ .next = ebpf_proc_next,
+ .stop = ebpf_proc_stop,
+ .show = ebpf_proc_show,
+};
+
+static int ebpf_proc_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &ebpf_seq_ops);
+}
+
+static const struct file_operations ebpf_proc_operations = {
+ .open = ebpf_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static int __init ebpf_proc_init(void)
+{
+ ebpf_proc_dir = proc_mkdir("bpf", NULL);
+ if (!ebpf_proc_dir)
+ return 0;
+ proc_create("programs", 0400, ebpf_proc_dir, &ebpf_proc_operations);
+ return 0;
+}
+
+device_initcall(ebpf_proc_init);
+
+/* end of bpf proc implementation */
+
struct bpf_binary_header *
bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
unsigned int alignment,
--
2.9.3
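The `start`/`next` pair in this patch follows the usual seq_file contract: position 0 yields `SEQ_START_TOKEN` (the header line), position k yields the k-th program, and `start()` must be able to resume at any position that `next()` produced, because `seq_read()` may stop and restart iteration between reads. A userspace model of that contract, with a hypothetical `START_TOKEN` sentinel and a plain linked list standing in for the RCU list:

```c
#include <stddef.h>

/* Hypothetical sentinel playing the role of the kernel's SEQ_START_TOKEN. */
#define START_TOKEN ((void *)1)

struct demo_node {
	int id;
	struct demo_node *next;
};

/* Mirrors ebpf_proc_start(): position 0 is the header, position k >= 1 is
 * the k-th list element, found by walking the list from the head. */
static void *demo_start(struct demo_node *head, long pos)
{
	long off = 0;

	if (pos == 0)
		return START_TOKEN;
	for (struct demo_node *n = head; n; n = n->next)
		if (++off == pos)
			return n;
	return NULL;	/* past the end: iteration stops */
}

/* Mirrors ebpf_proc_next(): advance the position, then step from the header
 * to the first element, or from an element to its successor. */
static void *demo_next(struct demo_node *head, void *v, long *pos)
{
	++*pos;
	if (v == START_TOKEN)
		return head;	/* NULL when the list is empty */
	return ((struct demo_node *)v)->next;
}
```

The property worth checking is that `demo_start(head, pos)` returns the same element that `demo_next()` just produced for that `pos`, so an interrupted read resumes exactly where it left off.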
* [PATCH net-next 6/6] bpf: show bpf programs
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/uapi/linux/bpf.h | 32 +++++++++++++++++++-------------
kernel/bpf/core.c | 30 +++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 14 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e553529929f683..d6506e320953d5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,20 +101,26 @@ enum bpf_map_type {
BPF_MAP_TYPE_HASH_OF_MAPS,
};
+#define BPF_PROG_TYPES \
+ X(BPF_PROG_TYPE_UNSPEC), \
+ X(BPF_PROG_TYPE_SOCKET_FILTER), \
+ X(BPF_PROG_TYPE_KPROBE), \
+ X(BPF_PROG_TYPE_SCHED_CLS), \
+ X(BPF_PROG_TYPE_SCHED_ACT), \
+ X(BPF_PROG_TYPE_TRACEPOINT), \
+ X(BPF_PROG_TYPE_XDP), \
+ X(BPF_PROG_TYPE_PERF_EVENT), \
+ X(BPF_PROG_TYPE_CGROUP_SKB), \
+ X(BPF_PROG_TYPE_CGROUP_SOCK), \
+ X(BPF_PROG_TYPE_LWT_IN), \
+ X(BPF_PROG_TYPE_LWT_OUT), \
+ X(BPF_PROG_TYPE_LWT_XMIT),
+
+
enum bpf_prog_type {
- BPF_PROG_TYPE_UNSPEC,
- BPF_PROG_TYPE_SOCKET_FILTER,
- BPF_PROG_TYPE_KPROBE,
- BPF_PROG_TYPE_SCHED_CLS,
- BPF_PROG_TYPE_SCHED_ACT,
- BPF_PROG_TYPE_TRACEPOINT,
- BPF_PROG_TYPE_XDP,
- BPF_PROG_TYPE_PERF_EVENT,
- BPF_PROG_TYPE_CGROUP_SKB,
- BPF_PROG_TYPE_CGROUP_SOCK,
- BPF_PROG_TYPE_LWT_IN,
- BPF_PROG_TYPE_LWT_OUT,
- BPF_PROG_TYPE_LWT_XMIT,
+#define X(type) type
+ BPF_PROG_TYPES
+#undef X
};
enum bpf_attach_type {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3ba175a24e971a..685c1d0f31e029 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
rcu_read_unlock();
}
+static const char *bpf_type_string(enum bpf_prog_type type)
+{
+ static const char *bpf_type_names[] = {
+#define X(type) #type
+ BPF_PROG_TYPES
+#undef X
+ };
+
+ if (type >= ARRAY_SIZE(bpf_type_names))
+ return "<unknown>";
+
+ return bpf_type_names[type];
+}
+
static int ebpf_proc_show(struct seq_file *s, void *v)
{
+ struct bpf_prog *prog;
+ struct bpf_prog_aux *aux;
+ char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
+
if (v == SEQ_START_TOKEN) {
- seq_printf(s, "# tag\n");
+ seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
return 0;
}
+ aux = v;
+ prog = aux->prog;
+
+ bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
+ seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
+ bpf_type_string(prog->type),
+ prog->jited ? "jit" : "int",
+ prog->priv_cap_sys_admin ? "priv" : "unpriv",
+ prog->pages * 1ULL << PAGE_SHIFT);
+
return 0;
}
--
2.9.3
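The `BPF_PROG_TYPES` list is an X-macro: the same list is expanded once with `X(type)` defined as `type` to build the enum, and once with `X(type)` defined as `#type` to build the name table, so the two cannot drift apart. A standalone sketch of the pattern, using a shortened, hypothetical `DEMO_PROG_TYPES` list and keeping the table at file scope so that a reverse lookup can share it:

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Shortened, hypothetical list; the patch enumerates every BPF_PROG_TYPE_*. */
#define DEMO_PROG_TYPES \
	X(DEMO_PROG_TYPE_UNSPEC), \
	X(DEMO_PROG_TYPE_SOCKET_FILTER), \
	X(DEMO_PROG_TYPE_KPROBE),

/* First expansion: X(type) -> type, producing the enumerators. */
enum demo_prog_type {
#define X(type) type
	DEMO_PROG_TYPES
#undef X
};

/* Second expansion: X(type) -> #type, producing the matching names. */
static const char *const demo_type_names[] = {
#define X(type) #type
	DEMO_PROG_TYPES
#undef X
};

static const char *demo_type_string(enum demo_prog_type type)
{
	if ((size_t)type >= ARRAY_SIZE(demo_type_names))
		return "<unknown>";
	return demo_type_names[type];
}

/* Reverse lookup over the same table; returns -1 for an unknown name. */
static int demo_type_from_string(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(demo_type_names); i++)
		if (strcmp(demo_type_names[i], name) == 0)
			return (int)i;
	return -1;
}
```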
* [PATCH net-next 1/6] bpf: bpf_lock needs only block bottom half
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
We never modify bpf programs from hardirq context, so disabling bottom
halves is sufficient.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
kernel/bpf/core.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b4f1cb0c5ac710..6f81e0f5a0faa2 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -394,27 +394,23 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
void bpf_prog_kallsyms_add(struct bpf_prog *fp)
{
- unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp) ||
!capable(CAP_SYS_ADMIN))
return;
- spin_lock_irqsave(&bpf_lock, flags);
+ spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_add(fp->aux);
- spin_unlock_irqrestore(&bpf_lock, flags);
+ spin_unlock_bh(&bpf_lock);
}
void bpf_prog_kallsyms_del(struct bpf_prog *fp)
{
- unsigned long flags;
-
if (!bpf_prog_kallsyms_candidate(fp))
return;
- spin_lock_irqsave(&bpf_lock, flags);
+ spin_lock_bh(&bpf_lock);
bpf_prog_ksym_node_del(fp->aux);
- spin_unlock_irqrestore(&bpf_lock, flags);
+ spin_unlock_bh(&bpf_lock);
}
static struct bpf_prog *bpf_prog_kallsyms_find(unsigned long addr)
--
2.9.3
* [PATCH net-next 4/6] bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/filter.h | 6 ++++--
kernel/bpf/core.c | 4 +++-
kernel/bpf/syscall.c | 7 ++++---
kernel/bpf/verifier.c | 4 ++--
net/core/filter.c | 6 +++---
5 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 63624c619e371b..635311f57bf24f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -413,7 +413,8 @@ struct bpf_prog {
locked:1, /* Program image locked? */
gpl_compatible:1, /* Is filter GPL compatible? */
cb_access:1, /* Is control block accessed? */
- dst_needed:1; /* Do we need dst entry? */
+ dst_needed:1, /* Do we need dst entry? */
+ priv_cap_sys_admin:1; /* Were we loaded as sys_admin? */
kmemcheck_bitfield_end(meta);
enum bpf_prog_type type; /* Type of BPF program */
u32 len; /* Number of filter blocks */
@@ -615,7 +616,8 @@ static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);
void bpf_prog_free(struct bpf_prog *fp);
-struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags);
+struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags,
+ bool cap_sys_admin);
struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
gfp_t gfp_extra_flags);
void __bpf_prog_free(struct bpf_prog *fp);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2139118258cdf8..048e2d79718a16 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -74,7 +74,8 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
return NULL;
}
-struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
+struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags,
+ bool cap_sys_admin)
{
gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO |
gfp_extra_flags;
@@ -94,6 +95,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
return NULL;
}
+ fp->priv_cap_sys_admin = cap_sys_admin;
fp->pages = size / PAGE_SIZE;
fp->aux = aux;
fp->aux->prog = fp;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d61d1bd3e6fee6..ed698c17578a49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -786,7 +786,7 @@ EXPORT_SYMBOL_GPL(bpf_prog_get_type);
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD kern_version
-static int bpf_prog_load(union bpf_attr *attr)
+static int bpf_prog_load(union bpf_attr *attr, bool cap_sys_admin)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog;
@@ -817,7 +817,8 @@ static int bpf_prog_load(union bpf_attr *attr)
return -EPERM;
/* plain bpf_prog allocation */
- prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
+ prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER,
+ cap_sys_admin);
if (!prog)
return -ENOMEM;
@@ -1053,7 +1054,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
err = map_get_next_key(&attr);
break;
case BPF_PROG_LOAD:
- err = bpf_prog_load(&attr);
+ err = bpf_prog_load(&attr, capable(CAP_SYS_ADMIN));
break;
case BPF_OBJ_PIN:
err = bpf_obj_pin(&attr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f8b6ed690be93..24c9dac374770f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3488,7 +3488,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
if (ret < 0)
goto skip_full_check;
- env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+ env->allow_ptr_leaks = env->prog->priv_cap_sys_admin;
ret = do_check(env);
@@ -3589,7 +3589,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops,
if (ret < 0)
goto skip_full_check;
- env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+ env->allow_ptr_leaks = prog->priv_cap_sys_admin;
ret = do_check(env);
diff --git a/net/core/filter.c b/net/core/filter.c
index 9a37860a80fc78..dc020d40bb770a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1100,7 +1100,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog)
if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
- fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
+ fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0, false);
if (!fp)
return -ENOMEM;
@@ -1147,7 +1147,7 @@ int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
- fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
+ fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0, false);
if (!fp)
return -ENOMEM;
@@ -1249,7 +1249,7 @@ struct bpf_prog *__get_filter(struct sock_fprog *fprog, struct sock *sk)
if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return ERR_PTR(-EINVAL);
- prog = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
+ prog = bpf_prog_alloc(bpf_prog_size(fprog->len), 0, false);
if (!prog)
return ERR_PTR(-ENOMEM);
--
2.9.3
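This patch evaluates `capable(CAP_SYS_ADMIN)` once, when the program is loaded through bpf(2), and stores the result in `priv_cap_sys_admin`; classic filters always pass `false`. Later consumers (the verifier's `allow_ptr_leaks`, and elsewhere in the series kallsyms and procfs) then read the stored bit instead of calling `capable()` again. A userspace model of that snapshot, with hypothetical `demo_*` names:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; names mirror the patch but this is not kernel code. */
struct demo_prog {
	unsigned int jited:1,
		     priv_cap_sys_admin:1;	/* privilege snapshot taken at load */
	unsigned int pages;
};

/* The caller decides privilege once, at allocation time. */
static struct demo_prog *demo_prog_alloc(unsigned int pages, bool cap_sys_admin)
{
	struct demo_prog *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->priv_cap_sys_admin = cap_sys_admin;
	p->pages = pages;
	return p;
}

/* Later consumers read the stored bit instead of re-checking credentials. */
static bool demo_allow_ptr_leaks(const struct demo_prog *p)
{
	return p->priv_cap_sys_admin;
}
```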
* [PATCH net-next 0/6] bpf: list all loaded ebpf programs in /proc/bpf/programs
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
Right now there is no easy way to list all active ebpf programs on a
system. The new /proc/bpf/programs node prints essential information
about each loaded ebpf program, which should give an admin a quick
overview of what is going on in the system.
Feedback welcome!
Hannes Frederic Sowa (6):
bpf: bpf_lock needs only block bottom half
bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head
bpf: bpf_progs stores all loaded programs
bpf: track if the bpf program was loaded with SYS_ADMIN capabilities
bpf: add skeleton for procfs printing of bpf_progs
bpf: show bpf programs
include/linux/bpf.h | 2 +-
include/linux/filter.h | 10 ++-
include/uapi/linux/bpf.h | 32 ++++----
kernel/bpf/core.c | 195 +++++++++++++++++++++++++++++++++++++----------
kernel/bpf/syscall.c | 11 +--
kernel/bpf/verifier.c | 4 +-
net/core/filter.c | 6 +-
7 files changed, 190 insertions(+), 70 deletions(-)
--
2.9.3
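As an illustration of consuming the proposed file, a userspace tool could parse each row of the tab-separated format printed by `ebpf_proc_show()` in patch 6/6 (tag, type, runtime, cap, memlock). A sketch, assuming a hypothetical `struct bpf_prog_row` and treating lines that start with `#` as the header:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace representation of one /proc/bpf/programs row. */
struct bpf_prog_row {
	char tag[32];
	char type[64];
	bool jited;			/* "jit" vs "int" */
	bool privileged;		/* "priv" vs "unpriv" */
	unsigned long long memlock;	/* bytes */
};

/* Parse one line: returns 1 for a data row, 0 for the header, -1 on error. */
static int parse_prog_row(const char *line, struct bpf_prog_row *row)
{
	char runtime[8], cap[8];

	if (line[0] == '#')
		return 0;
	/* %s skips leading whitespace (including tabs) and stops at the next
	 * whitespace; the widths leave room for the terminating NUL. */
	if (sscanf(line, "%31s %63s %7s %7s %llu", row->tag, row->type,
		   runtime, cap, &row->memlock) != 5)
		return -1;
	row->jited = strcmp(runtime, "jit") == 0;
	row->privileged = strcmp(cap, "priv") == 0;
	return 1;
}
```

Because every field is a single whitespace-free token, `sscanf()` is enough here; a format with free-text fields would need explicit tab splitting.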
* [PATCH net-next 2/6] bpf: rename bpf_kallsyms to bpf_progs, ksym_lnode to bpf_progs_head
From: Hannes Frederic Sowa @ 2017-04-26 18:24 UTC
To: netdev; +Cc: ast, daniel, jbenc, aconole
In-Reply-To: <20170426182419.14574-1-hannes@stressinduktion.org>
We will soon put all bpf programs on this list, so use appropriate names.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/bpf.h | 2 +-
kernel/bpf/core.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d76faf42a..0fbf6a76555cc9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -172,7 +172,7 @@ struct bpf_prog_aux {
u32 used_map_cnt;
u32 max_ctx_offset;
struct latch_tree_node ksym_tnode;
- struct list_head ksym_lnode;
+ struct list_head bpf_progs_head;
const struct bpf_verifier_ops *ops;
struct bpf_map **used_maps;
struct bpf_prog *prog;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 6f81e0f5a0faa2..043f634ff58d87 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -98,7 +98,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
fp->aux = aux;
fp->aux->prog = fp;
- INIT_LIST_HEAD_RCU(&fp->aux->ksym_lnode);
+ INIT_LIST_HEAD_RCU(&fp->aux->bpf_progs_head);
return fp;
}
@@ -360,25 +360,25 @@ static const struct latch_tree_ops bpf_tree_ops = {
};
static DEFINE_SPINLOCK(bpf_lock);
-static LIST_HEAD(bpf_kallsyms);
+static LIST_HEAD(bpf_progs);
static struct latch_tree_root bpf_tree __cacheline_aligned;
int bpf_jit_kallsyms __read_mostly;
static void bpf_prog_ksym_node_add(struct bpf_prog_aux *aux)
{
- WARN_ON_ONCE(!list_empty(&aux->ksym_lnode));
- list_add_tail_rcu(&aux->ksym_lnode, &bpf_kallsyms);
+ WARN_ON_ONCE(!list_empty(&aux->bpf_progs_head));
+ list_add_tail_rcu(&aux->bpf_progs_head, &bpf_progs);
latch_tree_insert(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
}
static void bpf_prog_ksym_node_del(struct bpf_prog_aux *aux)
{
- if (list_empty(&aux->ksym_lnode))
+ if (list_empty(&aux->bpf_progs_head))
return;
latch_tree_erase(&aux->ksym_tnode, &bpf_tree, &bpf_tree_ops);
- list_del_rcu(&aux->ksym_lnode);
+ list_del_rcu(&aux->bpf_progs_head);
}
static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
@@ -388,8 +388,8 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
{
- return list_empty(&fp->aux->ksym_lnode) ||
- fp->aux->ksym_lnode.prev == LIST_POISON2;
+ return list_empty(&fp->aux->bpf_progs_head) ||
+ fp->aux->bpf_progs_head.prev == LIST_POISON2;
}
void bpf_prog_kallsyms_add(struct bpf_prog *fp)
@@ -473,7 +473,7 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
return ret;
rcu_read_lock();
- list_for_each_entry_rcu(aux, &bpf_kallsyms, ksym_lnode) {
+ list_for_each_entry_rcu(aux, &bpf_progs, bpf_progs_head) {
if (it++ != symnum)
continue;
--
2.9.3
* Re: linux-next: Tree for Apr 26 (net/can/bcm.c)
From: Oliver Hartkopp @ 2017-04-26 18:18 UTC
To: Randy Dunlap, Stephen Rothwell, Linux-Next Mailing List
Cc: Linux Kernel Mailing List, netdev@vger.kernel.org, linux-can
In-Reply-To: <05bd24d8-6840-3551-a529-c464f3b26d0a@infradead.org>
Hi Randy,
thanks for the report!
Some fallout from my namespace support integration %-)
I posted a patch for it:
http://marc.info/?l=linux-can&m=149323049630039&w=2
Many thanks & best regards,
Oliver
On 04/26/2017 04:53 PM, Randy Dunlap wrote:
> On 04/26/17 01:03, Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20170424:
>>
>
> on x86_64:
>
> when CONFIG_PROC_FS is not enabled:
>
> ../net/can/bcm.c:1541:14: error: 'struct netns_can' has no member named 'bcmproc_dir'
> ../net/can/bcm.c:1601:14: error: 'struct netns_can' has no member named 'bcmproc_dir'
> ../net/can/bcm.c:1696:11: error: 'struct netns_can' has no member named 'bcmproc_dir'
> ../net/can/bcm.c:1707:15: error: 'struct netns_can' has no member named 'bcmproc_dir'
>
> 2 of those are "protected" by
> if (IS_ENABLED(CONFIG_PROC_FS)) {
> but that doesn't seem to help/work here.
>
> gcc v4.8.5
>
>
>
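Why the `IS_ENABLED()` guard does not help: `IS_ENABLED(CONFIG_PROC_FS)` expands to a compile-time 0 or 1, so `if (IS_ENABLED(...)) { ... }` lets the compiler drop the block as dead code, but only after parsing and type-checking it. The errors suggest that `bcmproc_dir` is only declared in `struct netns_can` when `CONFIG_PROC_FS` is set, so any access to it must also be compiled out, or the member must exist unconditionally. A generic sketch of the first option, using a hypothetical `DEMO_PROC_FS` switch and `struct demo_netns` (this is not necessarily what the posted patch does):

```c
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_PROC_FS; 0 mimics the failing config. */
#define DEMO_PROC_FS 0

struct demo_netns {
	int other_state;
#if DEMO_PROC_FS
	void *proc_dir;		/* only exists when procfs support is built */
#endif
};

/*
 * Writing
 *     if (DEMO_PROC_FS) return ns->proc_dir != NULL;
 * would not compile with DEMO_PROC_FS == 0: the dead branch is discarded only
 * after the compiler has checked that ns->proc_dir exists. Guarding the
 * access with the preprocessor keeps both configurations building.
 */
static int demo_has_proc_dir(const struct demo_netns *ns)
{
#if DEMO_PROC_FS
	return ns->proc_dir != NULL;
#else
	(void)ns;	/* unused in this configuration */
	return 0;
#endif
}
```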