Netdev List
* [PATCH 1/3] bpfilter: add bpfilter_umh to .gitignore
From: Masahiro Yamada @ 2018-06-08 17:12 UTC
  To: netdev, Alexei Starovoitov, David S . Miller
  Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>

bpfilter_umh is a generated file.  It should be ignored by git.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 net/bpfilter/.gitignore | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 net/bpfilter/.gitignore

diff --git a/net/bpfilter/.gitignore b/net/bpfilter/.gitignore
new file mode 100644
index 0000000..e97084e
--- /dev/null
+++ b/net/bpfilter/.gitignore
@@ -0,0 +1 @@
+bpfilter_umh
-- 
2.7.4


* [PATCH 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Masahiro Yamada @ 2018-06-08 17:12 UTC
  To: netdev, Alexei Starovoitov, David S . Miller
  Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
	YueHaibing
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>

Do not use the troublesome ELF magic.  All that is happening here is
embedding a user-space program into the kernel.  Simply wrap it in
assembly with the '.incbin' directive.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 net/bpfilter/Makefile            | 15 ++-------------
 net/bpfilter/bpfilter_kern.c     | 11 +++++------
 net/bpfilter/bpfilter_umh_blob.S |  7 +++++++
 3 files changed, 14 insertions(+), 19 deletions(-)
 create mode 100644 net/bpfilter/bpfilter_umh_blob.S

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index aafa720..39c6980 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
 HOSTLDFLAGS += -static
 endif
 
-# a bit of elf magic to convert bpfilter_umh binary into a binary blob
-# inside bpfilter_umh.o elf file referenced by
-# _binary_net_bpfilter_bpfilter_umh_start symbol
-# which bpfilter_kern.c passes further into umh blob loader at run-time
-quiet_cmd_copy_umh = GEN $@
-      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
-      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
-      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
-      --rename-section .data=.init.rodata $< $@
-
-$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
-	$(call cmd,copy_umh)
+$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
 
 obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
-bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
+bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index b13d058..fcc1a7c 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -10,11 +10,8 @@
 #include <linux/file.h>
 #include "msgfmt.h"
 
-#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
-#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
-
-extern char UMH_start;
-extern char UMH_end;
+extern char bpfilter_umh_start;
+extern char bpfilter_umh_end;
 
 static struct umh_info info;
 /* since ip_getsockopt() can run in parallel, serialize access to umh */
@@ -89,7 +86,9 @@ static int __init load_umh(void)
 	int err;
 
 	/* fork usermode process */
-	err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
+	err = fork_usermode_blob(&bpfilter_umh_start,
+				 &bpfilter_umh_end - &bpfilter_umh_start,
+				 &info);
 	if (err)
 		return err;
 	pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
new file mode 100644
index 0000000..40311d1
--- /dev/null
+++ b/net/bpfilter/bpfilter_umh_blob.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+	.section .init.rodata, "a"
+	.global bpfilter_umh_start
+bpfilter_umh_start:
+	.incbin "net/bpfilter/bpfilter_umh"
+	.global bpfilter_umh_end
+bpfilter_umh_end:
-- 
2.7.4
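
A rough user-space sketch of the start/end-symbol technique used in this
patch, for anyone who wants to try it outside the kernel.  It assumes GCC
or Clang targeting ELF with GNU assembler syntax.  The demo_blob_* names
and the ".ascii" payload are illustrative stand-ins only: the patch itself
uses ".incbin" on the built bpfilter_umh binary and places it in
.init.rodata.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * File-scope assembly that places a small payload between two global
 * labels.  The patch uses ".incbin" to pull in a whole file; ".ascii"
 * is used here only so that the sketch needs no external file.
 */
__asm__(
	".pushsection .rodata\n"
	".global demo_blob_start\n"
	"demo_blob_start:\n"
	"	.ascii \"hello\"\n"
	".global demo_blob_end\n"
	"demo_blob_end:\n"
	".popsection\n"
);

/* The labels carry no C type; declare them as arrays of unknown size. */
extern const char demo_blob_start[];
extern const char demo_blob_end[];

/* Blob length is the distance between the two labels. */
static size_t demo_blob_size(void)
{
	assert(demo_blob_end >= demo_blob_start);
	return (size_t)(demo_blob_end - demo_blob_start);
}

/* Returns 1 if the embedded bytes equal the given string, else 0. */
static int demo_blob_matches(const char *expected)
{
	size_t len = strlen(expected);

	return len == demo_blob_size() &&
	       memcmp(demo_blob_start, expected, len) == 0;
}
```

The start address is what gets handed to the loader and the label
difference is the length, which is the same pairing that
fork_usermode_blob() receives in the kernel code above.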


* [PATCH 3/3] bpfilter: do not (ab)use host-program build rule
From: Masahiro Yamada @ 2018-06-08 17:12 UTC
  To: netdev, Alexei Starovoitov, David S . Miller
  Cc: Arnd Bergmann, Geert Uytterhoeven, linux-kernel, Masahiro Yamada,
	YueHaibing
In-Reply-To: <1528477930-7342-1-git-send-email-yamada.masahiro@socionext.com>

It is an ugly hack to overwrite $(HOSTCC) with $(CC) to reuse the
build rules from scripts/Makefile.host.  It should not be tedious
to write a dedicated build rule for it.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 net/bpfilter/Makefile                   | 17 +++++++++++------
 net/bpfilter/{main.c => bpfilter_umh.c} |  0
 2 files changed, 11 insertions(+), 6 deletions(-)
 rename net/bpfilter/{main.c => bpfilter_umh.c} (100%)

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 39c6980..6571b30 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -3,18 +3,23 @@
 # Makefile for the Linux BPFILTER layer.
 #
 
-hostprogs-y := bpfilter_umh
-bpfilter_umh-objs := main.o
-HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi
-HOSTCC := $(CC)
-
 ifeq ($(CONFIG_BPFILTER_UMH), y)
 # builtin bpfilter_umh should be compiled with -static
 # since rootfs isn't mounted at the time of __init
 # function is called and do_execv won't find elf interpreter
-HOSTLDFLAGS += -static
+STATIC := -static
 endif
 
+quiet_cmd_cc_user = CC      $@
+      cmd_cc_user = $(CC) -Wall -Wmissing-prototypes -O2 -std=gnu89 \
+		    -I$(srctree) -I$(srctree)/tools/include/ \
+		    -I$(srctree)/tools/include/uapi $(STATIC) -o $@ $<
+
+$(obj)/bpfilter_umh: $(src)/bpfilter_umh.c FORCE
+	$(call if_changed,cc_user)
+
+targets += bpfilter_umh
+
 $(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
 
 obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
diff --git a/net/bpfilter/main.c b/net/bpfilter/bpfilter_umh.c
similarity index 100%
rename from net/bpfilter/main.c
rename to net/bpfilter/bpfilter_umh.c
-- 
2.7.4


* Re: Qualcomm rmnet driver and qmi_wwan
From: Subash Abhinov Kasiviswanathan @ 2018-06-08 17:19 UTC
  To: Daniele Palmas; +Cc: Bjørn Mork, Dan Williams, netdev
In-Reply-To: <CAGRyCJFqiDWDypSij3SGskLpJgtAJ_8f5qKLRY8Kt_yEKB=Q_g@mail.gmail.com>

> I followed Dan's advice and prepared a very basic test patch
> (attached) for testing it through ip link.
> 
> Basically things seem to be properly working with qmicli, but I needed
> to modify a bit qmi_wwan, so I'm adding Bjørn that maybe can help.
> 
> Bjørn,
> 
> I'm trying to add support to rmnet in qmi_wwan: I had to modify the
> code as in the attached test patch, but I'm not sure it is the right
> way.
> 
> This is done under the assumption that the rmnet device would be the
> only one to register an rx handler to qmi_wwan, but it is probably
> wrong.
> 
> Basically I'm wondering if there is a more correct way to understand
> if an rmnet device is linked to the real qmi_wwan device.
> 
> Thanks,
> Daniele


Hi Daniele / Bjørn

Is it possible to define a pass-through mode in qmi_wwan? This would
ensure that all packets in MAP format are passed through instead of
being processed in the qmi_wwan layer. The pass-through mode would just
call netif_receive_skb() on all these packets.

That would allow all the packets to be intercepted by the rx_handler
attached by rmnet which would subsequently de-multiplex and process
the packets.
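
To illustrate the idea, here is a toy user-space model, not kernel code.
All names (toy_dev, toy_receive, toy_demux) and the one-byte mux-id header
are hypothetical simplifications; the real MAP header and the
netif_receive_skb()/rx_handler interfaces are richer than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MUX 4

/* Hypothetical simplified frame: byte 0 is a mux id, the rest is payload. */
struct toy_frame {
	const uint8_t *data;
	size_t len;
};

struct toy_dev;
typedef int (*toy_rx_handler_t)(struct toy_dev *dev,
				const struct toy_frame *f);

struct toy_dev {
	int pass_through;		/* 1: hand frames up untouched */
	toy_rx_handler_t rx_handler;	/* stands in for rmnet's handler */
	unsigned int processed_locally;	/* frames handled by the driver */
	unsigned int per_mux[TOY_MAX_MUX]; /* frames delivered per mux id */
};

/* Stand-in for rmnet: de-multiplex on the mux id in the first byte. */
static int toy_demux(struct toy_dev *dev, const struct toy_frame *f)
{
	if (f->len < 1 || f->data[0] >= TOY_MAX_MUX)
		return -1;
	dev->per_mux[f->data[0]]++;
	return 0;
}

/* Stand-in for the driver's receive path. */
static int toy_receive(struct toy_dev *dev, const struct toy_frame *f)
{
	/* Pass-through: no local parsing, the attached handler sees it all. */
	if (dev->pass_through && dev->rx_handler)
		return dev->rx_handler(dev, f);

	/* Otherwise the driver would process the frame itself. */
	dev->processed_locally++;
	return 0;
}
```

With pass_through set, every frame reaches the attached handler
unmodified; with it cleared, the driver keeps its existing behaviour.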

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* [ANNOUNCE] iproute 4.17
From: Stephen Hemminger @ 2018-06-08 17:25 UTC
  To: netdev; +Cc: linux-kernel

New iproute2 release for Linux 4.17 

This is the latest version of the iproute2 utility, supporting the new
features in Linux 4.17. In addition to the usual range of small changes,
some items worth noting:
  * RDMA tool has gotten lots of updates
  * lots of devlink updates
  * more bpf tool updates from Daniel Borkmann
  * more VRF related changes
  * ss -s command no longer reports socket statistics from the slab cache.
    This has been broken since early in the 2.6 development cycle, and
    users only noticed 10 years later.
  * The ip command subtypes support JSON output.
    Most tc commands do as well.


The tarball can be downloaded from:
  https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-4.17.0.tar.gz

The upstream repositories for the master and net-next branches are now
split. The master branch is at:
  git://git.kernel.org/pub/scm/network/iproute2/iproute2.git

and patches for the next release are in (master branch):
  git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git


Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.

---
Adam Vyskovsky (1):
      tc: fix an off-by-one error while printing tc actions

Alexander Alemayhu (4):
      man: add examples to ip.8
      man: fix man page warnings
      tc: bpf: add ppc64 and sparc64 to list of archs with eBPF support
      examples/bpf: update list of examples

Alexander Aring (5):
      tc: m_ife: allow ife type to zero
      tc: m_ife: print IEEE ethertype format
      tc: m_ife: report about kernels default type
      man: tc-ife: add default type note
      tc: m_ife: fix match tcindex parsing

Alexander Heinlein (1):
      ip/xfrm: Fix deleteall when having many policies installed

Alexander Zubkov (5):
      iproute: list/flush/save filter also by metric
      iproute: "list/flush/save default" selected all of the routes
      treat "default" and "all"/"any" addresses differenty
      treat "default" and "all"/"any" addresses differenty
      arrange prefix parsing code after redundant patches

Alexey Kodanev (1):
      fix typo in ip-xfrm man page, rmd610 -> rmd160

Amir Vadai (14):
      libnetlink: Introduce rta_getattr_be*()
      tc/cls_flower: Classify packet in ip tunnels
      tc/act_tunnel: Introduce ip tunnel action
      tc/pedit: Fix a typo in pedit usage message
      tc/pedit: Extend pedit to specify offset relative to mac/transport headers
      tc/pedit: Introduce 'add' operation
      tc/pedit: p_ip: introduce editing ttl header
      tc/pedit: Support fields bigger than 32 bits
      tc/pedit: p_eth: ETH header editor
      tc/pedit: p_tcp: introduce pedit tcp support
      pedit: Fix a typo in warning
      pedit: Do not allow using retain for too big fields
      pedit: Check for extended capability in protocol parser
      pedit: Introduce ipv6 support

Amritha Nambiar (4):
      tc/mqprio: Offload mode and shaper options in mqprio
      flower: Represent HW traffic classes as classid values
      man: tc-mqprio: add documentation for new offload options
      man: tc-flower: add explanation for hw_tc option

Andreas Henriksson (1):
      ss: fix help/man TCP-STATE description for listening

Antonio Quartulli (2):
      ss: fix crash when skipping disabled header field
      ss: fix NULL pointer access when parsing unix sockets with oldformat

Arkadi Sharshevsky (15):
      devlink: Change netlink attribute validation
      devlink: Add support for pipeline debug (dpipe)
      bridge: Distinguish between externally learned vs offloaded FDBs
      devlink: Make match/action parsing more flexible
      devlink: Add support for special format protocol headers
      devlink: Add support for protocol IPv4/IPv6/Ethernet special formats
      devlink: Ignore unknown attributes
      devlink: Change empty line indication with indentations
      devlink: mnlg: Add support for extended ack
      devlink: Add support for devlink resource abstraction
      devlink: Add support for hot reload
      devlink: Move dpipe context from heap to stack
      devlink: Add support for resource/dpipe relation
      devlink: Update man pages and add resource man
      devlink: Fix error reporting

Asbjørn Sloth Tønnesen (2):
      testsuite: refactor kernel config search
      testsuite: search for kernel config in /boot

Baruch Siach (5):
      tc: add missing limits.h header
      ip: include libc headers first
      lib: fix multiple strlcpy definition
      README: update libdb build dependency information
      arpd: remove pthread dependency

Benjamin LaHaise (2):
      f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"
      tc: flower: support for matching MPLS labels

Boris Pismenny (1):
      ip xfrm: Add xfrm state crypto offload

Casey Callendrello (1):
      netns: make /var/run/netns bind-mount recursive

Chris Mi (3):
      tc: fix command "tc actions del" hang issue
      lib/libnetlink: Add a new function rtnl_talk_iov
      tc: Add batchsize feature for filter and actions

Christian Brauner (1):
      netns: allow negative nsid

Christian Ehrhardt (2):
      tests: read limited amount from /dev/urandom
      tests: make sure rand_dev suffix has 6 chars

Christoph Paasch (1):
      ip: add fastopen_no_cookie option to ip route

Craig Gallek (2):
      gre6: fix copy/paste bugs in GREv6 attribute manipulation
      iplink: Expose IFLA_*_FWMARK attributes for supported link types

Cyrill Gorcunov (2):
      libnetlink: Add test for error code returned from netlink reply
      ss: Add inet raw sockets information gathering via netlink diag interface

Daniel Borkmann (19):
      bpf: make tc's bpf loader generic and move into lib
      bpf: check for owner_prog_type and notify users when differ
      bpf: add initial support for attaching xdp progs
      {f,m}_bpf: dump tag over insns
      bpf: test for valid type in bpf_get_work_dir
      bpf: add support for generic xdp
      bpf: update printing of generic xdp mode
      bpf: dump error to the user when retrieving pinned prog fails
      bpf: indicate lderr when bpf_apply_relo_data fails
      bpf: remove obsolete samples
      bpf: support loading map in map from obj
      bpf: dump id/jited info for cls/act programs
      bpf: improve error reporting around tail calls
      bpf: fix mnt path when from env
      bpf: unbreak libelf linkage for bpf obj loader
      bpf: minor cleanups for bpf_trace_pipe
      bpf: consolidate dumps to use bpf_dump_prog_info
      json: move json printer to common library
      bpf: properly output json for xdp

David Ahern (56):
      Makefile: really suppress printing of directories
      lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
      bpf: export bpf_prog_load
      bpf: Add BPF_ macros
      move cmd_exec to lib utils
      Add filesystem APIs to lib
      change name_is_vrf to return index
      libnetlink: Add variant of rtnl_talk that does not display RTNETLINK answers error
      Introduce ip vrf command
      Fix compile warning in get_addr_1
      ip vrf: Move kernel config hint to prog_load failure
      ip vrf: Refactor ipvrf_identify
      ip vrf: Fix reset to default VRF
      ip netns: Reset vrf to default VRF on namespace switch
      ip vrf: Fix run-on error message on mkdir failure
      ip vrf: Improve cgroup2 error messages
      ip vrf: Improve bpf error messages
      Add support for rt_protos.d
      rttable: Fix invalid range checking when table id is converted to u32
      ip route: error out on multiple via without nexthop keyword
      ip route: Make name of protocol 0 consistent
      ip vrf: Handle vrf in a cgroup hierarchy
      ip netns: refactor netns_identify
      ip vrf: Handle VRF nesting in namespace
      ip vrf: Detect invalid vrf name in pids command
      ip: Add support for MPLS netconf
      ip route: Add missing space between nexthop and via for mpls multipath routes
      netlink: Add flag to suppress print of nlmsg error
      ip netconf: Show all address families by default in dumps
      ip netconf: show all families on dev request
      ip vrf: Add command name next to pid
      ip vrf: Add command name next to pid
      ip: mpls: fix printing of mpls labels
      ip: add support for more MPLS labels
      netlink: Change rtnl_dump_done to always show error
      ip address: Export ip_linkaddr_list
      ip address: Move filter struct to ip_common.h
      ip address: Change print_linkinfo_brief to take filter as an input
      ip vrf: Add show command
      lib: Dump ext-ack string by default
      libnetlink: Fix extack attribute parsing
      libnetlink: Handle extack messages for non-error case
      Update headers from 4.15-rc3
      Restore --no-print-directory option for silent builds
      Update kernel headers to 4.15-rc8
      Update kernel headers to 4.16.0-rc2+
      Update kernel headers to 08009a760213
      Import tc_em_ipt.h from kernel at commit 08009a760213
      libnetlink: __rtnl_talk_iov should only loop max iovlen times
      Update kernel headers to 4.16.0-rc4+
      Update kernel headers
      Update kernel headers
      devlink: Print size of -1 as unlimited
      utils: Do not reset family for default, any, all addresses
      ip route: Print expires as signed int
      iplink_vrf: Save device index from response for return code

David Forster (1):
      ip6tunnel: Align ipv6 tunnel key display with ipv4

David Lebrun (9):
      ip: add ip sr command to control SR-IPv6 internal structures
      iproute: add support for SR-IPv6 lwtunnel encapsulation
      man: add documentation for IPv6 SR commands
      iproute: fix compilation issue with older glibc
      iproute: add helper functions for SRH processing
      iproute: add support for SRv6 local segment processing
      man: add documentation for seg6local lwt
      iproute: add support for seg6 l2encap mode
      man: add documentation for seg6 l2encap mode

David Michael (1):
      tc: make tc linking depend on libtc.a

Davide Caratti (4):
      tc: m_csum: add support for SCTP checksum
      tc: fix typo in tc-tcindex man page
      tc: bash-completion: add missing 'classid' keyword
      tc: fix parsing of the control action

Donald Sharp (5):
      ip: mroute: Add table output to show command
      ip: Properly display AF_BRIDGE address information for neighbor events
      ip: Use the `struct fib_rule_hdr` for rules
      ip: Display ip rule protocol used
      ip: Allow rules to accept a specified protocol

Eli Cohen (1):
      iplink: Update usage in help message

Eric Dumazet (2):
      ss: print tcpi_rcv_mss and tcpi_advmss
      tc: fq: support low_rate_threshold attribute

Eyal Birger (2):
      tc: ematch: add parse_eopt_argv() method for providing ematches with argv parameters
      tc: add em_ipt ematch for calling xtables matches from tc matching context

Filip Moc (1):
      ip fou: pass family attribute as u8

Gal Pressman (3):
      iplink: Validate minimum tx rate is less than maximum tx rate
      ipaddress: Make sure VF min/max rate API is supported before using it
      man: Document the meaning of zero in min/max_tx_rate parameters

GhantaKrishnamurthy MohanKrishna (1):
      ss: Add support for TIPC socket diag in ss tool

Girish Moodalbail (2):
      vxlan: Add support for modifying vxlan device attributes
      geneve: support for modifying geneve device

Greg Greenway (1):
      Add "show" subcommand to "ip fou"

Guillaume Nault (3):
      ip/l2tp: remove offset and peer-offset options
      l2tp: no need to export session offsets in JSON output
      bridge: fix typo in hairpin error message

Hadar Hen Zion (4):
      tc/cls_flower: Add dest UDP port to tunnel params
      tc/m_tunnel_key: Add dest UDP port to tunnel key action
      tc/cls_flower: Add to the usage encapsulation dest UDP port
      tc/m_tunnel_key: Add to the usage encapsulation dest UDP port

Hangbin Liu (12):
      iplink: bridge: add support for IFLA_BR_FDB_FLUSH
      iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
      iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
      iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
      iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION
      iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH
      man: ip-link.8: Document bridge_slave fdb_flush option
      man: ip-link.8: Document bridge_slave fdb_flush option
      ip neigh: allow flush FAILED neighbour entry
      utils: return default family when rtm_family is not RTNL_FAMILY_IPMR/IP6MR
      lib/libnetlink: re malloc buff if size is not enough
      lib/libnetlink: update rtnl_talk to support malloc buff at run time

Hoang Le (1):
      tipc: TIPC_NLA_LINK_NAME value pass on nesting entry TIPC_NLA_LINK

Ido Schimmel (2):
      iproute: Display offload indication per-nexthop
      iproute: Parse last nexthop in a multipath route

Ivan Delalande (2):
      utils: add print_escape_buf to format and print arbitrary bytes
      ss: print MD5 signature keys configured on TCP sockets

Ivan Vecera (3):
      lib: make resolve_hosts variable common
      devlink: add batch command support
      devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts

Jakub Kicinski (23):
      bpf: print xdp offloaded mode
      bpf: add xdpdrv for requesting XDP driver mode
      bpf: allow requesting XDP HW offload
      bpf: initialize the verifier log
      bpf: pass program type in struct bpf_cfg_in
      bpf: keep parsed program mode in struct bpf_cfg_in
      bpf: allocate opcode table in struct bpf_cfg_in
      bpf: split parse from program loading
      bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
      bpf: expose bpf_parse_common() and bpf_load_common()
      bpf: allow loading programs for a specific ifindex
      {f, m}_bpf: don't allow specifying multiple bpf programs
      tc_filter: resolve device name before parsing filter
      f_bpf: communicate ifindex for eBPF offload
      iplink: communicate ifindex for xdp offload
      ip: link: add support for netdevsim device type
      tc: red: allow setting th_min and th_max to the same value
      bpf: support map offload
      tc: red: JSON-ify RED output
      tc: prio: JSON-ify prio output
      ip: address: fix stats64 JSON object name
      tc: fix second printing of requeues
      iplink_geneve: correct size of message to avoid spurious errors

Jakub Sitnicki (2):
      iproute: Remove useless check for nexthop keyword when setting RTA_OIF
      iproute: Abort if nexthop cannot be parsed

Jamal Hadi Salim (6):
      utils: make hex2mem available to all users
      actions: Add support for user cookies
      tc actions: Improved batching and time filtered dumping
      actions: update the man page to describe the "since" time filter
      tc/actions: introduce support for jump action
      tc: Fix filter protocol output

Jean-Philippe Brucker (1):
      ss: fix NULL dereference when rendering without header

Jesus Sanchez-Palencia (1):
      man: Clarify idleslope calculation for tc-cbs

Jiri Benc (3):
      Revert "man pages: add man page for skbmod action"
      tc: m_tunnel_key: reformat the usage text
      tc: m_tunnel_key: add csum/nocsum option

Jiri Kosina (2):
      iproute2: tc: introduce build dependency on libnetlink
      iproute2: add support for invisible qdisc dumping

Jiri Pirko (28):
      devlink: use DEVLINK_CMD_ESWITCH_* instead of DEVLINK_CMD_ESWITCH_MODE_*
      tc_filter: add support for chain index
      tc: actions: add helpers to parse and print control actions
      tc/actions: introduce support for goto chain action
      tc: flower: add support for tcp flags
      tc: gact: fix control action parsing
      tc: add support for TRAP action
      tc: don't print error message on miss when parsing action with default
      tc: move action cookie print out of the stats if
      tc: remove action cookie len from printout
      tc: jsonify qdisc core
      tc: jsonify stats2
      tc: jsonify fq_codel qdisc
      tc: jsonify htb qdisc
      tc: jsonify filter core
      tc: jsonify flower filter
      tc: jsonify matchall filter
      tc: jsonify actions core
      tc: jsonify gact action
      tc: jsonify mirred action
      tc: jsonify vlan action
      man: add -json option to tc manpage
      tc: fix json array closing
      tc: introduce tc_qdisc_block_exists helper
      tc: introduce support for block-handle for filter operations
      tc: implement ingress/egress block index attributes for qdiscs
      devlink: fix port new monitoring message typo
      man: fix devlink object list

Joe Stringer (1):
      bpf: Print section name when hitting non ld64 issue

Jon Maloy (3):
      tipc: change family attribute from u32 to u16
      tipc: introduce command for handling a new 128-bit node identity
      tipc: change node address printout formats

Julien Fortin (31):
      ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
      color: add new COLOR_NONE and disable_color function
      ip: add new command line argument -json (mutually exclusive with -color)
      json_writer: add new json handlers (null, float with format, lluint, hu)
      ip: ip_print: add new API to print JSON or regular format output
      ip: ipaddress.c: add support for json output
      ip: iplink.c: open/close json obj for ip -brief -json link show dev DEV
      ip: iplink_bond.c: add json output support
      ip: iplink_bond_slave.c: add json output support (info_slave_data)
      ip: iplink_hsr.c: add json output support
      ip: iplink_bridge.c: add json output support
      ip: iplink_bridge_slave.c: add json output support
      ip: iplink_can.c: add json output support
      ip: iplink_geneve.c: add json output support
      ip: iplink_ipoib.c: add json output support
      ip: iplink_ipvlan.c: add json output support
      ip: iplink_vrf.c: add json output support
      ip: iplink_vxlan.c: add json output support
      ip: iplink_xdp.c: add json output support
      ip: ipmacsec.c: add json output support
      ip: link_gre.c: add json output support
      ip: link_gre6.c: add json output support
      ip: link_ip6tnl.c: add json output support
      ip: link_iptnl.c: add json output support
      ip: link_vti.c: add json output support
      ip: link_vti6.c: add json output support
      ip: link_macvlan.c: add json output support
      ip: iplink_vlan.c: add json output support
      ip: ipaddress: fix missing space after prefixlen
      lib: json_print: rework 'new_json_obj' drop FILE* argument
      lib: json_print: rework 'new_json_obj' drop FILE* argument

Khem Raj (1):
      tc: include stdint.h explicitly for UINT16_MAX

Krister Johansen (3):
      iptunnel: document mode parameter for sit tunnels
      iptunnel: add support for mpls/ip to sit tunnels
      iptunnel: add support for mpls/ip to ipip tunnels

Leon Romanovsky (34):
      devlink: Call dl_free in early exit case
      utils: Move BIT macro to common header
      rdma: Add basic infrastructure for RDMA tool
      rdma: Add dev object
      rdma: Add link object
      rdma: Add json and pretty outputs
      rdma: Implement json output for dev object
      rdma: Add json output to link object
      rdma: Add initial manual for the tool
      ip: Fix compilation break on old systems
      rdma: Reduce scope of _dev_map_lookup call
      rdma: Protect dev_map_lookup from wrong input
      rdma: Move per-device handler function to generic code
      rdma: Fix misspelled SYS_IMAGE_GUID
      rdma: Check that port index exists before operate on link layer
      rdma: Print supplied device name in case of wrong name
      rdma: Get rid of dev_map_free call
      rdma: Rename free function to be rd_cleanup
      rdma: Rename rd_free_devmap to be rd_free
      rdma: Move link execution logic to common code
      rdma: Add option to provide "-" sign for the port number
      rdma: Make visible the number of arguments
      rdma: Add filtering infrastructure
      rdma: Set pointer to device name position
      rdma: Allow external usage of compare string routine
      rdma: Add resource tracking summary
      rdma: Add QP resource tracking information
      rdma: Document resource tracking
      rdma: Check return value of strdup call
      rdma: Add batch command support
      rdma: Avoid memory leak for skipper resource
      rdma: Update device capabilities flags
      rdma: Move RDMA UAPI header file to be under RDMA responsibility
      rdma: Ignore unknown netlink attributes

Lorenzo Colitti (3):
      ip: support UID range routing.
      iproute: build more easily on Android
      iproute2: fixes to compile on some systems.

Lubomir Rintel (1):
      lib/namespace: don't try to mount rw /sys over a ro one

Luca Boccassi (7):
      man: drop references to Debian-specific paths
      man: add more keywords to ip.8 short description
      man: ip-address: document 15-char limit for LABEL
      man: routel/routef: don't mention filesystem paths
      man: fix small formatting errors
      Drop capabilities if not running ip exec vrf with libcap
      ip: do not drop capabilities if net_admin=i is set

Lucas Bates (2):
      man page: add page for skbmod action
      Add new man page for tc actions.

Lukas Braun (1):
      man: ip-route.8: Mention that lower metric means higher priority

Mahesh Bandewar (1):
      ip/ipvlan: enhance ability to add mode flags to existing modes

Marcelo Ricardo Leitner (1):
      tc-netem: fix limit description in man page

Martin KaFai Lau (1):
      bpf: Add support for IFLA_XDP_PROG_ID

Masatake YAMATO (1):
      ss: prepare rth when killing inet sock

Matteo Croce (3):
      tc: fix typo in manpage
      netns: avoid directory traversal
      netns: more input validation

Matthias Schiffer (1):
      devlink, rdma, tipc: properly define TARGETS without HAVE_MNL

Michal Kubecek (4):
      iplink: check for message truncation in iplink_get()
      iplink: double the buffer size also in iplink_get()
      ip xfrm: use correct key length for netlink message
      ip maddr: fix filtering by device

Michal Kubeček (1):
      routel: fix infinite loop in line parser

Michal Privoznik (1):
      tc: util: Don't call NEXT_ARG_FWD() in __parse_action_control()

Mike Frysinger (2):
      mark shell scripts +x
      ifcfg/rtpr: convert to POSIX shell

Nathan Harold (1):
      iproute2: fix 'ip xfrm monitor all' command

Neal Cardwell (1):
      ss: print new tcp_info fields: delivery_rate and app_limited

Nicolas Dichtel (4):
      link_gre6: really support encaplimit option
      ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support
      ip: display netns name instead of nsid
      iplink: enable to specify a name for the link-netns

Nikhil Gajendrakumar (1):
      bridge: this patch adds json support for bridge mdb show

Nikolay Aleksandrov (7):
      bridge: fdb: add state filter support
      ipmroute: add support for RTNH_F_UNRESOLVED
      iplink: add support for xstats subcommand
      iplink: bridge: add support for displaying xstats
      iplink: bridge_slave: add support for displaying xstats
      ip: bridge_slave: add support for per-port group_fwd_mask
      ip: bridge_slave: add neigh_suppress to the type help and

Nishanth Devarajan (1):
      tc: B.W limits can now be specified in %.

Nogah Frankel (4):
      ifstat: Includes reorder
      ifstat: Add extended statistics to ifstat
      ifstat: Add "sw only" extended statistics to ifstat
      ifstat: Add xstat to ifstat man page

Oliver Hartkopp (3):
      ip: link add vxcan support
      ip: add vxcan to help text
      ip: add vxcan/veth to ip-link man page

Or Gerlitz (4):
      tc: matchall: Print skip flags when dumping a filter
      tc/pedit: p_udp: introduce pedit udp support
      tc: Reflect HW offload status
      tc: flower: add support for matching on ip tos and ttl

Paul Blakey (2):
      tc: flower: support matching flags
      tc: flower: Refactor matching flags to be more user friendly

Pavel Maltsev (1):
      Allow to configure /var/run/netns directory

Petr Machata (1):
      ip: link_gre6.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag

Petr Vorel (8):
      ip: fix igmp parsing when iface is long
      color: use "light" colors for dark background
      tests: Remove bashisms (s/source/.)
      tests: Revert back /bin/sh in shebang
      color: Fix ip segfault when using --color switch
      color: Fix another ip segfault when using --color switch
      color: Cleanup code to remove "magic" offset + 7
      color: Rename enum

Phil Dibowitz (1):
      Show 'external' link mode in output

Phil Sutter (113):
      ss: Mark fall through in arg parsing switch()
      ss: Drop empty lines in UDP output
      ss: Add missing tab when printing UNIX details
      ss: Use sockstat->type in all socket types
      ss: introduce proc_ctx_print()
      ss: Drop list traversal from unix_stats_print()
      ss: Eliminate unix_use_proc()
      ss: Turn generic_proc_open() wrappers into macros
      ss: Make tmr_name local to tcp_timer_print()
      ss: Make user_ent_hash_build_init local to user_ent_hash_build()
      ss: Make some variables function-local
      ss: Make slabstat_ids local to get_slabstat()
      ss: Get rid of useless goto in handle_follow_request()
      ss: Get rid of single-fielded struct snmpstat
      ss: Make unix_state_map local to unix_show()
      ss: Make sstate_name local to sock_state_print()
      ss: Make sstate_namel local to scan_state()
      ss: unix_show: No need to initialize members of calloc'ed structs
      tc: m_xt: Fix segfault with iptables-1.6.0
      tc: m_xt: Drop needless parentheses from #if checks
      man: tc-csum.8: Fix example
      man: ip-route.8: Fix 'expires' indenting
      testsuite: Generate nlmsg blob at runtime
      testsuite: Search kernel config in modules dir also
      man: ss.8: Add missing protocols to description of -A
      ip: link: bond: Fix whitespace in help text
      ip: link: macvlan: Add newline to help output
      ip: link: Unify link type help functions a bit
      ip: link: Add missing link type help texts
      man: ip-link: Specify min/max values for bridge slave priority and cost
      man: ip-rule.8: Further clarify how to interpret priority value
      man: ip.8: Document -brief flag
      tc: m_xt: Prevent a segfault in libipt
      man: Collect names of man pages automatically
      bpf: Make bytecode-file reading a little more robust
      Really fix get_addr() and get_prefix() error messages
      tc-simple: Fix documentation
      examples: Some shell fixes to cbq.init
      ifcfg: Quote left-hand side of [ ] expression
      tipc/node: Fix socket fd check in cmd_node_get_addr()
      iproute_lwtunnel: Argument to strerror must be positive
      iproute_lwtunnel: csum_mode value checking was ineffective
      ss: Don't leak fd in tcp_show_netlink_file()
      tc/em_ipset: Don't leak sockfd on error path
      ipvrf: Fix error path of vrf_switch()
      ifstat: Fix memleak in error case
      ifstat: Fix memleak in dump_kern_db() for json output
      ss: Fix potential memleak in unix_stats_print()
      tipc/bearer: Fix resource leak in error path
      devlink: No need for this self-assignment
      ipntable: No need to check and assign to parms_rta
      iproute: Fix for missing 'Oifs:' display
      lib/rt_names: Drop dead code in rtnl_rttable_n2a()
      ss: Skip useless check in parse_hostcond()
      ss: Drop useless assignment
      tc/m_gact: Drop dead code
      ipaddress: Avoid accessing uninitialized variable lcl
      iplink_can: Prevent overstepping array bounds
      ipmaddr: Avoid accessing uninitialized data
      ss: Use C99 initializer in netlink_show_one()
      netem/maketable: Check return value of fstat()
      tc/q_multiq: Don't pass garbage in TCA_OPTIONS
      iproute: Check mark value input
      iplink_vrf: Complain if main table is not found
      devlink: Check return code of strslashrsplit()
      lib/bpf: Don't leak fp in bpf_find_mntpt()
      ifstat, nstat: Check fdopen() return value
      tc/q_netem: Don't dereference possibly NULL pointer
      tc/tc_filter: Make sure filter name is not empty
      tipc/bearer: Prevent NULL pointer dereference
      ipntable: Avoid memory allocation for filter.name
      lib/fs: Fix format string in find_fs_mount()
      lib/inet_proto: Review inet_proto_{a2n,n2a}()
      lnstat_util: Simplify alloc_and_open() a bit
      tc/m_xt: Fix for potential string buffer overflows
      lib/ll_map: Choose size of new cache items at run-time
      ss: Make struct tcpstat fields 'timer' and 'timeout' unsigned
      ss: Make sure scanned index value to unix_state_map is sane
      netem/maketable: Check return value of fscanf()
      lib/bpf: Check return value of write()
      lib/fs: Fix and simplify make_path()
      lib/libnetlink: Don't pass NULL parameter to memcpy()
      ss: Fix for added diag support check
      link_gre6: Fix for changing tclass/flowlabel
      link_gre6: Print the tunnel's tclass setting
      utils: Implement strlcpy() and strlcat()
      Convert the obvious cases to strlcpy()
      Convert harmful calls to strncpy() to strlcpy()
      ipxfrm: Replace STRBUF_CAT macro with strlcat()
      tc_util: No need to terminate an snprintf'ed buffer
      lnstat_util: Make sure buffer is NUL-terminated
      lib/bpf: Fix bytecode-file parsing
      utils: strlcpy() and strlcat() don't clobber dst
      ipaddress: Fix segfault in 'addr showdump'
      ip-route: Fix for listing routes with RTAX_LOCK attribute
      ip{6, }tunnel: Avoid copying user-supplied interface name around
      tc: flower: No need to cache indev arg
      Check user supplied interface name lengths
      ss: Distinguish between IPv4 and IPv6 wildcard sockets
      ss: Detect IPPROTO_ICMPV6 sockets
      tc_util: Drop needless pointer check
      tc_util: Silence spurious compiler warning
      link_gre6: Detect invalid encaplimit values
      man: tc-csum.8: Fix inconsistency in example description
      tc: Optimize gact action lookup
      Remove leftovers from removed Latex documentation
      ip-link: Fix use after free in nl_get_ll_addr_len()
      man: ip-route.8: ssthresh parameter is NUMBER
      man: tc-vlan.8: Fix for incorrect example
      ssfilter: Eliminate shift/reduce conflicts
      ss: Allow excluding a socket table from being queried
      ss: Put filter DB parsing into a separate function
      ss: Drop filter_default_dbs()

Philip Prindeville (1):
      iproute2: add support for GRE ignore-df knob

Pieter Jansen van Vuuren (1):
      tc: f_flower: Add support for matching first frag packets

Quentin Monnet (2):
      README: update location of git repositories, remove broken info link
      README: re-add updated information link

Ralf Baechle (1):
      ip: HSR: Fix cut and paste error

Remigiusz Kołłątaj (1):
      ip: add handling for new CAN netlink interface

Robert Shearman (6):
      iplink: add support for afstats subcommand
      man: Fix formatting of vrf parameter of ip-link show command
      iproute: Add support for ttl-propagation attribute
      iproute: Add support for MPLS LWT ttl attribute
      gre: Fix ttl inherit option
      vxlan: Make id optional when modifying a link

Roi Dayan (11):
      devlink: Add usage help for eswitch subcommand
      devlink: Add option to set and show eswitch inline mode
      tc: flower: Fix typo and style in flower man page
      tc: tunnel_key: Add tc-tunnel_key man page to Makefile
      tc: flower: Fix flower output for src and dst ports
      tc: flower: Add missing err check when parsing flower options
      tc: flower: Fix incorrect error msg about eth type
      tc: flower: Fix parsing ip address
      devlink: Add json and pretty options to help and man
      devlink: Add option to set and show eswitch encapsulation support
      tc: Fix compilation error with old iptables

Roman Mashak (29):
      tc: pass correct conversion specifier to print 'unsigned int' action index.
      tc: fixed man page fonts for keywords and variable values
      tc: updated man page to reflect filter-id use in filter GET command.
      tc: distinguish Add/Replace action operations.
      tc: print skbedit action when dumping actions.
      tc: fix Makefile to build skbmod
      tc: fixed typo in usage text.
      tc: updated tc-u32 man page to reflect skip_sw and skip_hw parameters.
      tc: updated ife man page.
      ss: initialize 'fackets' member of tcpstat structure
      bridge: isolate vlans parsing code in a separate API
      bridge: dump vlan table information for link
      bridge: request vlans along with link information
      ip: added missing newline in man page
      ip netns: use strtol() instead of atoi()
      tc: distinguish Add/Replace qdisc operations
      ss: remove duplicate assignment
      ss: add missing path MTU parameter
      tc: added tc monitor description in man page
      tc: updated tc-bpf man page
      tc: print actual action for sample action
      tc: use get_u32() in psample action to match types
      tc: print actual action for connmark action
      tc: print index, refcnt & bindcnt for nat action
      tc: add oneline mode
      tc: enable json output for actions
      tc: support oneline mode in action generic printer functions
      tc: jsonify sample action
      tc: return on invalid smac or dmac in ife action

Roopa Prabhu (9):
      ip: extend route get to return matching fib route
      iproute: extend route get for mpls routes
      iplink: new option to set neigh suppression on a bridge port
      iplink: bridge: support bridge port vlan_tunnel attribute
      bridge: vlan: support for per vlan tunnel info
      bridge: fdb: print NDA_SRC_VNI if available
      ss: print skmeminfo for packet sockets
      iprule: support for ip_proto, sport and dport match options
      bridge: add option extern_learn to set NTF_EXT_LEARNED on fdb entries

Sabrina Dubroca (3):
      man: ip-link.8: document bridge options
      ip link: add support to display extended tun attributes
      ip link: add json support for tun attributes

Serhey Popovych (90):
      ip/tunnel: Unify setup and accept zero address for local/remote endpoints
      ip/tunnel: Use get_addr() instead of get_prefix() for local/remote endpoints
      ip: gre: fix IFLA_GRE_LINK attribute sizing
      iplink: Improve index parameter handling
      iplink: Process "alias" parameter correctly
      iplink: Kill redundant network device name checks
      ip/tunnel: Use tnl_parse_key() to parse tunnel key
      link_ip6tnl: Use IN6ADDR_ANY_INIT to initialize local/remote endpoints
      link_vti6: Always add local/remote endpoint attributes
      utils: ll_addr: Handle ARPHRD_IP6GRE in ll_addr_n2a()
      ip/tunnel: No need to free answer after rtnl_talk() on error
      gre,ip6tnl/tunnel: Fix noencap- support
      gre6/tunnel: Do not submit garbage in flowinfo
      vxcan,veth: Forbid "type" for peer device
      ip/tunnel: Document "external" parameter
      link_iptnl: Kill code duplication
      link_iptnl: Print tunnel mode
      link_iptnl: Open "encap" JSON object
      ip6/tunnel: Fix tclass output
      ip6tnl/tunnel: Do not print obscure flowinfo
      ip6/tunnel: Unify tclass printing
      ip6/tunnel: Unify flowlabel printing
      ip6/tunnel: Unify encap_limit printing
      gre6/tunnel: Output flowlabel after tclass
      ip6tnl/tunnel: Output hoplimit before encapsulation limit
      ipaddress: Use family_name() for better code reuse
      iplink: Fix "alias" parameter length calculations
      iplink: Use ll_index_to_name() instead of if_indextoname()
      ip/tunnel: Correct and unify ttl/hoplimit printing
      ip/tunnel: Simplify and unify tos printing
      ip/tunnel: Use print_0xhex() instead of print_string()
      ip/tunnel: Abstract tunnel encapsulation options printing
      gre/tunnel: Print erspan_index using print_uint()
      vti/tunnel: Unify ikey/okey printing
      vti6/tunnel: Unify and simplify link type help functions
      tunnel: Return constant string without copying it
      utils: Always specify family for address in get_addr_1()
      utils: Always specify family and ->bytelen in get_prefix_1()
      utils: Fast inet address classification after get_addr()
      iplink_geneve: Get rid of inet_get_addr()
      iplink_vxlan: Get rid of inet_get_addr()
      ip: Get rid of inet_get_addr()
      gre/gre6: Post merge fixes
      tunnel: Add space between encap-dport and encap-sport in non-JSON output
      iptnl/ip6tnl: Unify ttl/hoplimit parsing routines
      vti/vti6: Minor improvements
      iplink: Use ll_name_to_index() instead of if_nametoindex()
      ip/tunnel: Be consistent when printing tunnel collect metadata
      gre/gre6: Unify attribute addition to netlink buffer
      utils: Introduce get_addr_rta() and inet_addr_match_rta()
      ipaddress: Use inet_addr_match_rta()
      iprule: Use inet_addr_match_rta()
      ipmroute: Use inet_addr_match_rta()
      ipneigh: Use inet_addr_match_rta()
      ipl2tp: Use get_addr_rta()
      tcp_metric: Use get_addr_rta()
      ip/tunnel: Unify local/remote endpoint address printing
      Revert "ip address: Change print_linkinfo_brief to take filter as an input"
      ip: Consolidate ip, xdp and lwtunnel parse/dump prototypes in ip_common.h
      ip: Minor cleanups
      treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
      ipaddress: Unify print_link_stats() and print_link_stats64()
      ip: Introduce get_rtnl_link_stats_rta() to get link statistics
      tunnel: Split statistic getting and printing
      iptunnel/ip6tunnel: Code cleanups
      iptunnel/ip6tunnel: Use netlink to walk through tunnels list
      tuntap: Use netlink to walk through tuntap list
      vti/vti6: Unify vti_print_help()
      gre/gre6: Unify gre_print_help()
      iptnl/ip6tnl: Unify iptunnel_print_help()
      ip/tunnel: Minor cleanups
      ip: Use print_0xhex() where appropriate
      utils: Introduce and use inet_prefix_reset()
      vti/vti6: Unify local/remote endpoint address parsing
      gre/gre6: Unify local/remote endpoint address parsing
      iptnl/ip6tnl: Unify local/remote endpoint and 6rd address parsing
      ip: Use single variable to represent -pretty
      ipaddress: Abstract IFA_LABEL matching code
      ipaddress: ll_map: Replace ll_idx_n2a() with ll_index_to_name()
      utils: Reimplement ll_idx_n2a() and introduce ll_idx_a2n()
      ipaddress: Improve print_linkinfo()
      ipaddress: Simplify print_linkinfo_brief() and it's usage
      lib: Correct object file dependencies
      utils: Introduce and use get_ifname_rta()
      utils: Introduce and use print_name_and_link() to print name@link
      ipaddress: Make print_linkinfo_brief() static
      utils: Introduce and use nodev() helper routine
      iplink: Use "dev" and "name" parameters interchangeable when possible
      iplink: Follow documented behaviour when "index" is given
      iplink: Perform most of request buffer setups and checks in iplink_parse()

Shmulik Ladkani (2):
      tc: m_mirred: Add support for ingress redirect/mirror
      ip: link_ip6tnl.c/ip6tunnel.c: Support IP6_TNL_F_ALLOW_LOCAL_REMOTE flag

Simon Horman (20):
      tc: flower: Support matching on SCTP ports
      tc: flower: remove references to eth_type in manpage
      tc: flower: document SCTP ip_proto
      tc: flower: correct name of ip_proto parameter to flower_parse_port()
      tc: flower: make use of flower_port_attr_type() safe and silent
      tc: flower: introduce enum flower_endpoint
      tc: flower: support matching on ICMP type and code
      tc: flower: document that *_ip parameters take a PREFIX as an argument.
      tc: flower: Allow *_mac options to accept a mask
      tc: flower: document that *_ip parameters take a PREFIX as an argument.
      tc: flower: Allow *_mac options to accept a mask
      tc: flower: Update dest UDP port documentation
      tc: ife: correct spelling of prio in example
      tc: flower: Support matching ARP
      tc: flower: use correct type when calling flower_icmp_attr_type
      tc: flower: Update documentation to indicate ARP takes IPv4 prefixes
      tc: flower: provide generic masked u8 parser helper
      tc: flower: provide generic masked u8 print helper
      tc: flower: support masked ICMP code and type match
      tc actions: store and dump correct length of user cookies

Simon Ruderich (3):
      man: document ip route get mark
      man: document ip fou show
      man: document ip xfrm policy nosock

Solio Sarabia (1):
      iplink: validate maximum gso_max_size

Stefan Hajnoczi (2):
      ss: allow AF_FAMILY constants >32
      ss: add AF_VSOCK support

Stefano Brivio (8):
      ss: Remove useless width specifier in process context print
      ss: Streamline process context printing in netlink_show_one()
      ss: Fix width calculations when Netid or State columns are missing
      ss: Replace printf() calls for "main" output by calls to helper
      ss: Introduce columns lightweight abstraction
      ss: Buffer raw fields first, then render them as a table
      ss: Implement automatic column width calculation
      ss: Fix rendering of continuous output (-E, --events)

Stephen Hemminger (235):
      update kernel headers to 4.9-net-next
      update net-next headers
      tc: flower checkpatch cleanups
      Update kernel headers for XDP and tcp_info
      update kernel headers from net-next
      update kernel headers from net-next
      update to net-next headers (pre 4.10 rc)
      lwtunnel: style cleanup
      libnetlink: break up dump function
      utils: cleanup style
      ipvrf: cleanup style issues
      configure: fix elftest when warnings enabled
      update kernel headers
      Revert "tc: flower: document that *_ip parameters take a PREFIX as an argument."
      Revert "tc: flower: Allow *_mac options to accept a mask"
      minor kernel header update
      whitespace cleanup
      kernel headers update
      add more uapi header files
      include: remove unused header
      update kernel headers (from 4.10-rc4)
      update kernel headers from 4.10 net-next
      update kernel headers from net-next
      tcp: header file update
      update headers from bridge tunnel metadata
      tc: add missing sample file
      update headers from net-next
      update headers from 4.10-rc8
      utils: hex2mem get rid of unnecessary goto
      v4.10.0
      add missing iplink_xstats.c
      update headers from net-next
      Update headers based on 4.11 merge window
      netlink route attribute cleanup
      xfrm: remove unnecessary casts
      tc: use rta_getattr_u32
      bpf: remove unnecessary cast
      pie: remove always false condition
      update headers from 4.11-rc2
      update kernel headers from net-next
      update headers from net-next
      update headers from 4.11-rc3
      update headers from net-next (post 4.11-rc3)
      update kernel headers from net-next
      netem: fix out of bounds access in maketable
      Update kernel headers from 4.11 net-next
      add seg6.h kernel headers
      update kernel headers from net-next
      remove unused header file sysctl.h
      iplink: whitespace cleanup
      pedit: fix whitespace
      update headers to 4.11 net-next
      v4.11.0
      update kernel headers during 4.12 merge window
      update headers from 4.12-rc2
      include: remove no longer used iptables_common.h
      update to current net-next headers
      update headers to get changes for TCA_FLOWER
      update headers to get IFLA_EVENT
      updated headers from net-next
      update headers from net-next (bpf and tc)
      more bpf header updates
      xfrm: get #define's from linux includes
      update headers to get TCA_TUNNEL_CSUM
      update kernel headers from net-next
      v4.12.0
      update kernel headers from net-next
      update headers to 4.13-rc1
      remove duplicated #include's
      Update headers from net-next
      ip: change flag names to an array
      update headers from 4.13-rc4
      tc: fix m_simple usage
      update headers from 4.13 net-next
      iproute: Add support for extended ack to rtnl_talk
      ss: enclose IPv6 address in brackets
      lib: fix extended ack with and without libmnl
      lib: need to pass LIBMNL flag
      include: update headers from net-next
      tc, ip: more Makefile updates for LIBMNL
      vti6: fix local/remote any addr handling
      change how Config is used in Makefile's
      vti: print keys in hex not dotted notation
      more BPF headers update
      seg6: add include/linux/seg6_local.h
      include: add pfkeyv2.h drop ipv6.h
      update kernel headers from net-next
      config: put CFLAGS/LDLIBS in config.mk
      add ERSPAN headers
      rdma: fix duplicate initialization in port_names
      libnetlink: drop unused parameter to rtnl_dump_done
      bpf: drop unused parameter to bpf_report_map_in_map
      tc: use named initializer for default mqprio options
      devlink: header update
      update headers from net-next
      update headers from 4.14 merge
      v4.13.0
      BPF: update headers from 4.14-rc1
      tc: flower remove unused variable
      doc: remove obsolete ip-tunnels documentation
      doc: remove outdated ss documentation
      doc: remove outdated arpd documentation
      doc: remove outdated nstat/rtstat documentation
      ignore generated Config file
      doc: remove outdated tc-filters documentation
      doc: remove outdated IPv6 flow label document
      doc: drop old ip command documentation
      update headers from net-next rc
      tipc: don't need custom CFLAGS
      update uapi headers from 4.14-rc4 net-next
      rdma: move headers to uapi
      uapi: add include linux/vm_sockets_diag.h
      netem: fix code indentation
      update headers for TC and TIPC from net-next
      bpf: update header file
      include: add TCP fastopen option
      update kernel headers
      iproute: source code cleanup
      bridge: checkpatch related cleanups
      Update kernel headers based on 4.14-rc7
      Update kernel headers from net-next (4.14-rc6)
      update kernel headers from 4.14-rc7 net-next
      Update kernel headers from 4.14-rc8 nete-next
      Update kernel headers with new SPDK identifier
      netem: use fixed rather than floating point for scaling
      update kernel headers
      update kernel headers from 4.14 net-next
      drop unneeded include of syslog.h
      v4.14.0
      utils: remove duplicate include of ctype.h
      v4.14.1
      update headers from 4.15-rc1
      ila: fix formatting of help message
      update bpf header from net-next
      tc: replace magic constant 16 with #define
      tc: break long lines
      SPDX license identifiers
      m_vlan: style cleanups
      m_action: style cleanup
      m_gact: whitespace cleanup
      m_mirred: style cleanups
      update bpf header from net-next
      update headers from 4.15-rc2
      iplink: allow configuring GSO max values
      uapi: add access to snd_cwnd and other sock_ops
      uapi: tun add eBPF based queue selection method
      iplink: add definitions for GSO_MAX
      include: qdisc offload defines
      ip: validate vlan value for vlan info
      ss: fix crash with invalid command input file
      utils: fix makeargs stack overflow
      include: update ethernet headers
      tc: remove no longer relevant README
      v4.15.0
      include: update uapi with BPF from 4.15-rc1
      include: update netfilter headers from 4.15-rc1
      include: update rdma uapi from 4.15-rc1
      include: update interface UAPI from 4.15-rc1
      include: update UAPI types.h
      iproute: refactor printing flags
      iproute: make printing icmpv6 a function
      iproute: make printing IPv4 cache flags a function
      iproute: refactor cacheinfo printing
      iproute: refactor metrics print
      iproute: refactor printing flow info
      iproute: refactor newdst, gateway and via printing
      iproute: refactor multipath print
      iproute: refactor printing of interface
      iproute: whitespace fixes
      iproute: don't do assignment in condition
      iproute: make flush a separate function
      json: make pretty printing optional
      man: add documentation for json and pretty flags
      json: fix newline at end of array
      iproute: implement JSON and color output
      include: update rdma header from 4.16-rc1
      uapi: update if_ether compat headers
      ip: don't colorize the master device
      ip: remove dead code
      bridge: implement json pretty print flag
      bridge: colorize output and use JSON print library
      bridge: add json support for link command
      bridge: update man page for new color and json changes
      ip: always print interface name in color
      tc: implement color output
      json_writer: add SPDX Identifier (GPL-2/BSD-2)
      ipneigh: add color and json support
      ipaddrlabel: add json support
      iprule: add json support
      ipntable: add json support
      ipnetconf: add JSON support
      tcp_metrics; make tables const
      tcp_metrics: add json support
      ipsr: add json support
      token: support JSON
      tuntap: support JSON output
      fou: break long lines
      fou: support JSON output
      ip: macsec cleanup
      ipmacsec: collapse common code
      macsec: support JSON
      netns: add JSON support
      ipmaddr: json and color support
      ipmroute: convert to output JSON
      ipmroute: better error message if no kernel mroute
      Revert "iproute: "list/flush/save default" selected all of the routes"
      tc: help and whitespace cleanup
      rdma: fix man page typos
      ip/ila: support json and color
      ip/l2tp: add JSON support
      bridge: avoid snprint truncation on time
      pedit: fix strncpy warning
      ip: use strlcpy() to avoid truncation
      tunnel: use strlcpy to avoid strncpy warnings
      tc_class: fix snprintf warning
      ematch: fix possible snprintf overflow
      misc: avoid snprintf warnings in ss and nstat
      bpf: avoid compiler warnings about strncpy
      namespace: limit the length of namespace name to avoid snprintf overflow
      uapi/if_ether: add definition of ether type field
      v4.16.0
      uapi/bpf: update kernel header from 4.17-rc1
      uapi/tipc: update header from 4.17-rc1
      uapi/sctp: update header from 4.17-rc1
      ipneigh: fix missing format specifier
      flower: use 16 bit format where possible
      bpf: fix warnings on gcc-8 about string truncation
      rdma: align headers with upstream
      rdma: add ib header files
      ss: remove non-functional slabinfo
      tc: allow 0% for percent options
      ip: defer lookup interface index
      rt_protos: drop old experimental gated names
      uapi: update bpf.h to include padding
      v4.17.0

Steve Wise (7):
      rdma: update rdma_netlink.h
      rdma: add UAPI rdma_user_cm.h
      rdma: initialize the rd struct
      rdma: Add CM_ID resource tracking information
      rdma: Add CQ resource tracking information
      rdma: Add MR resource tracking information
      rdma: Add PD resource tracking information

Tariq Toukan (1):
      ip-address: Fix negative prints of large TX rate limits

Thomas Egerer (3):
      xfrm_policy: Add filter option for socket policies
      xfrm_policy: Do not attempt to deleteall a socket policy
      xfrm_{state, policy}: Allow to deleteall polices/states with marks

Thomas Graf (2):
      bpf: Fix number of retries when growing log buffer
      lwt: BPF support for LWT

Thomas Haller (1):
      man: fix documentation for range of route table ID

Timothy Redaelli (2):
      ip-route: Prevent some other double spaces in output
      bridge: Prevent a double space in bridge mdb show

Toke Høiland-Jørgensen (4):
      tc: Add missing documentation for codel and fq_codel parameters
      tc: Add JSON output of fq_codel stats
      ingress: Don't break JSON output
      json_print: Fix hidden 64-bit type promotion

Tom Herbert (5):
      ila: Fix reporting of ILA locators and locator match
      ila: added csum neutral support to ipila
      ila: support to configure checksum neutral-map-auto
      ila: support for configuring identifier and hook types
      ila: create ila_common.h

Vincent Bernat (2):
      vxlan: use preferred address family when neither group or remote is specified
      color: disable color when json output is requested

Vinicius Costa Gomes (2):
      tc: Add support for the CBS qdisc
      man: Add initial manpage for tc-cbs(8)

Vlad Yasevich (1):
      ip: Add IFLA_EVENT output to ip monitor

Wei Wang (1):
      ss: print tcpi_rcv_ssthresh

William Tu (5):
      gre: add support for ERSPAN tunnel
      ip6_gre: add support for ERSPAN tunnel
      gre6: add collect metadata support
      erspan: add erspan version II support
      erspan: add erspan usage description

Wolfgang Bumiller (1):
      tc/lexer: let quotes actually start strings

Yotam Gigi (10):
      tc: man: matchall: Fix example indentation
      tc: Add support for the sample tc action
      tc: man: Add man entry for the tc-sample action
      tc: man: matchall: Update examples to include sample
      tc: bash-completion: Add the _from variant to _tc_one* funcs
      tc: bash-completion: Prepare action autocomplete to support several actions
      tc: bash-completion: Make the *_KIND variables global
      tc: bash-completion: Add support for filter actions
      tc: bash-completion: Add support for matchall
      ip: mroute: Print offload indication

Yuchung Cheng (1):
      ss: print new tcp_info fields: busy, rwnd-limited, sndbuf-limited times

Yulia Kartseva (1):
      tc: fix ipv6 filter selector attribute for some prefix lengths

Yuval Mintz (2):
      qdisc: print offload indication
      tc: Correct json output for actions

Zhang Shengju (1):
      iplink: add support for IFLA_CARRIER attribute

yupeng (1):
      man: add additional explainations for ss

Élie Bouttier (1):
      ip route: replace exits with returns


* Re: [PATCH] Bluetooth: hci_bcm: Configure SCO routing automatically
From: Rob Herring @ 2018-06-08 17:25 UTC (permalink / raw)
  To: attitokes
  Cc: David S. Miller, Mark Rutland, Marcel Holtmann, Johan Hedberg,
	Artiom Vaskov, netdev, devicetree, linux-kernel@vger.kernel.org,
	open list:BLUETOOTH DRIVERS
In-Reply-To: <20180608162009.22762-1-attitokes@gmail.com>

On Fri, Jun 8, 2018 at 10:20 AM,  <attitokes@gmail.com> wrote:
> From: Attila Tőkés <attitokes@gmail.com>
>
> Added support to automatically configure the SCO packet routing at the device setup. The SCO packets are used with the HSP / HFP profiles, but in some devices (ex. CYW43438) they are routed to a PCM output by default. This change allows sending the vendor specific HCI command to configure the SCO routing. The parameters of the command are loaded from the device tree.

Please wrap your commit msg.

>
> Signed-off-by: Attila Tőkés <attitokes@gmail.com>
> ---
>  .../bindings/net/broadcom-bluetooth.txt       |  7 ++

Please split the bindings into a separate patch.

>  drivers/bluetooth/hci_bcm.c                   | 72 +++++++++++++++++++
>  2 files changed, 79 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> index 4194ff7e..aea3a094 100644
> --- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> +++ b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt
> @@ -21,6 +21,12 @@ Optional properties:
>   - clocks: clock specifier if external clock provided to the controller
>   - clock-names: should be "extclk"
>
> + SCO routing parameters:
> + - sco-routing: 0-3 (PCM, Transport, Codec, I2S)
> + - pcm-interface-rate: 0-4 (128 Kbps - 2048 Kbps)
> + - pcm-frame-type: 0 (short), 1 (long)
> + - pcm-sync-mode: 0 (slave), 1 (master)
> + - pcm-clock-mode: 0 (slave), 1 (master)

Are these Broadcom-specific? Properties need either a vendor prefix or
to be documented in a common location. I think these look like the
latter.
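
To illustrate the vendor-prefix option only: a sketch of the example
node, assuming the properties turn out to be Broadcom-specific. The
"brcm,"-prefixed property names below are hypothetical; they are not
defined by the patch or by any existing binding, and the values are
placeholders within the ranges the patch documents.

```dts
/* Sketch only: every "brcm,sco-*" / "brcm,pcm-*" name is hypothetical. */
bluetooth {
	compatible = "brcm,bcm43438-bt";
	max-speed = <921600>;
	brcm,sco-routing = <1>;        /* 0-3 (PCM, Transport, Codec, I2S) */
	brcm,pcm-interface-rate = <0>; /* 0-4 (128 Kbps - 2048 Kbps) */
	brcm,pcm-frame-type = <0>;     /* 0 (short), 1 (long) */
	brcm,pcm-sync-mode = <0>;      /* 0 (slave), 1 (master) */
	brcm,pcm-clock-mode = <0>;     /* 0 (slave), 1 (master) */
};
```

If the properties are generic instead, they would keep unprefixed names
and be documented once in a common binding file.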

However, this also looks incomplete to me. For example, which SoC
I2S/PCM port is BT audio connected to and how does it fit into the
existing audio related bindings? There's been work on HDMI audio
bindings which would be similar (except for the SCO over UART at
least).


>
>  Example:
>
> @@ -31,5 +37,6 @@ Example:
>         bluetooth {
>                 compatible = "brcm,bcm43438-bt";
>                 max-speed = <921600>;
> +               sco-routing = <1>; /* 1 = transport (UART) */
>         };
>  };
> diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
> index ddbd8c6a..0e729534 100644
> --- a/drivers/bluetooth/hci_bcm.c
> +++ b/drivers/bluetooth/hci_bcm.c
> @@ -83,6 +83,16 @@
>   * @hu: pointer to HCI UART controller struct,
>   *     used to disable flow control during runtime suspend and system sleep
>   * @is_suspended: whether flow control is currently disabled
> + *
> + *  SCO routing parameters:
> + *   used as the parameters for the bcm_set_pcm_int_params command
> + *     @sco_routing:
> + *      >= 255 (skip SCO routing configuration)
> + *      0-3 (PCM, Transport, Codec, I2S)
> + *     @pcm_interface_rate: 0-4 (128 Kbps - 2048 Kbps)
> + *     @pcm_frame_type: 0 (short), 1 (long)
> + *     @pcm_sync_mode: 0 (slave), 1 (master)
> + *     @pcm_clock_mode: 0 (slave), 1 (master)
>   */
>  struct bcm_device {
>         /* Must be the first member, hci_serdev.c expects this. */
> @@ -114,6 +124,13 @@ struct bcm_device {
>         struct hci_uart         *hu;
>         bool                    is_suspended;
>  #endif
> +
> +       /* SCO routing parameters */
> +       u8                      sco_routing;
> +       u8                      pcm_interface_rate;
> +       u8                      pcm_frame_type;
> +       u8                      pcm_sync_mode;
> +       u8                      pcm_clock_mode;
>  };
>
>  /* generic bcm uart resources */
> @@ -189,6 +206,40 @@ static int bcm_set_baudrate(struct hci_uart *hu, unsigned int speed)
>         return 0;
>  }
>
> +static int bcm_configure_sco_routing(struct hci_uart *hu, struct bcm_device *bcm_dev)
> +{
> +       struct hci_dev *hdev = hu->hdev;
> +       struct sk_buff *skb;
> +       struct bcm_set_pcm_int_params params;
> +
> +       if (bcm_dev->sco_routing >= 0xff) {
> +               /* SCO routing configuration should be skipped */
> +               return 0;
> +       }
> +
> +       bt_dev_dbg(hdev, "BCM: Configuring SCO routing (%d %d %d %d %d)",
> +                       bcm_dev->sco_routing, bcm_dev->pcm_interface_rate, bcm_dev->pcm_frame_type,
> +                       bcm_dev->pcm_sync_mode, bcm_dev->pcm_clock_mode);
> +
> +       params.routing = bcm_dev->sco_routing;
> +       params.rate = bcm_dev->pcm_interface_rate;
> +       params.frame_sync = bcm_dev->pcm_frame_type;
> +       params.sync_mode = bcm_dev->pcm_sync_mode;
> +       params.clock_mode = bcm_dev->pcm_clock_mode;
> +
> +       /* Send the SCO routing configuration command */
> +       skb = __hci_cmd_sync(hdev, 0xfc1c, sizeof(params), &params, HCI_CMD_TIMEOUT);
> +       if (IS_ERR(skb)) {
> +               int err = PTR_ERR(skb);
> +               bt_dev_err(hdev, "BCM: failed to configure SCO routing (%d)", err);
> +               return err;
> +       }
> +
> +       kfree_skb(skb);
> +
> +       return 0;
> +}
> +
>  /* bcm_device_exists should be protected by bcm_device_lock */
>  static bool bcm_device_exists(struct bcm_device *device)
>  {
> @@ -534,6 +585,9 @@ static int bcm_setup(struct hci_uart *hu)
>                         host_set_baudrate(hu, speed);
>         }
>
> +       /* Configure SCO routing if needed */
> +       bcm_configure_sco_routing(hu, bcm->dev);
> +
>  finalize:
>         release_firmware(fw);
>
> @@ -1004,9 +1058,21 @@ static int bcm_acpi_probe(struct bcm_device *dev)
>  }
>  #endif /* CONFIG_ACPI */
>
> +static void read_u8_device_property(struct device *device, const char *property, u8 *destination) {
> +       u32 temp;
> +       if (device_property_read_u32(device, property, &temp) == 0) {
> +               *destination = temp & 0xff;
> +       }
> +}
> +
>  static int bcm_of_probe(struct bcm_device *bdev)
>  {
>         device_property_read_u32(bdev->dev, "max-speed", &bdev->oper_speed);
> +       read_u8_device_property(bdev->dev, "sco-routing", &bdev->sco_routing);
> +       read_u8_device_property(bdev->dev, "pcm-interface-rate", &bdev->pcm_interface_rate);
> +       read_u8_device_property(bdev->dev, "pcm-frame-type", &bdev->pcm_frame_type);
> +       read_u8_device_property(bdev->dev, "pcm-sync-mode", &bdev->pcm_sync_mode);
> +       read_u8_device_property(bdev->dev, "pcm-clock-mode", &bdev->pcm_clock_mode);

These are actually broken because the DT properties are 32-bit.

Rob
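
For illustration only, not part of the patch: device tree cells are
32 bits wide, and the patch narrows them with "temp & 0xff", so an
out-of-range value such as 0x101 would silently become 1. A minimal
userspace sketch of narrowing with a range check instead, assuming the
ranges listed in the binding text above; the helper name
narrow_cell_to_u8() is hypothetical:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: narrow a 32-bit property
 * cell to u8 only when it fits the documented range, instead of
 * masking it.
 */
static int narrow_cell_to_u8(uint32_t cell, uint8_t max, uint8_t *dest)
{
	if (cell > max)
		return -ERANGE;	/* out of range: leave *dest untouched */

	*dest = (uint8_t)cell;
	return 0;
}
```

With max = 3 for sco-routing, a cell of 0x101 is rejected rather than
being read as 1 (Transport).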

^ permalink raw reply

* Re: [PATCH 1/2] iproute2: Add support for a few routing protocols
From: Stephen Hemminger @ 2018-06-08 17:29 UTC (permalink / raw)
  To: Donald Sharp; +Cc: netdev, dsahern
In-Reply-To: <20180608124638.4895-2-sharpd@cumulusnetworks.com>

On Fri,  8 Jun 2018 08:46:37 -0400
Donald Sharp <sharpd@cumulusnetworks.com> wrote:

> Add support for:
> 
> BGP
> ISIS
> OSPF
> RIP
> EIGRP
> 
> Routing protocols to iproute2.
> 
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
>  etc/iproute2/rt_protos    | 5 +++++
>  include/linux/rtnetlink.h | 5 +++++
>  lib/rt_names.c            | 5 +++++
>  3 files changed, 15 insertions(+)
> 

I just merged iproute2-next into iproute2 and rtnetlink.h is now up to date.
Please rebase your patches.

^ permalink raw reply

* netdevice notifier and device private data
From: Alexander Aring @ 2018-06-08 17:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-wpan, linux-bluetooth

Hey netdev community,

I am trying to solve an issue which Eric Dumazet pointed out to me with
commit ca0edb131bdf ("ieee802154: 6lowpan: fix possible NULL deref in
lowpan_device_event()").

The issue is that dev->type can be changed at runtime. The netdevice
notifier which Eric Dumazet fixed is no longer a problem. What bothers
me is another netdevice notifier which is broken by the same tun/tap
feature, and there I don't have any dev->$SUBSYSTEM_DEV_POINTER to check
whether this is my netdevice type.

This netdevice notifier accesses the dev->priv area, which is only
valid for a netdevice that was allocated and initialized with the right
dev->priv room for its dev->type. If a tap/tun netdevice changes its
dev->type, I might do an illegal read of dev->priv, and I can't confirm
that it holds the data which I cast it to. The reason is that tap/tun
netdevices don't run my netdevice init.

I have already seen code out there which changes a tun netdevice to the
ARPHRD_6LOWPAN type, and I suppose they are running into this issue.
(Btw: I don't know why somebody wants to change that type to
ARPHRD_6LOWPAN on tun).

My question is:

How do we deal with that? Is it forbidden to access dev->priv from a
global netdevice notifier which only checks for dev->type?

I could solve it like Eric Dumazet did: introduce a special
dev->$SUBSYSTEM_DEV_POINTER and check whether it is set. At least
tun/tap will not set these pointers, so I can be sure the netdevice ran
through my init function. That seems to me the best solution right now
and I think I will go for it.

Until now I assumed that the data in dev->priv is bound to dev->type.
This tun/tap feature breaks at least my handling, and I am not sure
whether there are other users which use dev->priv in a netdevice
notifier and don't check dev->$SUBSYSTEM_DEV_POINTER if they have one.

Thanks in advance to everybody for helping to solve this issue.

- Alex
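
For illustration only: a minimal userspace model of the check described
above, where a notifier trusts a subsystem pointer set by the
subsystem's own init rather than dev->type alone. All names here
(fake_netdev, fake_lowpan_priv, FAKE_TYPE_6LOWPAN,
lowpan_priv_or_null) are hypothetical stand-ins, not kernel structures
or APIs.

```c
#include <stddef.h>

#define FAKE_TYPE_6LOWPAN 1	/* stand-in value, not the real ARPHRD constant */

struct fake_lowpan_priv {
	int marker;
};

struct fake_netdev {
	unsigned short type;			/* can be rewritten at runtime */
	struct fake_lowpan_priv *lowpan;	/* set only by the subsystem's own init */
};

/*
 * Return the private data only for devices that went through our init.
 * A tun/tap device that merely changed its type has lowpan == NULL.
 */
static struct fake_lowpan_priv *
lowpan_priv_or_null(const struct fake_netdev *dev)
{
	/* dev->type alone proves nothing about what the private area holds */
	if (dev->type != FAKE_TYPE_6LOWPAN)
		return NULL;

	return dev->lowpan;
}
```

A notifier built on this would return early whenever the helper yields
NULL, so a retyped tun/tap device is simply ignored.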

^ permalink raw reply

* [PATCH v2 0/1] Addition of new routing protocols for iproute2
From: Donald Sharp @ 2018-06-08 17:47 UTC (permalink / raw)
  To: netdev, dsahern, stephen
In-Reply-To: <20180608124638.4895-1-sharpd@cumulusnetworks.com>

The Linux kernel recently accepted some new RTPROT values for some
fairly standard routing protocols.  This commit brings in support
for iproute2 to handle these new values.

v2 - Update to the latest version of master, which already has the
     rtnetlink.h code, and drop the work already done.

Donald Sharp (1):
  iproute2: Add support for a few routing protocols

 etc/iproute2/rt_protos | 5 +++++
 lib/rt_names.c         | 5 +++++
 2 files changed, 10 insertions(+)

-- 
2.14.4

^ permalink raw reply

* [PATCH v2 1/1] iproute2: Add support for a few routing protocols
From: Donald Sharp @ 2018-06-08 17:47 UTC (permalink / raw)
  To: netdev, dsahern, stephen
In-Reply-To: <20180608124638.4895-1-sharpd@cumulusnetworks.com>

Add support for:

BGP
ISIS
OSPF
RIP
EIGRP

Routing protocols to iproute2.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Update to latest version of code.
 etc/iproute2/rt_protos | 5 +++++
 lib/rt_names.c         | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/etc/iproute2/rt_protos b/etc/iproute2/rt_protos
index 2a9ee01b..b3a0ec8f 100644
--- a/etc/iproute2/rt_protos
+++ b/etc/iproute2/rt_protos
@@ -16,3 +16,8 @@
 15	ntk
 16      dhcp
 42	babel
+186	bgp
+187	isis
+188	ospf
+189	rip
+192	eigrp
diff --git a/lib/rt_names.c b/lib/rt_names.c
index a02db35e..66d5f2f0 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -134,6 +134,11 @@ static char *rtnl_rtprot_tab[256] = {
 	[RTPROT_XORP]	  = "xorp",
 	[RTPROT_NTK]	  = "ntk",
 	[RTPROT_DHCP]	  = "dhcp",
+	[RTPROT_BGP]	  = "bgp",
+	[RTPROT_ISIS]	  = "isis",
+	[RTPROT_OSPF]	  = "ospf",
+	[RTPROT_RIP]	  = "rip",
+	[RTPROT_EIGRP]	  = "eigrp",
 };
 
 
-- 
2.14.4
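
For illustration only: a minimal sketch of the table-lookup pattern the
lib/rt_names.c hunk extends, using the numeric values from the
rt_protos hunk above. The names rtprot_tab_sketch and
rtprot_name_sketch are hypothetical; this is not the iproute2 code.

```c
#include <stdio.h>

/* Sparse name table indexed by protocol number; unset slots stay NULL. */
static const char *rtprot_tab_sketch[256] = {
	[186] = "bgp",
	[187] = "isis",
	[188] = "ospf",
	[189] = "rip",
	[192] = "eigrp",
};

/* Return the protocol name, or the number formatted into buf if unnamed. */
static const char *rtprot_name_sketch(unsigned int id, char *buf, size_t len)
{
	if (id < 256 && rtprot_tab_sketch[id])
		return rtprot_tab_sketch[id];

	snprintf(buf, len, "%u", id);
	return buf;
}
```

Numbers without an entry, such as 190, fall back to their decimal form.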

^ permalink raw reply related

* Re: [PATCH bpf] bpf: implement dummy fops for bpf objects
From: Alexei Starovoitov @ 2018-06-08 18:05 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev
In-Reply-To: <20180608161034.3854-1-daniel@iogearbox.net>

On Fri, Jun 08, 2018 at 06:10:34PM +0200, Daniel Borkmann wrote:
> syzkaller was able to trigger the following warning in
> do_dentry_open():
> 
>   WARNING: CPU: 1 PID: 4508 at fs/open.c:778 do_dentry_open+0x4ad/0xe40 fs/open.c:778
>   Kernel panic - not syncing: panic_on_warn set ...
> 
>   CPU: 1 PID: 4508 Comm: syz-executor867 Not tainted 4.17.0+ #90
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   Call Trace:
>   [...]
>    vfs_open+0x139/0x230 fs/open.c:908
>    do_last fs/namei.c:3370 [inline]
>    path_openat+0x1717/0x4dc0 fs/namei.c:3511
>    do_filp_open+0x249/0x350 fs/namei.c:3545
>    do_sys_open+0x56f/0x740 fs/open.c:1101
>    __do_sys_openat fs/open.c:1128 [inline]
>    __se_sys_openat fs/open.c:1122 [inline]
>    __x64_sys_openat+0x9d/0x100 fs/open.c:1122
>    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Problem was that prog and map inodes in bpf fs did not
> implement a dummy file open operation that would return an
> error. The patch in do_dentry_open() checks whether f_ops
> are present and if not bails out with an error. While this
> may be fine, we really shouldn't be throwing a warning
> though. Thus follow the model similar to bad_file_ops and
> reject the request unconditionally with -EIO.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Reported-by: syzbot+2e7fcab0f56fdbb330b8@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Applied, Thanks

^ permalink raw reply

* Re: netdevice notifier and device private data
From: Stephen Hemminger @ 2018-06-08 18:14 UTC (permalink / raw)
  To: Alexander Aring; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608173455.vrnfvv7dlu4oxwqf@x220t>

On Fri, 8 Jun 2018 13:34:55 -0400
Alexander Aring <aring@mojatatu.com> wrote:

> Hey netdev community,
> 
> I am trying to solve an issue which Eric Dumazet pointed out to me with
> commit ca0edb131bdf ("ieee802154: 6lowpan: fix possible NULL deref in
> lowpan_device_event()").
> 
> The issue is that dev->type can be changed at runtime. The netdevice
> notifier which Eric Dumazet fixed is no longer a problem. What bothers
> me is another netdevice notifier which is broken by the same tun/tap
> feature, and there I don't have any dev->$SUBSYSTEM_DEV_POINTER to check
> whether this is my netdevice type.
> 
> This netdevice notifier accesses the dev->priv area, which is only
> valid for a netdevice that was allocated and initialized with the right
> dev->priv room for its dev->type. If a tap/tun netdevice changes its
> dev->type, I might do an illegal read of dev->priv, and I can't confirm
> that it holds the data which I cast it to. The reason is that tap/tun
> netdevices don't run my netdevice init.
> 
> I have already seen code out there which changes a tun netdevice to the
> ARPHRD_6LOWPAN type, and I suppose they are running into this issue.
> (Btw: I don't know why somebody wants to change that type to
> ARPHRD_6LOWPAN on tun).
> 
> My question is:
> 
> How do we deal with that? Is it forbidden to access dev->priv from a
> global netdevice notifier which only checks for dev->type?
> 
> I could solve it like Eric Dumazet did: introduce a special
> dev->$SUBSYSTEM_DEV_POINTER and check whether it is set. At least
> tun/tap will not set these pointers, so I can be sure the netdevice ran
> through my init function. That seems to me the best solution right now
> and I think I will go for it.
> 
> Until now I assumed that the data in dev->priv is bound to dev->type.
> This tun/tap feature breaks at least my handling, and I am not sure
> whether there are other users which use dev->priv in a netdevice
> notifier and don't check dev->$SUBSYSTEM_DEV_POINTER if they have one.
> 
> Thanks in advance to everybody for helping to solve this issue.
> 
> - Alex

notifiers are always called with RTNL mutex held
and dev->type should not change unless RTNL is held.

^ permalink raw reply

* Re: [PATCH net] failover: eliminate callback hell
From: Stephen Hemminger @ 2018-06-08 18:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, Samudrala, Sridhar, Jiri Pirko, KY Srinivasan,
	Haiyang Zhang, David Miller, Netdev, Stephen Hemminger
In-Reply-To: <20180607201850-mutt-send-email-mst@kernel.org>

On Thu, 7 Jun 2018 20:22:15 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jun 07, 2018 at 09:17:42AM -0700, Stephen Hemminger wrote:
> > On Thu, 7 Jun 2018 18:41:31 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > > > Why would DPDK care what we do in the kernel? Isn't it just slapping
> > > > > vfio-pci on the netdevs it sees?    
> > > > 
> > > > > Alex, you are correct for Intel devices, but DPDK on Azure is not Intel based.
> > > > > The DPDK support uses:
> > > > >  * Mellanox MLX5, which uses the InfiniBand hooks to do DMA directly to
> > > > >    userspace. This means the VF netdev device must exist and be visible.
> > > > >  * Slow path using kernel netvsc device, TAP and BPF to get exception
> > > > >    path packets to userspace.
> > > > >  * An autodiscovery mechanism to set all this up, which relies on the
> > > > >    2 device model and sysfs.
> > > 
> > > Could you describe what it looks for exactly? What will break if
> > > instead of MLX5 being a child of the PV, it's a child of the failover
> > > device?
> > 
> > So in DPDK there is an internal four device model:
> > 	1. failsafe is like failover in your model
> > 	2. TAP is used like netvsc in kernel
> > 	3. MLX5 is the VF
> > 	4. vdev_netvsc is a pseudo device whose only reason to exist
> > 	   is to glue everything together.
> > 
> > Digging deeper inside...
> > 
> > Vdev_netvsc does:
> >    * driver is started in a convoluted way off device arguments
> >    * probe routine for driver runs
> >       - scans list of kernel interfaces in sysfs
> >       - matches those using VMBUS   
> 
> Could you tell a bit more about what this step entails?

Quick code high/low lights.


	ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, 1, name,
					kvargs, specified, &matched);
static int
vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface,
				      const struct ether_addr *eth_addr,
				      va_list ap), int is_netvsc, ...)
{
	struct if_nameindex *iface = if_nameindex();


	for (i = 0; iface[i].if_name; ++i) {

		is_netvsc_ret = vdev_netvsc_iface_is_netvsc(&iface[i]) ? 1 : 0;
		if (is_netvsc ^ is_netvsc_ret)
			continue;

		strlcpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name));
		if (ioctl(s, SIOCGIFHWADDR, &req) == -1) {
		}

		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
		       RTE_DIM(eth_addr.addr_bytes));

		ret = func(&iface[i], &eth_addr, ap);	/* func is vdev_netvsc_netvsc_probe */


static int
vdev_netvsc_netvsc_probe(const struct if_nameindex *iface,
			 const struct ether_addr *eth_addr,
			 va_list ap)
{

	/* Routed NetVSC should not be probed. */
	if (vdev_netvsc_has_route(iface, AF_INET) ||
	    vdev_netvsc_has_route(iface, AF_INET6)) {
		if (!specified)
			return 0;
		DRV_LOG(WARNING, "probably using routed NetVSC interface \"%s\""
			" (index %u)", iface->if_name, iface->if_index);
	}
	/* Create interface context. */
	ctx = calloc(1, sizeof(*ctx));
...


> 
> >       - skip netvsc devices that have an IPV4 route
> >    * scan for PCI devices that have same MAC address as kernel netvsc
> >      devices discovered in previous step
> >    * add these interfaces to arguments to failsafe
> > 
> > Then failsafe configures based on arguments on device
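
For illustration only: the MAC-matching step in the quoted list above
(find the PCI device that has the same MAC address as a discovered
netvsc interface) can be sketched in plain C as below. The struct and
function names are hypothetical and this is not the DPDK code.

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

struct fake_iface {
	const char *name;
	unsigned char mac[MAC_LEN];
};

/*
 * Return the first candidate whose MAC equals that of the netvsc
 * interface, or NULL if none matches.
 */
static const struct fake_iface *
find_same_mac(const struct fake_iface *netvsc,
	      const struct fake_iface *cand, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(cand[i].mac, netvsc->mac, MAC_LEN) == 0)
			return &cand[i];

	return NULL;
}
```
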
> > 
> > The code works but is specific to the Azure hardware model, and exposes lots
> > of things to the application that it should not have to care about.
> > 
> > If you try to walk through this code in DPDK, you will see why I have developed
> > a dislike for high levels of indirection.
> > 
> > 
> > 	     
> 
> Thanks that was helpful!  I'll try to poke at it next week.  Just from
> the description it seems the kernel is merely used to locate the MAC
> address through sysfs and that for this DPDK code to keep working the
> hidden device must be hidden from it in sysfs - is that a fair summary?

What is the point of the 3 device model? What value does it have
to userspace? How would userspace use each of the three devices?
Going back to the 3 device model really doesn't make sense to me if
there is no visible benefit.

Some other considerations:
   * there is ongoing development to support RDMA failover as
     well in netvsc.

   * there is a new driver which implements the VMBUS protocol
     in userspace for DPDK. This gets rid of several layers and
     removes any special scanning code. The vmbus device is
     unbound from netvsc and bound to UIO device.  Then the user
     space DPDK driver manages all the host signalling events
     including VF discovery. It is really a 2 device model done
     all in userspace. The kernel device is still needed when
     the VF is Mellanox, because that is how the MLX DPDK driver
     rolls.

  * what about nested KVM on Hyper-V? Would it make sense to
    have a way to pass subset of VF queues to guest?

^ permalink raw reply

* Re: [PATCH net] failover: eliminate callback hell
From: Michael S. Tsirkin @ 2018-06-08 19:04 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Alexander Duyck, Samudrala, Sridhar, Jiri Pirko, KY Srinivasan,
	Haiyang Zhang, David Miller, Netdev, Stephen Hemminger
In-Reply-To: <20180608113008.76cbf425@xeon-e3>

On Fri, Jun 08, 2018 at 11:30:08AM -0700, Stephen Hemminger wrote:
>   * what about nested KVM on Hyper-V? Would it make sense to
>     have a way to pass subset of VF queues to guest?

Not as long as Hyper-V doesn't have a vIOMMU.

-- 
MST

^ permalink raw reply

* Re: Qualcomm rmnet driver and qmi_wwan
From: Bjørn Mork @ 2018-06-08 19:10 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan; +Cc: Daniele Palmas, Dan Williams, netdev
In-Reply-To: <8a77f905ddcd6a8136dd9f2d5de11438@codeaurora.org>

Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> writes:

>> I followed Dan's advice and prepared a very basic test patch
>> (attached) for testing it through ip link.
>>
>> Basically things seem to be properly working with qmicli, but I needed
>> to modify a bit qmi_wwan, so I'm adding Bjørn that maybe can help.
>>
>> Bjørn,
>>
>> I'm trying to add support to rmnet in qmi_wwan: I had to modify the
>> code as in the attached test patch, but I'm not sure it is the right
>> way.
>>
>> This is done under the assumption that the rmnet device would be the
>> only one to register an rx handler to qmi_wwan, but it is probably
>> wrong.
>>
>> Basically I'm wondering if there is a more correct way to understand
>> if an rmnet device is linked to the real qmi_wwan device.
>>
>> Thanks,
>> Daniele
>
>
> Hi Daniele / Bjørn
>
> Is it possible to define a pass through mode in qmi_wwan. This is to
> ensure that all packets in MAP format are passed through instead of
> processing in qmi_wwan layer. The pass through mode would just call
> netif_receive_skb() on all these packets.
>
> That would allow all the packets to be intercepted by the rx_handler
> attached by rmnet which would subsequently de-multiplex and process
> the packets.

This sounds like a good idea. I probably won't have any time to look at
this in the near future, though.  Sorry about that. Extremely overloaded
both at work and private right now...

But I trust that you and Daniele can work out something. Please keep me
CCed, but don't expect timely replies.


Bjørn

^ permalink raw reply

* Re: netdevice notifier and device private data
From: Alexander Aring @ 2018-06-08 19:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608111457.0a9b4cae@xeon-e3>

Hi Stephen,

On Fri, Jun 08, 2018 at 11:14:57AM -0700, Stephen Hemminger wrote:
...
> 
> notifiers are always called with RTNL mutex held
> and dev->type should not change unless RTNL is held.

thanks for your answer. I am not talking about any race between notifiers
and a dev->type change.

I am saying that dev->type was already changed, and an upcoming notifier ends
in undefined behaviour when it dereferences dev->priv. I have a notifier
which casts dev->priv to a specific structure based on dev->type. This
structure is not there in tap/tun devices if they changed to "my" dev->type
and the notifier fires.

- Alex

^ permalink raw reply

* Re: netdevice notifier and device private data
From: Michael Richardson @ 2018-06-08 19:37 UTC (permalink / raw)
  To: Alexander Aring; +Cc: netdev, linux-wpan, linux-bluetooth
In-Reply-To: <20180608173455.vrnfvv7dlu4oxwqf@x220t>


Alexander Aring <aring@mojatatu.com> wrote:
    Alex> I have already seen code out there which changes a tun
    Alex> netdevice to the ARPHRD_6LOWPAN type, and I suppose they are
    Alex> running into this issue.  (Btw: I don't know why somebody
    Alex> wants to change that type to ARPHRD_6LOWPAN on tun).

so that they can have the kernel do 6LoWPAN processing, emitting 6LoWPAN
packets into userspace to be transferred into a radio via some proprietary
interface (including, for instance, SLIP over a USB cable to a Contiki or
OpenWSN stack set up to act as radio only)

-- 
]               Never tell me the odds!                 | ipv6 mesh networks [ 
]   Michael Richardson, Sandelman Software Works        | network architect  [ 
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [ 
	

^ permalink raw reply

* Re: [PATCH 2/3] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Alexei Starovoitov @ 2018-06-08 20:47 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: netdev, Alexei Starovoitov, David S . Miller, Arnd Bergmann,
	Geert Uytterhoeven, linux-kernel, YueHaibing
In-Reply-To: <1528477930-7342-3-git-send-email-yamada.masahiro@socionext.com>

On Sat, Jun 09, 2018 at 02:12:09AM +0900, Masahiro Yamada wrote:
> Do not use the troublesome ELF magic.  What is happening here is to
> embed a user-space program into the kernel.  Simply wrap it in the
> assembly with the '.incbin' directive.
> 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
> 
>  net/bpfilter/Makefile            | 15 ++-------------
>  net/bpfilter/bpfilter_kern.c     | 11 +++++------
>  net/bpfilter/bpfilter_umh_blob.S |  7 +++++++
>  3 files changed, 14 insertions(+), 19 deletions(-)
>  create mode 100644 net/bpfilter/bpfilter_umh_blob.S
> 
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index aafa720..39c6980 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -15,18 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
>  HOSTLDFLAGS += -static
>  endif
>  
> -# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> -# inside bpfilter_umh.o elf file referenced by
> -# _binary_net_bpfilter_bpfilter_umh_start symbol
> -# which bpfilter_kern.c passes further into umh blob loader at run-time
> -quiet_cmd_copy_umh = GEN $@
> -      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> -      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> -      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> -      --rename-section .data=.init.rodata $< $@
> -
> -$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> -	$(call cmd,copy_umh)
> +$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
>  
>  obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> -bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> index b13d058..fcc1a7c 100644
> --- a/net/bpfilter/bpfilter_kern.c
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -10,11 +10,8 @@
>  #include <linux/file.h>
>  #include "msgfmt.h"
>  
> -#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> -#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> -
> -extern char UMH_start;
> -extern char UMH_end;
> +extern char bpfilter_umh_start;
> +extern char bpfilter_umh_end;
>  
>  static struct umh_info info;
>  /* since ip_getsockopt() can run in parallel, serialize access to umh */
> @@ -89,7 +86,9 @@ static int __init load_umh(void)
>  	int err;
>  
>  	/* fork usermode process */
> -	err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> +	err = fork_usermode_blob(&bpfilter_umh_start,
> +				 &bpfilter_umh_end - &bpfilter_umh_start,
> +				 &info);
>  	if (err)
>  		return err;
>  	pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
> diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
> new file mode 100644
> index 0000000..40311d1
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_umh_blob.S
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +	.section .init.rodata, "a"
> +	.global bpfilter_umh_start
> +bpfilter_umh_start:
> +	.incbin "net/bpfilter/bpfilter_umh"

Interesting. I think this is a good idea. Looks cleaner than objcopy magic.
btw CONFIG_OUTPUT_FORMAT was already fixed by
commit 8d97ca6b6755 ("bpfilter: fix OUTPUT_FORMAT") in the net tree.
Could you please rebase on top of that tree?
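
For illustration only: with the .incbin approach the blob is bracketed
by a start and an end symbol, and the loader is handed the start
address together with a length of end - start. A minimal userspace
sketch of that arithmetic, with a small array standing in for the
embedded bytes; fake_blob, fake_blob_start, fake_blob_end and
blob_len() are hypothetical names, not kernel symbols:

```c
#include <stddef.h>

/* Stand-in for the bytes that .incbin would place between the two labels. */
static const char fake_blob[] = { 0x7f, 'E', 'L', 'F' };

/* Stand-ins for the start/end labels around the embedded data. */
static const char *const fake_blob_start = fake_blob;
static const char *const fake_blob_end = fake_blob + sizeof(fake_blob);

/* Length of the embedded data: distance from the start to the end label. */
static size_t blob_len(const char *start, const char *end)
{
	return (size_t)(end - start);
}
```

A loader in this model would receive fake_blob_start as the data
pointer and blob_len(fake_blob_start, fake_blob_end) as the size.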

^ permalink raw reply

* Re: [PATCH 3/3] bpfilter: do not (ab)use host-program build rule
From: Alexei Starovoitov @ 2018-06-08 20:52 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: netdev, Alexei Starovoitov, David S . Miller, Arnd Bergmann,
	Geert Uytterhoeven, linux-kernel, YueHaibing, Daniel Borkmann
In-Reply-To: <1528477930-7342-4-git-send-email-yamada.masahiro@socionext.com>

On Sat, Jun 09, 2018 at 02:12:10AM +0900, Masahiro Yamada wrote:
> It is an ugly hack to overwrite $(HOSTCC) with $(CC) to reuse the
> build rules from scripts/Makefile.host.  It should not be tedious
> to write a build rule of its own.
> 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
> 
>  net/bpfilter/Makefile                   | 17 +++++++++++------
>  net/bpfilter/{main.c => bpfilter_umh.c} |  0
>  2 files changed, 11 insertions(+), 6 deletions(-)
>  rename net/bpfilter/{main.c => bpfilter_umh.c} (100%)
> 
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index 39c6980..6571b30 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -3,18 +3,23 @@
>  # Makefile for the Linux BPFILTER layer.
>  #
>  
> -hostprogs-y := bpfilter_umh
> -bpfilter_umh-objs := main.o
> -HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi
> -HOSTCC := $(CC)

that is a hack indeed. I don't like it either, but see below.

> -
>  ifeq ($(CONFIG_BPFILTER_UMH), y)
>  # builtin bpfilter_umh should be compiled with -static
>  # since rootfs isn't mounted at the time of __init
>  # function is called and do_execv won't find elf interpreter
> -HOSTLDFLAGS += -static
> +STATIC := -static
>  endif
>  
> +quiet_cmd_cc_user = CC      $@
> +      cmd_cc_user = $(CC) -Wall -Wmissing-prototypes -O2 -std=gnu89 \
> +		    -I$(srctree) -I$(srctree)/tools/include/ \
> +		    -I$(srctree)/tools/include/uapi $(STATIC) -o $@ $<
> +
> +$(obj)/bpfilter_umh: $(src)/bpfilter_umh.c FORCE
> +	$(call if_changed,cc_user)

Does this scale?
Please see two top patches here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/log/?h=ipt_bpf
that add more meat to bpfilter and a lot more files.
Recompiling all of them at once isn't nice either.
This Makefile needs different .c -> .o rules for bpfilter_kern.c files
that get compiled into kernel module and for the rest of umh files:
main.c ctor.c init.c gen.c etc
that need to be compiled .c -> .o differently.
I don't see how to do it without ugly hacks in Makefile.
In that sense HOSTCC = CC hack looked the least ugly to me that's
why I went with it.
Better ideas?

^ permalink raw reply

* [RFC PATCH 0/3] BPF socket filter to deal with skb frags
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
  To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
	rdna, quentin.monnet, brakmo, acme

This RFC allows bpf socket filter programs to look into the complete
skb, i.e. both the linear and the non-linear part (patch 1).

As a proof of concept I'm using a sample RDS program that attaches a
bpf socket filter and inspects packet data from the linear and the
non-linear part of the skb, e.g. skb frags (patches 2 and 3).

I'm sharing this RFC to get some feedback on the direction.

Details:
patch 1 adds a new bpf helper function and the needed infrastructure so
that a socket (sk) filter based eBPF program can retrieve the non-linear
part of an skb (e.g. skb frags), unlike the current socket filter, which
only deals with the linear part. This patch adds very basic
functionality and for now only allows socket filter programs to read
packet data (from the linear and non-linear parts of the skb). The idea
is to add an eBPF helper that lets a socket filter based eBPF program
walk through the skb frags using bpf tail calls. This way an eBPF
program can do deep packet inspection (i.e. look into headers as well
as payload).

patch 2 adds a sample eBPF socket filter program that uses an RDS
socket. The sample program opens an RDS socket, attaches an eBPF program
to it and uses the bpf helper added in patch 1 to look into the skb. As
a test, the current eBPF program only prints the first few bytes from
skb->data and the skb frags.

patch 3 allows rds_recv_incoming to invoke the bpf socket filter
program if any program is attached to the RDS socket.
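
For illustration only: a userspace model of the frag walk described for
patch 1, where a data/data_end window starts on the linear part and an
index selects the next fragment. It uses a plain loop where the RFC
describes bpf tail calls, and every name below (fake_skb, fake_frag,
fake_cursor, cursor_init, cursor_next_frag) is hypothetical; it does
not reproduce the helper added by the patch.

```c
#include <stddef.h>

struct fake_frag {
	const unsigned char *data;
	size_t len;
};

struct fake_skb {
	const unsigned char *linear;
	size_t linear_len;
	const struct fake_frag *frags;
	size_t nr_frags;
};

struct fake_cursor {
	const unsigned char *data;
	const unsigned char *data_end;
	unsigned char index;	/* next fragment to expose */
};

/* Start on the linear part, with the fragment index reset to 0. */
static void cursor_init(struct fake_cursor *c, const struct fake_skb *skb)
{
	c->data = skb->linear;
	c->data_end = skb->linear + skb->linear_len;
	c->index = 0;
}

/* Move the window to the next fragment; return -1 when none is left. */
static int cursor_next_frag(struct fake_cursor *c, const struct fake_skb *skb)
{
	if (c->index >= skb->nr_frags)
		return -1;

	c->data = skb->frags[c->index].data;
	c->data_end = c->data + skb->frags[c->index].len;
	c->index++;
	return 0;
}
```

A reader of this model inspects the bytes between data and data_end,
then calls cursor_next_frag() until it returns -1.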


FYI, I'm also working on a follow-up patchset that deals with *struct
scatterlist* to allow RDS filtering for IB/RDMA use cases that do not
have an sk_buff.

Thanks.
-Tushar

Tushar Dave (3):
  ebpf: add next_skb_frag bpf helper for sk filter
  samples/bpf: add sample RDS program
  rds: invoke sk filter attached to rds socket

 include/linux/filter.h                    |   2 +
 include/uapi/linux/bpf.h                  |  10 +-
 net/core/filter.c                         |  44 ++++-
 net/rds/recv.c                            |  17 ++
 samples/bpf/Makefile                      |   3 +
 samples/bpf/rds_skb_kern.c                |  87 +++++++++
 samples/bpf/rds_skb_user.c                | 311 ++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h            |  10 +-
 tools/testing/selftests/bpf/bpf_helpers.h |   2 +
 9 files changed, 482 insertions(+), 4 deletions(-)
 create mode 100644 samples/bpf/rds_skb_kern.c
 create mode 100644 samples/bpf/rds_skb_user.c

-- 
1.8.3.1

^ permalink raw reply

* [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
  To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
	rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>

Today the socket filter only deals with linear skbs. This change allows
eBPF programs to look into non-linear skbs, e.g. skb frags. This will be
useful when users need to look into data that is not contained in the
linear part of the skb.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 include/linux/filter.h                    |  2 ++
 include/uapi/linux/bpf.h                  | 10 ++++++-
 net/core/filter.c                         | 44 +++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h            | 10 ++++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  2 ++
 5 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9dbcb9d..603b8bf 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -500,6 +500,7 @@ struct sk_filter {
 
 struct bpf_skb_data_end {
 	struct qdisc_skb_cb qdisc_cb;
+	u8 index;
 	void *data_meta;
 	void *data_end;
 };
@@ -534,6 +535,7 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
 	BUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));
 	cb->data_meta = skb->data - skb_metadata_len(skb);
 	cb->data_end  = skb->data + skb_headlen(skb);
+	cb->index = 0;
 }
 
 static inline u8 *bpf_skb_cb(struct sk_buff *skb)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
  *		egress otherwise). This is the only flag supported for now.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ *	Description
+ *		This helper allows users to look into non-linear part of skb
+ *		e.g. skb frags.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
 	FN(fib_lookup),			\
 	FN(sock_hash_update),		\
 	FN(msg_redirect_hash),		\
-	FN(sk_redirect_hash),
+	FN(sk_redirect_hash),		\
+	FN(next_skb_frag),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 51ea7dd..fd8e90f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3752,6 +3752,38 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_next_skb_frag, struct sk_buff *, skb)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+	const skb_frag_t *frag;
+
+	if (skb->data_len == 0)
+		return -ENODATA;
+
+	if (cb->index == (u8)skb_shinfo(skb)->nr_frags)
+		return -ENODATA;
+
+	/* get the frag start and end address into data_meta and data_end
+	 * respectively so eBPF program can look into skb frag
+	 */
+	frag = &skb_shinfo(skb)->frags[cb->index];
+	cb->data_meta = page_address(skb_frag_page(frag)) +
+			frag->page_offset;
+	cb->data_end = cb->data_meta + skb_frag_size(frag);
+
+	/* update frag index */
+	cb->index++;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_next_skb_frag_proto = {
+	.func		= bpf_next_skb_frag,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
@@ -4415,6 +4447,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		return &bpf_get_socket_cookie_proto;
 	case BPF_FUNC_get_socket_uid:
 		return &bpf_get_socket_uid_proto;
+	case BPF_FUNC_next_skb_frag:
+		return &bpf_next_skb_frag_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -4698,10 +4732,16 @@ static bool sk_filter_is_valid_access(int off, int size,
 				      struct bpf_insn_access_aux *info)
 {
 	switch (off) {
-	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data):
-	case bpf_ctx_range(struct __sk_buff, data_meta):
+		info->reg_type = PTR_TO_PACKET;
+		break;
 	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 		return false;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333..5fe9668 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1902,6 +1902,13 @@ struct bpf_stack_build_id {
  *		egress otherwise). This is the only flag supported for now.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_next_skb_frag(struct sk_buff *skb)
+ *	Description
+ *		This helper allows users to look into non-linear part of skb
+ *		e.g. skb frags.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1976,7 +1983,8 @@ struct bpf_stack_build_id {
 	FN(fib_lookup),			\
 	FN(sock_hash_update),		\
 	FN(msg_redirect_hash),		\
-	FN(sk_redirect_hash),
+	FN(sk_redirect_hash),		\
+	FN(next_skb_frag),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8f143df..51f2153 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -114,6 +114,8 @@ static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
 static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
 			     int plen, __u32 flags) =
 	(void *) BPF_FUNC_fib_lookup;
+static unsigned long long (*bpf_next_skb_frag)(void *ctx) =
+	(void *) BPF_FUNC_next_skb_frag;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
1.8.3.1

^ permalink raw reply related

* [RFC PATCH 2/3] samples/bpf: add sample RDS program
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
  To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
	rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>

When run in server mode, the sample RDS program opens a PF_RDS socket
and attaches an eBPF program to it. The eBPF program then uses the
bpf_next_skb_frag helper along with bpf tail calls to inspect the skb's
linear and non-linear data.

To ease testing, RDS client functionality is also added so that users
can generate RDS packets.

Run server:
[root@lab71 bpf]# ./rds_skb -s 192.168.3.71
running server in a loop
transport tcp
server bound to address: 192.168.3.71 port 4000
server listening on 192.168.3.71
192.168.3.71 received a packet from 192.168.3.71 of len 8192 cmsg len 0,
on port 52287
payload contains:30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41
42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59
5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 69 6a 6b ...
server listening on 192.168.3.71

Run client:
[root@lab70 bpf]# ./rds_skb -s 192.168.3.71 -c 192.168.3.70
transport tcp
client bound to address: 192.168.3.71 port 47437
client sending 8192 byte message from 192.168.3.71 to 192.168.3.70 on
port 47437

bpf program output:
[root@lab71]# cat /sys/kernel/debug/tracing/trace_pipe
          <idle>-0     [000] ..s. 218923.839673: 0: 30 31 32
          <idle>-0     [000] ..s. 218923.839682: 0: 33 34 35
          <idle>-0     [000] ..s. 218923.845133: 0: be bf c0
          <idle>-0     [000] ..s. 218923.845135: 0: c1 c2 c3
          <idle>-0     [000] ..s. 218923.850581: 0: be bf c0
          <idle>-0     [000] ..s. 218923.850582: 0: c1 c2 c3
          <idle>-0     [000] ..s. 218923.850582: 0: no more skb frag

Note: changing the MTU to 9000 helps ensure that RDS gets skbs with
fragments.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 samples/bpf/Makefile       |   3 +
 samples/bpf/rds_skb_kern.c |  87 +++++++++++++
 samples/bpf/rds_skb_user.c | 311 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 401 insertions(+)
 create mode 100644 samples/bpf/rds_skb_kern.c
 create mode 100644 samples/bpf/rds_skb_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 62a99ab..a05c3b2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -51,6 +51,7 @@ hostprogs-y += cpustat
 hostprogs-y += xdp_adjust_tail
 hostprogs-y += xdpsock
 hostprogs-y += xdp_fwd
+hostprogs-y += rds_skb
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -105,6 +106,7 @@ cpustat-objs := bpf_load.o cpustat_user.o
 xdp_adjust_tail-objs := xdp_adjust_tail_user.o
 xdpsock-objs := bpf_load.o xdpsock_user.o
 xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
+rds_skb-objs := bpf_load.o rds_skb_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -160,6 +162,7 @@ always += cpustat_kern.o
 always += xdp_adjust_tail_kern.o
 always += xdpsock_kern.o
 always += xdp_fwd_kern.o
+always += rds_skb_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/rds_skb_kern.c b/samples/bpf/rds_skb_kern.c
new file mode 100644
index 0000000..c8832d4
--- /dev/null
+++ b/samples/bpf/rds_skb_kern.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/filter.h>
+#include <linux/ptrace.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include <linux/rds.h>
+#include "bpf_helpers.h"
+
+
+#define PROG(F) SEC("socket/"__stringify(F)) int bpf_func_##F
+
+#define bpf_printk(fmt, ...)				\
+({							\
+	char ____fmt[] = fmt;				\
+	bpf_trace_printk(____fmt, sizeof(____fmt),	\
+			##__VA_ARGS__);			\
+})
+
+
+struct bpf_map_def SEC("maps") jmp_table = {
+	.type = BPF_MAP_TYPE_PROG_ARRAY,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(u32),
+	.max_entries = 2,
+};
+
+#define FRAG 1
+
+static inline void dump_skb(struct __sk_buff *skb)
+{
+	void *data = (void *)(long) skb->data_meta;
+	void *data_end = (void *)(long) skb->data_end;
+	unsigned char *d;
+
+	if (data + 6 > data_end)
+		return;
+
+	d = (unsigned char *)data;
+	bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+	bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+	return;
+}
+
+static void populate_skb_frags(struct __sk_buff *skb)
+{
+	int ret;
+
+	ret = bpf_next_skb_frag(skb);
+	if (ret == -ENODATA) {
+		bpf_printk("no more skb frag\n");
+		return;
+	}
+
+	bpf_tail_call(skb, &jmp_table, 1);
+}
+
+/* walk skb frag */
+
+PROG(FRAG)(struct __sk_buff *skb)
+{
+	dump_skb(skb);
+	populate_skb_frags(skb);
+	return 0;
+}
+
+SEC("socket/0")
+int main_prog(struct __sk_buff *skb)
+{
+	void *data = (void *)(long) skb->data;
+	void *data_end = (void *)(long) skb->data_end;
+	int ret;
+	unsigned char *d;
+
+	if (data + 6 > data_end) {
+		bpf_printk("out\n");
+		return 0;
+	}
+
+	d = (unsigned char *)data;
+	bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+	bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+
+	populate_skb_frags(skb);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/rds_skb_user.c b/samples/bpf/rds_skb_user.c
new file mode 100644
index 0000000..9f73dc3
--- /dev/null
+++ b/samples/bpf/rds_skb_user.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <arpa/inet.h>
+#include <assert.h>
+#include "bpf_load.h"
+#include <getopt.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <limits.h>
+#include <linux/sockios.h>
+#include <linux/rds.h>
+#include <linux/errqueue.h>
+#include <linux/bpf.h>
+#include <strings.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#define TESTPORT	4000
+#define BUFSIZE		8192
+
+static const char *trans2str(int trans)
+{
+	switch (trans) {
+	case RDS_TRANS_TCP:
+		return ("tcp");
+	case RDS_TRANS_NONE:
+		return ("none");
+	default:
+		return ("unknown");
+	}
+}
+
+static int gettransport(int sock)
+{
+	int err;
+	int val;
+	socklen_t len = sizeof(val);
+
+	err = getsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+			 (char *)&val, &len);
+	if (err < 0) {
+		fprintf(stderr, "%s: getsockopt %s\n",
+			__func__, strerror(errno));
+		return err;
+	}
+	return (int)val;
+}
+
+static int settransport(int sock, int transport)
+{
+	int err;
+
+	err = setsockopt(sock, SOL_RDS, SO_RDS_TRANSPORT,
+			 (char *)&transport, sizeof(transport));
+	if (err < 0) {
+		fprintf(stderr, "could not set transport %s, %s\n",
+			trans2str(transport), strerror(errno));
+	}
+	return err;
+}
+
+static void print_sock_local_info(int fd, char *str, struct sockaddr_in *ret)
+{
+	socklen_t sin_size = sizeof(struct sockaddr_in);
+	struct sockaddr_in sin;
+	int err;
+
+	err = getsockname(fd, (struct sockaddr *)&sin, &sin_size);
+	if (err < 0) {
+		fprintf(stderr, "%s getsockname %s\n",
+			__func__, strerror(errno));
+		return;
+	}
+	printf("%s address: %s port %d\n",
+		(str ? str : ""), inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
+
+	if (ret != NULL)
+		*ret = sin;
+}
+
+static void server(char *address, in_port_t port)
+{
+	struct sockaddr_in sin, din;
+	struct msghdr msg;
+	struct iovec *iov;
+	int rc, sock;
+	char *buf;
+
+	buf = calloc(BUFSIZE, sizeof(char));
+	if (!buf) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return;
+	}
+
+	sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+	if (sock < 0) {
+		fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+		goto out;
+	}
+	if (settransport(sock, RDS_TRANS_TCP) < 0)
+		goto out;
+
+	printf("transport %s\n", trans2str(gettransport(sock)));
+
+	memset(&sin, 0, sizeof(sin));
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = inet_addr(address);
+	sin.sin_port = htons(port);
+
+	rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+	if (rc < 0) {
+		fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	/* attach eBPF program */
+	assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd[1],
+			  sizeof(prog_fd[0])) == 0);
+
+	print_sock_local_info(sock, "server bound to", NULL);
+
+	iov = calloc(1, sizeof(struct iovec));
+	if (!iov) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	while (1) {
+		memset(buf, 0, BUFSIZE);
+		iov[0].iov_base = buf;
+		iov[0].iov_len = BUFSIZE;
+
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_name = &din;
+		msg.msg_namelen = sizeof(din);
+		msg.msg_iov = iov;
+		msg.msg_iovlen = 1;
+
+		printf("server listening on %s\n", inet_ntoa(sin.sin_addr));
+
+		rc = recvmsg(sock, &msg, 0);
+		if (rc < 0) {
+			fprintf(stderr, "%s: recvmsg %s\n",
+				__func__, strerror(errno));
+			break;
+		}
+
+		printf("%s received a packet from %s of len %d cmsg len %d, on port %d\n",
+			inet_ntoa(sin.sin_addr),
+			inet_ntoa(din.sin_addr),
+			(uint32_t) iov[0].iov_len,
+			(uint32_t) msg.msg_controllen,
+			ntohs(din.sin_port));
+
+		{
+			int i;
+
+			printf("payload contains:");
+			for (i = 0; i < 60; i++)
+				printf("%x ", buf[i]);
+			printf("...\n");
+		}
+	}
+	free(iov);
+out:
+	free(buf);
+}
+
+static void create_message(char *buf)
+{
+	unsigned int i;
+
+	for (i = 0; i < BUFSIZE; i++) {
+		buf[i] = i + 0x30;
+	}
+}
+
+static int build_rds_packet(struct msghdr *msg, char *buf)
+{
+	struct iovec *iov;
+
+	iov = calloc(1, sizeof(struct iovec));
+	if (!iov) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	msg->msg_iov = iov;
+	msg->msg_iovlen = 1;
+
+	iov[0].iov_base = buf;
+	iov[0].iov_len = BUFSIZE * sizeof(char);
+
+	return 0;
+}
+
+static void client(char *localaddr, char *remoteaddr, in_port_t server_port)
+{
+	struct sockaddr_in sin, din;
+	struct msghdr msg;
+	int rc, sock;
+	char *buf;
+
+	buf = calloc(BUFSIZE, sizeof(char));
+	if (!buf) {
+		fprintf(stderr, "%s: calloc %s\n", __func__, strerror(errno));
+		return;
+	}
+
+	create_message(buf);
+
+	sock = socket(PF_RDS, SOCK_SEQPACKET, 0);
+	if (sock < 0) {
+		fprintf(stderr, "%s: socket %s\n", __func__, strerror(errno));
+		goto out;
+	}
+
+	if (settransport(sock, RDS_TRANS_TCP) < 0)
+		goto out;
+
+	printf("transport %s\n", trans2str(gettransport(sock)));
+
+	memset(&sin, 0, sizeof(sin));
+	sin.sin_family = AF_INET;
+	sin.sin_addr.s_addr = inet_addr(localaddr);
+	sin.sin_port = 0;
+
+	rc = bind(sock, (struct sockaddr *)&sin, sizeof(sin));
+	if (rc < 0) {
+		fprintf(stderr, "%s: bind %s\n", __func__, strerror(errno));
+		goto out;
+	}
+	print_sock_local_info(sock, "client bound to",  &sin);
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_name = &din;
+	msg.msg_namelen = sizeof(din);
+
+	memset(&din, 0, sizeof(din));
+	din.sin_family = AF_INET;
+	din.sin_addr.s_addr = inet_addr(remoteaddr);
+	din.sin_port = htons(server_port);
+
+	rc = build_rds_packet(&msg, buf);
+	if (rc < 0)
+		goto out;
+
+	printf("client sending %d byte message from %s to %s on port %d\n",
+		(uint32_t) msg.msg_iov->iov_len, localaddr,
+		remoteaddr, ntohs(sin.sin_port));
+
+	rc = sendmsg(sock, &msg, 0);
+	if (rc < 0)
+		fprintf(stderr, "%s: sendmsg %s\n", __func__, strerror(errno));
+
+	if (msg.msg_control)
+		free(msg.msg_control);
+	if (msg.msg_iov)
+		free(msg.msg_iov);
+out:
+	free(buf);
+
+	return;
+}
+
+static void usage(char *progname)
+{
+	fprintf(stderr, "Usage %s [-s srvaddr] [-c clientaddr]\n", progname);
+}
+
+int main(int argc, char **argv)
+{
+	in_port_t server_port = TESTPORT;
+	char *serveraddr = NULL;
+	char *clientaddr = NULL;
+	char filename[256];
+	int opt;
+
+	while ((opt = getopt(argc, argv, "s:c:")) != -1) {
+		switch (opt) {
+		case 's':
+			serveraddr = optarg;
+			break;
+		case 'c':
+			clientaddr = optarg;
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		fprintf(stderr, "Error: load_bpf_file %s", bpf_log_buf);
+		return 1;
+	}
+
+	if (serveraddr && !clientaddr) {
+		printf("running server in a loop\n");
+		server(serveraddr, server_port);
+	} else if (serveraddr && clientaddr) {
+		client(clientaddr, serveraddr, server_port);
+	}
+
+	return 0;
+}
-- 
1.8.3.1

^ permalink raw reply related

* [RFC PATCH 3/3] rds: invoke sk filter attached to rds socket
From: Tushar Dave @ 2018-06-08 21:00 UTC (permalink / raw)
  To: netdev, ast, daniel, davem, john.fastabend, jakub.kicinski, kafai,
	rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-1-git-send-email-tushar.n.dave@oracle.com>

The RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so
messages arrive in the form of skbs (over TCP) and scatterlists (over
IB/RDMA). However, because the socket filter only deals with skbs (i.e.
struct sk_buff as the bpf context), we can only use it for rds_tcp and
not for rds_rdma. For that reason this patch invokes the socket filter
only for RDS sockets with the TCP transport, i.e. rds_tcp.

Note:
We don't want rds-core to be polluted by module-specific data
structures; for example, this patch includes tcp.h to retrieve
rds_tcp-specific structures. For the non-RFC version we will add
transport-specific indirections to get the skb.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/recv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index dc67458..3be9628 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -39,6 +39,7 @@
 #include <linux/rds.h>
 
 #include "rds.h"
+#include "tcp.h"
 
 void rds_inc_init(struct rds_incoming *inc, struct rds_connection *conn,
 		  __be32 saddr)
@@ -369,6 +370,22 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 	/* We can be racing with rds_release() which marks the socket dead. */
 	sk = rds_rs_to_sk(rs);
 
+	if (rs->rs_transport->t_type == RDS_TRANS_TCP) {
+		struct sk_buff *skb;
+		struct sk_filter *filter = sk->sk_filter;
+		struct rds_tcp_incoming *tinc;
+
+		tinc = container_of(inc, struct rds_tcp_incoming, ti_inc);
+		skb = tinc->ti_skb_list.next;
+		rcu_read_lock();
+		filter = rcu_dereference(sk->sk_filter);
+		if (filter) {
+			bpf_compute_data_pointers(skb);
+			bpf_prog_run_save_cb(filter->prog, skb);
+		}
+		rcu_read_unlock();
+	}
+
 	/* serialize with rds_release -> sock_orphan */
 	write_lock_irqsave(&rs->rs_recv_lock, flags);
 	if (!sock_flag(sk, SOCK_DEAD)) {
-- 
1.8.3.1

^ permalink raw reply related

* Re: Fw: [Bug 199995] New: Ramdomly sent TCP Reset from Kernel with bonding mode "brodcast"
From: Michal Kubecek @ 2018-06-08 21:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, netdev
In-Reply-To: <20180608095954.4a0437e4@xeon-e3>

On Fri, Jun 08, 2018 at 09:59:54AM -0700, Stephen Hemminger wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=199995
> 
>             Bug ID: 199995
>            Summary: Ramdomly sent TCP Reset from Kernel with bonding mode
>                     "brodcast"
> 
> after a dist upgrade from Ubuntu 17.10 (Kernel 4.13.x) to Ubuntu 18.04 (Kernel
> 4.15.0) I suffer from randomly generated TCP RST packets sent (presumably) by
> the kernel on a bonding device that uses bonding mode "broadcast" with 2
> physical NICs.
> 
> With tcpdump/Wireshark I can see that the kernel randomly sends TCP RST
> packets after the SYN/ACK/ACK packet is received (see attached PCAP).
> This only happens if the kernel receives the initial SYN packet on both
> physical NICs (and therefore sees it twice) before the connection is
> established by sending SYN/ACK.
> It does not happen in 100% of cases, and only if the system can use two or
> more CPU cores/threads. With only one CPU available to the system, this
> behaviour is not reproducible.

I have seen a similar report earlier from one of our customers running
SLE12 SP2 (kernel 4.4). The problem is that if a duplicated SYN packet
is received on both slaves, the two copies can be processed by the
lockless listener simultaneously on different CPUs, and each can reply
with a SYNACK carrying a different sequence number, which results in a
reset.

I tried to think of a way to prevent this race without losing the
performance gain of the lockless listener but couldn't come up with
anything. Eventually, I managed to persuade the customer that this setup
(where each packet is received twice under normal circumstances) is not
what broadcast mode was designed for (based on the description in
Documentation/networking/bonding.txt).

However, the lockless listener was introduced in 4.4, so it's not clear
why the reporter started encountering this after an upgrade from 4.13 to
4.15.

Michal Kubecek

^ permalink raw reply

* Re: [RFC PATCH 1/3] ebpf: add next_skb_frag bpf helper for sk filter
From: Daniel Borkmann @ 2018-06-08 21:27 UTC (permalink / raw)
  To: Tushar Dave, netdev, ast, davem, john.fastabend, jakub.kicinski,
	kafai, rdna, quentin.monnet, brakmo, acme
In-Reply-To: <1528491607-10399-2-git-send-email-tushar.n.dave@oracle.com>

On 06/08/2018 11:00 PM, Tushar Dave wrote:
> Today the socket filter only deals with linear skbs. This change allows
> eBPF programs to look into non-linear skbs, e.g. skb frags. This will be
> useful when users need to look into data that is not contained in the
> linear part of the skb.

Hmm, I don't think this statement is correct in its form here ... they
can handle non-linear skbs just fine.

The straightforward way is to use bpf_skb_load_bytes(). It's simple and
internally uses skb_header_pointer(), which of course walks everything
if it really has to via skb_copy_bits() (page frags _and_ frag list). And
if you need to look into mac/net headers that may otherwise no longer be
accessible from the socket layer, there's the
bpf_skb_load_bytes_relative() helper, which effectively does the negative
offset trick from ld_abs/ind more efficiently for multi-byte loads.
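
To make the "walks everything" part concrete, here is a minimal
user-space sketch of copying a byte range that may span the linear part
and the page frags. It is only a model: seg_model and load_bytes_model
are hypothetical names, segment 0 plays the role of the linear data, and
the real kernel path (skb_copy_bits()) also handles the frag list, which
is not modeled here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one contiguous piece of packet data. */
struct seg_model {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy len bytes starting at offset from a buffer made of several
 * segments (segment 0 plays the linear part, the rest play the frags).
 * Returns 0 on success or -EFAULT if the range is not fully covered;
 * on failure the destination may have been partially written.
 */
static int load_bytes_model(const struct seg_model *segs, size_t nsegs,
			    size_t offset, void *to, size_t len)
{
	unsigned char *dst = to;
	size_t i;

	for (i = 0; i < nsegs && len > 0; i++) {
		size_t chunk;

		if (offset >= segs[i].len) {
			/* Requested range starts after this segment. */
			offset -= segs[i].len;
			continue;
		}
		chunk = segs[i].len - offset;
		if (chunk > len)
			chunk = len;
		memcpy(dst, segs[i].data + offset, chunk);
		dst += chunk;
		len -= chunk;
		offset = 0;
	}
	return len == 0 ? 0 : -EFAULT;
}
```

In an actual socket filter the program would simply call
bpf_skb_load_bytes(skb, offset, buf, len) and let the kernel do this
walk, so no frag handling is needed in the eBPF program itself.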

Thanks,
Daniel

^ permalink raw reply

