Netdev List
* Re: [PATCH net-next] cxgb4: do L1 config when module is inserted
From: David Miller @ 2018-05-21 16:21 UTC (permalink / raw)
  To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh, leedom
In-Reply-To: <1526894143-4986-1-git-send-email-ganeshgr@chelsio.com>

From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Mon, 21 May 2018 14:45:43 +0530

> trigger an L1 configure operation when a transceiver module
> is inserted in order to cause current "sticky" options like
> Requested Forward Error Correction to be reapplied.
> 
> Signed-off-by: Casey Leedom <leedom@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied, although:

> @@ -491,6 +491,9 @@ struct link_config {
>  
>  	unsigned char  link_ok;          /* link up? */
>  	unsigned char  link_down_rc;     /* link down reason */
> +
> +	unsigned char   new_module;	 /* ->OS Transceiver Module inserted */
> +	unsigned char   redo_l1cfg;	 /* ->CC redo current "sticky" L1 CFG */
>  };

The various booleans in link_config should be converted to use type 'bool'
and true/false values.
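
As a rough illustration of that suggestion, the flag-like members could be
declared as bool. This is only a sketch: struct link_config_sketch and
note_module_inserted() are hypothetical names, only the four members visible
in the hunk are shown, and treating link_ok as a pure flag is an assumption.
link_down_rc carries a reason code, so it stays an unsigned char.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the driver's struct link_config;
 * only the members visible in the quoted hunk are shown.
 */
struct link_config_sketch {
	bool link_ok;               /* link up? */
	unsigned char link_down_rc; /* link down reason (a code, not a flag) */

	bool new_module;            /* transceiver module inserted */
	bool redo_l1cfg;            /* redo current "sticky" L1 config */
};

/* Hypothetical helper: record a module insertion and request that the
 * sticky L1 settings be reapplied.
 */
static void note_module_inserted(struct link_config_sketch *lc)
{
	lc->new_module = true;
	lc->redo_l1cfg = true;
}
```

With bool members the assignments read as true/false instead of 1/0, which
is the point of the review comment.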


* [PATCH net] dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
From: Alexey Kodanev @ 2018-05-21 16:28 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Alexey Kodanev

Syzbot reported a use-after-free in timer_is_static_object() [1].

This can happen because the structure that holds the RTO timer
(ccid2_hc_tx_sock) is freed in dccp_disconnect(), and
ccid2_hc_tx_rto_expire() can still be called after that.

The report [1] is similar to the one in commit 120e9dabaf55 ("dccp:
defer ccid_hc_tx_delete() at dismantle time"), and the fix is the same:
delay freeing the ccid2_hc_tx_sock structure so that it is freed in
dccp_sk_destruct().
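
The lifetime rule behind the fix can be pictured with a small user-space
sketch. Everything here is hypothetical (struct tx_state, struct fake_sock
and the three functions are stand-ins, not kernel code); it only shows that
state a timer handler dereferences has to survive disconnect and be freed
at final teardown.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct tx_state {
	int rto_fired;          /* counts timer expirations */
};

struct fake_sock {
	struct tx_state *tx;    /* plays the role of dccps_hc_tx_ccid */
};

/* Plays the role of the RTO timer handler. It dereferences the TX state,
 * so it may only run while that state is still allocated.
 */
static int rto_expire(struct fake_sock *sk)
{
	return ++sk->tx->rto_fired;
}

/* Disconnect leaves the TX state alone, as in the fix. */
static void fake_disconnect(struct fake_sock *sk)
{
	(void)sk;               /* RX-side cleanup would go here */
}

/* Final teardown: no timer can run any more, so freeing is safe. */
static void fake_destruct(struct fake_sock *sk)
{
	free(sk->tx);
	sk->tx = NULL;
}
```

A timer that fires between fake_disconnect() and fake_destruct() still sees
valid memory, which is the window the KASAN report below falls into.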

[1]
==================================================================
BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90
kernel/time/timer.c:607
Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299

CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  timer_is_static_object+0x80/0x90 kernel/time/timer.c:607
  debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508
  debug_timer_activate kernel/time/timer.c:709 [inline]
  debug_activate kernel/time/timer.c:764 [inline]
  __mod_timer kernel/time/timer.c:1041 [inline]
  mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102
  sk_reset_timer+0x22/0x60 net/core/sock.c:2742
  ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
  </IRQ>
...
Allocated by task 25374:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
  ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151
  dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44
  __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344
  dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538
  dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128
  dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408
  dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415
  dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197
  dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841
  ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215
  NF_HOOK include/linux/netfilter.h:288 [inline]
  ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256
  dst_input include/net/dst.h:450 [inline]
  ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396
  NF_HOOK include/linux/netfilter.h:288 [inline]
  ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492
  __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592
  __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657
  process_backlog+0x219/0x760 net/core/dev.c:5337
  napi_poll net/core/dev.c:5735 [inline]
  net_rx_action+0x7b7/0x1930 net/core/dev.c:5801
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285

Freed by task 25374:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
  ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190
  dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286
  dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
  sock_release+0x96/0x1b0 net/socket.c:594
  sock_close+0x16/0x20 net/socket.c:1149
  __fput+0x34d/0x890 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1e4/0x290 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
  exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
  do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801bebb4cc0
  which belongs to the cache ccid2_hc_tx_sock of size 1240
The buggy address is located 1112 bytes inside of
  1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198)
The buggy address belongs to the page:
page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0
index:0xffff8801bebb5240 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003
raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000
page dumped because: kasan: bad access detected
...
==================================================================

Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/dccp/proto.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 84cd4e3..0d56e36 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -283,9 +283,7 @@ int dccp_disconnect(struct sock *sk, int flags)
 
 	dccp_clear_xmit_timers(sk);
 	ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk);
-	ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk);
 	dp->dccps_hc_rx_ccid = NULL;
-	dp->dccps_hc_tx_ccid = NULL;
 
 	__skb_queue_purge(&sk->sk_receive_queue);
 	__skb_queue_purge(&sk->sk_write_queue);
-- 
1.8.3.1


* Re: [PATCH net-next] cxgb4: copy the length of cpl_tx_pkt_core to fw_wr
From: David Miller @ 2018-05-21 16:16 UTC (permalink / raw)
  To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh
In-Reply-To: <1526885796-13618-1-git-send-email-ganeshgr@chelsio.com>

From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Mon, 21 May 2018 12:26:36 +0530

> immdlen field of FW_ETH_TX_PKT_WR is filled in a wrong way,
> we must copy the length of all the cpls encapsulated in fw
> work request. In the xmit path we missed adding the length
> of CPL_TX_PKT_CORE but we added the length of WR_HDR and it
> worked because WR_HDR and CPL_TX_PKT_CORE are of same length.
> Add the length of cpl_tx_pkt_core not WR_HDR's. This also
> fixes the lso cpl errors for udp tunnels
> 
> Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload")
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied.
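
A sketch of the length accounting described in the quoted commit message,
under loud assumptions: the three struct layouts below are hypothetical and
are sized only so that the work-request header and the TX packet CPL happen
to be equally long, and immdlen_sketch() is an illustrative helper, not the
driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, sized only to mirror the coincidence described in
 * the commit message; they are not the real cxgb4 definitions.
 */
struct wr_hdr_sketch {
	uint32_t op_immdlen;
	uint32_t equiq_to_len16;
	uint64_t r3;
};

struct cpl_lso_sketch {
	uint32_t lso_ctrl;
	uint16_t ipid_ofst;
	uint16_t mss;
	uint32_t seqno_offset;
	uint32_t len;
};

struct cpl_tx_pkt_core_sketch {
	uint32_t ctrl0;
	uint16_t pack;
	uint16_t len;
	uint64_t ctrl1;
};

/* Immediate-data length: the CPLs carried inside the work request.
 * The work-request header itself is not part of it.
 */
static size_t immdlen_sketch(int has_lso)
{
	size_t len = sizeof(struct cpl_tx_pkt_core_sketch);

	if (has_lso)
		len += sizeof(struct cpl_lso_sketch);
	return len;
}
```

Because both structures have the same size here, adding the header length
instead of the CPL length would give the same number, which is why such a
mistake can go unnoticed.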


* Re: [PATCH net-next] net: ethernet: Sort Kconfig sourcing alphabetically
From: David Miller @ 2018-05-21 16:15 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, arnd, andrew, aviad.krawczyk, jaswinder.singh,
	hayashi.kunihiko, mdf, linus.walleij, alexandre.belloni,
	linux-kernel
In-Reply-To: <20180521035830.7897-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sun, 20 May 2018 20:58:28 -0700

> A number of entries were not alphabetically sorted, remedy that.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.


* Re: [PATCH net-next] net: phy: phylink: Don't release NULL GPIO
From: David Miller @ 2018-05-21 16:14 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, rmk+kernel, andrew, linux-kernel
In-Reply-To: <20180521034947.469-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sun, 20 May 2018 20:49:47 -0700

> If CONFIG_GPIOLIB is disabled, gpiod_put() becomes a stub that produces a
> warning, this helped identify that we could be attempting to release a NULL
> pl->link_gpio GPIO descriptor, so guard against that.
> 
> Fixes: daab3349ad1a ("net: phy: phylink: Release link GPIO")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.
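
The guard described in the quoted commit message amounts to a NULL check
before the release call. A minimal user-space sketch, assuming hypothetical
stand-ins (struct gpio_desc_sketch, struct phylink_sketch and
gpiod_put_sketch() are not the kernel types or the real gpiod_put()):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types and for gpiod_put(). */
struct gpio_desc_sketch {
	int released;
};

struct phylink_sketch {
	struct gpio_desc_sketch *link_gpio;   /* NULL if no link GPIO */
};

static int put_calls;   /* how often the release stub was reached */

static void gpiod_put_sketch(struct gpio_desc_sketch *desc)
{
	put_calls++;
	desc->released = 1;
}

/* Release the link GPIO only if one was actually obtained. */
static void phylink_release_gpio_sketch(struct phylink_sketch *pl)
{
	if (pl->link_gpio)
		gpiod_put_sketch(pl->link_gpio);
}
```

Without the check, the release function would be entered with a NULL
descriptor whenever no link GPIO was configured.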


* [PATCH v2 bpf-next 2/3] net/ipv6: Add helper to return path MTU based on fib result
From: dsahern @ 2018-05-21 16:08 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180521160816.7060-1-dsahern@kernel.org>

From: David Ahern <dsahern@gmail.com>

Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.

Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core
bpf code.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/addrconf.h   |  2 ++
 include/net/ip6_fib.h    |  6 ++++++
 include/net/ip6_route.h  |  3 +++
 net/ipv6/addrconf_core.c |  8 ++++++++
 net/ipv6/af_inet6.c      |  1 +
 net/ipv6/route.c         | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 68 insertions(+)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index ff766ab207e0..c07d4dd09361 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -236,6 +236,8 @@ struct ipv6_stub {
 						   struct flowi6 *fl6, int oif,
 						   const struct sk_buff *skb,
 						   int strict);
+	u32 (*ip6_mtu_from_fib6)(struct fib6_info *f6i, struct in6_addr *daddr,
+				 struct in6_addr *saddr);
 
 	void (*udpv6_encap_enable)(void);
 	void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr,
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cc70f6da8462..7897efe80727 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -412,6 +412,12 @@ static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i)
 	return f6i->fib6_nh.nh_dev;
 }
 
+static inline
+struct lwtunnel_state *fib6_info_nh_lwt(const struct fib6_info *f6i)
+{
+	return f6i->fib6_nh.nh_lwtstate;
+}
+
 void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
 		     unsigned int flags);
 
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 4cf1ef935ed9..7b9c82de11cc 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -300,6 +300,9 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 	return mtu;
 }
 
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+		      struct in6_addr *saddr);
+
 struct neighbour *ip6_neigh_lookup(const struct in6_addr *gw,
 				   struct net_device *dev, struct sk_buff *skb,
 				   const void *daddr);
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 2fe754fd4f5e..5cd0029d930e 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -161,12 +161,20 @@ eafnosupport_fib6_multipath_select(const struct net *net, struct fib6_info *f6i,
 	return f6i;
 }
 
+static u32
+eafnosupport_ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+			       struct in6_addr *saddr)
+{
+	return 0;
+}
+
 const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
 	.ipv6_dst_lookup   = eafnosupport_ipv6_dst_lookup,
 	.fib6_get_table    = eafnosupport_fib6_get_table,
 	.fib6_table_lookup = eafnosupport_fib6_table_lookup,
 	.fib6_lookup       = eafnosupport_fib6_lookup,
 	.fib6_multipath_select = eafnosupport_fib6_multipath_select,
+	.ip6_mtu_from_fib6 = eafnosupport_ip6_mtu_from_fib6,
 };
 EXPORT_SYMBOL_GPL(ipv6_stub);
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 50de8b0d4f70..9ed0eae91758 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -894,6 +894,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 	.fib6_table_lookup = fib6_table_lookup,
 	.fib6_lookup       = fib6_lookup,
 	.fib6_multipath_select = fib6_multipath_select,
+	.ip6_mtu_from_fib6 = ip6_mtu_from_fib6,
 	.udpv6_encap_enable = udpv6_encap_enable,
 	.ndisc_send_na = ndisc_send_na,
 	.nd_tbl	= &nd_tbl,
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index cc24ed3bc334..dc5d5c84dbef 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2603,6 +2603,54 @@ static unsigned int ip6_mtu(const struct dst_entry *dst)
 	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ *
+ * based on ip6_dst_mtu_forward and exception logic of
+ * rt6_find_cached_rt; called with rcu_read_lock
+ */
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+		      struct in6_addr *saddr)
+{
+	struct rt6_exception_bucket *bucket;
+	struct rt6_exception *rt6_ex;
+	struct in6_addr *src_key;
+	struct inet6_dev *idev;
+	u32 mtu = 0;
+
+	if (unlikely(fib6_metric_locked(f6i, RTAX_MTU))) {
+		mtu = f6i->fib6_pmtu;
+		if (mtu)
+			goto out;
+	}
+
+	src_key = NULL;
+#ifdef CONFIG_IPV6_SUBTREES
+	if (f6i->fib6_src.plen)
+		src_key = saddr;
+#endif
+
+	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
+	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
+	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
+		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
+
+	if (likely(!mtu)) {
+		struct net_device *dev = fib6_info_nh_dev(f6i);
+
+		mtu = IPV6_MIN_MTU;
+		idev = __in6_dev_get(dev);
+		if (idev && idev->cnf.mtu6 > mtu)
+			mtu = idev->cnf.mtu6;
+	}
+
+	mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+out:
+	return mtu - lwtunnel_headroom(fib6_info_nh_lwt(f6i), mtu);
+}
+
 struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 				  struct flowi6 *fl6)
 {
-- 
2.11.0


* [PATCH v2 bpf-next 3/3] bpf: Add mtu checking to FIB forwarding helper
From: dsahern @ 2018-05-21 16:08 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180521160816.7060-1-dsahern@kernel.org>

From: David Ahern <dsahern@gmail.com>

Add a check that the egress MTU can handle the packet to be forwarded.
If the MTU is less than the packet length, return 0, meaning the
packet is expected to continue up the stack for help, e.g.,
fragmenting the packet or sending an ICMP.

The XDP path needs to leverage the FIB entry for an MTU on the
route spec or an exception entry for a given destination. The
skb path lets is_skb_forwardable decide if the packet can be
sent.
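
The decision described above reduces to one comparison. A minimal sketch,
assuming a hypothetical helper name and the return convention stated in the
commit message (0 means "hand the packet to the stack", a positive value is
the egress ifindex):

```c
#include <stdint.h>

/* Hypothetical helper illustrating the check: a packet longer than the
 * path MTU is not forwarded in the fast path.
 */
static int fib_fwd_decision_sketch(uint32_t tot_len, uint32_t path_mtu,
				   int egress_ifindex)
{
	if (tot_len > path_mtu)
		return 0;           /* too big: stack fragments or sends ICMP */
	return egress_ifindex;      /* fits: forward in the fast path */
}
```

A packet whose length equals the MTU still fits; only a strictly larger
packet is sent up the stack.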

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/core/filter.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index aec5ebafb262..ba3ff5aa575a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4089,7 +4089,7 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
 
 #if IS_ENABLED(CONFIG_INET)
 static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
-			       u32 flags)
+			       u32 flags, bool check_mtu)
 {
 	struct in_device *in_dev;
 	struct neighbour *neigh;
@@ -4098,6 +4098,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	struct fib_nh *nh;
 	struct flowi4 fl4;
 	int err;
+	u32 mtu;
 
 	dev = dev_get_by_index_rcu(net, params->ifindex);
 	if (unlikely(!dev))
@@ -4149,6 +4150,12 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (res.fi->fib_nhs > 1)
 		fib_select_path(net, &res, &fl4, NULL);
 
+	if (check_mtu) {
+		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
+		if (params->tot_len > mtu)
+			return 0;
+	}
+
 	nh = &res.fi->fib_nh[res.nh_sel];
 
 	/* do not handle lwt encaps right now */
@@ -4177,7 +4184,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
-			       u32 flags)
+			       u32 flags, bool check_mtu)
 {
 	struct in6_addr *src = (struct in6_addr *) params->ipv6_src;
 	struct in6_addr *dst = (struct in6_addr *) params->ipv6_dst;
@@ -4188,6 +4195,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	struct flowi6 fl6;
 	int strict = 0;
 	int oif;
+	u32 mtu;
 
 	/* link local addresses are never forwarded */
 	if (rt6_need_strict(dst) || rt6_need_strict(src))
@@ -4250,6 +4258,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 						       fl6.flowi6_oif, NULL,
 						       strict);
 
+	if (check_mtu) {
+		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
+		if (params->tot_len > mtu)
+			return 0;
+	}
+
 	if (f6i->fib6_nh.nh_lwtstate)
 		return 0;
 
@@ -4282,12 +4296,12 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 #if IS_ENABLED(CONFIG_INET)
 	case AF_INET:
 		return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
-					   flags);
+					   flags, true);
 #endif
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
 		return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
-					   flags);
+					   flags, true);
 #endif
 	}
 	return 0;
@@ -4306,20 +4320,34 @@ static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
 BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
 {
+	struct net *net = dev_net(skb->dev);
+	int index = 0;
+
 	if (plen < sizeof(*params))
 		return -EINVAL;
 
 	switch (params->family) {
 #if IS_ENABLED(CONFIG_INET)
 	case AF_INET:
-		return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags);
+		index = bpf_ipv4_fib_lookup(net, params, flags, false);
+		break;
 #endif
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
-		return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags);
+		index = bpf_ipv6_fib_lookup(net, params, flags, false);
+		break;
 #endif
 	}
-	return -ENOTSUPP;
+
+	if (index > 0) {
+		struct net_device *dev;
+
+		dev = dev_get_by_index_rcu(net, index);
+		if (!is_skb_forwardable(dev, skb))
+			index = 0;
+	}
+
+	return index;
 }
 
 static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
-- 
2.11.0


* [PATCH v2 bpf-next 1/3] net/ipv4: Add helper to return path MTU based on fib result
From: dsahern @ 2018-05-21 16:08 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180521160816.7060-1-dsahern@kernel.org>

From: David Ahern <dsahern@gmail.com>

Determine path MTU from a FIB lookup result. Logic is a distillation of
ip_dst_mtu_maybe_forward.
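
The selection order used by the helper in this patch (route MTU when it is
locked or forced by sysctl, then an unexpired nexthop exception, then the
egress device) can be pictured with a user-space sketch. struct route_sketch
and mtu_from_route_sketch() are hypothetical, the exception expiry check is
folded into a single flag, and the 65535 cap is an assumption standing in
for IP_MAX_MTU.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_MTU_SKETCH 65535u   /* assumption: stands in for IP_MAX_MTU */

/* Hypothetical flattened view of what the helper reads from the FIB result. */
struct route_sketch {
	bool     fwd_use_pmtu;   /* sysctl asks to use the route MTU */
	bool     mtu_locked;     /* MTU metric is locked on the route */
	uint32_t route_mtu;      /* 0 if the route carries no MTU */
	bool     has_exception;  /* unexpired nexthop exception found */
	uint32_t exception_mtu;
	uint32_t dev_mtu;        /* MTU of the egress device */
	uint32_t encap_headroom; /* room reserved for tunnel encapsulation */
};

static uint32_t mtu_from_route_sketch(const struct route_sketch *r)
{
	uint32_t mtu = 0;

	/* 1. route MTU, when locked or requested by sysctl */
	if (r->fwd_use_pmtu || r->mtu_locked)
		mtu = r->route_mtu;

	/* 2. MTU learned for this destination (nexthop exception) */
	if (!mtu && r->has_exception)
		mtu = r->exception_mtu;

	/* 3. fall back to the egress device, capped at the maximum */
	if (!mtu)
		mtu = r->dev_mtu < MAX_MTU_SKETCH ? r->dev_mtu : MAX_MTU_SKETCH;

	return mtu - r->encap_headroom;
}
```

Each later source is consulted only while mtu is still 0, so an earlier
source always wins when it supplies a value.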

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip_fib.h |  2 ++
 net/ipv4/route.c     | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 81d0f2107ff1..69c91d1934c1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -449,4 +449,6 @@ static inline void fib_proc_exit(struct net *net)
 }
 #endif
 
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
+
 #endif  /* _NET_FIB_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 29268efad247..ac3b22bc51b2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1352,6 +1352,37 @@ static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr)
 	return NULL;
 }
 
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ */
+
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr)
+{
+	struct fib_info *fi = res->fi;
+	struct fib_nh *nh = &fi->fib_nh[res->nh_sel];
+	struct net_device *dev = nh->nh_dev;
+	u32 mtu = 0;
+
+	if (dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu ||
+	    fi->fib_metrics->metrics[RTAX_LOCK - 1] & (1 << RTAX_MTU))
+		mtu = fi->fib_mtu;
+
+	if (likely(!mtu)) {
+		struct fib_nh_exception *fnhe;
+
+		fnhe = find_exception(nh, daddr);
+		if (fnhe && !time_after_eq(jiffies, fnhe->fnhe_expires))
+			mtu = fnhe->fnhe_pmtu;
+	}
+
+	if (likely(!mtu))
+		mtu = min(READ_ONCE(dev->mtu), IP_MAX_MTU);
+
+	return mtu - lwtunnel_headroom(nh->nh_lwtstate, mtu);
+}
+
 static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 			      __be32 daddr, const bool do_cache)
 {
-- 
2.11.0


* [PATCH v2 bpf-next 0/3] bpf: Add MTU check to fib lookup helper
From: dsahern @ 2018-05-21 16:08 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern

From: David Ahern <dsahern@gmail.com>

Packets that exceed the egress MTU cannot be forwarded in the fast path.
Add IPv4 and IPv6 MTU helpers that take a FIB lookup result (versus the
typical dst path) and add the calls to bpf_ipv{4,6}_fib_lookup.

v2
- add ip6_mtu_from_fib6 to ipv6_stub
- only call the new MTU helpers for fib lookups in XDP path; skb
  path uses is_skb_forwardable to determine if the packet can be
  sent via the egress device from the FIB lookup

David Ahern (3):
  net/ipv4: Add helper to return path MTU based on fib result
  net/ipv6: Add helper to return path MTU based on fib result
  bpf: Add mtu checking to FIB forwarding helper

 include/net/addrconf.h   |  2 ++
 include/net/ip6_fib.h    |  6 ++++++
 include/net/ip6_route.h  |  3 +++
 include/net/ip_fib.h     |  2 ++
 net/core/filter.c        | 42 +++++++++++++++++++++++++++++++++++-------
 net/ipv4/route.c         | 31 +++++++++++++++++++++++++++++++
 net/ipv6/addrconf_core.c |  8 ++++++++
 net/ipv6/af_inet6.c      |  1 +
 net/ipv6/route.c         | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 136 insertions(+), 7 deletions(-)

-- 
2.11.0


* Re: [PATCH net] sctp: fix the issue that flags are ignored when using kernel_connect
From: Marcelo Ricardo Leitner @ 2018-05-21 15:52 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman, mkubecek
In-Reply-To: <4863916c3e574b0d860725466d7d4a2f445fbe5b.1526805550.git.lucien.xin@gmail.com>

On Sun, May 20, 2018 at 04:39:10PM +0800, Xin Long wrote:
> Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags
> param can't be passed into its proto .connect, where the flags are really
> needed.
> 
> sctp works around it by getting flags from socket file in __sctp_connect.
> It works for connecting from userspace, as inherently the user sock has
> socket file and it passes f_flags as the flags param into the proto_ops
> .connect.
> 
> However, the sock created by sock_create_kern doesn't have a socket file,
> and it passes the flags (like O_NONBLOCK) by using the flags param in
> kernel_connect, which calls proto_ops .connect later.
> 
> So to fix it, this patch defines a new proto_ops .connect for sctp,
> sctp_inet_connect, which calls __sctp_connect() directly with this
> flags param. After this, the sctp's proto .connect can be removed.
> 
> Note that sctp_inet_connect skips some checks that are not
> needed for sctp, which makes things better than with inet_dgram_connect.
> 
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
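
Why the flags matter can be shown with a small user-space sketch of the
timeout choice the patch makes with "flags & O_NONBLOCK". struct sock_sketch
and connect_timeo_sketch() are hypothetical; the sketch only assumes that a
non-blocking caller gets a zero wait and a blocking caller gets the socket's
send timeout.

```c
#include <fcntl.h>   /* O_NONBLOCK */

/* Hypothetical stand-in for a socket's configured send timeout. */
struct sock_sketch {
	long sndtimeo;
};

/* A non-blocking caller must not wait; a blocking one uses the timeout. */
static long connect_timeo_sketch(const struct sock_sketch *sk, int flags)
{
	return (flags & O_NONBLOCK) ? 0 : sk->sndtimeo;
}
```

If the flags never reach this point, as happens for a kernel socket without
a socket file, O_NONBLOCK is silently lost and the caller blocks for the
full timeout.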

> ---
>  include/net/sctp/sctp.h |  2 ++
>  net/sctp/ipv6.c         |  2 +-
>  net/sctp/protocol.c     |  2 +-
>  net/sctp/socket.c       | 51 +++++++++++++++++++++++++++++++++----------------
>  4 files changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 28b996d..35498e6 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -103,6 +103,8 @@ void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int);
>  /*
>   * sctp/socket.c
>   */
> +int sctp_inet_connect(struct socket *sock, struct sockaddr *uaddr,
> +		      int addr_len, int flags);
>  int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb);
>  int sctp_inet_listen(struct socket *sock, int backlog);
>  void sctp_write_space(struct sock *sk);
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index 4224711..0cd2e76 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -1006,7 +1006,7 @@ static const struct proto_ops inet6_seqpacket_ops = {
>  	.owner		   = THIS_MODULE,
>  	.release	   = inet6_release,
>  	.bind		   = inet6_bind,
> -	.connect	   = inet_dgram_connect,
> +	.connect	   = sctp_inet_connect,
>  	.socketpair	   = sock_no_socketpair,
>  	.accept		   = inet_accept,
>  	.getname	   = sctp_getname,
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index d685f84..6bf0a99 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -1012,7 +1012,7 @@ static const struct proto_ops inet_seqpacket_ops = {
>  	.owner		   = THIS_MODULE,
>  	.release	   = inet_release,	/* Needs to be wrapped... */
>  	.bind		   = inet_bind,
> -	.connect	   = inet_dgram_connect,
> +	.connect	   = sctp_inet_connect,
>  	.socketpair	   = sock_no_socketpair,
>  	.accept		   = inet_accept,
>  	.getname	   = inet_getname,	/* Semantics are different.  */
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 80835ac..ae7e7c6 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1086,7 +1086,7 @@ static int sctp_setsockopt_bindx(struct sock *sk,
>   */
>  static int __sctp_connect(struct sock *sk,
>  			  struct sockaddr *kaddrs,
> -			  int addrs_size,
> +			  int addrs_size, int flags,
>  			  sctp_assoc_t *assoc_id)
>  {
>  	struct net *net = sock_net(sk);
> @@ -1104,7 +1104,6 @@ static int __sctp_connect(struct sock *sk,
>  	union sctp_addr *sa_addr = NULL;
>  	void *addr_buf;
>  	unsigned short port;
> -	unsigned int f_flags = 0;
>  
>  	sp = sctp_sk(sk);
>  	ep = sp->ep;
> @@ -1254,13 +1253,7 @@ static int __sctp_connect(struct sock *sk,
>  	sp->pf->to_sk_daddr(sa_addr, sk);
>  	sk->sk_err = 0;
>  
> -	/* in-kernel sockets don't generally have a file allocated to them
> -	 * if all they do is call sock_create_kern().
> -	 */
> -	if (sk->sk_socket->file)
> -		f_flags = sk->sk_socket->file->f_flags;
> -
> -	timeo = sock_sndtimeo(sk, f_flags & O_NONBLOCK);
> +	timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
>  
>  	if (assoc_id)
>  		*assoc_id = asoc->assoc_id;
> @@ -1348,7 +1341,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
>  				      sctp_assoc_t *assoc_id)
>  {
>  	struct sockaddr *kaddrs;
> -	int err = 0;
> +	int err = 0, flags = 0;
>  
>  	pr_debug("%s: sk:%p addrs:%p addrs_size:%d\n",
>  		 __func__, sk, addrs, addrs_size);
> @@ -1367,7 +1360,13 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
>  	if (err)
>  		goto out_free;
>  
> -	err = __sctp_connect(sk, kaddrs, addrs_size, assoc_id);
> +	/* in-kernel sockets don't generally have a file allocated to them
> +	 * if all they do is call sock_create_kern().
> +	 */
> +	if (sk->sk_socket->file)
> +		flags = sk->sk_socket->file->f_flags;
> +
> +	err = __sctp_connect(sk, kaddrs, addrs_size, flags, assoc_id);
>  
>  out_free:
>  	kvfree(kaddrs);
> @@ -4397,16 +4396,26 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
>   * len: the size of the address.
>   */
>  static int sctp_connect(struct sock *sk, struct sockaddr *addr,
> -			int addr_len)
> +			int addr_len, int flags)
>  {
> -	int err = 0;
> +	struct inet_sock *inet = inet_sk(sk);
>  	struct sctp_af *af;
> +	int err = 0;
>  
>  	lock_sock(sk);
>  
>  	pr_debug("%s: sk:%p, sockaddr:%p, addr_len:%d\n", __func__, sk,
>  		 addr, addr_len);
>  
> +	/* We may need to bind the socket. */
> +	if (!inet->inet_num) {
> +		if (sk->sk_prot->get_port(sk, 0)) {
> +			release_sock(sk);
> +			return -EAGAIN;
> +		}
> +		inet->inet_sport = htons(inet->inet_num);
> +	}
> +
>  	/* Validate addr_len before calling common connect/connectx routine. */
>  	af = sctp_get_af_specific(addr->sa_family);
>  	if (!af || addr_len < af->sockaddr_len) {
> @@ -4415,13 +4424,25 @@ static int sctp_connect(struct sock *sk, struct sockaddr *addr,
>  		/* Pass correct addr len to common routine (so it knows there
>  		 * is only one address being passed.
>  		 */
> -		err = __sctp_connect(sk, addr, af->sockaddr_len, NULL);
> +		err = __sctp_connect(sk, addr, af->sockaddr_len, flags, NULL);
>  	}
>  
>  	release_sock(sk);
>  	return err;
>  }
>  
> +int sctp_inet_connect(struct socket *sock, struct sockaddr *uaddr,
> +		      int addr_len, int flags)
> +{
> +	if (addr_len < sizeof(uaddr->sa_family))
> +		return -EINVAL;
> +
> +	if (uaddr->sa_family == AF_UNSPEC)
> +		return -EOPNOTSUPP;
> +
> +	return sctp_connect(sock->sk, uaddr, addr_len, flags);
> +}
> +
>  /* FIXME: Write comments. */
>  static int sctp_disconnect(struct sock *sk, int flags)
>  {
> @@ -8724,7 +8745,6 @@ struct proto sctp_prot = {
>  	.name        =	"SCTP",
>  	.owner       =	THIS_MODULE,
>  	.close       =	sctp_close,
> -	.connect     =	sctp_connect,
>  	.disconnect  =	sctp_disconnect,
>  	.accept      =	sctp_accept,
>  	.ioctl       =	sctp_ioctl,
> @@ -8767,7 +8787,6 @@ struct proto sctpv6_prot = {
>  	.name		= "SCTPv6",
>  	.owner		= THIS_MODULE,
>  	.close		= sctp_close,
> -	.connect	= sctp_connect,
>  	.disconnect	= sctp_disconnect,
>  	.accept		= sctp_accept,
>  	.ioctl		= sctp_ioctl,
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: [PATCH net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Marcelo Ricardo Leitner @ 2018-05-21 15:51 UTC (permalink / raw)
  To: Michael Tuexen; +Cc: Neil Horman, Xin Long, network dev, linux-sctp, davem
In-Reply-To: <43A7D2C9-DFCE-4ADA-9ABB-B7ACD78C210B@fh-muenster.de>

On Mon, May 21, 2018 at 04:09:31PM +0200, Michael Tuexen wrote:
> > On 21. May 2018, at 15:48, Neil Horman <nhorman@tuxdriver.com> wrote:
> > 
> > On Mon, May 21, 2018 at 02:16:56PM +0200, Michael Tuexen wrote:
> >>> On 21. May 2018, at 13:39, Neil Horman <nhorman@tuxdriver.com> wrote:
> >>> 
> >>> On Sun, May 20, 2018 at 10:54:04PM -0300, Marcelo Ricardo Leitner wrote:
> >>>> On Sun, May 20, 2018 at 08:50:59PM -0400, Neil Horman wrote:
> >>>>> On Sat, May 19, 2018 at 03:44:40PM +0800, Xin Long wrote:
> >>>>>> This feature is actually already supported by sk->sk_reuse which can be
> >>>>>> set by SO_REUSEADDR. But it's not working exactly as RFC6458 demands in
> >>>>>> section 8.1.27, like:
> >>>>>> 
> >>>>>> - This option only supports one-to-one style SCTP sockets
> >>>>>> - This socket option must not be used after calling bind()
> >>>>>>   or sctp_bindx().
> >>>>>> 
> >>>>>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
> >>>>>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
> >>>>>> work in linux.
> >>>>>> 
> >>>>>> This patch reuses sk->sk_reuse and works pretty much as SO_REUSEADDR,
> >>>>>> just with some extra setup limitations that are needed when it is being
> >>>>>> enabled.
> >>>>>> 
> >>>>>> "It should be noted that the behavior of the socket-level socket option
> >>>>>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
> >>>>>> leaves SO_REUSEADDR as is for the compatibility.
> >>>>>> 
> >>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> >>>>>> ---
> >>>>>> include/uapi/linux/sctp.h |  1 +
> >>>>>> net/sctp/socket.c         | 48 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>> 2 files changed, 49 insertions(+)
> >>>>>> 
> >>>>> A few things:
> >>>>> 
> >>>>> 1) I agree with Tom, this feature is a complete duplication of the SO_REUSEPORT
> >>>>> socket option.  I understand that this is an implementation of the option in the
> >>>>> RFC, but it's definitely a duplication of a feature, which makes several things
> >>>>> really messy.
> >>>>> 
> >>>>> 2) The overloading of the sk_reuse option is a bad idea, for several reasons.
> >>>>> Chief among them is the behavioral interference between this patch and the
> >>>>> SO_REUSEADDR socket level option, that also sets this feature.  If you set
> >>>>> sk_reuse via SO_REUSEADDR, you will set the SCTP port reuse feature regardless
> >>>>> of the bind or 1:1/1:m state of the socket.  Vice versa, if you set this socket
> >>>>> option via the SCTP_REUSE_PORT option you will inadvertently turn on address
> >>>>> reuse for the socket.  We can't do that.
> >>>> 
> >>>> Given your comments, going a bit further here, one other big
> >>>> implication is that a port would never be able to be considered to
> >>>> fully meet SCTP standards regarding reuse because a rogue application
> >>>> may always abuse the socket-level option to gain access to the port.
> >>>> 
> >>>> IOW, the patch allows the application to use such restrictions against
> >>>> itself and nothing else, which undermines the patch idea.
> >>>> 
> >>> Agreed.
> >>> 
> >>>> I lack the knowledge on why the SCTP option was proposed in the RFC. I
> >>>> guess they had a good reason to add the restriction on 1:1/1:m style.
> >>>> Does the usage of the current option imply any risk to SCTP sockets? If
> >>>> yes, that would give some grounds for going forward with the SCTP
> >>>> option.
> >>>> 
> >>> I'm also not privy to why the sctp option was proposed, though I expect that the
> >>> lack of standardization of SO_REUSEPORT probably had something to do with it.
> >>> As for the reasoning behind restriction to only 1:1 sockets, if I had to guess,
> >>> I would say it is likely because it creates ordering difficulty at the application
> >>> level.
> >>> 
> >>> CC-ing Michael Tuexen, who I believe had some input on this RFC.  Hopefully he
> >>> can shed some light on this.
> >> Dear all,
> >> 
> >> the reason this was added is to have a specified way to allow a system to
> >> behave like a client and server making use of the INIT collision.
> >> 
> >> For 1-to-many style sockets you can do this by creating a socket, binding it,
> >> calling listen on it and trying to connect to the peer.
> >> 
> >> For 1-to-1 style sockets you need two sockets for it. One listener and one
> >> you use to connect (and close it in case of failure, open a new one...).
> >> 
> >> It was not clear if one can achieve this with SO_REUSEPORT and/or SO_REUSEADDR
> >> on all platforms. We left that unspecified.
> >> 
> >> I hope this makes the intention clearer.
> >> 
> > I think it makes the intention clearer yes, but it unfortunately does nothing in
> > my mind to clarify how the implementation should best handle the potential
> > overlap in functionality.  What I see here is that we have two functional paths
> > (the SO_REUSEPORT path and the SCTP_REUSE_PORT path), which may or may not
> > (depending on the OS implementation) achieve the same functional goal (allowing
> > multiple sockets to share a port while allowing one socket to listen and the
> > other connect to a remote peer).  If both implementations do the same thing on a
> > given platform, we can just alias one to the other and be done, but if they
> > don't then we either have to implement both paths, and ensure that the
> > SO_REUSEPORT path is a no-op/error return for SCTP sockets, or that each path
> > implements a distinct feature set that is clearly documented.
> > 
> > That said, I think we may be in luck.  Looking at the connect and listen paths,
> > it appears to me that:
> > 
> > 1) Sockets ignore SO_REUSEPORT in the connect and listen paths (save for any
> > autobinding) so it would appear that the intent of the SCTP rfc can be honored
> > via SO_REUSEPORT on linux.  
> > 
> > 2) SO_REUSEPORT prevents changing state after a bind has occurred, so we can honor
> > that part of the SCTP RFC.
> > 
> > The only part that is unaccounted for is the SCTP_REUSE_PORT restriction that
> > 1:M sockets aren't allowed to enable port reuse.
> > However, I think the implication from Michael's description above is that port
> > reuse on a 1:M socket is implicit because a single socket can connect and listen
> > in that use case, rather than there being a danger to doing so.
> > 
> > As such, I would propose that we implement this socket option by simply setting
> > the sk->sk_reuseport field in the sock structure, and document the fact that
> > linux does not restrict port reuse from 1:M sockets.
> > 
> > Thoughts?
> Sounds acceptable to me...

+1

> 
> Best regards
> Michael
> > Neil
> > 
> 

* Re: [PATCH net-next 0/2] net: sfp: small improvements
From: David Miller @ 2018-05-21 15:51 UTC (permalink / raw)
  To: antoine.tenart
  Cc: linux, netdev, linux-kernel, thomas.petazzoni, maxime.chevallier,
	gregory.clement, miquel.raynal, nadavh, stefanc, ymarkman, mw
In-Reply-To: <20180517082907.14420-1-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Thu, 17 May 2018 10:29:05 +0200

> This series was part of the mvpp2 phylink one but as we reworked it to
> use fixed-link on the DB boards, the SFP commits weren't needed
> anymore for our use case. Two of the three patches still are needed I
> believe (I ditched the one about non-wired SFP cages), so they are sent
> here in a separate series.

Based upon the discussion of patch #1, it seems there is a desire to make
the i2c-bus property mandatory since it isn't clear if access to the SFP
module without it is really all that doable.

* Re: [patch net-next] nfp: flower: set sysfs link to device for representors
From: David Miller @ 2018-05-21 15:49 UTC (permalink / raw)
  To: jiri
  Cc: netdev, jakub.kicinski, simon.horman, dirk.vandermerwe,
	john.hurley, pieter.jansenvanvuuren, oss-drivers
In-Reply-To: <20180517100520.23971-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 17 May 2018 12:05:20 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Do this so the sysfs has "device" link correctly set.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Please sort out the non-PF representor issue with Or and Jakub.

Thanks.

* Re: [PATCH v2 net] stmmac: strip vlan tag on reception only for 8021q tagged frames
From: David Miller @ 2018-05-21 15:48 UTC (permalink / raw)
  To: eladv6; +Cc: makita.toshiaki, netdev, peppe.cavallaro, alexandre.torgue
In-Reply-To: <20180517.124356.1373521143004050823.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 17 May 2018 12:43:56 -0400 (EDT)

> Giuseppe and Alexandre, please review this patch.

If nobody thinks this patch is important enough to actually
review, I'm tossing it.

Sorry.

* Re: [PATCH net] tuntap: raise EPOLLOUT on device up
From: David Miller @ 2018-05-21 15:47 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, linux-kernel, mst, hannes, edumazet
In-Reply-To: <1526648443-24128-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 18 May 2018 21:00:43 +0800

> We return -EIO on device down but cannot raise EPOLLOUT after it is
> up again. This may confuse users like vhost, which expects tuntap to raise
> EPOLLOUT to re-enable its TX routine after tuntap was down. This can
> be easily reproduced by transmitting packets from a VM while bringing the
> tap device down and up. Fix this by setting SOCKWQ_ASYNC_NOSPACE on -EIO.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I'm not so sure what to do with this patch.

Like Michael says, this flag bit is only checked upon transmit which
may or may not happen after this point.  It doesn't seem to be
guaranteed.

* Re: [PATCH net-next 7/7] net: dsa: qca8k: Remove rudundant parentheses
From: Florian Fainelli @ 2018-05-21 15:21 UTC (permalink / raw)
  To: Michal Vokáč, netdev, michal.vokac
  Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
	robh+dt, davem
In-Reply-To: <1526909293-56377-8-git-send-email-michal.vokac@ysoft.com>



On 05/21/2018 06:28 AM, Michal Vokáč wrote:
> Fix warning reported by checkpatch.

Nit in the subject: should be redundant, with that:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

* Re: [PATCH net-next 6/7] net: dsa: qca8k: Replace GPL boilerplate by SPDX
From: Florian Fainelli @ 2018-05-21 15:20 UTC (permalink / raw)
  To: Michal Vokáč, netdev, michal.vokac
  Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
	robh+dt, davem
In-Reply-To: <1526909293-56377-7-git-send-email-michal.vokac@ysoft.com>



On 05/21/2018 06:28 AM, Michal Vokáč wrote:
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

I don't know if we need all people who contributed to that driver to
agree on that, this is not a license change, so it should be okay I presume?

-- 
Florian

* Re: [PATCH net-next 5/7] net: dsa: qca8k: Allow overwriting CPU port setting
From: Florian Fainelli @ 2018-05-21 15:20 UTC (permalink / raw)
  To: Michal Vokáč, netdev, michal.vokac
  Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
	robh+dt, davem
In-Reply-To: <1526909293-56377-6-git-send-email-michal.vokac@ysoft.com>



On 05/21/2018 06:28 AM, Michal Vokáč wrote:
> Implement an adjust_link function that allows overwriting the default CPU
> port setting using a fixed-link device tree subnode.
> 
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

* Re: [PATCH net-next 4/7] net: dsa: qca8k: Force CPU port to its highest bandwidth
From: Florian Fainelli @ 2018-05-21 15:19 UTC (permalink / raw)
  To: Michal Vokáč, netdev, michal.vokac
  Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
	robh+dt, davem
In-Reply-To: <1526909293-56377-5-git-send-email-michal.vokac@ysoft.com>



On 05/21/2018 06:28 AM, Michal Vokáč wrote:
> By default autonegotiation is enabled to configure MAC on all ports.
> For the CPU port autonegotiation can not be used so we need to set
> some sensible defaults manually.
> 
> This patch forces the default setting of the CPU port to 1000Mbps/full
> duplex which is the chip maximum capability.
> 
> Also correct size of the bit field used to configure link speed.
> 
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Likewise, would not we want to have a:

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")

tag here as well?
-- 
Florian

* Re: [PATCH net-next 3/7] net: dsa: qca8k: Enable RXMAC when bringing up a port
From: Florian Fainelli @ 2018-05-21 15:17 UTC (permalink / raw)
  To: Michal Vokáč, netdev, michal.vokac
  Cc: linux-kernel, devicetree, vivien.didelot, andrew, mark.rutland,
	robh+dt, davem
In-Reply-To: <1526909293-56377-4-git-send-email-michal.vokac@ysoft.com>



On 05/21/2018 06:28 AM, Michal Vokáč wrote:
> When a port is brought up/down do not enable/disable only the TXMAC
> but the RXMAC as well. This is essential for the CPU port to work.
> 
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Should this have:

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")?
-- 
Florian

* Re: [PATCH] bpf: check NULL for sk_to_full_sk()
From: Eric Dumazet @ 2018-05-21 15:17 UTC (permalink / raw)
  To: YueHaibing, ast, daniel; +Cc: linux-kernel, netdev
In-Reply-To: <20180521075558.11968-1-yuehaibing@huawei.com>



On 05/21/2018 12:55 AM, YueHaibing wrote:
> like commit df39a9f106d5 ("bpf: check NULL for sk_to_full_sk() return value"),
> we should check sk_to_full_sk return value against NULL.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  include/linux/bpf-cgroup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 30d15e6..fd3fbeb 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -91,7 +91,7 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
>  	int __ret = 0;							       \
>  	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		       \
>  		typeof(sk) __sk = sk_to_full_sk(sk);			       \
> -		if (sk_fullsock(__sk))					       \
> +		if (__sk && sk_fullsock(__sk))				       \
>  			__ret = __cgroup_bpf_run_filter_skb(__sk, skb,	       \
>  						      BPF_CGROUP_INET_EGRESS); \
>  	}								       \
> 

Why is this needed ???

* Re: [net-next PATCH v2 2/4] net: Enable Tx queue selection based on Rx queues
From: Willem de Bruijn @ 2018-05-21 15:12 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Amritha Nambiar, Linux Kernel Network Developers, David S. Miller,
	Alexander Duyck, Sridhar Samudrala, Eric Dumazet,
	Hannes Frederic Sowa
In-Reply-To: <CALx6S36h=gGb1LkLuJ80DUrE=m+FhbcQ0AD94AdtEUvxJfHf=g@mail.gmail.com>

On Mon, May 21, 2018 at 10:51 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Sat, May 19, 2018 at 1:27 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Sat, May 19, 2018 at 4:13 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Fri, May 18, 2018 at 12:03 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Tue, May 15, 2018 at 6:26 PM, Amritha Nambiar
>>>> <amritha.nambiar@intel.com> wrote:
>>>>> This patch adds support to pick Tx queue based on the Rx queue map
>>>>> configuration set by the admin through the sysfs attribute
>>>>> for each Tx queue. If the user configuration for receive
>>>>> queue map does not apply, then the Tx queue selection falls back
>>>>> to CPU map based selection and finally to hashing.
>>>>>
>>>>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> ---
>>
>>>>> +static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
>>>>> +{
>>>>> +#ifdef CONFIG_XPS
>>>>> +       enum xps_map_type i = XPS_MAP_RXQS;
>>>>> +       struct xps_dev_maps *dev_maps;
>>>>> +       struct sock *sk = skb->sk;
>>>>> +       int queue_index = -1;
>>>>> +       unsigned int tci = 0;
>>>>> +
>>>>> +       if (sk && sk->sk_rx_queue_mapping <= dev->real_num_rx_queues &&
>>>>> +           dev->ifindex == sk->sk_rx_ifindex)
>>>>> +               tci = sk->sk_rx_queue_mapping;
>>>>> +
>>>>> +       rcu_read_lock();
>>>>> +       while (queue_index < 0 && i < __XPS_MAP_MAX) {
>>>>> +               if (i == XPS_MAP_CPUS)
>>>>
>>>> This while loop typifies exactly why I don't think the XPS maps should
>>>> be an array.
>>>
>>> +1
>>
>> as a matter of fact, as enabling both cpu and rxqueue map at the same
>> time makes no sense, only one map is needed at any one time. The
>> only difference is in how it is indexed. It should probably not be possible
>> to configure both at the same time. Keeping a single map probably also
>> significantly simplifies patch 1/4.
>
> Willem,
>
> I think it might make sense to have them both. Maybe one application
> is spin polling that needs this, where others might be happy with
> normal CPU mappings as default.

Some entries in the rx_queue table have queue_pair affinity
configured, the others return -1 to fall through to the cpu
affinity table?

I guess that implies flow steering to those special purpose
queues. I wonder whether this would be used in practice.
It does make the code more complex by having to duplicate
the map lookup logic (mostly, patch 1/4).
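
To make the fall-through concrete, here is a simplified userspace sketch
(not the kernel code) of the selection order described in the patch: Rx
queue map first, then CPU map, then hashing. All names (struct xps_sketch,
pick_tx_queue, NO_QUEUE) are hypothetical, and each map entry holds a single
Tx queue here, whereas the real XPS maps can hold a set of queues per entry.

```c
#include <stddef.h>

#define NO_QUEUE (-1)	/* hypothetical marker: no mapping configured */

/* Hypothetical, flattened stand-in for the per-device XPS maps. */
struct xps_sketch {
	const int *rxq_to_txq;	/* indexed by Rx queue, NO_QUEUE if unset */
	size_t num_rxqs;
	const int *cpu_to_txq;	/* indexed by CPU, NO_QUEUE if unset */
	size_t num_cpus;
	size_t num_txqs;	/* must be non-zero */
};

/*
 * Pick a Tx queue: try the Rx queue map, fall through to the CPU map when
 * the Rx queue entry is missing or unset, and finally fall back to hashing.
 * A negative rxq or cpu means "unknown" and skips that map.
 */
static int pick_tx_queue(const struct xps_sketch *m, int rxq, int cpu,
			 unsigned int hash)
{
	if (rxq >= 0 && (size_t)rxq < m->num_rxqs &&
	    m->rxq_to_txq[rxq] != NO_QUEUE)
		return m->rxq_to_txq[rxq];

	if (cpu >= 0 && (size_t)cpu < m->num_cpus &&
	    m->cpu_to_txq[cpu] != NO_QUEUE)
		return m->cpu_to_txq[cpu];

	return (int)(hash % m->num_txqs);
}
```

With both tables present, only the Rx queues that have an entry take the
first branch; every other flow behaves exactly as with the CPU map alone,
which is the mixed setup Tom describes.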

* Re: [PATCH net 0/4] Fix several issues of virtio-net mergeable XDP
From: Michael S. Tsirkin @ 2018-05-21 15:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel
In-Reply-To: <1526891706-18516-1-git-send-email-jasowang@redhat.com>

On Mon, May 21, 2018 at 04:35:02PM +0800, Jason Wang wrote:
> Hi:
> 
> Please review the patches that try to fix several issues of
> virtio-net mergeable XDP.
> 
> Thanks

I think we should do 3/4 differently.
The rest looks good, and probably needed on stable.

Thanks!

> Jason Wang (4):
>   virtio-net: correctly redirect linearized packet
>   virtio-net: correctly transmit XDP buff after linearizing
>   virtio-net: reset num_buf to 1 after linearizing packet
>   virito-net: fix leaking page for gso packet during mergeable XDP
> 
>  drivers/net/virtio_net.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> -- 
> 2.7.4

* Re: [PATCH net 2/4] virtio-net: correctly transmit XDP buff after linearizing
From: Michael S. Tsirkin @ 2018-05-21 15:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel, John Fastabend
In-Reply-To: <1526891706-18516-3-git-send-email-jasowang@redhat.com>

On Mon, May 21, 2018 at 04:35:04PM +0800, Jason Wang wrote:
> We should not go for the error path after successfully transmitting an
> XDP buffer after linearizing, since the error path may try to pop and
> drop the next packet and increase the drop counters. Fix this by simply
> dropping the refcnt of the original page and going for the xmit path.
> 
> Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers")
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c15d240..6260d65 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -775,7 +775,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			}
>  			*xdp_xmit = true;
>  			if (unlikely(xdp_page != page))
> -				goto err_xdp;
> +				put_page(page);
>  			rcu_read_unlock();
>  			goto xdp_xmit;
>  		case XDP_REDIRECT:
> -- 
> 2.7.4

* Re: [PATCH net 1/4] virtio-net: correctly redirect linearized packet
From: Michael S. Tsirkin @ 2018-05-21 15:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel
In-Reply-To: <1526891706-18516-2-git-send-email-jasowang@redhat.com>

On Mon, May 21, 2018 at 04:35:03PM +0800, Jason Wang wrote:
> After a linearized packet was redirected by XDP, we should not go for
> the err path, which will try to pop buffers for the next packet and
> increase the drop counter. Fix this by just dropping the page refcnt
> for the original page.
> 
> Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
> Reported-by: David Ahern <dsahern@gmail.com>
> Tested-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 770422e..c15d240 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -787,7 +787,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			}
>  			*xdp_xmit = true;
>  			if (unlikely(xdp_page != page))
> -				goto err_xdp;
> +				put_page(page);
>  			rcu_read_unlock();
>  			goto xdp_xmit;
>  		default:
> -- 
> 2.7.4
