Netdev List


* [PATCH net-next] net: NET_VENDOR_MICROSEMI should default to N
From: David Ahern @ 2018-05-17 15:43 UTC
  To: netdev; +Cc: alexandre.belloni, David Ahern

Other ethernet drivers default to N. There is no reason for Microsemi
to default to Y. I believe Linus has set the bar such that only a feature
that cures cancer can be enabled by default. [1]

[1] https://lkml.org/lkml/2010/3/2/366

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 drivers/net/ethernet/mscc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mscc/Kconfig b/drivers/net/ethernet/mscc/Kconfig
index 36c84625d54e..0ef40b05c8af 100644
--- a/drivers/net/ethernet/mscc/Kconfig
+++ b/drivers/net/ethernet/mscc/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: (GPL-2.0 OR MIT)
 config NET_VENDOR_MICROSEMI
 	bool "Microsemi devices"
-	default y
+	default n
 	help
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
-- 
2.11.0

* Re: [PATCH net-next 3/4] tcp: add SACK compression
From: Eric Dumazet @ 2018-05-17 15:46 UTC
  To: Neal Cardwell, Eric Dumazet
  Cc: David Miller, Netdev, Toke Høiland-Jørgensen,
	Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <58bcf9c0-e4f0-691d-8d6a-40ff3629f500@gmail.com>



On 05/17/2018 08:40 AM, Eric Dumazet wrote:
> 
> 
> On 05/17/2018 08:14 AM, Neal Cardwell wrote:

>> Any particular motivation for the 2.5ms here? It might be nice to match the
>> existing TSO autosizing dynamics and use 1ms here instead of having a
>> separate new constant of 2.5ms. Smaller time scales here should lead to
>> less burstiness and queue pressure from data packets in the network, and we
>> know from experience that the CPU overhead of 1ms chunks is acceptable.
> 
> This came from my tests on wifi really :)
> 
> I also had the idea to make this threshold adjustable for wifi, like we did for sk_pacing_shift.
> 
> (On wifi, we might want to increase the max delay between ACK)
> 
> So maybe use 1ms delay, when sk_pacing_shift == 10, but increase it if sk_pacing_shift has been lowered.
> 
>

BTW, maybe my changelog or patch is not clear enough:

As soon as some packets are received in order, we send an ACK, even if the timer was armed.
(This is the beginning of __tcp_ack_snd_check())

When this ACK is sent, the timer is canceled (in tcp_event_ack_sent())
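
The interaction described above can be sketched in userspace C. This is only a rough model under stated assumptions, not the kernel code: `struct ack_state` and the `on_*()` helpers are hypothetical names, and the real conditions in __tcp_ack_snd_check() are more involved than "any in-order data".

```c
#include <stdbool.h>

/* Hypothetical model of a receiver with a SACK compression timer. */
struct ack_state {
	bool timer_armed;        /* compression timer is pending */
	unsigned int acks_sent;  /* number of ACKs emitted so far */
};

/* Sending any ACK cancels a pending compression timer. */
static void send_ack(struct ack_state *s)
{
	s->acks_sent++;
	s->timer_armed = false;
}

/* Out-of-order data: delay the SACK by arming the timer if it is idle. */
static void on_out_of_order(struct ack_state *s)
{
	if (!s->timer_armed)
		s->timer_armed = true;
}

/* In-order data: ACK immediately, even if the timer was armed. */
static void on_in_order(struct ack_state *s)
{
	send_ack(s);
}

/* Timer expiry: send the delayed SACK only if it is still pending. */
static void on_timer_fire(struct ack_state *s)
{
	if (s->timer_armed)
		send_ack(s);
}
```

In this model an in-order arrival between arming and expiry produces exactly one ACK, and the later timer expiry is a no-op.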

* [bpf-next PATCH 0/2] SK_MSG programs: read sock fields
From: John Fastabend @ 2018-05-17 15:53 UTC
  To: ast, daniel; +Cc: netdev

In this series we add the ability for sk msg programs to read basic
sock information about the sock they are attached to. The second
patch adds the tests to the selftest test_verifier.

One observation I had from writing this series is that a lot of the
./net/core/filter.c code is almost duplicated across program types.
I thought about building a template/macro that we could use as a
single block of code to read sock data out for multiple programs,
but I wasn't convinced it was worth it yet. Using a macro saved a
couple of lines of code per block but made the code a bit harder to
read IMO. We can probably revisit the idea later if we get more
duplication.

---

John Fastabend (2):
      bpf: allow sk_msg programs to read sock fields
      bpf: add sk_msg prog sk access tests to test_verifier

 include/linux/filter.h                      |    1 
 include/uapi/linux/bpf.h                    |    8 ++
 kernel/bpf/sockmap.c                        |    1 
 net/core/filter.c                           |  114 ++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h              |    8 ++
 tools/testing/selftests/bpf/test_verifier.c |  115 +++++++++++++++++++++++++++
 6 files changed, 244 insertions(+), 3 deletions(-)

* [bpf-next PATCH 1/2] bpf: allow sk_msg programs to read sock fields
From: John Fastabend @ 2018-05-17 15:54 UTC
  To: ast, daniel; +Cc: netdev
In-Reply-To: <20180517155305.21250.52379.stgit@john-Precision-Tower-5810>

Currently sk_msg programs only have access to the raw data. However,
when building policies it is often useful to make them specific to
the socket endpoint. This allows using the socket tuple as input
into filters, etc.

This patch adds ctx access to the sock fields.
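
As a rough illustration of the kind of policy this enables, here is a userspace sketch rather than a real BPF program. `struct sk_msg_fields` and `policy_allows()` are hypothetical names that mirror a subset of the new fields, and the sketch assumes the whole 32-bit remote_port value is in network byte order while local_port is in host byte order, as the uapi comments state.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical userspace mirror of some sock fields exposed via sk_msg_md. */
struct sk_msg_fields {
	uint32_t family;
	uint32_t remote_ip4;   /* network byte order */
	uint32_t local_ip4;    /* network byte order */
	uint32_t remote_port;  /* network byte order */
	uint32_t local_port;   /* host byte order */
};

/* Example policy: allow only IPv4 messages between the given ports. */
static bool policy_allows(const struct sk_msg_fields *md,
			  uint16_t local_port, uint16_t remote_port)
{
	if (md->family != AF_INET)
		return false;

	/* local_port is already in host order; remote_port must be converted. */
	return md->local_port == local_port &&
	       ntohl(md->remote_port) == remote_port;
}
```

The asymmetry between the two port fields is the main thing to watch: comparing remote_port against a host-order constant without converting would silently never match on little-endian hosts.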

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/filter.h   |    1 
 include/uapi/linux/bpf.h |    8 +++
 kernel/bpf/sockmap.c     |    1 
 net/core/filter.c        |  114 +++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9dbcb9d..d358d18 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -517,6 +517,7 @@ struct sk_msg_buff {
 	bool sg_copy[MAX_SKB_FRAGS];
 	__u32 flags;
 	struct sock *sk_redir;
+	struct sock *sk;
 	struct sk_buff *skb;
 	struct list_head list;
 };
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333..97446bb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2176,6 +2176,14 @@ enum sk_action {
 struct sk_msg_md {
 	void *data;
 	void *data_end;
+
+	__u32 family;
+	__u32 remote_ip4;	/* Stored in network byte order */
+	__u32 local_ip4;	/* Stored in network byte order */
+	__u32 remote_ip6[4];	/* Stored in network byte order */
+	__u32 local_ip6[4];	/* Stored in network byte order */
+	__u32 remote_port;	/* Stored in network byte order */
+	__u32 local_port;	/* stored in host byte order */
 };
 
 #define BPF_TAG_SIZE	8
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index c6de139..0ebf157 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -523,6 +523,7 @@ static unsigned int smap_do_tx_msg(struct sock *sk,
 	}
 
 	bpf_compute_data_pointers_sg(md);
+	md->sk = sk;
 	rc = (*prog->bpf_func)(md, prog->insnsi);
 	psock->apply_bytes = md->apply_bytes;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6d0d156..aec5eba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5148,18 +5148,23 @@ static bool sk_msg_is_valid_access(int off, int size,
 	switch (off) {
 	case offsetof(struct sk_msg_md, data):
 		info->reg_type = PTR_TO_PACKET;
+		if (size != sizeof(__u64))
+			return false;
 		break;
 	case offsetof(struct sk_msg_md, data_end):
 		info->reg_type = PTR_TO_PACKET_END;
+		if (size != sizeof(__u64))
+			return false;
 		break;
+	default:
+		if (size != sizeof(__u32))
+			return false;
 	}
 
 	if (off < 0 || off >= sizeof(struct sk_msg_md))
 		return false;
 	if (off % size != 0)
 		return false;
-	if (size != sizeof(__u64))
-		return false;
 
 	return true;
 }
@@ -5835,7 +5840,8 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		break;
 
 	case offsetof(struct bpf_sock_ops, local_ip4):
-		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_rcv_saddr) != 4);
 
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
 					      struct bpf_sock_ops_kern, sk),
@@ -6152,6 +6158,7 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
 				     struct bpf_prog *prog, u32 *target_size)
 {
 	struct bpf_insn *insn = insn_buf;
+	int off;
 
 	switch (si->off) {
 	case offsetof(struct sk_msg_md, data):
@@ -6164,6 +6171,107 @@ static u32 sk_msg_convert_ctx_access(enum bpf_access_type type,
 				      si->dst_reg, si->src_reg,
 				      offsetof(struct sk_msg_buff, data_end));
 		break;
+	case offsetof(struct sk_msg_md, family):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_family));
+		break;
+
+	case offsetof(struct sk_msg_md, remote_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_daddr));
+		break;
+
+	case offsetof(struct sk_msg_md, local_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_rcv_saddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_rcv_saddr));
+		break;
+
+	case offsetof(struct sk_msg_md, remote_ip6[0]) ...
+	     offsetof(struct sk_msg_md, remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_daddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct sk_msg_md, remote_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_daddr.s6_addr32[0]) +
+				      off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct sk_msg_md, local_ip6[0]) ...
+	     offsetof(struct sk_msg_md, local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct sk_msg_md, local_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_rcv_saddr.s6_addr32[0]) +
+				      off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct sk_msg_md, remote_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_dport));
+#ifndef __BIG_ENDIAN_BITFIELD
+		*insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
+#endif
+		break;
+
+	case offsetof(struct sk_msg_md, local_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct sk_msg_buff, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_msg_buff, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_num));
+		break;
 	}
 
 	return insn - insn_buf;

* [bpf-next PATCH 2/2] bpf: add sk_msg prog sk access tests to test_verifier
From: John Fastabend @ 2018-05-17 15:54 UTC
  To: ast, daniel; +Cc: netdev
In-Reply-To: <20180517155305.21250.52379.stgit@john-Precision-Tower-5810>

Add tests for BPF_PROG_TYPE_SK_MSG to test_verifier for read access
to new sk fields.
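
The access rules these tests exercise can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the verifier code: `struct sk_msg_md_model` and `access_ok()` are hypothetical, and unlike the kernel check, which bounds offsets with sizeof(struct sk_msg_md), this model bounds them by the end of the last field so that tail padding is never treated as readable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the context layout, for illustration only. */
struct sk_msg_md_model {
	void *data;
	void *data_end;
	uint32_t family;
	uint32_t remote_ip4;
	uint32_t local_ip4;
	uint32_t remote_ip6[4];
	uint32_t local_ip6[4];
	uint32_t remote_port;
	uint32_t local_port;
};

/* First byte past the last field (assumption: excludes tail padding). */
#define MODEL_END \
	(offsetof(struct sk_msg_md_model, local_port) + sizeof(uint32_t))

/* Hypothetical helper: is a load of 'size' bytes at 'off' acceptable? */
static bool access_ok(int off, int size)
{
	if (off < 0 || (size_t)off >= MODEL_END)
		return false;

	/* Pointer fields need 64-bit loads; every other field is 32-bit. */
	if ((size_t)off == offsetof(struct sk_msg_md_model, data) ||
	    (size_t)off == offsetof(struct sk_msg_md_model, data_end)) {
		if (size != (int)sizeof(uint64_t))
			return false;
	} else if (size != (int)sizeof(uint32_t)) {
		return false;
	}

	/* Loads must be naturally aligned. */
	return off % size == 0;
}
```

With this model, a 32-bit load of family is accepted, while a 64-bit load of family, a load at family + 1, and a load just past local_port are all refused.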

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/include/uapi/linux/bpf.h              |    8 ++
 tools/testing/selftests/bpf/test_verifier.c |  115 +++++++++++++++++++++++++++
 2 files changed, 123 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333..97446bb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2176,6 +2176,14 @@ enum sk_action {
 struct sk_msg_md {
 	void *data;
 	void *data_end;
+
+	__u32 family;
+	__u32 remote_ip4;	/* Stored in network byte order */
+	__u32 local_ip4;	/* Stored in network byte order */
+	__u32 remote_ip6[4];	/* Stored in network byte order */
+	__u32 local_ip6[4];	/* Stored in network byte order */
+	__u32 remote_port;	/* Stored in network byte order */
+	__u32 local_port;	/* stored in host byte order */
 };
 
 #define BPF_TAG_SIZE	8
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index a877af0..1ac7630 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1686,6 +1686,121 @@ static void bpf_fill_rand_ld_dw(struct bpf_test *self)
 		.prog_type = BPF_PROG_TYPE_SK_SKB,
 	},
 	{
+		"valid access family in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, family)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access remote_ip4 in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_ip4)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access local_ip4 in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_ip4)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access remote_port in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_port)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access local_port in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_port)),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access remote_ip6 in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, remote_ip6[3])),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"valid access local_ip6 in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_ip6[3])),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"invalid 64B read of family in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, family)),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid bpf_context access",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"invalid read past end of SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, local_port) + 4),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
+		"invalid read offset in SK_MSG",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct sk_msg_md, family) + 1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_SK_MSG,
+	},
+	{
 		"direct packet read for SK_MSG",
 		.insns = {
 			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,

* [PATCH net] net: test tailroom before appending to linear skb
From: Willem de Bruijn @ 2018-05-17 15:54 UTC
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Device features may change during transmission. In particular with
corking, a device may toggle scatter-gather in between allocating
and writing to an skb.

Do not unconditionally assume that !NETIF_F_SG at write time implies
that the same held at alloc time and thus the skb has sufficient
tailroom.

This issue predates git history.
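
The condition being added can be sketched in isolation. This is a simplified model, not the kernel code: `struct fake_skb` and `can_append_linear()` are hypothetical stand-ins for the skb and for the test in the append path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the part of an skb this check looks at. */
struct fake_skb {
	unsigned int tailroom;  /* free bytes left in the linear area */
};

/*
 * Copy into the linear area only if the device lacks scatter-gather
 * at write time *and* the skb, which may have been allocated while
 * scatter-gather was still enabled, really has room for 'copy' bytes.
 */
static bool can_append_linear(bool dev_has_sg, const struct fake_skb *skb,
			      unsigned int copy)
{
	return !dev_has_sg && skb->tailroom >= copy;
}
```

When this returns false, the caller is expected to fall back to whatever path it uses for scatter-gather devices instead of writing past the end of the linear buffer.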

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_output.c  | 3 ++-
 net/ipv6/ip6_output.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 83c73bab2c3d..c15204ec2eb0 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1045,7 +1045,8 @@ static int __ip_append_data(struct sock *sk,
 		if (copy > length)
 			copy = length;
 
-		if (!(rt->dst.dev->features&NETIF_F_SG)) {
+		if (!(rt->dst.dev->features&NETIF_F_SG) &&
+		    skb_tailroom(skb) >= copy) {
 			unsigned int off;
 
 			off = skb->len;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2c30ef..7b6d1689087b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1503,7 +1503,8 @@ static int __ip6_append_data(struct sock *sk,
 		if (copy > length)
 			copy = length;
 
-		if (!(rt->dst.dev->features&NETIF_F_SG)) {
+		if (!(rt->dst.dev->features&NETIF_F_SG) &&
+		    skb_tailroom(skb) >= copy) {
 			unsigned int off;
 
 			off = skb->len;
-- 
2.17.0.441.gb46fe60e1d-goog

* Re: net: ieee802154: 6lowpan: fix frag reassembly
From: Greg KH @ 2018-05-17 15:59 UTC
  To: Stefan Schmidt
  Cc: Stefan Schmidt, stable, Alexander Aring, David S. Miller,
	linux-wpan@vger.kernel.org, Network Development
In-Reply-To: <d70c35cd-2795-70f1-a7fc-2785c7938fac@datenfreihafen.org>

On Thu, May 17, 2018 at 04:16:20PM +0200, Stefan Schmidt wrote:
> Hello Greg.
> 
> On 17.05.2018 10:59, Greg KH wrote:
> > On Mon, May 14, 2018 at 05:22:18PM +0200, Stefan Schmidt wrote:
> >> Hello.
> >>
> >>
> >> Please apply f18fa5de5ba7f1d6650951502bb96a6e4715a948
> >>
> >> (net: ieee802154: 6lowpan: fix frag reassembly) to the 4.16.x stable tree.
> >>
> >>
> >> Earlier trees are not needed as the problem was introduced in 4.16.
> > 
> > Really?  Commit f18fa5de5ba7 ("net: ieee802154: 6lowpan: fix frag
> > reassembly") says it fixes commit 648700f76b03 ("inet: frags: use
> > rhashtables for reassembly units") which did not show up until 4.17-rc1:
> > 	$ git describe --contains 648700f76b03
> > 	v4.17-rc1~148^2~20^2~11
> > 
> > Also, it did not get backported to 4.16.y, so I don't see how it is
> > needed in 4.16-stable.
> 
> I guess it's time to blush on my side. During the bisection for the
> commit that introduced the problem I came to the point where it was
> clear to me that it was already in 4.16. This was a while back and I
> honestly have no idea how I made this mistake.
> 
> I tested again now with plain 4.16 and it works fine.
> The fix is also in 4.17-rcX where it actually is needed. In the end I am
> glad that the problem was not introduced in an earlier release without
> my noticing.
> 
> > To verify this, I tried applying the patch, and it totally fails to
> > apply to the 4.16.y tree.
> > 
> > So are you _sure_ you want/need this in 4.16?  If so, can you provide a
> > working backport that you have verified works?
> 
> No backport needed. I simply screwed up when verifying this for 4.16.
> I put on the hat of shame for today and will try harder the next time.

Hey, not a problem, thanks for verifying, 'git describe --contains' is
your friend :)

thanks,

greg k-h

* Re: [PATCH net-next] net: NET_VENDOR_MICROSEMI should default to N
From: Sergei Shtylyov @ 2018-05-17 16:02 UTC
  To: David Ahern, netdev; +Cc: alexandre.belloni
In-Reply-To: <20180517154330.10678-1-dsahern@gmail.com>

On 05/17/2018 06:43 PM, David Ahern wrote:

> Other ethernet drivers default to N. There is no reason for Microsemi
> to default to Y. I believe Linus has set the bar such that only a feature
> that cures cancer can be enabled by default. [1]
> 
> [1] https://lkml.org/lkml/2010/3/2/366
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  drivers/net/ethernet/mscc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mscc/Kconfig b/drivers/net/ethernet/mscc/Kconfig
> index 36c84625d54e..0ef40b05c8af 100644
> --- a/drivers/net/ethernet/mscc/Kconfig
> +++ b/drivers/net/ethernet/mscc/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: (GPL-2.0 OR MIT)
>  config NET_VENDOR_MICROSEMI
>  	bool "Microsemi devices"
> -	default y
> +	default n

   You know that N is the default default? :-)

[...]

MBR, Sergei

* Re: [PATCH bpf-next] bpf: change eBPF helper doc parsing script to allow for smaller indent
From: Quentin Monnet @ 2018-05-17 16:05 UTC
  To: Daniel Borkmann, ast; +Cc: netdev, oss-drivers
In-Reply-To: <b123dc7c-9ec3-f82a-1077-fcbe1c99cc1a@iogearbox.net>

2018-05-17 17:38 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 05/17/2018 02:43 PM, Quentin Monnet wrote:
>> Documentation for eBPF helpers can be parsed from bpf.h and eventually
>> turned into a man page. Commit 6f96674dbd8c ("bpf: relax constraints on
>> formatting for eBPF helper documentation") changed the script used to
>> parse it, in order to allow for different indent style and to ease the
>> work for writing documentation for future helpers.
>>
>> The script currently considers that the first tab can be replaced by 6
>> to 8 spaces. But the documentation for bpf_fib_lookup() uses a mix of
>> tabs (for the "Description" part) and of spaces ("Return" part), and
>> only has 5 space long indent for the latter.
>>
>> We probably do not want to change the values accepted by the script each
>> time a new helper gets a new indent style. However, it is worth noting
>> that with those 5 spaces, the "Description" and "Return" part *look*
>> aligned in the generated patch and in `git show`, so it is likely other
>> helper authors will use the same length. Therefore, allow for helper
>> documentation to use 5 spaces only for the first indent level.
>>
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> 
> Applied to bpf-next, thanks Quentin! Btw in the current uapi description
> some of the helpers have a new line before 'Return' and most have not. I
> presume it doesn't really matter though we might want to do a one-time
> cleanup on these cases at some point in time.

Thanks Daniel!

I did notice those new lines as well. The script was failing on the
5-space indent, but not on the new lines, so I left them as they are. I
agree on the cleanup; I can send a patch once the various helpers
currently being discussed on the list are merged.

Best,
Quentin

* Re: [PATCH net] net: test tailroom before appending to linear skb
From: Eric Dumazet @ 2018-05-17 16:05 UTC
  To: Willem de Bruijn, netdev; +Cc: davem, Willem de Bruijn
In-Reply-To: <20180517155437.120414-1-willemdebruijn.kernel@gmail.com>



On 05/17/2018 08:54 AM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Device features may change during transmission. In particular with
> corking, a device may toggle scatter-gather in between allocating
> and writing to an skb.
> 
> Do not unconditionally assume that !NETIF_F_SG at write time implies
> that the same held at alloc time and thus the skb has sufficient
> tailroom.
> 
> This issue predates git history.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Excellent, thanks Willem.

Reviewed-by: Eric Dumazet <edumazet@google.com>

* Re: suspicious RCU usage at ./include/net/inet_sock.h:LINE
From: Dmitry Vyukov @ 2018-05-17 16:09 UTC
  To: Eric Biggers
  Cc: syzbot, David Miller, dccp, Gerrit Renker, Alexey Kuznetsov, LKML,
	netdev, syzkaller-bugs, Hideaki YOSHIFUJI
In-Reply-To: <20180408192952.GB685@sol.localdomain>

On Sun, Apr 8, 2018 at 9:29 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
>> syzkaller has found reproducer for the following crash on
>> fba961ab29e5ffb055592442808bb0f7962e05da
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> Can not set IPV6_FL_F_REFLECT if flowlabel_consistency sysctl is enable
>>
>> =============================
>> WARNING: suspicious RCU usage
>> 4.15.0-rc4+ #164 Not tainted
>> -----------------------------
>> ./include/net/inet_sock.h:136 suspicious rcu_dereference_check() usage!
>>
>> other info that might help us debug this:
>>
>>
>> rcu_scheduler_active = 2, debug_locks = 1
>> 1 lock held by syzkaller667189/5780:
>>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000008d7d4e62>] lock_sock
>> include/net/sock.h:1462 [inline]
>>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000008d7d4e62>]
>> do_ipv6_setsockopt.isra.9+0x23d/0x38f0 net/ipv6/ipv6_sockglue.c:167
>>
>> stack backtrace:
>> CPU: 0 PID: 5780 Comm: syzkaller667189 Not tainted 4.15.0-rc4+ #164
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>>  lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
>>  ireq_opt_deref include/net/inet_sock.h:135 [inline]
>>  inet_csk_route_req+0x824/0xca0 net/ipv4/inet_connection_sock.c:544
>>  dccp_v4_send_response+0xa7/0x640 net/dccp/ipv4.c:485
>>  dccp_v4_conn_request+0x9ee/0x11b0 net/dccp/ipv4.c:633
>>  dccp_v6_conn_request+0xd30/0x1350 net/dccp/ipv6.c:317
>>  dccp_rcv_state_process+0x574/0x1620 net/dccp/input.c:612
>>  dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:682
>>  dccp_v6_do_rcv+0x81a/0x9b0 net/dccp/ipv6.c:578
>>  sk_backlog_rcv include/net/sock.h:907 [inline]
>>  __release_sock+0x124/0x360 net/core/sock.c:2274
>>  release_sock+0xa4/0x2a0 net/core/sock.c:2789
>>  do_ipv6_setsockopt.isra.9+0x50f/0x38f0 net/ipv6/ipv6_sockglue.c:898
>>  ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
>>  dccp_setsockopt+0x85/0xd0 net/dccp/proto.c:573
>>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
>>  SYSC_setsockopt net/socket.c:1821 [inline]
>>  SyS_setsockopt+0x189/0x360 net/socket.c:1800
>>  entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x445ec9
>> RSP: 002b:00007fa001b58db8 EFLAGS: 00000297 ORIG_RAX: 0000000000000036
>> RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000445ec9
>> RDX: 0000000000000020 RSI: 0000000000000029 RDI: 0000000000000004
>> RBP: 00000000006dbc20 R08: 0000000000000020 R09: 0000000000000000
>> R10: 000000002030a000 R11: 0000000000000297 R12: 0000000000000000
>> R13: 00007fff809eec1f R14: 00007fa001b599c0 R15: 0000000000000001
>>
>> =============================
>> WARNING: suspicious RCU usage
>> 4.15.0-rc4+ #164 Not tainted
>> -----------------------------
>> ./include/net/inet_sock.h:136 suspicious rcu_dereference_check() usage!
>>
>> other info that might help us debug this:
>>
>>
>> rcu_scheduler_active = 2, debug_locks = 1
>> 1 lock held by syzkaller667189/5780:
>>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000008d7d4e62>] lock_sock
>> include/net/sock.h:1462 [inline]
>>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000008d7d4e62>]
>> do_ipv6_setsockopt.isra.9+0x23d/0x38f0 net/ipv6/ipv6_sockglue.c:167
>>
>> stack backtrace:
>> CPU: 0 PID: 5780 Comm: syzkaller667189 Not tainted 4.15.0-rc4+ #164
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>>  lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
>>  ireq_opt_deref include/net/inet_sock.h:135 [inline]
>>  dccp_v4_send_response+0x4b0/0x640 net/dccp/ipv4.c:496
>>  dccp_v4_conn_request+0x9ee/0x11b0 net/dccp/ipv4.c:633
>>  dccp_v6_conn_request+0xd30/0x1350 net/dccp/ipv6.c:317
>>  dccp_rcv_state_process+0x574/0x1620 net/dccp/input.c:612
>>  dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:682
>>  dccp_v6_do_rcv+0x81a/0x9b0 net/dccp/ipv6.c:578
>>  sk_backlog_rcv include/net/sock.h:907 [inline]
>>  __release_sock+0x124/0x360 net/core/sock.c:2274
>>  release_sock+0xa4/0x2a0 net/core/sock.c:2789
>>  do_ipv6_setsockopt.isra.9+0x50f/0x38f0 net/ipv6/ipv6_sockglue.c:898
>>  ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
>>  dccp_setsockopt+0x85/0xd0 net/dccp/proto.c:573
>>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
>>  SYSC_setsockopt net/socket.c:1821 [inline]
>>  SyS_setsockopt+0x189/0x360 net/socket.c:1800
>>  entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x445ec9
>> RSP: 002b:00007fa001b58db8 EFLAGS: 00000297 ORIG_RAX: 0000000000000036
>> RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000445ec9
>> RDX: 0000000000000020 RSI: 0000000000000029 RDI: 0000000000000004
>> RBP: 00000000006dbc20 R08: 0000000000000020 R09: 0000000000000000
>> R10: 000000002030a000 R11: 0000000000000297 R12: 0000000000000000
>> R13: 00007fff809eec1f R14: 00007fa001b599c0 R15: 0000000000000001
>
> syzbot stopped hitting this for some reason, but the bug is still there.
> Here's a simplified reproducer that works on Linus' tree as of today:
>
> #include <linux/in.h>
> #include <stdlib.h>
> #include <sys/socket.h>
> #include <unistd.h>
>
> int main()
> {
>         int is_parent = (fork() != 0);
>         for (;;) {
>                 int fd = socket(AF_INET, SOCK_DCCP, 0);
>                 struct sockaddr_in addr = {
>                         .sin_family = AF_INET,
>                         .sin_port = htobe16(0x4e23),
>                 };
>                 if (is_parent) {
>                         connect(fd, (void *)&addr, sizeof(addr));
>                 } else {
>                         bind(fd, (void *)&addr, sizeof(addr));
>                         listen(fd, 100);
>                         setsockopt(fd, 0, 0xFFFF, NULL, 0);
>                 }
>                 close(fd);
>         }
> }



Still happens on the current upstream HEAD
e6506eb241871d68647c53cb6d0a16299550ae97.

* [PATCH bpf-next 0/3] bpf: Add MTU check to fib lookup helper
From: David Ahern @ 2018-05-17 16:09 UTC
  To: netdev, borkmann, ast; +Cc: davem, David Ahern

Packets that exceed the egress MTU can not be forwarded in the fast path.
Add IPv4 and IPv6 MTU helpers that take a FIB lookup result (versus the
typical dst path) and add the calls to bpf_ipv{4,6}_fib_lookup.

David Ahern (3):
  net/ipv4: Add helper to return path MTU based on fib result
  net/ipv6: Add helper to return path MTU based on fib result
  bpf: Add mtu checking to FIB forwarding helper

 include/net/ip6_fib.h   |  6 ++++++
 include/net/ip6_route.h |  3 +++
 include/net/ip_fib.h    |  2 ++
 net/core/filter.c       | 10 ++++++++++
 net/ipv4/route.c        | 31 +++++++++++++++++++++++++++++++
 net/ipv6/route.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 100 insertions(+)

-- 
2.11.0

* [PATCH bpf-next 1/3] net/ipv4: Add helper to return path MTU based on fib result
From: David Ahern @ 2018-05-17 16:09 UTC
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180517160930.25076-1-dsahern@gmail.com>

Determine path MTU from a FIB lookup result. Logic is a distillation of
ip_dst_mtu_maybe_forward.
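
The selection order used by the helper can be sketched in userspace as below. This is an illustration under assumptions, not the kernel function: `struct mtu_inputs` and `mtu_from_inputs()` are hypothetical, the inputs are flattened values that the kernel would derive from the fib_result, and `MODEL_IP_MAX_MTU` assumes the cap is 0xFFFF.

```c
#include <stdint.h>

#define MODEL_IP_MAX_MTU 0xFFFFU  /* assumed cap, standing in for IP_MAX_MTU */

/* Hypothetical flattened inputs; the kernel reads these via the fib_result. */
struct mtu_inputs {
	int use_route_mtu;       /* fwd_use_pmtu sysctl set or MTU metric locked */
	uint32_t route_mtu;      /* MTU configured on the route, 0 if unset */
	uint32_t exception_mtu;  /* unexpired nexthop exception PMTU, 0 if none */
	uint32_t dev_mtu;        /* egress device MTU */
	uint32_t lwt_headroom;   /* lightweight tunnel encapsulation overhead */
};

static uint32_t mtu_from_inputs(const struct mtu_inputs *in)
{
	uint32_t mtu = 0;

	/* 1. The route MTU wins when it is locked or forced by sysctl. */
	if (in->use_route_mtu)
		mtu = in->route_mtu;

	/* 2. Otherwise use the PMTU from a nexthop exception, if any. */
	if (!mtu)
		mtu = in->exception_mtu;

	/* 3. Otherwise fall back to the (capped) egress device MTU. */
	if (!mtu)
		mtu = in->dev_mtu < MODEL_IP_MAX_MTU ? in->dev_mtu
						     : MODEL_IP_MAX_MTU;

	/* Encapsulation headroom is subtracted in every case. */
	return mtu - in->lwt_headroom;
}
```

For example, with a 1500-byte device, no exception, and no locked route MTU, the result is 1500; adding an exception PMTU of 1280 changes it to 1280.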

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip_fib.h |  2 ++
 net/ipv4/route.c     | 31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 81d0f2107ff1..69c91d1934c1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -449,4 +449,6 @@ static inline void fib_proc_exit(struct net *net)
 }
 #endif
 
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
+
 #endif  /* _NET_FIB_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 29268efad247..ac3b22bc51b2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1352,6 +1352,37 @@ static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr)
 	return NULL;
 }
 
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ */
+
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr)
+{
+	struct fib_info *fi = res->fi;
+	struct fib_nh *nh = &fi->fib_nh[res->nh_sel];
+	struct net_device *dev = nh->nh_dev;
+	u32 mtu = 0;
+
+	if (dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu ||
+	    fi->fib_metrics->metrics[RTAX_LOCK - 1] & (1 << RTAX_MTU))
+		mtu = fi->fib_mtu;
+
+	if (likely(!mtu)) {
+		struct fib_nh_exception *fnhe;
+
+		fnhe = find_exception(nh, daddr);
+		if (fnhe && !time_after_eq(jiffies, fnhe->fnhe_expires))
+			mtu = fnhe->fnhe_pmtu;
+	}
+
+	if (likely(!mtu))
+		mtu = min(READ_ONCE(dev->mtu), IP_MAX_MTU);
+
+	return mtu - lwtunnel_headroom(nh->nh_lwtstate, mtu);
+}
+
 static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 			      __be32 daddr, const bool do_cache)
 {
-- 
2.11.0

^ permalink raw reply related

* [PATCH bpf-next 2/3] net/ipv6: Add helper to return path MTU based on fib result
From: David Ahern @ 2018-05-17 16:09 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180517160930.25076-1-dsahern@gmail.com>

Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/ip6_fib.h   |  6 ++++++
 include/net/ip6_route.h |  3 +++
 net/ipv6/route.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cc70f6da8462..7897efe80727 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -412,6 +412,12 @@ static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i)
 	return f6i->fib6_nh.nh_dev;
 }
 
+static inline
+struct lwtunnel_state *fib6_info_nh_lwt(const struct fib6_info *f6i)
+{
+	return f6i->fib6_nh.nh_lwtstate;
+}
+
 void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
 		     unsigned int flags);
 
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 4cf1ef935ed9..7b9c82de11cc 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -300,6 +300,9 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 	return mtu;
 }
 
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+		      struct in6_addr *saddr);
+
 struct neighbour *ip6_neigh_lookup(const struct in6_addr *gw,
 				   struct net_device *dev, struct sk_buff *skb,
 				   const void *daddr);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index cc24ed3bc334..a9b2c8e06404 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2603,6 +2603,54 @@ static unsigned int ip6_mtu(const struct dst_entry *dst)
 	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ *
+ * based on ip6_dst_mtu_forward and exception logic of
+ * rt6_find_cached_rt; called with rcu_read_lock
+ */
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+		      struct in6_addr *saddr)
+{
+	struct rt6_exception_bucket *bucket;
+	struct rt6_exception *rt6_ex;
+	struct in6_addr *src_key;
+	struct inet6_dev *idev;
+	u32 mtu = 0;
+
+	if (unlikely(fib6_metric_locked(f6i, RTAX_MTU))) {
+		mtu = f6i->fib6_pmtu;
+		if (mtu)
+			goto out;
+	}
+
+	src_key = NULL;
+#ifdef CONFIG_IPV6_SUBTREES
+	if (f6i->fib6_src.plen)
+		src_key = saddr;
+#endif
+
+	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
+	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
+	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
+		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
+
+	if (likely(!mtu)) {
+		struct net_device *dev = fib6_info_nh_dev(f6i);
+
+		mtu = IPV6_MIN_MTU;
+		idev = __in6_dev_get(dev);
+		if (idev && idev->cnf.mtu6 > mtu)
+			mtu = idev->cnf.mtu6;
+	}
+
+	mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+out:
+	return mtu - lwtunnel_headroom(fib6_info_nh_lwt(f6i), mtu);
+}
+
 struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 				  struct flowi6 *fl6)
 {
-- 
2.11.0

^ permalink raw reply related

* [PATCH bpf-next 3/3] bpf: Add mtu checking to FIB forwarding helper
From: David Ahern @ 2018-05-17 16:09 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: davem, David Ahern
In-Reply-To: <20180517160930.25076-1-dsahern@gmail.com>

Add a check that the egress MTU can handle the packet to be forwarded. If
the MTU is less than the packet length, return 0, meaning the
packet is expected to continue up the stack for help, e.g.,
fragmenting the packet or sending an ICMP.
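
The decision itself is small and can be sketched in user space as below. This is an illustration only: fast_path_egress() and SKETCH_PASS_TO_STACK are hypothetical names, and modelling the successful result as a positive egress ifindex is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: 0 means "hand the packet to the regular stack". */
#define SKETCH_PASS_TO_STACK 0

/*
 * Return the egress ifindex when the packet fits the path MTU,
 * otherwise 0 so the caller lets the stack fragment or send an ICMP.
 */
static int fast_path_egress(uint16_t tot_len, uint32_t path_mtu,
			    int egress_ifindex)
{
	assert(egress_ifindex > 0);

	if (tot_len > path_mtu)
		return SKETCH_PASS_TO_STACK;
	return egress_ifindex;
}
```

A packet whose length equals the MTU still takes the fast path; only a strictly larger packet is passed up.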

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/core/filter.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 6d0d1560bd70..c47c47a75d4b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4098,6 +4098,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	struct fib_nh *nh;
 	struct flowi4 fl4;
 	int err;
+	u32 mtu;
 
 	dev = dev_get_by_index_rcu(net, params->ifindex);
 	if (unlikely(!dev))
@@ -4149,6 +4150,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (res.fi->fib_nhs > 1)
 		fib_select_path(net, &res, &fl4, NULL);
 
+	mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
+	if (params->tot_len > mtu)
+		return 0;
+
 	nh = &res.fi->fib_nh[res.nh_sel];
 
 	/* do not handle lwt encaps right now */
@@ -4188,6 +4193,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	struct flowi6 fl6;
 	int strict = 0;
 	int oif;
+	u32 mtu;
 
 	/* link local addresses are never forwarded */
 	if (rt6_need_strict(dst) || rt6_need_strict(src))
@@ -4250,6 +4256,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 						       fl6.flowi6_oif, NULL,
 						       strict);
 
+	mtu = ip6_mtu_from_fib6(f6i, dst, src);
+	if (params->tot_len > mtu)
+		return 0;
+
 	if (f6i->fib6_nh.nh_lwtstate)
 		return 0;
 
-- 
2.11.0

^ permalink raw reply related

* Re: [bpf PATCH 1/2] bpf: sockmap update rollback on error can incorrectly dec prog refcnt
From: Martin KaFai Lau @ 2018-05-17 16:15 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20180516214651.6664.62408.stgit@john-Precision-Tower-5810>

On Wed, May 16, 2018 at 02:46:51PM -0700, John Fastabend wrote:
> If the user were to only attach one of the parse or verdict programs
> then it is possible a subsequent sockmap update could incorrectly
> decrement the refcnt on the program. This happens because in the
> rollback logic, after an error, we have to decrement the program
> reference count when it's been incremented. However, we only increment
> the program reference count if the user has both a verdict and a
> parse program. The reason for this is because, at least at the
> moment, both are required for any one to be meaningful. The problem
> fixed here is in the rollback path we decrement the program refcnt
> even if only one exists. But we never incremented the refcnt in
> the first place creating an imbalance.
> 
> This patch fixes the error path to handle this case.
> 
> Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
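
For readers following the thread, the imbalance described above can be reproduced with a small user-space model. This is a hedged sketch, not the sockmap code or the actual fix: struct prog and update_with_rollback() are hypothetical, and the refcount is a plain int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF program with a reference count. */
struct prog {
	int refcnt;
};

/*
 * Take references only when both programs are present, and on error
 * drop exactly the references that were taken, so the count stays
 * balanced when just one program is attached.
 */
static int update_with_rollback(struct prog *verdict, struct prog *parse,
				bool fail)
{
	bool took_refs = false;

	if (verdict && parse) {
		verdict->refcnt++;
		parse->refcnt++;
		took_refs = true;
	}

	if (fail) {
		if (took_refs) {
			assert(verdict->refcnt > 0 && parse->refcnt > 0);
			verdict->refcnt--;
			parse->refcnt--;
		}
		return -1;
	}
	return 0;
}
```

Without the took_refs guard, a failed update with only a verdict program attached would leave its count one lower than before the call.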

^ permalink raw reply

* Re: [bpf PATCH 2/2] bpf: parse and verdict prog attach may race with bpf map update
From: Martin KaFai Lau @ 2018-05-17 16:16 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20180516214656.6664.34077.stgit@john-Precision-Tower-5810>

On Wed, May 16, 2018 at 02:46:56PM -0700, John Fastabend wrote:
> In the sockmap design BPF programs (SK_SKB_STREAM_PARSER and
> SK_SKB_STREAM_VERDICT) are attached to the sockmap map type and when
> a sock is added to the map the programs are used by the socket.
> However, sockmap updates from both userspace and BPF programs can
> happen concurrently with the attach and detach of these programs.
> 
> To resolve this we use the bpf_prog_inc_not_zero and a READ_ONCE()
> primitive to ensure the program pointer is not refetched and
> possibly NULL'd before the refcnt increment. This happens inside
> a RCU critical section so although the pointer reference in the map
> object may be NULL (by a concurrent detach operation) the reference
> from READ_ONCE will not be free'd until after grace period. This
> ensures the object returned by READ_ONCE() is valid through the
> RCU critical section and safe to use as long as we "know" it may
> be free'd shortly.
> 
> Daniel spotted a case in the sock update API where instead of using
> the READ_ONCE() program reference we used the pointer from the
> original map, stab->bpf_{verdict|parse}. The problem with this is
> the logic checks the object returned from the READ_ONCE() is not
> NULL and then tries to reference the object again but using the
> above map pointer, which may have already been NULL'd by a parallel
> detach operation. If this happened bpf_prog_inc_not_zero could
> dereference a NULL pointer.
> 
> Fix this by using variable returned by READ_ONCE() that is checked
> for NULL.
> 
> Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
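
The "read once, check, then use the checked copy" pattern can be sketched with C11 atomics as below. This is an illustration under stated assumptions: struct prog, prog_inc_not_zero() and get_prog() are hypothetical user-space stand-ins, atomic_load() plays the role of READ_ONCE(), and the RCU grace period that keeps the object alive in the kernel is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted BPF program. */
struct prog {
	atomic_int refcnt;
};

/* Take a reference unless the count has already dropped to zero. */
static bool prog_inc_not_zero(struct prog *p)
{
	int old = atomic_load(&p->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Read the slot exactly once, then work only on the local copy.
 * Re-reading *slot after the NULL check is the bug being fixed:
 * a concurrent detach may have cleared it in between.
 */
static struct prog *get_prog(_Atomic(struct prog *) *slot)
{
	struct prog *p;

	assert(slot != NULL);

	p = atomic_load(slot);
	if (!p)
		return NULL;
	return prog_inc_not_zero(p) ? p : NULL;
}
```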

^ permalink raw reply

* Re: [PATCH net-next] net/smc: init conn.tx_work & conn.send_lock sooner
From: Ursula Braun @ 2018-05-17 16:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Eric Dumazet, linux-s390
In-Reply-To: <CANn89iJCjp++D=awHqPicuBqdF8dcvj9=-NF3=YUVSdxh7VgGQ@mail.gmail.com>



On 05/17/2018 05:28 PM, Eric Dumazet wrote:
> On Thu, May 17, 2018 at 6:58 AM Ursula Braun <ubraun@linux.ibm.com> wrote:
> 
> 
> 
>> On 05/17/2018 02:20 PM, Eric Dumazet wrote:
>>> On Thu, May 17, 2018 at 5:13 AM Ursula Braun <ubraun@linux.ibm.com> wrote:
>>>
>>>> This problem should no longer show up with yesterday's net-next commit
>>>> 569bc6436568 ("net/smc: no tx work trigger for fallback sockets").
>>>
>>> It definitely triggers on latest net-next, which includes 569bc6436568
>>>
>>> Thanks.
>>>
> 
>> Sorry, my fault.
> 
>> Your proposed patch solves the problem. On the other hand the purpose of
>> smc_tx_init() has been to cover tx-related socket initializations needed for
>> connection sockets only. tx_work is something that should be scheduled only
>> for active connection sockets in non-fallback mode.
>> Thus I prefer this alternate patch to solve the problem:
> 
>> ---
>>   net/smc/af_smc.c |    8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -1362,14 +1362,18 @@ static int smc_setsockopt(struct socket
>>                  }
>>                  break;
>>          case TCP_NODELAY:
>> -               if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
>> +               if (sk->sk_state != SMC_INIT &&
>> +                   sk->sk_state != SMC_LISTEN &&
>> +                   sk->sk_state != SMC_CLOSED) {
>>                          if (val && !smc->use_fallback)
>>                                  mod_delayed_work(system_wq, &smc->conn.tx_work,
>>                                                   0);
>>                  }
>>                  break;
>>          case TCP_CORK:
>> -               if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
>> +               if (sk->sk_state != SMC_INIT &&
>> +                   sk->sk_state != SMC_LISTEN &&
>> +                   sk->sk_state != SMC_CLOSED) {
>>                          if (!val && !smc->use_fallback)
>>                                  mod_delayed_work(system_wq, &smc->conn.tx_work,
>>                                                   0);
> 
>> What do you think?
> 
> I think my patch is cleaner.
> 
> Deferring spinlock and workqueues setup is a recipe for disaster.
> 

If your solution is preferred, I agree. In that case the net/smc patch I sent today,
   net/smc: initialize tx_work before llc initial handshake
for the net tree, is obsolete.

^ permalink raw reply

* Re: [PATCH net-next] net: NET_VENDOR_MICROSEMI should default ot N
From: David Miller @ 2018-05-17 16:32 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, alexandre.belloni
In-Reply-To: <20180517154330.10678-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Thu, 17 May 2018 08:43:30 -0700

> Other ethernet drivers default to N. There is no reason for Microsemi
> to default to y. I believe Linus has set the bar at a feature that cures
> cancer can be enabled by default. [1]
> 
> [1] https://lkml.org/lkml/2010/3/2/366
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

For "drivers" yes, those should default to N.  But for vendor guards
like this, they should default to Y.

^ permalink raw reply

* [net-next 6/6] ice: Update NVM AQ command functions
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180517163732.30910-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

This patch updates the NVM read/erase/update AQ commands to align with
the latest specification.
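
As the diff below shows, the 32-bit offset field becomes a 16-bit low part followed by an 8-bit high part, i.e. a 24-bit little-endian value on the wire. A minimal user-space sketch of that encoding follows; encode_nvm_offset() is a hypothetical helper and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the low 24 bits of offset as three little-endian bytes:
 * out[0..1] correspond to offset_low, out[2] to offset_high.
 * Bits 24..31 are silently dropped, as with the masks in the patch.
 */
static void encode_nvm_offset(uint32_t offset, uint8_t out[3])
{
	assert(out != NULL);

	out[0] = offset & 0xFF;
	out[1] = (offset >> 8) & 0xFF;
	out[2] = (offset >> 16) & 0xFF;
}
```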

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_adminq_cmd.h | 13 +++++++------
 drivers/net/ethernet/intel/ice/ice_nvm.c        |  7 ++++---
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 7dc5f045e969..7541ec2270b3 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1049,7 +1049,9 @@ struct ice_aqc_set_event_mask {
  * NVM Update commands (indirect 0x0703)
  */
 struct ice_aqc_nvm {
-	u8	cmd_flags;
+	__le16 offset_low;
+	u8 offset_high;
+	u8 cmd_flags;
 #define ICE_AQC_NVM_LAST_CMD		BIT(0)
 #define ICE_AQC_NVM_PCIR_REQ		BIT(0)	/* Used by NVM Update reply */
 #define ICE_AQC_NVM_PRESERVATION_S	1
@@ -1058,12 +1060,11 @@ struct ice_aqc_nvm {
 #define ICE_AQC_NVM_PRESERVE_ALL	BIT(1)
 #define ICE_AQC_NVM_PRESERVE_SELECTED	(3 << CSR_AQ_NVM_PRESERVATION_S)
 #define ICE_AQC_NVM_FLASH_ONLY		BIT(7)
-	u8	module_typeid;
-	__le16	length;
+	__le16 module_typeid;
+	__le16 length;
 #define ICE_AQC_NVM_ERASE_LEN	0xFFFF
-	__le32	offset;
-	__le32	addr_high;
-	__le32	addr_low;
+	__le32 addr_high;
+	__le32 addr_low;
 };
 
 /* Get/Set RSS key (indirect 0x0B04/0x0B02) */
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index fa7a69ac92b0..92da0a626ce0 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -16,7 +16,7 @@
  * Read the NVM using the admin queue commands (0x0701)
  */
 static enum ice_status
-ice_aq_read_nvm(struct ice_hw *hw, u8 module_typeid, u32 offset, u16 length,
+ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
 		void *data, bool last_command, struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
@@ -33,8 +33,9 @@ ice_aq_read_nvm(struct ice_hw *hw, u8 module_typeid, u32 offset, u16 length,
 	/* If this is the last command in a series, set the proper flag. */
 	if (last_command)
 		cmd->cmd_flags |= ICE_AQC_NVM_LAST_CMD;
-	cmd->module_typeid = module_typeid;
-	cmd->offset = cpu_to_le32(offset);
+	cmd->module_typeid = cpu_to_le16(module_typeid);
+	cmd->offset_low = cpu_to_le16(offset & 0xFFFF);
+	cmd->offset_high = (offset >> 16) & 0xFF;
 	cmd->length = cpu_to_le16(length);
 
 	return ice_aq_send_cmd(hw, &desc, data, length, cd);
-- 
2.17.0

^ permalink raw reply related

* [net-next 3/6] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem; +Cc: Pavel Tatashin, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180517163732.30910-1-jeffrey.t.kirsher@intel.com>

From: Pavel Tatashin <pasha.tatashin@oracle.com>

Currently, during device_shutdown() ixgbe holds rtnl_lock for the duration
of lengthy ixgbe_close_suspend(). On machines with multiple ixgbe cards
this lock prevents scaling if device_shutdown() function is multi-threaded.

It is not necessary to hold this lock during ixgbe_close_suspend(),
as it is not held when ixgbe_close() is called, which also happens
during shutdown in the kexec case.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a52d92e182ee..5ddfb93ed491 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6698,8 +6698,15 @@ static int __ixgbe_shutdown(struct pci_dev *pdev, bool *enable_wake)
 	rtnl_lock();
 	netif_device_detach(netdev);
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* Suspend takes a long time, device_shutdown may be
+		 * parallelized this function, so drop lock for the
+		 * duration of this call.
+		 */
+		rtnl_unlock();
 		ixgbe_close_suspend(adapter);
+		rtnl_lock();
+	}
 
 	ixgbe_clear_interrupt_scheme(adapter);
 	rtnl_unlock();
-- 
2.17.0

^ permalink raw reply related

* [net-next 5/6] ixgbevf: fix MAC address changes through ixgbevf_set_mac()
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180517163732.30910-1-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Set hw->mac.perm_addr in ixgbevf_set_mac() in order to avoid losing the
custom MAC on reset. This can happen in the following case:

>ip link set $vf address $mac
>ethtool -r $vf

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9a939dcaf727..083041129539 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -4164,6 +4164,7 @@ static int ixgbevf_set_mac(struct net_device *netdev, void *p)
 		return -EPERM;
 
 	ether_addr_copy(hw->mac.addr, addr->sa_data);
+	ether_addr_copy(hw->mac.perm_addr, addr->sa_data);
 	ether_addr_copy(netdev->dev_addr, addr->sa_data);
 
 	return 0;
-- 
2.17.0

^ permalink raw reply related

* [net-next 4/6] ixgbe: force VF to grab new MAC on driver reload
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180517163732.30910-1-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Do not validate the MAC address during a reset, unless the MAC was set on
the host. This way the VF will get a new MAC address every time it reloads.

Remove the "no MAC address assigned" message since it will get spammed on
reset and it doesn't help much as the MAC on the VF is randomly generated.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 2649c06d877b..6f59933cdff7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -854,14 +854,11 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	/* reply to reset with ack and vf mac address */
 	msgbuf[0] = IXGBE_VF_RESET;
-	if (!is_zero_ether_addr(vf_mac)) {
+	if (!is_zero_ether_addr(vf_mac) && adapter->vfinfo[vf].pf_set_mac) {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_ACK;
 		memcpy(addr, vf_mac, ETH_ALEN);
 	} else {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_NACK;
-		dev_warn(&adapter->pdev->dev,
-			 "VF %d has no MAC address assigned, you may have to assign one manually\n",
-			 vf);
 	}
 
 	/*
-- 
2.17.0

^ permalink raw reply related

* [net-next 0/6][pull request] 10GbE Intel Wired LAN Driver Updates 2018-05-17
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to ixgbe, ixgbevf and ice drivers.

Cathy Zhou resolves sparse warnings by using the force attribute.

Mauro S M Rodrigues fixes a bug where IRQs were not freed if a PCI error
recovery system opts to remove the device.  In that case
ixgbe_io_error_detected() returned PCI_ERS_RESULT_DISCONNECT before
calling ixgbe_close_suspend(), so the IRQs were not freed and the kernel
crashed when the remove handler called pci_disable_device().  This is
resolved by calling ixgbe_close_suspend() before evaluating the PCI
channel state.

Pavel Tatashin releases the rtnl_lock during the call to
ixgbe_close_suspend() to allow scaling if device_shutdown() is
multi-threaded.

Emil modifies ixgbe to not validate the MAC address during a reset,
unless the MAC was set on the host so that the VF will get a new MAC
address every time it reloads.  Also updates ixgbevf to set
hw->mac.perm_addr in order to retain the custom MAC on a reset.

Anirudh updates the ice NVM read/erase/update AQ commands to align with
the latest specification.

The following are changes since commit b9f672af148bf7a08a6031743156faffd58dbc7e:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Anirudh Venkataramanan (1):
  ice: Update NVM AQ command functions

Cathy Zhou (1):
  ixgbe: cleanup sparse warnings

Emil Tantilov (2):
  ixgbe: force VF to grab new MAC on driver reload
  ixgbevf: fix MAC address changes through ixgbevf_set_mac()

Mauro S M Rodrigues (1):
  ixgbe/ixgbevf: Free IRQ when PCI error recovery removes the device

Pavel Tatashin (1):
  ixgbe: release lock for the duration of ixgbe_suspend_close()

 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 13 +++---
 drivers/net/ethernet/intel/ice/ice_nvm.c      |  7 +--
 .../net/ethernet/intel/ixgbe/ixgbe_82599.c    | 13 +++---
 .../net/ethernet/intel/ixgbe/ixgbe_common.c   |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c |  2 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 25 +++++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 44 ++++++++++++-------
 .../net/ethernet/intel/ixgbe/ixgbe_model.h    | 16 +++----
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    |  5 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c |  9 ++--
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  7 +--
 11 files changed, 82 insertions(+), 61 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [net-next 2/6] ixgbe/ixgbevf: Free IRQ when PCI error recovery removes the device
From: Jeff Kirsher @ 2018-05-17 16:37 UTC (permalink / raw)
  To: davem
  Cc: Mauro S M Rodrigues, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180517163732.30910-1-jeffrey.t.kirsher@intel.com>

From: Mauro S M Rodrigues <maurosr@linux.vnet.ibm.com>

Since commit f7f37e7ff2b9 ("ixgbe: handle close/suspend race with
netif_device_detach/present") ixgbe_close_suspend is called, from
ixgbe_close, only if the device is present, i.e. if it isn't detached.
That exposed a situation where IRQs weren't freed if a PCI error
recovery system opts to remove the device. In that case the PCI channel
state is set to pci_channel_io_perm_failure and ixgbe_io_error_detected
returned PCI_ERS_RESULT_DISCONNECT before calling ixgbe_close_suspend,
so the IRQs were not freed. The kernel then crashed when the remove
handler called pci_disable_device, hitting a BUG_ON at free_msi_irqs,
which asserts that no IRQ associated with the device being removed is
still in use:

BUG_ON(irq_has_action(entry->irq + i));

The issue is fixed by calling ixgbe_close_suspend before evaluating
the PCI channel state.

Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Mauro S M Rodrigues <maurosr@linux.vnet.ibm.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 6 +++---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 163b34a9572d..a52d92e182ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10935,14 +10935,14 @@ static pci_ers_result_t ixgbe_io_error_detected(struct pci_dev *pdev,
 	rtnl_lock();
 	netif_device_detach(netdev);
 
+	if (netif_running(netdev))
+		ixgbe_close_suspend(adapter);
+
 	if (state == pci_channel_io_perm_failure) {
 		rtnl_unlock();
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
-	if (netif_running(netdev))
-		ixgbe_close_suspend(adapter);
-
 	if (!test_and_set_bit(__IXGBE_DISABLED, &adapter->state))
 		pci_disable_device(pdev);
 	rtnl_unlock();
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 1ccce6cd51fc..9a939dcaf727 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -4747,14 +4747,14 @@ static pci_ers_result_t ixgbevf_io_error_detected(struct pci_dev *pdev,
 	rtnl_lock();
 	netif_device_detach(netdev);
 
+	if (netif_running(netdev))
+		ixgbevf_close_suspend(adapter);
+
 	if (state == pci_channel_io_perm_failure) {
 		rtnl_unlock();
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
-	if (netif_running(netdev))
-		ixgbevf_close_suspend(adapter);
-
 	if (!test_and_set_bit(__IXGBEVF_DISABLED, &adapter->state))
 		pci_disable_device(pdev);
 	rtnl_unlock();
-- 
2.17.0

^ permalink raw reply related
