Netdev List
 help / color / mirror / Atom feed
* Re: [pull request][net 0/3] Mellanox, mlx5 fixes 2019-07-15
From: David Miller @ 2019-07-16  0:20 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20190715200940.31799-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 15 Jul 2019 20:09:53 +0000

> This pull request provides mlx5 TC flower and tunnel fixes for
> kernel 5.2 from Eli and Vlad.
> 
> Please pull and let me know if there is any problem.

Pulled, thanks Saeed.

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Stephen Hemminger @ 2019-07-16  0:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vedang Patel, netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, m-karicheri2, dsahern
In-Reply-To: <20190715171515.248460a6@cakuba.netronome.com>

On Mon, 15 Jul 2019 17:15:15 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:
> > On Mon, 15 Jul 2019 15:51:41 -0700
> > Vedang Patel <vedang.patel@intel.com> wrote:  
> > > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> > >  
> > >  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
> > >  
> > > +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > > +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > > +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > > +	}  
> >[...]
> > 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
> >    is that in the JSON output, print_uint would be decimal but the print_0xhex
> >    is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.  
> 
> In my humble personal experience scripting tests using iproute2 and
> bpftool with Python I found printing the "hex string" instead of just
> outputing the integer value counter productive :( Even tho it looks
> better to the eye, JSON is primarily for machine processing and hex
> strings have to be manually converted.

If it is hex on normal output, it should be hex on JSON output.
And what ever the normal output format is has to be accepted on command line as input.

^ permalink raw reply

* [bpf-next RFC 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov

From: Petar Penkov <ppenkov@google.com>

This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF.

The first two patches in the series modify several TCP helper functions to
allow for SKB-less operation, as is the case with XDP.

The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. 

The last three patches sync tools/ and add a test. 

The primary design consideration I see in the patch series is the return value
of the helper function. Currently bpf_tcp_gen_syncookie returns a 64-bit value
that contains both the 32-bit syncookie, and the 16-bit mss value which is
encoded in the cookie. On error, it would return a negative value instead. I
chose this over writing the cookie into the provided TCP packet to avoid writing
packet data as currently if a helper changes the packet data, the first argument
has to point to the context (can this be relaxed?). 

To make the API cleaner we can instead return something like the struct below
though the return type would then not really be RET_INTEGER or any of the
currently existing return types.
struct bpf_syncookie {
	u16 error; // or u8 error, u8 unused for future use
	u16 mss;
	u32 syncookie;
}

Petar Penkov (6):
  tcp: tcp_syn_flood_action read port from socket
  tcp: add skb-less helpers to retrieve SYN cookie
  bpf: add bpf_tcp_gen_syncookie helper
  bpf: sync bpf.h to tools/
  selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
  selftests/bpf: add test for bpf_tcp_gen_syncookie

 include/net/tcp.h                             | 11 +++
 include/uapi/linux/bpf.h                      | 30 ++++++-
 net/core/filter.c                             | 62 +++++++++++++
 net/ipv4/tcp_input.c                          | 87 +++++++++++++++++--
 net/ipv4/tcp_ipv4.c                           |  8 ++
 net/ipv6/tcp_ipv6.c                           |  8 ++
 tools/include/uapi/linux/bpf.h                | 37 +++++++-
 tools/testing/selftests/bpf/bpf_helpers.h     |  3 +
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 28 ++++--
 .../bpf/test_tcp_check_syncookie_user.c       | 61 +++++++++++--
 10 files changed, 313 insertions(+), 22 deletions(-)

-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply

* [bpf-next RFC 1/6] tcp: tcp_syn_flood_action read port from socket
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This allows us to call this function before an SKB has been
allocated.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 net/ipv4/tcp_input.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8892df6de1d4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6422,9 +6422,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(const struct sock *sk,
-				 const struct sk_buff *skb,
-				 const char *proto)
+static bool tcp_syn_flood_action(const struct sock *sk, const char *proto)
 {
 	struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
 	const char *msg = "Dropping request";
@@ -6444,7 +6442,7 @@ static bool tcp_syn_flood_action(const struct sock *sk,
 	    net->ipv4.sysctl_tcp_syncookies != 2 &&
 	    xchg(&queue->synflood_warned, 1) == 0)
 		net_info_ratelimited("%s: Possible SYN flooding on port %d. %s.  Check SNMP counters.\n",
-				     proto, ntohs(tcp_hdr(skb)->dest), msg);
+				     proto, sk->sk_num, msg);
 
 	return want_cookie;
 }
@@ -6487,7 +6485,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	 */
 	if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
 	     inet_csk_reqsk_queue_is_full(sk)) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
+		want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
 		if (!want_cookie)
 			goto drop;
 	}
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 include/net/tcp.h    | 11 ++++++
 net/ipv4/tcp_input.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_ipv4.c  |  8 +++++
 net/ipv6/tcp_ipv6.c  |  8 +++++
 4 files changed, 106 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cca3c59b98bf..a128e22c0d5d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -414,6 +414,17 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
 
+/*
+ *	BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *tch,
+		      u32 *cookie);
 /*
  *	TCP v4 functions exported for the inet6 API
  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8892df6de1d4..1406d7e0953c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3782,6 +3782,52 @@ static void smc_parse_options(const struct tcphdr *th,
 #endif
 }
 
+/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
+ * value on success.
+ *
+ * Invoked for BPF SYN cookie generation, so th should be a SYN.
+ */
+static u16 tcp_parse_mss_option(const struct net *net, const struct tcphdr *th,
+				u16 user_mss)
+{
+	const unsigned char *ptr = (const unsigned char *)(th + 1);
+	int length = (th->doff * 4) - sizeof(struct tcphdr);
+	u16 mss = 0;
+
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return mss;
+		case TCPOPT_NOP:	/* Ref: RFC 793 section 3.1 */
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return mss;
+			opsize = *ptr++;
+			if (opsize < 2) /* "silly options" */
+				return mss;
+			if (opsize > length)
+				return mss;	/* fail on partial options */
+			if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
+				u16 in_mss = get_unaligned_be16(ptr);
+
+				if (in_mss) {
+					if (user_mss && user_mss < in_mss)
+						in_mss = user_mss;
+					mss = in_mss;
+				}
+			}
+			ptr += opsize - 2;
+			length -= opsize;
+		}
+	}
+	return mss;
+}
+
 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
@@ -6464,6 +6510,39 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
 	}
 }
 
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *th,
+		      u32 *cookie)
+{
+	u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+	bool is_v4 = rsk_ops->family == AF_INET;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
+	    !inet_csk_reqsk_queue_is_full(sk))
+		return 0;
+
+	if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
+		return 0;
+
+	if (sk_acceptq_is_full(sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		return 0;
+	}
+
+	mss = tcp_parse_mss_option(sock_net(sk), th, tp->rx_opt.user_mss);
+	if (!mss)
+		mss = af_ops->mss_clamp;
+
+	tcp_synq_overflow(sk);
+	*cookie = is_v4 ? __cookie_v4_init_sequence(iph, th, &mss)
+			: __cookie_v6_init_sequence(iph, th, &mss);
+#endif
+	return mss;
+}
+
 int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     const struct tcp_request_sock_ops *af_ops,
 		     struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..0e06e59784bd 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1515,6 +1515,14 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp_request_sock_ops,
+				 &tcp_request_sock_ipv4_ops, sk, iph, tch,
+				 cookie);
+}
+
 /* The socket must have it's spinlock held when we get
  * here, unless it is a TCP_LISTEN socket.
  *
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d56a9019a0fe..ce46cdba54bc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1058,6 +1058,14 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp6_request_sock_ops,
+				 &tcp_request_sock_ipv6_ops, sk, iph, tch,
+				 cookie);
+}
+
 static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb->protocol == htons(ETH_P_IP))
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/bpf.h | 30 ++++++++++++++++++-
 net/core/filter.c        | 62 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..abf4a85c76d1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2713,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains **sizeof**\ (**struct tcphdr**).
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		network order and the higher 32 bits hold the MSS value for that
+ *		cookie.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** *sk* family is not AF_INET/AF_INET6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2824,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 47f6386fb17a..109fd1e286f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5850,6 +5850,64 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+	   struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+	u32 cookie;
+	u16 mss;
+
+	if (unlikely(th_len < sizeof(*th)))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+		return -EINVAL;
+
+	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+		return -EINVAL;
+
+	if (!th->syn || th->ack || th->fin || th->rst)
+		return -EINVAL;
+
+	switch (sk->sk_family) {
+	case AF_INET:
+		if (unlikely(iph_len < sizeof(struct iphdr)))
+			return -EINVAL;
+		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
+		break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+	case AF_INET6:
+		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+			return -EINVAL;
+		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
+		break;
+#endif /* CONFIG_IPV6 */
+
+	default:
+		return -EPROTONOSUPPORT;
+	}
+	if (mss <= 0)
+		return -ENOENT;
+
+	return htonl(cookie) | ((u64)mss << 32);
+#else
+	return -ENOTSUPP;
+#endif /* CONFIG_SYN_COOKIES */
+}
+
+static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
+	.func		= bpf_tcp_gen_syncookie,
+	.gpl_only	= true, /* __cookie_v*_init_sequence() is GPL */
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -6135,6 +6193,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_check_syncookie_proto;
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
@@ -6174,6 +6234,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_skc_lookup_tcp_proto;
 	case BPF_FUNC_tcp_check_syncookie:
 		return &bpf_tcp_check_syncookie_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 5/6] selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Expose bpf_tcp_gen_syncookie to selftests.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 5a3d92c8bec8..19f01e967402 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -228,6 +228,9 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
 static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
+static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
+					  int ip_len, void *tcp, int tcp_len) =
+	(void *) BPF_FUNC_tcp_gen_syncookie;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 4/6] bpf: sync bpf.h to tools/
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Sync updated documentation for bpf_redirect_map.

Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..abf4a85c76d1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1571,8 +1571,11 @@ union bpf_attr {
  * 		but this is only implemented for native XDP (with driver
  * 		support) as of this writing).
  *
- * 		All values for *flags* are reserved for future usage, and must
- * 		be left at zero.
+ * 		The lower two bits of *flags* are used as the return code if
+ * 		the map lookup fails. This is so that the return value can be
+ * 		one of the XDP program return codes up to XDP_TX, as chosen by
+ * 		the caller. Any higher bits in the *flags* argument must be
+ * 		unset.
  *
  * 		When used to redirect packets to net devices, this helper
  * 		provides a high performance increase over **bpf_redirect**\ ().
@@ -2710,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains **sizeof**\ (**struct tcphdr**).
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		network order and the higher 32 bits hold the MSS value for that
+ *		cookie.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** *sk* family is not AF_INET/AF_INET6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2821,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [bpf-next RFC 6/6] selftests/bpf: add test for bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-07-16  0:26 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.

Additionally, verify that the MSS for that SYN cookie is within
expected range.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 28 +++++++--
 .../bpf/test_tcp_check_syncookie_user.c       | 61 ++++++++++++++++---
 2 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
index 1ab095bcacd8..229832766f42 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
@@ -19,8 +19,8 @@
 struct bpf_map_def SEC("maps") results = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
-	.max_entries = 1,
+	.value_size = sizeof(__u32),
+	.max_entries = 3,
 };
 
 static __always_inline void check_syncookie(void *ctx, void *data,
@@ -33,8 +33,10 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 	struct ipv6hdr *ipv6h;
 	struct tcphdr *tcph;
 	int ret;
+	__u32 key_mss = 2;
+	__u32 key_gen = 1;
 	__u32 key = 0;
-	__u64 value = 1;
+	__s64 seq_mss;
 
 	ethh = data;
 	if (ethh + 1 > data_end)
@@ -66,6 +68,8 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = bpf_tcp_gen_syncookie(sk, ipv4h, sizeof(*ipv4h),
+						tcph, sizeof(*tcph));
 		ret = bpf_tcp_check_syncookie(sk, ipv4h, sizeof(*ipv4h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -95,6 +99,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = bpf_tcp_gen_syncookie(sk, ipv6h, sizeof(*ipv6h),
+						tcph, sizeof(*tcph));
+
 		ret = bpf_tcp_check_syncookie(sk, ipv6h, sizeof(*ipv6h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -103,8 +110,19 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		return;
 	}
 
-	if (ret == 0)
-		bpf_map_update_elem(&results, &key, &value, 0);
+	if (seq_mss > 0) {
+		__u32 cookie = bpf_ntohl((__u32)seq_mss);
+		__u32 mss = seq_mss >> 32;
+
+		bpf_map_update_elem(&results, &key_gen, &cookie, 0);
+		bpf_map_update_elem(&results, &key_mss, &mss, 0);
+	}
+
+	if (ret == 0) {
+		__u32 cookie = bpf_ntohl(tcph->ack_seq) - 1;
+
+		bpf_map_update_elem(&results, &key, &cookie, 0);
+	}
 
 release:
 	bpf_sk_release(sk);
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
index 87829c86c746..f3ff49ceb481 100644
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2018 Facebook
 // Copyright (c) 2019 Cloudflare
 
+#include <limits.h>
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -77,7 +78,7 @@ static int connect_to_server(int server_fd)
 	return fd;
 }
 
-static int get_map_fd_by_prog_id(int prog_id)
+static int get_map_fd_by_prog_id(int prog_id, bool *xdp)
 {
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
@@ -104,6 +105,8 @@ static int get_map_fd_by_prog_id(int prog_id)
 		goto err;
 	}
 
+	*xdp = info.type == BPF_PROG_TYPE_XDP;
+
 	map_fd = bpf_map_get_fd_by_id(map_ids[0]);
 	if (map_fd < 0)
 		log_err("Failed to get fd by map id %d", map_ids[0]);
@@ -113,18 +116,32 @@ static int get_map_fd_by_prog_id(int prog_id)
 	return map_fd;
 }
 
-static int run_test(int server_fd, int results_fd)
+static int run_test(int server_fd, int results_fd, bool xdp)
 {
 	int client = -1, srv_client = -1;
 	int ret = 0;
 	__u32 key = 0;
-	__u64 value = 0;
+	__u32 key_gen = 1;
+	__u32 key_mss = 2;
+	__u32 value = 0;
+	__u32 value_gen = 0;
+	__u32 value_mss = 0;
 
 	if (bpf_map_update_elem(results_fd, &key, &value, 0) < 0) {
 		log_err("Can't clear results");
 		goto err;
 	}
 
+	if (bpf_map_update_elem(results_fd, &key_gen, &value_gen, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
+	if (bpf_map_update_elem(results_fd, &key_mss, &value_mss, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
 	client = connect_to_server(server_fd);
 	if (client == -1)
 		goto err;
@@ -140,8 +157,35 @@ static int run_test(int server_fd, int results_fd)
 		goto err;
 	}
 
-	if (value != 1) {
-		log_err("Didn't match syncookie: %llu", value);
+	if (value == 0) {
+		log_err("Didn't match syncookie: %u", value);
+		goto err;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_gen, &value_gen) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (xdp && value_gen == 0) {
+		// SYN packets do not get passed through generic XDP, skip the
+		// rest of the test.
+		log_err("Did not find SYN cookie at XDP.");
+		goto out;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_mss, &value_mss) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (value != value_gen) {
+		log_err("BPF generated cookie does not match kernel one");
+		goto err;
+	}
+
+	if (value_mss < 536 || value_mss > USHRT_MAX) {
+		log_err("Unexpected MSS retrieved");
 		goto err;
 	}
 
@@ -163,13 +207,14 @@ int main(int argc, char **argv)
 	int server_v6 = -1;
 	int results = -1;
 	int err = 0;
+	bool xdp;
 
 	if (argc < 2) {
 		fprintf(stderr, "Usage: %s prog_id\n", argv[0]);
 		exit(1);
 	}
 
-	results = get_map_fd_by_prog_id(atoi(argv[1]));
+	results = get_map_fd_by_prog_id(atoi(argv[1]), &xdp);
 	if (results < 0) {
 		log_err("Can't get map");
 		goto err;
@@ -194,10 +239,10 @@ int main(int argc, char **argv)
 	if (server_v6 == -1)
 		goto err;
 
-	if (run_test(server, results))
+	if (run_test(server, results, xdp))
 		goto err;
 
-	if (run_test(server_v6, results))
+	if (run_test(server_v6, results, xdp))
 		goto err;
 
 	printf("ok\n");
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Jakub Kicinski @ 2019-07-16  0:34 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vedang Patel, netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, m-karicheri2, dsahern
In-Reply-To: <20190715172422.4e127da2@hermes.lan>

On Mon, 15 Jul 2019 17:24:22 -0700, Stephen Hemminger wrote:
> On Mon, 15 Jul 2019 17:15:15 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> > On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:  
> > > On Mon, 15 Jul 2019 15:51:41 -0700
> > > Vedang Patel <vedang.patel@intel.com> wrote:    
> > > > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> > > >  
> > > >  	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
> > > >  
> > > > +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > > > +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > > > +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > > > +	}    
> > >[...]
> > > 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
> > >    is that in the JSON output, print_uint would be decimal but the print_0xhex
> > >    is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.    
> > 
> > In my humble personal experience scripting tests using iproute2 and
> > bpftool with Python I found printing the "hex string" instead of just
> > outputing the integer value counter productive :( Even tho it looks
> > better to the eye, JSON is primarily for machine processing and hex
> > strings have to be manually converted.  
> 
> If it is hex on normal output, it should be hex on JSON output.
> And what ever the normal output format is has to be accepted on command line as input.

Ah, I forgot the output == input principle in iproute2!
In any case if there was ever a vote whether to limit this principle to
non-JSON output, and make machines' life easier, I'd vote 'yes' :)

^ permalink raw reply

* [PATCH net-next v2 0/7] net/rds: RDMA fixes
From: Gerd Rausch @ 2019-07-16  0:41 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev; +Cc: David Miller

A number of net/rds fixes necessary to make "rds_rdma.ko"
pass some basic Oracle internal tests.

Gerd Rausch (7):
  net/rds: Give fr_state a chance to transition to FRMR_IS_FREE
  net/rds: Get rid of "wait_clean_list_grace" and add locking
  net/rds: Wait for the FRMR_IS_FREE (or FRMR_IS_STALE) transition after
    posting IB_WR_LOCAL_INV
  net/rds: Fix NULL/ERR_PTR inconsistency
  net/rds: Set fr_state only to FRMR_IS_FREE if IB_WR_LOCAL_INV had been
    successful
  net/rds: Keep track of and wait for FRWR segments in use upon shutdown
  net/rds: Initialize ic->i_fastreg_wrs upon allocation

 net/rds/ib.h      |  1 +
 net/rds/ib_cm.c   |  9 ++++-
 net/rds/ib_frmr.c | 84 ++++++++++++++++++++++++++++++++++++++++++-----
 net/rds/ib_mr.h   |  4 +++
 net/rds/ib_rdma.c | 60 +++++++++++----------------------
 5 files changed, 109 insertions(+), 49 deletions(-)

-- 

Changes in submitted patch v2:
* Use "wait_event" instead of "wait_event_timeout" in order to
  not have a deadline for the HCA firmware to respond.

^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags
From: Patel, Vedang @ 2019-07-16  1:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, Jamal Hadi Salim, Cong Wang,
	jiri@resnulli.us, Gomes, Vinicius, Dorileo, Leandro,
	jakub.kicinski@netronome.com, m-karicheri2@ti.com,
	dsahern@gmail.com
In-Reply-To: <20190715163743.2c6cec2b@hermes.lan>

Hi Stephen,

> On Jul 15, 2019, at 4:37 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2019 15:51:41 -0700
> Vedang Patel <vedang.patel@intel.com> wrote:
> 
>> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>> 	struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>> 	struct tc_mqprio_qopt *qopt = 0;
>> 	__s32 clockid = CLOCKID_INVALID;
>> +	__u32 taprio_flags = 0;
>> 	int i;
>> 
>> 	if (opt == NULL)
>> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
>> 
>> 	print_string(PRINT_ANY, "clockid", "clockid %s", get_clock_name(clockid));
>> 
>> +	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
>> +		taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
>> +		print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
>> +	}
>> +
> 
> Overall this looks fine, but three small comments:
> 1. It is better not to do unnecessary variable initialization
> 2. It is better to move variables into the basic block where they are used.
> 3. Use the print_0xhex() instead of print_uint() for hex values. The difference
>   is that in the JSON output, print_uint would be decimal but the print_0xhex
>   is always hex.  And use "flags %#x" so that it is clear you are printing flags in hex.
Thanks for they inputs. I will incorporate your comments and send the updated series in a couple of days.

-Vedang
> 
> 


^ permalink raw reply

* Re: [PATCH iproute2 net-next v3 3/5] taprio: add support for setting txtime_delay.
From: Patel, Vedang @ 2019-07-16  1:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Gomes, Vinicius, Dorileo, Leandro, jakub.kicinski@netronome.com,
	m-karicheri2@ti.com, dsahern@gmail.com
In-Reply-To: <20190715163822.32596123@hermes.lan>



> On Jul 15, 2019, at 4:38 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2019 15:51:42 -0700
> Vedang Patel <vedang.patel@intel.com> wrote:
> 
>> +			if (get_s32(&txtime_delay, *argv, 0)) {
> 
> Is txtime_delay of a negative value meaningful?

No, txtime-delay should always be a positive value. I will change this to u32 here.  I will also make the corresponding changes in the kernel and send the updated patch. 

Thanks,
Vedang

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-16  1:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

ok

I will remove all unnecessary spaces and send the v2 patch

Thansk Pablo

Pablo Neira Ayuso <pablo@netfilter.org> 于2019年7月15日周一 下午4:27写道:
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > >             return 0;
> > >     }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -                    sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +                   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
>         git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.

^ permalink raw reply

* Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Xiaoming Ni @ 2019-07-16  2:00 UTC (permalink / raw)
  To: Vasily Averin, gregkh@linuxfoundation.org
  Cc: adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, Huangjianhui (Alex), Dailei,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <5521e5a4-66d9-aaf8-3a12-3999bfc6be8b@virtuozzo.com>

On 2019/7/15 13:38, Vasily Averin wrote:
> On 7/14/19 5:45 AM, Xiaoming Ni wrote:
>> On 2019/7/12 22:07, gregkh@linuxfoundation.org wrote:
>>> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
>>>> On 2019/7/11 21:57, Vasily Averin wrote:
>>>>> On 7/11/19 4:55 AM, Nixiaoming wrote:
>>>>>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>>>>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>>>>>>>> Registering the same notifier to a hook repeatedly can cause the hook
>>>>>>>> list to form a ring or lose other members of the list.
>>>>>>>
>>>>>>> I think is not enough to _prevent_ 2nd register attempt,
>>>>>>> it's enough to detect just attempt and generate warning to mark host in bad state.
>>>>>>>
>>>>>>
>>>>>> Duplicate registration is prevented in my patch, not just "mark host in bad state"
>>>>>>
>>>>>> Duplicate registration is checked and exited in notifier_chain_cond_register()
>>>>>>
>>>>>> Duplicate registration was checked in notifier_chain_register() but only 
>>>>>> the alarm was triggered without exiting. added by commit 831246570d34692e 
>>>>>> ("kernel/notifier.c: double register detection")
>>>>>>
>>>>>> My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
>>>>>>  which triggers an alarm and exits when a duplicate registration is detected.
>>>>>>
>>>>>>> Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>>>>>>> and it can lead to host crash in any time: 
>>>>>>> you can unregister notifier on first attempt it can be too early, it can be still in use.
>>>>>>> on the other hand you can never call 2nd unregister at all.
>>>>>>
>>>>>> Since the member was not added to the linked list at the time of the second registration, 
>>>>>> no linked list ring was formed. 
>>>>>> The member is released on the first unregistration and -ENOENT on the second unregistration.
>>>>>> After patching, the fault has been alleviated
>>>>>
>>>>> You are wrong here.
>>>>> 2nd notifier's registration is a pure bug, this should never happen.
>>>>> If you know the way to reproduce this situation -- you need to fix it. 
>>>>>
>>>>> 2nd registration can happen in 2 cases:
>>>>> 1) missed rollback, when someone forget to call unregister after successfull registration, 
>>>>> and then tried to call register again. It can lead to crash for example when according module will be unloaded.
>>>>> 2) some subsystem is registered twice, for example from  different namespaces.
>>>>> in this case unregister called during sybsystem cleanup in first namespace will incorrectly remove notifier used 
>>>>> in second namespace, it also can lead to unexpacted behaviour.
>>>>>
>>>> So in these two cases, is it more reasonable to trigger BUG() directly when checking for duplicate registration ?
>>>> But why does current notifier_chain_register() just trigger WARN() without exiting ?
>>>> notifier_chain_cond_register() direct exit without triggering WARN() ?
>>>
>>> It should recover from this, if it can be detected.  The main point is
>>> that not all apis have to be this "robust" when used within the kernel
>>> as we do allow for the callers to know what they are doing :)
>>>
>> In the notifier_chain_register(), the condition ( (*nl) == n) is the same registration of the same hook.
>>  We can intercept this situation and avoid forming a linked list ring to make the API more rob
> 
> Once again -- yes, you CAN prevent list corruption, but you CANNOT recover the host and return it back to safe state.
> If double register event was detected -- it means you have bug in kernel.
> 
> Yes, you can add BUG here and crash the host immediately, but I prefer to use warning in such situations.
> 
>>> If this does not cause any additional problems or slow downs, it's
>>> probably fine to add.
>>>
>> Notifier_chain_register() is not a system hotspot function.
>> At the same time, there is already a WARN_ONCE judgment. There is no new judgment in the new patch.
>> It only changes the processing under the condition of (*nl) == n, which will not cause performance problems.
>> At the same time, avoiding the formation of a link ring can make the system more robust.
> 
> I disagree, 
> yes, node will have correct list, but anyway node will work wrong and can crash the host in any time.

Sorry, my description is not accurate.

My patch feature does not prevent users from repeatedly registering hooks.
But avoiding the chain ring caused by the user repeatedly registering the hook

There are no modules for duplicate registration hooks in the current system.
But considering that not all modules are in the kernel source tree,
In order to improve the robustness of the kernel API, we should avoid the linked list ring caused by repeated registration.
Or in order to improve the efficiency of problem location, when the duplicate registration is checked, the system crashes directly.

On the other hand, the difference between notifier_chain_register() and notifier_chain_cond_register() for duplicate registrations is confusing:
Blocking the formation of the linked list ring in notifier_chain_cond_register()
There is no interception of the linked list ring in notifier_chain_register(), just an alarm.
Give me the illusion: Isn't notifier_chain_register() allowed to create a linked list ring?

Thanks

xiaoming Ni




^ permalink raw reply

* RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Andy Duan @ 2019-07-16  2:02 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: David S . Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190715210512.15823-1-TheSven73@gmail.com>

From: Sven Van Asbroeck <thesven73@gmail.com> Sent: Tuesday, July 16, 2019 5:05 AM
> The current fec driver allows the PHY to be reset via a gpio, specified in the
> devicetree. However, some PHYs need to be reset in a more complex way.
> 
> To accommodate such PHYs, allow an optional reset controller in the fec
> devicetree. If no reset controller is found, the fec will fall back to the legacy
> reset behaviour.
> 
> Example:
> &fec {
>         phy-mode = "rgmii";
>         resets = <&phy_reset>;
>         reset-names = "phy";
>         status = "okay";
> };
> 
> Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
> ---
> 
> Will send a Documentation patch if this receives a positive review.
> 
>  drivers/net/ethernet/freescale/fec_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 38f10f7dcbc3..5a5f3ed6f16d 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -61,6 +61,7 @@
>  #include <linux/regulator/consumer.h>
>  #include <linux/if_vlan.h>
>  #include <linux/pinctrl/consumer.h>
> +#include <linux/reset.h>
>  #include <linux/prefetch.h>
>  #include <soc/imx/cpuidle.h>
> 
> @@ -3335,6 +3336,7 @@ static int fec_enet_get_irq_cnt(struct
> platform_device *pdev)  static int  fec_probe(struct platform_device *pdev)
> {
> +       struct reset_control *phy_reset;
>         struct fec_enet_private *fep;
>         struct fec_platform_data *pdata;
>         struct net_device *ndev;
> @@ -3490,7 +3492,9 @@ fec_probe(struct platform_device *pdev)
>         pm_runtime_set_active(&pdev->dev);
>         pm_runtime_enable(&pdev->dev);
> 
> -       ret = fec_reset_phy(pdev);
> +       phy_reset = devm_reset_control_get_exclusive(&pdev->dev,
> "phy");
> +       ret = IS_ERR(phy_reset) ? fec_reset_phy(pdev) :
> +                       reset_control_reset(phy_reset);
>         if (ret)
>                 goto failed_reset;

The patch looks fine.
But the reset mechanism in the driver should be abandoned since
the phylib already can handle mii bus reset and phy device reset like
below piece code:

of_mdiobus_register()
    of_mdiobus_register_phy()
        phy_device_register()
            mdiobus_register_device()
                /* Assert the reset signal */
                mdio_device_reset(mdiodev, 1);
            /* Deassert the reset signal */
            mdio_device_reset(&phydev->mdio, 0);

dts:
        ethernet-phy@0 {
            compatible = "ethernet-phy-id0141.0e90", "ethernet-phy-ieee802.3-c45";
            interrupt-parent = <&PIC>;
            interrupts = <35 1>;
            reg = <0>;

            resets = <&rst 8>;
            reset-names = "phy";
            reset-gpios = <&gpio1 4 1>;
            reset-assert-us = <1000>;
            reset-deassert-us = <2000>;
        };
    };
            
Please refer to doc:
Documentation/devicetree/bindings/net/ethernet-phy.yaml
> 
> --
> 2.17.1


^ permalink raw reply

* Re: [PATCH net-next v2 0/7] net/rds: RDMA fixes
From: David Miller @ 2019-07-16  2:05 UTC (permalink / raw)
  To: gerd.rausch; +Cc: santosh.shilimkar, netdev
In-Reply-To: <510cd678-67d6-bd53-1d8e-7a74c4efb14a@oracle.com>


net-next is closed, and why are you submitting bug fixes for net-next
when 'net' is the appropriate tree to target for that purpose?

^ permalink raw reply

* [PATCH v2] net/netfilter: remove unnecessary spaces
From: yangxingwu @ 2019-07-16  2:13 UTC (permalink / raw)
  To: pablo
  Cc: wensong, horms, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, joe, yangxingwu
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

this patch removes extra spaces

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
---
 net/netfilter/ipset/ip_set_hash_gen.h  | 2 +-
 net/netfilter/ipset/ip_set_list_set.c  | 2 +-
 net/netfilter/ipvs/ip_vs_core.c        | 2 +-
 net/netfilter/ipvs/ip_vs_mh.c          | 4 ++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c   | 2 +-
 net/netfilter/nf_conntrack_ftp.c       | 2 +-
 net/netfilter/nf_conntrack_proto_tcp.c | 2 +-
 net/netfilter/nfnetlink_log.c          | 4 ++--
 net/netfilter/nfnetlink_queue.c        | 4 ++--
 net/netfilter/xt_IDLETIMER.c           | 2 +-
 10 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 10f6196..eb907d2 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -954,7 +954,7 @@ struct htype {
 		mtype_data_netmask(d, NCIDR_GET(h->nets[j].cidr[0]));
 #endif
 		key = HKEY(d, h->initval, t->htable_bits);
-		n =  rcu_dereference_bh(hbucket(t, key));
+		n = rcu_dereference_bh(hbucket(t, key));
 		if (!n)
 			continue;
 		for (i = 0; i < n->pos; i++) {
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index 8ada318..5c2be76 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -289,7 +289,7 @@ struct list_set {
 	if (n &&
 	    !(SET_WITH_TIMEOUT(set) &&
 	      ip_set_timeout_expired(ext_timeout(n, set))))
-		n =  NULL;
+		n = NULL;
 
 	e = kzalloc(set->dsize, GFP_ATOMIC);
 	if (!e)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 7138556..6b3ae76 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -615,7 +615,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
 				      iph->protocol == IPPROTO_UDP) ?
 				      IP_VS_CONN_F_ONE_PACKET : 0;
-		union nf_inet_addr daddr =  { .all = { 0, 0, 0, 0 } };
+		union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } };
 
 		/* create a new connection entry */
 		IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..da0280c 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
 		return 0;
 	}
 
-	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-			 sizeof(unsigned long), GFP_KERNEL);
+	table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+			sizeof(unsigned long), GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 915ac82..c7b46a9 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -710,7 +710,7 @@ static int __ip_vs_tcp_init(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd
 							sizeof(tcp_timeouts));
 	if (!pd->timeout_table)
 		return -ENOMEM;
-	pd->tcp_state_table =  tcp_states;
+	pd->tcp_state_table = tcp_states;
 	return 0;
 }
 
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 8c6c11b..26c1ff8 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -162,7 +162,7 @@ static int try_rfc959(const char *data, size_t dlen,
 	if (length == 0)
 		return 0;
 
-	cmd->u3.ip =  htonl((array[0] << 24) | (array[1] << 16) |
+	cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16) |
 				    (array[2] << 8) | array[3]);
 	cmd->u.tcp.port = htons((array[4] << 8) | array[5]);
 	return length;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 1e2cc83..48f3a67 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1225,7 +1225,7 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
 	[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL] = { .type = NLA_U8 },
 	[CTA_PROTOINFO_TCP_WSCALE_REPLY]    = { .type = NLA_U8 },
 	[CTA_PROTOINFO_TCP_FLAGS_ORIGINAL]  = { .len = sizeof(struct nf_ct_tcp_flags) },
-	[CTA_PROTOINFO_TCP_FLAGS_REPLY]	    = { .len =  sizeof(struct nf_ct_tcp_flags) },
+	[CTA_PROTOINFO_TCP_FLAGS_REPLY]	    = { .len = sizeof(struct nf_ct_tcp_flags) },
 };
 
 #define TCP_NLATTR_SIZE	( \
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 6dee4f9..d69e186 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -651,7 +651,7 @@ static void nfulnl_instance_free_rcu(struct rcu_head *head)
 	/* FIXME: do we want to make the size calculation conditional based on
 	 * what is actually present?  way more branches and checks, but more
 	 * memory efficient... */
-	size =    nlmsg_total_size(sizeof(struct nfgenmsg))
+	size = nlmsg_total_size(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hdr))
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
@@ -668,7 +668,7 @@ static void nfulnl_instance_free_rcu(struct rcu_head *head)
 		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
-		size +=   nla_total_size(skb->dev->hard_header_len)
+		size += nla_total_size(skb->dev->hard_header_len)
 			+ nla_total_size(sizeof(u_int16_t))	/* hwtype */
 			+ nla_total_size(sizeof(u_int16_t));	/* hwlen */
 	}
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 89750f7..a1ef6e3 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -394,7 +394,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, struct sk_buff *skb)
 	char *secdata = NULL;
 	u32 seclen = 0;
 
-	size =    nlmsg_total_size(sizeof(struct nfgenmsg))
+	size = nlmsg_total_size(sizeof(struct nfgenmsg))
 		+ nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
 		+ nla_total_size(sizeof(u_int32_t))	/* ifindex */
@@ -453,7 +453,7 @@ static int nfqnl_put_bridge(struct nf_queue_entry *entry, struct sk_buff *skb)
 	}
 
 	if (queue->flags & NFQA_CFG_F_UID_GID) {
-		size +=  (nla_total_size(sizeof(u_int32_t))	/* uid */
+		size += (nla_total_size(sizeof(u_int32_t))	/* uid */
 			+ nla_total_size(sizeof(u_int32_t)));	/* gid */
 	}
 
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
index 9cec9ea..f56d3ed 100644
--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -283,7 +283,7 @@ static int __init idletimer_tg_init(void)
 
 	idletimer_tg_kobj = &idletimer_tg_device->kobj;
 
-	err =  xt_register_target(&idletimer_tg);
+	err = xt_register_target(&idletimer_tg);
 	if (err < 0) {
 		pr_debug("couldn't register xt target\n");
 		goto out_dev;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] ipvs: remove unnecessary space
From: yangxingwu @ 2019-07-16  2:18 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190715082747.fdlpvekbqyhwx724@salvia>

Pablo

v2 has been sent

I made the following changes:

1. remove all unnecessary spaces in one go
2. revert bitmap_alloc ( since it's irrelevant to this subject)
3. chenge subject to "net/netfiler:remove unnecessary space"

thanks

Pablo Neira Ayuso <pablo@netfilter.org> 于2019年7月15日周一 下午4:27写道:
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > >             return 0;
> > >     }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -                    sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +                   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
>         git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.

^ permalink raw reply

* [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: Hariprasad Kelam @ 2019-07-16  2:20 UTC (permalink / raw)
  To: Vlad Yasevich, Neil Horman, Marcelo Ricardo Leitner,
	David S. Miller, linux-sctp, netdev, linux-kernel

This patch removes NULL checks before calling kfree.

fixes below issues reported by coccicheck
net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
freeing functions is not needed.

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
---
 net/sctp/sm_make_chunk.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ed39396..36bd8a6e 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 	case SCTP_PARAM_STATE_COOKIE:
 		asoc->peer.cookie_len =
 			ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
-		if (asoc->peer.cookie)
-			kfree(asoc->peer.cookie);
+		kfree(asoc->peer.cookie);
 		asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp);
 		if (!asoc->peer.cookie)
 			retval = 0;
@@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 			goto fall_through;
 
 		/* Save peer's random parameter */
-		if (asoc->peer.peer_random)
-			kfree(asoc->peer.peer_random);
+		kfree(asoc->peer.peer_random);
 		asoc->peer.peer_random = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_random) {
@@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 			goto fall_through;
 
 		/* Save peer's HMAC list */
-		if (asoc->peer.peer_hmacs)
-			kfree(asoc->peer.peer_hmacs);
+		kfree(asoc->peer.peer_hmacs);
 		asoc->peer.peer_hmacs = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_hmacs) {
@@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association *asoc,
 		if (!ep->auth_enable)
 			goto fall_through;
 
-		if (asoc->peer.peer_chunks)
-			kfree(asoc->peer.peer_chunks);
+		kfree(asoc->peer.peer_chunks);
 		asoc->peer.peer_chunks = kmemdup(param.p,
 					    ntohs(param.p->length), gfp);
 		if (!asoc->peer.peer_chunks)
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH net-next v2 0/7] net/rds: RDMA fixes
From: Gerd Rausch @ 2019-07-16  2:24 UTC (permalink / raw)
  To: David Miller; +Cc: santosh.shilimkar, netdev
In-Reply-To: <20190715.190547.2251732138126894888.davem@davemloft.net>

Hi David,

This was a followup to the patch-series that I had already sent.
I'll re-write the Subject-prefix and re-submit it for "net".

Sorry for the noise,

  Gerd

On 15/07/2019 19.05, David Miller wrote:
> 
> net-next is closed, and why are you submitting bug fixes for net-next
> when 'net' is the appropriate tree to target for that purpose?
> 

^ permalink raw reply

* [PATCH] qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/
From: Benjamin Poirier @ 2019-07-16  2:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Manish Chopra, GR-Linux-NIC-Dev, netdev, David Miller

The hardware has been declared EOL by the vendor more than 5 years ago.
What's more relevant to the Linux kernel is that the quality of this driver
is not on par with many other mainline drivers.

Cc: Manish Chopra <manishc@marvell.com>
Message-id: <20190617074858.32467-1-bpoirier@suse.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 MAINTAINERS                                   |  2 +-
 drivers/net/ethernet/qlogic/Kconfig           |  9 ----
 drivers/net/ethernet/qlogic/Makefile          |  1 -
 drivers/staging/Kconfig                       |  2 +
 drivers/staging/Makefile                      |  1 +
 drivers/staging/qlge/Kconfig                  | 10 +++++
 .../ethernet/qlogic => staging}/qlge/Makefile |  0
 drivers/staging/qlge/TODO                     | 41 +++++++++++++++++++
 .../ethernet/qlogic => staging}/qlge/qlge.h   |  0
 .../qlogic => staging}/qlge/qlge_dbg.c        |  0
 .../qlogic => staging}/qlge/qlge_ethtool.c    |  0
 .../qlogic => staging}/qlge/qlge_main.c       |  0
 .../qlogic => staging}/qlge/qlge_mpi.c        |  0
 13 files changed, 55 insertions(+), 11 deletions(-)
 create mode 100644 drivers/staging/qlge/Kconfig
 rename drivers/{net/ethernet/qlogic => staging}/qlge/Makefile (100%)
 create mode 100644 drivers/staging/qlge/TODO
 rename drivers/{net/ethernet/qlogic => staging}/qlge/qlge.h (100%)
 rename drivers/{net/ethernet/qlogic => staging}/qlge/qlge_dbg.c (100%)
 rename drivers/{net/ethernet/qlogic => staging}/qlge/qlge_ethtool.c (100%)
 rename drivers/{net/ethernet/qlogic => staging}/qlge/qlge_main.c (100%)
 rename drivers/{net/ethernet/qlogic => staging}/qlge/qlge_mpi.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..7347bbf97f66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13145,7 +13145,7 @@ M:	Manish Chopra <manishc@marvell.com>
 M:	GR-Linux-NIC-Dev@marvell.com
 L:	netdev@vger.kernel.org
 S:	Supported
-F:	drivers/net/ethernet/qlogic/qlge/
+F:	drivers/staging/qlge/
 
 QM1D1B0004 MEDIA DRIVER
 M:	Akihiro Tsukada <tskd08@gmail.com>
diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index a391cf6ee4b2..55a29ec76680 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -66,15 +66,6 @@ config QLCNIC_HWMON
 
 	  This data is available via the hwmon sysfs interface.
 
-config QLGE
-	tristate "QLogic QLGE 10Gb Ethernet Driver Support"
-	depends on PCI
-	---help---
-	  This driver supports QLogic ISP8XXX 10Gb Ethernet cards.
-
-	  To compile this driver as a module, choose M here: the module
-	  will be called qlge.
-
 config NETXEN_NIC
 	tristate "NetXen Multi port (1/10) Gigabit Ethernet NIC"
 	depends on PCI
diff --git a/drivers/net/ethernet/qlogic/Makefile b/drivers/net/ethernet/qlogic/Makefile
index 6cd2e333a5fc..1ae4a0743bd5 100644
--- a/drivers/net/ethernet/qlogic/Makefile
+++ b/drivers/net/ethernet/qlogic/Makefile
@@ -5,7 +5,6 @@
 
 obj-$(CONFIG_QLA3XXX) += qla3xxx.o
 obj-$(CONFIG_QLCNIC) += qlcnic/
-obj-$(CONFIG_QLGE) += qlge/
 obj-$(CONFIG_NETXEN_NIC) += netxen/
 obj-$(CONFIG_QED) += qed/
 obj-$(CONFIG_QEDE)+= qede/
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 7c96a01eef6c..0b8a614be11e 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -120,4 +120,6 @@ source "drivers/staging/kpc2000/Kconfig"
 
 source "drivers/staging/isdn/Kconfig"
 
+source "drivers/staging/qlge/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index fcaac9693b83..741152511a10 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -50,3 +50,4 @@ obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_FIELDBUS_DEV)     += fieldbus/
 obj-$(CONFIG_KPC2000)		+= kpc2000/
 obj-$(CONFIG_ISDN_CAPI)		+= isdn/
+obj-$(CONFIG_QLGE)		+= qlge/
diff --git a/drivers/staging/qlge/Kconfig b/drivers/staging/qlge/Kconfig
new file mode 100644
index 000000000000..ae9ed2c5300b
--- /dev/null
+++ b/drivers/staging/qlge/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config QLGE
+	tristate "QLogic QLGE 10Gb Ethernet Driver Support"
+	depends on PCI
+	help
+	This driver supports QLogic ISP8XXX 10Gb Ethernet cards.
+
+	To compile this driver as a module, choose M here. The module will be
+	called qlge.
diff --git a/drivers/net/ethernet/qlogic/qlge/Makefile b/drivers/staging/qlge/Makefile
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/Makefile
rename to drivers/staging/qlge/Makefile
diff --git a/drivers/staging/qlge/TODO b/drivers/staging/qlge/TODO
new file mode 100644
index 000000000000..d17d6399d86f
--- /dev/null
+++ b/drivers/staging/qlge/TODO
@@ -0,0 +1,41 @@
+* commit 7c734359d350 ("qlge: Size RX buffers based on MTU.", v2.6.33-rc1)
+  introduced dead code in the receive routines, which should be rewritten
+  anyways by the admission of the author himself, see the comment above
+  ql_build_rx_skb(). That function is now used to handle packets that
+  underwent header splitting but it still contains code to handle non split
+  cases.
+* truesize accounting is incorrect (ex: a 9000B frame has skb->truesize 10280
+  while containing two frags of order-1 allocations, ie. >16K)
+* while in that area, using 8k buffers for 9k frames seems to be an especially
+  poor choice...
+* in the "chain of large buffers" case, the driver uses an skb allocated with
+  head room but only puts data in the frags.
+* rename "rx" queues to "completion" queues. Calling tx completion queues "rx
+  queues" is confusing.
+* struct rx_ring is used for rx and tx completions, with some members relevant
+  to one case only
+* there is an inordinate amount of disparate debugging code, most of which is
+  of questionable value. In particular, qlge_dbg.c has hundreds of lines of
+  code bitrotting away in ifdef land (doesn't compile since commit
+  18c49b91777c ("qlge: do vlan cleanup", v3.1-rc1), 8 years ago).
+* triggering an ethtool regdump will instead hexdump a 176k struct to dmesg
+  depending on some module parameters.
+* the flow control implementation in firmware is buggy, disable it by default
+* some structures are initialized redundantly (ex. memset 0 after
+  alloc_etherdev)
+* the driver has a habit of using runtime checks where compile time checks are
+  possible (ex. ql_free_rx_buffers(), ql_alloc_rx_buffers())
+* reorder struct members to avoid holes if it doesn't impact performance
+* in terms of namespace, the driver uses either qlge_, ql_ (used by
+  other qlogic drivers, with clashes, ex: ql_sem_spinlock) or nothing (with
+  clashes, ex: struct ob_mac_iocb_req). Rename everything to use the "qlge_"
+  prefix.
+* avoid legacy/deprecated apis (ex. replace pci_dma_*, replace pci_enable_msi,
+  use pci_iomap)
+* some "while" loops could be rewritten with simple "for", ex.
+  ql_wait_reg_rdy(), ql_start_rx_ring())
+* remove duplicate and useless comments
+* fix weird line wrapping (all over, ex. the ql_set_routing_reg() calls in
+  qlge_set_multicast_list()).
+* fix weird indentation (all over, ex. the for loops in qlge_get_stats())
+* fix checkpatch issues
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/staging/qlge/qlge.h
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/qlge.h
rename to drivers/staging/qlge/qlge.h
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c b/drivers/staging/qlge/qlge_dbg.c
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
rename to drivers/staging/qlge/qlge_dbg.c
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/staging/qlge/qlge_ethtool.c
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
rename to drivers/staging/qlge/qlge_ethtool.c
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/staging/qlge/qlge_main.c
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/qlge_main.c
rename to drivers/staging/qlge/qlge_main.c
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_mpi.c b/drivers/staging/qlge/qlge_mpi.c
similarity index 100%
rename from drivers/net/ethernet/qlogic/qlge/qlge_mpi.c
rename to drivers/staging/qlge/qlge_mpi.c
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: Marcelo Ricardo Leitner @ 2019-07-16  2:51 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: Vlad Yasevich, Neil Horman, David S. Miller, linux-sctp, netdev,
	linux-kernel
In-Reply-To: <20190716022002.GA19592@hari-Inspiron-1545>

On Tue, Jul 16, 2019 at 07:50:02AM +0530, Hariprasad Kelam wrote:
> This patch removes NULL checks before calling kfree.
> 
> fixes below issues reported by coccicheck
> net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/sm_make_chunk.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index ed39396..36bd8a6e 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  	case SCTP_PARAM_STATE_COOKIE:
>  		asoc->peer.cookie_len =
>  			ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
> -		if (asoc->peer.cookie)
> -			kfree(asoc->peer.cookie);
> +		kfree(asoc->peer.cookie);
>  		asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp);
>  		if (!asoc->peer.cookie)
>  			retval = 0;
> @@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  			goto fall_through;
>  
>  		/* Save peer's random parameter */
> -		if (asoc->peer.peer_random)
> -			kfree(asoc->peer.peer_random);
> +		kfree(asoc->peer.peer_random);
>  		asoc->peer.peer_random = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_random) {
> @@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  			goto fall_through;
>  
>  		/* Save peer's HMAC list */
> -		if (asoc->peer.peer_hmacs)
> -			kfree(asoc->peer.peer_hmacs);
> +		kfree(asoc->peer.peer_hmacs);
>  		asoc->peer.peer_hmacs = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_hmacs) {
> @@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association *asoc,
>  		if (!ep->auth_enable)
>  			goto fall_through;
>  
> -		if (asoc->peer.peer_chunks)
> -			kfree(asoc->peer.peer_chunks);
> +		kfree(asoc->peer.peer_chunks);
>  		asoc->peer.peer_chunks = kmemdup(param.p,
>  					    ntohs(param.p->length), gfp);
>  		if (!asoc->peer.peer_chunks)
> -- 
> 2.7.4
> 

^ permalink raw reply

* [PATCH bpf] libbpf: fix another GCC8 warning for strncpy
From: Andrii Nakryiko @ 2019-07-16  3:57 UTC (permalink / raw)
  To: ast, daniel, bpf, netdev
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Magnus Karlsson

Similar issue was fixed in cdfc7f888c2a ("libbpf: fix GCC8 warning for
strncpy") already. This one was missed. Fixing now.

Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/xsk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index b33740221b7e..5007b5d4fd2c 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -517,7 +517,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		err = -errno;
 		goto out_socket;
 	}
-	strncpy(xsk->ifname, ifname, IFNAMSIZ);
+	strncpy(xsk->ifname, ifname, IFNAMSIZ - 1);
+	xsk->ifname[IFNAMSIZ - 1] = '\0';
 
 	err = xsk_set_xdp_socket_config(&xsk->config, usr_config);
 	if (err)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH 7/9] x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
From: Joel Fernandes @ 2019-07-16  4:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, Alexey Kuznetsov, Borislav Petkov, c0d1n61at3,
	David S. Miller, edumazet, Greg Kroah-Hartman, Hideaki YOSHIFUJI,
	H. Peter Anvin, Ingo Molnar, Jonathan Corbet, Josh Triplett,
	keescook, kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Oleg Nesterov, Paul E. McKenney, Pavel Machek,
	peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu, Steven Rostedt,
	Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715200235.GG46935@google.com>

On Mon, Jul 15, 2019 at 03:02:35PM -0500, Bjorn Helgaas wrote:
> On Mon, Jul 15, 2019 at 10:37:03AM -0400, Joel Fernandes (Google) wrote:
> > The pcm_mmcfg_list is traversed with list_for_each_entry_rcu without a
> > reader-lock held, because the pci_mmcfg_lock is already held. Make this
> > known to the list macro so that it fixes new lockdep warnings that
> > trigger due to lockdep checks added to list_for_each_entry_rcu().
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Ingo takes care of most patches to this file, but FWIW,
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thanks.

> I would personally prefer if you capitalized the subject to match the
> "x86/PCI:" convention that's used fairly consistently in
> arch/x86/pci/.
> 
> Also, I didn't apply this to be sure, but it looks like this might
> make a line or two wider than 80 columns, which I would rewrap if I
> were applying this.

Updated below is the patch with the nits corrected:

---8<-----------------------

From 73fab09d7e33ca2110c24215f8ed428c12625dbe Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Sat, 1 Jun 2019 15:05:49 -0400
Subject: [PATCH] x86/PCI: Pass lockdep condition to pcm_mmcfg_list iterator
 (v1)

The pcm_mmcfg_list is traversed with list_for_each_entry_rcu without a
reader-lock held, because the pci_mmcfg_lock is already held. Make this
known to the list macro so that it fixes new lockdep warnings that
trigger due to lockdep checks added to list_for_each_entry_rcu().

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/x86/pci/mmconfig-shared.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 7389db538c30..9e3250ec5a37 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -29,6 +29,7 @@
 static bool pci_mmcfg_running_state;
 static bool pci_mmcfg_arch_init_failed;
 static DEFINE_MUTEX(pci_mmcfg_lock);
+#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
 
 LIST_HEAD(pci_mmcfg_list);
 
@@ -54,7 +55,8 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
 	struct pci_mmcfg_region *cfg;
 
 	/* keep list sorted by segment and starting bus number */
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list,
+				pci_mmcfg_lock_held()) {
 		if (cfg->segment > new->segment ||
 		    (cfg->segment == new->segment &&
 		     cfg->start_bus >= new->start_bus)) {
@@ -118,7 +120,8 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
 {
 	struct pci_mmcfg_region *cfg;
 
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list
+				pci_mmcfg_lock_held())
 		if (cfg->segment == segment &&
 		    cfg->start_bus <= bus && bus <= cfg->end_bus)
 			return cfg;
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox