Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH bpf-next v4 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 19:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130194811.239760-1-posk@google.com>

This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).

This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 include/net/lwtunnel.h |  3 +++
 net/core/filter.c      |  3 ++-
 net/core/lwt_bpf.c     | 59 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 33fd9ba7e0e5..f0973eca8036 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -126,6 +126,8 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
 int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb);
 int lwtunnel_input(struct sk_buff *skb);
 int lwtunnel_xmit(struct sk_buff *skb);
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+			  bool ingress);
 
 static inline void lwtunnel_set_redirect(struct dst_entry *dst)
 {
@@ -138,6 +140,7 @@ static inline void lwtunnel_set_redirect(struct dst_entry *dst)
 		dst->input = lwtunnel_input;
 	}
 }
+
 #else
 
 static inline void lwtstate_free(struct lwtunnel_state *lws)
diff --git a/net/core/filter.c b/net/core/filter.c
index 27d3fbe4b77b..de6bd4b4e0a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -73,6 +73,7 @@
 #include <linux/seg6_local.h>
 #include <net/seg6.h>
 #include <net/seg6_local.h>
+#include <net/lwtunnel.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -4804,7 +4805,7 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
 static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
 			     bool ingress)
 {
-	return -EINVAL;  /* Implemented in the next patch. */
+	return bpf_lwt_push_ip_encap(skb, hdr, len, ingress);
 }
 
 BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index a648568c5e8f..6a6e9acab73d 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -390,6 +390,65 @@ static const struct lwtunnel_encap_ops bpf_encap_ops = {
 	.owner		= THIS_MODULE,
 };
 
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+{
+	struct iphdr *iph;
+	bool ipv4;
+	int err;
+
+	if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
+		return -EINVAL;
+
+	/* validate protocol and length */
+	iph = (struct iphdr *)hdr;
+	if (iph->version == 4) {
+		ipv4 = true;
+		if (unlikely(len < iph->ihl * 4))
+			return -EINVAL;
+	} else if (iph->version == 6) {
+		ipv4 = false;
+		if (unlikely(len < sizeof(struct ipv6hdr)))
+			return -EINVAL;
+	} else {
+		return -EINVAL;
+	}
+
+	if (ingress)
+		err = skb_cow_head(skb, len + skb->mac_len);
+	else
+		err = skb_cow_head(skb,
+				   len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
+	if (unlikely(err))
+		return err;
+
+	/* push the encap headers and fix pointers */
+	skb_reset_inner_headers(skb);
+	skb->encapsulation = 1;
+	skb_push(skb, len);
+	if (ingress)
+		skb_postpush_rcsum(skb, iph, len);
+	skb_reset_network_header(skb);
+	memcpy(skb_network_header(skb), hdr, len);
+	bpf_compute_data_pointers(skb);
+
+	if (ipv4) {
+		skb->protocol = htons(ETH_P_IP);
+		iph = ip_hdr(skb);
+		if (iph->ihl * 4 < len)
+			skb_set_transport_header(skb, iph->ihl * 4);
+
+		if (!iph->check)
+			iph->check = ip_fast_csum((unsigned char *)iph,
+						  iph->ihl);
+	} else {
+		skb->protocol = htons(ETH_P_IPV6);
+		if (sizeof(struct ipv6hdr) < len)
+			skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+	}
+
+	return 0;
+}
+
 static int __init bpf_lwt_init(void)
 {
 	return lwtunnel_encap_add_ops(&bpf_encap_ops, LWTUNNEL_ENCAP_BPF);
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* [PATCH bpf-next v4 1/5] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 19:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130194811.239760-1-posk@google.com>

This patch adds all needed plumbing in preparation to allowing
bpf programs to do IP encapping via bpf_lwt_push_encap. Actual
implementation is added in the next patch in the patchset.

Of note:
- bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
  prog types in addition to BPF_PROG_TYPE_LWT_IN;
- as route lookups are different for ingress vs egress, the single
  external bpf_lwt_push_encap BPF helper is routed internally to
  either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
  depending on prog type.

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 include/uapi/linux/bpf.h | 23 ++++++++++++++++++--
 net/core/filter.c        | 46 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 60b99b730a41..911c15585fab 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2015,6 +2015,16 @@ union bpf_attr {
  *			Only works if *skb* contains an IPv6 packet. Insert a
  *			Segment Routing Header (**struct ipv6_sr_hdr**) inside
  *			the IPv6 header.
+ *		**BPF_LWT_ENCAP_IP**
+ *			IP encapsulation (GRE/GUE/IPIP/etc). The outer header
+ *			must be IPv4 or IPv6, followed by zero or more
+ *			additional headers, up to LWT_BPF_MAX_HEADROOM total
+ *			bytes in all prepended headers.
+ *
+ *		BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of
+ *		type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called
+ *		by bpf programs of types BPF_PROG_TYPE_LWT_IN and
+ *		BPF_PROG_TYPE_LWT_XMIT.
  *
  * 		A call to this helper is susceptible to change the underlaying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -2495,7 +2505,8 @@ enum bpf_hdr_start_off {
 /* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
 enum bpf_lwt_encap_mode {
 	BPF_LWT_ENCAP_SEG6,
-	BPF_LWT_ENCAP_SEG6_INLINE
+	BPF_LWT_ENCAP_SEG6_INLINE,
+	BPF_LWT_ENCAP_IP,
 };
 
 #define __bpf_md_ptr(type, name)	\
@@ -2583,7 +2594,15 @@ enum bpf_ret_code {
 	BPF_DROP = 2,
 	/* 3-6 reserved */
 	BPF_REDIRECT = 7,
-	/* >127 are reserved for prog type specific return codes */
+	/* >127 are reserved for prog type specific return codes.
+	 *
+	 * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
+	 *    BPF_PROG_TYPE_LWT_XMIT to indicate that skb had been
+	 *    changed and should be routed based on its new L3 header.
+	 *    (This is an L3 redirect, as opposed to L2 redirect
+	 *    represented by BPF_REDIRECT above).
+	 */
+	BPF_LWT_REROUTE = 128,
 };
 
 struct bpf_sock {
diff --git a/net/core/filter.c b/net/core/filter.c
index 41984ad4b9b4..27d3fbe4b77b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4801,7 +4801,13 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
 }
 #endif /* CONFIG_IPV6_SEG6_BPF */
 
-BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
+static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+			     bool ingress)
+{
+	return -EINVAL;  /* Implemented in the next patch. */
+}
+
+BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
 	   u32, len)
 {
 	switch (type) {
@@ -4809,14 +4815,41 @@ BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
 	case BPF_LWT_ENCAP_SEG6:
 	case BPF_LWT_ENCAP_SEG6_INLINE:
 		return bpf_push_seg6_encap(skb, type, hdr, len);
+#endif
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+	case BPF_LWT_ENCAP_IP:
+		return bpf_push_ip_encap(skb, hdr, len, true /* ingress */);
 #endif
 	default:
 		return -EINVAL;
 	}
 }
 
-static const struct bpf_func_proto bpf_lwt_push_encap_proto = {
-	.func		= bpf_lwt_push_encap,
+BPF_CALL_4(bpf_lwt_xmit_push_encap, struct sk_buff *, skb, u32, type,
+	   void *, hdr, u32, len)
+{
+	switch (type) {
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+	case BPF_LWT_ENCAP_IP:
+		return bpf_push_ip_encap(skb, hdr, len, false /* egress */);
+#endif
+	default:
+		return -EINVAL;
+	}
+}
+
+static const struct bpf_func_proto bpf_lwt_in_push_encap_proto = {
+	.func		= bpf_lwt_in_push_encap,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM,
+	.arg4_type	= ARG_CONST_SIZE
+};
+
+static const struct bpf_func_proto bpf_lwt_xmit_push_encap_proto = {
+	.func		= bpf_lwt_xmit_push_encap,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
@@ -5282,7 +5315,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 	    func == bpf_lwt_seg6_adjust_srh ||
 	    func == bpf_lwt_seg6_action ||
 #endif
-	    func == bpf_lwt_push_encap)
+	    func == bpf_lwt_in_push_encap ||
+	    func == bpf_lwt_xmit_push_encap)
 		return true;
 
 	return false;
@@ -5660,7 +5694,7 @@ lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_lwt_push_encap:
-		return &bpf_lwt_push_encap_proto;
+		return &bpf_lwt_in_push_encap_proto;
 	default:
 		return lwt_out_func_proto(func_id, prog);
 	}
@@ -5696,6 +5730,8 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_l4_csum_replace_proto;
 	case BPF_FUNC_set_hash_invalid:
 		return &bpf_set_hash_invalid_proto;
+	case BPF_FUNC_lwt_push_encap:
+		return &bpf_lwt_xmit_push_encap_proto;
 	default:
 		return lwt_out_func_proto(func_id, prog);
 	}
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* [PATCH bpf-next v4 0/5] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 19:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov, David Ahern, Peter Oskolkov

This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).

This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).

V2 changes: Added flowi-based route lookup, IPv6 encapping, and
encapping on ingress.

V3 changes: incorporated David Ahern's suggestions:
  - added l3mdev check/oif (patch 2)
  - sync bpf.h from include/uapi into tools/include/uapi
  - selftest tweaks

V4 changes: moved route lookup/dst change from bpf_push_ip_encap
  to when BPF_LWT_REROUTE is handled, as suggested by David Ahern.

Peter Oskolkov (5):
  bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
  bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
  bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
  bpf: sync <kdir>/<uapi>/bpf.h with tools/<uapi>/bpf.h
  selftests: bpf: add test_lwt_ip_encap selftest

 include/net/lwtunnel.h                        |   3 +
 include/uapi/linux/bpf.h                      |  23 +-
 net/core/filter.c                             |  47 ++-
 net/core/lwt_bpf.c                            | 175 ++++++++++
 tools/include/uapi/linux/bpf.h                |  23 +-
 tools/testing/selftests/bpf/Makefile          |   5 +-
 .../testing/selftests/bpf/test_lwt_ip_encap.c |  84 +++++
 .../selftests/bpf/test_lwt_ip_encap.sh        | 311 ++++++++++++++++++
 8 files changed, 660 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_lwt_ip_encap.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh

-- 
2.20.1.495.gaa96b0ce6b-goog

^ permalink raw reply

* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Peter Zijlstra @ 2019-01-30 19:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team
In-Reply-To: <20190130193241.4gkbc7zuo3m7r3ll@ast-mbp.dhcp.thefacebook.com>

On Wed, Jan 30, 2019 at 11:32:43AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> > 
> > Why do you say this is not possible? All you need is 3 CPUs, one doing a
> > CPU online, one doing a perf ioctl() and one doing that
> > bpf_probe_register().
> 
> yeah. indeed. I'm impressed that lockdep figured it out
> while I missed it manually looking through the code and stack traces.

That's what it does :-)


^ permalink raw reply

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Peter Zijlstra @ 2019-01-30 19:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team, Waiman Long
In-Reply-To: <20190130193040.3hmh6g7efk5z3g2j@ast-mbp.dhcp.thefacebook.com>

On Wed, Jan 30, 2019 at 11:30:41AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> > > Lockdep warns about false positive:
> > 
> > This is not a false positive, and you probably also need to use
> > down_read_non_owner() to match this up_read_non_owner().
> > 
> > {up,down}_read() and {up,down}_read_non_owner() are not only different
> > in the lockdep annotation; there is also optimistic spin stuff that
> > relies on 'owner' tracking.
> 
> Can you point out in the code the spin bit?

Hurmph, looks like you're right. I got lost in that stuff again. I hate
that rwsem code :/

Rewriting that all is on the todo list somewhere, but it's far down :/

> As far as I can see sem->owner is debug only feature.
> All owner checks are done under CONFIG_DEBUG_RWSEMS.
> 
> Also there is no down_read_trylock_non_owner() at the moment.
> We can argue about it for -next, but I'd rather silence lockdep
> with this patch today.

I don't agree with calling is silencing; it fixes an mis-use of the API.
But yes, keep the patch as is for now.

^ permalink raw reply

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Waiman Long @ 2019-01-30 19:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team
In-Reply-To: <20190130193040.3hmh6g7efk5z3g2j@ast-mbp.dhcp.thefacebook.com>

On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>> Lockdep warns about false positive:
>> This is not a false positive, and you probably also need to use
>> down_read_non_owner() to match this up_read_non_owner().
>>
>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>> in the lockdep annotation; there is also optimistic spin stuff that
>> relies on 'owner' tracking.
> Can you point out in the code the spin bit?
> As far as I can see sem->owner is debug only feature.
> All owner checks are done under CONFIG_DEBUG_RWSEMS.

No, sem->owner is mainly for performing optimistic spinning which is a
performance feature to make rwsem writer-lock performs similar to mutex.
The debugging part is just an add-on. It is not the reason for the
presence of sem->owner.

> Also there is no down_read_trylock_non_owner() at the moment.
> We can argue about it for -next, but I'd rather silence lockdep
> with this patch today.
>
We can add down_read_trylock_non_owner() if there is a need for it. It
should be easy to do.

Cheers,
Longman

down_read_trylock_non_owner(


^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net] i40e: fix potential RX buffer starvation for AF_XDP
From: Bowers, AndrewX @ 2019-01-30 19:41 UTC (permalink / raw)
  To: intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <1548770597-16141-1-git-send-email-magnus.karlsson@intel.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Magnus Karlsson
> Sent: Tuesday, January 29, 2019 6:03 AM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net] i40e: fix potential RX buffer starvation
> for AF_XDP
> 
> When the RX rings are created they are also populated with buffers so that
> packets can be received. Usually these are kernel buffers, but for AF_XDP in
> zero-copy mode, these are user-space buffers and in this case the
> application might not have sent down any buffers to the driver at this point.
> And if no buffers are allocated at ring creation time, no packets can be
> received and no interupts will be generated so the napi poll function that
> allocates buffers to the rings will never get executed.
> 
> To rectify this, we kick the NAPI context of any queue with an attached
> AF_XDP zero-copy socket in two places in the code. Once after an XDP
> program has loaded and once after the umem is registered.
> This take care of both cases: XDP program gets loaded first then AF_XDP
> socket is created, and the reverse, AF_XDP socket is created first, then XDP
> program is loaded.
> 
> Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 13 ++++++++++++-
> drivers/net/ethernet/intel/i40e/i40e_xsk.c  |  5 +++++
>  2 files changed, 17 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Bowers, AndrewX @ 2019-01-30 19:41 UTC (permalink / raw)
  To: intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <1548770630-16189-1-git-send-email-magnus.karlsson@intel.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Magnus Karlsson
> Sent: Tuesday, January 29, 2019 6:04 AM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Topel, Bjorn
> <bjorn.topel@intel.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer
> starvation for AF_XDP
> 
> When the RX rings are created they are also populated with buffers so that
> packets can be received. Usually these are kernel buffers, but for AF_XDP in
> zero-copy mode, these are user-space buffers and in this case the
> application might not have sent down any buffers to the driver at this point.
> And if no buffers are allocated at ring creation time, no packets can be
> received and no interupts will be generated so the napi poll function that
> allocates buffers to the rings will never get executed.
> 
> To recitfy this, we kick the NAPI context of any queue with an attached
> AF_XDP zero-copy socket in two places in the code. Once after an XDP
> program has loaded and once after the umem is registered.  This take care of
> both cases: XDP program gets loaded first then AF_XDP socket is created,
> and the reverse, AF_XDP socket is created first, then XDP program is loaded.
> 
> Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++++++++++-
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 12 ++++++++++--
>  2 files changed, 21 insertions(+), 3 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* [PATCH net] dccp: fool proof ccid_hc_[rt]x_parse_options()
From: Eric Dumazet @ 2019-01-30 19:39 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot, Gerrit Renker

Similarly to commit 276bdb82dedb ("dccp: check ccid before dereferencing")
it is wise to test for a NULL ccid.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
 sk_backlog_rcv include/net/sock.h:936 [inline]
 __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
 process_backlog+0x206/0x750 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x76d/0x1930 net/core/dev.c:6412
 __do_softirq+0x30b/0xb11 kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:654 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace 58a0ba03bea2c376 ]---
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccid.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ccid.h b/net/dccp/ccid.h
index 6eb837a47b5c42f8c73c498525fe4d269fc5b97d..baaaeb2b2c4230edee893579d0ae5a755da3b2ee 100644
--- a/net/dccp/ccid.h
+++ b/net/dccp/ccid.h
@@ -202,7 +202,7 @@ static inline void ccid_hc_tx_packet_recv(struct ccid *ccid, struct sock *sk,
 static inline int ccid_hc_tx_parse_options(struct ccid *ccid, struct sock *sk,
 					   u8 pkt, u8 opt, u8 *val, u8 len)
 {
-	if (ccid->ccid_ops->ccid_hc_tx_parse_options == NULL)
+	if (!ccid || !ccid->ccid_ops->ccid_hc_tx_parse_options)
 		return 0;
 	return ccid->ccid_ops->ccid_hc_tx_parse_options(sk, pkt, opt, val, len);
 }
@@ -214,7 +214,7 @@ static inline int ccid_hc_tx_parse_options(struct ccid *ccid, struct sock *sk,
 static inline int ccid_hc_rx_parse_options(struct ccid *ccid, struct sock *sk,
 					   u8 pkt, u8 opt, u8 *val, u8 len)
 {
-	if (ccid->ccid_ops->ccid_hc_rx_parse_options == NULL)
+	if (!ccid || !ccid->ccid_ops->ccid_hc_rx_parse_options)
 		return 0;
 	return ccid->ccid_ops->ccid_hc_rx_parse_options(sk, pkt, opt, val, len);
 }
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Alexei Starovoitov @ 2019-01-30 19:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, jakub.kicinski, netdev,
	kernel-team, mingo, will.deacon, Paul McKenney, jannh
In-Reply-To: <20190130085850.GA2278@hirez.programming.kicks-ass.net>

On Wed, Jan 30, 2019 at 09:58:50AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 06:32:13PM -0800, Alexei Starovoitov wrote:
> > On Tue, Jan 29, 2019 at 10:16:54AM +0100, Peter Zijlstra wrote:
> > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > 
> > > > > Ah, but the loop won't be in the BPF program itself. The BPF program
> > > > > would only have had the BPF_SPIN_LOCK instruction, the JIT them emits
> > > > > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > > > > out-of-line versions of them).
> > > > 
> > > > As I said we considered exactly that and such approach has a lot of downsides
> > > > comparing with the helper approach.
> > > > Pretty much every time new feature is added we're evaluating whether it
> > > > should be new instruction or new helper. 99% of the time we go with new helper.
> > > 
> > > Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
> > > what a helper is.
> > 
> > bpf helper is a normal kernel function that can be called from bpf program.
> > In assembler it's a direct function call.
> 
> Ah, it is what is otherwise known as a standard library,
> 
> > > > > There isn't anything that mandates the JIT uses the exact same locking
> > > > > routines the interpreter does, is there?
> > > > 
> > > > sure. This bpf_spin_lock() helper can be optimized whichever way the kernel wants.
> > > > Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map types.
> > > > JITs don't even need to do anything. It looks like function call from bpf prog
> > > > point of view, but in JITed code it is a sequence of native instructions.
> > > > 
> > > > Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> > > > takes too much time then we can inline fast path of queued_spin_lock
> > > > directly into bpf prog and save function call cost.
> > > 
> > > OK, so then the JIT can optimize helpers. Would it not make sense to
> > > have the simple test-and-set spinlock in the generic code and have the
> > > JITs use arch_spinlock_t where appropriate?
> > 
> > I think that pretty much the same as what I have with qspinlock.
> > Instead of taking a risk how JIT writers implement bpf_spin_lock optimization
> > I'm using qspinlock on architectures that are known to support it.
> 
> I see the argument for it...
> 
> > So instead of starting with dumb test-and-set there will be faster
> > qspinlock from the start on x86, arm64 and few others archs.
> > Those are the archs we care about the most anyway. Other archs can take
> > time to optimize it (if optimizations are necessary at all).
> > In general hacking JITs is much harder and more error prone than
> > changing core and adding helpers. Hence we avoid touching JITs
> > as much as possible.
> 
> So archs/JITs are not trivially able to override those helper functions?
> Because for example ARM (32bit) doesn't do qspinlock but it's
> arch_spinlock_t is _much_ better than a TAS lock.

JITs can override. There is no 'ready to use' facility for all types
of helpers to do that, but it's easy enough to add.
Having said that I'm going to reject arm32 JIT patches that
are trying to use arch_spinlock instead of generic bpf_spin_lock.
The last thing arm32 jit needs is this type of optimization.
Other JITs is a different story.


^ permalink raw reply

* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Alexei Starovoitov @ 2019-01-30 19:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team
In-Reply-To: <20190130101058.GD2278@hirez.programming.kicks-ass.net>

On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> 
> Why do you say this is not possible? All you need is 3 CPUs, one doing a
> CPU online, one doing a perf ioctl() and one doing that
> bpf_probe_register().

yeah. indeed. I'm impressed that lockdep figured it out
while I missed it manually looking through the code and stack traces.
Will reword the commit log.


^ permalink raw reply

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Alexei Starovoitov @ 2019-01-30 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team, Waiman Long
In-Reply-To: <20190130101530.GE2278@hirez.programming.kicks-ass.net>

On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> > Lockdep warns about false positive:
> 
> This is not a false positive, and you probably also need to use
> down_read_non_owner() to match this up_read_non_owner().
> 
> {up,down}_read() and {up,down}_read_non_owner() are not only different
> in the lockdep annotation; there is also optimistic spin stuff that
> relies on 'owner' tracking.

Can you point out in the code the spin bit?
As far as I can see sem->owner is debug only feature.
All owner checks are done under CONFIG_DEBUG_RWSEMS.

Also there is no down_read_trylock_non_owner() at the moment.
We can argue about it for -next, but I'd rather silence lockdep
with this patch today.


^ permalink raw reply

* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
From: Alexei Starovoitov @ 2019-01-30 19:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team
In-Reply-To: <20190130102126.GF2278@hirez.programming.kicks-ass.net>

On Wed, Jan 30, 2019 at 11:21:26AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> > 
> > It has been explained that is a false positive here:
> > https://lkml.org/lkml/2018/7/25/756
> 
> Please, no external references like that. The best option is to fully

I strongly disagree.
We allowed all kinds of external links in bpf tree in the past and
going to continue doing so in the future.
I'm perfectly aware that some of them will go stale in a day or
in a year.


^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Heiner Kallweit @ 2019-01-30 19:15 UTC (permalink / raw)
  To: Peter Ceiley; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <CAMLO_R5E-HyzNNOJeeVKzVPTGkGJkRs6P2A++ia2wK60ptifLQ@mail.gmail.com>

Hi Peter,

recently I had somebody where pcie_aspm=off for whatever reason didn't
do the trick, can you also check with pcie_aspm.policy=performance.

And please check with "ethtool -S <if>" whether the chip statistics
show a significant number of errors.

If this doesn't help you may have to bisect to find the offending commit.

Heiner


On 30.01.2019 10:59, Peter Ceiley wrote:
> Hi Heiner,
> 
> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> and this made no difference.
> 
> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> subsequently loaded the module in the running 4.19.18 kernel. I can
> confirm that this immediately resolved the issue and access to the NFS
> shares operated as expected.
> 
> I presume this means it is an issue with the r8169 driver included in
> 4.19 onwards?
> 
> To answer your last questions:
> 
> Base Board Information
>     Manufacturer: Alienware
>     Product Name: 0PGRP5
>     Version: A02
> 
> ... and yes, the RTL8168 is the onboard network chip.
> 
> Regards,
> 
> Peter.
> 
> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> I think the vendor driver doesn't enable ASPM per default.
>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>> Few older systems seem to have issues with ASPM, what kind of
>> system / mainboard are you using? The RTL8168 is the onboard
>> network chip?
>>
>> Rgds, Heiner
>>
>>
>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> Thanks, I'll do some more testing. It might not be the driver - I
>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>> a good idea.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> at a first glance it doesn't look like a typical driver issue.
>>>> What you could do:
>>>>
>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>
>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>
>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>
>>>> Any specific reason why you think root cause is in the driver and not
>>>> elsewhere in the network subsystem?
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks for getting back to me.
>>>>>
>>>>> No, I don't use jumbo packets.
>>>>>
>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>> establishing a connection and is most notable, for example, on my
>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>> larger directories) to list the contents of each directory. Once a
>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>
>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>> troubleshoot this issue. Running the following
>>>>>
>>>>>     netstat -s |grep retransmitted
>>>>>
>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>> directory containing 345 media files did the following using kernel
>>>>> 4.19.18:
>>>>>
>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>> the following:
>>>>>     real    0m19.867s
>>>>>     user    0m0.012s
>>>>>     sys    0m0.036s
>>>>>
>>>>> The same command shows no retransmitted segments running kernel
>>>>> 4.18.16 and 'time' showed:
>>>>>     real    0m0.300s
>>>>>     user    0m0.004s
>>>>>     sys    0m0.007s
>>>>>
>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>
>>>>> dmesg XID:
>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>
>>>>> # lspci -vv
>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>     Capabilities: [40] Power Management version 3
>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>         Address: 0000000000000000  Data: 0000
>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>> <512ns, L1 <64us
>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>> SlotPowerLimit 10.000W
>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>> Latency L0s unlimited, L1 <64us
>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>> OBFF Via message/WAKE#
>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>> OBFF Disabled
>>>>>              AtomicOpsCtl: ReqEn-
>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>              Transmit Margin: Normal Operating Range,
>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>              Compliance De-emphasis: -6dB
>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>         Vector table: BAR=4 offset=00000000
>>>>>         PBA: BAR=4 offset=00000800
>>>>>     Capabilities: [d0] Vital Product Data
>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>         Not readable
>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>         Status:    InProgress-
>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>             Status:    NegoPending- InProgress-
>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>         Max snoop latency: 71680ns
>>>>>         Max no snoop latency: 71680ns
>>>>>     Kernel driver in use: r8169
>>>>>     Kernel modules: r8169
>>>>>
>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Peter.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>
>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>
>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>> situation.
>>>>>>>
>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>
>>>>>>> lshw shows:
>>>>>>>        description: Ethernet interface
>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>        physical id: 0
>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>        logical name: enp3s0
>>>>>>>        version: 0c
>>>>>>>        serial:
>>>>>>>        size: 1Gbit/s
>>>>>>>        capacity: 1Gbit/s
>>>>>>>        width: 64 bits
>>>>>>>        clock: 33MHz
>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>> 1000bt-fd autonegotiation
>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>
>>>>>> - Can you provide any measurements?
>>>>>> - iperf results before and after
>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>> - Do you use jumbo packets?
>>>>>>
>>>>>> Also help would be a "lspci -vv" output for the network card and
>>>>>> the dmesg output line with the chip XID.
>>>>>>
>>>>>> Heiner
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: Andrew Lunn @ 2019-01-30 19:09 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <c45116c4-ce1d-8a59-dc8b-857247e9c840@bell.net>

> In /proc/interrupts, the switch interrupts are shown as edge.  The only
> place that I see where this
> is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
> request_threaded_irq() passes
> "IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Hi Dave

Device tree determines how the interrupt is configured.  It is the
second part of the interrupts property.

       Andrew

^ permalink raw reply

* Re: [PATCH bpf-next v4 5/7] samples/bpf: Add a "force" flag to XDP samples
From: Jesper Dangaard Brouer @ 2019-01-30 19:08 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: daniel, ast, netdev, jakub.kicinski, brouer
In-Reply-To: <20190130191248.00000167@gmail.com>

On Wed, 30 Jan 2019 19:12:48 +0100
Maciej Fijalkowski <maciejromanfijalkowski@gmail.com> wrote:

> On Tue, 29 Jan 2019 12:34:05 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > On Tue, 29 Jan 2019 09:00:00 +0100
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >   
> > > On Mon, 28 Jan 2019 20:16:11 +0100
> > > Maciej Fijalkowski <maciejromanfijalkowski@gmail.com> wrote:
> > >     
> > > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > 
> > > > Make xdp samples consistent with iproute2 behavior and set the
> > > > XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
> > > > interface. Provide an option for user to force the program loading,
> > > > which as a result will not include the mentioned flag in
> > > > bpf_set_link_xdp_fd call.      
> > > 
> > > I like the idea, but what is the error message users get after this
> > > change?    
> > 
> > $ sudo ./xdp1 mlx5p1 &
> > [1] 9768
> > 
> > $ sudo ./xdp1 mlx5p1 
> > link set xdp fd failed
> > 
> > This error message is a little too generic to be a good user experience.
> > The kernel (in dev_change_xdp_fd) will return errno -EBUSY (-16), but
> > we don't use or report the return value in these sample programs.
> > 
> > If my QA see this error message, I will still get an error report
> > bugzilla that I need to spend time on investigating.  Can we please
> > improve this error message? 
> > 
> > If you are really cool you get inspired by (or use) libbpf_strerror()
> > code avail in tools/lib/bpf/libbpf_errno.c.  Default strerror(EBUSY)
> > will return "Device or resource busy", but maybe we can do slightly
> > better and report something more meaningful for this XDP context.
> >   
> I'll post a v5 with libbpf_strerror() usage when bpf_set_link_xdp_fd failed in
> samples but at this point it will only give us a standard "device or resource
> busy" error

That is a good first iteration improvement.  And if QA complains, I can
quickly diagnose and explain the issue, based on this generic message,
without digging further and wasting more time.

> , so if we need some more meaningful message that libbpf will give
> us then I guess we need to define a new libbpf_errno entry (as well as entry in
> libbpf_strerror_table for this new errno value) and set the errno in
> bpf_set_link_xdp_fd in case of a failure?

It likely require more work do provide more meaningful messages, and I
guess it is out-of-scope for your patchset.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next v2 7/7] ethtool: add compat for devlink info
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

If driver did not fill the fw_version field, try to call into
the new devlink get_info op and collect the versions that way.
We assume ethtool was always reporting running versions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h |  7 ++++++
 net/core/devlink.c    | 52 ++++++++++++++++++++++++++++++++++++++++++-
 net/core/ethtool.c    |  7 ++++++
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index c678ed0cb099..b4750e93303a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -640,6 +640,8 @@ int devlink_info_report_version(struct devlink_info_req *req,
 				enum devlink_version_type type,
 				const char *version_name,
 				const char *version_value);
+void devlink_compat_running_versions(struct net_device *dev,
+				     char *buf, size_t len);
 
 #else
 
@@ -957,6 +959,11 @@ devlink_info_report_version(struct devlink_info_req *req,
 {
 	return 0;
 }
+
+static inline void
+devlink_compat_running_versions(struct net_device *dev, char *buf, size_t len)
+{
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e2027d3a5e40..5313e5918ee2 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3715,12 +3715,18 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 }
 
 struct devlink_info_req {
+	bool compat;
 	struct sk_buff *msg;
+	/* For compat call */
+	char *buf;
+	size_t len;
 };
 
 int devlink_info_report_driver_name(struct devlink_info_req *req,
 				    const char *name)
 {
+	if (req->compat)
+		return 0;
 	return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRV_NAME, name);
 }
 EXPORT_SYMBOL_GPL(devlink_info_report_driver_name);
@@ -3728,6 +3734,8 @@ EXPORT_SYMBOL_GPL(devlink_info_report_driver_name);
 int devlink_info_report_serial_number(struct devlink_info_req *req,
 				      const char *sn)
 {
+	if (req->compat)
+		return 0;
 	return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn);
 }
 EXPORT_SYMBOL_GPL(devlink_info_report_serial_number);
@@ -3743,7 +3751,15 @@ int devlink_info_report_version(struct devlink_info_req *req,
 		[DEVLINK_VERSION_RUNNING] = DEVLINK_ATTR_INFO_VERSION_RUNNING,
 	};
 	struct nlattr *nest;
-	int err;
+	int len, err;
+
+	if (req->compat) {
+		if (type == DEVLINK_VERSION_RUNNING) {
+			len = strlcpy(req->buf, version_value, req->len);
+			req->len = max_t(size_t, 0, req->len - len);
+		}
+		return 0;
+	}
 
 	if (type >= ARRAY_SIZE(type2attr) || !type2attr[type])
 		return -EINVAL;
@@ -3789,6 +3805,7 @@ devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
 	if (devlink_nl_put_handle(msg, devlink))
 		goto err_cancel_msg;
 
+	memset(&req, 0, sizeof(req));
 	req.msg = msg;
 	err = devlink->ops->info_get(devlink, &req, extack);
 	if (err)
@@ -5263,6 +5280,39 @@ int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 }
 EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
 
+void devlink_compat_running_versions(struct net_device *dev,
+				     char *buf, size_t len)
+{
+	struct devlink_port *devlink_port;
+	struct devlink_info_req req;
+	struct devlink *devlink;
+	bool found = false;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(devlink_port, &devlink->port_list, list) {
+			if (devlink_port->type == DEVLINK_PORT_TYPE_ETH ||
+			    devlink_port->type_dev == dev) {
+				mutex_unlock(&devlink->lock);
+				found = true;
+				goto out;
+			}
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	if (found && devlink->ops->info_get) {
+		memset(&req, 0, sizeof(req));
+		req.compat = true;
+		req.buf = buf;
+		req.len = len;
+
+		devlink->ops->info_get(devlink, &req, NULL);
+	}
+	mutex_unlock(&devlink_mutex);
+}
+
 static int __init devlink_module_init(void)
 {
 	return genl_register_family(&devlink_nl_family);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 158264f7cfaf..176b17d11f08 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -27,6 +27,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/sched/signal.h>
 #include <linux/net.h>
+#include <net/devlink.h>
 #include <net/xdp_sock.h>
 
 /*
@@ -803,6 +804,12 @@ static noinline_for_stack int ethtool_get_drvinfo(struct net_device *dev,
 	if (ops->get_eeprom_len)
 		info.eedump_len = ops->get_eeprom_len(dev);
 
+	rtnl_unlock();
+	if (!info.fw_version[0])
+		devlink_compat_running_versions(dev, info.fw_version,
+						ARRAY_SIZE(info.fw_version));
+	rtnl_lock();
+
 	if (copy_to_user(useraddr, &info, sizeof(info)))
 		return -EFAULT;
 	return 0;
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 6/7] nfp: devlink: report the running and flashed versions
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

Report versions of firmware components using the new NSP command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 86 +++++++++++++++++++
 include/net/devlink.h                         | 11 +++
 2 files changed, 97 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 9857fa663adf..fade37d2b796 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -207,11 +207,59 @@ nfp_devlink_versions_get_hwinfo(struct nfp_pf *pf, struct devlink_info_req *req)
 	return 0;
 }
 
+static const struct nfp_devlink_versions {
+	enum nfp_nsp_versions id;
+	const char *key;
+} nfp_devlink_versions_nsp[] = {
+	{ NFP_VERSIONS_BUNDLE,	"fw.bundle_id", },
+	{ NFP_VERSIONS_BSP,	DEVLINK_VERSION_GENERIC_FW_MGMT, },
+	{ NFP_VERSIONS_CPLD,	"fw.cpld", },
+	{ NFP_VERSIONS_APP,	DEVLINK_VERSION_GENERIC_FW_APP, },
+	{ NFP_VERSIONS_UNDI,	DEVLINK_VERSION_GENERIC_FW_UNDI, },
+	{ NFP_VERSIONS_NCSI,	DEVLINK_VERSION_GENERIC_FW_NCSI, },
+	{ NFP_VERSIONS_CFGR,	"chip.init", },
+};
+
+static int
+nfp_devlink_versions_get_nsp(struct devlink_info_req *req, bool flash,
+			     const u8 *buf, unsigned int size)
+{
+	enum devlink_version_type type;
+	unsigned int i;
+	int err;
+
+	type = flash ? DEVLINK_VERSION_STORED : DEVLINK_VERSION_RUNNING;
+
+	for (i = 0; i < ARRAY_SIZE(nfp_devlink_versions_nsp); i++) {
+		const struct nfp_devlink_versions *info;
+		const char *version;
+
+		info = &nfp_devlink_versions_nsp[i];
+
+		version = nfp_nsp_versions_get(info->id, flash, buf, size);
+		if (IS_ERR(version)) {
+			if (PTR_ERR(version) == -ENOENT)
+				continue;
+			else
+				return PTR_ERR(version);
+		}
+
+		err = devlink_info_report_version(req, type,
+						  info->key, version);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int
 nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 		     struct netlink_ext_ack *extack)
 {
 	struct nfp_pf *pf = devlink_priv(devlink);
+	struct nfp_nsp *nsp;
+	char *buf = NULL;
 	const char *sn;
 	int err;
 
@@ -226,7 +274,45 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 			return err;
 	}
 
+	nsp = nfp_nsp_open(pf->cpp);
+	if (IS_ERR(nsp)) {
+		NL_SET_ERR_MSG_MOD(extack, "can't access NSP");
+		return PTR_ERR(nsp);
+	}
+
+	if (nfp_nsp_has_versions(nsp)) {
+		buf = kzalloc(NFP_NSP_VERSION_BUFSZ, GFP_KERNEL);
+		if (!buf) {
+			err = -ENOMEM;
+			goto err_close_nsp;
+		}
+
+		err = nfp_nsp_versions(nsp, buf, NFP_NSP_VERSION_BUFSZ);
+		if (err)
+			goto err_free_buf;
+
+		err = nfp_devlink_versions_get_nsp(req, false,
+						   buf, NFP_NSP_VERSION_BUFSZ);
+		if (err)
+			goto err_free_buf;
+
+		err = nfp_devlink_versions_get_nsp(req, true,
+						   buf, NFP_NSP_VERSION_BUFSZ);
+		if (err)
+			goto err_free_buf;
+
+		kfree(buf);
+	}
+
+	nfp_nsp_close(nsp);
+
 	return nfp_devlink_versions_get_hwinfo(pf, req);
+
+err_free_buf:
+	kfree(buf);
+err_close_nsp:
+	nfp_nsp_close(nsp);
+	return err;
 }
 
 const struct devlink_ops nfp_devlink_ops = {
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3d553cc6693d..c678ed0cb099 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -433,6 +433,17 @@ enum devlink_param_wol_types {
 /* Revision of board design */
 #define DEVLINK_VERSION_GENERIC_BOARD_REV	"board.rev"
 
+/* Control processor FW version, FW is responsible for house keeping tasks,
+ * PHY control etc.
+ */
+#define DEVLINK_VERSION_GENERIC_FW_MGMT		"fw.mgmt"
+/* Data path microcode controlling high-speed packet processing */
+#define DEVLINK_VERSION_GENERIC_FW_APP		"fw.app"
+/* UNDI software version */
+#define DEVLINK_VERSION_GENERIC_FW_UNDI		"fw.undi"
+/* NCSI support/handler version */
+#define DEVLINK_VERSION_GENERIC_FW_NCSI		"fw.ncsi"
+
 enum devlink_version_type {
 	DEVLINK_VERSION_FIXED,
 	DEVLINK_VERSION_STORED,
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 5/7] nfp: nsp: add support for versions command
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

Retrieve the FW versions with the new command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  | 61 +++++++++++++++++++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.h  | 20 ++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index ce1577bbbd2a..a9d53df0070c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -7,6 +7,7 @@
  *         Jason McMullan <jason.mcmullan@netronome.com>
  */
 
+#include <asm/unaligned.h>
 #include <linux/bitfield.h>
 #include <linux/delay.h>
 #include <linux/firmware.h>
@@ -62,6 +63,16 @@
 
 #define NFP_HWINFO_LOOKUP_SIZE	GENMASK(11, 0)
 
+#define NFP_VERSIONS_SIZE	GENMASK(11, 0)
+#define NFP_VERSIONS_CNT_OFF	0
+#define NFP_VERSIONS_BSP_OFF	2
+#define NFP_VERSIONS_CPLD_OFF	6
+#define NFP_VERSIONS_APP_OFF	10
+#define NFP_VERSIONS_BUNDLE_OFF	14
+#define NFP_VERSIONS_UNDI_OFF	18
+#define NFP_VERSIONS_NCSI_OFF	22
+#define NFP_VERSIONS_CFGR_OFF	26
+
 enum nfp_nsp_cmd {
 	SPCODE_NOOP		= 0, /* No operation */
 	SPCODE_SOFT_RESET	= 1, /* Soft reset the NFP */
@@ -77,6 +88,7 @@ enum nfp_nsp_cmd {
 	SPCODE_NSP_IDENTIFY	= 13, /* Read NSP version */
 	SPCODE_FW_STORED	= 16, /* If no FW loaded, load flash app FW */
 	SPCODE_HWINFO_LOOKUP	= 17, /* Lookup HWinfo with overwrites etc. */
+	SPCODE_VERSIONS		= 21, /* Report FW versions */
 };
 
 static const struct {
@@ -711,3 +723,52 @@ int nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size)
 
 	return 0;
 }
+
+int nfp_nsp_versions(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	struct nfp_nsp_command_buf_arg versions = {
+		{
+			.code		= SPCODE_VERSIONS,
+			.option		= min_t(u32, size, NFP_VERSIONS_SIZE),
+		},
+		.out_buf	= buf,
+		.out_size	= min_t(u32, size, NFP_VERSIONS_SIZE),
+	};
+
+	return nfp_nsp_command_buf(state, &versions);
+}
+
+const char *nfp_nsp_versions_get(enum nfp_nsp_versions id, bool flash,
+				 const u8 *buf, unsigned int size)
+{
+	static const u32 id2off[] = {
+		[NFP_VERSIONS_BSP] =	NFP_VERSIONS_BSP_OFF,
+		[NFP_VERSIONS_CPLD] =	NFP_VERSIONS_CPLD_OFF,
+		[NFP_VERSIONS_APP] =	NFP_VERSIONS_APP_OFF,
+		[NFP_VERSIONS_BUNDLE] =	NFP_VERSIONS_BUNDLE_OFF,
+		[NFP_VERSIONS_UNDI] =	NFP_VERSIONS_UNDI_OFF,
+		[NFP_VERSIONS_NCSI] =	NFP_VERSIONS_NCSI_OFF,
+		[NFP_VERSIONS_CFGR] =	NFP_VERSIONS_CFGR_OFF,
+	};
+	unsigned int field, buf_field_cnt, buf_off;
+
+	if (id >= ARRAY_SIZE(id2off) || !id2off[id])
+		return ERR_PTR(-EINVAL);
+
+	field = id * 2 + flash;
+
+	buf_field_cnt = get_unaligned_le16(buf);
+	if (buf_field_cnt <= field)
+		return ERR_PTR(-ENOENT);
+
+	buf_off = get_unaligned_le16(buf + id2off[id] + flash * 2);
+	if (!buf_off)
+		return ERR_PTR(-ENOENT);
+
+	if (buf_off >= size)
+		return ERR_PTR(-EINVAL);
+	if (strnlen(&buf[buf_off], size - buf_off) == size - buf_off)
+		return ERR_PTR(-EINVAL);
+
+	return (const char *)&buf[buf_off];
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index ff33ac54097a..246e213f1514 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -38,6 +38,11 @@ static inline bool nfp_nsp_has_hwinfo_lookup(struct nfp_nsp *state)
 	return nfp_nsp_get_abi_ver_minor(state) > 24;
 }
 
+static inline bool nfp_nsp_has_versions(struct nfp_nsp *state)
+{
+	return nfp_nsp_get_abi_ver_minor(state) > 27;
+}
+
 enum nfp_eth_interface {
 	NFP_INTERFACE_NONE	= 0,
 	NFP_INTERFACE_SFP	= 1,
@@ -208,4 +213,19 @@ enum nfp_nsp_sensor_id {
 int nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id,
 			  long *val);
 
+#define NFP_NSP_VERSION_BUFSZ	1024 /* reasonable size, not in the ABI */
+
+enum nfp_nsp_versions {
+	NFP_VERSIONS_BSP,
+	NFP_VERSIONS_CPLD,
+	NFP_VERSIONS_APP,
+	NFP_VERSIONS_BUNDLE,
+	NFP_VERSIONS_UNDI,
+	NFP_VERSIONS_NCSI,
+	NFP_VERSIONS_CFGR,
+};
+
+int nfp_nsp_versions(struct nfp_nsp *state, void *buf, unsigned int size);
+const char *nfp_nsp_versions_get(enum nfp_nsp_versions id, bool flash,
+				 const u8 *buf, unsigned int size);
 #endif
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 4/7] nfp: devlink: report fixed versions
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

Report information about the hardware.

RFCv2:
 - add defines for board IDs which are likely to be reusable for
   other drivers (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 37 ++++++++++++++++++-
 include/net/devlink.h                         |  5 +++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index cb3ef7e46614..9857fa663adf 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -172,6 +172,41 @@ static int nfp_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	return ret;
 }
 
+static const struct nfp_devlink_versions_simple {
+	const char *key;
+	const char *hwinfo;
+} nfp_devlink_versions_hwinfo[] = {
+	{ DEVLINK_VERSION_GENERIC_BOARD_ID,	"assembly.partno", },
+	{ DEVLINK_VERSION_GENERIC_BOARD_REV,	"assembly.revision", },
+	{ "board.vendor", /* fab */		"assembly.vendor", },
+	{ "board.model", /* code name */	"assembly.model", },
+};
+
+static int
+nfp_devlink_versions_get_hwinfo(struct nfp_pf *pf, struct devlink_info_req *req)
+{
+	unsigned int i;
+	int err;
+
+	for (i = 0; i < ARRAY_SIZE(nfp_devlink_versions_hwinfo); i++) {
+		const struct nfp_devlink_versions_simple *info;
+		const char *val;
+
+		info = &nfp_devlink_versions_hwinfo[i];
+
+		val = nfp_hwinfo_lookup(pf->hwinfo, info->hwinfo);
+		if (!val)
+			continue;
+
+		err = devlink_info_report_version(req, DEVLINK_VERSION_FIXED,
+						  info->key, val);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int
 nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 		     struct netlink_ext_ack *extack)
@@ -191,7 +226,7 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 			return err;
 	}
 
-	return 0;
+	return nfp_devlink_versions_get_hwinfo(pf, req);
 }
 
 const struct devlink_ops nfp_devlink_ops = {
diff --git a/include/net/devlink.h b/include/net/devlink.h
index ec72638aa47f..3d553cc6693d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -428,6 +428,11 @@ enum devlink_param_wol_types {
 	.validate = _validate,						\
 }
 
+/* Part number, identifier of board design */
+#define DEVLINK_VERSION_GENERIC_BOARD_ID	"board.id"
+/* Revision of board design */
+#define DEVLINK_VERSION_GENERIC_BOARD_REV	"board.rev"
+
 enum devlink_version_type {
 	DEVLINK_VERSION_FIXED,
 	DEVLINK_VERSION_STORED,
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 3/7] nfp: devlink: report driver name and serial number
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

Report the basic info through new devlink info API.

RFCv2:
 - add driver name;
 - align serial to core changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 808647ec3573..cb3ef7e46614 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -4,6 +4,7 @@
 #include <linux/rtnetlink.h>
 #include <net/devlink.h>
 
+#include "nfpcore/nfp.h"
 #include "nfpcore/nfp_nsp.h"
 #include "nfp_app.h"
 #include "nfp_main.h"
@@ -171,6 +172,28 @@ static int nfp_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	return ret;
 }
 
+static int
+nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+		     struct netlink_ext_ack *extack)
+{
+	struct nfp_pf *pf = devlink_priv(devlink);
+	const char *sn;
+	int err;
+
+	err = devlink_info_report_driver_name(req, "nfp");
+	if (err)
+		return err;
+
+	sn = nfp_hwinfo_lookup(pf->hwinfo, "assembly.serial");
+	if (sn) {
+		err = devlink_info_report_serial_number(req, sn);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 const struct devlink_ops nfp_devlink_ops = {
 	.port_split		= nfp_devlink_port_split,
 	.port_unsplit		= nfp_devlink_port_unsplit,
@@ -178,6 +201,7 @@ const struct devlink_ops nfp_devlink_ops = {
 	.sb_pool_set		= nfp_devlink_sb_pool_set,
 	.eswitch_mode_get	= nfp_devlink_eswitch_mode_get,
 	.eswitch_mode_set	= nfp_devlink_eswitch_mode_set,
+	.info_get		= nfp_devlink_info_get,
 };
 
 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 2/7] devlink: add version reporting to devlink info API
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version.  Unfortunately, modern
hardware has more firmware components.  There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load.  Running ethtool -i on modern controllers reveals the
fact that vendors cram multiple values into firmware version field.

Here are some examples from systems I could lay my hands on quickly:

tg3:  "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp:  "0.0.3.5 0.25 sriov-2.1.16 nic"

Add a new devlink API to allow retrieving multiple versions, and
provide user-readable name for those versions.

While at it break down the versions into three categories:
 - fixed - this is the board/fixed component version, usually vendors
           report information like the board version in the PCI VPD,
           but it will benefit from naming and common API as well;
 - running - this is the running firmware version;
 - stored - this is firmware in the flash, after firmware update
            this value will reflect the flashed version, while the
            running version may only be updated after reboot.

RFCv2:
 - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
   versions are mixed with other info attrs)l
 - have the driver report versions from the same callback as
   other info.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        | 18 ++++++++++++++++
 include/uapi/linux/devlink.h |  5 +++++
 net/core/devlink.c           | 40 ++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5ef3570a3859..ec72638aa47f 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -428,6 +428,12 @@ enum devlink_param_wol_types {
 	.validate = _validate,						\
 }
 
+enum devlink_version_type {
+	DEVLINK_VERSION_FIXED,
+	DEVLINK_VERSION_STORED,
+	DEVLINK_VERSION_RUNNING,
+};
+
 struct devlink_region;
 struct devlink_info_req;
 
@@ -614,6 +620,10 @@ int devlink_info_report_serial_number(struct devlink_info_req *req,
 				      const char *sn);
 int devlink_info_report_driver_name(struct devlink_info_req *req,
 				    const char *name);
+int devlink_info_report_version(struct devlink_info_req *req,
+				enum devlink_version_type type,
+				const char *version_name,
+				const char *version_value);
 
 #else
 
@@ -923,6 +933,14 @@ devlink_info_report_serial_number(struct devlink_info_req *req, const char *sn)
 {
 	return 0;
 }
+
+static inline int
+devlink_info_report_version(struct devlink_info_req *req,
+			    enum devlink_version_type type,
+			    const char *version_name, const char *version_value)
+{
+	return 0;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index fd089baa7c50..a61b87c73c3f 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -294,6 +294,11 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_INFO_DRV_NAME,		/* string */
 	DEVLINK_ATTR_INFO_SERIAL_NUMBER,	/* string */
+	DEVLINK_ATTR_INFO_VERSION_FIXED,	/* nested */
+	DEVLINK_ATTR_INFO_VERSION_RUNNING,	/* nested */
+	DEVLINK_ATTR_INFO_VERSION_STORED,	/* nested */
+	DEVLINK_ATTR_INFO_VERSION_NAME,		/* string */
+	DEVLINK_ATTR_INFO_VERSION_VALUE,	/* string */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1b941493fdff..e2027d3a5e40 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3732,6 +3732,46 @@ int devlink_info_report_serial_number(struct devlink_info_req *req,
 }
 EXPORT_SYMBOL_GPL(devlink_info_report_serial_number);
 
+int devlink_info_report_version(struct devlink_info_req *req,
+				enum devlink_version_type type,
+				const char *version_name,
+				const char *version_value)
+{
+	static const enum devlink_attr type2attr[] = {
+		[DEVLINK_VERSION_FIXED] = DEVLINK_ATTR_INFO_VERSION_FIXED,
+		[DEVLINK_VERSION_STORED] = DEVLINK_ATTR_INFO_VERSION_STORED,
+		[DEVLINK_VERSION_RUNNING] = DEVLINK_ATTR_INFO_VERSION_RUNNING,
+	};
+	struct nlattr *nest;
+	int err;
+
+	if (type >= ARRAY_SIZE(type2attr) || !type2attr[type])
+		return -EINVAL;
+
+	nest = nla_nest_start(req->msg, type2attr[type]);
+	if (!nest)
+		return -EMSGSIZE;
+
+	err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME,
+			     version_name);
+	if (err)
+		goto nla_put_failure;
+
+	err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE,
+			     version_value);
+	if (err)
+		goto nla_put_failure;
+
+	nla_nest_end(req->msg, nest);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(req->msg, nest);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_info_report_version);
+
 static int
 devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
 		     enum devlink_command cmd, u32 portid,
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 1/7] devlink: add device information API
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130190513.25718-1-jakub.kicinski@netronome.com>

ethtool -i has served us well for a long time, but its showing
its limitations more and more.  The device information should
also be reported per device not per-netdev.

Lay foundation for a simple devlink-based way of reading device
info.  Add driver name and device serial number as initial pieces
of information exposed via this new API.

RFC v2:
 - wrap the skb into an opaque structure (Jiri);
 - allow the serial number of be any length (Jiri & Andrew);
 - add driver name (Jonathan).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        |  18 ++++++
 include/uapi/linux/devlink.h |   5 ++
 net/core/devlink.c           | 114 +++++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 85c9eabaf056..5ef3570a3859 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -429,6 +429,7 @@ enum devlink_param_wol_types {
 }
 
 struct devlink_region;
+struct devlink_info_req;
 
 typedef void devlink_snapshot_data_dest_t(const void *data);
 
@@ -484,6 +485,8 @@ struct devlink_ops {
 	int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode);
 	int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode,
 				      struct netlink_ext_ack *extack);
+	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
+			struct netlink_ext_ack *extack);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
@@ -607,6 +610,10 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink);
 int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 				   u8 *data, u32 snapshot_id,
 				   devlink_snapshot_data_dest_t *data_destructor);
+int devlink_info_report_serial_number(struct devlink_info_req *req,
+				      const char *sn);
+int devlink_info_report_driver_name(struct devlink_info_req *req,
+				    const char *name);
 
 #else
 
@@ -905,6 +912,17 @@ devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 	return 0;
 }
 
+static inline int
+devlink_info_report_driver_name(struct devlink_info_req *req, const char *name)
+{
+	return 0;
+}
+
+static inline int
+devlink_info_report_serial_number(struct devlink_info_req *req, const char *sn)
+{
+	return 0;
+}
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 61b4447a6c5b..fd089baa7c50 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -94,6 +94,8 @@ enum devlink_command {
 	DEVLINK_CMD_PORT_PARAM_NEW,
 	DEVLINK_CMD_PORT_PARAM_DEL,
 
+	DEVLINK_CMD_INFO_GET,		/* can dump */
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -290,6 +292,9 @@ enum devlink_attr {
 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
 
+	DEVLINK_ATTR_INFO_DRV_NAME,		/* string */
+	DEVLINK_ATTR_INFO_SERIAL_NUMBER,	/* string */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e6f170caf449..1b941493fdff 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3714,6 +3714,112 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 	return 0;
 }
 
+struct devlink_info_req {
+	struct sk_buff *msg;
+};
+
+int devlink_info_report_driver_name(struct devlink_info_req *req,
+				    const char *name)
+{
+	return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRV_NAME, name);
+}
+EXPORT_SYMBOL_GPL(devlink_info_report_driver_name);
+
+int devlink_info_report_serial_number(struct devlink_info_req *req,
+				      const char *sn)
+{
+	return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn);
+}
+EXPORT_SYMBOL_GPL(devlink_info_report_serial_number);
+
+static int
+devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
+		     enum devlink_command cmd, u32 portid,
+		     u32 seq, int flags, struct netlink_ext_ack *extack)
+{
+	struct devlink_info_req req;
+	void *hdr;
+	int err;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	err = -EMSGSIZE;
+	if (devlink_nl_put_handle(msg, devlink))
+		goto err_cancel_msg;
+
+	req.msg = msg;
+	err = devlink->ops->info_get(devlink, &req, extack);
+	if (err)
+		goto err_cancel_msg;
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+err_cancel_msg:
+	genlmsg_cancel(msg, hdr);
+	return err;
+}
+
+static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct sk_buff *msg;
+	int err;
+
+	if (!devlink->ops || !devlink->ops->info_get)
+		return -EOPNOTSUPP;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET,
+				   info->snd_portid, info->snd_seq, 0,
+				   info->extack);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		if (idx < start) {
+			idx++;
+			continue;
+		}
+
+		mutex_lock(&devlink->lock);
+		err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET,
+					   NETLINK_CB(cb->skb).portid,
+					   cb->nlh->nlmsg_seq, NLM_F_MULTI,
+					   cb->extack);
+		mutex_unlock(&devlink->lock);
+		if (err)
+			break;
+		idx++;
+	}
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -3974,6 +4080,14 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_INFO_GET,
+		.doit = devlink_nl_cmd_info_get_doit,
+		.dumpit = devlink_nl_cmd_info_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next v2 0/7] devlink: add device (driver) information API
From: Jakub Kicinski @ 2019-01-30 19:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon, Jakub Kicinski

Hi!

fw_version field in ethtool -i does not suit modern needs with 31
characters being quite limiting on more complex systems.  There is
also no distinction between the running and flashed versions of
the firmware.

Since the driver information pertains to the entire device, rather
than a particular netdev, it seems wise to move it do devlink, at
the same time fixing the aforementioned issues.

The new API allows exposing the device serial number and versions
of the components of the card - both hardware, firmware (running
and flashed).  Driver authors can choose descriptive identifiers
for the version fields.  A few version identifiers which seemed
relevant for most devices have been added to the global devlink
header.

Example:
$ devlink dev info pci/0000:05:00.0
pci/0000:05:00.0:
  driver nfp
  serial_number 16240145
  versions:
    fixed:
      board.id AMDA0099-0001
      board.rev 07
      board.vendor SMA
      board.model carbon
    running:
      fw.mgmt: 010156.010156.010156
      fw.cpld: 0x44
      fw.app: sriov-2.1.16
    stored:
      fw.mgmt: 010158.010158.010158
      fw.cpld: 0x44
      fw.app: sriov-2.1.20

Last patch also includes a compat code for ethtool.  If driver
reports no fw_version via the traditional ethtool API, ethtool
can call into devlink and try to cram as many versions as possible
into the 31 characters.

v2:
 - rebase.

this non-RFC, v3 some would say:
 - add three more versions in the NFP patches;
 - add last patch (ethool compat) - Andrew & Michal.

RFCv2:
 - use one driver op;
 - allow longer serial number;
 - wrap the skb into an opaque request struct;
 - add some common identifier into the devlink header.

Jakub Kicinski (7):
  devlink: add device information API
  devlink: add version reporting to devlink info API
  nfp: devlink: report driver name and serial number
  nfp: devlink: report fixed versions
  nfp: nsp: add support for versions command
  nfp: devlink: report the running and flashed versions
  ethtool: add compat for devlink info

 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 145 +++++++++++++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.c  |  61 ++++++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp.h  |  20 ++
 include/net/devlink.h                         |  59 +++++
 include/uapi/linux/devlink.h                  |  10 +
 net/core/devlink.c                            | 204 ++++++++++++++++++
 net/core/ethtool.c                            |   7 +
 7 files changed, 506 insertions(+)

-- 
2.19.2

^ permalink raw reply

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: John David Anglin @ 2019-01-30 19:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190130172818.GJ21904@lunn.ch>

On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
>> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
>>> >From my Espressobin
>>>
>>> cat /proc/interrupts
>>> ...
>>>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>>>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>>>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>>>
>>> These are PHY interrupts.
>> If we come back to my trying to use the INTn pin on the esspressobin, I
>> have found that clearing and resetting the interrupt
>> enable bits in the global control register (offset 0x4) restarts link
>> detection when the device is stuck.  This suggests that the
>> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
>> on this pin.  Possibly, this is caused by the fact
>> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
>> loading is done.  Another possibility might be race conditions
>> in processing interrupts.
>>
>> Thoughts?
> Hi David
>
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
>
> I would suggest you remove the interrupt from your device tree and use
> the mv88e6xxx polling method. If i remember correctly, it currently
> polls 10 per second, so PHY link up/down is going to be 5 times faster
> on average than when phylib is polling the PHY.
Hi Andrew,

The main motivation in doing this is to try to enable the AVB interrupt
and to improve the PTP support.
I agree that polling is perfectly fine for PHY link interrupts. 
Possibly, ATU and VTU might need faster
support but I'm not using that at the moment.

I have hacked on the time stamp code quit a bit to try and improve
things but there are still issues with
lost or overwritten time stamps:

Jan 28 11:15:05 localhost kernel: [234850.840883] mv88e6085
d0032004.mdio-mii:01
: timestamp discarded
Jan 28 11:15:05 localhost ptp4l: [234852.998] port 3: received SYNC
without time
stamp

I think when PTP packets other than PDELAY are too closely spaced, we
have problems accessing
the timestamp quick enough.  Also, timestamp access is dependent on CPU
speed and HZ.

It looks like I can easily connect MPP2_23 to MPP1_16 on the edge
connector P8.  I believe the northbridge
pins support level interrupts.

In /proc/interrupts, the switch interrupts are shown as edge.  The only
place that I see where this
is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
request_threaded_irq() passes
"IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox