Netdev List


* Re: [PATCH v2] gso: fix payload length when gso_size is zero
From: Duyck, Alexander H @ 2017-10-06 16:37 UTC (permalink / raw)
  To: netdev@vger.kernel.org, alexey.kodanev@oracle.com
  Cc: davem@davemloft.net, steffen.klassert@secunet.com
In-Reply-To: <1507305755-14393-1-git-send-email-alexey.kodanev@oracle.com>

On Fri, 2017-10-06 at 19:02 +0300, Alexey Kodanev wrote:
> When gso_size is reset to zero for the tail segment in skb_segment(), later
> in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
> we will get incorrect results (payload length, pcsum) for that segment.
> inet_gso_segment() already has a check for gso_size before calculating
> payload.
> 
> The issue was found with LTP vxlan & gre tests over ixgbe NIC.
> 
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
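
The guard described in the commit message can be sketched in user space as follows. This is a minimal model, not kernel code: struct toy_seg, toy_is_gso() and toy_payload_len() are hypothetical names, and the two header-length fields stand in for the pointer arithmetic the real functions do on the skb.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few per-segment values the check needs. */
struct toy_seg {
	unsigned int gso_size;      /* 0 once the tail segment is no longer GSO */
	unsigned int len;           /* total bytes in this segment */
	unsigned int outer_hdr_len; /* bytes up to and including the outer header */
	unsigned int inner_hdr_len; /* headers between outer header and payload */
};

/* Models skb_is_gso(): a segment is GSO only while gso_size is non-zero. */
static bool toy_is_gso(const struct toy_seg *seg)
{
	return seg->gso_size != 0;
}

/*
 * For a partially segmented frame the length field must describe a single
 * MSS-sized segment. A tail segment with gso_size == 0 has to take the
 * ordinary path, otherwise its length would be computed from a zero MSS.
 */
static unsigned int toy_payload_len(const struct toy_seg *seg, bool gso_partial)
{
	if (gso_partial && toy_is_gso(seg))
		return seg->gso_size + seg->inner_hdr_len;
	return seg->len - seg->outer_hdr_len;
}
```

With gso_partial set, a tail segment of 600 bytes with a 40-byte outer header gets a payload length of 560 in this model; without the toy_is_gso() test it would get only the 60 bytes of inner headers.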

> ---
> v2: also added skb_is_gso to gre_gso_segment() and __skb_udp_tunnel_segment()
> 
>  net/ipv4/gre_offload.c | 2 +-
>  net/ipv4/udp_offload.c | 2 +-
>  net/ipv6/ip6_offload.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index d5cac99..8c72034 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -98,7 +98,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>  		greh = (struct gre_base_hdr *)skb_transport_header(skb);
>  		pcsum = (__sum16 *)(greh + 1);
>  
> -		if (gso_partial) {
> +		if (gso_partial && skb_is_gso(skb)) {
>  			unsigned int partial_adj;
>  
>  			/* Adjust checksum to account for the fact that
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 0932c85..6401574 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -122,7 +122,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
>  		 * will be using a length value equal to only one MSS sized
>  		 * segment instead of the entire frame.
>  		 */
> -		if (gso_partial) {
> +		if (gso_partial && skb_is_gso(skb)) {
>  			uh->len = htons(skb_shinfo(skb)->gso_size +
>  					SKB_GSO_CB(skb)->data_offset +
>  					skb->head - (unsigned char *)uh);
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index cdb3728..4a87f94 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
>  
>  	for (skb = segs; skb; skb = skb->next) {
>  		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
> -		if (gso_partial)
> +		if (gso_partial && skb_is_gso(skb))
>  			payload_len = skb_shinfo(skb)->gso_size +
>  				      SKB_GSO_CB(skb)->data_offset +
>  				      skb->head - (unsigned char *)(ipv6h + 1);

^ permalink raw reply

* Re: [PATCH net] bpf: fix liveness marking
From: Alexei Starovoitov @ 2017-10-06 16:43 UTC (permalink / raw)
  To: Edward Cree, David S . Miller; +Cc: Daniel Borkmann, netdev, kernel-team
In-Reply-To: <759810f5-f6db-0439-1b2d-637cf83ac2f8@solarflare.com>

On 10/6/17 9:33 AM, Edward Cree wrote:
> On 06/10/17 00:20, Alexei Starovoitov wrote:
>> while processing Rx = Ry instruction the verifier does
>> regs[insn->dst_reg] = regs[insn->src_reg]
>> which often clears write mark (when Ry doesn't have it)
>> that was just set by check_reg_arg(Rx) prior to the assignment.
>> That causes mark_reg_read() to keep marking Rx in this block as
>> REG_LIVE_READ (since the logic incorrectly misses that it's
>> screened by the write) and in many of its parents (until lucky
>> write into the same Rx or beginning of the program).
>> That causes is_state_visited() logic to miss many pruning opportunities.
> Good catch!
>> Furthermore mark_reg_read() logic propagates the read mark
>> for BPF_REG_FP as well (though it's readonly) which causes
>> harmless but unnecessary work during is_state_visited().
> Surely it's unnecessary for is_state_visited() to even look at
>  BPF_REG_FP anyway, so in addition to your change we could make
>  states_equal just do `for (i = 0; i < BPF_REG_FP; i++)`?  That
>  might save a bit more time.

Yeah. Before this patch it was doing an extra
memcmp(rold, rcur, ..) on the FP reg. This patch saves that memcmp.
The i < BPF_REG_FP would effectively do the same, but I'm not sure
I want to do it just yet.
For net-next I have a bunch of changes for verifier to support bpf_call
and there two different states may have two different FPs.
One FP from caller and one from callee.
So I might still need to do full
for (i = 0; i < MAX_BPF_REG; i++)
      if (!regsafe(..))

>> Note that do_propagate_liveness() skips FP correctly,
>> so do the same in mark_reg_read() as well.
>> It saves 0.2 seconds for the test below
>>
>> program               before  after
>> bpf_lb-DLB_L3.o       2604    2304
>> bpf_lb-DLB_L4.o       11159   3723
>> bpf_lb-DUNKNOWN.o     1116    1110
>> bpf_lxc-DDROP_ALL.o   34566   28004
>> bpf_lxc-DUNKNOWN.o    53267   39026
>> bpf_netdev.o          17843   16943
>> bpf_overlay.o         8672    7929
>> time                  ~11 sec  ~4 sec
>>
>> Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Very nice numbers!
> Acked-by: Edward Cree <ecree@solarflare.com>

Thanks!
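
The clobbered write mark described in the quoted commit message can be modelled with a toy register array. This is only a sketch under assumptions: struct toy_reg, the TOY_LIVE_* flags and both helpers are hypothetical, and toy_mov_keep() shows one way to preserve the mark, not necessarily what the patch itself does.

```c
/* Hypothetical liveness flags, loosely modelled on REG_LIVE_*. */
enum toy_live {
	TOY_LIVE_NONE    = 0,
	TOY_LIVE_READ    = 1,
	TOY_LIVE_WRITTEN = 2,
};

struct toy_reg {
	int value;
	unsigned int live;
};

/*
 * Rx = Ry as a plain struct copy. The write mark set on Rx just before
 * the copy is overwritten by Ry's flags, so it is lost whenever Ry does
 * not carry a write mark itself.
 */
static void toy_mov_clobber(struct toy_reg *regs, int dst, int src)
{
	regs[dst].live |= TOY_LIVE_WRITTEN; /* models the check_reg_arg(Rx) step */
	regs[dst] = regs[src];
}

/* Same copy, but the write mark is re-applied after the assignment. */
static void toy_mov_keep(struct toy_reg *regs, int dst, int src)
{
	regs[dst] = regs[src];
	regs[dst].live |= TOY_LIVE_WRITTEN;
}
```

In this model a later read of Rx after toy_mov_clobber() sees no write mark and would keep propagating the read mark to parent states, which is the missed pruning the commit message describes.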

^ permalink raw reply

* [PATCH net-next] openvswitch: add ct_clear action
From: Eric Garver @ 2017-10-06 16:44 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: dev-yBygre7rU0TnMu66kgdUjQ

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.

Signed-off-by: Eric Garver <e@erig.me>
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/actions.c        |  5 +++++
 net/openvswitch/conntrack.c      | 12 ++++++++++++
 net/openvswitch/conntrack.h      |  7 +++++++
 net/openvswitch/flow_netlink.c   |  5 +++++
 5 files changed, 31 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4cab82e..1b6e510e2cc6 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -806,6 +806,7 @@ struct ovs_action_push_eth {
  * packet.
  * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
  * packet.
+ * @OVS_ACTION_ATTR_CT_CLEAR: Clear conntrack state from the packet.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -835,6 +836,7 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
 	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
 	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
+	OVS_ACTION_ATTR_CT_CLEAR,     /* No argument. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a54a556fcdb5..db9c7f2e662b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1203,6 +1203,10 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 				return err == -EINPROGRESS ? 0 : err;
 			break;
 
+		case OVS_ACTION_ATTR_CT_CLEAR:
+			err = ovs_ct_clear(skb, key);
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_ETH:
 			err = push_eth(skb, key, nla_data(a));
 			break;
@@ -1210,6 +1214,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 		case OVS_ACTION_ATTR_POP_ETH:
 			err = pop_eth(skb, key);
 			break;
+
 		}
 
 		if (unlikely(err)) {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index d558e882ca0c..f9b73c726ad7 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1129,6 +1129,18 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
 	return err;
 }
 
+int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	if (skb_nfct(skb)) {
+		nf_conntrack_put(skb_nfct(skb));
+		nf_ct_set(skb, NULL, 0);
+	}
+
+	ovs_ct_fill_key(skb, key);
+
+	return 0;
+}
+
 static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
 			     const struct sw_flow_key *key, bool log)
 {
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index bc7efd1867ab..399dfdd2c4f9 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -30,6 +30,7 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *, struct sk_buff *);
 
 int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *,
 		   const struct ovs_conntrack_info *);
+int ovs_ct_clear(struct sk_buff *skb, struct sw_flow_key *key);
 
 void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *swkey,
@@ -73,6 +74,12 @@ static inline int ovs_ct_execute(struct net *net, struct sk_buff *skb,
 	return -ENOTSUPP;
 }
 
+static inline int ovs_ct_clear(struct sk_buff *skb,
+			       struct sw_flow_key *key)
+{
+	return -ENOTSUPP;
+}
+
 static inline void ovs_ct_fill_key(const struct sk_buff *skb,
 				   struct sw_flow_key *key)
 {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index e8eb427ce6d1..198bb828d592 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -75,6 +75,7 @@ static bool actions_may_change_flow(const struct nlattr *actions)
 			break;
 
 		case OVS_ACTION_ATTR_CT:
+		case OVS_ACTION_ATTR_CT_CLEAR:
 		case OVS_ACTION_ATTR_HASH:
 		case OVS_ACTION_ATTR_POP_ETH:
 		case OVS_ACTION_ATTR_POP_MPLS:
@@ -2479,6 +2480,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
 			[OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash),
 			[OVS_ACTION_ATTR_CT] = (u32)-1,
+			[OVS_ACTION_ATTR_CT_CLEAR] = 0,
 			[OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc),
 			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
 			[OVS_ACTION_ATTR_POP_ETH] = 0,
@@ -2620,6 +2622,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			skip_copy = true;
 			break;
 
+		case OVS_ACTION_ATTR_CT_CLEAR:
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_ETH:
 			/* Disallow pushing an Ethernet header if one
 			 * is already present */
-- 
2.12.0

^ permalink raw reply related

* Re: [PATCH] net/ipv6: Convert icmpv6_push_pending_frames to void
From: David Miller @ 2017-10-06 16:52 UTC (permalink / raw)
  To: joe; +Cc: kuznet, yoshfuji, devtimhansen, netdev, linux-kernel
In-Reply-To: <14612f639009193e561bcdb11c5ef5b6f12830ed.1507272313.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Thu,  5 Oct 2017 23:46:14 -0700

> commit cc71b7b07119 ("net/ipv6: remove unused err variable on
> icmpv6_push_pending_frames") exposed icmpv6_push_pending_frames
> return value not being used.
> 
> Remove now unnecessary int err declarations and uses.
> 
> Miscellanea:
> 
> o Remove unnecessary goto and out: labels
> o Realign arguments
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/7] nfp: extend match and action for flower offload
From: David Miller @ 2017-10-06 16:56 UTC (permalink / raw)
  To: simon.horman; +Cc: jakub.kicinski, netdev, oss-drivers
In-Reply-To: <1507278086-3102-1-git-send-email-simon.horman@netronome.com>

From: Simon Horman <simon.horman@netronome.com>
Date: Fri,  6 Oct 2017 10:21:19 +0200

> Pieter says:
> 
> This series extends flower offload match and action capabilities. It
> specifically adds offload capabilities for matching on MPLS, TTL, TOS
> and flow label. Furthermore offload capabilities for action have been
> expanded to include set ethernet, ipv4, ipv6, tcp and udp headers.

Series applied, thanks Simon.

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link
From: Roopa Prabhu @ 2017-10-06 17:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem@davemloft.net, John W. Linville, netdev@vger.kernel.org,
	Vidya Sagar Ravipati, Dustin Byford, Dave Olson, Casey Leedom,
	Gal Pressman, Andrew Lunn, Manoj Malviya, Santosh Rastapur,
	yuval.mintz, odedw, Ariel Almog, Jeff Kirsher, Dirk van der Merwe
In-Reply-To: <20171005113055.477aec34@cakuba.netronome.com>

On Thu, Oct 5, 2017 at 11:30 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Fri, 28 Jul 2017 23:28:26 -0700, Roopa Prabhu wrote:
>> On Fri, Jul 28, 2017 at 9:46 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> > On Fri, 28 Jul 2017 07:53:01 -0700, Roopa Prabhu wrote:
>> >> On Thu, Jul 27, 2017 at 7:33 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> >> > On Thu, 27 Jul 2017 16:47:25 -0700, Roopa Prabhu wrote:
>> >> >> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>> >> >>
>> >> >> Forward Error Correction (FEC) modes i.e Base-R
>> >> >> and Reed-Solomon modes are introduced in 25G/40G/100G standards
>> >> >> for providing good BER at high speeds. Various networking devices
>> >> >> which support 25G/40G/100G provides ability to manage supported FEC
>> >> >> modes and the lack of FEC encoding control and reporting today is a
>> >> >> source for interoperability issues for many vendors.
>> >> >> FEC capability as well as specific FEC mode i.e. Base-R
>> >> >> or RS modes can be requested or advertised through bits D44:47 of base link
>> >> >> codeword.
>> >> >>
>> >> >> This patch set intends to provide option under ethtool to manage and
>> >> >> report FEC encoding settings for networking devices as per IEEE 802.3
>> >> >> bj, bm and by specs.
>> >> >>
>> >> >> v2 :
>> >> >>         - minor patch format fixes and typos pointed out by Andrew
>> >> >>         - there was a pending discussion on the use of 'auto' vs
>> >> >>           'automatic' for fec settings. I have left it as 'auto'
>> >> >>           because in most cases today auto is used in place of
>> >> >>           automatic to represent automatically generated values.
>> >> >>           We use it in other networking config too. I would prefer
>> >> >>           leaving it as auto.
>> >> >
>> >> > On the subject of resetting the values when module is replugged I
>> >> > assume what was previously described remains:
>> >> >  - we always allow users to set the FEC regardless of the module type;
>> >> >  - if user set an incorrect FEC for the module type (or module gets
>> >> >    swapped) the link will be administratively taken down by either
>> >> >    the driver or FW.
>> >> >
>> >> > Is that correct?  Am I misremembering?
>> >>
>> >> yes, correct. And possible future sfp hotplug events can give user-space
>> >> more info to react to module type changes etc.
>> >
>> > OK, if nobody else objects and we go with that - lets make sure we
>> > document clearly those are expected :)  My concern is that if there is
>> > ever 10G + RS FEC standard we don't want to end up in a situation where
>> > some drivers silently ignore FEC settings in 10G and other apply it.
>> > So let's make it clear what the intended Linux behaviour is.  It could
>> > be in the ethtool man page, or the kernel somewhere.
>>
>> sure :), ack. We will document it in the ethtool manpage.
>
> Hi Roopa!  Did you ever publish the ethtool user space patches at all?
> I can't find them...

not yet, they are in my queue. will submit in the next two days.

^ permalink raw reply

* Re: [net-next PATCH 0/3] Improve xdp_monitor samples/bpf
From: David Miller @ 2017-10-06 17:10 UTC (permalink / raw)
  To: brouer; +Cc: netdev, andy, borkmann, alexei.starovoitov
In-Reply-To: <150727927390.4460.3200093291677318710.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 06 Oct 2017 10:41:36 +0200

> Here are some improvements to the xdp_monitor tool currently located
> under samples/bpf/.  Once the tools library libbpf becomes more feature
> complete, xdp_monitor should be converted to use it, and be moved into
> tools/bpf/xdp/ or tools/xdp/.

Series applied.

^ permalink raw reply

* Re: [PATCH] bnx2x: Use pci_ari_enabled() instead of local copy
From: David Miller @ 2017-10-06 17:10 UTC (permalink / raw)
  To: helgaas; +Cc: ariel.elior, netdev, everest-linux-l2, linux-kernel
In-Reply-To: <20171006110030.10980.52268.stgit@bhelgaas-glaptop.roam.corp.google.com>

From: Bjorn Helgaas <helgaas@kernel.org>
Date: Fri, 06 Oct 2017 06:00:30 -0500

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Use pci_ari_enabled() from the PCI core instead of the identical local copy
> bnx2x_ari_enabled().  No functional change intended.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH 0/4] pull request for net-next: batman-adv 2017-10-06
From: David Miller @ 2017-10-06 17:13 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20171006135437.26736-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Fri,  6 Oct 2017 15:54:33 +0200

> here is a small cleanup pull request of batman-adv to go into net-next.
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.

^ permalink raw reply

* Re: [PATCH net] ppp: fix race in ppp device destruction
From: David Miller @ 2017-10-06 17:17 UTC (permalink / raw)
  To: g.nault; +Cc: netdev, bgalvani, linux-ppp, paulus, dsahern, gfree.wind
In-Reply-To: <f197b94ec7e16902a9601fdf840d5a8e94c8e91a.1507302309.git.g.nault@alphalink.fr>

From: Guillaume Nault <g.nault@alphalink.fr>
Date: Fri, 6 Oct 2017 17:05:49 +0200

> ppp_release() tries to ensure that netdevices are unregistered before
> decrementing the unit refcount and running ppp_destroy_interface().
> 
> This is all fine as long as the device is unregistered by
> ppp_release(): the unregister_netdevice() call, followed by
> rtnl_unlock(), guarantee that the unregistration process completes
> before rtnl_unlock() returns.
> 
> However, the device may be unregistered by other means (like
> ppp_nl_dellink()). If this happens right before ppp_release() calling
> rtnl_lock(), then ppp_release() has to wait for the concurrent
> unregistration code to release the lock.
> But rtnl_unlock() releases the lock before completing the device
> unregistration process. This allows ppp_release() to proceed and
> eventually call ppp_destroy_interface() before the unregistration
> process completes. Calling free_netdev() on this partially unregistered
> device will BUG():
 ...
> We could set the ->needs_free_netdev flag on PPP devices and move the
> ppp_destroy_interface() logic in the ->priv_destructor() callback. But
> that'd be quite intrusive as we'd first need to unlink from the other
> channels and units that depend on the device (the ones that used the
> PPPIOCCONNECT and PPPIOCATTACH ioctls).
> 
> Instead, we can just let the netdevice hold a reference on its
> ppp_file. This reference is dropped in ->priv_destructor(), at the very
> end of the unregistration process, so that neither ppp_release() nor
> ppp_disconnect_channel() can call ppp_destroy_interface() in the interim.
> 
> Reported-by: Beniamino Galvani <bgalvani@redhat.com>
> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>

Applied and queued up for -stable, thanks.
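
The reference-holding approach in the quoted commit message can be illustrated with a toy reference count. Everything here is hypothetical (struct toy_ppp_file, toy_hold(), toy_put()); it only models the ordering argument, namely that destruction cannot happen until the last holder drops its reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcounted ppp_file. */
struct toy_ppp_file {
	int refcnt;
	bool destroyed; /* set when the last reference is dropped */
};

/* Take an extra reference, e.g. on behalf of the netdevice. */
static void toy_hold(struct toy_ppp_file *pf)
{
	assert(pf->refcnt > 0);
	pf->refcnt++;
}

/* Drop a reference; the object is destroyed only when none remain. */
static void toy_put(struct toy_ppp_file *pf)
{
	assert(pf->refcnt > 0);
	if (--pf->refcnt == 0)
		pf->destroyed = true; /* models ppp_destroy_interface() */
}
```

If the netdevice calls toy_hold() when it is set up and toy_put() only from its destructor, a toy_put() issued by the release path in between leaves the count at one, so the object survives until unregistration has finished.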

^ permalink raw reply

* [PATCH net-next v2] vhost_net: do not stall on zerocopy depletion
From: Willem de Bruijn @ 2017-10-06 17:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, mst, jasowang, den, virtualization, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two UDP flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with a 1Mbit rate limit
and a deep 1000-entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PEND for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.
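
The limit described above can be sketched as a small user-space model of the ring arithmetic. The names are hypothetical, and the constants 1024 and 128 are assumed values for UIO_MAXIOV and VHOST_MAX_PEND (128 matches the vq->num == 256 case mentioned above).

```c
#include <stdbool.h>

#define TOY_UIO_MAXIOV 1024u /* assumed ring size for upend_idx/done_idx */
#define TOY_MAX_PEND    128u /* assumed hard cap on zerocopy skbs in flight */

/* Number of zerocopy buffers handed out but not yet completed. */
static unsigned int toy_in_flight(unsigned int upend_idx, unsigned int done_idx)
{
	/* Adding the ring size first keeps the subtraction non-negative. */
	return (upend_idx + TOY_UIO_MAXIOV - done_idx) % TOY_UIO_MAXIOV;
}

/* True when the in-flight count exceeds min(cap, a quarter of the queue). */
static bool toy_exceeds_maxpend(unsigned int upend_idx, unsigned int done_idx,
				unsigned int vq_num)
{
	unsigned int limit = vq_num >> 2;

	if (limit > TOY_MAX_PEND)
		limit = TOY_MAX_PEND;
	return toy_in_flight(upend_idx, done_idx) > limit;
}
```

For a 256-entry queue the limit in this model is 64, so 100 buffers in flight would push new packets onto the copy path, while a 1024-entry queue is capped at 128 and would still allow zerocopy at that point.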

Changes
  v1 -> v2
    - replaced min with typed min_t
    - avoid unnecessary whitespace change

Link: http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link: http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/vhost/net.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec8699e..68677d930e20 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;

-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
-		== nvq->done_idx;
+	return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
+	       min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
 }

 /* Expects to be always run from workqueue - which acts as
@@ -480,11 +480,6 @@ static void handle_tx(struct vhost_net *net)
 		if (zcopy)
 			vhost_zerocopy_signal_used(net, vq);

-		/* If more outstanding DMAs, queue the work.
-		 * Handle upend_idx wrap around
-		 */
-		if (unlikely(vhost_exceeds_maxpend(net)))
-			break;

 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
 						ARRAY_SIZE(vq->iov),
@@ -519,8 +514,7 @@ static void handle_tx(struct vhost_net *net)
 		len = msg_data_left(&msg);

 		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
-				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
-				      nvq->done_idx
+				   && !vhost_exceeds_maxpend(net)
 				   && vhost_net_tx_select_zcopy(net);

 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* Linux bridge of hsr and Ethernet interface. Is this valid?
From: Murali Karicheri @ 2017-10-06 17:28 UTC (permalink / raw)
  To: open list:TI NETCP ETHERNET DRIVER

Hello Linux netdev experts,

I have a board that has multiple Ethernet interfaces. Two, eth0 and eth1, are 1G interfaces and two, eth2 and eth3, are 100M interfaces. I created an hsr interface, hsr0, from eth2 and eth3 using the ip link command. Now I want to create a Linux bridge as follows:

brctl addbr my_bridge
brctl addif my_bridge eth0
brctl addif my_bridge hsr0

I connect a PC to the eth0 interface and another HSR-compliant device to eth2 or eth3. Is this a valid scenario?

I see the following description at https://wiki.linuxfoundation.org/networking/bridge, which says:

=============================
Adding devices to a bridge

The command

 brctl addif bridgename device

adds the network device device to take part in the bridging of “bridgename.” All the devices contained in a bridge act as one big network. It is not possible to add a device to multiple bridges or bridge a bridge device, because it just wouldn't make any sense! The bridge will take a short amount of time when a device is added to learn the Ethernet addresses on the segment before starting to forward. 

=============================

In this case hsr is already an 802.1D bridge and we are trying to bridge a bridge.

So your expert opinion is needed. Thanks.
-- 
Murali Karicheri
Linux Kernel, Keystone

^ permalink raw reply

* Re: [PATCH] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Willem de Bruijn @ 2017-10-06 17:40 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Pablo Neira Ayuso, netfilter-devel, Willem de Bruijn,
	Network Development, Daniel Borkmann, Rafael Buchbinder,
	Shmulik Ladkani
In-Reply-To: <20171006160242.4403-1-shmulik@nsof.io>

On Fri, Oct 6, 2017 at 12:02 PM, Shmulik Ladkani <shmulik@nsof.io> wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> support for attaching an eBPF object by an fd, with the
> 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> IPT_SO_SET_REPLACE call.
>
> However this breaks subsequent iptables calls:
>
>  # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
>  # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
>  iptables: Invalid argument. Run `dmesg' for more information.
>
> That's because iptables works by loading existing rules using
> IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
> the replacement set.
>
> However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
> (from the initial "iptables -m bpf" invocation) - so when 2nd invocation
> occurs, userspace passes a bogus fd number, which causes
> 'bpf_mt_check_v1' to fail.
>
> One suggested solution [1] was to hack iptables userspace, to perform an
> "entries fixup" immediately after IPT_SO_GET_ENTRIES, by opening a new,
> process-local fd per every 'xt_bpf_info_v1' entry seen.
>
> However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested
> deprecating the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
>
> This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
> '.fd' and instead perform an in-kernel lookup for the bpf object given
> the provided '.path'.
>
> It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
> XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
> expected to provide the path of the pinned object.
>
> Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

I suppose that we have the same problem here. As a matter of fact, the
implementation is the same for both FD_ELF and FD_PINNED, and checks

  f.file->f_op == &bpf_prog_fops

so a file descriptor to a random open ELF file outside a bpf fs would not be
accepted as is.

Anyway, that is outside the scope of this fix.

>
> References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
>             [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks a lot for fixing this.

> ---
>  include/uapi/linux/netfilter/xt_bpf.h |  1 +
>  kernel/bpf/inode.c                    |  1 +
>  net/netfilter/xt_bpf.c                | 22 ++++++++++++++++++++--
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
> index b97725af2ac0..da161b56c79e 100644
> --- a/include/uapi/linux/netfilter/xt_bpf.h
> +++ b/include/uapi/linux/netfilter/xt_bpf.h
> @@ -23,6 +23,7 @@ enum xt_bpf_modes {
>         XT_BPF_MODE_FD_PINNED,
>         XT_BPF_MODE_FD_ELF,
>  };
> +#define XT_BPF_MODE_PATH_PINNED XT_BPF_MODE_FD_PINNED
>
>  struct xt_bpf_info_v1 {
>         __u16 mode;
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index e833ed914358..be1dde967208 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -363,6 +363,7 @@ int bpf_obj_get_user(const char __user *pathname)
>         putname(pname);
>         return ret;
>  }
> +EXPORT_SYMBOL_GPL(bpf_obj_get_user);
>
>  static void bpf_evict_inode(struct inode *inode)
>  {
> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
> index 38986a95216c..29123934887b 100644
> --- a/net/netfilter/xt_bpf.c
> +++ b/net/netfilter/xt_bpf.c
> @@ -8,6 +8,7 @@
>   */
>
>  #include <linux/module.h>
> +#include <linux/syscalls.h>
>  #include <linux/skbuff.h>
>  #include <linux/filter.h>
>  #include <linux/bpf.h>
> @@ -49,6 +50,22 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
>         return 0;
>  }
>
> +static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
> +{
> +       mm_segment_t oldfs = get_fs();
> +       int retval, fd;
> +
> +       set_fs(KERNEL_DS);
> +       fd = bpf_obj_get_user(path);
> +       set_fs(oldfs);
> +       if (fd < 0)
> +               return fd;
> +
> +       retval = __bpf_mt_check_fd(fd, ret);
> +       sys_close(fd);
> +       return retval;
> +}
> +
>  static int bpf_mt_check(const struct xt_mtchk_param *par)
>  {
>         struct xt_bpf_info *info = par->matchinfo;
> @@ -66,9 +83,10 @@ static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
>                 return __bpf_mt_check_bytecode(info->bpf_program,
>                                                info->bpf_program_num_elem,
>                                                &info->filter);
> -       else if (info->mode == XT_BPF_MODE_FD_PINNED ||
> -                info->mode == XT_BPF_MODE_FD_ELF)
> +       else if (info->mode == XT_BPF_MODE_FD_ELF)
>                 return __bpf_mt_check_fd(info->fd, &info->filter);
> +       else if (info->mode == XT_BPF_MODE_PATH_PINNED)
> +               return __bpf_mt_check_path(info->path, &info->filter);
>         else
>                 return -EINVAL;
>  }
> --
> 2.14.2
>

^ permalink raw reply

* Re: [PATCH net-next 1/1] [net] bonding: Add NUMA notice
From: Patrick Talbert @ 2017-10-06 17:54 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20171005214636.GM13247@lunn.ch>

On Thu, Oct 5, 2017 at 5:46 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Thu, Oct 05, 2017 at 04:23:45PM -0400, Patrick Talbert wrote:
>> Network performance can suffer when a load balancing bond uses slave
>> interfaces which are in different NUMA domains.
>>
>> This compares the NUMA domain of a newly enslaved interface against any
>> existing enslaved interfaces and prints a warning if they do not match.
>
> Hi Patrick
>
> Is there a bonding mode which might actually want to do this? Send on
> the local domain, unless it is overloaded, in which case send it to
> the other domain?
>

I suppose there could theoretically be a bonding mode that could do
that, but currently no such mode exists.

> There is also this talk for netdev:
>
> https://netdevconf.org/2.2/session.html?shochat-devicemgmt-talk

From reading the abstract there, it sounds like such a device driver
would want to abstract away the numa location of the underlying
devices from the "unified" net device it presents to the kernel.

>
>         Andrew

My goal with the patch is not to prevent someone from bonding
whichever interfaces they want, only to notify them that what they are
doing is *likely* to be less than ideal from a performance
perspective. Even if some theoretical load balancing bonding mode was
intelligent enough to consider NUMA when choosing a transmit
interface, it never has control over the interface traffic is received
on (excluding the strange balance-alb mode).

Patrick
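
The comparison the patch description talks about can be sketched as below. This is a user-space illustration with hypothetical names (toy_numa_mismatch(), TOY_NO_NODE); treating an unknown node as "no mismatch" is an assumption of the sketch, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_NODE (-1) /* hypothetical marker for "NUMA node unknown" */

/*
 * Returns true when the interface being enslaved sits on a different
 * NUMA node than any interface already in the bond. A caller would
 * only print a notice on true; it would not refuse the enslave.
 */
static bool toy_numa_mismatch(const int *slave_nodes, size_t n, int new_node)
{
	if (new_node == TOY_NO_NODE)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (slave_nodes[i] == TOY_NO_NODE)
			continue;
		if (slave_nodes[i] != new_node)
			return true;
	}
	return false;
}
```

Because the result only drives a warning, the user stays free to bond whichever interfaces they want, which matches the stated goal.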

^ permalink raw reply

* [net-next 01/15] i40e: fix a typo in i40e_pf documentation
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Rami Rosen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Rami Rosen <rami.rosen@intel.com>

This patch fixes a typo in i40e_pf object documentation; num_req_vfs
refers to the number of VFs requested for the PF.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 439c63cb2a0c..2bc4dd0dbbf1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -350,7 +350,7 @@ struct i40e_pf {
 	u16 num_vmdq_vsis;         /* num vmdq vsis this PF has set up */
 	u16 num_vmdq_qps;          /* num queue pairs per vmdq pool */
 	u16 num_vmdq_msix;         /* num queue vectors per vmdq pool */
-	u16 num_req_vfs;           /* num VFs requested for this VF */
+	u16 num_req_vfs;           /* num VFs requested for this PF */
 	u16 num_vf_qps;            /* num queue pairs per VF */
 	u16 num_lan_qps;           /* num lan queues this PF has set up */
 	u16 num_lan_msix;          /* num queue vectors for the base PF vsi */
-- 
2.14.2

^ permalink raw reply related

* [net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-06
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf only.

Rami fixes a typo in the code comments.

Mitch adds an ethtool private flag to control source pruning to resolve an
issue where our default behavior is to enable source pruning which breaks ARP
monitoring in channel bonding.  He also fixes a couple of register
definitions that were incorrect.

Jake fixes an issue with multiple logical CPUs per core (simultaneous
multithreading - SMT) and how we set an affinity hint based on the v_idx of
that q_vector, which is an incremental value and might lead to multiple
offline CPUs being assigned to a q_vector.  Instead, we should only assign
hints for CPUs which are online, so look to use cpumask_local_spread().
Also fixed a VF VLAN tag stripping issue, where the flag created to change
this feature was seen as unchangeable.  Lastly, organized and re-numbered
the feature flags.

Alan re-enables PTP L4 for XL710 devices with firmware version 6.0 or
greater, now that the previous bug in the older firmware is fixed.
Implements the PCI error handlers for reset_prepare() and reset_done() to
allow us to handle function level resets.

Alice cleans up code that was added to the incorrect function during a
merge.

Filip adds a change to display an error message when a module is inserted
that does not meet the thermal requirements, Talking Heads "Burning Down
the House" comes to mind.  Also fixed a flow director filter issue where
a variable was not being cleared which stores the filter number to be
removed from the list when the firmware refused to add the requested
filter.

The following are changes since commit cc71b7b071192ac1c288e272fdc3f3877eb96663:
  net/ipv6: remove unused err variable on icmpv6_push_pending_frames
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40e: re-enable PTP L4 capabilities for XL710 if FW >6.0
  i40e: implement split PCI error reset handler

Alice Michael (1):
  i40e: fix merge error

Filip Sadowski (2):
  i40e: Display error message if module does not meet thermal
    requirements
  i40e: Properly maintain flow director filters list

Jacob Keller (4):
  i40e/i40evf: spread CPU affinity hints across online CPUs only
  i40evf: enable support for VF VLAN tag stripping control
  i40e: ignore skb->xmit_more when deciding to set RS bit
  i40e/i40evf: organize and re-number feature flags

Jesse Brandeburg (1):
  i40e/i40evf: use DECLARE_BITMAP for state

Mariusz Stachura (1):
  i40e: do not enter PHY debug mode while setting LEDs behaviour

Mitch Williams (3):
  i40e: add private flag to control source pruning
  i40e: redfine I40E_PHY_TYPE_MAX
  i40e: fix incorrect register definition

Rami Rosen (1):
  i40e: fix a typo in i40e_pf documentation

 drivers/net/ethernet/intel/i40e/i40e.h             |  98 +++++++++----------
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   3 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |   8 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  20 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 104 +++++++++++++++++----
 drivers/net/ethernet/intel/i40e/i40e_register.h    |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  34 +------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   3 +-
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |   3 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |   3 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  32 +++----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  31 +++---
 12 files changed, 203 insertions(+), 138 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net-next 04/15] i40e: re-enable PTP L4 capabilities for XL710 if FW >6.0
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Starting with XL710 FW 5.3 PTP L4 was disabled for XL710 due to a bug.  The
bug has since been resolved in XL710 FW >6.0 and PTP L4 can now be
re-enabled on those devices with updated firmware.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d2bb4f17c89e..85132eee9f64 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9074,6 +9074,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	    (pf->hw.aq.fw_maj_ver >= 5)))
 		pf->hw_features |= I40E_HW_USE_SET_LLDP_MIB;
 
+	/* Enable PTP L4 if FW > v6.0 */
+	if (pf->hw.mac.type == I40E_MAC_XL710 &&
+	    pf->hw.aq.fw_maj_ver >= 6)
+		pf->hw_features |= I40E_HW_PTP_L4_CAPABLE;
+
 	if (pf->hw.func_caps.vmdq) {
 		pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
 		pf->flags |= I40E_FLAG_VMDQ_ENABLED;
-- 
2.14.2

^ permalink raw reply related

* [net-next 02/15] i40e: add private flag to control source pruning
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

By default, our devices do source pruning, that is, they drop receive
packets that have the source MAC matching one of the receive filters.
Unfortunately, this breaks ARP monitoring in channel bonding, as the
bonding driver expects devices to receive ARPs containing their own
source address.

Add an ethtool private flag to control this feature.

Also, remove the netif_running() check when we process our private
flags. It's OK to reset when the device is closed and in most cases we
need the reset to apply these changes.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  7 +++++--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 25 +++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2bc4dd0dbbf1..c78448daa7a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -452,6 +452,7 @@ struct i40e_pf {
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
+#define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT_ULL(59)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1136d02e2e95..6203d362438c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -227,6 +227,8 @@ static const struct i40e_priv_flags i40e_gstrings_priv_flags[] = {
 	I40E_PRIV_FLAG("veb-stats", I40E_FLAG_VEB_STATS_ENABLED, 0),
 	I40E_PRIV_FLAG("hw-atr-eviction", I40E_FLAG_HW_ATR_EVICT_ENABLED, 0),
 	I40E_PRIV_FLAG("legacy-rx", I40E_FLAG_LEGACY_RX, 0),
+	I40E_PRIV_FLAG("disable-source-pruning",
+		       I40E_FLAG_SOURCE_PRUNING_DISABLED, 0),
 };
 
 #define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gstrings_priv_flags)
@@ -4189,8 +4191,9 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	/* Issue reset to cause things to take effect, as additional bits
 	 * are added we will need to create a mask of bits requiring reset
 	 */
-	if ((changed_flags & I40E_FLAG_VEB_STATS_ENABLED) ||
-	    ((changed_flags & I40E_FLAG_LEGACY_RX) && netif_running(dev)))
+	if (changed_flags & (I40E_FLAG_VEB_STATS_ENABLED |
+			     I40E_FLAG_LEGACY_RX |
+			     I40E_FLAG_SOURCE_PRUNING_DISABLED))
 		i40e_do_reset(pf, BIT(__I40E_PF_RESET_REQUESTED), true);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3f9e89b054ec..b539469f576f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9903,6 +9903,31 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
 
 		enabled_tc = i40e_pf_get_tc_map(pf);
 
+		/* Source pruning is enabled by default, so the flag is
+		 * negative logic - if it's set, we need to fiddle with
+		 * the VSI to disable source pruning.
+		 */
+		if (pf->flags & I40E_FLAG_SOURCE_PRUNING_DISABLED) {
+			memset(&ctxt, 0, sizeof(ctxt));
+			ctxt.seid = pf->main_vsi_seid;
+			ctxt.pf_num = pf->hw.pf_id;
+			ctxt.vf_num = 0;
+			ctxt.info.valid_sections |=
+				     cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+			ctxt.info.switch_id =
+				   cpu_to_le16(I40E_AQ_VSI_SW_ID_FLAG_LOCAL_LB);
+			ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
+			if (ret) {
+				dev_info(&pf->pdev->dev,
+					 "update vsi failed, err %s aq_err %s\n",
+					 i40e_stat_str(&pf->hw, ret),
+					 i40e_aq_str(&pf->hw,
+						     pf->hw.aq.asq_last_status));
+				ret = -ENOENT;
+				goto err;
+			}
+		}
+
 		/* MFP mode setup queue map and update VSI */
 		if ((pf->flags & I40E_FLAG_MFP_ENABLED) &&
 		    !(pf->hw.func_caps.iscsi)) { /* NIC type PF */
-- 
2.14.2

^ permalink raw reply related

* [net-next 03/15] i40e/i40evf: spread CPU affinity hints across online CPUs only
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Currently, when setting up the IRQ for a q_vector, we set an affinity
hint based on the v_idx of that q_vector. That is, a loop iterates over
v_idx, which is an incremental value, and the cpumask is created based
on this value.

This is a problem in systems with multiple logical CPUs per core (like in
simultaneous multithreading (SMT) scenarios). If we disable some logical
CPUs, by turning SMT off for example, we will end up with a sparse
cpu_online_mask, i.e., only the first CPU in a core is online, and
incremental filling in q_vector cpumask might lead to multiple offline
CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In the general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., (C-1)*N  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.
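
The sparse mask in this example can be modelled with a small user-space
sketch. The helper names (cpu_is_online, nth_online_cpu) are hypothetical
and only reproduce the arithmetic; this is not kernel code, and the
wrap-around in nth_online_cpu is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: with SMT off, only the first logical CPU of each
 * core is online, i.e. CPU ids 0, N, 2*N, ..., (C-1)*N.
 */
static bool cpu_is_online(int cpu, int cores, int threads_per_core)
{
	assert(cores > 0 && threads_per_core > 0);
	return cpu >= 0 && cpu < cores * threads_per_core &&
	       cpu % threads_per_core == 0;
}

/* Return the id of the i-th online CPU in this model, wrapping around
 * when i exceeds the number of online CPUs (an assumption of the sketch).
 */
static int nth_online_cpu(int i, int cores, int threads_per_core)
{
	assert(i >= 0 && cores > 0 && threads_per_core > 0);
	return (i % cores) * threads_per_core;
}
```

With 8 cores and 8 threads per core, using v_idx directly as the CPU id
would pick CPUs 1 through 7 for vectors 1 through 7, all of which are
offline in this model, whereas nth_online_cpu() yields 8, 16, ..., 56.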

Instead, we should only assign hints for CPUs which are online. Even
better, the kernel already provides a function, cpumask_local_spread()
which takes an index and returns a CPU, spreading the interrupts across
local NUMA nodes first, and then remote ones if necessary.

Since we generally have a 1:1 mapping between vectors and CPUs, there
is no real advantage to spreading vectors to local CPUs first. In order
to avoid mismatch of the default XPS hints, we'll pass -1 so that it
spreads across all CPUs without regard to the node locality.

Note that we don't need to change the q_vector->affinity_mask as this is
initialized to cpu_possible_mask, until an actual affinity is set and
then notified back to us.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 16 +++++++++++-----
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  9 ++++++---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b539469f576f..d2bb4f17c89e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2885,14 +2885,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi)
 static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 {
 	struct i40e_vsi *vsi = ring->vsi;
+	int cpu;
 
 	if (!ring->q_vector || !ring->netdev)
 		return;
 
 	if ((vsi->tc_config.numtc <= 1) &&
 	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
-		netif_set_xps_queue(ring->netdev,
-				    get_cpu_mask(ring->q_vector->v_idx),
+		cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
+		netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
 				    ring->queue_index);
 	}
 
@@ -3482,6 +3483,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 	int tx_int_idx = 0;
 	int vector, err;
 	int irq_num;
+	int cpu;
 
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct i40e_q_vector *q_vector = vsi->q_vectors[vector];
@@ -3517,10 +3519,14 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* get_cpu_mask returns a static constant mask with
-		 * a permanent lifetime so it's ok to use here.
+		/* Spread affinity hints out across online CPUs.
+		 *
+		 * get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to pass to
+		 * irq_set_affinity_hint without making a copy.
 		 */
-		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
+		cpu = cpumask_local_spread(q_vector->v_idx, -1);
+		irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
 	}
 
 	vsi->irqs_ready = true;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index f2f1e754c2ce..bc76378a71e2 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -515,6 +515,7 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 	unsigned int vector, q_vectors;
 	unsigned int rx_int_idx = 0, tx_int_idx = 0;
 	int irq_num, err;
+	int cpu;
 
 	i40evf_irq_disable(adapter);
 	/* Decrement for Other and TCP Timer vectors */
@@ -553,10 +554,12 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		q_vector->affinity_notify.release =
 						   i40evf_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* get_cpu_mask returns a static constant mask with
-		 * a permanent lifetime so it's ok to use here.
+		/* Spread the IRQ affinity hints across online CPUs. Note that
+		 * get_cpu_mask returns a mask with a permanent lifetime so
+		 * it's safe to use as a hint for irq_set_affinity_hint.
 		 */
-		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
+		cpu = cpumask_local_spread(q_vector->v_idx, -1);
+		irq_set_affinity_hint(irq_num, get_cpu_mask(cpu));
 	}
 
 	return 0;
-- 
2.14.2

^ permalink raw reply related

* [net-next 08/15] i40e: fix merge error
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Alice Michael, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Alice Michael <alice.michael@intel.com>

This patch removes some code that was accidentally added to
the wrong function with a merge error.  Fixes: c53934c6d1b1
("i40e: fix: do not sleep in netdev_ops")

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 49401be7a2f4..628101bb08d4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1776,11 +1776,6 @@ static void i40e_set_rx_mode(struct net_device *netdev)
 		vsi->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
 		vsi->back->flags |= I40E_FLAG_FILTER_SYNC;
 	}
-
-	/* schedule our worker thread which will take care of
-	 * applying the new filter changes
-	 */
-	i40e_service_event_schedule(vsi->back);
 }
 
 /**
-- 
2.14.2

^ permalink raw reply related

* [net-next 07/15] i40e/i40evf: use DECLARE_BITMAP for state
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Jesse Brandeburg, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

When using set_bit and friends, we should be using actual
bitmaps. Convert the ring state to a bitmap and fix all the
locations where we might access it.
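
As a rough user-space illustration of why the call sites change from
&ring->state to ring->state: a bitmap declared this way is an array of
unsigned long, so its name already decays to a pointer. All SKETCH_ and
sketch_ names below are hypothetical stand-ins for the kernel macros and
helpers, and the bit operation here is not atomic, unlike the kernel's.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* User-space approximations of the kernel macros, for illustration only. */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)
#define SKETCH_DECLARE_BITMAP(name, bits) \
	unsigned long name[SKETCH_BITS_TO_LONGS(bits)]

enum sketch_ring_state {
	SKETCH_TX_FDIR_INIT_DONE,
	SKETCH_TX_XPS_INIT_DONE,
	SKETCH_RING_STATE_NBITS /* must be last */
};

struct sketch_ring {
	SKETCH_DECLARE_BITMAP(state, SKETCH_RING_STATE_NBITS);
};

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit value. */
static bool sketch_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	unsigned long *word = addr + nr / SKETCH_BITS_PER_LONG;
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Stand-in for bitmap_zero(): clear every word backing the bitmap. */
static void sketch_ring_reset(struct sketch_ring *ring)
{
	assert(ring != NULL);
	memset(ring->state, 0, sizeof(ring->state));
}
```

Because the state is now an array, a plain assignment such as
"ring->state = 0" no longer compiles, which is why the reset has to go
through a helper that clears the backing words.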

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    | 3 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  | 3 ++-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8f326f87a815..6f2725fc50a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -278,8 +278,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 rx_ring->netdev,
 			 rx_ring->rx_bi);
 		dev_info(&pf->pdev->dev,
-			 "    rx_rings[%i]: state = %li, queue_index = %d, reg_idx = %d\n",
-			 i, rx_ring->state,
+			 "    rx_rings[%i]: state = %lu, queue_index = %d, reg_idx = %d\n",
+			 i, *rx_ring->state,
 			 rx_ring->queue_index,
 			 rx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
@@ -334,8 +334,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 tx_ring->netdev,
 			 tx_ring->tx_bi);
 		dev_info(&pf->pdev->dev,
-			 "    tx_rings[%i]: state = %li, queue_index = %d, reg_idx = %d\n",
-			 i, tx_ring->state,
+			 "    tx_rings[%i]: state = %lu, queue_index = %d, reg_idx = %d\n",
+			 i, *tx_ring->state,
 			 tx_ring->queue_index,
 			 tx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 85132eee9f64..49401be7a2f4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2891,7 +2891,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 		return;
 
 	if ((vsi->tc_config.numtc <= 1) &&
-	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
+	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state)) {
 		cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
 		netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
 				    ring->queue_index);
@@ -3010,7 +3010,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	struct i40e_hmc_obj_rxq rx_ctx;
 	i40e_status err = 0;
 
-	ring->state = 0;
+	bitmap_zero(ring->state, __I40E_RING_STATE_NBITS);
 
 	/* clear the context structure first */
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 2f848bc5e391..a4e3e665a1a1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -342,6 +342,7 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
+	__I40E_RING_STATE_NBITS /* must be last */
 };
 
 /* some useful defines for virtchannel interface, which
@@ -366,7 +367,7 @@ struct i40e_ring {
 		struct i40e_tx_buffer *tx_bi;
 		struct i40e_rx_buffer *rx_bi;
 	};
-	unsigned long state;
+	DECLARE_BITMAP(state, __I40E_RING_STATE_NBITS);
 	u16 queue_index;		/* Queue number of ring */
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 __iomem *tail;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 0d9f98bc07bd..d8ca802a71a9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -325,6 +325,7 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
+	__I40E_RING_STATE_NBITS /* must be last */
 };
 
 /* some useful defines for virtchannel interface, which
@@ -348,7 +349,7 @@ struct i40e_ring {
 		struct i40e_tx_buffer *tx_bi;
 		struct i40e_rx_buffer *rx_bi;
 	};
-	unsigned long state;
+	DECLARE_BITMAP(state, __I40E_RING_STATE_NBITS);
 	u16 queue_index;		/* Queue number of ring */
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 __iomem *tail;
-- 
2.14.2

^ permalink raw reply related

* [net-next 10/15] i40e: Properly maintain flow director filters list
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

When there is no space for more flow director filters and the user requests
to add a new one, it is rejected by firmware and automatically removed from
the filter list maintained by the driver. This behaviour is correct.
Afterwards, an existing filter can be removed, freeing a slot for the new
one. This, however, causes the newly added filter to be accepted by firmware
but removed from the driver filter list, so it does not show up after issuing
'ethtool -n <dev_name>'.

This happened because the variable pf->fd_inv, which stores the number of
the filter to be removed from the list when firmware refuses to add the
requested filter, was never cleared. As a result, the filter with this
specific ID was removed every time it was added to the list, even though it
had been accepted by firmware and effectively applied to the NIC.
Fix this by clearing pf->fd_inv after the rejected filter has been removed
from the list.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3d6d6a283327..9704cfef2f05 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6232,6 +6232,7 @@ void i40e_fdir_check_and_reenable(struct i40e_pf *pf)
 				hlist_del(&filter->fdir_node);
 				kfree(filter);
 				pf->fdir_pf_active_filters--;
+				pf->fd_inv = 0;
 			}
 		}
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 05/15] i40e: redfine I40E_PHY_TYPE_MAX
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Since I40E_PHY_TYPE_MAX is used as an iterator, usually combined with
some sort of bit-shifting, it should only include actual PHY types and
not error cases. Move it up in the enum declaration so that loops only
iterate across valid PHY types.
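
The effect of moving the enumerator follows from how C assigns implicit
enum values: an enumerator without an initializer takes the previous
value plus one. The shortened enums below are hypothetical; only the
values 0x22, 0xFE and 0xFF are taken from the patch.

```c
#include <assert.h>

/* Old layout: MAX comes after the special values. */
enum sketch_phy_type_old {
	OLD_PHY_TYPE_25GBASE_LR	= 0x22,
	OLD_PHY_TYPE_EMPTY	= 0xFE,
	OLD_PHY_TYPE_DEFAULT	= 0xFF,
	OLD_PHY_TYPE_MAX	/* implicitly 0xFF + 1 = 0x100 */
};

/* New layout: MAX directly follows the last real PHY type. */
enum sketch_phy_type_new {
	NEW_PHY_TYPE_25GBASE_LR	= 0x22,
	NEW_PHY_TYPE_MAX,	/* implicitly 0x22 + 1 = 0x23 */
	NEW_PHY_TYPE_EMPTY	= 0xFE,
	NEW_PHY_TYPE_DEFAULT	= 0xFF,
};

/* Count how many iterations a "for (i = 0; i < max; i++)" loop performs. */
static int sketch_loop_iterations(int max)
{
	int i, n = 0;

	assert(max >= 0);
	for (i = 0; i < max; i++)
		n++;
	return n;
}
```

In this sketch a loop bounded by the old MAX would run 256 times, and a
shift such as 1ULL << i is undefined in C once i reaches 64, whereas the
new MAX bounds the loop at 35 iterations.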

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 4c85ea9cd89a..50c5a4c630b8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1771,9 +1771,9 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_MAX,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
-	I40E_PHY_TYPE_MAX
 };
 
 #define I40E_LINK_SPEED_100MB_SHIFT	0x1
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index ed5602f4bbcd..dc6fc8b1bc79 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1767,9 +1767,9 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_MAX,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
-	I40E_PHY_TYPE_MAX
 };
 
 #define I40E_LINK_SPEED_100MB_SHIFT	0x1
-- 
2.14.2

^ permalink raw reply related

* [net-next 06/15] i40e: fix incorrect register definition
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

This register was defined incorrectly. Fix the increment value to 8, and
replace the iterator with _i to make the definition consistent with
other statistics registers.
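
To make the effect of the increment concrete, here is a small sketch of
the offset arithmetic. The function and macro names are hypothetical;
the base address 0x00344000 and the maximum index 383 are taken from
the definitions in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GLV_TEPC_BASE		0x00344000u
#define SKETCH_GLV_TEPC_MAX_INDEX	383u

/* Compute the register offset for index i with a given byte stride. */
static uint32_t sketch_glv_tepc(uint32_t i, uint32_t stride)
{
	assert(i <= SKETCH_GLV_TEPC_MAX_INDEX);
	return SKETCH_GLV_TEPC_BASE + i * stride;
}
```

With the old stride of 4, index 1 resolves to 0x00344004; with the
corrected stride of 8 it resolves to 0x00344008, so every index other
than 0 previously pointed at the wrong offset.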

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_register.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 86ca27f72f02..c234758dad15 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -2794,7 +2794,7 @@
 #define I40E_GLV_RUPP_MAX_INDEX 383
 #define I40E_GLV_RUPP_RUPP_SHIFT 0
 #define I40E_GLV_RUPP_RUPP_MASK I40E_MASK(0xFFFFFFFF, I40E_GLV_RUPP_RUPP_SHIFT)
-#define I40E_GLV_TEPC(_VSI) (0x00344000 + ((_VSI) * 4)) /* _i=0...383 */ /* Reset: CORER */
+#define I40E_GLV_TEPC(_i) (0x00344000 + ((_i) * 8)) /* _i=0...383 */ /* Reset: CORER */
 #define I40E_GLV_TEPC_MAX_INDEX 383
 #define I40E_GLV_TEPC_TEPC_SHIFT 0
 #define I40E_GLV_TEPC_TEPC_MASK I40E_MASK(0xFFFFFFFF, I40E_GLV_TEPC_TEPC_SHIFT)
-- 
2.14.2

^ permalink raw reply related

* [net-next 09/15] i40e: Display error message if module does not meet thermal requirements
From: Jeff Kirsher @ 2017-10-06 17:57 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171006175727.868-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

This patch causes an error message to be displayed when the NIC detects
insertion of a module that does not meet thermal requirements.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  1 +
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 24 +++++++++++++++++-----
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |  1 +
 4 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c78448daa7a1..4dc6d43f8812 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -451,6 +451,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(54)
 #define I40E_FLAG_TEMP_LINK_POLLING		BIT_ULL(55)
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
+#define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT_ULL(57)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
 #define I40E_FLAG_SOURCE_PRUNING_DISABLED	BIT_ULL(59)
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 50c5a4c630b8..a8f65aed5421 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1772,6 +1772,7 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
 	I40E_PHY_TYPE_MAX,
+	I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP	= 0xFD,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 628101bb08d4..3d6d6a283327 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6558,12 +6558,26 @@ static void i40e_handle_link_event(struct i40e_pf *pf,
 	 */
 	i40e_link_event(pf);
 
-	/* check for unqualified module, if link is down */
-	if ((status->link_info & I40E_AQ_MEDIA_AVAILABLE) &&
-	    (!(status->an_info & I40E_AQ_QUALIFIED_MODULE)) &&
-	    (!(status->link_info & I40E_AQ_LINK_UP)))
+	/* Check if module meets thermal requirements */
+	if (status->phy_type == I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP) {
 		dev_err(&pf->pdev->dev,
-			"The driver failed to link because an unqualified module was detected.\n");
+			"Rx/Tx is disabled on this device because the module does not meet thermal requirements.\n");
+		dev_err(&pf->pdev->dev,
+			"Refer to the Intel(R) Ethernet Adapters and Devices User Guide for a list of supported modules.\n");
+	} else {
+		/* check for unqualified module, if link is down, suppress
+		 * the message if link was forced to be down.
+		 */
+		if ((status->link_info & I40E_AQ_MEDIA_AVAILABLE) &&
+		    (!(status->an_info & I40E_AQ_QUALIFIED_MODULE)) &&
+		    (!(status->link_info & I40E_AQ_LINK_UP)) &&
+		    (!(pf->flags & I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED))) {
+			dev_err(&pf->pdev->dev,
+				"Rx/Tx is disabled on this device because an unsupported SFP module type was detected.\n");
+			dev_err(&pf->pdev->dev,
+				"Refer to the Intel(R) Ethernet Adapters and Devices User Guide for a list of supported modules.\n");
+		}
+	}
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index dc6fc8b1bc79..60c892f559b9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1768,6 +1768,7 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
 	I40E_PHY_TYPE_MAX,
+	I40E_PHY_TYPE_NOT_SUPPORTED_HIGH_TEMP	= 0xFD,
 	I40E_PHY_TYPE_EMPTY			= 0xFE,
 	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 };
-- 
2.14.2

^ permalink raw reply related
