Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net] socket, bpf: fix possible use after free
From: Eric Dumazet @ 2017-10-02 19:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann

From: Eric Dumazet <edumazet@google.com>

Starting from linux-4.4, 3WHS no longer takes the listener lock.

Since this time, we might hit a use-after-free in sk_filter_charge(),
if the filter we got in the memcpy() of the listener content
just happened to be replaced by a thread changing listener BPF filter.

To fix this, we need to make sure the filter refcount is not already
zero before incrementing it again.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/filter.c |   12 ++++++++----
 net/core/sock.c   |    5 ++++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 82edad58d066857aeee562661098effa3b3e6961..74b8c91fb5f4461da58c73568976bc9834c4612b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -989,10 +989,14 @@ static bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp)
 
 bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
 {
-	bool ret = __sk_filter_charge(sk, fp);
-	if (ret)
-		refcount_inc(&fp->refcnt);
-	return ret;
+	if (!refcount_inc_not_zero(&fp->refcnt))
+		return false;
+
+	if (!__sk_filter_charge(sk, fp)) {
+		sk_filter_release(fp);
+		return false;
+	}
+	return true;
 }
 
 static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
diff --git a/net/core/sock.c b/net/core/sock.c
index 7d55c05f449d306b43fd850a61ce8ffd44174f7f..23953b741a41fbcf4a6ffb0dd5bf05bd5266b99d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1684,13 +1684,16 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 		sock_reset_flag(newsk, SOCK_DONE);
 
-		filter = rcu_dereference_protected(newsk->sk_filter, 1);
+		rcu_read_lock();
+		filter = rcu_dereference(sk->sk_filter);
 		if (filter != NULL)
 			/* though it's an empty new sock, the charging may fail
 			 * if sysctl_optmem_max was changed between creation of
 			 * original socket and cloning
 			 */
 			is_charged = sk_filter_charge(newsk, filter);
+		RCU_INIT_POINTER(newsk->sk_filter, filter);
+		rcu_read_unlock();
 
 		if (unlikely(!is_charged || xfrm_sk_clone_policy(newsk, sk))) {
 			/* We need to make sure that we don't uncharge the new

^ permalink raw reply related

* Re: [PATCH 00/18] use ARRAY_SIZE macro
From: J. Bruce Fields @ 2017-10-02 19:22 UTC (permalink / raw)
  To: Greg KH
  Cc: Jérémy Lefaure, Tobin C. Harding, alsa-devel, nouveau,
	dri-devel, dm-devel, brcm80211-dev-list, devel, linux-scsi,
	linux-rdma, amd-gfx, Jason Gunthorpe, linux-acpi, linux-video,
	intel-wired-lan, linux-media, intel-gfx, ecryptfs, linux-nfs,
	linux-raid, openipmi-developer, intel-gvt-dev, devel,
	brcm80211-dev-list.pdl
In-Reply-To: <20171002053554.GA28743@kroah.com>

On Mon, Oct 02, 2017 at 07:35:54AM +0200, Greg KH wrote:
> On Sun, Oct 01, 2017 at 08:52:20PM -0400, Jérémy Lefaure wrote:
> > On Mon, 2 Oct 2017 09:01:31 +1100
> > "Tobin C. Harding" <me@tobin.cc> wrote:
> > 
> > > > In order to reduce the size of the To: and Cc: lines, each patch of the
> > > > series is sent only to the maintainers and lists concerned by the patch.
> > > > This cover letter is sent to every list concerned by this series.  
> > > 
> > > Why don't you just send individual patches for each subsystem? I'm not a maintainer but I don't see
> > > how any one person is going to be able to apply this whole series, it is making it hard for
> > > maintainers if they have to pick patches out from among the series (if indeed any will bother
> > > doing that).
> > Yeah, maybe it would have been better to send individual patches.
> > 
> > From my point of view it's a series because the patches are related (I
> > did a git format-patch from my local branch). But for the maintainers
> > point of view, they are individual patches.
> 
> And the maintainers view is what matters here, if you wish to get your
> patches reviewed and accepted...

Mainly I'd just like to know which you're asking for.  Do you want me to
apply this, or to ACK it so someone else can?  If it's sent as a series
I tend to assume the latter.

But in this case I'm assuming it's the former, so I'll pick up the nfsd
one....

--b.

^ permalink raw reply

* Re: [PATCH net] socket, bpf: fix possible use after free
From: Alexei Starovoitov @ 2017-10-02 19:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <1506972051.8061.30.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Oct 02, 2017 at 12:20:51PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Starting from linux-4.4, 3WHS no longer takes the listener lock.
> 
> Since this time, we might hit a use-after-free in sk_filter_charge(),
> if the filter we got in the memcpy() of the listener content
> just happened to be replaced by a thread changing listener BPF filter.
> 
> To fix this, we need to make sure the filter refcount is not already
> zero before incrementing it again.
> 
> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks for the fix!
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Tom Herbert @ 2017-10-02 19:36 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang,
	Linux Kernel Network Developers, oss-drivers
In-Reply-To: <1506933676-20121-3-git-send-email-simon.horman@netronome.com>

On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
> Move dissection of tunnel info from the flower classifier to the flow
> dissector where all other dissection occurs.  This should not have any
> behavioural affect on other users of the flow dissector.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  net/core/flow_dissector.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++
>  net/sched/cls_flower.c    |  25 ------------
>  2 files changed, 100 insertions(+), 25 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 0a977373d003..1f5caafb4492 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -5,6 +5,7 @@
>  #include <linux/ipv6.h>
>  #include <linux/if_vlan.h>
>  #include <net/dsa.h>
> +#include <net/dst_metadata.h>
>  #include <net/ip.h>
>  #include <net/ipv6.h>
>  #include <net/gre.h>
> @@ -115,6 +116,102 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
>  }
>  EXPORT_SYMBOL(__skb_flow_get_ports);
>
> +static void
> +skb_flow_dissect_set_enc_addr_type(enum flow_dissector_key_id type,
> +                                  struct flow_dissector *flow_dissector,
> +                                  void *target_container)
> +{
> +       struct flow_dissector_key_control *ctrl;
> +
> +       if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ENC_CONTROL))
> +               return;
> +
> +       ctrl = skb_flow_dissector_target(flow_dissector,
> +                                        FLOW_DISSECTOR_KEY_ENC_CONTROL,
> +                                        target_container);
> +       ctrl->addr_type = type;
> +}
> +
> +static void
> +__skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
> +                              struct flow_dissector *flow_dissector,
> +                              void *target_container)
> +{
> +       struct ip_tunnel_info *info;
> +       struct ip_tunnel_key *key;
> +
> +       /* A quick check to see if there might be something to do. */
> +       if (!dissector_uses_key(flow_dissector,
> +                               FLOW_DISSECTOR_KEY_ENC_KEYID) &&
> +           !dissector_uses_key(flow_dissector,
> +                               FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) &&
> +           !dissector_uses_key(flow_dissector,
> +                               FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) &&
> +           !dissector_uses_key(flow_dissector,
> +                               FLOW_DISSECTOR_KEY_ENC_CONTROL) &&
> +           !dissector_uses_key(flow_dissector,
> +                               FLOW_DISSECTOR_KEY_ENC_PORTS))
> +               return;

This complex conditional is now in the path of every call to flow
dissector regardless of whether a classifier is enabled or tunnels
are. What does the assembly show in terms of number of branches? Can
we at least get this down to one check (might be a use case for
FLOW_DISSECTOR_F_FLOWER ;-) ), or even better use the static key when
encap or is enabled?

> +
> +       info = skb_tunnel_info(skb);
> +       if (!info)
> +               return;
> +
> +       key = &info->key;
> +
> +       switch (ip_tunnel_info_af(info)) {
> +       case AF_INET:
> +               skb_flow_dissect_set_enc_addr_type(FLOW_DISSECTOR_KEY_IPV4_ADDRS,
> +                                                  flow_dissector,
> +                                                  target_container);
> +               if (dissector_uses_key(flow_dissector,
> +                                      FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS)) {
> +                       struct flow_dissector_key_ipv4_addrs *ipv4;
> +
> +                       ipv4 = skb_flow_dissector_target(flow_dissector,
> +                                                        FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS,
> +                                                        target_container);
> +                       ipv4->src = key->u.ipv4.src;
> +                       ipv4->dst = key->u.ipv4.dst;
> +               }
> +               break;
> +       case AF_INET6:
> +               skb_flow_dissect_set_enc_addr_type(FLOW_DISSECTOR_KEY_IPV6_ADDRS,
> +                                                  flow_dissector,
> +                                                  target_container);
> +               if (dissector_uses_key(flow_dissector,
> +                                      FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS)) {
> +                       struct flow_dissector_key_ipv6_addrs *ipv6;
> +
> +                       ipv6 = skb_flow_dissector_target(flow_dissector,
> +                                                        FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS,
> +                                                        target_container);
> +                       ipv6->src = key->u.ipv6.src;
> +                       ipv6->dst = key->u.ipv6.dst;
> +               }
> +               break;
> +       }
> +
> +       if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ENC_KEYID)) {
> +               struct flow_dissector_key_keyid *keyid;
> +
> +               keyid = skb_flow_dissector_target(flow_dissector,
> +                                                 FLOW_DISSECTOR_KEY_ENC_KEYID,
> +                                                 target_container);
> +               keyid->keyid = tunnel_id_to_key32(key->tun_id);
> +       }
> +
> +       if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ENC_PORTS)) {
> +               struct flow_dissector_key_ports *tp;
> +
> +               tp = skb_flow_dissector_target(flow_dissector,
> +                                              FLOW_DISSECTOR_KEY_ENC_PORTS,
> +                                              target_container);
> +               tp->src = key->tp_src;
> +               tp->dst = key->tp_dst;
> +       }
> +}
> +
>  static enum flow_dissect_ret
>  __skb_flow_dissect_mpls(const struct sk_buff *skb,
>                         struct flow_dissector *flow_dissector,
> @@ -478,6 +575,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>                                               FLOW_DISSECTOR_KEY_BASIC,
>                                               target_container);
>
> +       __skb_flow_dissect_tunnel_info(skb, flow_dissector,
> +                                      target_container);
> +
>         if (dissector_uses_key(flow_dissector,
>                                FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
>                 struct ethhdr *eth = eth_hdr(skb);
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index d230cb4c8094..db831ac708f6 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -152,37 +152,12 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>         struct cls_fl_filter *f;
>         struct fl_flow_key skb_key;
>         struct fl_flow_key skb_mkey;
> -       struct ip_tunnel_info *info;
>
>         if (!atomic_read(&head->ht.nelems))
>                 return -1;
>
>         fl_clear_masked_range(&skb_key, &head->mask);
>
> -       info = skb_tunnel_info(skb);
> -       if (info) {
> -               struct ip_tunnel_key *key = &info->key;
> -
> -               switch (ip_tunnel_info_af(info)) {
> -               case AF_INET:
> -                       skb_key.enc_control.addr_type =
> -                               FLOW_DISSECTOR_KEY_IPV4_ADDRS;
> -                       skb_key.enc_ipv4.src = key->u.ipv4.src;
> -                       skb_key.enc_ipv4.dst = key->u.ipv4.dst;
> -                       break;
> -               case AF_INET6:
> -                       skb_key.enc_control.addr_type =
> -                               FLOW_DISSECTOR_KEY_IPV6_ADDRS;
> -                       skb_key.enc_ipv6.src = key->u.ipv6.src;
> -                       skb_key.enc_ipv6.dst = key->u.ipv6.dst;
> -                       break;
> -               }
> -
> -               skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
> -               skb_key.enc_tp.src = key->tp_src;
> -               skb_key.enc_tp.dst = key->tp_dst;
> -       }
> -
>         skb_key.indev_ifindex = skb->skb_iif;
>         /* skb_flow_dissect() does not set n_proto in case an unknown protocol,
>          * so do it rather here.
> --
> 2.1.4
>

^ permalink raw reply

* RE: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
From: Rodney Cummings @ 2017-10-02 19:40 UTC (permalink / raw)
  To: Levi Pearson
  Cc: Linux Kernel Network Developers, Vinicius Costa Gomes,
	Henrik Austad, richardcochran@gmail.com,
	jesus.sanchez-palencia@intel.com, andre.guedes@intel.com
In-Reply-To: <CAEYbN3STY+7ZOodQLaP4MfbvwCovWsav52PXBmjHND7QOO=srg@mail.gmail.com>

Thanks Levi,

> One particular device I am working with now provides all network
> access through a DSA switch chip with hardware Qbv support in addtion
> to hardware Qav support. The SoC attached to it has no hardware timed
> launch (SO_TXTIME) support. In this case, although the proposed
> interface for Qbv is not *sufficient* to make a working time-aware end
> station, it does provide a usable building block to provide one. As
> with the credit-based shaping system, Talkers must provide an
> additional level of per-stream shaping as well, but this is largely
> (absent the jitter calculations, which are sort of a middle-level
> concern) independent of what sort of hardware offload of the
> scheduling is provided.

It's a shame that someone built such hardware. Speaking as a manufacturer
of daisy-chainable products for industrial/automotive applications, I
wouldn't use that hardware in my products. The whole point of scheduling
is to obtain close-to-optimal network determinism. If the hardware
doesn't schedule properly, the product is probably better off using CBS.

It would be great if all of the user-level application code was scheduled
as tightly as the network, but the reality is that we aren't there yet.
Also, even if the end-station has a separately timed RT Linux thread for
each stream, the order of those threads can vary. Therefore, when the 
end-station (SoC) has more than one talker, the frames for each talker 
can be written into the Qbv queue at any time, in any order. 
There is no scheduling of frames (i.e. streams)... only the queue.

From the perspective of the CNC, that means that a Qbv-only end-station is
sloppy. In contrast, an end-station using per-stream scheduling 
(e.g. i210 with LaunchTime) is precise, because each frame has a known time.
It's true that P802.1Qcc supports both, so the CNC might support both.
Nevertheless, the CNC is likely to compute a network-wide schedule
that optimizes for the per-stream end-stations, and locates the windows
for the Qbv-only end-stations in a slower interval.

I realize that we are only doing CBS for now. I guess my main point is that
if/when we add scheduling, it cannot be limited to Qbv. Per-stream scheduling
is essential, because that is the assumption for most CNC designs that 
I'm aware of.

Rodney

> -----Original Message-----
> From: Levi Pearson [mailto:levipearson@gmail.com]
> Sent: Monday, October 2, 2017 1:46 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Vinicius
> Costa Gomes <vinicius.gomes@intel.com>; Henrik Austad <henrik@austad.us>;
> richardcochran@gmail.com; jesus.sanchez-palencia@intel.com;
> andre.guedes@intel.com
> Subject: Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for
> traffic shapers
> 
> Hi Rodney,
> 
> Some archives seem to have threaded it, but I have CC'd the
> participants I saw in the original discussion thread since they may
> not otherwise notice it amongst the normal traffic.
> 
> On Fri, Sep 29, 2017 at 2:44 PM, Rodney Cummings <rodney.cummings@ni.com>
> wrote:
> > Hi,
> >
> > I am posting my reply to this thread after subscribing, so I apologize
> > if the archive happens to attach it to the wrong thread.
> >
> > First, I'd like to say that I strongly support this RFC.
> > We need Linux interfaces for IEEE 802.1 TSN features.
> >
> > Although I haven't looked in detail, the proposal for CBS looks good.
> > My questions/concerns are more related to future work, such for 802.1Qbv
> > (scheduled traffic).
> >
> > 1. Question: From an 802.1 perspective, is this RFC intended to support
> > end-station (e.g. NIC in host), bridges (i.e. DSA), or both?
> >
> > This is very important to clarify, because the usage of this interface
> > will be very different for one or the other.
> >
> > For a bridge, the user code typically represents a remote management
> > protocol (e.g. SNMP, NETCONF, RESTCONF), and this interface is
> > expected to align with the specifications of 802.1Q clause 12,
> > which serves as the information model for management. Historically,
> > a standard kernel interface for management hasn't been viewed as
> > essential, but I suppose it wouldn't hurt.
> 
> I don't think the proposal was meant to cover the case of non-local
> switch hardware, but in addition to dsa and switchdev switch ICs
> managed by embedded Linux-running SoCs, there are SoCs with embedded
> small port count switches or even plain multiple NICs with software
> bridging. Many of these embedded small port count switches have FQTSS
> hardware that could potentially be configured by the proposed cbs
> qdisc. This blurs the line somewhat between what is a "bridge" and
> what is an "end-station" in 802.1Q terminology, but nevertheless these
> devices exist, sometimes acting as an endpoint + a real bridge and
> sometimes as just a system with multiple network interfaces.
> 
> > For an end station, the user code can be an implementation of SRP
> > (802.1Q clause 35), or it can be an application-specific
> > protocol (e.g. industrial fieldbus) that exchanges data according
> > to P802.1Qcc clause 46. Either way, the top-level user interface
> > is designed for individual streams, not queues and shapers. That
> > implies some translation code between that top-level interface
> > and this sort of kernel interface.
> >
> > As a specific end-station example, for CBS, 802.1Q-2014 subclause
> > 34.6.1 requires "per-stream queues" in the Talker end-station.
> > I don't see 34.6.1 represented in the proposed RFC, but that's
> > okay... maybe per-stream queues are implemented in user code.
> > Nevertheless, if that is the assumption, I think we need to
> > clarify, especially in examples.
> 
> You're correct that the FQTSS credit-based shaping algorithm requires
> per-stream shaping by Talker endpoints as well, but this is in
> addition to the per-class shaping provided by most hardware shaping
> implementations that I'm aware of in endpoint network hardware. I
> agree that we need to document the need to provide this, but it can
> definitely be built on top of the current proposal.
> 
> I believe the per-stream shaping could be managed either by a user
> space application that manages all use of a streaming traffic class,
> or through an additional qdisc module that performs per-stream
> management on top of the proposed cbs qdisc, ensuring that the
> frames-per-observation interval aspect of each stream's reservation is
> obeyed. This becomes a fairly simple qdisc to implement on top of a
> per-traffic class shaper, and could even be implemented with the help
> of the timestamp that the SO_TXTIME proposal adds to skbuffs, but I
> think keeping the layers separate provides more flexibility to
> implementations and keeps management of various kinds of hardware
> offload support simpler as well.
> 
> > 2. Suggestion: Do not assume that a time-aware (i.e. scheduled)
> > end-station will always use 802.1Qbv.
> >
> > For those who are subscribed to the 802.1 mailing list,
> > I'd suggest a read of draft P802.1Qcc/D1.6, subclause U.1
> > of Annex U. Subclause U.1 assumes that bridges in the network use
> > 802.1Qbv, and then it poses the question of what an end-station
> > Talker should do. If the end-station also uses 802.1Qbv,
> > and that end-station transmits multiple streams, 802.1Qbv is
> > a bad implementation. The reason is that the scheduling
> > (i.e. order in time) of each stream cannot be controlled, which
> > in turn means that the CNC (network manager) cannot optimize
> > the 802.1Qbv schedules in bridges. The preferred technique
> > is to use "per-stream scheduling" in each Talker, so that
> > the CNC can create an optimal schedules (i.e. best determinism).
> >
> > I'm aware of a small number of proprietary CNC implementations for
> > 802.1Qbv in bridges, and they are generally assuming per-stream
> > scheduling in end-stations (Talkers).
> >
> > The i210 NIC's LaunchTime can be used to implement per-stream
> > scheduling. I haven't looked at SO_TXTIME in detail, but it sounds
> > like per-stream scheduling. If so, then we already have the
> > fundamental building blocks for a complete implementation
> > of a time-aware end-station.
> >
> > If we answer the preceding question #1 as "end-station only",
> > I would recommend avoiding 802.1Qbv in this interface. There
> > isn't really anything wrong with it per-se, but it would lead
> > developers down the wrong path.
> 
> In some situations, such as device nodes that each incorporate a small
> port count switch for the purpose of daisy-chaining a segment of the
> network, "end stations" must do a limited subset of local bridge
> management as well. I'm not sure how common this is going to be for
> industrial control applications, but I know there are audio and
> automotive applications built this way.
> 
> One particular device I am working with now provides all network
> access through a DSA switch chip with hardware Qbv support in addtion
> to hardware Qav support. The SoC attached to it has no hardware timed
> launch (SO_TXTIME) support. In this case, although the proposed
> interface for Qbv is not *sufficient* to make a working time-aware end
> station, it does provide a usable building block to provide one. As
> with the credit-based shaping system, Talkers must provide an
> additional level of per-stream shaping as well, but this is largely
> (absent the jitter calculations, which are sort of a middle-level
> concern) independent of what sort of hardware offload of the
> scheduling is provided.
> 
> Both Qbv windows and timed launch support do roughly the same thing;
> they *delay* the launch of a hardware-queued frame so it can egress at
> a precisely specified time, and at least with the i210 and Qbv, ensure
> that no other traffic will be in-progress when that time arrives. For
> either to be used effectively, the application still has to prepare
> the frame slightly ahead-of-time and thus must have the same level of
> time-awareness. This is, again, largely independent of what kind of
> hardware offloading support is provided and is also largely
> independent of the network stack itself. Neither queue window
> management nor SO_TXTIME help the application present its
> time-sensitive traffic at the right time; that's a matter to be worked
> out with the application taking advantage of PTP and the OS scheduler.
> Whether you rely on managed windows or hardware launch time to provide
> the precisely correct amount of delay beyond that is immaterial to the
> application. In the absence of SO_TXTIME offloading (or even with it,
> and in the presence of sufficient OS scheduling jitter), an additional
> layer may need to be provided to ensure different applications' frames
> are queued in the correct order for egress during the window. Again,
> this could be a purely user-space application multiplexer or a
> separate qdisc module.
> 
> I wholeheartedly agree with you and Richard that we ought to
> eventually provide application-level APIs that don't require users to
> have deep knowledge of various 802.1Q intricacies. But I believe that
> the hardware offloading capability being provided now, and the variety
> of the way things are hooked up in real hardware, suggests that we
> ought to also build the support for the underlying protocols in layers
> so that we don't create unnecessary mismatches between offloading
> capability (which can be essential to overall network performance) and
> APIs, such that one configuration of offload support is privileged
> above others even when comparable scheduling accuracy could be
> provided by either.
> 
> In any case, only the cbs qdisc has been included in the post-RFC
> patch cover page for its last couple of iterations, so there is plenty
> of time to discuss how time-aware shaping, preemption, etc. management
> should occur beyond the cbs and SO_TXTIME proposals.
> 
> 
> Levi

^ permalink raw reply

* Re: [PATCH net] socket, bpf: fix possible use after free
From: Daniel Borkmann @ 2017-10-02 19:42 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Alexei Starovoitov
In-Reply-To: <1506972051.8061.30.camel@edumazet-glaptop3.roam.corp.google.com>

On 10/02/2017 09:20 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Starting from linux-4.4, 3WHS no longer takes the listener lock.
>
> Since this time, we might hit a use-after-free in sk_filter_charge(),
> if the filter we got in the memcpy() of the listener content
> just happened to be replaced by a thread changing listener BPF filter.
>
> To fix this, we need to make sure the filter refcount is not already
> zero before incrementing it again.
>
> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks, Eric!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [net-next V2 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu
From: Alexei Starovoitov @ 2017-10-02 19:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, Jason Wang, mchan,
	John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek
In-Reply-To: <20171002140723.7667a9b4@redhat.com>

On Mon, Oct 02, 2017 at 02:07:23PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 29 Sep 2017 20:06:09 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > > +/*** Trace point code ***/
> > > +
> > > +/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
> > > + * Code in:                kernel/include/trace/events/xdp.h
> > > + */
> > > +struct xdp_redirect_ctx {
> > > +	unsigned short common_type;	//	offset:0;  size:2; signed:0;
> > > +	unsigned char common_flags;	//	offset:2;  size:1; signed:0;
> > > +	unsigned char common_preempt_count;//	offset:3;  size:1; signed:0;
> > > +	int common_pid;			//	offset:4;  size:4; signed:1;  
> > 
> > this part is not right. First 8 bytes are not accessible by bpf code.
> > Please use __u64 pad; or similar here.
> 
> I've corrected this in V3.
> 
> Can you explain why BPF cannot access these (first 8 bytes) struct members?

in the very first patch adding bpf to tracepoint we allowed the access,
but then people complained that common_* stuff is not abi and since it was
slow to populate anyway due to local_save_flags (which is never used by
scripts we had so far) and for pid there was a helper already, I was happy
to get rid of it and save these 8 bytes for future use. which ended up
being used by hidden(inaccessible) pt_regs pointer.
Layout description is in
commit 98b5c2c65c29 ("perf, bpf: allow bpf programs attach to tracepoints")
The work on dynamic bits of tracepoints is still tbd.

^ permalink raw reply

* [net-next 01/15] i40e: limit lan queue count in large CPU count machine
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

When a machine has more CPUs than queue pairs, e.g. 512 cores, the
counting gets a little funky and turns off Flow Director with the
message:
  not enough queues for Flow Director. Flow Director feature is disabled

This patch limits the number of lan queues initially allocated to
be sure we have some left for FD and other features.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 47f71d7c3ae0..387f0863f794 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11093,6 +11093,7 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, bool reinit)
 static void i40e_determine_queue_usage(struct i40e_pf *pf)
 {
 	int queues_left;
+	int q_max;
 
 	pf->num_lan_qps = 0;
 
@@ -11139,10 +11140,12 @@ static void i40e_determine_queue_usage(struct i40e_pf *pf)
 					I40E_FLAG_DCB_ENABLED);
 			dev_info(&pf->pdev->dev, "not enough queues for DCB. DCB is disabled.\n");
 		}
-		pf->num_lan_qps = max_t(int, pf->rss_size_max,
-					num_online_cpus());
-		pf->num_lan_qps = min_t(int, pf->num_lan_qps,
-					pf->hw.func_caps.num_tx_qp);
+
+		/* limit lan qps to the smaller of qps, cpus or msix */
+		q_max = max_t(int, pf->rss_size_max, num_online_cpus());
+		q_max = min_t(int, q_max, pf->hw.func_caps.num_tx_qp);
+		q_max = min_t(int, q_max, pf->hw.func_caps.num_msix_vectors);
+		pf->num_lan_qps = q_max;
 
 		queues_left -= pf->num_lan_qps;
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 05/15] i40e: Add support for 'ethtool -m'
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

This patch adds support for 'ethtool -m' command which displays
information about (Q)SFP+ module plugged into NIC's cage.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  18 +++
 drivers/net/ethernet/intel/i40e/i40e_common.c      |  69 +++++++++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 154 +++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   9 ++
 drivers/net/ethernet/intel/i40e/i40e_type.h        |  13 ++
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |  18 +++
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |  69 +++++++++
 drivers/net/ethernet/intel/i40evf/i40e_prototype.h |   9 ++
 drivers/net/ethernet/intel/i40evf/i40e_type.h      |  12 ++
 9 files changed, 371 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 5d0291c1337e..ed7bbe14bc6e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -244,6 +244,8 @@ enum i40e_admin_queue_opc {
 	i40e_aqc_opc_set_phy_debug		= 0x0622,
 	i40e_aqc_opc_upload_ext_phy_fm		= 0x0625,
 	i40e_aqc_opc_run_phy_activity		= 0x0626,
+	i40e_aqc_opc_set_phy_register		= 0x0628,
+	i40e_aqc_opc_get_phy_register		= 0x0629,
 
 	/* NVM commands */
 	i40e_aqc_opc_nvm_read			= 0x0701,
@@ -2053,6 +2055,22 @@ struct i40e_aqc_run_phy_activity {
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_run_phy_activity);
 
+/* Set PHY Register command (0x0628) */
+/* Get PHY Register command (0x0629) */
+struct i40e_aqc_phy_register_access {
+	u8	phy_interface;
+#define I40E_AQ_PHY_REG_ACCESS_INTERNAL	0
+#define I40E_AQ_PHY_REG_ACCESS_EXTERNAL	1
+#define I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE	2
+	u8	dev_address;
+	u8	reserved1[2];
+	__le32	reg_address;
+	__le32	reg_value;
+	u8	reserved2[4];
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_phy_register_access);
+
 /* NVM Read command (indirect 0x0701)
  * NVM Erase commands (direct 0x0702)
  * NVM Update commands (indirect 0x0703)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 64c15f4c9d2b..fada03799850 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -5062,6 +5062,75 @@ void i40e_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, u32 reg_val)
 		wr32(hw, reg_addr, reg_val);
 }
 
+/**
+ * i40e_aq_set_phy_register
+ * @hw: pointer to the hw struct
+ * @phy_select: select which phy should be accessed
+ * @dev_addr: PHY device address
+ * @reg_addr: PHY register address
+ * @reg_val: new register value
+ * @cmd_details: pointer to command details structure or NULL
+ *
+ * Write the external PHY register.
+ **/
+i40e_status i40e_aq_set_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 reg_val,
+				     struct i40e_asq_cmd_details *cmd_details)
+{
+	struct i40e_aq_desc desc;
+	struct i40e_aqc_phy_register_access *cmd =
+		(struct i40e_aqc_phy_register_access *)&desc.params.raw;
+	i40e_status status;
+
+	i40e_fill_default_direct_cmd_desc(&desc,
+					  i40e_aqc_opc_set_phy_register);
+
+	cmd->phy_interface = phy_select;
+	cmd->dev_address = dev_addr;
+	cmd->reg_address = cpu_to_le32(reg_addr);
+	cmd->reg_value = cpu_to_le32(reg_val);
+
+	status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details);
+
+	return status;
+}
+
+/**
+ * i40e_aq_get_phy_register
+ * @hw: pointer to the hw struct
+ * @phy_select: select which phy should be accessed
+ * @dev_addr: PHY device address
+ * @reg_addr: PHY register address
+ * @reg_val: read register value
+ * @cmd_details: pointer to command details structure or NULL
+ *
+ * Read the external PHY register.
+ **/
+i40e_status i40e_aq_get_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 *reg_val,
+				     struct i40e_asq_cmd_details *cmd_details)
+{
+	struct i40e_aq_desc desc;
+	struct i40e_aqc_phy_register_access *cmd =
+		(struct i40e_aqc_phy_register_access *)&desc.params.raw;
+	i40e_status status;
+
+	i40e_fill_default_direct_cmd_desc(&desc,
+					  i40e_aqc_opc_get_phy_register);
+
+	cmd->phy_interface = phy_select;
+	cmd->dev_address = dev_addr;
+	cmd->reg_address = cpu_to_le32(reg_addr);
+
+	status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details);
+	if (!status)
+		*reg_val = le32_to_cpu(cmd->reg_value);
+
+	return status;
+}
+
 /**
  * i40e_aq_write_ppp - Write pipeline personalization profile (ppp)
  * @hw: pointer to the hw struct
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 05e89864f781..1136d02e2e95 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4196,6 +4196,158 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 	return 0;
 }
 
+/**
+ * i40e_get_module_info - get (Q)SFP+ module type info
+ * @netdev: network interface device structure
+ * @modinfo: module EEPROM size and layout information structure
+ **/
+static int i40e_get_module_info(struct net_device *netdev,
+				struct ethtool_modinfo *modinfo)
+{
+	struct i40e_netdev_priv *np = netdev_priv(netdev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
+	struct i40e_hw *hw = &pf->hw;
+	u32 sff8472_comp = 0;
+	u32 sff8472_swap = 0;
+	u32 sff8636_rev = 0;
+	i40e_status status;
+	u32 type = 0;
+
+	/* Check if firmware supports reading module EEPROM. */
+	if (!(hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE)) {
+		netdev_err(vsi->netdev, "Module EEPROM memory read not supported. Please update the NVM image.\n");
+		return -EINVAL;
+	}
+
+	status = i40e_update_link_info(hw);
+	if (status)
+		return -EIO;
+
+	if (hw->phy.link_info.phy_type == I40E_PHY_TYPE_EMPTY) {
+		netdev_err(vsi->netdev, "Cannot read module EEPROM memory. No module connected.\n");
+		return -EINVAL;
+	}
+
+	type = hw->phy.link_info.module_type[0];
+
+	switch (type) {
+	case I40E_MODULE_TYPE_SFP:
+		status = i40e_aq_get_phy_register(hw,
+				I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE,
+				I40E_I2C_EEPROM_DEV_ADDR,
+				I40E_MODULE_SFF_8472_COMP,
+				&sff8472_comp, NULL);
+		if (status)
+			return -EIO;
+
+		status = i40e_aq_get_phy_register(hw,
+				I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE,
+				I40E_I2C_EEPROM_DEV_ADDR,
+				I40E_MODULE_SFF_8472_SWAP,
+				&sff8472_swap, NULL);
+		if (status)
+			return -EIO;
+
+		/* Check if the module requires address swap to access
+		 * the other EEPROM memory page.
+		 */
+		if (sff8472_swap & I40E_MODULE_SFF_ADDR_MODE) {
+			netdev_warn(vsi->netdev, "Module address swap to access page 0xA2 is not supported.\n");
+			modinfo->type = ETH_MODULE_SFF_8079;
+			modinfo->eeprom_len = ETH_MODULE_SFF_8079_LEN;
+		} else if (sff8472_comp == 0x00) {
+			/* Module is not SFF-8472 compliant */
+			modinfo->type = ETH_MODULE_SFF_8079;
+			modinfo->eeprom_len = ETH_MODULE_SFF_8079_LEN;
+		} else {
+			modinfo->type = ETH_MODULE_SFF_8472;
+			modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+		}
+		break;
+	case I40E_MODULE_TYPE_QSFP_PLUS:
+		/* Read from memory page 0. */
+		status = i40e_aq_get_phy_register(hw,
+				I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE,
+				0,
+				I40E_MODULE_REVISION_ADDR,
+				&sff8636_rev, NULL);
+		if (status)
+			return -EIO;
+		/* Determine revision compliance byte */
+		if (sff8636_rev > 0x02) {
+			/* Module is SFF-8636 compliant */
+			modinfo->type = ETH_MODULE_SFF_8636;
+			modinfo->eeprom_len = I40E_MODULE_QSFP_MAX_LEN;
+		} else {
+			modinfo->type = ETH_MODULE_SFF_8436;
+			modinfo->eeprom_len = I40E_MODULE_QSFP_MAX_LEN;
+		}
+		break;
+	case I40E_MODULE_TYPE_QSFP28:
+		modinfo->type = ETH_MODULE_SFF_8636;
+		modinfo->eeprom_len = I40E_MODULE_QSFP_MAX_LEN;
+		break;
+	default:
+		netdev_err(vsi->netdev, "Module type unrecognized\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * i40e_get_module_eeprom - fills buffer with (Q)SFP+ module memory contents
+ * @netdev: network interface device structure
+ * @ee: EEPROM dump request structure
+ * @data: buffer to be filled with EEPROM contents
+ **/
+static int i40e_get_module_eeprom(struct net_device *netdev,
+				  struct ethtool_eeprom *ee,
+				  u8 *data)
+{
+	struct i40e_netdev_priv *np = netdev_priv(netdev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
+	struct i40e_hw *hw = &pf->hw;
+	bool is_sfp = false;
+	i40e_status status;
+	u32 value = 0;
+	int i;
+
+	if (!ee || !ee->len || !data)
+		return -EINVAL;
+
+	if (hw->phy.link_info.module_type[0] == I40E_MODULE_TYPE_SFP)
+		is_sfp = true;
+
+	for (i = 0; i < ee->len; i++) {
+		u32 offset = i + ee->offset;
+		u32 addr = is_sfp ? I40E_I2C_EEPROM_DEV_ADDR : 0;
+
+		/* Check if we need to access the other memory page */
+		if (is_sfp) {
+			if (offset >= ETH_MODULE_SFF_8079_LEN) {
+				offset -= ETH_MODULE_SFF_8079_LEN;
+				addr = I40E_I2C_EEPROM_DEV_ADDR2;
+			}
+		} else {
+			while (offset >= ETH_MODULE_SFF_8436_LEN) {
+				/* Compute memory page number and offset. */
+				offset -= ETH_MODULE_SFF_8436_LEN / 2;
+				addr++;
+			}
+		}
+
+		status = i40e_aq_get_phy_register(hw,
+				I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE,
+				addr, offset, &value, NULL);
+		if (status)
+			return -EIO;
+		data[i] = value;
+	}
+	return 0;
+}
+
 static const struct ethtool_ops i40e_ethtool_ops = {
 	.get_drvinfo		= i40e_get_drvinfo,
 	.get_regs_len		= i40e_get_regs_len,
@@ -4228,6 +4380,8 @@ static const struct ethtool_ops i40e_ethtool_ops = {
 	.set_rxfh		= i40e_set_rxfh,
 	.get_channels		= i40e_get_channels,
 	.set_channels		= i40e_set_channels,
+	.get_module_info	= i40e_get_module_info,
+	.get_module_eeprom	= i40e_get_module_eeprom,
 	.get_ts_info		= i40e_get_ts_info,
 	.get_priv_flags		= i40e_get_priv_flags,
 	.set_priv_flags		= i40e_set_priv_flags,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
index a39b13197891..01502561035c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
@@ -360,6 +360,15 @@ i40e_status i40e_aq_rx_ctl_write_register(struct i40e_hw *hw,
 				u32 reg_addr, u32 reg_val,
 				struct i40e_asq_cmd_details *cmd_details);
 void i40e_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, u32 reg_val);
+i40e_status i40e_aq_set_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 reg_val,
+				     struct i40e_asq_cmd_details *cmd_details);
+i40e_status i40e_aq_get_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 *reg_val,
+				     struct i40e_asq_cmd_details *cmd_details);
+
 i40e_status i40e_read_phy_register_clause22(struct i40e_hw *hw,
 					    u16 reg, u8 phy_addr, u16 *value);
 i40e_status i40e_write_phy_register_clause22(struct i40e_hw *hw,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index fd4bbdd88b57..8b0b9f826b7f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -428,6 +428,18 @@ struct i40e_nvm_access {
 	u8 data[1];
 };
 
+/* (Q)SFP module access definitions */
+#define I40E_I2C_EEPROM_DEV_ADDR	0xA0
+#define I40E_I2C_EEPROM_DEV_ADDR2	0xA2
+#define I40E_MODULE_TYPE_ADDR		0x00
+#define I40E_MODULE_REVISION_ADDR	0x01
+#define I40E_MODULE_SFF_8472_COMP	0x5E
+#define I40E_MODULE_SFF_8472_SWAP	0x5C
+#define I40E_MODULE_SFF_ADDR_MODE	0x04
+#define I40E_MODULE_TYPE_QSFP_PLUS	0x0D
+#define I40E_MODULE_TYPE_QSFP28		0x11
+#define I40E_MODULE_QSFP_MAX_LEN	640
+
 /* PCI bus types */
 enum i40e_bus_type {
 	i40e_bus_type_unknown = 0,
@@ -598,6 +610,7 @@ struct i40e_hw {
 	struct i40e_dcbx_config desired_dcbx_config; /* CEE Desired Cfg */
 
 #define I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE BIT_ULL(0)
+#define I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE  BIT_ULL(2)
 	u64 flags;
 
 	/* debug mask */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 709d114fc305..eee7ece42b39 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -244,6 +244,8 @@ enum i40e_admin_queue_opc {
 	i40e_aqc_opc_set_phy_debug		= 0x0622,
 	i40e_aqc_opc_upload_ext_phy_fm		= 0x0625,
 	i40e_aqc_opc_run_phy_activity		= 0x0626,
+	i40e_aqc_opc_set_phy_register		= 0x0628,
+	i40e_aqc_opc_get_phy_register		= 0x0629,
 
 	/* NVM commands */
 	i40e_aqc_opc_nvm_read			= 0x0701,
@@ -2046,6 +2048,22 @@ struct i40e_aqc_run_phy_activity {
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_run_phy_activity);
 
+/* Set PHY Register command (0x0628) */
+/* Get PHY Register command (0x0629) */
+struct i40e_aqc_phy_register_access {
+	u8	phy_interface;
+#define I40E_AQ_PHY_REG_ACCESS_INTERNAL	0
+#define I40E_AQ_PHY_REG_ACCESS_EXTERNAL	1
+#define I40E_AQ_PHY_REG_ACCESS_EXTERNAL_MODULE	2
+	u8	dev_address;
+	u8	reserved1[2];
+	__le32	reg_address;
+	__le32	reg_value;
+	u8	reserved2[4];
+};
+
+I40E_CHECK_CMD_LENGTH(i40e_aqc_phy_register_access);
+
 /* NVM Read command (indirect 0x0701)
  * NVM Erase commands (direct 0x0702)
  * NVM Update commands (indirect 0x0703)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 8d3a2bfe186a..7d70bf69b249 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -1041,6 +1041,75 @@ void i40evf_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, u32 reg_val)
 		wr32(hw, reg_addr, reg_val);
 }
 
+/**
+ * i40evf_aq_set_phy_register
+ * @hw: pointer to the hw struct
+ * @phy_select: select which phy should be accessed
+ * @dev_addr: PHY device address
+ * @reg_addr: PHY register address
+ * @reg_val: new register value
+ * @cmd_details: pointer to command details structure or NULL
+ *
+ * Reset the external PHY.
+ **/
+i40e_status i40evf_aq_set_phy_register(struct i40e_hw *hw,
+				       u8 phy_select, u8 dev_addr,
+				       u32 reg_addr, u32 reg_val,
+				       struct i40e_asq_cmd_details *cmd_details)
+{
+	struct i40e_aq_desc desc;
+	struct i40e_aqc_phy_register_access *cmd =
+		(struct i40e_aqc_phy_register_access *)&desc.params.raw;
+	i40e_status status;
+
+	i40evf_fill_default_direct_cmd_desc(&desc,
+					    i40e_aqc_opc_set_phy_register);
+
+	cmd->phy_interface = phy_select;
+	cmd->dev_address = dev_addr;
+	cmd->reg_address = cpu_to_le32(reg_addr);
+	cmd->reg_value = cpu_to_le32(reg_val);
+
+	status = i40evf_asq_send_command(hw, &desc, NULL, 0, cmd_details);
+
+	return status;
+}
+
+/**
+ * i40evf_aq_get_phy_register
+ * @hw: pointer to the hw struct
+ * @phy_select: select which phy should be accessed
+ * @dev_addr: PHY device address
+ * @reg_addr: PHY register address
+ * @reg_val: read register value
+ * @cmd_details: pointer to command details structure or NULL
+ *
+ * Reset the external PHY.
+ **/
+i40e_status i40evf_aq_get_phy_register(struct i40e_hw *hw,
+				       u8 phy_select, u8 dev_addr,
+				       u32 reg_addr, u32 *reg_val,
+				       struct i40e_asq_cmd_details *cmd_details)
+{
+	struct i40e_aq_desc desc;
+	struct i40e_aqc_phy_register_access *cmd =
+		(struct i40e_aqc_phy_register_access *)&desc.params.raw;
+	i40e_status status;
+
+	i40evf_fill_default_direct_cmd_desc(&desc,
+					    i40e_aqc_opc_get_phy_register);
+
+	cmd->phy_interface = phy_select;
+	cmd->dev_address = dev_addr;
+	cmd->reg_address = cpu_to_le32(reg_addr);
+
+	status = i40evf_asq_send_command(hw, &desc, NULL, 0, cmd_details);
+	if (!status)
+		*reg_val = le32_to_cpu(cmd->reg_value);
+
+	return status;
+}
+
 /**
  * i40e_aq_send_msg_to_pf
  * @hw: pointer to the hardware structure
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_prototype.h b/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
index c9836bba487d..b624b5994075 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
@@ -111,6 +111,15 @@ i40e_status i40evf_aq_rx_ctl_write_register(struct i40e_hw *hw,
 				u32 reg_addr, u32 reg_val,
 				struct i40e_asq_cmd_details *cmd_details);
 void i40evf_write_rx_ctl(struct i40e_hw *hw, u32 reg_addr, u32 reg_val);
+i40e_status i40e_aq_set_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 reg_val,
+				     struct i40e_asq_cmd_details *cmd_details);
+i40e_status i40e_aq_get_phy_register(struct i40e_hw *hw,
+				     u8 phy_select, u8 dev_addr,
+				     u32 reg_addr, u32 *reg_val,
+				     struct i40e_asq_cmd_details *cmd_details);
+
 i40e_status i40e_read_phy_register(struct i40e_hw *hw, u8 page,
 				   u16 reg, u8 phy_addr, u16 *value);
 i40e_status i40e_write_phy_register(struct i40e_hw *hw, u8 page,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 2ea919d9cdcf..b53584e3d580 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -401,6 +401,18 @@ struct i40e_nvm_access {
 	u8 data[1];
 };
 
+/* (Q)SFP module access definitions */
+#define I40E_I2C_EEPROM_DEV_ADDR	0xA0
+#define I40E_I2C_EEPROM_DEV_ADDR2	0xA2
+#define I40E_MODULE_TYPE_ADDR		0x00
+#define I40E_MODULE_REVISION_ADDR	0x01
+#define I40E_MODULE_SFF_8472_COMP	0x5E
+#define I40E_MODULE_SFF_8472_SWAP	0x5C
+#define I40E_MODULE_SFF_ADDR_MODE	0x04
+#define I40E_MODULE_TYPE_QSFP_PLUS	0x0D
+#define I40E_MODULE_TYPE_QSFP28		0x11
+#define I40E_MODULE_QSFP_MAX_LEN	640
+
 /* PCI bus types */
 enum i40e_bus_type {
 	i40e_bus_type_unknown = 0,
-- 
2.14.2

^ permalink raw reply related

* [net-next 02/15] i40e: remove logically dead code
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Lihong Yang, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Lihong Yang <lihong.yang@intel.com>

This patch removes the !vf condition check that cannot be
true in i40e_ndo_set_vf_trust function

Detected by CoverityScan, CID 1397531 Logically dead code

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index a75396c157d9..e6b95e1e1a33 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -3354,8 +3354,6 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
 
 	vf = &pf->vf[vf_id];
 
-	if (!vf)
-		return -EINVAL;
 	if (setting == vf->trusted)
 		goto out;
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-02
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf.

Shannon Nelson fixes an issue where when a machine has more CPUs than
queue pairs, the counting gets a "little funky" and turns off Flow
Director.  So to correct it, limit the number of LAN queues initially
allocated to be sure there are some left for Flow Director and other
features.

Lihong cleans up dead code by removing a condition check which cannot
ever be true.

Christophe Jaillet fixes a potential NULL pointer dereference, which
could happen if kzalloc() fails.

Filip corrects the reporting of supported link modes, which was incorrect
for some NICs.  Added support for 'ethtool -m' command, which displays
information about QSFP+ modules.

Mariusz adds functions to read/write the LED registers to control the
LEDS, instead of accessing the registers directly whenever the LEDs
need to be controlled.

Jake fixes a regression where we introduced a scheduling while atomic,
so introduce a separate helper function which will manage its own need
for the mac_filter_hash_lock.  Also cleaned up the "PF" parameter in
i40e_vc_disable_vf() since it is never used and is not needed.  Fixed
a rare case where it is possible that a reset does not occur when
i40e_vc_disable_vf() is called, so modify i40e_reset_vf() to return a
bool to indicate whether it reset or not so that i40e_vc_disable_vf()
can wait until a reset actually occurs.

Alan adds the ability for the VF to request more or less underlying
allocated queues from the PF.  Fixes the incorrect method for clearing
the vf_states variable with a NULL assignment, when we should be
using atomic bitops since we don't actually want to clear all the
flags.  Fixed a resource leak, where the PF driver fails to inform
clients of a VF reset because we were incorrectly checking the
I40E_VF_STATE_PRE_ENABLE bit.

Mitch converts i40evf_map_rings_to_vectors() to a void function since
it cannot fail and allows us to clean up the checks for the function
return value.

Scott enables the driver(s) to pass traffic with VLAN tags using the
802.1ad Ethernet protocol.

The following are changes since commit 1dd236fda0c500a21c54f2140dadc488cde9265b:
  Merge branch 'mlxsw-Fixlets'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (3):
  i40evf: Enable VF to request an alternate queue allocation
  i40e: fix handling of vf_states variable
  i40e: fix client notify of VF reset

Christophe JAILLET (1):
  i40e: Fix a potential NULL pointer dereference

Filip Sadowski (2):
  i40e: Fix reporting of supported link modes
  i40e: Add support for 'ethtool -m'

Jacob Keller (4):
  i40e: don't hold spinlock while resetting VF
  i40e: drop i40e_pf *pf from i40e_vc_disable_vf()
  i40e: make use of i40e_vc_disable_vf
  i40e: ensure reset occurs when disabling VF

Lihong Yang (1):
  i40e: remove logically dead code

Mariusz Stachura (1):
  i40e: use admin queue for setting LEDs behavior

Mitch Williams (1):
  i40e: make i40evf_map_rings_to_vectors void

Scott Peterson (1):
  i40e: Stop dropping 802.1ad tags - eth proto 0x88a8

Shannon Nelson (1):
  i40e: limit lan queue count in large CPU count machine

 drivers/net/ethernet/intel/i40e/i40e_adminq.c      |  12 ++
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  55 +++++-
 drivers/net/ethernet/intel/i40e/i40e_common.c      | 201 ++++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 154 ++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  18 +-
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   9 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        |  19 ++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 102 ++++++++---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   5 +-
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |  55 +++++-
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |  69 +++++++
 drivers/net/ethernet/intel/i40evf/i40e_prototype.h |   9 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h      |  20 ++
 drivers/net/ethernet/intel/i40evf/i40evf.h         |   4 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  38 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  97 +++++++++-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |  44 ++++-
 17 files changed, 837 insertions(+), 74 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net-next 04/15] i40e: Fix reporting of supported link modes
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Filip Sadowski, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Filip Sadowski <filip.sadowski@intel.com>

This patch fixes incorrect reporting of supported link modes on some NICs.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 20 ++++++++++++++++++--
 drivers/net/ethernet/intel/i40e/i40e_common.c       | 11 ++++++++++-
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 20 ++++++++++++++++++--
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index e2a9ec80a623..5d0291c1337e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1734,6 +1734,8 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_10GBASE_CR1_CU		= 0xB,
 	I40E_PHY_TYPE_10GBASE_AOC		= 0xC,
 	I40E_PHY_TYPE_40GBASE_AOC		= 0xD,
+	I40E_PHY_TYPE_UNRECOGNIZED		= 0xE,
+	I40E_PHY_TYPE_UNSUPPORTED		= 0xF,
 	I40E_PHY_TYPE_100BASE_TX		= 0x11,
 	I40E_PHY_TYPE_1000BASE_T		= 0x12,
 	I40E_PHY_TYPE_10GBASE_T			= 0x13,
@@ -1752,6 +1754,8 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_EMPTY			= 0xFE,
+	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 	I40E_PHY_TYPE_MAX
 };
 
@@ -1942,19 +1946,31 @@ struct i40e_aqc_get_link_status {
 #define I40E_AQ_25G_SERDES_UCODE_ERR	0X04
 #define I40E_AQ_25G_NIMB_UCODE_ERR	0X05
 	u8	loopback; /* use defines from i40e_aqc_set_lb_mode */
+/* Since firmware API 1.7 loopback field keeps power class info as well */
+#define I40E_AQ_LOOPBACK_MASK		0x07
+#define I40E_AQ_PWR_CLASS_SHIFT_LB	6
+#define I40E_AQ_PWR_CLASS_MASK_LB	(0x03 << I40E_AQ_PWR_CLASS_SHIFT_LB)
 	__le16	max_frame_size;
 	u8	config;
 #define I40E_AQ_CONFIG_FEC_KR_ENA	0x01
 #define I40E_AQ_CONFIG_FEC_RS_ENA	0x02
 #define I40E_AQ_CONFIG_CRC_ENA		0x04
 #define I40E_AQ_CONFIG_PACING_MASK	0x78
-	u8	power_desc;
+	union {
+		struct {
+			u8	power_desc;
 #define I40E_AQ_LINK_POWER_CLASS_1	0x00
 #define I40E_AQ_LINK_POWER_CLASS_2	0x01
 #define I40E_AQ_LINK_POWER_CLASS_3	0x02
 #define I40E_AQ_LINK_POWER_CLASS_4	0x03
 #define I40E_AQ_PWR_CLASS_MASK		0x03
-	u8	reserved[4];
+			u8	reserved[4];
+		};
+		struct {
+			u8	link_type[4];
+			u8	link_type_ext;
+		};
+	};
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_get_link_status);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 7346d8850c8e..64c15f4c9d2b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1821,7 +1821,7 @@ i40e_status i40e_aq_get_link_info(struct i40e_hw *hw,
 	hw_link_info->fec_info = resp->config & (I40E_AQ_CONFIG_FEC_KR_ENA |
 						 I40E_AQ_CONFIG_FEC_RS_ENA);
 	hw_link_info->ext_info = resp->ext_info;
-	hw_link_info->loopback = resp->loopback;
+	hw_link_info->loopback = resp->loopback & I40E_AQ_LOOPBACK_MASK;
 	hw_link_info->max_frame_size = le16_to_cpu(resp->max_frame_size);
 	hw_link_info->pacing = resp->config & I40E_AQ_CONFIG_PACING_MASK;
 
@@ -1852,6 +1852,15 @@ i40e_status i40e_aq_get_link_info(struct i40e_hw *hw,
 	     hw->aq.fw_min_ver < 40)) && hw_link_info->phy_type == 0xE)
 		hw_link_info->phy_type = I40E_PHY_TYPE_10GBASE_SFPP_CU;
 
+	if (hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
+	    hw->aq.api_min_ver >= 7) {
+		__le32 tmp;
+
+		memcpy(&tmp, resp->link_type, sizeof(tmp));
+		hw->phy.phy_types = le32_to_cpu(tmp);
+		hw->phy.phy_types |= ((u64)resp->link_type_ext << 32);
+	}
+
 	/* save link status information */
 	if (link)
 		*link = *hw_link_info;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index f9f48d1900b0..709d114fc305 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1730,6 +1730,8 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_10GBASE_CR1_CU		= 0xB,
 	I40E_PHY_TYPE_10GBASE_AOC		= 0xC,
 	I40E_PHY_TYPE_40GBASE_AOC		= 0xD,
+	I40E_PHY_TYPE_UNRECOGNIZED		= 0xE,
+	I40E_PHY_TYPE_UNSUPPORTED		= 0xF,
 	I40E_PHY_TYPE_100BASE_TX		= 0x11,
 	I40E_PHY_TYPE_1000BASE_T		= 0x12,
 	I40E_PHY_TYPE_10GBASE_T			= 0x13,
@@ -1748,6 +1750,8 @@ enum i40e_aq_phy_type {
 	I40E_PHY_TYPE_25GBASE_CR		= 0x20,
 	I40E_PHY_TYPE_25GBASE_SR		= 0x21,
 	I40E_PHY_TYPE_25GBASE_LR		= 0x22,
+	I40E_PHY_TYPE_EMPTY			= 0xFE,
+	I40E_PHY_TYPE_DEFAULT			= 0xFF,
 	I40E_PHY_TYPE_MAX
 };
 
@@ -1938,19 +1942,31 @@ struct i40e_aqc_get_link_status {
 #define I40E_AQ_25G_SERDES_UCODE_ERR	0X04
 #define I40E_AQ_25G_NIMB_UCODE_ERR	0X05
 	u8	loopback; /* use defines from i40e_aqc_set_lb_mode */
+/* Since firmware API 1.7 loopback field keeps power class info as well */
+#define I40E_AQ_LOOPBACK_MASK		0x07
+#define I40E_AQ_PWR_CLASS_SHIFT_LB	6
+#define I40E_AQ_PWR_CLASS_MASK_LB	(0x03 << I40E_AQ_PWR_CLASS_SHIFT_LB)
 	__le16	max_frame_size;
 	u8	config;
 #define I40E_AQ_CONFIG_FEC_KR_ENA	0x01
 #define I40E_AQ_CONFIG_FEC_RS_ENA	0x02
 #define I40E_AQ_CONFIG_CRC_ENA		0x04
 #define I40E_AQ_CONFIG_PACING_MASK	0x78
-	u8	power_desc;
+	union {
+		struct {
+			u8	power_desc;
 #define I40E_AQ_LINK_POWER_CLASS_1	0x00
 #define I40E_AQ_LINK_POWER_CLASS_2	0x01
 #define I40E_AQ_LINK_POWER_CLASS_3	0x02
 #define I40E_AQ_LINK_POWER_CLASS_4	0x03
 #define I40E_AQ_PWR_CLASS_MASK		0x03
-	u8	reserved[4];
+			u8	reserved[4];
+		};
+		struct {
+			u8	link_type[4];
+			u8	link_type_ext;
+		};
+	};
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_get_link_status);
-- 
2.14.2

^ permalink raw reply related

* [net-next 10/15] i40e: ensure reset occurs when disabling VF
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

It is possible although rare that we may not reset when
i40e_vc_disable_vf() is called. This can lead to some weird
circumstances with some values not being properly set. Modify
i40e_reset_vf() to return a code indicating whether it reset or not.

Now, i40e_vc_disable_vf() can wait until a reset actually occurs. If it
fails to free up within a reasonable time frame we'll display a warning
message.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 42 +++++++++++++++++-----
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  4 +--
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 94ee243f110e..7742cf3d38d9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -156,12 +156,28 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  * i40e_vc_disable_vf
  * @vf: pointer to the VF info
  *
- * Disable the VF through a SW reset
+ * Disable the VF through a SW reset.
  **/
 static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
+	int i;
+
 	i40e_vc_notify_vf_reset(vf);
-	i40e_reset_vf(vf, false);
+
+	/* We want to ensure that an actual reset occurs initiated after this
+	 * function was called. However, we do not want to wait forever, so
+	 * we'll give a reasonable time and print a message if we failed to
+	 * ensure a reset.
+	 */
+	for (i = 0; i < 20; i++) {
+		if (i40e_reset_vf(vf, false))
+			return;
+		usleep_range(10000, 20000);
+	}
+
+	dev_warn(&vf->pf->pdev->dev,
+		 "Failed to initiate reset for VF %d after 200 milliseconds\n",
+		 vf->vf_id);
 }
 
 /**
@@ -1051,9 +1067,9 @@ static void i40e_cleanup_reset_vf(struct i40e_vf *vf)
  * @vf: pointer to the VF structure
  * @flr: VFLR was issued or not
  *
- * reset the VF
+ * Returns true if the VF is reset, false otherwise.
  **/
-void i40e_reset_vf(struct i40e_vf *vf, bool flr)
+bool i40e_reset_vf(struct i40e_vf *vf, bool flr)
 {
 	struct i40e_pf *pf = vf->pf;
 	struct i40e_hw *hw = &pf->hw;
@@ -1061,9 +1077,11 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
 	u32 reg;
 	int i;
 
-	/* If VFs have been disabled, there is no need to reset */
+	/* If the VFs have been disabled, this means something else is
+	 * resetting the VF, so we shouldn't continue.
+	 */
 	if (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
-		return;
+		return false;
 
 	i40e_trigger_vf_reset(vf, flr);
 
@@ -1100,6 +1118,8 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
 
 	i40e_flush(hw);
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+
+	return true;
 }
 
 /**
@@ -1111,8 +1131,10 @@ void i40e_reset_vf(struct i40e_vf *vf, bool flr)
  * VF, then do all the waiting in one chunk, and finally finish restoring each
  * VF after the wait. This is useful during PF routines which need to reset
  * all VFs, as otherwise it must perform these resets in a serialized fashion.
+ *
+ * Returns true if any VFs were reset, and false otherwise.
  **/
-void i40e_reset_all_vfs(struct i40e_pf *pf, bool flr)
+bool i40e_reset_all_vfs(struct i40e_pf *pf, bool flr)
 {
 	struct i40e_hw *hw = &pf->hw;
 	struct i40e_vf *vf;
@@ -1121,11 +1143,11 @@ void i40e_reset_all_vfs(struct i40e_pf *pf, bool flr)
 
 	/* If we don't have any VFs, then there is nothing to reset */
 	if (!pf->num_alloc_vfs)
-		return;
+		return false;
 
 	/* If VFs have been disabled, there is no need to reset */
 	if (test_and_set_bit(__I40E_VF_DISABLE, pf->state))
-		return;
+		return false;
 
 	/* Begin reset on all VFs at once */
 	for (v = 0; v < pf->num_alloc_vfs; v++)
@@ -1200,6 +1222,8 @@ void i40e_reset_all_vfs(struct i40e_pf *pf, bool flr)
 
 	i40e_flush(hw);
 	clear_bit(__I40E_VF_DISABLE, pf->state);
+
+	return true;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 5111d05d5f2f..5ea42ad094bc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -122,8 +122,8 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs);
 int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
 			   u32 v_retval, u8 *msg, u16 msglen);
 int i40e_vc_process_vflr_event(struct i40e_pf *pf);
-void i40e_reset_vf(struct i40e_vf *vf, bool flr);
-void i40e_reset_all_vfs(struct i40e_pf *pf, bool flr);
+bool i40e_reset_vf(struct i40e_vf *vf, bool flr);
+bool i40e_reset_all_vfs(struct i40e_pf *pf, bool flr);
 void i40e_vc_notify_vf_reset(struct i40e_vf *vf);
 
 /* VF configuration related iplink handlers */
-- 
2.14.2

^ permalink raw reply related

* [net-next 13/15] i40e: fix handling of vf_states variable
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Currently we inappropriately clear the vf_states variable with a null
assignment.  This is problematic because we should be using atomic
bitops on this variable and we don't actually want to clear all the
flags.  We should just clear the ones we know we want to clear.
Additionally remove the I40E_VF_STATE_FCOEENA bit because it is no
longer being used.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 5 ++++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 -
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 7742cf3d38d9..989a65d60ac9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -879,7 +879,8 @@ static void i40e_free_vf_res(struct i40e_vf *vf)
 	}
 	/* reset some of the state variables keeping track of the resources */
 	vf->num_queue_pairs = 0;
-	vf->vf_states = 0;
+	clear_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states);
+	clear_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states);
 }
 
 /**
@@ -1586,6 +1587,8 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 	    (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_IWARP)) {
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_IWARP;
 		set_bit(I40E_VF_STATE_IWARPENA, &vf->vf_states);
+	} else {
+		clear_bit(I40E_VF_STATE_IWARPENA, &vf->vf_states);
 	}
 
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 5ea42ad094bc..5efc4f92bb37 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -56,7 +56,6 @@ enum i40e_vf_states {
 	I40E_VF_STATE_INIT = 0,
 	I40E_VF_STATE_ACTIVE,
 	I40E_VF_STATE_IWARPENA,
-	I40E_VF_STATE_FCOEENA,
 	I40E_VF_STATE_DISABLED,
 	I40E_VF_STATE_MC_PROMISC,
 	I40E_VF_STATE_UC_PROMISC,
-- 
2.14.2

^ permalink raw reply related

* [net-next 14/15] i40e: fix client notify of VF reset
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Currently there is a bug in which the PF driver fails to inform clients
of a VF reset which then causes clients to leak resources.  The bug
exists because we were incorrectly checking the I40E_VF_STATE_PRE_ENABLE
bit.

When a VF is first init we go through a reset to initialize variables
and allocate resources but we don't want to inform clients of this first
reset since the client isn't fully enabled yet so we set a state bit
signifying we're in a "pre-enabled" client state.  During the first
reset we should be clearing the bit, allowing all following resets to
notify the client of the reset when the bit is not set.  This patch
fixes the issue by negating the 'test_and_clear_bit' check to accurately
reflect the behavior we want.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 989a65d60ac9..04568137e029 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1050,8 +1050,8 @@ static void i40e_cleanup_reset_vf(struct i40e_vf *vf)
 		set_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states);
 		clear_bit(I40E_VF_STATE_DISABLED, &vf->vf_states);
 		/* Do not notify the client during VF init */
-		if (test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE,
-				       &vf->vf_states))
+		if (!test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE,
+					&vf->vf_states))
 			i40e_notify_client_of_vf_reset(pf, abs_vf_id);
 		vf->num_vlan = 0;
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 08/15] i40e: drop i40e_pf *pf from i40e_vc_disable_vf()
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

It's never used, and the vf structure could get back to the PF if
necessary. Lets just drop the extra unneeded parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 53ead127b293..70a79864177a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -154,12 +154,11 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
 
 /**
  * i40e_vc_disable_vf
- * @pf: pointer to the PF info
  * @vf: pointer to the VF info
  *
  * Disable the VF through a SW reset
  **/
-static inline void i40e_vc_disable_vf(struct i40e_pf *pf, struct i40e_vf *vf)
+static inline void i40e_vc_disable_vf(struct i40e_vf *vf)
 {
 	i40e_vc_notify_vf_reset(vf);
 	i40e_reset_vf(vf, false);
@@ -2918,7 +2917,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
 	}
 
 	/* Force the VF driver stop so it has to reload with new MAC address */
-	i40e_vc_disable_vf(pf, vf);
+	i40e_vc_disable_vf(vf);
 	dev_info(&pf->pdev->dev, "Reload the VF driver to make this change effective.\n");
 
 error_param:
@@ -3013,7 +3012,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
 		 * the right thing by reconfiguring his network correctly
 		 * and then reloading the VF driver.
 		 */
-		i40e_vc_disable_vf(pf, vf);
+		i40e_vc_disable_vf(vf);
 		/* During reset the VF got a new VSI, so refresh the pointer. */
 		vsi = pf->vsi[vf->lan_vsi_idx];
 	}
-- 
2.14.2

^ permalink raw reply related

* [net-next 11/15] i40evf: Enable VF to request an alternate queue allocation
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Alan Brady <alan.brady@intel.com>

Currently the VF gets a default number of allocated queues from HW on
init and it could choose to enable or disable those allocated queues.
This makes it such that the VF can request more or less underlying
allocated queues from the PF.

First the VF negotiates the number of queues it wants that can be
supported by the PF and if successful asks for a reset.  During reset
the PF will reallocate the HW queues for the VF and will then remap the
new queues.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf.h         |  4 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 38 ++++++++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 94 ++++++++++++++++++++--
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 44 +++++++++-
 4 files changed, 173 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 82f69031e5cd..5982362c5643 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -102,6 +102,7 @@ struct i40e_vsi {
 #define I40E_TX_CTXTDESC(R, i) \
 	(&(((struct i40e_tx_context_desc *)((R)->desc))[i]))
 #define MAX_QUEUES 16
+#define I40EVF_MAX_REQ_QUEUES 4
 
 #define I40EVF_HKEY_ARRAY_SIZE ((I40E_VFQF_HKEY_MAX_INDEX + 1) * 4)
 #define I40EVF_HLUT_ARRAY_SIZE ((I40E_VFQF_HLUT_MAX_INDEX + 1) * 4)
@@ -200,6 +201,7 @@ struct i40evf_adapter {
 	struct list_head vlan_filter_list;
 	char misc_vector_name[IFNAMSIZ + 9];
 	int num_active_queues;
+	int num_req_queues;
 
 	/* TX */
 	struct i40e_ring *tx_rings;
@@ -235,6 +237,7 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_PROMISC_ON			BIT(18)
 #define I40EVF_FLAG_ALLMULTI_ON			BIT(19)
 #define I40EVF_FLAG_LEGACY_RX			BIT(20)
+#define I40EVF_FLAG_REINIT_ITR_NEEDED		BIT(21)
 /* duplicates for common code */
 #define I40E_FLAG_DCB_ENABLED			0
 #define I40E_FLAG_RX_CSUM_ENABLED		I40EVF_FLAG_RX_CSUM_ENABLED
@@ -349,6 +352,7 @@ void i40evf_deconfigure_queues(struct i40evf_adapter *adapter);
 void i40evf_enable_queues(struct i40evf_adapter *adapter);
 void i40evf_disable_queues(struct i40evf_adapter *adapter);
 void i40evf_map_queues(struct i40evf_adapter *adapter);
+int i40evf_request_queues(struct i40evf_adapter *adapter, int num);
 void i40evf_add_ether_addrs(struct i40evf_adapter *adapter);
 void i40evf_del_ether_addrs(struct i40evf_adapter *adapter);
 void i40evf_add_vlans(struct i40evf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 65874d6b3ab9..da006fa3fec1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -669,7 +669,7 @@ static void i40evf_get_channels(struct net_device *netdev,
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
 
 	/* Report maximum channels */
-	ch->max_combined = adapter->num_active_queues;
+	ch->max_combined = I40EVF_MAX_REQ_QUEUES;
 
 	ch->max_other = NONQ_VECS;
 	ch->other_count = NONQ_VECS;
@@ -677,6 +677,41 @@ static void i40evf_get_channels(struct net_device *netdev,
 	ch->combined_count = adapter->num_active_queues;
 }
 
+/**
+ * i40evf_set_channels: set the new channel count
+ * @netdev: network interface device structure
+ * @ch: channel information structure
+ *
+ * Negotiate a new number of channels with the PF then do a reset.  During
+ * reset we'll realloc queues and fix the RSS table.  Returns 0 on success,
+ * negative on failure.
+ **/
+static int i40evf_set_channels(struct net_device *netdev,
+			       struct ethtool_channels *ch)
+{
+	struct i40evf_adapter *adapter = netdev_priv(netdev);
+	int num_req = ch->combined_count;
+
+	if (num_req != adapter->num_active_queues &&
+	    !(adapter->vf_res->vf_cap_flags &
+	      VIRTCHNL_VF_OFFLOAD_REQ_QUEUES)) {
+		dev_info(&adapter->pdev->dev, "PF is not capable of queue negotiation.\n");
+		return -EINVAL;
+	}
+
+	/* All of these should have already been checked by ethtool before this
+	 * even gets to us, but just to be sure.
+	 */
+	if (num_req <= 0 || num_req > I40EVF_MAX_REQ_QUEUES)
+		return -EINVAL;
+
+	if (ch->rx_count || ch->tx_count || ch->other_count != NONQ_VECS)
+		return -EINVAL;
+
+	adapter->num_req_queues = num_req;
+	return i40evf_request_queues(adapter, num_req);
+}
+
 /**
  * i40evf_get_rxfh_key_size - get the RSS hash key size
  * @netdev: network interface device structure
@@ -785,6 +820,7 @@ static const struct ethtool_ops i40evf_ethtool_ops = {
 	.get_rxfh		= i40evf_get_rxfh,
 	.set_rxfh		= i40evf_set_rxfh,
 	.get_channels		= i40evf_get_channels,
+	.set_channels		= i40evf_set_channels,
 	.get_rxfh_key_size	= i40evf_get_rxfh_key_size,
 	.get_link_ksettings	= i40evf_get_link_ksettings,
 };
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 69ef6c1d5364..8c513ce84345 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1189,9 +1189,18 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter)
 {
 	int i, num_active_queues;
 
-	num_active_queues = min_t(int,
-				  adapter->vsi_res->num_queue_pairs,
-				  (int)(num_online_cpus()));
+	/* If we're in reset reallocating queues we don't actually know yet for
+	 * certain the PF gave us the number of queues we asked for but we'll
+	 * assume it did.  Once basic reset is finished we'll confirm once we
+	 * start negotiating config with PF.
+	 */
+	if (adapter->num_req_queues)
+		num_active_queues = adapter->num_req_queues;
+	else
+		num_active_queues = min_t(int,
+					  adapter->vsi_res->num_queue_pairs,
+					  (int)(num_online_cpus()));
+
 
 	adapter->tx_rings = kcalloc(num_active_queues,
 				    sizeof(struct i40e_ring), GFP_KERNEL);
@@ -1539,6 +1548,48 @@ static void i40evf_free_rss(struct i40evf_adapter *adapter)
 	adapter->rss_lut = NULL;
 }
 
+/**
+ * i40evf_reinit_interrupt_scheme - Reallocate queues and vectors
+ * @adapter: board private structure
+ *
+ * Returns 0 on success, negative on failure
+ **/
+static int i40evf_reinit_interrupt_scheme(struct i40evf_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	int err;
+
+	if (netif_running(netdev))
+		i40evf_free_traffic_irqs(adapter);
+	i40evf_free_misc_irq(adapter);
+	i40evf_reset_interrupt_capability(adapter);
+	i40evf_free_q_vectors(adapter);
+	i40evf_free_queues(adapter);
+
+	err =  i40evf_init_interrupt_scheme(adapter);
+	if (err)
+		goto err;
+
+	netif_tx_stop_all_queues(netdev);
+
+	err = i40evf_request_misc_irq(adapter);
+	if (err)
+		goto err;
+
+	set_bit(__I40E_VSI_DOWN, adapter->vsi.state);
+
+	err = i40evf_map_rings_to_vectors(adapter);
+	if (err)
+		goto err;
+
+	if (RSS_AQ(adapter))
+		adapter->aq_required |= I40EVF_FLAG_AQ_CONFIGURE_RSS;
+	else
+		err = i40evf_init_rss(adapter);
+err:
+	return err;
+}
+
 /**
  * i40evf_watchdog_timer - Periodic call-back timer
  * @data: pointer to adapter disguised as unsigned long
@@ -1885,8 +1936,15 @@ static void i40evf_reset_task(struct work_struct *work)
 	if (err)
 		dev_info(&adapter->pdev->dev, "Failed to init adminq: %d\n",
 			 err);
+	adapter->aq_required = 0;
 
-	adapter->aq_required = I40EVF_FLAG_AQ_GET_CONFIG;
+	if (adapter->flags & I40EVF_FLAG_REINIT_ITR_NEEDED) {
+		err = i40evf_reinit_interrupt_scheme(adapter);
+		if (err)
+			goto reset_err;
+	}
+
+	adapter->aq_required |= I40EVF_FLAG_AQ_GET_CONFIG;
 	adapter->aq_required |= I40EVF_FLAG_AQ_MAP_VECTORS;
 
 	/* re-add all MAC filters */
@@ -1916,6 +1974,15 @@ static void i40evf_reset_task(struct work_struct *work)
 		if (err)
 			goto reset_err;
 
+		if (adapter->flags & I40EVF_FLAG_REINIT_ITR_NEEDED) {
+			err = i40evf_request_traffic_irqs(adapter,
+							  netdev->name);
+			if (err)
+				goto reset_err;
+
+			adapter->flags &= ~I40EVF_FLAG_REINIT_ITR_NEEDED;
+		}
+
 		i40evf_configure(adapter);
 
 		i40evf_up_complete(adapter);
@@ -2431,9 +2498,9 @@ static int i40evf_check_reset_complete(struct i40e_hw *hw)
 int i40evf_process_config(struct i40evf_adapter *adapter)
 {
 	struct virtchnl_vf_resource *vfres = adapter->vf_res;
+	int i, num_req_queues = adapter->num_req_queues;
 	struct net_device *netdev = adapter->netdev;
 	struct i40e_vsi *vsi = &adapter->vsi;
-	int i;
 	netdev_features_t hw_enc_features;
 	netdev_features_t hw_features;
 
@@ -2447,6 +2514,23 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 		return -ENODEV;
 	}
 
+	if (num_req_queues &&
+	    num_req_queues != adapter->vsi_res->num_queue_pairs) {
+		/* Problem.  The PF gave us fewer queues than what we had
+		 * negotiated in our request.  Need a reset to see if we can't
+		 * get back to a working state.
+		 */
+		dev_err(&adapter->pdev->dev,
+			"Requested %d queues, but PF only gave us %d.\n",
+			num_req_queues,
+			adapter->vsi_res->num_queue_pairs);
+		adapter->flags |= I40EVF_FLAG_REINIT_ITR_NEEDED;
+		adapter->num_req_queues = adapter->vsi_res->num_queue_pairs;
+		i40evf_schedule_reset(adapter);
+		return -ENODEV;
+	}
+	adapter->num_req_queues = 0;
+
 	hw_enc_features = NETIF_F_SG			|
 			  NETIF_F_IP_CSUM		|
 			  NETIF_F_IPV6_CSUM		|
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 2bb0fe00361f..2bb81c39d85f 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -160,7 +160,8 @@ int i40evf_send_vf_config_msg(struct i40evf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_WB_ON_ITR |
 	       VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2 |
 	       VIRTCHNL_VF_OFFLOAD_ENCAP |
-	       VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM;
+	       VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM |
+	       VIRTCHNL_VF_OFFLOAD_REQ_QUEUES;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~I40EVF_FLAG_AQ_GET_CONFIG;
@@ -384,6 +385,32 @@ void i40evf_map_queues(struct i40evf_adapter *adapter)
 	kfree(vimi);
 }
 
+/**
+ * i40evf_request_queues
+ * @adapter: adapter structure
+ * @num: number of requested queues
+ *
+ * We get a default number of queues from the PF.  This enables us to request a
+ * different number.  Returns 0 on success, negative on failure
+ **/
+int i40evf_request_queues(struct i40evf_adapter *adapter, int num)
+{
+	struct virtchnl_vf_res_request vfres;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev, "Cannot request queues, command %d pending\n",
+			adapter->current_op);
+		return -EBUSY;
+	}
+
+	vfres.num_queue_pairs = num;
+
+	adapter->current_op = VIRTCHNL_OP_REQUEST_QUEUES;
+	return i40evf_send_pf_msg(adapter, VIRTCHNL_OP_REQUEST_QUEUES,
+				  (u8 *)&vfres, sizeof(vfres));
+}
+
 /**
  * i40evf_add_ether_addrs
  * @adapter: adapter structure
@@ -1068,6 +1095,21 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 				 "Invalid message %d from PF\n", v_opcode);
 		}
 		break;
+	case VIRTCHNL_OP_REQUEST_QUEUES: {
+		struct virtchnl_vf_res_request *vfres =
+			(struct virtchnl_vf_res_request *)msg;
+		if (vfres->num_queue_pairs == adapter->num_req_queues) {
+			adapter->flags |= I40EVF_FLAG_REINIT_ITR_NEEDED;
+			i40evf_schedule_reset(adapter);
+		} else {
+			dev_info(&adapter->pdev->dev,
+				 "Requested %d queues, PF can support %d\n",
+				 adapter->num_req_queues,
+				 vfres->num_queue_pairs);
+			adapter->num_req_queues = 0;
+		}
+		}
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.14.2

^ permalink raw reply related

* [net-next 15/15] i40e: Stop dropping 802.1ad tags - eth proto 0x88a8
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Scott Peterson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Scott Peterson <scott.d.peterson@intel.com>

Enable i40e to pass traffic with VLAN tags using the 802.1ad ethernet
protocol ID (0x88a8).

This requires NIC firmware providing version 1.7 of the API. With
older NIC firmware 802.1ad tagged packets will continue to be dropped.

No VLAN offloads nor RSS are supported for 802.1ad VLANs.

Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.c       |  6 ++++++
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h   | 17 ++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_common.c       |  6 +++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c         |  7 +++++++
 drivers/net/ethernet/intel/i40e/i40e_type.h         |  6 ++++++
 drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 17 ++++++++++++++++-
 drivers/net/ethernet/intel/i40evf/i40e_type.h       |  6 ++++++
 7 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
index 08f63226105a..9dcb2a961197 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
@@ -613,6 +613,12 @@ i40e_status i40e_init_adminq(struct i40e_hw *hw)
 		hw->flags |= I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE;
 	}
 
+	/* The ability to RX (not drop) 802.1ad frames was added in API 1.7 */
+	if (hw->aq.api_maj_ver > 1 ||
+	    (hw->aq.api_maj_ver == 1 &&
+	     hw->aq.api_min_ver >= 7))
+		hw->flags |= I40E_HW_FLAG_802_1AD_CAPABLE;
+
 	if (hw->aq.api_maj_ver > I40E_FW_API_VERSION_MAJOR) {
 		ret_code = I40E_ERR_FIRMWARE_API_VERSION;
 		goto init_adminq_free_arq;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index ed7bbe14bc6e..4c85ea9cd89a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -775,7 +775,22 @@ struct i40e_aqc_set_switch_config {
 #define I40E_AQ_SET_SWITCH_CFG_PROMISC		0x0001
 #define I40E_AQ_SET_SWITCH_CFG_L2_FILTER	0x0002
 	__le16	valid_flags;
-	u8	reserved[12];
+	/* The ethertype in switch_tag is dropped on ingress and used
+	 * internally by the switch. Set this to zero for the default
+	 * of 0x88a8 (802.1ad). Should be zero for firmware API
+	 * versions lower than 1.7.
+	 */
+	__le16	switch_tag;
+	/* The ethertypes in first_tag and second_tag are used to
+	 * match the outer and inner VLAN tags (respectively) when HW
+	 * double VLAN tagging is enabled via the set port parameters
+	 * AQ command. Otherwise these are both ignored. Set them to
+	 * zero for their defaults of 0x8100 (802.1Q). Should be zero
+	 * for firmware API versions lower than 1.7.
+	 */
+	__le16	first_tag;
+	__le16	second_tag;
+	u8	reserved[6];
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_set_switch_config);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index a4838779de5d..60542beda7ad 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2402,7 +2402,11 @@ enum i40e_status_code i40e_aq_set_switch_config(struct i40e_hw *hw,
 					  i40e_aqc_opc_set_switch_config);
 	scfg->flags = cpu_to_le16(flags);
 	scfg->valid_flags = cpu_to_le16(valid_flags);
-
+	if (hw->flags & I40E_HW_FLAG_802_1AD_CAPABLE) {
+		scfg->switch_tag = cpu_to_le16(hw->switch_tag);
+		scfg->first_tag = cpu_to_le16(hw->first_tag);
+		scfg->second_tag = cpu_to_le16(hw->second_tag);
+	}
 	status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details);
 
 	return status;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 387f0863f794..3f9e89b054ec 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11361,6 +11361,13 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.bus_id = pdev->bus->number;
 	pf->instance = pfs_found;
 
+	/* Select something other than the 802.1ad ethertype for the
+	 * switch to use internally and drop on ingress.
+	 */
+	hw->switch_tag = 0xffff;
+	hw->first_tag = ETH_P_8021AD;
+	hw->second_tag = ETH_P_8021Q;
+
 	INIT_LIST_HEAD(&pf->l3_flex_pit_list);
 	INIT_LIST_HEAD(&pf->l4_flex_pit_list);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 8b0b9f826b7f..4b32b1d38a66 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -610,9 +610,15 @@ struct i40e_hw {
 	struct i40e_dcbx_config desired_dcbx_config; /* CEE Desired Cfg */
 
 #define I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE BIT_ULL(0)
+#define I40E_HW_FLAG_802_1AD_CAPABLE        BIT_ULL(1)
 #define I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE  BIT_ULL(2)
 	u64 flags;
 
+	/* Used in set switch config AQ command */
+	u16 switch_tag;
+	u16 first_tag;
+	u16 second_tag;
+
 	/* debug mask */
 	u32 debug_mask;
 	char err_str[16];
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index eee7ece42b39..ed5602f4bbcd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -771,7 +771,22 @@ struct i40e_aqc_set_switch_config {
 #define I40E_AQ_SET_SWITCH_CFG_PROMISC		0x0001
 #define I40E_AQ_SET_SWITCH_CFG_L2_FILTER	0x0002
 	__le16	valid_flags;
-	u8	reserved[12];
+	/* The ethertype in switch_tag is dropped on ingress and used
+	 * internally by the switch. Set this to zero for the default
+	 * of 0x88a8 (802.1ad). Should be zero for firmware API
+	 * versions lower than 1.7.
+	 */
+	__le16	switch_tag;
+	/* The ethertypes in first_tag and second_tag are used to
+	 * match the outer and inner VLAN tags (respectively) when HW
+	 * double VLAN tagging is enabled via the set port parameters
+	 * AQ command. Otherwise these are both ignored. Set them to
+	 * zero for their defaults of 0x8100 (802.1Q). Should be zero
+	 * for firmware API versions lower than 1.7.
+	 */
+	__le16	first_tag;
+	__le16	second_tag;
+	u8	reserved[6];
 };
 
 I40E_CHECK_CMD_LENGTH(i40e_aqc_set_switch_config);
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 48eacf5e73e4..9364b67fff9c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -568,6 +568,7 @@ struct i40e_hw {
 	/* LLDP/DCBX Status */
 	u16 dcbx_status;
 
+#define I40E_HW_FLAG_802_1AD_CAPABLE        BIT_ULL(1)
 #define I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE  BIT_ULL(2)
 
 	/* DCBX info */
@@ -575,6 +576,11 @@ struct i40e_hw {
 	struct i40e_dcbx_config remote_dcbx_config; /* Peer Cfg */
 	struct i40e_dcbx_config desired_dcbx_config; /* CEE Desired Cfg */
 
+	/* Used in set switch config AQ command */
+	u16 switch_tag;
+	u16 first_tag;
+	u16 second_tag;
+
 	/* debug mask */
 	u32 debug_mask;
 	char err_str[16];
-- 
2.14.2

^ permalink raw reply related

* [net-next 12/15] i40e: make i40evf_map_rings_to_vectors void
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

This function cannot fail, so why is it returning a value? And why are
we checking it? Why shouldn't we just make it void? Why is this commit
message made up of only questions?

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 8c513ce84345..f2f1e754c2ce 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -430,12 +430,11 @@ i40evf_map_vector_to_txq(struct i40evf_adapter *adapter, int v_idx, int t_idx)
  * group the rings as "efficiently" as possible.  You would add new
  * mapping configurations in here.
  **/
-static int i40evf_map_rings_to_vectors(struct i40evf_adapter *adapter)
+static void i40evf_map_rings_to_vectors(struct i40evf_adapter *adapter)
 {
 	int rings_remaining = adapter->num_active_queues;
 	int ridx = 0, vidx = 0;
 	int q_vectors;
-	int err = 0;
 
 	q_vectors = adapter->num_msix_vectors - NONQ_VECS;
 
@@ -451,8 +450,6 @@ static int i40evf_map_rings_to_vectors(struct i40evf_adapter *adapter)
 	}
 
 	adapter->aq_required |= I40EVF_FLAG_AQ_MAP_VECTORS;
-
-	return err;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -1578,9 +1575,7 @@ static int i40evf_reinit_interrupt_scheme(struct i40evf_adapter *adapter)
 
 	set_bit(__I40E_VSI_DOWN, adapter->vsi.state);
 
-	err = i40evf_map_rings_to_vectors(adapter);
-	if (err)
-		goto err;
+	i40evf_map_rings_to_vectors(adapter);
 
 	if (RSS_AQ(adapter))
 		adapter->aq_required |= I40EVF_FLAG_AQ_CONFIGURE_RSS;
-- 
2.14.2

^ permalink raw reply related

* [net-next 07/15] i40e: don't hold spinlock while resetting VF
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When we refactored handling of the PVID in commit 9af52f60b2d9
("i40e: use (add|rm)_vlan_all_mac helper functions when changing PVID")
we introduced a scheduling while atomic regression.

This occurred because we now held the spinlock across a call to
i40e_reset_vf(), which results in a usleep_range() call that triggers
a scheduling while atomic bug. This was rare as it only occurred if the
user configured a VLAN on a VF and also attempted to reconfigure the VF
from the host system with a port VLAN.

We do need to hold the lock while calling i40e_is_vsi_in_vlan(), but we
should not be holding it while we reset the VF.

We'll fix this by introducing a separate helper function
i40e_vsi_has_vlans which checks whether we have a PVID and whether the
VSI has configured VLANs. This helper function will manage its own need
for the mac_filter_hash_lock.

Then, we can move the acquiring of the spinlock until after we reset the
VF, which ensures that we do not sleep while holding the lock.

Using a separate function like this makes the code more clear and is
easier to read than attempting to release and re-acquire the spinlock
when we reset the VF.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 36 +++++++++++++++++++---
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 9e3667fc7f6a..53ead127b293 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2925,6 +2925,34 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
 	return ret;
 }
 
+/**
+ * i40e_vsi_has_vlans - True if VSI has configured VLANs
+ * @vsi: pointer to the vsi
+ *
+ * Check if a VSI has configured any VLANs. False if we have a port VLAN or if
+ * we have no configured VLANs. Do not call while holding the
+ * mac_filter_hash_lock.
+ */
+static bool i40e_vsi_has_vlans(struct i40e_vsi *vsi)
+{
+	bool have_vlans;
+
+	/* If we have a port VLAN, then the VSI cannot have any VLANs
+	 * configured, as all MAC/VLAN filters will be assigned to the PVID.
+	 */
+	if (vsi->info.pvid)
+		return false;
+
+	/* Since we don't have a PVID, we know that if the device is in VLAN
+	 * mode it must be because of a VLAN filter configured on this VSI.
+	 */
+	spin_lock_bh(&vsi->mac_filter_hash_lock);
+	have_vlans = i40e_is_vsi_in_vlan(vsi);
+	spin_unlock_bh(&vsi->mac_filter_hash_lock);
+
+	return have_vlans;
+}
+
 /**
  * i40e_ndo_set_vf_port_vlan
  * @netdev: network interface device structure
@@ -2977,10 +3005,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
 		/* duplicate request, so just return success */
 		goto error_pvid;
 
-	/* Locked once because multiple functions below iterate list */
-	spin_lock_bh(&vsi->mac_filter_hash_lock);
-
-	if (le16_to_cpu(vsi->info.pvid) == 0 && i40e_is_vsi_in_vlan(vsi)) {
+	if (i40e_vsi_has_vlans(vsi)) {
 		dev_err(&pf->pdev->dev,
 			"VF %d has already configured VLAN filters and the administrator is requesting a port VLAN override.\nPlease unload and reload the VF driver for this change to take effect.\n",
 			vf_id);
@@ -2993,6 +3018,9 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
 		vsi = pf->vsi[vf->lan_vsi_idx];
 	}
 
+	/* Locked once because multiple functions below iterate list */
+	spin_lock_bh(&vsi->mac_filter_hash_lock);
+
 	/* Check for condition where there was already a port VLAN ID
 	 * filter set and now it is being deleted by setting it to zero.
 	 * Additionally check for the condition where there was a port
-- 
2.14.2

^ permalink raw reply related

* [net-next 09/15] i40e: make use of i40e_vc_disable_vf
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Replace i40e_vc_notify_vf_reset and i40e_reset_vf with a call to
i40e_vc_disable_vf which does this exact thing. This matches similar
code patterns throughout the driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 70a79864177a..94ee243f110e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -3388,8 +3388,7 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
 		goto out;
 
 	vf->trusted = setting;
-	i40e_vc_notify_vf_reset(vf);
-	i40e_reset_vf(vf, false);
+	i40e_vc_disable_vf(vf);
 	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
 		 vf_id, setting ? "" : "un");
 out:
-- 
2.14.2

^ permalink raw reply related

* [net-next 06/15] i40e: use admin queue for setting LEDs behavior
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

Instead of accessing register directly, use newly added AQC in
order to blink LEDs. Introduce and utilize a new flag to prevent
excessive API version checking.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.c |   6 ++
 drivers/net/ethernet/intel/i40e/i40e_common.c | 115 ++++++++++++++++++++------
 drivers/net/ethernet/intel/i40evf/i40e_type.h |   2 +
 3 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
index ba04988e0598..08f63226105a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
@@ -607,6 +607,12 @@ i40e_status i40e_init_adminq(struct i40e_hw *hw)
 			   &oem_lo);
 	hw->nvm.oem_ver = ((u32)oem_hi << 16) | oem_lo;
 
+	if (hw->mac.type == I40E_MAC_XL710 &&
+	    hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
+	    hw->aq.api_min_ver >= I40E_MINOR_VER_GET_LINK_INFO_XL710) {
+		hw->flags |= I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE;
+	}
+
 	if (hw->aq.api_maj_ver > I40E_FW_API_VERSION_MAJOR) {
 		ret_code = I40E_ERR_FIRMWARE_API_VERSION;
 		goto init_adminq_free_arq;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index fada03799850..a4838779de5d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -4836,6 +4836,74 @@ i40e_status i40e_blink_phy_link_led(struct i40e_hw *hw,
 	return status;
 }
 
+/**
+ * i40e_led_get_reg - read LED register
+ * @hw: pointer to the HW structure
+ * @led_addr: LED register address
+ * @reg_val: read register value
+ **/
+static enum i40e_status_code i40e_led_get_reg(struct i40e_hw *hw, u16 led_addr,
+					      u32 *reg_val)
+{
+	enum i40e_status_code status;
+	u8 phy_addr = 0;
+	u8 port_num;
+	u32 i;
+
+	*reg_val = 0;
+	if (hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE) {
+		status =
+		       i40e_aq_get_phy_register(hw,
+						I40E_AQ_PHY_REG_ACCESS_EXTERNAL,
+						I40E_PHY_COM_REG_PAGE,
+						I40E_PHY_LED_PROV_REG_1,
+						reg_val, NULL);
+	} else {
+		i = rd32(hw, I40E_PFGEN_PORTNUM);
+		port_num = (u8)(i & I40E_PFGEN_PORTNUM_PORT_NUM_MASK);
+		phy_addr = i40e_get_phy_address(hw, port_num);
+		status = i40e_read_phy_register_clause45(hw,
+							 I40E_PHY_COM_REG_PAGE,
+							 led_addr, phy_addr,
+							 (u16 *)reg_val);
+	}
+	return status;
+}
+
+/**
+ * i40e_led_set_reg - write LED register
+ * @hw: pointer to the HW structure
+ * @led_addr: LED register address
+ * @reg_val: register value to write
+ **/
+static enum i40e_status_code i40e_led_set_reg(struct i40e_hw *hw, u16 led_addr,
+					      u32 reg_val)
+{
+	enum i40e_status_code status;
+	u8 phy_addr = 0;
+	u8 port_num;
+	u32 i;
+
+	if (hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE) {
+		status =
+		       i40e_aq_set_phy_register(hw,
+						I40E_AQ_PHY_REG_ACCESS_EXTERNAL,
+						I40E_PHY_COM_REG_PAGE,
+						I40E_PHY_LED_PROV_REG_1,
+						reg_val, NULL);
+	} else {
+		i = rd32(hw, I40E_PFGEN_PORTNUM);
+		port_num = (u8)(i & I40E_PFGEN_PORTNUM_PORT_NUM_MASK);
+		phy_addr = i40e_get_phy_address(hw, port_num);
+		status = i40e_write_phy_register_clause45(hw,
+							  I40E_PHY_COM_REG_PAGE,
+							  led_addr, phy_addr,
+							  (u16)reg_val);
+	}
+
+	return status;
+}
+
 /**
  * i40e_led_get_phy - return current on/off mode
  * @hw: pointer to the hw struct
@@ -4853,7 +4921,19 @@ i40e_status i40e_led_get_phy(struct i40e_hw *hw, u16 *led_addr,
 	u16 temp_addr;
 	u8 port_num;
 	u32 i;
-
+	u32 reg_val_aq;
+
+	if (hw->flags & I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE) {
+		status =
+		      i40e_aq_get_phy_register(hw,
+					       I40E_AQ_PHY_REG_ACCESS_EXTERNAL,
+					       I40E_PHY_COM_REG_PAGE,
+					       I40E_PHY_LED_PROV_REG_1,
+					       &reg_val_aq, NULL);
+		if (status == I40E_SUCCESS)
+			*val = (u16)reg_val_aq;
+		return status;
+	}
 	temp_addr = I40E_PHY_LED_PROV_REG_1;
 	i = rd32(hw, I40E_PFGEN_PORTNUM);
 	port_num = (u8)(i & I40E_PFGEN_PORTNUM_PORT_NUM_MASK);
@@ -4888,51 +4968,38 @@ i40e_status i40e_led_set_phy(struct i40e_hw *hw, bool on,
 			     u16 led_addr, u32 mode)
 {
 	i40e_status status = 0;
-	u16 led_ctl = 0;
-	u16 led_reg = 0;
-	u8 phy_addr = 0;
-	u8 port_num;
-	u32 i;
+	u32 led_ctl = 0;
+	u32 led_reg = 0;
 
-	i = rd32(hw, I40E_PFGEN_PORTNUM);
-	port_num = (u8)(i & I40E_PFGEN_PORTNUM_PORT_NUM_MASK);
-	phy_addr = i40e_get_phy_address(hw, port_num);
-	status = i40e_read_phy_register_clause45(hw, I40E_PHY_COM_REG_PAGE,
-						 led_addr, phy_addr, &led_reg);
+	status = i40e_led_get_reg(hw, led_addr, &led_reg);
 	if (status)
 		return status;
 	led_ctl = led_reg;
 	if (led_reg & I40E_PHY_LED_LINK_MODE_MASK) {
 		led_reg = 0;
-		status = i40e_write_phy_register_clause45(hw,
-							  I40E_PHY_COM_REG_PAGE,
-							  led_addr, phy_addr,
-							  led_reg);
+		status = i40e_led_set_reg(hw, led_addr, led_reg);
 		if (status)
 			return status;
 	}
-	status = i40e_read_phy_register_clause45(hw, I40E_PHY_COM_REG_PAGE,
-						 led_addr, phy_addr, &led_reg);
+	status = i40e_led_get_reg(hw, led_addr, &led_reg);
 	if (status)
 		goto restore_config;
 	if (on)
 		led_reg = I40E_PHY_LED_MANUAL_ON;
 	else
 		led_reg = 0;
-	status = i40e_write_phy_register_clause45(hw, I40E_PHY_COM_REG_PAGE,
-						  led_addr, phy_addr, led_reg);
+
+	status = i40e_led_set_reg(hw, led_addr, led_reg);
 	if (status)
 		goto restore_config;
 	if (mode & I40E_PHY_LED_MODE_ORIG) {
 		led_ctl = (mode & I40E_PHY_LED_MODE_MASK);
-		status = i40e_write_phy_register_clause45(hw,
-						 I40E_PHY_COM_REG_PAGE,
-						 led_addr, phy_addr, led_ctl);
+		status = i40e_led_set_reg(hw, led_addr, led_ctl);
 	}
 	return status;
+
 restore_config:
-	status = i40e_write_phy_register_clause45(hw, I40E_PHY_COM_REG_PAGE,
-						  led_addr, phy_addr, led_ctl);
+	status = i40e_led_set_reg(hw, led_addr, led_ctl);
 	return status;
 }
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index b53584e3d580..48eacf5e73e4 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -568,6 +568,8 @@ struct i40e_hw {
 	/* LLDP/DCBX Status */
 	u16 dcbx_status;
 
+#define I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE  BIT_ULL(2)
+
 	/* DCBX info */
 	struct i40e_dcbx_config local_dcbx_config; /* Oper/Local Cfg */
 	struct i40e_dcbx_config remote_dcbx_config; /* Peer Cfg */
-- 
2.14.2

^ permalink raw reply related

* [net-next 03/15] i40e: Fix a potential NULL pointer dereference
From: Jeff Kirsher @ 2017-10-02 19:48 UTC (permalink / raw)
  To: davem; +Cc: Christophe JAILLET, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20171002194852.71970-1-jeffrey.t.kirsher@intel.com>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

If 'kzalloc()' fails, a NULL pointer will be dereferenced.
Return an error code (-ENOMEM) instead.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index e6b95e1e1a33..9e3667fc7f6a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -423,6 +423,9 @@ static int i40e_config_iwarp_qvlist(struct i40e_vf *vf,
 	       (sizeof(struct virtchnl_iwarp_qv_info) *
 						(qvlist_info->num_vectors - 1));
 	vf->qvlist_info = kzalloc(size, GFP_KERNEL);
+	if (!vf->qvlist_info)
+		return -ENOMEM;
+
 	vf->qvlist_info->num_vectors = qvlist_info->num_vectors;
 
 	msix_vf = pf->hw.func_caps.num_msix_vectors_vf;
-- 
2.14.2

^ permalink raw reply related

* Re: [oss-drivers] Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Simon Horman @ 2017-10-02 20:04 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang,
	Linux Kernel Network Developers, oss-drivers
In-Reply-To: <CALx6S35DpMWdLVWKSoRdsfROM7=A2p9pLi7n2fMtmqwhSEh5wQ@mail.gmail.com>

On Mon, Oct 02, 2017 at 12:36:33PM -0700, Tom Herbert wrote:
> On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
> > Move dissection of tunnel info from the flower classifier to the flow
> > dissector where all other dissection occurs.  This should not have any
> > behavioural affect on other users of the flow dissector.

...

> > +static void
> > +__skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
> > +                              struct flow_dissector *flow_dissector,
> > +                              void *target_container)
> > +{
> > +       struct ip_tunnel_info *info;
> > +       struct ip_tunnel_key *key;
> > +
> > +       /* A quick check to see if there might be something to do. */
> > +       if (!dissector_uses_key(flow_dissector,
> > +                               FLOW_DISSECTOR_KEY_ENC_KEYID) &&
> > +           !dissector_uses_key(flow_dissector,
> > +                               FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) &&
> > +           !dissector_uses_key(flow_dissector,
> > +                               FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) &&
> > +           !dissector_uses_key(flow_dissector,
> > +                               FLOW_DISSECTOR_KEY_ENC_CONTROL) &&
> > +           !dissector_uses_key(flow_dissector,
> > +                               FLOW_DISSECTOR_KEY_ENC_PORTS))
> > +               return;
> 
> This complex conditional is now in the path of every call to flow
> dissector regardless of whether a classifier is enabled or tunnels
> are. What does the assembly show in terms of number of branches? Can
> we at least get this down to one check (might be a use case for
> FLOW_DISSECTOR_F_FLOWER ;-) ), or even better use the static key when
> encap or is enabled?

Hi Tom,

it appears to me (a little to my surprise but I did check before
posting) that the compiler turns the above into a single comparison.

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

^ permalink raw reply

* Re: [PATCH iproute2] iproute: build more easily on Android
From: enh @ 2017-10-02 20:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Lorenzo Colitti, netdev
In-Reply-To: <20171002103611.17ccb01f@xeon-e3>

On Mon, Oct 2, 2017 at 10:36 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Tue,  3 Oct 2017 02:03:37 +0900
> Lorenzo Colitti <lorenzo@google.com> wrote:
>
>> iproute2 contains a bunch of kernel headers, including uapi ones.
>> Android's libc uses uapi headers almost directly, and uses a
>> script to fix kernel types that don't match what userspace
>> expects.
>>
>> For example: https://issuetracker.google.com/36987220 reports
>> that our struct ip_mreq_source contains "__be32 imr_multiaddr"
>> rather than "struct in_addr imr_multiaddr". The script addresses
>> this by replacing the uapi struct definition with a #include
>> <bits/ip_mreq.h> which contains the traditional userspace
>> definition.
>>
>> Unfortunately, when we compile iproute2, this definition
>> conflicts with the one in iproute2's linux/in.h.
>>
>> Historically we've just solved this problem by running "git rm"
>> on all the iproute2 include/linux headers that break Android's
>> libc.  However, deleting the files in this way makes it harder to
>> keep up with upstream, because every upstream change to
>> an include file causes a merge conflict with the delete.
>>
>> This patch fixes the problem by moving the iproute2 linux headers
>> from include/linux to include/uapi/linux.
>>
>> Tested: compiles on ubuntu trusty (glibc)
>>
>> Signed-off-by: Elliott Hughes <enh@google.com>
>> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
>
> Rather than moving everything, why not make kernel headers directory
> configurable as part of the configure script setup process.

the problem is that C libraries with their our own uapi headers still
need your app-specific headers. to build iproute2 we need to put
iproute2's include/ on our include path, but then the fact that your
different uapi headers are *under* that directory causes the conflict.
separating your headers from your copy of the uapi headers is a
necessary part of any fix.

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox