Netdev List

* Re: [PATCH] atmel: using strlcpy() to avoid possible buffer overflows
From: Andy Shevchenko @ 2018-06-29 21:51 UTC (permalink / raw)
  To: YueHaibing
  Cc: simon, Kalle Valo, Linux Kernel Mailing List, netdev,
	open list:TI WILINK WIRELES..., David S. Miller
In-Reply-To: <CAHp75Vet-vTz_Ld=Lturmm8NSpK1etvoj-7Wzn4-h=bM2FQgZg@mail.gmail.com>

On Sat, Jun 30, 2018 at 12:47 AM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Fri, Jun 29, 2018 at 5:51 AM, YueHaibing <yuehaibing@huawei.com> wrote:
>> 'firmware' is a module param which may be longer than firmware_id,
>> so use strlcpy() to guard against overflows
>
> strncat() is against overflow, this does a bit more.
>
>>         priv->firmware_id[0] = '\0';
> ...
>>         if (firmware) /* module parameter */
>> -               strcpy(priv->firmware_id, firmware);
>> +               strlcpy(priv->firmware_id, firmware, sizeof(priv->firmware_id));
>
> In either case the above '\0' is not needed.
> But it looks like the intention was to use strncat() / strlcat().

Ah, this is under a condition, yes. If no parameter is supplied, this needs
to be clean, but priv is allocated with zeroed memory:
https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L8369
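
For illustration, the bounded copy discussed above can be sketched in user
space. This is a minimal sketch, not the driver code: sketch_strlcpy(),
struct sketch_priv and the 8-byte buffer size are assumptions made for the
example, and the helper only imitates the truncating, always-terminating
behaviour of the kernel's strlcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's strlcpy(): copy at most
 * size - 1 bytes and always NUL-terminate when size is non-zero.
 * Returns strlen(src) so callers can detect truncation.
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Illustrative structure; the 8-byte size is an assumption, not the
 * driver's. The canary makes an overflow of firmware_id observable.
 */
struct sketch_priv {
	char firmware_id[8];
	char canary;
};

/* Returns non-zero if the parameter had to be truncated. */
static int sketch_set_firmware(struct sketch_priv *priv, const char *firmware)
{
	assert(priv != NULL);
	if (!firmware)	/* no module parameter supplied */
		return 0;
	return sketch_strlcpy(priv->firmware_id, firmware,
			      sizeof(priv->firmware_id)) >=
	       sizeof(priv->firmware_id);
}
```

With a zeroed struct sketch_priv, an over-long name is cut to seven
characters plus the terminator and the neighbouring field is untouched; a
NULL parameter leaves the zeroed buffer as it is.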

-- 
With Best Regards,
Andy Shevchenko

* [PATCH ipsec-next] xfrm: Allow Set Mark to be Updated Using UPDSA
From: Nathan Harold @ 2018-06-29 22:07 UTC (permalink / raw)
  To: netdev; +Cc: Nathan Harold

Allow UPDSA to change "set mark" to permit
policy separation of packet routing decisions from
SA keying in systems that use mark-based routing.

The set mark, used as a routing and firewall mark
for outbound packets, is made update-able which
allows routing decisions to be handled independently
of keying/SA creation. To maintain consistency with
other optional attributes, the set mark is only
updated if sent with a non-zero value.

The per-SA lock and the xfrm_state_lock are taken in
that order to avoid a deadlock with
xfrm_timer_handler(), which also takes the locks in
that order.

Signed-off-by: Nathan Harold <nharold@google.com>
Change-Id: Ia05c6733a94c1901cd1e54eb7c7e237704678d71
---
 net/xfrm/xfrm_state.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index e04a510ec992..c9ffcdfa89f6 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1562,6 +1562,15 @@ int xfrm_state_update(struct xfrm_state *x)
 		if (x1->curlft.use_time)
 			xfrm_state_check_expire(x1);

+		if (x->props.smark.m || x->props.smark.v) {
+			spin_lock_bh(&net->xfrm.xfrm_state_lock);
+
+			x1->props.smark = x->props.smark;
+
+			__xfrm_state_bump_genids(x1);
+			spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+		}
+
 		err = 0;
 		x->km.state = XFRM_STATE_DEAD;
 		__xfrm_state_put(x);
-- 
2.18.0.399.gad0ab374a1-goog
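
The "only updated if sent with a non-zero value" rule from the commit
message can be sketched in isolation. This is a user-space illustration
under stated assumptions: struct sketch_mark and sketch_update_smark() are
hypothetical names, and the locking done by the real patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the shape of a value/mask mark pair; names are illustrative. */
struct sketch_mark {
	uint32_t v;	/* mark value */
	uint32_t m;	/* mark mask */
};

/* Copy the new mark only if it was actually supplied (non-zero value or
 * mask). Returns 1 if the existing mark was replaced, 0 if left alone.
 * Locking is deliberately omitted from this sketch.
 */
static int sketch_update_smark(struct sketch_mark *cur,
			       const struct sketch_mark *upd)
{
	assert(cur && upd);
	if (!upd->m && !upd->v)
		return 0;
	*cur = *upd;
	return 1;
}
```

One consequence of this convention is that an update cannot clear an
existing mark back to zero, since an all-zero mark is read as "not sent".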

* Re: [PATCH v3 net-next 7/9] net: ipv4: listified version of ip_rcv
From: kbuild test robot @ 2018-06-29 22:08 UTC (permalink / raw)
  To: Edward Cree; +Cc: kbuild-all, davem, netdev
In-Reply-To: <419314b8-1fa0-15e6-794d-5ef8874597a2@solarflare.com>

[-- Attachment #1: Type: text/plain, Size: 2754 bytes --]

Hi Edward,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Edward-Cree/Handle-multiple-received-packets-at-each-stage/20180630-042204
config: i386-randconfig-a0-201825 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   net//ipv4/ip_input.c: In function 'ip_sublist_rcv':
>> net//ipv4/ip_input.c:524:14: warning: passing argument 6 of 'NF_HOOK_LIST' from incompatible pointer type
           head, dev, NULL, ip_rcv_finish);
                 ^
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net//ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: expected 'struct list_head *' but argument is of type 'struct net_device *'
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^
   net//ipv4/ip_input.c:524:25: warning: passing argument 8 of 'NF_HOOK_LIST' from incompatible pointer type
           head, dev, NULL, ip_rcv_finish);
                            ^
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net//ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: expected 'struct net_device *' but argument is of type 'int (*)(struct net *, struct sock *, struct sk_buff *)'
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^
   net//ipv4/ip_input.c:523:2: error: too few arguments to function 'NF_HOOK_LIST'
     NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
     ^
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net//ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: declared here
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^

vim +/NF_HOOK_LIST +524 net//ipv4/ip_input.c

   517	
   518	static void ip_sublist_rcv(struct list_head *head, struct net_device *dev,
   519				   struct net *net)
   520	{
   521		struct sk_buff *skb, *next;
   522	
   523		NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
 > 524			     head, dev, NULL, ip_rcv_finish);
   525		list_for_each_entry_safe(skb, next, head, list)
   526			ip_rcv_finish(net, NULL, skb);
   527	}
   528	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27668 bytes --]

* Anyone know if strongswan works with vrf?
From: Ben Greear @ 2018-06-29 22:10 UTC (permalink / raw)
  To: netdev

Hello,

We're trying to create lots of strongswan VPN tunnels on network devices
bound to different VRFs.  We are using Fedora-24 on the client side, with a 4.16.15+ kernel
and updated 'ip' package, etc.

So far, no luck getting it to work.

Any idea if this is supported or not?

Thanks,
Ben
-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [PATCH] [v2] infiniband: i40iw, nes: don't use wall time for TCP sequence numbers
From: Shiraz Saleem @ 2018-06-29 22:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Latif, Faisal, Doug Ledford, Jason Gunthorpe, David S. Miller,
	Orosco, Henry, Nikolova, Tatyana E, Ismail, Mustafa,
	Bart Van Assche, Yuval Shaia, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20180627132628.915978-1-arnd@arndb.de>

On Wed, Jun 27, 2018 at 07:26:05AM -0600, Arnd Bergmann wrote:
> The nes infiniband driver uses current_kernel_time() to get a nanosecond
> granularity timestamp to initialize its tcp sequence counters. This is
> one of only a few remaining users of that deprecated function, so we
> should try to get rid of it.
> 
> Aside from using a deprecated API, there are several problems I see here:
> 
> - Using a CLOCK_REALTIME based time source makes it predictable in
>   case the time base is synchronized.
> - Using a coarse timestamp means it only gets updated once per jiffy,
>   which avoids having to access the hardware clock source but makes
>   it even more predictable.
> - The upper 2 bits are always zero because the nanoseconds are at most
>   999999999.
> 
> For the Linux TCP implementation, we use secure_tcp_seq(), which appears
> to be appropriate here as well, and solves all the above problems.
> 
> i40iw uses a variant of the same code, so I do the same thing there
> for ipv4. Unlike nes, i40iw also supports ipv6, which needs to call
> secure_tcpv6_seq instead.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: use secure_tcpv6_seq for IPv6 support as suggested by Shiraz Saleem.
> ---

Looks good.

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
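
The point about the upper 2 bits in the quoted description can be checked
with simple arithmetic: the largest nanosecond value, 999999999, is below
2^30 = 1073741824, so bits 31 and 30 of a 32-bit value seeded from it are
never set. A minimal sketch (the sketch_* names are made up for the
example):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_MAX 999999999u	/* largest tv_nsec value */

/* Return the top two bits of a 32-bit value. */
static unsigned int sketch_top2(uint32_t x)
{
	return x >> 30;
}

/* The largest nanosecond value is below 2^30, so bits 31 and 30 of any
 * nanosecond count are clear.
 */
static int sketch_nsec_top_bits_clear(void)
{
	assert(SKETCH_NSEC_MAX < (UINT32_C(1) << 30));
	return sketch_top2(SKETCH_NSEC_MAX) == 0;
}
```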

* Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Cong Wang @ 2018-06-29 22:18 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: Jiri Pirko, David Ahern, Jamal Hadi Salim,
	Linux Kernel Network Developers, David Miller, Jakub Kicinski,
	Simon Horman, john.hurley, mlxsw
In-Reply-To: <b0c7a083-5a15-336a-c30a-8ee785a06684@intel.com>

On Fri, Jun 29, 2018 at 10:06 AM Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> So instead of introducing 'chaintemplate' object in the kernel, can't we add 'chain'
> object in the kernel that takes the 'template' as an attribute?

This is exactly what I mean above. Making the chain a standalone object
in the kernel would have these benefits:

1. Aligns with 'tc chain' in iproute2; adding/deleting an object is natural

2. Template is an attribute of this object when creating it:
# tc chain add template ....
# tc chain add ... # non-template chain

3. Easier for sharing by qdiscs:
# tc chain add X block Y ...
# tc filter add ... chain X block Y ...
# tc qdisc add dev eth0 block Y ...

The current 'ingress_block 22 ingress' syntax is ugly.

* Re: [net-next 01/12] net/mlx5e: Add UDP GSO support
From: Willem de Bruijn @ 2018-06-29 22:19 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David Miller, Network Development, ogerlitz, borisp, yossiku,
	Alexander Duyck
In-Reply-To: <20180628215103.9141-2-saeedm@mellanox.com>

On Fri, Jun 29, 2018 at 2:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> From: Boris Pismenny <borisp@mellanox.com>
>
> This patch enables UDP GSO support. We enable this by using two WQEs:
> the first is a UDP LSO WQE for all segments with equal length, and the
> second is for the last segment in case it has a different length.
> Due to a HW limitation, before sending, we must adjust the packet length fields.
>
> We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
> machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
> We compare single stream UDP, UDP GSO and UDP GSO with offload.
> Performance:
>                 | MSS (bytes)   | Throughput (Gbps)     | CPU utilization (%)
> UDP GSO offload | 1472          | 35.6                  | 8%
> UDP GSO         | 1472          | 25.5                  | 17%
> UDP             | 1472          | 10.2                  | 17%
> UDP GSO offload | 1024          | 35.6                  | 8%
> UDP GSO         | 1024          | 19.2                  | 17%
> UDP             | 1024          | 5.7                   | 17%
> UDP GSO offload | 512           | 33.8                  | 16%
> UDP GSO         | 512           | 10.4                  | 17%
> UDP             | 512           | 3.5                   | 17%

Very nice results :)

> +static void mlx5e_udp_gso_prepare_last_skb(struct sk_buff *skb,
> +                                          struct sk_buff *nskb,
> +                                          int remaining)
> +{
> +       int bytes_needed = remaining, remaining_headlen, remaining_page_offset;
> +       int headlen = skb_transport_offset(skb) + sizeof(struct udphdr);
> +       int payload_len = remaining + sizeof(struct udphdr);
> +       int k = 0, i, j;
> +
> +       skb_copy_bits(skb, 0, nskb->data, headlen);
> +       nskb->dev = skb->dev;
> +       skb_reset_mac_header(nskb);
> +       skb_set_network_header(nskb, skb_network_offset(skb));
> +       skb_set_transport_header(nskb, skb_transport_offset(skb));
> +       skb_set_tail_pointer(nskb, headlen);
> +
> +       /* How many frags do we need? */
> +       for (i = skb_shinfo(skb)->nr_frags - 1; i >= 0; i--) {
> +               bytes_needed -= skb_frag_size(&skb_shinfo(skb)->frags[i]);
> +               k++;
> +               if (bytes_needed <= 0)
> +                       break;
> +       }
> +
> +       /* Fill the first frag and split it if necessary */
> +       j = skb_shinfo(skb)->nr_frags - k;
> +       remaining_page_offset = -bytes_needed;
> +       skb_fill_page_desc(nskb, 0,
> +                          skb_shinfo(skb)->frags[j].page.p,
> +                          skb_shinfo(skb)->frags[j].page_offset + remaining_page_offset,
> +                          skb_shinfo(skb)->frags[j].size - remaining_page_offset);
> +
> +       skb_frag_ref(skb, j);
> +
> +       /* Fill the rest of the frags */
> +       for (i = 1; i < k; i++) {
> +               j = skb_shinfo(skb)->nr_frags - k + i;
> +
> +               skb_fill_page_desc(nskb, i,
> +                                  skb_shinfo(skb)->frags[j].page.p,
> +                                  skb_shinfo(skb)->frags[j].page_offset,
> +                                  skb_shinfo(skb)->frags[j].size);
> +               skb_frag_ref(skb, j);
> +       }
> +       skb_shinfo(nskb)->nr_frags = k;
> +
> +       remaining_headlen = remaining - skb->data_len;
> +
> +       /* headlen contains remaining data? */
> +       if (remaining_headlen > 0)
> +               skb_copy_bits(skb, skb->len - remaining, nskb->data + headlen,
> +                             remaining_headlen);
> +       nskb->len = remaining + headlen;
> +       nskb->data_len =  payload_len - sizeof(struct udphdr) +
> +               max_t(int, 0, remaining_headlen);
> +       nskb->protocol = skb->protocol;
> +       if (nskb->protocol == htons(ETH_P_IP)) {
> +               ip_hdr(nskb)->id = htons(ntohs(ip_hdr(nskb)->id) +
> +                                        skb_shinfo(skb)->gso_segs);
> +               ip_hdr(nskb)->tot_len =
> +                       htons(payload_len + sizeof(struct iphdr));
> +       } else {
> +               ipv6_hdr(nskb)->payload_len = htons(payload_len);
> +       }
> +       udp_hdr(nskb)->len = htons(payload_len);
> +       skb_shinfo(nskb)->gso_size = 0;
> +       nskb->ip_summed = skb->ip_summed;
> +       nskb->csum_start = skb->csum_start;
> +       nskb->csum_offset = skb->csum_offset;
> +       nskb->queue_mapping = skb->queue_mapping;
> +}
> +
> +/* might send skbs and update wqe and pi */
> +struct sk_buff *mlx5e_udp_gso_handle_tx_skb(struct net_device *netdev,
> +                                           struct mlx5e_txqsq *sq,
> +                                           struct sk_buff *skb,
> +                                           struct mlx5e_tx_wqe **wqe,
> +                                           u16 *pi)
> +{
> +       int payload_len = skb_shinfo(skb)->gso_size + sizeof(struct udphdr);
> +       int headlen = skb_transport_offset(skb) + sizeof(struct udphdr);
> +       int remaining = (skb->len - headlen) % skb_shinfo(skb)->gso_size;
> +       struct sk_buff *nskb;
> +
> +       if (skb->protocol == htons(ETH_P_IP))
> +               ip_hdr(skb)->tot_len = htons(payload_len + sizeof(struct iphdr));
> +       else
> +               ipv6_hdr(skb)->payload_len = htons(payload_len);
> +       udp_hdr(skb)->len = htons(payload_len);
> +       if (!remaining)
> +               return skb;
> +
> +       nskb = alloc_skb(max_t(int, headlen, headlen + remaining - skb->data_len), GFP_ATOMIC);
> +       if (unlikely(!nskb)) {
> +               sq->stats->dropped++;
> +               return NULL;
> +       }
> +
> +       mlx5e_udp_gso_prepare_last_skb(skb, nskb, remaining);
> +
> +       skb_shinfo(skb)->gso_segs--;
> +       pskb_trim(skb, skb->len - remaining);
> +       mlx5e_sq_xmit(sq, skb, *wqe, *pi);
> +       mlx5e_sq_fetch_wqe(sq, wqe, pi);
> +       return nskb;
> +}

The device driver seems to be implementing the packet split here
similar to NETIF_F_GSO_PARTIAL. When advertising the right flag, the
stack should be able to do that for you and pass two packets to the
driver.
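
The split arithmetic in the quoted mlx5e_udp_gso_handle_tx_skb() can be
illustrated on plain integers. This is a sketch only: struct sketch_split
and sketch_gso_split() are hypothetical, and payload_len stands for
skb->len - headlen in the quoted code.

```c
#include <assert.h>

struct sketch_split {
	int full_segs;	/* segments carrying exactly gso_size bytes */
	int remaining;	/* payload bytes left for a shorter last segment */
};

/* payload_len is the UDP payload of the whole GSO skb; gso_size is the
 * per-segment payload size.
 */
static struct sketch_split sketch_gso_split(int payload_len, int gso_size)
{
	struct sketch_split s;

	assert(gso_size > 0 && payload_len >= 0);
	s.full_segs = payload_len / gso_size;
	s.remaining = payload_len % gso_size;
	return s;
}
```

For example, 4000 payload bytes with a gso_size of 1472 give two full
segments and a 1056-byte tail, which is the case where the quoted code
builds a second skb; 2944 bytes give two full segments and no tail, so the
original skb is returned unchanged.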

* Re: [PATCH v3 net-next 7/9] net: ipv4: listified version of ip_rcv
From: kbuild test robot @ 2018-06-29 22:44 UTC (permalink / raw)
  To: Edward Cree; +Cc: kbuild-all, davem, netdev
In-Reply-To: <419314b8-1fa0-15e6-794d-5ef8874597a2@solarflare.com>

[-- Attachment #1: Type: text/plain, Size: 2917 bytes --]

Hi Edward,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Edward-Cree/Handle-multiple-received-packets-at-each-stage/20180630-042204
config: x86_64-randconfig-x003-201825 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/ipv4/ip_input.c: In function 'ip_sublist_rcv':
>> net/ipv4/ip_input.c:524:14: error: passing argument 6 of 'NF_HOOK_LIST' from incompatible pointer type [-Werror=incompatible-pointer-types]
           head, dev, NULL, ip_rcv_finish);
                 ^~~
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net/ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: expected 'struct list_head *' but argument is of type 'struct net_device *'
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^~~~~~~~~~~~
   net/ipv4/ip_input.c:524:25: error: passing argument 8 of 'NF_HOOK_LIST' from incompatible pointer type [-Werror=incompatible-pointer-types]
           head, dev, NULL, ip_rcv_finish);
                            ^~~~~~~~~~~~~
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net/ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: expected 'struct net_device *' but argument is of type 'int (*)(struct net *, struct sock *, struct sk_buff *)'
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^~~~~~~~~~~~
>> net/ipv4/ip_input.c:523:2: error: too few arguments to function 'NF_HOOK_LIST'
     NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
     ^~~~~~~~~~~~
   In file included from include/uapi/linux/netfilter_ipv4.h:9:0,
                    from include/linux/netfilter_ipv4.h:7,
                    from net/ipv4/ip_input.c:145:
   include/linux/netfilter.h:387:1: note: declared here
    NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
    ^~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/NF_HOOK_LIST +524 net/ipv4/ip_input.c

   517	
   518	static void ip_sublist_rcv(struct list_head *head, struct net_device *dev,
   519				   struct net *net)
   520	{
   521		struct sk_buff *skb, *next;
   522	
 > 523		NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
 > 524			     head, dev, NULL, ip_rcv_finish);
   525		list_for_each_entry_safe(skb, next, head, list)
   526			ip_rcv_finish(net, NULL, skb);
   527	}
   528	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32495 bytes --]

* Re: [PATCH net-next] openvswitch: kernel datapath clone action
From: Pravin Shelar @ 2018-06-29 22:08 UTC (permalink / raw)
  To: Yifeng Sun; +Cc: Andy Zhou, Linux Kernel Network Developers
In-Reply-To: <1530199208-11687-1-git-send-email-pkusunyifeng@gmail.com>

On Thu, Jun 28, 2018 at 8:20 AM, Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> Add 'clone' action to kernel datapath by using existing functions.
> When actions within clone don't modify the current flow, the flow
> key is not cloned before executing clone actions.
>
> This is a follow up patch for this incomplete work:
> https://patchwork.ozlabs.org/patch/722096/
>
> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  include/uapi/linux/openvswitch.h |  8 +++++
>  net/openvswitch/actions.c        | 33 ++++++++++++++++++
>  net/openvswitch/flow_netlink.c   | 73 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 114 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 863aaba..5de8583 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -625,6 +625,11 @@ struct sample_arg {
>                                       * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>                                       */
>  };
> +
> +#define OVS_CLONE_ATTR_EXEC      0   /* Specify an u32 value. When nonzero,
> +                                     * actions in clone will not change flow
> +                                     * keys. False otherwise.
> +                                     */
>  #endif

This symbol is used only in datapath, so we can move it to kernel
headers from uapi.

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 30a5df2..4444e31 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1057,6 +1057,28 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>                              clone_flow_key);
>  }
>
> +/* When 'last' is true, clone() should always consume the 'skb'.
> + * Otherwise, clone() should keep 'skb' intact regardless what
> + * actions are executed within clone().
> + */
> +static int clone(struct datapath *dp, struct sk_buff *skb,
> +                struct sw_flow_key *key, const struct nlattr *attr,
> +                bool last)
> +{
> +       struct nlattr *actions;
> +       struct nlattr *clone_arg;
> +       int rem = nla_len(attr);
> +       bool clone_flow_key;
> +
> +       /* The first action is always 'OVS_CLONE_ATTR_ARG'. */
> +       clone_arg = nla_data(attr);
> +       clone_flow_key = !nla_get_u32(clone_arg);
> +       actions = nla_next(clone_arg, &rem);
> +

Since OVS_CLONE_ATTR_EXEC means do not clone the key, it can be named
accordingly.

* Re: [PATCH bpf 3/3] bpf: undo prog rejection on read-only lock failure
From: Daniel Borkmann @ 2018-06-29 23:47 UTC (permalink / raw)
  To: Kees Cook; +Cc: Alexei Starovoitov, Network Development, Laura Abbott
In-Reply-To: <CAGXu5j+SAurTmBVgtABjOBmbHUczjWbV=DwmTc9xQseTthVwtw@mail.gmail.com>

On 06/29/2018 08:42 PM, Kees Cook wrote:
> On Thu, Jun 28, 2018 at 2:34 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> Kees suggested that if set_memory_*() can fail, we should annotate it with
>> __must_check, and all callers need to deal with it gracefully given those
>> set_memory_*() markings aren't "advisory", but they're expected to actually
>> do what they say. This might be an option worth pursuing in the future,
>> but it would at the same time require that set_memory_*() calls from supporting
>> archs are guaranteed to be "atomic" in that they provide rollback if part
>> of the range fails. Once that has happened, the transition from RW -> RO could
>> be made more robust that way, while the subsequent RO -> RW transition /must/
>> continue guaranteeing that the undo part always succeeds.
> 
> Does this mean we can have BPF filters that aren't read-only then?
> What's the situation where set_memory_ro() fails? (Can it be induced
> by the user?)

My understanding is that cpa_process_alias() would attempt to also change
attributes of physmap ranges, and it found that a large page had to be split
for this but failed in doing so, thus attributes couldn't be updated there due
to a page alloc error. The primary mapping, which is directly the addr passed
to set_memory_ro(), was however set to read-only despite the error. While for
reproduction I had a toggle on the alloc_pages() in split_large_page() to have
it fail, I could only trigger it occasionally; I used the selftest suite in a
loop to stress test and it hit about once or twice over hours.
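
The "__must_check plus rollback" idea from the quoted text can be sketched
in user space. Everything here is an assumption for illustration: the
sketch_* names, the simulated page array and the injected failure are made
up, and this is not how set_memory_ro() is implemented on any
architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler attribute that makes ignoring the return value a warning. */
#define SKETCH_MUST_CHECK __attribute__((warn_unused_result))

enum sketch_prot { SKETCH_RW, SKETCH_RO };

/* Simulated per-page protection state; fail_at injects a failure at that
 * page index (use a value >= npages for no failure).
 */
struct sketch_region {
	enum sketch_prot prot[8];
	size_t npages;
	size_t fail_at;
};

/* All-or-nothing RW -> RO transition: on failure, pages already switched
 * are restored to RW before the error is returned.
 */
static SKETCH_MUST_CHECK int sketch_set_ro(struct sketch_region *r)
{
	size_t i;

	assert(r->npages <= 8);
	for (i = 0; i < r->npages; i++) {
		if (i == r->fail_at) {
			while (i--)
				r->prot[i] = SKETCH_RW;	/* roll back */
			return -1;
		}
		r->prot[i] = SKETCH_RO;
	}
	return 0;
}
```

A caller that sees the error can rely on the whole range still being RW,
which is the property the quoted paragraph asks for before callers are
required to handle failures.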

Thanks,
Daniel

* Re: [PATCH v2 net-next 0/2] net: preserve sock reference when scrubbing the skb.
From: Cong Wang @ 2018-06-30  0:15 UTC (permalink / raw)
  To: David Miller
  Cc: Flavio Leitner, Linux Kernel Network Developers, Eric Dumazet,
	Paolo Abeni, Florian Westphal, NetFilter
In-Reply-To: <20180629.112235.970317820691624358.davem@davemloft.net>

On Thu, Jun 28, 2018 at 7:22 PM David Miller <davem@davemloft.net> wrote:
>
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Thu, 28 Jun 2018 14:53:09 -0700
>
> > I will send a revert with quote of the above.
>
> And it will go to /dev/null as far as I am concerned.  I read it the
> first time, so posting it again will not change my opinion of what you
> have to say.

David, you claim you read it, now tell me, where is "cgroups" or "cpu"
from?

This is the link of my reply you quoted:
https://marc.info/?l=linux-netdev&m=153013948711582&w=2

I did mention cgroups to Eric because of isolation, the softnet_data
is per-CPU, and CPU is not isolated by netns apparently, therefore
sd->input_pkt_queue can't be totally isolated for netns without cpuset.

But this is never the reason why I dislike it, this is why I never even
mentioned it in the link above.

>
> Cong, you really need to calm down and understand that people perhaps
> simply fundamentally disagree with you.

1. Eric's "forwarding to eth0" is missing, never brought up until his
private reply. Without this information, XPS makes no sense at all in
this patchset. For the record, I provided a different solution for Eric.

2. No one has responded to:
https://marc.info/?l=linux-netdev&m=153013948711582&w=2
I never expected you to agree with me on all of them, but no one
has given me any response to my concerns.

3. I will write a blog post to draw you some pictures, since it
is so hard to understand the isolation...

* [PATCH net-next 0/9] nfp: flower updates and netconsole
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This set contains assorted updates to driver base and flower.
First patch is a follow up to a fix to calculating counters which
went into net.  For ethtool counters we should also make sure
they are visible even after ring reconfiguration.  Next patch
is a safety measure: in case we are running on a machine with a
broken BIOS, we should fail the probe when the risk of incorrect
operation is too high.  The next two patches add netpoll support
and make use of napi_consume_skb().  Last we populate bus info
on all representors.

Pieter adds support for offload of the checksum action in flower.

John follows up on another fix he's done in net: we set TTL
values on tunnels to the stack's default; now John does a full
route lookup to get more precise information, and he populates
the ToS field as well.  Last but not least he follows up on Jiri's
request to enable LAG offload in case the team driver is used
and the hash function is unknown.

Jakub Kicinski (5):
  nfp: expose ring stats of inactive rings via ethtool
  nfp: fail probe if serial or interface id is missing
  nfp: implement netpoll ndo (thus enabling netconsole)
  nfp: make use of napi_consume_skb()
  nfp: populate bus-info on representors

John Hurley (3):
  nfp: flower: extract ipv4 udp tunnel ttl from route
  nfp: flower: offload tos and tunnel flags for ipv4 udp tunnels
  nfp: flower: enabled offloading of Team LAG

Pieter Jansen van Vuuren (1):
  nfp: flower: ignore checksum actions when performing pedit actions

 .../ethernet/netronome/nfp/flower/action.c    | 108 ++++++++++++++++--
 .../net/ethernet/netronome/nfp/flower/cmsg.h  |   4 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |   5 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |  29 ++++-
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  58 ++++------
 .../netronome/nfp/nfpcore/nfp6000_pcie.c      |  16 ++-
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |   4 +-
 .../netronome/nfp/nfpcore/nfp_cppcore.c       |  22 +++-
 8 files changed, 178 insertions(+), 68 deletions(-)

-- 
2.17.1

* [PATCH net-next 1/9] nfp: expose ring stats of inactive rings via ethtool
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

After the user changes the ring count, statistics for deactivated
rings disappear from ethtool -S output.  This causes loss of
information to the user and means that ethtool stats may not
add up to interface stats.  Always expose counters from all
the rings.  Note that we allocate at most num_possible_cpus()
rings so the number of rings should be reasonable.

The alternative of only listing stats for rings which were
ever in use could be confusing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  | 50 +++++++------------
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 26d1cc4e2906..2aeb4622f1ea 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -452,7 +452,7 @@ static unsigned int nfp_vnic_get_sw_stats_count(struct net_device *netdev)
 {
 	struct nfp_net *nn = netdev_priv(netdev);
 
-	return NN_RVEC_GATHER_STATS + nn->dp.num_r_vecs * NN_RVEC_PER_Q_STATS;
+	return NN_RVEC_GATHER_STATS + nn->max_r_vecs * NN_RVEC_PER_Q_STATS;
 }
 
 static u8 *nfp_vnic_get_sw_stats_strings(struct net_device *netdev, u8 *data)
@@ -460,7 +460,7 @@ static u8 *nfp_vnic_get_sw_stats_strings(struct net_device *netdev, u8 *data)
 	struct nfp_net *nn = netdev_priv(netdev);
 	int i;
 
-	for (i = 0; i < nn->dp.num_r_vecs; i++) {
+	for (i = 0; i < nn->max_r_vecs; i++) {
 		data = nfp_pr_et(data, "rvec_%u_rx_pkts", i);
 		data = nfp_pr_et(data, "rvec_%u_tx_pkts", i);
 		data = nfp_pr_et(data, "rvec_%u_tx_busy", i);
@@ -486,7 +486,7 @@ static u64 *nfp_vnic_get_sw_stats(struct net_device *netdev, u64 *data)
 	u64 tmp[NN_RVEC_GATHER_STATS];
 	unsigned int i, j;
 
-	for (i = 0; i < nn->dp.num_r_vecs; i++) {
+	for (i = 0; i < nn->max_r_vecs; i++) {
 		unsigned int start;
 
 		do {
@@ -521,15 +521,13 @@ static u64 *nfp_vnic_get_sw_stats(struct net_device *netdev, u64 *data)
 	return data;
 }
 
-static unsigned int
-nfp_vnic_get_hw_stats_count(unsigned int rx_rings, unsigned int tx_rings)
+static unsigned int nfp_vnic_get_hw_stats_count(unsigned int num_vecs)
 {
-	return NN_ET_GLOBAL_STATS_LEN + (rx_rings + tx_rings) * 2;
+	return NN_ET_GLOBAL_STATS_LEN + num_vecs * 4;
 }
 
 static u8 *
-nfp_vnic_get_hw_stats_strings(u8 *data, unsigned int rx_rings,
-			      unsigned int tx_rings, bool repr)
+nfp_vnic_get_hw_stats_strings(u8 *data, unsigned int num_vecs, bool repr)
 {
 	int swap_off, i;
 
@@ -549,36 +547,29 @@ nfp_vnic_get_hw_stats_strings(u8 *data, unsigned int rx_rings,
 	for (i = NN_ET_SWITCH_STATS_LEN * 2; i < NN_ET_GLOBAL_STATS_LEN; i++)
 		data = nfp_pr_et(data, nfp_net_et_stats[i].name);
 
-	for (i = 0; i < tx_rings; i++) {
-		data = nfp_pr_et(data, "txq_%u_pkts", i);
-		data = nfp_pr_et(data, "txq_%u_bytes", i);
-	}
-
-	for (i = 0; i < rx_rings; i++) {
+	for (i = 0; i < num_vecs; i++) {
 		data = nfp_pr_et(data, "rxq_%u_pkts", i);
 		data = nfp_pr_et(data, "rxq_%u_bytes", i);
+		data = nfp_pr_et(data, "txq_%u_pkts", i);
+		data = nfp_pr_et(data, "txq_%u_bytes", i);
 	}
 
 	return data;
 }
 
 static u64 *
-nfp_vnic_get_hw_stats(u64 *data, u8 __iomem *mem,
-		      unsigned int rx_rings, unsigned int tx_rings)
+nfp_vnic_get_hw_stats(u64 *data, u8 __iomem *mem, unsigned int num_vecs)
 {
 	unsigned int i;
 
 	for (i = 0; i < NN_ET_GLOBAL_STATS_LEN; i++)
 		*data++ = readq(mem + nfp_net_et_stats[i].off);
 
-	for (i = 0; i < tx_rings; i++) {
-		*data++ = readq(mem + NFP_NET_CFG_TXR_STATS(i));
-		*data++ = readq(mem + NFP_NET_CFG_TXR_STATS(i) + 8);
-	}
-
-	for (i = 0; i < rx_rings; i++) {
+	for (i = 0; i < num_vecs; i++) {
 		*data++ = readq(mem + NFP_NET_CFG_RXR_STATS(i));
 		*data++ = readq(mem + NFP_NET_CFG_RXR_STATS(i) + 8);
+		*data++ = readq(mem + NFP_NET_CFG_TXR_STATS(i));
+		*data++ = readq(mem + NFP_NET_CFG_TXR_STATS(i) + 8);
 	}
 
 	return data;
@@ -633,8 +624,7 @@ static void nfp_net_get_strings(struct net_device *netdev,
 	switch (stringset) {
 	case ETH_SS_STATS:
 		data = nfp_vnic_get_sw_stats_strings(netdev, data);
-		data = nfp_vnic_get_hw_stats_strings(data, nn->dp.num_rx_rings,
-						     nn->dp.num_tx_rings,
+		data = nfp_vnic_get_hw_stats_strings(data, nn->max_r_vecs,
 						     false);
 		data = nfp_mac_get_stats_strings(netdev, data);
 		data = nfp_app_port_get_stats_strings(nn->port, data);
@@ -649,8 +639,7 @@ nfp_net_get_stats(struct net_device *netdev, struct ethtool_stats *stats,
 	struct nfp_net *nn = netdev_priv(netdev);
 
 	data = nfp_vnic_get_sw_stats(netdev, data);
-	data = nfp_vnic_get_hw_stats(data, nn->dp.ctrl_bar,
-				     nn->dp.num_rx_rings, nn->dp.num_tx_rings);
+	data = nfp_vnic_get_hw_stats(data, nn->dp.ctrl_bar, nn->max_r_vecs);
 	data = nfp_mac_get_stats(netdev, data);
 	data = nfp_app_port_get_stats(nn->port, data);
 }
@@ -662,8 +651,7 @@ static int nfp_net_get_sset_count(struct net_device *netdev, int sset)
 	switch (sset) {
 	case ETH_SS_STATS:
 		return nfp_vnic_get_sw_stats_count(netdev) +
-		       nfp_vnic_get_hw_stats_count(nn->dp.num_rx_rings,
-						   nn->dp.num_tx_rings) +
+		       nfp_vnic_get_hw_stats_count(nn->max_r_vecs) +
 		       nfp_mac_get_stats_count(netdev) +
 		       nfp_app_port_get_stats_count(nn->port);
 	default:
@@ -679,7 +667,7 @@ static void nfp_port_get_strings(struct net_device *netdev,
 	switch (stringset) {
 	case ETH_SS_STATS:
 		if (nfp_port_is_vnic(port))
-			data = nfp_vnic_get_hw_stats_strings(data, 0, 0, true);
+			data = nfp_vnic_get_hw_stats_strings(data, 0, true);
 		else
 			data = nfp_mac_get_stats_strings(netdev, data);
 		data = nfp_app_port_get_stats_strings(port, data);
@@ -694,7 +682,7 @@ nfp_port_get_stats(struct net_device *netdev, struct ethtool_stats *stats,
 	struct nfp_port *port = nfp_port_from_netdev(netdev);
 
 	if (nfp_port_is_vnic(port))
-		data = nfp_vnic_get_hw_stats(data, port->vnic, 0, 0);
+		data = nfp_vnic_get_hw_stats(data, port->vnic, 0);
 	else
 		data = nfp_mac_get_stats(netdev, data);
 	data = nfp_app_port_get_stats(port, data);
@@ -708,7 +696,7 @@ static int nfp_port_get_sset_count(struct net_device *netdev, int sset)
 	switch (sset) {
 	case ETH_SS_STATS:
 		if (nfp_port_is_vnic(port))
-			count = nfp_vnic_get_hw_stats_count(0, 0);
+			count = nfp_vnic_get_hw_stats_count(0);
 		else
 			count = nfp_mac_get_stats_count(netdev);
 		count += nfp_app_port_get_stats_count(port);
-- 
2.17.1

^ permalink raw reply related
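
The hardware-stats layout the diff above moves to can be sketched in
user space roughly as follows. This is only an illustration, not driver
code: SKETCH_GLOBAL_STATS_LEN is a stand-in for NN_ET_GLOBAL_STATS_LEN
(its real value is not shown here), and sketch_hw_stats_count() and
sketch_hw_stat_index() are hypothetical helper names. The point is that
each vector now contributes four consecutive counters (rx pkts, rx bytes,
tx pkts, tx bytes), so the count depends on a single num_vecs argument.

```c
#include <assert.h>

/* Stand-in for NN_ET_GLOBAL_STATS_LEN; the real value lives in the driver. */
#define SKETCH_GLOBAL_STATS_LEN 10u

/* Per-vector counters, in the order the patched loops emit them. */
enum sketch_vec_stat {
	SKETCH_RXQ_PKTS,
	SKETCH_RXQ_BYTES,
	SKETCH_TXQ_PKTS,
	SKETCH_TXQ_BYTES,
	SKETCH_STATS_PER_VEC	/* number of counters per vector: 4 */
};

/* Mirrors the patched count helper: global counters + 4 per vector. */
static unsigned int sketch_hw_stats_count(unsigned int num_vecs)
{
	return SKETCH_GLOBAL_STATS_LEN + num_vecs * SKETCH_STATS_PER_VEC;
}

/* Index of one per-vector counter within the hardware stats array. */
static unsigned int sketch_hw_stat_index(unsigned int num_vecs,
					 unsigned int vec,
					 enum sketch_vec_stat stat)
{
	assert(vec < num_vecs && stat < SKETCH_STATS_PER_VEC);
	return SKETCH_GLOBAL_STATS_LEN + vec * SKETCH_STATS_PER_VEC + stat;
}
```

Because the callers in the diff pass nn->max_r_vecs rather than the
currently configured ring counts, the string table and the value array
keep the same length for the lifetime of the netdev; that reading is
inferred from the diff, not stated in it.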

* [PATCH net-next 2/9] nfp: fail probe if serial or interface id is missing
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

On some platforms with broken ACPI tables we may not have access
to the Serial Number PCIe capability.  This capability is crucial
for us for switchdev operation as we use serial number as switch ID,
and for communication with management FW where interface ID is used.

If we can't determine the Serial Number we have to fail device probe.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../netronome/nfp/nfpcore/nfp6000_pcie.c      | 16 +++++++++-----
 .../ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  4 ++--
 .../netronome/nfp/nfpcore/nfp_cppcore.c       | 22 ++++++++++++++-----
 3 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 749655c329b2..c8d0b1016a64 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -1248,7 +1248,7 @@ static void nfp6000_free(struct nfp_cpp *cpp)
 	kfree(nfp);
 }
 
-static void nfp6000_read_serial(struct device *dev, u8 *serial)
+static int nfp6000_read_serial(struct device *dev, u8 *serial)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	int pos;
@@ -1256,25 +1256,29 @@ static void nfp6000_read_serial(struct device *dev, u8 *serial)
 
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DSN);
 	if (!pos) {
-		memset(serial, 0, NFP_SERIAL_LEN);
-		return;
+		dev_err(dev, "can't find PCIe Serial Number Capability\n");
+		return -EINVAL;
 	}
 
 	pci_read_config_dword(pdev, pos + 4, &reg);
 	put_unaligned_be16(reg >> 16, serial + 4);
 	pci_read_config_dword(pdev, pos + 8, &reg);
 	put_unaligned_be32(reg, serial);
+
+	return 0;
 }
 
-static u16 nfp6000_get_interface(struct device *dev)
+static int nfp6000_get_interface(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	int pos;
 	u32 reg;
 
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DSN);
-	if (!pos)
-		return NFP_CPP_INTERFACE(NFP_CPP_INTERFACE_TYPE_PCI, 0, 0xff);
+	if (!pos) {
+		dev_err(dev, "can't find PCIe Serial Number Capability\n");
+		return -EINVAL;
+	}
 
 	pci_read_config_dword(pdev, pos + 4, &reg);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index b0da3d436850..c338d539fa96 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -364,8 +364,8 @@ struct nfp_cpp_operations {
 	int (*init)(struct nfp_cpp *cpp);
 	void (*free)(struct nfp_cpp *cpp);
 
-	void (*read_serial)(struct device *dev, u8 *serial);
-	u16 (*get_interface)(struct device *dev);
+	int (*read_serial)(struct device *dev, u8 *serial);
+	int (*get_interface)(struct device *dev);
 
 	int (*area_init)(struct nfp_cpp_area *area,
 			 u32 dest, unsigned long long address,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index ef30597aa319..73de57a09800 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -1163,10 +1163,10 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations *ops,
 {
 	const u32 arm = NFP_CPP_ID(NFP_CPP_TARGET_ARM, NFP_CPP_ACTION_RW, 0);
 	struct nfp_cpp *cpp;
+	int ifc, err;
 	u32 mask[2];
 	u32 xpbaddr;
 	size_t tgt;
-	int err;
 
 	cpp = kzalloc(sizeof(*cpp), GFP_KERNEL);
 	if (!cpp) {
@@ -1176,9 +1176,19 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations *ops,
 
 	cpp->op = ops;
 	cpp->priv = priv;
-	cpp->interface = ops->get_interface(parent);
-	if (ops->read_serial)
-		ops->read_serial(parent, cpp->serial);
+
+	ifc = ops->get_interface(parent);
+	if (ifc < 0) {
+		err = ifc;
+		goto err_free_cpp;
+	}
+	cpp->interface = ifc;
+	if (ops->read_serial) {
+		err = ops->read_serial(parent, cpp->serial);
+		if (err)
+			goto err_free_cpp;
+	}
+
 	rwlock_init(&cpp->resource_lock);
 	init_waitqueue_head(&cpp->waitq);
 	lockdep_set_class(&cpp->resource_lock, &nfp_cpp_resource_lock_key);
@@ -1191,7 +1201,7 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations *ops,
 	err = device_register(&cpp->dev);
 	if (err < 0) {
 		put_device(&cpp->dev);
-		goto err_dev;
+		goto err_free_cpp;
 	}
 
 	dev_set_drvdata(&cpp->dev, cpp);
@@ -1238,7 +1248,7 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations *ops,
 
 err_out:
 	device_unregister(&cpp->dev);
-err_dev:
+err_free_cpp:
 	kfree(cpp);
 err_malloc:
 	return ERR_PTR(err);
-- 
2.17.1

^ permalink raw reply related
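
The return-type change in patch 2/9 above (u16 to int for
get_interface) follows a common pattern: a wider signed return carries
either the 16-bit value or a negative errno, and the caller checks the
sign before narrowing. A minimal user-space sketch, assuming a
hypothetical struct sketch_dev in place of the PCI device and made-up
function names:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; has_dsn models whether the
 * PCIe Serial Number capability was found.
 */
struct sketch_dev {
	int has_dsn;
	uint16_t interface_id;
};

/* Returns the 16-bit interface ID (0..0xffff) or a negative errno. */
static int sketch_get_interface(const struct sketch_dev *dev)
{
	if (!dev->has_dsn)
		return -EINVAL;
	return dev->interface_id;
}

/* Caller pattern from the patch: check the sign before narrowing. */
static int sketch_probe(const struct sketch_dev *dev, uint16_t *interface)
{
	int ifc;

	assert(dev && interface);

	ifc = sketch_get_interface(dev);
	if (ifc < 0)
		return ifc;	/* propagate the error, leave *interface alone */
	*interface = (uint16_t)ifc;
	return 0;
}
```

Since every valid ID fits in 0..0xffff, a negative int can never be
confused with a real interface value.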

* [PATCH net-next 3/9] nfp: implement netpoll ndo (thus enabling netconsole)
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

NFP NAPI handling will only complete the TXed packets when called
with budget of 0, so implement ndo_poll_controller by scheduling NAPI
on all TX queues.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../ethernet/netronome/nfp/nfp_net_common.c    | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index d4c27f849f9b..edc6ef682f6d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3115,6 +3115,21 @@ nfp_net_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
 	return nfp_net_reconfig_mbox(nn, NFP_NET_CFG_MBOX_CMD_CTAG_FILTER_KILL);
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void nfp_net_netpoll(struct net_device *netdev)
+{
+	struct nfp_net *nn = netdev_priv(netdev);
+	int i;
+
+	/* nfp_net's NAPIs are statically allocated so even if there is a race
+	 * with reconfig path this will simply try to schedule some disabled
+	 * NAPI instances.
+	 */
+	for (i = 0; i < nn->dp.num_stack_tx_rings; i++)
+		napi_schedule_irqoff(&nn->r_vecs[i].napi);
+}
+#endif
+
 static void nfp_net_stat64(struct net_device *netdev,
 			   struct rtnl_link_stats64 *stats)
 {
@@ -3482,6 +3497,9 @@ const struct net_device_ops nfp_net_netdev_ops = {
 	.ndo_get_stats64	= nfp_net_stat64,
 	.ndo_vlan_rx_add_vid	= nfp_net_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= nfp_net_vlan_rx_kill_vid,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller	= nfp_net_netpoll,
+#endif
 	.ndo_set_vf_mac         = nfp_app_set_vf_mac,
 	.ndo_set_vf_vlan        = nfp_app_set_vf_vlan,
 	.ndo_set_vf_spoofchk    = nfp_app_set_vf_spoofchk,
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 4/9] nfp: make use of napi_consume_skb()
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

Use napi_consume_skb() in nfp_net_tx_complete() to get bulk free.
Pass 0 as budget for ctrl queue completion since it runs out of
a tasklet.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index edc6ef682f6d..7df5ca37bfb8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -945,11 +945,12 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
 
 /**
  * nfp_net_tx_complete() - Handled completed TX packets
- * @tx_ring:   TX ring structure
+ * @tx_ring:	TX ring structure
+ * @budget:	NAPI budget (only used as bool to determine if in NAPI context)
  *
  * Return: Number of completed TX descriptors
  */
-static void nfp_net_tx_complete(struct nfp_net_tx_ring *tx_ring)
+static void nfp_net_tx_complete(struct nfp_net_tx_ring *tx_ring, int budget)
 {
 	struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
 	struct nfp_net_dp *dp = &r_vec->nfp_net->dp;
@@ -999,7 +1000,7 @@ static void nfp_net_tx_complete(struct nfp_net_tx_ring *tx_ring)
 
 		/* check for last gather fragment */
 		if (fidx == nr_frags - 1)
-			dev_consume_skb_any(skb);
+			napi_consume_skb(skb, budget);
 
 		tx_ring->txbufs[idx].dma_addr = 0;
 		tx_ring->txbufs[idx].skb = NULL;
@@ -1828,7 +1829,7 @@ static int nfp_net_poll(struct napi_struct *napi, int budget)
 	unsigned int pkts_polled = 0;
 
 	if (r_vec->tx_ring)
-		nfp_net_tx_complete(r_vec->tx_ring);
+		nfp_net_tx_complete(r_vec->tx_ring, budget);
 	if (r_vec->rx_ring)
 		pkts_polled = nfp_net_rx(r_vec->rx_ring, budget);
 
@@ -2062,7 +2063,7 @@ static void nfp_ctrl_poll(unsigned long arg)
 	struct nfp_net_r_vector *r_vec = (void *)arg;
 
 	spin_lock_bh(&r_vec->lock);
-	nfp_net_tx_complete(r_vec->tx_ring);
+	nfp_net_tx_complete(r_vec->tx_ring, 0);
 	__nfp_ctrl_tx_queued(r_vec);
 	spin_unlock_bh(&r_vec->lock);
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 5/9] nfp: populate bus-info on representors
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

We used to leave bus-info in ethtool driver info empty for
representors in case multi-PCIe-to-single-host cards make
the association between PCIe device and NFP many to one.
It seems these attempts are futile, we need to link the
representors to one PCIe device in sysfs to get consistent
naming, plus devlink uses one PCIe as a handle, anyway.
The multi-PCIe-to-single-system support won't be clean,
if it ever comes.

Turns out some user space (RHEL tests) likes to read bus-info
so just populate it.

While at it remove unnecessary app NULL-check, representors
are spawned by an app, so it must exist.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 2aeb4622f1ea..6a79c8e4a7a4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -233,12 +233,10 @@ nfp_net_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
 static void
 nfp_app_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
 {
-	struct nfp_app *app;
-
-	app = nfp_app_from_netdev(netdev);
-	if (!app)
-		return;
+	struct nfp_app *app = nfp_app_from_netdev(netdev);

+	strlcpy(drvinfo->bus_info, pci_name(app->pdev),
+		sizeof(drvinfo->bus_info));
 	nfp_get_drvinfo(app, app->pdev, "*", drvinfo);
 }

-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 6/9] nfp: flower: ignore checksum actions when performing pedit actions
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Pieter Jansen van Vuuren
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Hardware will automatically update csum in headers when a set action has
been performed. This means we could in the driver ignore the explicit
checksum action when performing a set action.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 .../ethernet/netronome/nfp/flower/action.c    | 80 +++++++++++++++++--
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 4a6d2db75071..61ba8d4f99f1 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -34,6 +34,7 @@
 #include <linux/bitfield.h>
 #include <net/pkt_cls.h>
 #include <net/switchdev.h>
+#include <net/tc_act/tc_csum.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
 #include <net/tc_act/tc_pedit.h>
@@ -398,8 +399,27 @@ nfp_fl_set_tport(const struct tc_action *action, int idx, u32 off,
 	return 0;
 }
 
+static u32 nfp_fl_csum_l4_to_flag(u8 ip_proto)
+{
+	switch (ip_proto) {
+	case 0:
+		/* Filter doesn't force proto match,
+		 * both TCP and UDP will be updated if encountered
+		 */
+		return TCA_CSUM_UPDATE_FLAG_TCP | TCA_CSUM_UPDATE_FLAG_UDP;
+	case IPPROTO_TCP:
+		return TCA_CSUM_UPDATE_FLAG_TCP;
+	case IPPROTO_UDP:
+		return TCA_CSUM_UPDATE_FLAG_UDP;
+	default:
+		/* All other protocols will be ignored by FW */
+		return 0;
+	}
+}
+
 static int
-nfp_fl_pedit(const struct tc_action *action, char *nfp_action, int *a_len)
+nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
+	     char *nfp_action, int *a_len, u32 *csum_updated)
 {
 	struct nfp_fl_set_ipv6_addr set_ip6_dst, set_ip6_src;
 	struct nfp_fl_set_ip4_addrs set_ip_addr;
@@ -409,6 +429,7 @@ nfp_fl_pedit(const struct tc_action *action, char *nfp_action, int *a_len)
 	int idx, nkeys, err;
 	size_t act_size;
 	u32 offset, cmd;
+	u8 ip_proto = 0;
 
 	memset(&set_ip6_dst, 0, sizeof(set_ip6_dst));
 	memset(&set_ip6_src, 0, sizeof(set_ip6_src));
@@ -451,6 +472,15 @@ nfp_fl_pedit(const struct tc_action *action, char *nfp_action, int *a_len)
 			return err;
 	}
 
+	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+		struct flow_dissector_key_basic *basic;
+
+		basic = skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_BASIC,
+						  flow->key);
+		ip_proto = basic->ip_proto;
+	}
+
 	if (set_eth.head.len_lw) {
 		act_size = sizeof(set_eth);
 		memcpy(nfp_action, &set_eth, act_size);
@@ -459,6 +489,10 @@ nfp_fl_pedit(const struct tc_action *action, char *nfp_action, int *a_len)
 		act_size = sizeof(set_ip_addr);
 		memcpy(nfp_action, &set_ip_addr, act_size);
 		*a_len += act_size;
+
+		/* Hardware will automatically fix IPv4 and TCP/UDP checksum. */
+		*csum_updated |= TCA_CSUM_UPDATE_FLAG_IPV4HDR |
+				nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_ip6_dst.head.len_lw && set_ip6_src.head.len_lw) {
 		/* TC compiles set src and dst IPv6 address as a single action,
 		 * the hardware requires this to be 2 separate actions.
@@ -471,18 +505,30 @@ nfp_fl_pedit(const struct tc_action *action, char *nfp_action, int *a_len)
 		memcpy(&nfp_action[sizeof(set_ip6_src)], &set_ip6_dst,
 		       act_size);
 		*a_len += act_size;
+
+		/* Hardware will automatically fix TCP/UDP checksum. */
+		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_ip6_dst.head.len_lw) {
 		act_size = sizeof(set_ip6_dst);
 		memcpy(nfp_action, &set_ip6_dst, act_size);
 		*a_len += act_size;
+
+		/* Hardware will automatically fix TCP/UDP checksum. */
+		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_ip6_src.head.len_lw) {
 		act_size = sizeof(set_ip6_src);
 		memcpy(nfp_action, &set_ip6_src, act_size);
 		*a_len += act_size;
+
+		/* Hardware will automatically fix TCP/UDP checksum. */
+		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_tport.head.len_lw) {
 		act_size = sizeof(set_tport);
 		memcpy(nfp_action, &set_tport, act_size);
 		*a_len += act_size;
+
+		/* Hardware will automatically fix TCP/UDP checksum. */
+		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	}
 
 	return 0;
@@ -493,12 +539,18 @@ nfp_flower_output_action(struct nfp_app *app, const struct tc_action *a,
 			 struct nfp_fl_payload *nfp_fl, int *a_len,
 			 struct net_device *netdev, bool last,
 			 enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
-			 int *out_cnt)
+			 int *out_cnt, u32 *csum_updated)
 {
 	struct nfp_flower_priv *priv = app->priv;
 	struct nfp_fl_output *output;
 	int err, prelag_size;
 
+	/* If csum_updated has not been reset by now, it means HW will
+	 * incorrectly update csums when they are not requested.
+	 */
+	if (*csum_updated)
+		return -EOPNOTSUPP;
+
 	if (*a_len + sizeof(struct nfp_fl_output) > NFP_FL_MAX_A_SIZ)
 		return -EOPNOTSUPP;
 
@@ -529,10 +581,11 @@ nfp_flower_output_action(struct nfp_app *app, const struct tc_action *a,
 
 static int
 nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
+		       struct tc_cls_flower_offload *flow,
 		       struct nfp_fl_payload *nfp_fl, int *a_len,
 		       struct net_device *netdev,
 		       enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
-		       int *out_cnt)
+		       int *out_cnt, u32 *csum_updated)
 {
 	struct nfp_fl_set_ipv4_udp_tun *set_tun;
 	struct nfp_fl_pre_tunnel *pre_tun;
@@ -545,14 +598,14 @@ nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
 	} else if (is_tcf_mirred_egress_redirect(a)) {
 		err = nfp_flower_output_action(app, a, nfp_fl, a_len, netdev,
 					       true, tun_type, tun_out_cnt,
-					       out_cnt);
+					       out_cnt, csum_updated);
 		if (err)
 			return err;
 
 	} else if (is_tcf_mirred_egress_mirror(a)) {
 		err = nfp_flower_output_action(app, a, nfp_fl, a_len, netdev,
 					       false, tun_type, tun_out_cnt,
-					       out_cnt);
+					       out_cnt, csum_updated);
 		if (err)
 			return err;
 
@@ -602,8 +655,17 @@ nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
 		/* Tunnel decap is handled by default so accept action. */
 		return 0;
 	} else if (is_tcf_pedit(a)) {
-		if (nfp_fl_pedit(a, &nfp_fl->action_data[*a_len], a_len))
+		if (nfp_fl_pedit(a, flow, &nfp_fl->action_data[*a_len],
+				 a_len, csum_updated))
 			return -EOPNOTSUPP;
+	} else if (is_tcf_csum(a)) {
+		/* csum action requests recalc of something we have not fixed */
+		if (tcf_csum_update_flags(a) & ~*csum_updated)
+			return -EOPNOTSUPP;
+		/* If we will correctly fix the csum we can remove it from the
+		 * csum update list. Which will later be used to check support.
+		 */
+		*csum_updated &= ~tcf_csum_update_flags(a);
 	} else {
 		/* Currently we do not handle any other actions. */
 		return -EOPNOTSUPP;
@@ -620,6 +682,7 @@ int nfp_flower_compile_action(struct nfp_app *app,
 	int act_len, act_cnt, err, tun_out_cnt, out_cnt;
 	enum nfp_flower_tun_type tun_type;
 	const struct tc_action *a;
+	u32 csum_updated = 0;
 	LIST_HEAD(actions);
 
 	memset(nfp_flow->action_data, 0, NFP_FL_MAX_A_SIZ);
@@ -632,8 +695,9 @@ int nfp_flower_compile_action(struct nfp_app *app,
 
 	tcf_exts_to_list(flow->exts, &actions);
 	list_for_each_entry(a, &actions, list) {
-		err = nfp_flower_loop_action(app, a, nfp_flow, &act_len, netdev,
-					     &tun_type, &tun_out_cnt, &out_cnt);
+		err = nfp_flower_loop_action(app, a, flow, nfp_flow, &act_len,
+					     netdev, &tun_type, &tun_out_cnt,
+					     &out_cnt, &csum_updated);
 		if (err)
 			return err;
 		act_cnt++;
-- 
2.17.1

^ permalink raw reply related
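
The csum_updated bookkeeping in patch 6/9 above can be modelled in user
space roughly as below. This is a sketch under assumptions: the
SKETCH_CSUM_* bits are stand-ins for the TCA_CSUM_UPDATE_FLAG_*
constants (the values here are arbitrary), and the three functions are
hypothetical names for the pedit, csum and output branches of the
action loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in flag bits; the driver uses TCA_CSUM_UPDATE_FLAG_* instead. */
#define SKETCH_CSUM_IPV4HDR	0x1u
#define SKETCH_CSUM_TCP		0x2u
#define SKETCH_CSUM_UDP		0x4u

/* A set (pedit) action: hardware will fix these checksums on its own. */
static void sketch_pedit(uint32_t *csum_updated, uint32_t hw_fixes)
{
	assert(csum_updated);
	*csum_updated |= hw_fixes;
}

/* An explicit csum action: accept it only if hardware already covers
 * every requested checksum, then mark those checksums as accounted for.
 */
static int sketch_csum(uint32_t *csum_updated, uint32_t requested)
{
	assert(csum_updated);
	if (requested & ~*csum_updated)
		return -EOPNOTSUPP;
	*csum_updated &= ~requested;
	return 0;
}

/* An output action: reject if hardware would still update a checksum
 * that no csum action asked for.
 */
static int sketch_output(const uint32_t *csum_updated)
{
	assert(csum_updated);
	return *csum_updated ? -EOPNOTSUPP : 0;
}
```

In this model a rule of the form "pedit, csum, output" is accepted only
when the csum action requests exactly the checksums the preceding set
action causes hardware to rewrite.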

* [PATCH net-next 7/9] nfp: flower: extract ipv4 udp tunnel ttl from route
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, John Hurley
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Previously the ttl for ipv4 udp tunnels was set to the namespace default.
Modify this to attempt to extract the ttl from a full route lookup on the
tunnel destination. If this is not possible then resort to the default.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 .../ethernet/netronome/nfp/flower/action.c    | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 61ba8d4f99f1..d421b7fbce96 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -236,9 +236,12 @@ nfp_fl_set_ipv4_udp_tun(struct nfp_fl_set_ipv4_udp_tun *set_tun,
 	size_t act_size = sizeof(struct nfp_fl_set_ipv4_udp_tun);
 	struct ip_tunnel_info *ip_tun = tcf_tunnel_info(action);
 	u32 tmp_set_ip_tun_type_index = 0;
+	struct flowi4 flow = {};
 	/* Currently support one pre-tunnel so index is always 0. */
 	int pretun_idx = 0;
+	struct rtable *rt;
 	struct net *net;
+	int err;
 
 	if (ip_tun->options_len)
 		return -EOPNOTSUPP;
@@ -255,7 +258,21 @@ nfp_fl_set_ipv4_udp_tun(struct nfp_fl_set_ipv4_udp_tun *set_tun,
 
 	set_tun->tun_type_index = cpu_to_be32(tmp_set_ip_tun_type_index);
 	set_tun->tun_id = ip_tun->key.tun_id;
-	set_tun->ttl = net->ipv4.sysctl_ip_default_ttl;
+
+	/* Do a route lookup to determine ttl - if fails then use default.
+	 * Note that CONFIG_INET is a requirement of CONFIG_NET_SWITCHDEV so
+	 * must be defined here.
+	 */
+	flow.daddr = ip_tun->key.u.ipv4.dst;
+	flow.flowi4_proto = IPPROTO_UDP;
+	rt = ip_route_output_key(net, &flow);
+	err = PTR_ERR_OR_ZERO(rt);
+	if (!err) {
+		set_tun->ttl = ip4_dst_hoplimit(&rt->dst);
+		ip_rt_put(rt);
+	} else {
+		set_tun->ttl = net->ipv4.sysctl_ip_default_ttl;
+	}
 
 	/* Complete pre_tunnel action. */
 	pre_tun->ipv4_dst = ip_tun->key.u.ipv4.dst;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 8/9] nfp: flower: offload tos and tunnel flags for ipv4 udp tunnels
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, John Hurley
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Extract the tos and the tunnel flags from the tunnel key and offload these
action fields. Only the checksum and tunnel key flags are implemented in
fw so reject offloads of other flags. The tunnel key flag is always
considered set in the fw so enforce that it is set in the rule. Note that
the compulsory setting of the tunnel key flag and optional setting of
checksum is in line with how tc currently generates ipv4 udp tunnel
actions.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 9 +++++++++
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index d421b7fbce96..e56b815a8dc6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -45,6 +45,8 @@
 #include "main.h"
 #include "../nfp_net_repr.h"
 
+#define NFP_FL_SUPPORTED_IPV4_UDP_TUN_FLAGS	(TUNNEL_CSUM | TUNNEL_KEY)
+
 static void nfp_fl_pop_vlan(struct nfp_fl_pop_vlan *pop_vlan)
 {
 	size_t act_size = sizeof(struct nfp_fl_pop_vlan);
@@ -274,6 +276,13 @@ nfp_fl_set_ipv4_udp_tun(struct nfp_fl_set_ipv4_udp_tun *set_tun,
 		set_tun->ttl = net->ipv4.sysctl_ip_default_ttl;
 	}
 
+	set_tun->tos = ip_tun->key.tos;
+
+	if (!(ip_tun->key.tun_flags & TUNNEL_KEY) ||
+	    ip_tun->key.tun_flags & ~NFP_FL_SUPPORTED_IPV4_UDP_TUN_FLAGS)
+		return -EOPNOTSUPP;
+	set_tun->tun_flags = ip_tun->key.tun_flags;
+
 	/* Complete pre_tunnel action. */
 	pre_tun->ipv4_dst = ip_tun->key.u.ipv4.dst;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 4a7f3510a296..15f1eacd76b6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -203,9 +203,9 @@ struct nfp_fl_set_ipv4_udp_tun {
 	__be16 reserved;
 	__be64 tun_id __packed;
 	__be32 tun_type_index;
-	__be16 reserved2;
+	__be16 tun_flags;
 	u8 ttl;
-	u8 reserved3;
+	u8 tos;
 	__be32 extra[2];
 };
 
-- 
2.17.1

^ permalink raw reply related
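
The flag check added in patch 8/9 above combines two conditions: one
flag must be present, and no flag outside the supported set may be
present. A minimal user-space sketch, assuming stand-in bit values (the
SKETCH_TUNNEL_* names and values are illustrative, not the kernel's
TUNNEL_* definitions) and a hypothetical function name:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in flag bits for illustration only. */
#define SKETCH_TUNNEL_CSUM	0x01u
#define SKETCH_TUNNEL_KEY	0x04u
#define SKETCH_TUNNEL_OTHER	0x08u	/* some flag the firmware lacks */

#define SKETCH_SUPPORTED_FLAGS	(SKETCH_TUNNEL_CSUM | SKETCH_TUNNEL_KEY)

/* Accept only if the key flag is set and nothing unsupported is set. */
static int sketch_check_tun_flags(uint16_t flags)
{
	if (!(flags & SKETCH_TUNNEL_KEY) ||
	    (flags & ~SKETCH_SUPPORTED_FLAGS))
		return -EOPNOTSUPP;
	return 0;
}
```

With these stand-ins, KEY alone and KEY|CSUM pass, while CSUM alone, an
empty flag set, or KEY combined with any other bit is rejected.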

* [PATCH net-next 9/9] nfp: flower: enabled offloading of Team LAG
From: Jakub Kicinski @ 2018-06-30  0:04 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, John Hurley
In-Reply-To: <20180630000442.27353-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Currently the NFP fw only supports L3/L4 hashing so rejects the offload of
filters that output to LAG ports implementing other hash algorithms. Team,
however, uses a BPF function for the hash that is not defined. To support
Team offload, accept hashes that are defined as 'unknown' (only Team
defines such hash types). In this case, use the NFP default of L3/L4
hashing for egress port selection.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/lag_conf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
index 0c4c957717ea..bf10598f66ae 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
@@ -564,8 +564,9 @@ nfp_fl_lag_changeupper_event(struct nfp_fl_lag *lag,
 	if (lag_upper_info &&
 	    lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
 	    (lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH ||
-	    (lag_upper_info->hash_type != NETDEV_LAG_HASH_L34 &&
-	    lag_upper_info->hash_type != NETDEV_LAG_HASH_E34))) {
+	     (lag_upper_info->hash_type != NETDEV_LAG_HASH_L34 &&
+	      lag_upper_info->hash_type != NETDEV_LAG_HASH_E34 &&
+	      lag_upper_info->hash_type != NETDEV_LAG_HASH_UNKNOWN))) {
 		can_offload = false;
 		nfp_flower_cmsg_warn(priv->app,
 				     "Unable to offload tx_type %u hash %u\n",
-- 
2.17.1

^ permalink raw reply related
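
The offload condition in patch 9/9 above is easier to read when inverted
into a "can offload" predicate. The sketch below is a user-space model
only: the enums are stand-ins for the NETDEV_LAG_TX_TYPE_* and
NETDEV_LAG_HASH_* values, sketch_can_offload() is a hypothetical name,
and the case where no upper info is supplied is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the NETDEV_LAG_TX_TYPE_* values. */
enum sketch_tx_type {
	SKETCH_TX_UNKNOWN,
	SKETCH_TX_ACTIVEBACKUP,
	SKETCH_TX_HASH,
	SKETCH_TX_ROUNDROBIN
};

/* Stand-ins for the NETDEV_LAG_HASH_* values. */
enum sketch_hash_type {
	SKETCH_HASH_UNKNOWN,
	SKETCH_HASH_L2,
	SKETCH_HASH_L34,
	SKETCH_HASH_E34
};

/* Active-backup is always acceptable; hash mode is acceptable for
 * L3/L4 style hashes and, after the patch, for an unknown hash type.
 */
static bool sketch_can_offload(enum sketch_tx_type tx,
			       enum sketch_hash_type hash)
{
	if (tx == SKETCH_TX_ACTIVEBACKUP)
		return true;
	if (tx != SKETCH_TX_HASH)
		return false;
	return hash == SKETCH_HASH_L34 ||
	       hash == SKETCH_HASH_E34 ||
	       hash == SKETCH_HASH_UNKNOWN;
}
```

This is the De Morgan inverse of the condition in the diff, so a hash
type outside the three listed values still disables offload.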

* Re: [RFC v2 PATCH 1/4] eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER
From: Tushar Dave @ 2018-06-30  0:20 UTC (permalink / raw)
  To: Daniel Borkmann, Daniel Borkmann, ast, davem, jakub.kicinski,
	quentin.monnet, jiong.wang, guro, sandipan, john.fastabend, kafai,
	rdna, brakmo, netdev, acme, sowmini.varadhan
In-Reply-To: <8d996ba3-b8a6-7d40-9752-b3725aa9c012@iogearbox.net>



On 06/29/2018 01:48 AM, Daniel Borkmann wrote:
> On 06/29/2018 09:25 AM, Daniel Borkmann wrote:
>> On 06/19/2018 08:00 PM, Tushar Dave wrote:
>>> Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
>>> existing socket filter infrastructure for bpf program attach and load.
>>> SOCKET_SG_FILTER eBPF program receives struct scatterlist as bpf context
>>> contrast to SOCKET_FILTER which deals with struct skb. This is useful
>>> for kernel entities that don't have skb to represent packet data but
>>> want to run eBPF socket filter on packet data that is in form of struct
>>> scatterlist e.g. IB/RDMA
>>>
>>> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
>>> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> [...]
>>>   static void __bpf_prog_release(struct bpf_prog *prog)
>>>   {
>>> -	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
>>> +	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER ||
>>> +	    prog->type == BPF_PROG_TYPE_SOCKET_SG_FILTER) {
>>>   		bpf_prog_put(prog);
>>>   	} else {
>>>   		bpf_release_orig_filter(prog);
>>> @@ -1551,10 +1552,16 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>>>   
>>>   static struct bpf_prog *__get_bpf(u32 ufd, struct sock *sk)
>>>   {
>>> +	struct bpf_prog *prog;
>>> +
>>>   	if (sock_flag(sk, SOCK_FILTER_LOCKED))
>>>   		return ERR_PTR(-EPERM);
>>>   
>>> -	return bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
>>> +	prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
>>> +	if (IS_ERR(prog))
>>> +		prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_SG_FILTER);
>>> +
>>> +	return prog;
>>>   }
>>
>> Hmm, I don't think this works: this now means as unpriviledged I can attach a new
>> BPF_PROG_TYPE_SOCKET_SG_FILTER to a non-rds socket e.g. normal tcp/udp through the
>> SO_ATTACH_BPF sockopt, where input context is skb instead of sg list and thus crash
>> my box?

hmm.. I see the problem.

> 
> ... probably best to just make a setsockopt specific to rds here, so the two are fully
> separated.

Yes, it makes sense to make the setsockopt specific to RDS; that gives us
complete separation (and we don't have to worry about the above problem).
Btw, to make the setsockopt specific to RDS we will have to export
sk_attach_bpf().

> 
> Also worth exploring whether you can reuse as much as possible from the struct sk_msg_buff
> context and in general the BPF_PROG_TYPE_SK_MSG type that is using this which we already
> have in sockmap today. At least feels like some of the concepts are a bit similar. For
> pulling in more payload you have bpf_msg_pull_data() there which I think might be more
> user-friendly at least in that you have the full payload from start to the 'current' end
> available and don't need to navigate through individual sg entries back/forth which could
> perhaps end up being bit painful for users, though I can see that it's a middle ground
> between some skb_load_bytes()-alike helper that would copy the pieces out of the sg entries
> vs needing to linearize. What are the requirements here, would it make sense to offer both
> as an option or is this impractical based on what you've measured?

Yes, sockmap also deals with struct scatterlist, so from that perspective
I will certainly try to reuse code wherever I can, e.g. most likely get
rid of struct bpf_scatterlist and use struct sk_msg_buff.

From a use-case perspective, we want to look at the complete payload
(including the RDS header) and, based on that, take actions like pass,
drop or forward. So I agree that it makes sense not to iterate over sg
elements back and forth [1]. I guess bpf_msg_pull_data() would do the job.
If there is a need, we may add an sg_load_bytes()-like helper.
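
For illustration, an sg_load_bytes()-like helper could look roughly like
the user-space sketch below. It is an assumption about the shape of such
a helper, not a proposed kernel implementation: struct sg_ent is a
hypothetical stand-in for a scatterlist entry, and the function simply
copies a byte range out of the message while crossing entry boundaries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for one scatterlist entry. */
struct sg_ent {
	const unsigned char *buf;
	size_t len;
};

/* Copy len bytes, starting at logical offset off within the whole
 * message, into dst, crossing entry boundaries as needed.
 * Returns 0 on success or -EFAULT if the entries do not cover the
 * requested range (dst may then hold a partial copy).
 */
static int sg_load_bytes(const struct sg_ent *sg, size_t nents,
			 size_t off, void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t i;

	assert(sg != NULL || nents == 0);

	for (i = 0; i < nents && len > 0; i++) {
		size_t chunk;

		if (off >= sg[i].len) {
			off -= sg[i].len;	/* range starts in a later entry */
			continue;
		}
		chunk = sg[i].len - off;
		if (chunk > len)
			chunk = len;
		memcpy(out, sg[i].buf + off, chunk);
		out += chunk;
		len -= chunk;
		off = 0;	/* later entries are read from their start */
	}
	return len ? -EFAULT : 0;
}
```

The copy costs one memcpy per touched entry, which is the middle ground
between walking entries by hand and linearizing the whole message.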

Thanks for taking the time to review. I will work on the suggested changes.

-Tushar

[1]
btw, bpf_sg_next() was added just to showcase an example; if there is
no real need for it, I will get rid of it.

^ permalink raw reply

* Re: [RFC v2 PATCH 2/4] ebpf: Add sg_filter_run and sg helper
From: Tushar Dave @ 2018-06-30  0:24 UTC (permalink / raw)
  To: Daniel Borkmann, ast, davem, jakub.kicinski, quentin.monnet,
	jiong.wang, guro, sandipan, john.fastabend, kafai, rdna, brakmo,
	netdev, acme, sowmini.varadhan
In-Reply-To: <b19a9d21-f1a6-466b-0e97-884967b62155@iogearbox.net>



On 06/29/2018 01:18 AM, Daniel Borkmann wrote:
> On 06/19/2018 08:00 PM, Tushar Dave wrote:
>> When sg_filter_run() is invoked it runs the attached eBPF
>> SOCKET_SG_FILTER program which deals with struct scatterlist.
>>
>> In addition, this patch also adds bpf_sg_next helper function that
>> allows users to retrieve the next sg element from sg list.
>>
>> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
>> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>> ---
>>   include/linux/filter.h                    |  2 +
>>   include/uapi/linux/bpf.h                  | 10 ++++-
>>   net/core/filter.c                         | 72 +++++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h            | 10 ++++-
>>   tools/testing/selftests/bpf/bpf_helpers.h |  3 ++
>>   5 files changed, 95 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 71618b1..d176402 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -1072,4 +1072,6 @@ struct bpf_sock_ops_kern {
>>   					 */
>>   };
>>   
>> +int sg_filter_run(struct sock *sk, struct scatterlist *sg);
>> +
>>   #endif /* __LINUX_FILTER_H__ */
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index ef0a7b6..036432b 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2076,6 +2076,13 @@ struct bpf_stack_build_id {
>>    * 	Return
>>    * 		A 64-bit integer containing the current cgroup id based
>>    * 		on the cgroup within which the current task is running.
>> + *
>> + * int bpf_sg_next(struct bpf_scatterlist *sg)
>> + *	Description
>> + *		This helper allows user to retrieve next sg element from
>> + *		sg list.
>> + *	Return
>> + *		Returns 0 on success, or a negative error in case of failure.
>>    */
>>   #define __BPF_FUNC_MAPPER(FN)		\
>>   	FN(unspec),			\
>> @@ -2158,7 +2165,8 @@ struct bpf_stack_build_id {
>>   	FN(rc_repeat),			\
>>   	FN(rc_keydown),			\
>>   	FN(skb_cgroup_id),		\
>> -	FN(get_current_cgroup_id),
>> +	FN(get_current_cgroup_id),	\
>> +	FN(sg_next),
>>   
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>    * function eBPF program intends to call
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 8f67942..702ff5b 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -121,6 +121,53 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
>>   }
>>   EXPORT_SYMBOL(sk_filter_trim_cap);
>>   
>> +int sg_filter_run(struct sock *sk, struct scatterlist *sg)
>> +{
>> +	struct sk_filter *filter;
>> +	int err;
>> +
>> +	rcu_read_lock();
>> +	filter = rcu_dereference(sk->sk_filter);
>> +	if (filter) {
>> +		struct bpf_scatterlist bpfsg;
>> +		int num_sg;
>> +
>> +		if (!sg) {
>> +			err = -EINVAL;
>> +			goto out;
>> +		}
>> +
>> +		num_sg = sg_nents(sg);
>> +		if (num_sg <= 0) {
>> +			err = -EINVAL;
>> +			goto out;
>> +		}
>> +
>> +		/* We store a reference  to the sg list so it can later used by
>> +		 * eBPF helpers to retrieve the next sg element.
>> +		 */
>> +		bpfsg.num_sg = num_sg;
>> +		bpfsg.cur_sg = 0;
>> +		bpfsg.sg = sg;
>> +
>> +		/* For the first sg element, we store the pkt access pointers
>> +		 * into start and end so eBPF program can have pkt access using
>> +		 * data and data_end. The pkt access for subsequent element of
>> +		 * sg list is possible when eBPF program invokes bpf_sg_next
>> +		 * which takes care of setting start and end to the correct sg
>> +		 * element.
>> +		 */
>> +		bpfsg.start = sg_virt(sg);
>> +		bpfsg.end = bpfsg.start + sg->length;
>> +		BPF_PROG_RUN(filter->prog, &bpfsg);
>> +	}
>> +out:
>> +	rcu_read_unlock();
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(sg_filter_run);
>> +
>>   BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
>>   {
>>   	return skb_get_poff(skb);
>> @@ -3753,6 +3800,29 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
>>   	.arg1_type      = ARG_PTR_TO_CTX,
>>   };
>>   
>> +BPF_CALL_1(bpf_sg_next, struct bpf_scatterlist *, bpfsg)
>> +{
>> +	struct scatterlist *sg = bpfsg->sg;
>> +	int cur_sg = bpfsg->cur_sg;
>> +
>> +	cur_sg++;
>> +	if (cur_sg >= bpfsg->num_sg)
>> +		return -ENODATA;
>> +
>> +	bpfsg->cur_sg = cur_sg;
>> +	bpfsg->start = sg_virt(&sg[cur_sg]);
>> +	bpfsg->end = bpfsg->start + sg[cur_sg].length;
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_sg_next_proto = {
>> +	.func		= bpf_sg_next,
>> +	.gpl_only	= false,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_PTR_TO_CTX,
>> +};
> 
> Should be added to bpf_helper_changes_pkt_data() in order to enforce a reload
> of all pkt pointers. Otherwise this is buggy in the sense that someone could only
> reload pkt_end pointer in the prog while old pkt_start still points to previous
> sg entry, so you would be able to access out of bounds.

Sure thing. Will do so.
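
The hazard described in the review can be reproduced with a small
user-space model. This is a sketch under stated assumptions: struct
sg_ent, struct sg_ctx and the helper names are hypothetical, and entries
are described as offsets into one flat arena so that positions stay
comparable. ctx_next() mirrors what bpf_sg_next() does to start and end;
range_ok() is the kind of bounds check a program performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry, described as a byte range inside a flat arena so that all
 * positions are comparable offsets.
 */
struct sg_ent { size_t off; size_t len; };

/* Loosely models struct bpf_scatterlist from the quoted patch. */
struct sg_ctx {
	const struct sg_ent *sg;
	int num_sg;
	int cur_sg;
	size_t start;	/* plays the role of data */
	size_t end;	/* plays the role of data_end */
};

/* Point both bounds at entry idx. */
static void ctx_set(struct sg_ctx *c, int idx)
{
	c->cur_sg = idx;
	c->start = c->sg[idx].off;
	c->end = c->start + c->sg[idx].len;
}

static void ctx_init(struct sg_ctx *c, const struct sg_ent *sg, int num_sg)
{
	assert(c && sg && num_sg > 0);
	c->sg = sg;
	c->num_sg = num_sg;
	ctx_set(c, 0);
}

/* Same shape as bpf_sg_next(): advance, then rewrite both bounds. */
static int ctx_next(struct sg_ctx *c)
{
	if (c->cur_sg + 1 >= c->num_sg)
		return -ENODATA;
	ctx_set(c, c->cur_sg + 1);
	return 0;
}

/* The bounds check a program performs before reading n bytes at start. */
static bool range_ok(size_t start, size_t end, size_t n)
{
	return start <= end && end - start >= n;
}
```

If a program caches start from a 4-byte first entry, calls ctx_next(),
and then rechecks against only the new end, range_ok() accepts a read
far larger than the first entry. Forcing a reload of both pointers after
the helper call is what rules this out.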

Thank you.

-Tushar
> 
> Thanks,
> Daniel
> 

^ permalink raw reply

* Re: [RFC v2 PATCH 2/4] ebpf: Add sg_filter_run and sg helper
From: Tushar Dave @ 2018-06-30  0:27 UTC (permalink / raw)
  To: Daniel Borkmann, ast, davem, jakub.kicinski, quentin.monnet,
	jiong.wang, guro, sandipan, john.fastabend, kafai, rdna, brakmo,
	netdev, acme, sowmini.varadhan
In-Reply-To: <160bb237-f453-b1cb-0e75-f4ca6d4e6559@iogearbox.net>



On 06/29/2018 01:32 AM, Daniel Borkmann wrote:
> On 06/19/2018 08:00 PM, Tushar Dave wrote:
> [...]
>> +int sg_filter_run(struct sock *sk, struct scatterlist *sg)
>> +{
>> +	struct sk_filter *filter;
>> +	int err;
>> +
>> +	rcu_read_lock();
>> +	filter = rcu_dereference(sk->sk_filter);
>> +	if (filter) {
>> +		struct bpf_scatterlist bpfsg;
>> +		int num_sg;
>> +
>> +		if (!sg) {
>> +			err = -EINVAL;
>> +			goto out;
>> +		}
>> +
>> +		num_sg = sg_nents(sg);
>> +		if (num_sg <= 0) {
>> +			err = -EINVAL;
>> +			goto out;
>> +		}
>> +
>> +		/* We store a reference  to the sg list so it can later used by
>> +		 * eBPF helpers to retrieve the next sg element.
>> +		 */
>> +		bpfsg.num_sg = num_sg;
>> +		bpfsg.cur_sg = 0;
>> +		bpfsg.sg = sg;
>> +
>> +		/* For the first sg element, we store the pkt access pointers
>> +		 * into start and end so eBPF program can have pkt access using
>> +		 * data and data_end. The pkt access for subsequent element of
>> +		 * sg list is possible when eBPF program invokes bpf_sg_next
>> +		 * which takes care of setting start and end to the correct sg
>> +		 * element.
>> +		 */
>> +		bpfsg.start = sg_virt(sg);
>> +		bpfsg.end = bpfsg.start + sg->length;
>> +		BPF_PROG_RUN(filter->prog, &bpfsg);
> 
> Return code here from BPF prog is ignored entirely, I thought you wanted to
> use it also for dropping packets? If UAPI would get frozen like this then it's
> baked in stone.

Yeah, I am going to add the return codes necessary for pass, drop and
forward. Thanks.
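
A minimal user-space sketch of what consuming the program's return value
might look like follows. The verdict names and their numbering are
assumptions made for the illustration (the thread only says that pass,
drop and forward are wanted), and sg_filter_run_model() is a
hypothetical stand-in, not the kernel function. The model also assigns
its result on every path, whereas in the quoted sg_filter_run() err is
only assigned in the error branches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical verdict values; the thread does not define them. */
enum sg_verdict { SG_PASS = 0, SG_DROP = 1, SG_FORWARD = 2 };

/* Stand-in for an attached program: takes a context, returns a verdict. */
typedef unsigned int (*sg_prog_t)(const void *ctx);

/* Run prog (if any) on ctx and report its verdict through *verdict.
 * Returns 0 when a verdict was produced, -EINVAL on bad input.
 */
static int sg_filter_run_model(sg_prog_t prog, const void *ctx,
			       enum sg_verdict *verdict)
{
	unsigned int ret;

	assert(verdict != NULL);
	*verdict = SG_PASS;		/* no filter attached means pass */
	if (!prog)
		return 0;
	if (!ctx)
		return -EINVAL;

	ret = prog(ctx);
	/* Unknown return values fail closed. */
	*verdict = ret <= SG_FORWARD ? (enum sg_verdict)ret : SG_DROP;
	return 0;
}

/* Two trivial example programs. */
static unsigned int prog_drop_all(const void *ctx)
{
	(void)ctx;
	return SG_DROP;
}

static unsigned int prog_bogus(const void *ctx)
{
	(void)ctx;
	return 42;	/* not a defined verdict */
}
```

Treating undefined return values as drop is one possible policy; the
important part is that the meaning of each value is fixed before the
UAPI is.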

-Tushar

> 
>> +	}
>> +out:
>> +	rcu_read_unlock();
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(sg_filter_run);
> 

^ permalink raw reply

* [PATCH net-next 00/13] mlxsw: Add resource scale tests
From: Petr Machata @ 2018-06-30  0:44 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: jiri, idosch, shuah, davem

There are a number of tests that check features of the Linux networking
stack. By running them on suitable interfaces, one can exercise the
mlxsw offloading code. However none of these tests attempts to push
mlxsw to the limits supported by the ASIC.

As an additional wrinkle, the "limits supported by the ASIC" themselves
may not be a set of fixed numbers, but rather depend on a profile that
determines how the ASIC resources are allocated for different purposes.

This patchset introduces several tests that verify capability of mlxsw
to offload amounts of routes, flower rules, and mirroring sessions that
match predicted ASIC capacity, at different configuration profiles.
Additionally they verify that amounts exceeding the predicted capacity
can *not* be offloaded.

These are not generic tests, but ones that are tailored for mlxsw
specifically. For that reason they are not added to net/forwarding
selftests subdirectory, but rather to a newly-added drivers/net/mlxsw.

Patches #1, #2 and #3 tweak the generic forwarding/lib.sh to support the
new additions.

In patches #4 and #5, new libraries for interfacing with devlink are
introduced, first a generic one, then a Spectrum-specific one.

In patch #6, a devlink resource test is introduced.

Patches #7 and #8, #9 and #10, and #11 and #12 introduce three scale
tests: router, flower and mirror-to-gretap. The first of each pair of
patches introduces a generic portion of the test (mlxsw-specific), the
second introduces a Spectrum-specific wrapper.

Patch #13 then introduces a scale test driver that runs (possibly a
subset of) the tests introduced by the patches from the previous paragraph.

Arkadi Sharshevsky (1):
  selftests: mlxsw: Add router test

Petr Machata (8):
  selftests: forwarding: lib: Add check_err_fail()
  selftests: forwarding: lib: Parameterize NUM_NETIFS in two functions
  selftests: forwarding: Add devlink_lib.sh
  selftests: mlxsw: Add devlink_lib_spectrum.sh
  selftests: mlxsw: Add tc flower scale test
  selftests: mlxsw: Add target for tc flower test on spectrum
  selftests: mlxsw: Add scale test for mirror-to-gretap
  selftests: mlxsw: Add target for mirror-to-gretap test on spectrum

Yuval Mintz (4):
  selftests: forwarding: Allow lib.sh sourcing from other directories
  selftests: mlxsw: Add devlink KVD resource test
  selftests: mlxsw: Add target for router test on spectrum
  selftests: mlxsw: Add scale test for resources

 MAINTAINERS                                        |   1 +
 .../drivers/net/mlxsw/mirror_gre_scale.sh          | 197 +++++++++++++++++++++
 .../selftests/drivers/net/mlxsw/router_scale.sh    | 167 +++++++++++++++++
 .../net/mlxsw/spectrum/devlink_lib_spectrum.sh     | 119 +++++++++++++
 .../net/mlxsw/spectrum/devlink_resources.sh        | 117 ++++++++++++
 .../drivers/net/mlxsw/spectrum/mirror_gre_scale.sh |  13 ++
 .../drivers/net/mlxsw/spectrum/resource_scale.sh   |  55 ++++++
 .../drivers/net/mlxsw/spectrum/router_scale.sh     |  18 ++
 .../drivers/net/mlxsw/spectrum/tc_flower_scale.sh  |  19 ++
 .../selftests/drivers/net/mlxsw/tc_flower_scale.sh | 134 ++++++++++++++
 .../selftests/net/forwarding/devlink_lib.sh        | 108 +++++++++++
 tools/testing/selftests/net/forwarding/lib.sh      |  30 +++-
 12 files changed, 974 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/router_scale.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/spectrum/devlink_lib_spectrum.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/devlink_resources.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/spectrum/mirror_gre_scale.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/spectrum/resource_scale.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/spectrum/router_scale.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/spectrum/tc_flower_scale.sh
 create mode 100644 tools/testing/selftests/drivers/net/mlxsw/tc_flower_scale.sh
 create mode 100644 tools/testing/selftests/net/forwarding/devlink_lib.sh

-- 
2.4.11

^ permalink raw reply
