Netdev List

* Re: [PATCH v2 net-next 0/3] net: Update fib_table_lookup tracepoints
From: David Miller @ 2018-05-25  3:01 UTC
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20180524000849.6553-1-dsahern@kernel.org>

From: dsahern@kernel.org
Date: Wed, 23 May 2018 17:08:46 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Update the FIB lookup tracepoints to include the IP protocol and port
> fields from the flow struct. In the process, bring the IPv4 tracepoint
> in line with IPv6, which is much easier to use and makes it easier to
> follow the lookup and its result.
> 
> Remove the tracepoint in fib_validate_source, which adds no value
> beyond the fib_table_lookup tracepoint that immediately follows it.
> 
> v2
> - move CREATE_TRACE_POINTS for the v6 tracepoint to route.c, since the
>   tracepoint needs an internal function to convert the route type to an
>   error, and so that it works with IPv6 built as a module or built in.
>   Reported by the kbuild robot.

Series applied.

* Re: [Patch net-next] net_sched: switch to rcu_work
From: David Miller @ 2018-05-25  2:56 UTC
  To: xiyou.wangcong; +Cc: netdev, tj, paulmck, jhs
In-Reply-To: <20180523222653.19376-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 23 May 2018 15:26:53 -0700

> Commit 05f0fe6b74db ("RCU, workqueue: Implement rcu_work") introduces
> new APIs for dispatching work from an RCU callback. Now we can simply
> switch the tc filters to the new APIs, which could get rid of a lot
> of code.
> 
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks.

* Re: [PATCH v2] ppp: remove the PPPIOCDETACH ioctl
From: David Miller @ 2018-05-25  2:55 UTC
  To: ebiggers3
  Cc: linux-ppp, paulus, netdev, linux-fsdevel, linux-kernel, g.nault,
	syzkaller-bugs, ebiggers
In-Reply-To: <20180523213738.146911-1-ebiggers3@gmail.com>

From: Eric Biggers <ebiggers3@gmail.com>
Date: Wed, 23 May 2018 14:37:38 -0700

> From: Eric Biggers <ebiggers@google.com>
> 
> The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
> before f_count has reached 0, which is fundamentally a bad idea.  It
> does check 'f_count < 2', which excludes concurrent operations on the
> file since they would only be possible with a shared fd table, in which
> case each fdget() would take a file reference.  However, it fails to
> account for the fact that even with 'f_count == 1' the file can still be
> linked into epoll instances.  As reported by syzbot, this can trivially
> be used to cause a use-after-free.
> 
> Yet, the only known user of PPPIOCDETACH is pppd versions older than
> ppp-2.4.2, which was released almost 15 years ago (November 2003).
> Also, PPPIOCDETACH apparently stopped working reliably at around the
> same time, when the f_count check was added to the kernel, e.g. see
> https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
> check makes PPPIOCDETACH only work in single-threaded applications; it
> always fails if called from a multithreaded application.
> 
> All pppd versions released in the last 15 years just close() the file
> descriptor instead.
> 
> Therefore, instead of hacking around this bug by exporting epoll
> internals to modules, and probably missing other related bugs, just
> remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
> a stub in place that prints a one-time warning and returns EINVAL.
> 
> Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Applied and queued up for -stable.

* Re: [PATCH net-next] net: stmmac: Add PPS and Flexible PPS support
From: kbuild test robot @ 2018-05-25  2:42 UTC
  To: Jose Abreu
  Cc: kbuild-all, netdev, Jose Abreu, David S. Miller, Joao Pinto,
	Vitor Soares, Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <072478625b1cb3d4af9e3b42f83ece7303fd554e.1526993857.git.joabreu@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

Hi Jose,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jose-Abreu/net-stmmac-Add-PPS-and-Flexible-PPS-support/20180525-074128
config: arm-sunxi_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/dwmac5.o: In function `dwmac5_flex_pps_config':
>> dwmac5.c:(.text+0x974): undefined reference to `__aeabi_uldivmod'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22791 bytes --]

* Re: [RFC v5 0/5] virtio: support packed ring
From: Jason Wang @ 2018-05-25  2:31 UTC
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann
In-Reply-To: <20180522081648.14768-1-tiwei.bie@intel.com>



On 2018-05-22 16:16, Tiwei Bie wrote:
> Hello everyone,
>
> This RFC implements packed ring support in virtio driver.
>
> Some simple functional tests have been done with Jason's
> packed ring implementation in vhost (RFC v4):
>
> https://lkml.org/lkml/2018/5/16/501
>
> Both ping and netperf worked as expected with EVENT_IDX
> disabled. Ping worked as expected with EVENT_IDX enabled,
> but netperf didn't (a hack has been added to the driver
> to make the netperf test pass in this case; the hack can be
> found with `grep -rw XXX` in the code).

This looks like it is caused by a bug in vhost, which wrongly tracks
signalled_used and may miss an interrupt. After fixing that, both sides
work like a charm.

I'm testing vIOMMU and zerocopy, and will post a new version shortly.
(Hopefully it will be the last RFC version.)

Thanks

* Re: [PATCH bpf-next v7 6/6] selftests/bpf: test for seg6local End.BPF action
From: Y Song @ 2018-05-25  2:30 UTC
  To: Mathieu Xhonneux; +Cc: netdev, Daniel Borkmann, dlebrun, Alexei Starovoitov
In-Reply-To: <ec46c74ede080ae7371c36efc8aa6792b445b9ac.1526824042.git.m.xhonneux@gmail.com>

When compiling the latest bpf-next, I hit the following compilation error:

clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include -idirafter /usr/include -Wno-compare-distinct-pointer-types \
         -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
llc -march=bpf -mcpu=generic  -filetype=obj -o /data/users/yhs/work/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
#include <linux/seg6_local.h>
         ^~~~~~~~~~~~~~~~~~~~
1 error generated.
make: Leaving directory `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'

Should seg6_local.h be copied to the tools/ directory?

On Sun, May 20, 2018 at 6:58 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
> Add a new test for the seg6local End.BPF action. The following helpers
> are also tested:
>
> - bpf_lwt_push_encap within the LWT BPF IN hook
> - bpf_lwt_seg6_action
> - bpf_lwt_seg6_adjust_srh
> - bpf_lwt_seg6_store_bytes
>
> A chain of End.BPF actions is built. The SRH is injected through an LWT
> BPF IN hook before entering this chain. Each End.BPF action validates
> the changes made by the previous one and drops the packet otherwise.
> The test succeeds if the last node in the chain receives the packet and
> the UDP datagram it contains can be read from userspace.
>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
>  tools/include/uapi/linux/bpf.h                    |  97 ++++-
>  tools/testing/selftests/bpf/Makefile              |   6 +-
>  tools/testing/selftests/bpf/bpf_helpers.h         |  12 +
>  tools/testing/selftests/bpf/test_lwt_seg6local.c  | 437 ++++++++++++++++++++++
>  tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++++++
>  5 files changed, 689 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
>  create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 97446bbe2ca5..b217a33d80a4 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_SK_MSG,
>         BPF_PROG_TYPE_RAW_TRACEPOINT,
>         BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +       BPF_PROG_TYPE_LWT_SEG6LOCAL,
>  };
>
>  enum bpf_attach_type {
> @@ -1902,6 +1903,90 @@ union bpf_attr {
>   *             egress otherwise). This is the only flag supported for now.
>   *     Return
>   *             **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
> + *     Description
> + *             Encapsulate the packet associated to *skb* within a Layer 3
> + *             protocol header. This header is provided in the buffer at
> + *             address *hdr*, with *len* its size in bytes. *type* indicates
> + *             the protocol of the header and can be one of:
> + *
> + *             **BPF_LWT_ENCAP_SEG6**
> + *                     IPv6 encapsulation with Segment Routing Header
> + *                     (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
> + *                     the IPv6 header is computed by the kernel.
> + *             **BPF_LWT_ENCAP_SEG6_INLINE**
> + *                     Only works if *skb* contains an IPv6 packet. Insert a
> + *                     Segment Routing Header (**struct ipv6_sr_hdr**) inside
> + *                     the IPv6 header.
> + *
> + *             A call to this helper is susceptible to change the underlying
> + *             packet buffer. Therefore, at load time, all checks on pointers
> + *             previously done by the verifier are invalidated and must be
> + *             performed again, if the helper is used in combination with
> + *             direct packet access.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
> + * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len)
> + *     Description
> + *             Store *len* bytes from address *from* into the packet
> + *             associated to *skb*, at *offset*. Only the flags, tag and TLVs
> + *             inside the outermost IPv6 Segment Routing Header can be
> + *             modified through this helper.
> + *
> + *             A call to this helper is susceptible to change the underlying
> + *             packet buffer. Therefore, at load time, all checks on pointers
> + *             previously done by the verifier are invalidated and must be
> + *             performed again, if the helper is used in combination with
> + *             direct packet access.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
> + * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
> + *     Description
> + *             Adjust the size allocated to TLVs in the outermost IPv6
> + *             Segment Routing Header contained in the packet associated to
> + *             *skb*, at position *offset* by *delta* bytes. Only offsets
> + *             after the segments are accepted. *delta* can be either
> + *             positive (growing) or negative (shrinking).
> + *
> + *             A call to this helper is susceptible to change the underlying
> + *             packet buffer. Therefore, at load time, all checks on pointers
> + *             previously done by the verifier are invalidated and must be
> + *             performed again, if the helper is used in combination with
> + *             direct packet access.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
> + * int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param, u32 param_len)
> + *     Description
> + *             Apply an IPv6 Segment Routing action of type *action* to the
> + *             packet associated to *skb*. Each action takes a parameter
> + *             contained at address *param*, and of length *param_len* bytes.
> + *             *action* can be one of:
> + *
> + *             **SEG6_LOCAL_ACTION_END_X**
> + *                     End.X action: Endpoint with Layer-3 cross-connect.
> + *                     Type of *param*: **struct in6_addr**.
> + *             **SEG6_LOCAL_ACTION_END_T**
> + *                     End.T action: Endpoint with specific IPv6 table lookup.
> + *                     Type of *param*: **int**.
> + *             **SEG6_LOCAL_ACTION_END_B6**
> + *                     End.B6 action: Endpoint bound to an SRv6 policy.
> + *                     Type of *param*: **struct ipv6_sr_hdr**.
> + *             **SEG6_LOCAL_ACTION_END_B6_ENCAP**
> + *                     End.B6.Encap action: Endpoint bound to an SRv6
> + *                     encapsulation policy.
> + *                     Type of *param*: **struct ipv6_sr_hdr**.
> + *
> + *             A call to this helper is susceptible to change the underlying
> + *             packet buffer. Therefore, at load time, all checks on pointers
> + *             previously done by the verifier are invalidated and must be
> + *             performed again, if the helper is used in combination with
> + *             direct packet access.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -1976,7 +2061,11 @@ union bpf_attr {
>         FN(fib_lookup),                 \
>         FN(sock_hash_update),           \
>         FN(msg_redirect_hash),          \
> -       FN(sk_redirect_hash),
> +       FN(sk_redirect_hash),           \
> +       FN(lwt_push_encap),             \
> +       FN(lwt_seg6_store_bytes),       \
> +       FN(lwt_seg6_adjust_srh),        \
> +       FN(lwt_seg6_action),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2043,6 +2132,12 @@ enum bpf_hdr_start_off {
>         BPF_HDR_START_NET,
>  };
>
> +/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
> +enum bpf_lwt_encap_mode {
> +       BPF_LWT_ENCAP_SEG6,
> +       BPF_LWT_ENCAP_SEG6_INLINE
> +};
> +
>  /* user accessible mirror of in-kernel sk_buff.
>   * new fields can only be added to the end of this structure
>   */
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 1eb0fa2aba92..b6222b3f8fab 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -33,7 +33,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
>         sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
>         sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
>         test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
> -       test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o
> +       test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
> +       test_lwt_seg6local.o
>
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
> @@ -42,7 +43,8 @@ TEST_PROGS := test_kmod.sh \
>         test_xdp_meta.sh \
>         test_offload.py \
>         test_sock_addr.sh \
> -       test_tunnel.sh
> +       test_tunnel.sh \
> +       test_lwt_seg6local.sh
>
>  # Compile but not part of 'make run_tests'
>  TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8f143dfb3700..334d3e8c5e89 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -114,6 +114,18 @@ static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
>  static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
>                              int plen, __u32 flags) =
>         (void *) BPF_FUNC_fib_lookup;
> +static int (*bpf_lwt_push_encap)(void *ctx, unsigned int type, void *hdr,
> +                                unsigned int len) =
> +       (void *) BPF_FUNC_lwt_push_encap;
> +static int (*bpf_lwt_seg6_store_bytes)(void *ctx, unsigned int offset,
> +                                      void *from, unsigned int len) =
> +       (void *) BPF_FUNC_lwt_seg6_store_bytes;
> +static int (*bpf_lwt_seg6_action)(void *ctx, unsigned int action, void *param,
> +                                 unsigned int param_len) =
> +       (void *) BPF_FUNC_lwt_seg6_action;
> +static int (*bpf_lwt_seg6_adjust_srh)(void *ctx, unsigned int offset,
> +                                     unsigned int len) =
> +       (void *) BPF_FUNC_lwt_seg6_adjust_srh;
>
>  /* llvm builtin functions that eBPF C program may use to
>   * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.c b/tools/testing/selftests/bpf/test_lwt_seg6local.c
> new file mode 100644
> index 000000000000..0575751bc1bc
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_lwt_seg6local.c
> @@ -0,0 +1,437 @@
> +#include <stddef.h>
> +#include <inttypes.h>
> +#include <errno.h>
> +#include <linux/seg6_local.h>
> +#include <linux/bpf.h>
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +#define bpf_printk(fmt, ...)                           \
> +({                                                     \
> +       char ____fmt[] = fmt;                           \
> +       bpf_trace_printk(____fmt, sizeof(____fmt),      \
> +                       ##__VA_ARGS__);                 \
> +})
> +
> +/* Packet parsing state machine helpers. */
> +#define cursor_advance(_cursor, _len) \
> +       ({ void *_tmp = _cursor; _cursor += _len; _tmp; })
> +
> +#define SR6_FLAG_ALERT (1 << 4)
> +
> +#define htonll(x) ((bpf_htonl(1)) == 1 ? (x) : ((uint64_t)bpf_htonl((x) & \
> +                               0xFFFFFFFF) << 32) | bpf_htonl((x) >> 32))
> +#define ntohll(x) ((bpf_ntohl(1)) == 1 ? (x) : ((uint64_t)bpf_ntohl((x) & \
> +                               0xFFFFFFFF) << 32) | bpf_ntohl((x) >> 32))
> +#define BPF_PACKET_HEADER __attribute__((packed))
> +
> +struct ip6_t {
> +       unsigned int ver:4;
> +       unsigned int priority:8;
> +       unsigned int flow_label:20;
> +       unsigned short payload_len;
> +       unsigned char next_header;
> +       unsigned char hop_limit;
> +       unsigned long long src_hi;
> +       unsigned long long src_lo;
> +       unsigned long long dst_hi;
> +       unsigned long long dst_lo;
> +} BPF_PACKET_HEADER;
> +
> +struct ip6_addr_t {
> +       unsigned long long hi;
> +       unsigned long long lo;
> +} BPF_PACKET_HEADER;
> +
> +struct ip6_srh_t {
> +       unsigned char nexthdr;
> +       unsigned char hdrlen;
> +       unsigned char type;
> +       unsigned char segments_left;
> +       unsigned char first_segment;
> +       unsigned char flags;
> +       unsigned short tag;
> +
> +       struct ip6_addr_t segments[0];
> +} BPF_PACKET_HEADER;
> +
> +struct sr6_tlv_t {
> +       unsigned char type;
> +       unsigned char len;
> +       unsigned char value[0];
> +} BPF_PACKET_HEADER;
> +
> +__attribute__((always_inline)) struct ip6_srh_t *get_srh(struct __sk_buff *skb)
> +{
> +       void *cursor, *data_end;
> +       struct ip6_srh_t *srh;
> +       struct ip6_t *ip;
> +       uint8_t *ipver;
> +
> +       data_end = (void *)(long)skb->data_end;
> +       cursor = (void *)(long)skb->data;
> +       ipver = (uint8_t *)cursor;
> +
> +       if ((void *)ipver + sizeof(*ipver) > data_end)
> +               return NULL;
> +
> +       if ((*ipver >> 4) != 6)
> +               return NULL;
> +
> +       ip = cursor_advance(cursor, sizeof(*ip));
> +       if ((void *)ip + sizeof(*ip) > data_end)
> +               return NULL;
> +
> +       if (ip->next_header != 43)
> +               return NULL;
> +
> +       srh = cursor_advance(cursor, sizeof(*srh));
> +       if ((void *)srh + sizeof(*srh) > data_end)
> +               return NULL;
> +
> +       if (srh->type != 4)
> +               return NULL;
> +
> +       return srh;
> +}
> +
> +__attribute__((always_inline))
> +int update_tlv_pad(struct __sk_buff *skb, uint32_t new_pad,
> +                  uint32_t old_pad, uint32_t pad_off)
> +{
> +       int err;
> +
> +       if (new_pad != old_pad) {
> +               err = bpf_lwt_seg6_adjust_srh(skb, pad_off,
> +                                         (int) new_pad - (int) old_pad);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (new_pad > 0) {
> +               char pad_tlv_buf[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +                                       0, 0, 0};
> +               struct sr6_tlv_t *pad_tlv = (struct sr6_tlv_t *) pad_tlv_buf;
> +
> +               pad_tlv->type = SR6_TLV_PADDING;
> +               pad_tlv->len = new_pad - 2;
> +
> +               err = bpf_lwt_seg6_store_bytes(skb, pad_off,
> +                                              (void *)pad_tlv_buf, new_pad);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +__attribute__((always_inline))
> +int is_valid_tlv_boundary(struct __sk_buff *skb, struct ip6_srh_t *srh,
> +                         uint32_t *tlv_off, uint32_t *pad_size,
> +                         uint32_t *pad_off)
> +{
> +       uint32_t srh_off, cur_off;
> +       int offset_valid = 0;
> +       int err;
> +
> +       srh_off = (char *)srh - (char *)(long)skb->data;
> +       // cur_off = end of segments, start of possible TLVs
> +       cur_off = srh_off + sizeof(*srh) +
> +               sizeof(struct ip6_addr_t) * (srh->first_segment + 1);
> +
> +       *pad_off = 0;
> +
> +       // we can only go as far as ~10 TLVs due to the BPF max stack size
> +       #pragma clang loop unroll(full)
> +       for (int i = 0; i < 10; i++) {
> +               struct sr6_tlv_t tlv;
> +
> +               if (cur_off == *tlv_off)
> +                       offset_valid = 1;
> +
> +               if (cur_off >= srh_off + ((srh->hdrlen + 1) << 3))
> +                       break;
> +
> +               err = bpf_skb_load_bytes(skb, cur_off, &tlv, sizeof(tlv));
> +               if (err)
> +                       return err;
> +
> +               if (tlv.type == SR6_TLV_PADDING) {
> +                       *pad_size = tlv.len + sizeof(tlv);
> +                       *pad_off = cur_off;
> +
> +                       if (*tlv_off == srh_off) {
> +                               *tlv_off = cur_off;
> +                               offset_valid = 1;
> +                       }
> +                       break;
> +
> +               } else if (tlv.type == SR6_TLV_HMAC) {
> +                       break;
> +               }
> +
> +               cur_off += sizeof(tlv) + tlv.len;
> +       } // we reached the padding or HMAC TLVs, or the end of the SRH
> +
> +       if (*pad_off == 0)
> +               *pad_off = cur_off;
> +
> +       if (*tlv_off == -1)
> +               *tlv_off = cur_off;
> +       else if (!offset_valid)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +__attribute__((always_inline))
> +int add_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh, uint32_t tlv_off,
> +           struct sr6_tlv_t *itlv, uint8_t tlv_size)
> +{
> +       uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
> +       uint8_t len_remaining, new_pad;
> +       uint32_t pad_off = 0;
> +       uint32_t pad_size = 0;
> +       uint32_t partial_srh_len;
> +       int err;
> +
> +       if (tlv_off != -1)
> +               tlv_off += srh_off;
> +
> +       if (itlv->type == SR6_TLV_PADDING || itlv->type == SR6_TLV_HMAC)
> +               return -EINVAL;
> +
> +       err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
> +       if (err)
> +               return err;
> +
> +       err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, sizeof(*itlv) + itlv->len);
> +       if (err)
> +               return err;
> +
> +       err = bpf_lwt_seg6_store_bytes(skb, tlv_off, (void *)itlv, tlv_size);
> +       if (err)
> +               return err;
> +
> +       // the following can't be moved inside update_tlv_pad because the
> +       // bpf verifier has some issues with it
> +       pad_off += sizeof(*itlv) + itlv->len;
> +       partial_srh_len = pad_off - srh_off;
> +       len_remaining = partial_srh_len % 8;
> +       new_pad = 8 - len_remaining;
> +
> +       if (new_pad == 1) // cannot pad for 1 byte only
> +               new_pad = 9;
> +       else if (new_pad == 8)
> +               new_pad = 0;
> +
> +       return update_tlv_pad(skb, new_pad, pad_size, pad_off);
> +}
> +
> +__attribute__((always_inline))
> +int delete_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh,
> +              uint32_t tlv_off)
> +{
> +       uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
> +       uint8_t len_remaining, new_pad;
> +       uint32_t partial_srh_len;
> +       uint32_t pad_off = 0;
> +       uint32_t pad_size = 0;
> +       struct sr6_tlv_t tlv;
> +       int err;
> +
> +       tlv_off += srh_off;
> +
> +       err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
> +       if (err)
> +               return err;
> +
> +       err = bpf_skb_load_bytes(skb, tlv_off, &tlv, sizeof(tlv));
> +       if (err)
> +               return err;
> +
> +       err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, -(sizeof(tlv) + tlv.len));
> +       if (err)
> +               return err;
> +
> +       pad_off -= sizeof(tlv) + tlv.len;
> +       partial_srh_len = pad_off - srh_off;
> +       len_remaining = partial_srh_len % 8;
> +       new_pad = 8 - len_remaining;
> +       if (new_pad == 1) // cannot pad for 1 byte only
> +               new_pad = 9;
> +       else if (new_pad == 8)
> +               new_pad = 0;
> +
> +       return update_tlv_pad(skb, new_pad, pad_size, pad_off);
> +}
> +
> +__attribute__((always_inline))
> +int has_egr_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh)
> +{
> +       int tlv_offset = sizeof(struct ip6_t) + sizeof(struct ip6_srh_t) +
> +               ((srh->first_segment + 1) << 4);
> +       struct sr6_tlv_t tlv;
> +
> +       if (bpf_skb_load_bytes(skb, tlv_offset, &tlv, sizeof(struct sr6_tlv_t)))
> +               return 0;
> +
> +       if (tlv.type == SR6_TLV_EGRESS && tlv.len == 18) {
> +               struct ip6_addr_t egr_addr;
> +
> +               if (bpf_skb_load_bytes(skb, tlv_offset + 4, &egr_addr, 16))
> +                       return 0;
> +
> +               // check if egress TLV value is correct
> +               if (ntohll(egr_addr.hi) == 0xfd00000000000000 &&
> +                               ntohll(egr_addr.lo) == 0x4)
> +                       return 1;
> +       }
> +
> +       return 0;
> +}
> +
> +// This function will push a SRH with segments fd00::1, fd00::2, fd00::3,
> +// fd00::4
> +SEC("encap_srh")
> +int __encap_srh(struct __sk_buff *skb)
> +{
> +       unsigned long long hi = 0xfd00000000000000;
> +       struct ip6_addr_t *seg;
> +       struct ip6_srh_t *srh;
> +       char srh_buf[72]; // room for 4 segments
> +       int err;
> +
> +       srh = (struct ip6_srh_t *)srh_buf;
> +       srh->nexthdr = 0;
> +       srh->hdrlen = 8;
> +       srh->type = 4;
> +       srh->segments_left = 3;
> +       srh->first_segment = 3;
> +       srh->flags = 0;
> +       srh->tag = 0;
> +
> +       seg = (struct ip6_addr_t *)((char *)srh + sizeof(*srh));
> +
> +       #pragma clang loop unroll(full)
> +       for (unsigned long long lo = 0; lo < 4; lo++) {
> +               seg->lo = htonll(4 - lo);
> +               seg->hi = htonll(hi);
> +               seg = (struct ip6_addr_t *)((char *)seg + sizeof(*seg));
> +       }
> +
> +       err = bpf_lwt_push_encap(skb, 0, (void *)srh, sizeof(srh_buf));
> +       if (err)
> +               return BPF_DROP;
> +
> +       return BPF_REDIRECT;
> +}
> +
> +// Add an Egress TLV fd00::4, add the flag A,
> +// and apply an End.X action to fc42::1
> +SEC("add_egr_x")
> +int __add_egr_x(struct __sk_buff *skb)
> +{
> +       unsigned long long hi = 0xfc42000000000000;
> +       unsigned long long lo = 0x1;
> +       struct ip6_srh_t *srh = get_srh(skb);
> +       uint8_t new_flags = SR6_FLAG_ALERT;
> +       struct ip6_addr_t addr;
> +       int err, offset;
> +
> +       if (srh == NULL)
> +               return BPF_DROP;
> +
> +       uint8_t tlv[20] = {2, 18, 0, 0, 0xfd, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> +                          0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4};
> +
> +       err = add_tlv(skb, srh, (srh->hdrlen+1) << 3,
> +                     (struct sr6_tlv_t *)&tlv, 20);
> +       if (err)
> +               return BPF_DROP;
> +
> +       offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
> +       err = bpf_lwt_seg6_store_bytes(skb, offset,
> +                                      (void *)&new_flags, sizeof(new_flags));
> +       if (err)
> +               return BPF_DROP;
> +
> +       addr.lo = htonll(lo);
> +       addr.hi = htonll(hi);
> +       err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_X,
> +                                 (void *)&addr, sizeof(addr));
> +       if (err)
> +               return BPF_DROP;
> +       return BPF_REDIRECT;
> +}
> +
> +// Pop the Egress TLV, reset the flags, set the tag to 2442 and finally do a
> +// simple End action
> +SEC("pop_egr")
> +int __pop_egr(struct __sk_buff *skb)
> +{
> +       struct ip6_srh_t *srh = get_srh(skb);
> +       uint16_t new_tag = bpf_htons(2442);
> +       uint8_t new_flags = 0;
> +       int err, offset;
> +
> +       if (srh == NULL)
> +               return BPF_DROP;
> +
> +       if (srh->flags != SR6_FLAG_ALERT)
> +               return BPF_DROP;
> +
> +       if (srh->hdrlen != 11) // 4 segments + Egress TLV + Padding TLV
> +               return BPF_DROP;
> +
> +       if (!has_egr_tlv(skb, srh))
> +               return BPF_DROP;
> +
> +       err = delete_tlv(skb, srh, 8 + (srh->first_segment + 1) * 16);
> +       if (err)
> +               return BPF_DROP;
> +
> +       offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
> +       if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_flags,
> +                                    sizeof(new_flags)))
> +               return BPF_DROP;
> +
> +       offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, tag);
> +       if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_tag,
> +                                    sizeof(new_tag)))
> +               return BPF_DROP;
> +
> +       return BPF_OK;
> +}
> +
> +// Check that the Egress TLV and flag have been removed and that the tag is
> +// correct, then apply an End.T action to reach the last segment
> +SEC("inspect_t")
> +int __inspect_t(struct __sk_buff *skb)
> +{
> +       struct ip6_srh_t *srh = get_srh(skb);
> +       int table = 117;
> +       int err;
> +
> +       if (srh == NULL)
> +               return BPF_DROP;
> +
> +       if (srh->flags != 0)
> +               return BPF_DROP;
> +
> +       if (srh->tag != bpf_htons(2442))
> +               return BPF_DROP;
> +
> +       if (srh->hdrlen != 8) // 4 segments
> +               return BPF_DROP;
> +
> +       err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_T,
> +                                 (void *)&table, sizeof(table));
> +
> +       if (err)
> +               return BPF_DROP;
> +
> +       return BPF_REDIRECT;
> +}
> +
> +char __license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
> new file mode 100755
> index 000000000000..1c77994b5e71
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
> @@ -0,0 +1,140 @@
> +#!/bin/bash
> +# Connects 6 network namespaces through veths.
> +# Each NS may have different IPv6 global scope addresses :
> +#   NS1 ---- NS2 ---- NS3 ---- NS4 ---- NS5 ---- NS6
> +# fb00::1           fd00::1  fd00::2  fd00::3  fb00::6
> +#                   fc42::1           fd00::4
> +#
> +# All IPv6 packets going to fb00::/16 through NS2 will be encapsulated in an
> +# IPv6 header with a Segment Routing Header, with segments :
> +#      fd00::1 -> fd00::2 -> fd00::3 -> fd00::4
> +#
> +# 3 fd00::/16 IPv6 addresses are bound to seg6local End.BPF actions :
> +# - fd00::1 : add a TLV, change the flags and apply an End.X action to fc42::1
> +# - fd00::2 : remove the TLV, change the flags, add a tag
> +# - fd00::3 : apply an End.T action to fd00::4, through routing table 117
> +#
> +# fd00::4 is a simple Segment Routing node decapsulating the inner IPv6 packet.
> +# Each End.BPF action will validate the operations applied on the SRH by the
> +# previous BPF program in the chain, otherwise the packet is dropped.
> +#
> +# A UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
> +# datagram can be read on NS6 when binding to fb00::6.
> +
> +TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
> +
> +cleanup()
> +{
> +       if [ "$?" = "0" ]; then
> +               echo "selftests: test_lwt_seg6local [PASS]";
> +       else
> +               echo "selftests: test_lwt_seg6local [FAILED]";
> +       fi
> +
> +       set +e
> +       ip netns del ns1 2> /dev/null
> +       ip netns del ns2 2> /dev/null
> +       ip netns del ns3 2> /dev/null
> +       ip netns del ns4 2> /dev/null
> +       ip netns del ns5 2> /dev/null
> +       ip netns del ns6 2> /dev/null
> +       rm -f $TMP_FILE
> +}
> +
> +set -e
> +
> +ip netns add ns1
> +ip netns add ns2
> +ip netns add ns3
> +ip netns add ns4
> +ip netns add ns5
> +ip netns add ns6
> +
> +trap cleanup 0 2 3 6 9
> +
> +ip link add veth1 type veth peer name veth2
> +ip link add veth3 type veth peer name veth4
> +ip link add veth5 type veth peer name veth6
> +ip link add veth7 type veth peer name veth8
> +ip link add veth9 type veth peer name veth10
> +
> +ip link set veth1 netns ns1
> +ip link set veth2 netns ns2
> +ip link set veth3 netns ns2
> +ip link set veth4 netns ns3
> +ip link set veth5 netns ns3
> +ip link set veth6 netns ns4
> +ip link set veth7 netns ns4
> +ip link set veth8 netns ns5
> +ip link set veth9 netns ns5
> +ip link set veth10 netns ns6
> +
> +ip netns exec ns1 ip link set dev veth1 up
> +ip netns exec ns2 ip link set dev veth2 up
> +ip netns exec ns2 ip link set dev veth3 up
> +ip netns exec ns3 ip link set dev veth4 up
> +ip netns exec ns3 ip link set dev veth5 up
> +ip netns exec ns4 ip link set dev veth6 up
> +ip netns exec ns4 ip link set dev veth7 up
> +ip netns exec ns5 ip link set dev veth8 up
> +ip netns exec ns5 ip link set dev veth9 up
> +ip netns exec ns6 ip link set dev veth10 up
> +ip netns exec ns6 ip link set dev lo up
> +
> +# All link scope addresses and routes required between veths
> +ip netns exec ns1 ip -6 addr add fb00::12/16 dev veth1 scope link
> +ip netns exec ns1 ip -6 route add fb00::21 dev veth1 scope link
> +ip netns exec ns2 ip -6 addr add fb00::21/16 dev veth2 scope link
> +ip netns exec ns2 ip -6 addr add fb00::34/16 dev veth3 scope link
> +ip netns exec ns2 ip -6 route add fb00::43 dev veth3 scope link
> +ip netns exec ns3 ip -6 route add fb00::65 dev veth5 scope link
> +ip netns exec ns3 ip -6 addr add fb00::43/16 dev veth4 scope link
> +ip netns exec ns3 ip -6 addr add fb00::56/16 dev veth5 scope link
> +ip netns exec ns4 ip -6 addr add fb00::65/16 dev veth6 scope link
> +ip netns exec ns4 ip -6 addr add fb00::78/16 dev veth7 scope link
> +ip netns exec ns4 ip -6 route add fb00::87 dev veth7 scope link
> +ip netns exec ns5 ip -6 addr add fb00::87/16 dev veth8 scope link
> +ip netns exec ns5 ip -6 addr add fb00::910/16 dev veth9 scope link
> +ip netns exec ns5 ip -6 route add fb00::109 dev veth9 scope link
> +ip netns exec ns5 ip -6 route add fb00::109 table 117 dev veth9 scope link
> +ip netns exec ns6 ip -6 addr add fb00::109/16 dev veth10 scope link
> +
> +ip netns exec ns1 ip -6 addr add fb00::1/16 dev lo
> +ip netns exec ns1 ip -6 route add fb00::6 dev veth1 via fb00::21
> +
> +ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
> +ip netns exec ns2 ip -6 route add fd00::1 dev veth3 via fb00::43 scope link
> +
> +ip netns exec ns3 ip -6 route add fc42::1 dev veth5 via fb00::65
> +ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF obj test_lwt_seg6local.o sec add_egr_x dev veth4
> +
> +ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF obj test_lwt_seg6local.o sec pop_egr dev veth6
> +ip netns exec ns4 ip -6 addr add fc42::1 dev lo
> +ip netns exec ns4 ip -6 route add fd00::3 dev veth7 via fb00::87
> +
> +ip netns exec ns5 ip -6 route add fd00::4 table 117 dev veth9 via fb00::109
> +ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF obj test_lwt_seg6local.o sec inspect_t dev veth8
> +
> +ip netns exec ns6 ip -6 addr add fb00::6/16 dev lo
> +ip netns exec ns6 ip -6 addr add fd00::4/16 dev lo
> +
> +ip netns exec ns1 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
> +ip netns exec ns2 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
> +ip netns exec ns3 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
> +ip netns exec ns4 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
> +ip netns exec ns5 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
> +
> +ip netns exec ns6 sysctl net.ipv6.conf.all.seg6_enabled=1 > /dev/null
> +ip netns exec ns6 sysctl net.ipv6.conf.lo.seg6_enabled=1 > /dev/null
> +ip netns exec ns6 sysctl net.ipv6.conf.veth10.seg6_enabled=1 > /dev/null
> +
> +ip netns exec ns6 nc -l -6 -u -d 7330 > $TMP_FILE &
> +ip netns exec ns1 bash -c "echo 'foobar' | nc -w0 -6 -u -p 2121 -s fb00::1 fb00::6 7330"
> +sleep 5 # wait enough time to ensure the UDP datagram arrived to the last segment
> +kill -INT $!
> +
> +if [[ $(< $TMP_FILE) != "foobar" ]]; then
> +       exit 1
> +fi
> +
> +exit 0
> --
> 2.16.1
>
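The length checks and the TLV offset in the BPF programs above follow from the SRH layout: Hdr Ext Len counts 8-octet units beyond the first 8 octets, and each segment is a 16-byte IPv6 address. The sketch below reproduces that arithmetic for the 4-segment path used by the test. The helper names are hypothetical, and the 24 bytes of TLVs is only what `hdrlen == 11` implies (11 * 8 - 4 * 16); it makes no claim about how those bytes are split between the Egress and Padding TLVs.

```c
#include <assert.h>

/* Size of one SRH segment: a full IPv6 address. */
#define SRH_SEGMENT_BYTES 16u
/* Fixed part of the SRH that Hdr Ext Len does not count. */
#define SRH_FIXED_BYTES 8u

/* Hypothetical helper: Hdr Ext Len (8-octet units, excluding the first
 * 8 octets) for nsegs segments followed by tlv_bytes of TLVs. */
static unsigned int srh_hdrlen(unsigned int nsegs, unsigned int tlv_bytes)
{
	unsigned int bytes = nsegs * SRH_SEGMENT_BYTES + tlv_bytes;

	assert(bytes % 8 == 0); /* the SRH must be padded to 8 octets */
	return bytes / 8;
}

/* Hypothetical helper: offset of the first TLV from the start of the SRH.
 * first_segment is the index of the last entry in the segment list, so
 * the list holds first_segment + 1 entries. */
static unsigned int srh_first_tlv_offset(unsigned int first_segment)
{
	return SRH_FIXED_BYTES + (first_segment + 1) * SRH_SEGMENT_BYTES;
}
```

With 4 segments, `srh_hdrlen(4, 0)` is 8 (the value `inspect_t` expects), `srh_hdrlen(4, 24)` is 11 (the value `pop_egr` expects), and `srh_first_tlv_offset(3)` is 72, matching the `8 + (srh->first_segment + 1) * 16` passed to `delete_tlv()`.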


* Re: [RFC V4 PATCH 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-05-25  2:28 UTC
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, wexu,
	tiwei.bie
In-Reply-To: <1526473941-16199-9-git-send-email-jasowang@redhat.com>



On 2018-05-16 20:32, Jason Wang wrote:
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (event_flags == cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE))
> +		return false;
> +	else if (event_flags == cpu_to_vhost16(vq, RING_EVENT_FLAGS_ENABLE))
> +		return true;
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_warp) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;

We should move this index tracking before the event_flags check.
Otherwise we may lose interrupts because signalled_used is left with a
stale value.

Thanks

> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, off_wrap, new, old);
> +}
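A user-space model of the reordering suggested above: record signalled_used before any early return on the driver's event flags, so that every call advances it. Everything here is hypothetical (the `model_*` names, and a three-valued mode in place of the real RING_EVENT_FLAGS_* values). `model_need_event()` uses the split-ring `vring_need_event()` comparison and ignores the wrap counter carried in off_wrap, which the packed-ring helper also has to account for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's event-suppression modes. */
enum model_event_mode { MODEL_EVENT_ENABLE, MODEL_EVENT_DISABLE, MODEL_EVENT_DESC };

/* Hypothetical subset of struct vhost_virtqueue. */
struct model_vq {
	uint16_t last_used_idx;
	uint16_t signalled_used;
	bool signalled_used_valid;
};

/* Split-ring vring_need_event() test: is event_idx in (old, new_idx]? */
static bool model_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

static bool model_notify(struct model_vq *vq, enum model_event_mode mode,
			 uint16_t event_idx)
{
	/* Update the tracking state first, so an early return below can no
	 * longer leave signalled_used stale. */
	uint16_t old = vq->signalled_used;
	bool valid = vq->signalled_used_valid;
	uint16_t new_idx = vq->last_used_idx;

	vq->signalled_used = new_idx;
	vq->signalled_used_valid = true;

	if (mode == MODEL_EVENT_DISABLE)
		return false;
	if (mode == MODEL_EVENT_ENABLE)
		return true;
	if (!valid)
		return true;
	return model_need_event(event_idx, new_idx, old);
}
```

Even when the driver has disabled or unconditionally enabled events, signalled_used still tracks last_used_idx, so a later switch to descriptor-based suppression compares against a current baseline.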


* Re: [PATCH net-next 00/10] Mirroring tests involving VLAN
From: David Miller @ 2018-05-25  2:26 UTC
  To: petrm; +Cc: netdev, linux-kselftest, shuah, idosch, jiri
In-Reply-To: <cover.1527171860.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Thu, 24 May 2018 16:27:04 +0200

> This patchset tests mirror-to-gretap with various underlay
> configurations involving VLAN netdevice in particular. Some of the tests
> involve bridges as well, but tests aimed specifically at testing bridges
> (i.e. FDB, STP) are not part of this patchset.
> 
> In patches #1-#6, the codebase is adapted to support the new tests.
> 
> In patch #7, a test for mirroring to VLAN is introduced.
> 
> Patches #8-#10 add three tests where VLAN is part of underlay path after
> gretap encapsulation.

Series applied, thank you.


* [RFC net-next 4/4] net: sched: cls_matchall: implement offload tcf_proto_op
From: Jakub Kicinski @ 2018-05-25  2:25 UTC
  To: netdev; +Cc: jiri, gerlitz.or, sridhar.samudrala, oss-drivers, john.hurley
In-Reply-To: <20180525022539.6799-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Add the offload tcf_proto_op in matchall to generate an offload message
for each filter in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' rule.

Ensure matchall flags correctly report if the rule is in hw by keeping a
reference counter for the number of instances of the rule offloaded. Only
update the flag when this counter changes from or to 0.
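The counting rule described above can be sketched in isolation. This is a simplified model, not the kernel code: `struct model_rule` and `MODEL_FLAGS_IN_HW` are hypothetical stand-ins for the classifier head and TCA_CLS_FLAGS_IN_HW, and the block-wide offload count that tcf_block_offload_inc() and tcf_block_offload_dec() also update is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for TCA_CLS_FLAGS_IN_HW. */
#define MODEL_FLAGS_IN_HW 0x1u

struct model_rule {
	unsigned int flags;
	unsigned int in_hw_count; /* callbacks currently holding the rule */
};

/* Record one callback accepting (add) or dropping (!add) the rule. The
 * IN_HW flag only flips on the 0 -> 1 and 1 -> 0 transitions. */
static void model_account_offload(struct model_rule *r, bool add)
{
	if (add) {
		if (!r->in_hw_count)
			r->flags |= MODEL_FLAGS_IN_HW;
		r->in_hw_count++;
	} else {
		assert(r->in_hw_count > 0); /* no underflow */
		r->in_hw_count--;
		if (!r->in_hw_count)
			r->flags &= ~MODEL_FLAGS_IN_HW;
	}
}
```

With a shared block bound to two devices, removing the rule from one device's callback leaves the flag set until the second callback drops it as well.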

Signed-off-by: John Hurley <john.hurley@netronome.com>
---
 net/sched/cls_matchall.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 2ba721a590a7..a3497bc4ee24 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -21,6 +21,7 @@ struct cls_mall_head {
 	struct tcf_result res;
 	u32 handle;
 	u32 flags;
+	u32 in_hw_count;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
@@ -106,6 +107,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 		mall_destroy_hw_filter(tp, head, cookie, NULL);
 		return err;
 	} else if (err > 0) {
+		head->in_hw_count = err;
 		tcf_block_offload_inc(block, &head->flags);
 	}
 
@@ -246,6 +248,43 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	arg->count++;
 }
 
+static int mall_offload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+			void *cb_priv, struct netlink_ext_ack *extack)
+{
+	struct cls_mall_head *head = rtnl_dereference(tp->root);
+	struct tc_cls_matchall_offload cls_mall = {};
+	struct tcf_block *block = tp->chain->block;
+	int err;
+
+	if (tc_skip_hw(head->flags))
+		return 0;
+
+	tc_cls_common_offload_init(&cls_mall.common, tp, head->flags, extack);
+	cls_mall.command = add ?
+		TC_CLSMATCHALL_REPLACE : TC_CLSMATCHALL_DESTROY;
+	cls_mall.exts = &head->exts;
+	cls_mall.cookie = (unsigned long)head;
+
+	err = cb(TC_SETUP_CLSMATCHALL, &cls_mall, cb_priv);
+	if (err) {
+		if (add && tc_skip_sw(head->flags))
+			return err;
+		return 0;
+	}
+
+	if (add) {
+		if (!head->in_hw_count)
+			tcf_block_offload_inc(block, &head->flags);
+		head->in_hw_count++;
+	} else {
+		head->in_hw_count--;
+		if (!head->in_hw_count)
+			tcf_block_offload_dec(block, &head->flags);
+	}
+
+	return 0;
+}
+
 static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh,
 		     struct sk_buff *skb, struct tcmsg *t)
 {
@@ -300,6 +339,7 @@ static struct tcf_proto_ops cls_mall_ops __read_mostly = {
 	.change		= mall_change,
 	.delete		= mall_delete,
 	.walk		= mall_walk,
+	.offload	= mall_offload,
 	.dump		= mall_dump,
 	.bind_class	= mall_bind_class,
 	.owner		= THIS_MODULE,
-- 
2.17.0


* [RFC net-next 3/4] net: sched: cls_flower: implement offload tcf_proto_op
From: Jakub Kicinski @ 2018-05-25  2:25 UTC
  To: netdev; +Cc: jiri, gerlitz.or, sridhar.samudrala, oss-drivers, john.hurley
In-Reply-To: <20180525022539.6799-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Add the offload tcf_proto_op in flower to generate an offload message for
each filter in the given tcf_proto. Call the specified callback with this
new offload message. The function only returns an error if the callback
rejects adding a 'hardware only' rule.
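The error policy above reduces to a small decision per filter. A minimal sketch, assuming a hypothetical `MODEL_SKIP_SW` bit in place of TCA_CLS_FLAGS_SKIP_SW and a hypothetical helper name; in the patch the same logic is inlined after each `cb()` call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for TCA_CLS_FLAGS_SKIP_SW. */
#define MODEL_SKIP_SW 0x2u

/* What a per-filter offload replay reports when the driver callback
 * returned cb_err. */
static int model_offload_result(int cb_err, bool add, unsigned int flags)
{
	if (!cb_err)
		return 0;
	/* A skip_sw filter exists only in hardware, so failing to add it
	 * must fail the bind; other failures fall back to software. */
	if (add && (flags & MODEL_SKIP_SW))
		return cb_err;
	return 0;
}
```

Errors on removal are never propagated, since there is nothing useful the caller could do with them during unbind.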

A filter contains a flag to indicate if it is in hardware or not. To
ensure the offload function properly maintains this flag, keep a reference
counter for the number of instances of the filter that are in hardware.
Only update the flag when this counter changes from or to 0.

Signed-off-by: John Hurley <john.hurley@netronome.com>
---
 net/sched/cls_flower.c | 51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index eacaaf803914..c23d338df6a1 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -90,6 +90,7 @@ struct cls_fl_filter {
 	struct list_head list;
 	u32 handle;
 	u32 flags;
+	u32 in_hw_count;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
@@ -289,6 +290,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		fl_hw_destroy_filter(tp, f, NULL);
 		return err;
 	} else if (err > 0) {
+		f->in_hw_count = err;
 		tcf_block_offload_inc(block, &f->flags);
 	}
 
@@ -1092,6 +1094,54 @@ static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	}
 }
 
+static int fl_offload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+		      void *cb_priv, struct netlink_ext_ack *extack)
+{
+	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct tc_cls_flower_offload cls_flower = {};
+	struct tcf_block *block = tp->chain->block;
+	struct fl_flow_mask *mask;
+	struct cls_fl_filter *f;
+	int err;
+
+	list_for_each_entry(mask, &head->masks, list) {
+		list_for_each_entry(f, &mask->filters, list) {
+			if (tc_skip_hw(f->flags))
+				continue;
+
+			tc_cls_common_offload_init(&cls_flower.common, tp,
+						   f->flags, extack);
+			cls_flower.command = add ?
+				TC_CLSFLOWER_REPLACE : TC_CLSFLOWER_DESTROY;
+			cls_flower.cookie = (unsigned long)f;
+			cls_flower.dissector = &mask->dissector;
+			cls_flower.mask = &f->mkey;
+			cls_flower.key = &f->key;
+			cls_flower.exts = &f->exts;
+			cls_flower.classid = f->res.classid;
+
+			err = cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+			if (err) {
+				if (add && tc_skip_sw(f->flags))
+					return err;
+				continue;
+			}
+
+			if (add) {
+				if (!f->in_hw_count)
+					tcf_block_offload_inc(block, &f->flags);
+				f->in_hw_count++;
+			} else {
+				f->in_hw_count--;
+				if (!f->in_hw_count)
+					tcf_block_offload_dec(block, &f->flags);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int fl_dump_key_val(struct sk_buff *skb,
 			   void *val, int val_type,
 			   void *mask, int mask_type, int len)
@@ -1443,6 +1493,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.change		= fl_change,
 	.delete		= fl_delete,
 	.walk		= fl_walk,
+	.offload	= fl_offload,
 	.dump		= fl_dump,
 	.bind_class	= fl_bind_class,
 	.owner		= THIS_MODULE,
-- 
2.17.0


* [RFC net-next 2/4] net: sched: add tcf_proto_op to offload a rule
From: Jakub Kicinski @ 2018-05-25  2:25 UTC
  To: netdev; +Cc: jiri, gerlitz.or, sridhar.samudrala, oss-drivers, john.hurley
In-Reply-To: <20180525022539.6799-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Create a new tcf_proto_op called offload that generates an offload
message for each node in a tcf_proto. It takes a pointer to the
tcf_proto, a flag indicating whether the node is to be added or deleted,
the callback to which each offload message is sent, and optional priv
data to pass to that callback.

The function is called to offload all tcf_proto nodes in all chains of a
block when a callback tries to register to a port that already has
offloaded rules. If the existing rules cannot all be offloaded, the
registration is rejected. This replaces the previous policy of rejecting
such callback registrations outright.
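The all-or-nothing replay described above can be modelled with a toy rule table. This is a hypothetical sketch (`struct model_hw`, `model_cb()` and `model_playback_add()` are invented names), and it rolls back differently from the patch: tcf_block_playback_offloads() replays a remove over the whole block and ignores the results, whereas this sketch undoes only the rules it actually added.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HW_SLOTS 16

/* Hypothetical "hardware" rule table that may refuse one rule id. */
struct model_hw {
	int programmed[MODEL_HW_SLOTS];
	size_t count;
	int reject_id; /* -1: accept every rule */
};

/* Hypothetical driver callback: program (add) or remove one rule. */
static int model_cb(struct model_hw *hw, int rule_id, bool add)
{
	size_t i;

	if (add) {
		if (rule_id == hw->reject_id || hw->count == MODEL_HW_SLOTS)
			return -EOPNOTSUPP;
		hw->programmed[hw->count++] = rule_id;
		return 0;
	}
	for (i = 0; i < hw->count; i++) {
		if (hw->programmed[i] == rule_id) {
			/* Move the last entry into the freed slot. */
			hw->count--;
			hw->programmed[i] = hw->programmed[hw->count];
			return 0;
		}
	}
	return -ENOENT;
}

/* Replay every existing rule to a newly registered callback. If one add
 * fails, remove the rules replayed so far (newest first) and return the
 * error, so registration is all-or-nothing. */
static int model_playback_add(struct model_hw *hw, const int *rules, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = model_cb(hw, rules[i], true);
		if (err) {
			while (i--)
				model_cb(hw, rules[i], false);
			return err;
		}
	}
	return 0;
}
```

If the third of three rules is refused, the first two are removed again and the error is returned, so the callback never ends up holding a partial copy of the block.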

On unregistration of a callback, the rules are flushed for that callback.
The implementation of block sharing in the NFP driver, for example,
duplicates shared rules to all devs bound to a block. Without this flush,
rules could still exist in hw even after a device is unbound from a block
(assuming the block still remains active).

Signed-off-by: John Hurley <john.hurley@netronome.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  4 +-
 include/net/act_api.h                         |  3 --
 include/net/pkt_cls.h                         |  6 ++-
 include/net/sch_generic.h                     |  6 +++
 net/sched/cls_api.c                           | 50 ++++++++++++++++---
 5 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 6887c7faaaba..67ff1934fc38 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1542,7 +1542,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 
 err_block_bind:
 	if (!tcf_block_cb_decref(block_cb)) {
-		__tcf_block_cb_unregister(block_cb);
+		__tcf_block_cb_unregister(block, block_cb);
 err_cb_register:
 		mlxsw_sp_acl_block_destroy(acl_block);
 	}
@@ -1572,7 +1572,7 @@ mlxsw_sp_setup_tc_block_flower_unbind(struct mlxsw_sp_port *mlxsw_sp_port,
 	err = mlxsw_sp_acl_block_unbind(mlxsw_sp, acl_block,
 					mlxsw_sp_port, ingress);
 	if (!err && !tcf_block_cb_decref(block_cb)) {
-		__tcf_block_cb_unregister(block_cb);
+		__tcf_block_cb_unregister(block, block_cb);
 		mlxsw_sp_acl_block_destroy(acl_block);
 	}
 }
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 9e59ebfded62..5ff11adbe2a6 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -190,9 +190,6 @@ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
 #endif
 }
 
-typedef int tc_setup_cb_t(enum tc_setup_type type,
-			  void *type_data, void *cb_priv);
-
 #ifdef CONFIG_NET_CLS_ACT
 int tc_setup_cb_egdev_register(const struct net_device *dev,
 			       tc_setup_cb_t *cb, void *cb_priv);
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index b2b5cbe13086..74dd9784bf88 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -78,7 +78,8 @@ struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
 int tcf_block_cb_register(struct tcf_block *block,
 			  tc_setup_cb_t *cb, void *cb_ident,
 			  void *cb_priv, struct netlink_ext_ack *extack);
-void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb);
+void __tcf_block_cb_unregister(struct tcf_block *block,
+			       struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 			     tc_setup_cb_t *cb, void *cb_ident);
 
@@ -176,7 +177,8 @@ int tcf_block_cb_register(struct tcf_block *block,
 }
 
 static inline
-void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb)
+void __tcf_block_cb_unregister(struct tcf_block *block,
+			       struct tcf_block_cb *block_cb)
 {
 }
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 98c10a28cd01..7b0d62870e82 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -20,6 +20,9 @@ struct qdisc_walker;
 struct tcf_walker;
 struct module;
 
+typedef int tc_setup_cb_t(enum tc_setup_type type,
+			  void *type_data, void *cb_priv);
+
 struct qdisc_rate_table {
 	struct tc_ratespec rate;
 	u32		data[256];
@@ -256,6 +259,9 @@ struct tcf_proto_ops {
 					  bool *last,
 					  struct netlink_ext_ack *);
 	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
+	int			(*offload)(struct tcf_proto *, bool,
+					   tc_setup_cb_t *, void *,
+					   struct netlink_ext_ack *);
 	void			(*bind_class)(void *, u32, unsigned long);
 
 	/* rtnetlink specific */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 95a83aa798d1..0aba7d7db099 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -677,19 +677,50 @@ unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb)
 }
 EXPORT_SYMBOL(tcf_block_cb_decref);
 
+static int
+tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
+			    void *cb_priv, bool add,
+			    struct netlink_ext_ack *extack)
+{
+	struct tcf_chain *chain;
+	struct tcf_proto *tp;
+	int err;
+
+	list_for_each_entry(chain, &block->chain_list, list) {
+		for (tp = rtnl_dereference(chain->filter_chain); tp;
+		     tp = rtnl_dereference(tp->next)) {
+			if (tp->ops->offload) {
+				err = tp->ops->offload(tp, add, cb, cb_priv,
+						       extack);
+				if (err && add)
+					goto err_playback_remove;
+			} else if (add) {
+				err = -EOPNOTSUPP;
+				NL_SET_ERR_MSG(extack, "Block rule replay failed - no tp offload op");
+				goto err_playback_remove;
+			}
+		}
+	}
+
+	return 0;
+
+err_playback_remove:
+	tcf_block_playback_offloads(block, cb, cb_priv, false, extack);
+	return err;
+}
+
 struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
 					     tc_setup_cb_t *cb, void *cb_ident,
 					     void *cb_priv,
 					     struct netlink_ext_ack *extack)
 {
 	struct tcf_block_cb *block_cb;
+	int err;
 
-	/* At this point, playback of previous block cb calls is not supported,
-	 * so forbid to register to block which already has some offloaded
-	 * filters present.
-	 */
-	if (tcf_block_offload_in_use(block))
-		return ERR_PTR(-EOPNOTSUPP);
+	/* Replay any already present rules */
+	err = tcf_block_playback_offloads(block, cb, cb_priv, true, extack);
+	if (err)
+		return ERR_PTR(err);
 
 	block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL);
 	if (!block_cb)
@@ -714,8 +745,11 @@ int tcf_block_cb_register(struct tcf_block *block,
 }
 EXPORT_SYMBOL(tcf_block_cb_register);
 
-void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb)
+void __tcf_block_cb_unregister(struct tcf_block *block,
+			       struct tcf_block_cb *block_cb)
 {
+	tcf_block_playback_offloads(block, block_cb->cb, block_cb->cb_priv,
+				    false, NULL);
 	list_del(&block_cb->list);
 	kfree(block_cb);
 }
@@ -729,7 +763,7 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 	block_cb = tcf_block_cb_lookup(block, cb, cb_ident);
 	if (!block_cb)
 		return;
-	__tcf_block_cb_unregister(block_cb);
+	__tcf_block_cb_unregister(block, block_cb);
 }
 EXPORT_SYMBOL(tcf_block_cb_unregister);
 
-- 
2.17.0


* [RFC net-next 1/4] net: sched: pass extack pointer to block binds and cb registration
From: Jakub Kicinski @ 2018-05-25  2:25 UTC
  To: netdev; +Cc: jiri, gerlitz.or, sridhar.samudrala, oss-drivers, john.hurley
In-Reply-To: <20180525022539.6799-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Pass the extack struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of the binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of failures in the bind process.
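A failure deep inside callback registration can only reach user space if every layer forwards the extack pointer. A hypothetical sketch with simplified types: `struct model_ext_ack` and `MODEL_SET_ERR_MSG` stand in for struct netlink_ext_ack and NL_SET_ERR_MSG(), and the function names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack: only the message. */
struct model_ext_ack {
	const char *msg;
};

/* Like NL_SET_ERR_MSG(), do nothing when no extack was supplied, as on
 * teardown paths that pass NULL. */
#define MODEL_SET_ERR_MSG(ack, text)				\
	do {							\
		struct model_ext_ack *ack_ = (ack);		\
		if (ack_)					\
			ack_->msg = (text);			\
	} while (0)

/* Hypothetical core registration: explains its failure via extack. */
static int model_cb_register(bool can_offload, struct model_ext_ack *extack)
{
	if (!can_offload) {
		MODEL_SET_ERR_MSG(extack, "rule replay to new callback failed");
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Hypothetical driver TC_BLOCK_BIND handler: forwards the caller's
 * extack instead of dropping it. */
static int model_driver_bind(bool can_offload, struct model_ext_ack *extack)
{
	return model_cb_register(can_offload, extack);
}
```

If the driver passed NULL instead of its extack, the same -EOPNOTSUPP would still be returned, but the explanatory message would be lost.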

Signed-off-by: John Hurley <john.hurley@netronome.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  2 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 +-
 .../net/ethernet/intel/i40evf/i40evf_main.c   |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 10 ++++----
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  2 +-
 .../ethernet/netronome/nfp/flower/offload.c   |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |  2 +-
 drivers/net/netdevsim/netdev.c                |  2 +-
 include/net/pkt_cls.h                         |  6 +++--
 net/dsa/slave.c                               |  2 +-
 net/sched/cls_api.c                           | 24 ++++++++++++-------
 17 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dfa0839f6656..a9187b0d97e3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7984,7 +7984,7 @@ static int bnxt_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, bnxt_setup_tc_block_cb,
-					     bp, bp);
+					     bp, bp, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, bnxt_setup_tc_block_cb, bp);
 		return 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
index 38f635cf8408..6cd3976f9920 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
@@ -173,7 +173,7 @@ static int bnxt_vf_rep_setup_tc_block(struct net_device *dev,
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
 					     bnxt_vf_rep_setup_tc_block_cb,
-					     vf_rep, vf_rep);
+					     vf_rep, vf_rep, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block,
 					bnxt_vf_rep_setup_tc_block_cb, vf_rep);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 513e1d356384..191ac686c5fb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3015,7 +3015,7 @@ static int cxgb_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, cxgb_setup_tc_block_cb,
-					     pi, dev);
+					     pi, dev, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, cxgb_setup_tc_block_cb, pi);
 		return 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5daa5c9c7de..07dc46bbe508 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7554,7 +7554,7 @@ static int i40e_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, i40e_setup_tc_block_cb,
-					     np, np);
+					     np, np, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, i40e_setup_tc_block_cb, np);
 		return 0;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index a7b87f935411..3f8bb0d61f63 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2926,7 +2926,7 @@ static int i40evf_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, i40evf_setup_tc_block_cb,
-					     adapter, adapter);
+					     adapter, adapter, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, i40evf_setup_tc_block_cb,
 					adapter);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 78574c06635b..b1ba426d2b40 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2727,7 +2727,7 @@ static int igb_setup_tc_block(struct igb_adapter *adapter,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
-					     adapter, adapter);
+					     adapter, adapter, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
 					adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a52d92e182ee..81553a602f0f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9376,7 +9376,7 @@ static int ixgbe_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, ixgbe_setup_tc_block_cb,
-					     adapter, adapter);
+					     adapter, adapter, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, ixgbe_setup_tc_block_cb,
 					adapter);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b5a7580b12fe..2e0029173a64 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3178,7 +3178,7 @@ static int mlx5e_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, mlx5e_setup_tc_block_cb,
-					     priv, priv);
+					     priv, priv, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, mlx5e_setup_tc_block_cb,
 					priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index c3034f58aa33..d9fe432f18f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -803,7 +803,7 @@ static int mlx5e_rep_setup_tc_block(struct net_device *dev,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, mlx5e_rep_setup_tc_cb,
-					     priv, priv);
+					     priv, priv, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, mlx5e_rep_setup_tc_cb, priv);
 		return 0;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index bb252b36994d..6887c7faaaba 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1503,7 +1503,8 @@ static int mlxsw_sp_setup_tc_block_cb_flower(enum tc_setup_type type,
 
 static int
 mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
-				    struct tcf_block *block, bool ingress)
+				    struct tcf_block *block, bool ingress,
+				    struct netlink_ext_ack *extack)
 {
 	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_acl_block *acl_block;
@@ -1518,7 +1519,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 			return -ENOMEM;
 		block_cb = __tcf_block_cb_register(block,
 						   mlxsw_sp_setup_tc_block_cb_flower,
-						   mlxsw_sp, acl_block);
+						   mlxsw_sp, acl_block, extack);
 		if (IS_ERR(block_cb)) {
 			err = PTR_ERR(block_cb);
 			goto err_cb_register;
@@ -1596,11 +1597,12 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		err = tcf_block_cb_register(f->block, cb, mlxsw_sp_port,
-					    mlxsw_sp_port);
+					    mlxsw_sp_port, f->extack);
 		if (err)
 			return err;
 		err = mlxsw_sp_setup_tc_block_flower_bind(mlxsw_sp_port,
-							  f->block, ingress);
+							  f->block, ingress,
+							  f->extack);
 		if (err) {
 			tcf_block_cb_unregister(f->block, cb, mlxsw_sp_port);
 			return err;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index fcdfb8e7fdea..bf46f7bff912 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -206,7 +206,7 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
 					     nfp_bpf_setup_tc_block_cb,
-					     nn, nn);
+					     nn, nn, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block,
 					nfp_bpf_setup_tc_block_cb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index c42e64f32333..7abefed1efe9 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -627,7 +627,7 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
 					     nfp_flower_setup_tc_block_cb,
-					     repr, repr);
+					     repr, repr, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block,
 					nfp_flower_setup_tc_block_cb,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c32de53a00d3..f5bee8da084e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3759,7 +3759,7 @@ static int stmmac_setup_tc_block(struct stmmac_priv *priv,
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, stmmac_setup_tc_block_cb,
-				priv, priv);
+				priv, priv, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, stmmac_setup_tc_block_cb, priv);
 		return 0;
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index ec68f38213d9..c9dacc6fcd59 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -260,7 +260,7 @@ nsim_setup_tc_block(struct net_device *dev, struct tc_block_offload *f)
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, nsim_setup_tc_block_cb,
-					     ns, ns);
+					     ns, ns, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, nsim_setup_tc_block_cb, ns);
 		return 0;
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0005f0b40fe9..b2b5cbe13086 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -73,10 +73,11 @@ void tcf_block_cb_incref(struct tcf_block_cb *block_cb);
 unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb);
 struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
 					     tc_setup_cb_t *cb, void *cb_ident,
-					     void *cb_priv);
+					     void *cb_priv,
+					     struct netlink_ext_ack *extack);
 int tcf_block_cb_register(struct tcf_block *block,
 			  tc_setup_cb_t *cb, void *cb_ident,
-			  void *cb_priv);
+			  void *cb_priv, struct netlink_ext_ack *extack);
 void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 			     tc_setup_cb_t *cb, void *cb_ident);
@@ -596,6 +597,7 @@ struct tc_block_offload {
 	enum tc_block_command command;
 	enum tcf_block_binder_type binder_type;
 	struct tcf_block *block;
+	struct netlink_ext_ack *extack;
 };
 
 struct tc_cls_common_offload {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1e3b6a6d8a40..71536c435132 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -900,7 +900,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 
 	switch (f->command) {
 	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, cb, dev, dev);
+		return tcf_block_cb_register(f->block, cb, dev, dev, f->extack);
 	case TC_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, cb, dev);
 		return 0;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 963e4bf0aab8..95a83aa798d1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -276,7 +276,8 @@ static bool tcf_block_offload_in_use(struct tcf_block *block)
 static int tcf_block_offload_cmd(struct tcf_block *block,
 				 struct net_device *dev,
 				 struct tcf_block_ext_info *ei,
-				 enum tc_block_command command)
+				 enum tc_block_command command,
+				 struct netlink_ext_ack *extack)
 {
 	struct tc_block_offload bo = {};
 
@@ -287,7 +288,8 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
 }
 
 static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
-				  struct tcf_block_ext_info *ei)
+				  struct tcf_block_ext_info *ei,
+				  struct netlink_ext_ack *extack)
 {
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
@@ -298,10 +300,12 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	/* If tc offload feature is disabled and the block we try to bind
 	 * to already has some offloaded filters, forbid to bind.
 	 */
-	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block))
+	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block)) {
+		NL_SET_ERR_MSG(extack, "Bind to offloaded block failed as dev has offload disabled");
 		return -EOPNOTSUPP;
+	}
 
-	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
+	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND, extack);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_inc;
 	return err;
@@ -321,7 +325,7 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 
 	if (!dev->netdev_ops->ndo_setup_tc)
 		goto no_offload_dev_dec;
-	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND);
+	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND, NULL);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_dec;
 	return;
@@ -539,7 +543,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 	if (err)
 		goto err_chain_head_change_cb_add;
 
-	err = tcf_block_offload_bind(block, q, ei);
+	err = tcf_block_offload_bind(block, q, ei, extack);
 	if (err)
 		goto err_block_offload_bind;
 
@@ -675,7 +679,8 @@ EXPORT_SYMBOL(tcf_block_cb_decref);
 
 struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
 					     tc_setup_cb_t *cb, void *cb_ident,
-					     void *cb_priv)
+					     void *cb_priv,
+					     struct netlink_ext_ack *extack)
 {
 	struct tcf_block_cb *block_cb;
 
@@ -699,11 +704,12 @@ EXPORT_SYMBOL(__tcf_block_cb_register);
 
 int tcf_block_cb_register(struct tcf_block *block,
 			  tc_setup_cb_t *cb, void *cb_ident,
-			  void *cb_priv)
+			  void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tcf_block_cb *block_cb;
 
-	block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv);
+	block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv,
+					   extack);
 	return IS_ERR(block_cb) ? PTR_ERR(block_cb) : 0;
 }
 EXPORT_SYMBOL(tcf_block_cb_register);
-- 
2.17.0

^ permalink raw reply related

* [RFC net-next 0/4] net: sched: support replay of filter offload when binding to block
From: Jakub Kicinski @ 2018-05-25  2:25 UTC (permalink / raw)
  To: netdev
  Cc: jiri, gerlitz.or, sridhar.samudrala, oss-drivers, john.hurley,
	Jakub Kicinski

Hi!

This series from John adds the ability to replay filter offload requests
when a new offload callback is registered on a TC block.  Today this is most
likely to happen for shared blocks, when a block which already has rules
is bound to another interface.  Prior to this patch set, the block bind
would fail if any of the rules were offloaded.

A new tcf_proto_op is added to generate a filter-specific offload request.
The new 'offload' op supports extack from day 0, hence we need to
propagate extack to .ndo_setup_tc TC_BLOCK_BIND/TC_BLOCK_UNBIND and
through tcf_block_cb_register() to tcf_block_playback_offloads().
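To illustrate the extack plumbing described above, here is a minimal userspace sketch, not kernel code. struct ext_ack, SET_ERR_MSG(), driver_block_bind() and core_block_bind() are hypothetical stand-ins for struct netlink_ext_ack, NL_SET_ERR_MSG(), a driver's TC_BLOCK_BIND handler and the core block bind path. It only shows the pattern: the core hands the caller's extack pointer down unchanged, and whichever layer rejects the request records the reason there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack: holds one message. */
struct ext_ack {
	const char *msg;
};

/* Like NL_SET_ERR_MSG(), tolerate a NULL extack (e.g. an unbind path). */
#define SET_ERR_MSG(extack, text)			\
	do {						\
		struct ext_ack *ack_ = (extack);	\
		if (ack_)				\
			ack_->msg = (text);		\
	} while (0)

/* Hypothetical driver bind handler that refuses blocks already holding rules. */
static int driver_block_bind(bool block_has_rules, struct ext_ack *extack)
{
	if (block_has_rules) {
		SET_ERR_MSG(extack, "driver: block already has rules");
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Hypothetical core bind path: its own checks first, then the driver. */
static int core_block_bind(bool dev_can_offload, bool block_offloaded,
			   bool block_has_rules, struct ext_ack *extack)
{
	if (!dev_can_offload && block_offloaded) {
		SET_ERR_MSG(extack, "core: dev has offload disabled");
		return -EOPNOTSUPP;
	}
	/* The same extack pointer is passed down so the driver can explain. */
	return driver_block_bind(block_has_rules, extack);
}
```

Passing a NULL extack, as the unbind path in the patch does, is safe in this sketch because the macro checks for NULL before writing.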

The immediate use of this patch set is to simplify the life of drivers which
require duplicating rules when sharing blocks.  Switch drivers (mlxsw)
can bind ports to rule lists dynamically; NIC drivers generally don't
have that ability and need the rules to be duplicated for each ingress
they match on.  In code terms this means that switch drivers don't
register multiple callbacks for each port.  NIC drivers do, and get a
separate request, and hence a separate rule, per port, as if the block
was not shared.  The registration would fail, however, if some rules
were already present.

As John notes in the description of patch 2, drivers which register multiple
callbacks to shared blocks will likely need to flush the rules on block
unbind.  This set makes the core replay not only the offload add
requests but also offload remove requests when a callback is unregistered.
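The replay behaviour can be sketched in userspace as below. This is a simplified model, not the series' implementation: struct block, replay(), block_cb_register(), block_cb_unregister(), struct port and port_cb() are hypothetical, and rules are reduced to integer IDs. Registering a callback replays an add for every rule already on the block; unregistering replays a remove, which lets a per-port NIC driver flush its copies. Undoing a partially failed replay is a choice made in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum rule_cmd { RULE_ADD, RULE_DEL };

/* Hypothetical per-callback offload hook: returns 0 or -errno. */
typedef int (*offload_cb_t)(enum rule_cmd cmd, int rule_id, void *priv);

#define MAX_RULES 8

/* Hypothetical shared block: rules already installed in software. */
struct block {
	int rules[MAX_RULES];
	int nr_rules;
};

/* Replay 'cmd' for the first 'count' rules of the block to one callback. */
static int replay(struct block *b, int count, enum rule_cmd cmd,
		  offload_cb_t cb, void *priv)
{
	for (int i = 0; i < count; i++) {
		int err = cb(cmd, b->rules[i], priv);

		if (err && cmd == RULE_ADD) {
			/* Undo the adds that already succeeded. */
			replay(b, i, RULE_DEL, cb, priv);
			return err;
		}
	}
	return 0;
}

/* On registration, the new callback sees every existing rule as an add. */
static int block_cb_register(struct block *b, offload_cb_t cb, void *priv)
{
	return replay(b, b->nr_rules, RULE_ADD, cb, priv);
}

/* On unregistration, the callback sees every rule removed, so it can flush. */
static void block_cb_unregister(struct block *b, offload_cb_t cb, void *priv)
{
	replay(b, b->nr_rules, RULE_DEL, cb, priv);
}

/* Example NIC port: counts rules held in (simulated) hardware. */
struct port {
	int hw_rules;
	int capacity;
};

static int port_cb(enum rule_cmd cmd, int rule_id, void *priv)
{
	struct port *p = priv;

	(void)rule_id;
	if (cmd == RULE_ADD) {
		if (p->hw_rules == p->capacity)
			return -ENOSPC;
		p->hw_rules++;
	} else {
		p->hw_rules--;
	}
	return 0;
}
```

Each port registering its own callback ends up with its own copy of every rule, which is the per-port duplication NIC drivers need.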

John Hurley (4):
  net: sched: pass extack pointer to block binds and cb registration
  net: sched: add tcf_proto_op to offload a rule
  net: sched: cls_flower: implement offload tcf_proto_op
  net: sched: cls_matchall: implement offload tcf_proto_op

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c |  2 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 +-
 .../net/ethernet/intel/i40evf/i40evf_main.c   |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 14 ++--
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  2 +-
 .../ethernet/netronome/nfp/flower/offload.c   |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |  2 +-
 drivers/net/netdevsim/netdev.c                |  2 +-
 include/net/act_api.h                         |  3 -
 include/net/pkt_cls.h                         | 12 ++-
 include/net/sch_generic.h                     |  6 ++
 net/dsa/slave.c                               |  2 +-
 net/sched/cls_api.c                           | 74 ++++++++++++++-----
 net/sched/cls_flower.c                        | 51 +++++++++++++
 net/sched/cls_matchall.c                      | 40 ++++++++++
 21 files changed, 184 insertions(+), 44 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: pull-request: bpf-next 2018-05-24
From: David Miller @ 2018-05-25  2:21 UTC (permalink / raw)
  To: ast; +Cc: daniel, kernel-team, netdev
In-Reply-To: <20180525020351.3634582-1-ast@kernel.org>

From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 24 May 2018 19:03:51 -0700

> The following pull-request contains BPF updates for your *net-next*
> tree.

Pulled, thanks Alexei.

^ permalink raw reply

* Re: [PATCH net-next 0/8] ibmvnic: Failover hardening
From: David Miller @ 2018-05-25  2:19 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, nfont, jallen, linuxppc-dev
In-Reply-To: <1527100682-23099-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Wed, 23 May 2018 13:37:54 -0500

> Introduce additional transport event hardening to handle
> events during device reset. In the driver's current state,
> if a transport event is received during device reset, it can
> cause the device to become unresponsive as invalid operations
> are processed as the backing device context changes. After
> a transport event, the device expects a request to begin the
> initialization process. If the driver is still processing
> a previously queued device reset in this state, it is likely
> to fail as firmware will reject any commands other than the
> one to initialize the client driver's Command-Response Queue.
> 
> Instead of failing and becoming dormant, the driver will make
> one more attempt to recover and continue operation. This is
> achieved by setting a state flag, which if true will direct
> the driver to clean up all allocated resources and perform
> a hard reset in an attempt to bring the driver back to an
> operational state.

Series applied.

^ permalink raw reply

* Re: [PATCH net] ipv4: remove warning in ip_recv_error
From: David Miller @ 2018-05-25  2:18 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <20180523182952.77006-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 23 May 2018 14:29:52 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> A precondition check in ip_recv_error triggered on an otherwise benign
> race. Remove the warning.
> 
> The warning triggers when passing an ipv6 socket to this ipv4 error
> handling function. RaceFuzzer was able to trigger it due to a race
> in setsockopt IPV6_ADDRFORM.
 ...
> This socket option converts a v6 socket that is connected to a v4 peer
> to a v4 socket. It updates the socket on the fly, changing fields in
> sk as well as other structs. This is inherently non-atomic. It races
> with the lockless udp_recvmsg path.
> 
> No other code makes an assumption that these fields are updated
> atomically. It is benign here, too, as ip_recv_error cares only about
> the protocol of the skbs enqueued on the error queue, for which
> sk_family is not a precise predictor (thanks to another issue with
> IPV6_ADDRFORM).
> 
> Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
> Fixes: ("7ce875e5ecb8 ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable.

The SHA1_ID doesn't go inside the (" ") of the Fixes tag, I fixed
it up this time.

^ permalink raw reply

* Re: [PATCH net-next 0/3] selftests: forwarding: Additions to mirror-to-gretap tests
From: David Miller @ 2018-05-25  2:14 UTC (permalink / raw)
  To: petrm; +Cc: netdev, linux-kselftest, shuah, idosch
In-Reply-To: <cover.1527093017.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Wed, 23 May 2018 18:34:49 +0200

> This patchset is for a handful of edge cases in mirror-to-gretap
> scenarios: removal of mirrored-to netdevice (#1), removal of underlay
> route for tunnel remote endpoint (#2) and cessation of mirroring upon
> removal of flower mirroring rule (#3).

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net] net : sched: cls_api: deal with egdev path only if needed
From: David Miller @ 2018-05-25  2:13 UTC (permalink / raw)
  To: ogerlitz; +Cc: jiri, jakub.kicinski, paulb, netdev, ASAP_Direct_Dev
In-Reply-To: <1527092688-27496-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 23 May 2018 19:24:48 +0300

> When dealing with ingress rule on a netdev, if we did fine through the
> conventional path, there's no need to continue into the egdev route,
> and we can stop right there.
> 
> Not doing so may cause a 2nd rule to be added by the cls api layer
> with the ingress being the egdev.
> 
> For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B
> will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B
> 
> Fixes: 208c0f4b5237 ('net: sched: use tc_setup_cb_call to call per-block callbacks')
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
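A simplified model of the control flow described in this patch follows. struct offload_result, offload_rule(), offload_via_block() and offload_via_egdev() are hypothetical names, not the cls_api functions, and a rule is reduced to a count of hardware entries. The egdev path only runs when the conventional per-block path offloaded nothing, so a successful VFR A --> VFR B rule yields one hardware rule instead of two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of hardware rules created for one user rule. */
struct offload_result {
	int hw_rules;
};

/* Hypothetical conventional (per-block callback) path.
 * Returns how many callbacks accepted the rule. */
static int offload_via_block(bool driver_accepts, struct offload_result *res)
{
	if (!driver_accepts)
		return 0;
	res->hw_rules++;
	return 1;
}

/* Hypothetical egdev path: installs a rule keyed on the egress device. */
static void offload_via_egdev(struct offload_result *res)
{
	res->hw_rules++;
}

/* Try the conventional path first; fall back to egdev only if it did nothing. */
static void offload_rule(bool driver_accepts, struct offload_result *res)
{
	int ok_count = offload_via_block(driver_accepts, res);

	if (ok_count > 0)
		return;	/* done: avoid a second, duplicate HW rule */
	offload_via_egdev(res);
}
```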

Applied and queued up for -stable.

> As I wrote in [1], we are asking this patch to go into net and 
> stable >= 4.15 but not carried into net-next.

Please send me a revert with a detailed commit message when this
gets merged into net-next.

Thanks.

^ permalink raw reply

* Re: [PATCH net] vhost: synchronize IOTLB message with dev cleanup
From: David Miller @ 2018-05-25  2:10 UTC (permalink / raw)
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1526990337-24892-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Tue, 22 May 2018 19:58:57 +0800

> DaeRyong Jeong reports a race between vhost_dev_cleanup() and
> vhost_process_iotlb_msg():
> 
> Thread interleaving:
> CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
> (In the case of both VHOST_IOTLB_UPDATE and
> VHOST_IOTLB_INVALIDATE)
> =====						=====
> 						vhost_umem_clean(dev->iotlb);
> if (!dev->iotlb) {
> 	        ret = -EFAULT;
> 		        break;
> }
> 						dev->iotlb = NULL;
> 
> The reason is that we don't synchronize between them; fix this by
> protecting vhost_process_iotlb_msg() with the dev mutex.
> 
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
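The fix pattern, checking and using the pointer under the same lock that cleanup holds while freeing and clearing it, can be sketched in userspace with a pthread mutex. struct dev_model, dev_cleanup() and process_iotlb_msg() are hypothetical stand-ins for the vhost device, vhost_dev_cleanup() and vhost_process_iotlb_msg(); the real code uses the device's kernel mutex, not pthreads.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical device model: 'iotlb' is freed and cleared on cleanup. */
struct dev_model {
	pthread_mutex_t mutex;
	int *iotlb;	/* stands in for dev->iotlb */
};

/* Cleanup frees and clears iotlb while holding the device mutex. */
static void dev_cleanup(struct dev_model *d)
{
	pthread_mutex_lock(&d->mutex);
	free(d->iotlb);
	d->iotlb = NULL;
	pthread_mutex_unlock(&d->mutex);
}

/* Message processing checks and uses iotlb under the same mutex, so it
 * can never see a pointer that cleanup has freed but not yet cleared. */
static int process_iotlb_msg(struct dev_model *d, int value)
{
	int ret = 0;

	pthread_mutex_lock(&d->mutex);
	if (!d->iotlb) {
		ret = -EFAULT;
		goto out;
	}
	*d->iotlb = value;	/* cleanup cannot run concurrently here */
out:
	pthread_mutex_unlock(&d->mutex);
	return ret;
}
```

On older glibc versions the example may need to be linked with -pthread.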

Applied and queued up for -stable.

^ permalink raw reply

* pull-request: bpf-next 2018-05-24
From: Alexei Starovoitov @ 2018-05-25  2:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, kernel-team, netdev

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

4) Jiong Wang adds support for indirect and arithmetic shifts to NFP.

5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
   programs to edit/grow/shrink an SRH and apply generic SRv6 actions to a packet.

7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit b9f672af148bf7a08a6031743156faffd58dbc7e:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2018-05-16 22:47:11 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to 10f678683e4026e43524b0492068a371d00fdeed:

  Merge branch 'xdp_xmit-bulking' (2018-05-24 18:36:16 -0700)

----------------------------------------------------------------
Alexei Starovoitov (2):
      Merge branch 'bpf-task-fd-query'
      Merge branch 'xdp_xmit-bulking'

Björn Töpel (11):
      xsk: clean up SPDX headers
      xsk: remove newline at end of file
      xsk: fixed some cases of unnecessary parentheses
      xsk: proper '=' alignment
      xsk: remove rebind support
      xsk: fill hole in struct sockaddr_xdp
      xsk: remove explicit ring structure from uapi
      samples/bpf: adapt xdpsock to the new uapi
      xsk: add missing write- and data-dependency barrier
      xsk: simplified umem setup
      xsk: convert atomic_t to refcount_t

Daniel Borkmann (8):
      Merge branch 'bpf-af-xdp-cleanups'
      Merge branch 'bpf-nfp-shift-insns'
      Merge branch 'bpf-sk-msg-fields'
      Merge branch 'bpf-af-xdp-cleanups'
      Merge branch 'bpf-fib-mtu-check'
      Merge branch 'btf-uapi-cleanups'
      Merge branch 'bpf-multi-prog-improvements'
      Merge branch 'bpf-ipv6-seg6-bpf-action'

David Ahern (3):
      net/ipv4: Add helper to return path MTU based on fib result
      net/ipv6: Add helper to return path MTU based on fib result
      bpf: Add mtu checking to FIB forwarding helper

Gustavo A. R. Silva (2):
      bpf: sockmap, fix uninitialized variable
      bpf: sockmap, fix double-free

Jesper Dangaard Brouer (8):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      samples/bpf: xdp_monitor use tracepoint xdp:xdp_devmap_xmit
      xdp: introduce xdp_return_frame_rx_napi
      xdp: change ndo_xdp_xmit API to support bulking
      xdp/trace: extend tracepoint in devmap with an err
      samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit

Jiong Wang (3):
      nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)
      nfp: bpf: support arithmetic right shift by constant (BPF_ARSH | BPF_K)
      nfp: bpf: support arithmetic indirect right shift (BPF_ARSH | BPF_X)

John Fastabend (2):
      bpf: allow sk_msg programs to read sock fields
      bpf: add sk_msg prog sk access tests to test_verifier

Magnus Karlsson (1):
      xsk: proper queue id check at bind

Martin KaFai Lau (8):
      bpf: Expose check_uarg_tail_zero()
      bpf: btf: Change how section is supported in btf_header
      bpf: btf: Check array->index_type
      bpf: btf: Remove unused bits from uapi/linux/btf.h
      bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
      bpf: btf: Sync bpf.h and btf.h to tools
      bpf: btf: Add tests for the btf uapi changes
      bpf: btf: Avoid variable length array

Mathieu Xhonneux (6):
      ipv6: sr: make seg6.h includable without IPv6
      ipv6: sr: export function lookup_nexthop
      bpf: Add IPv6 Segment Routing helpers
      bpf: Split lwt inout verifier structures
      ipv6: sr: Add seg6local action End.BPF
      selftests/bpf: test for seg6local End.BPF action

Quentin Monnet (1):
      bpf: change eBPF helper doc parsing script to allow for smaller indent

Sandipan Das (10):
      bpf: support 64-bit offsets for bpf function calls
      bpf: powerpc64: pad function address loads with NOPs
      bpf: powerpc64: add JIT support for multi-function programs
      bpf: get kernel symbol addresses via syscall
      tools: bpf: sync bpf uapi header
      tools: bpftool: resolve calls without using imm field
      bpf: fix multi-function JITed dump obtained via syscall
      bpf: get JITed image lengths of functions via syscall
      tools: bpf: sync bpf uapi header
      tools: bpftool: add delimiters to multi-function JITed dumps

Sirio Balmelli (2):
      selftests/bpf: Makefile fix "missing" headers on build with -idirafter
      tools/lib/libbpf.c: fix string format to allow build on arm32

Yonghong Song (7):
      perf/core: add perf_get_event() to return perf_event given a struct file
      bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
      tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpf
      tools/bpf: add ksym_get_addr() in trace_helpers
      samples/bpf: add a samples/bpf test for BPF_TASK_FD_QUERY
      tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progs
      tools/bpftool: add perf subcommand

 arch/powerpc/net/bpf_jit_comp64.c                 | 110 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       |  26 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h       |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  21 +-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 410 ++++++++++++++--
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |  28 ++
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |   2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |   8 +
 drivers/net/ethernet/netronome/nfp/nfp_asm.h      |  18 +-
 drivers/net/tun.c                                 |  37 +-
 drivers/net/virtio_net.c                          |  66 ++-
 include/linux/bpf.h                               |  24 +-
 include/linux/bpf_types.h                         |   5 +-
 include/linux/filter.h                            |   1 +
 include/linux/netdevice.h                         |  14 +-
 include/linux/perf_event.h                        |   5 +
 include/linux/trace_events.h                      |  17 +
 include/net/addrconf.h                            |   2 +
 include/net/ip6_fib.h                             |   6 +
 include/net/ip6_route.h                           |   3 +
 include/net/ip_fib.h                              |   2 +
 include/net/page_pool.h                           |   5 +-
 include/net/seg6.h                                |   7 +-
 include/net/seg6_local.h                          |  32 ++
 include/net/xdp.h                                 |   1 +
 include/net/xdp_sock.h                            |  13 +-
 include/trace/events/xdp.h                        |  50 +-
 include/uapi/linux/bpf.h                          | 143 +++++-
 include/uapi/linux/btf.h                          |  37 +-
 include/uapi/linux/if_xdp.h                       |  59 +--
 include/uapi/linux/seg6_local.h                   |  12 +
 kernel/bpf/arraymap.c                             |   2 +-
 kernel/bpf/btf.c                                  | 334 +++++++++----
 kernel/bpf/cpumap.c                               |   2 +-
 kernel/bpf/devmap.c                               | 131 ++++-
 kernel/bpf/sockmap.c                              |   4 +-
 kernel/bpf/syscall.c                              | 245 ++++++++-
 kernel/bpf/verifier.c                             |  23 +-
 kernel/bpf/xskmap.c                               |   9 -
 kernel/events/core.c                              |   8 +
 kernel/trace/bpf_trace.c                          |  48 ++
 kernel/trace/trace_kprobe.c                       |  29 ++
 kernel/trace/trace_uprobe.c                       |  22 +
 net/core/filter.c                                 | 572 +++++++++++++++++++---
 net/core/xdp.c                                    |  20 +-
 net/ipv4/route.c                                  |  31 ++
 net/ipv6/Kconfig                                  |   5 +
 net/ipv6/addrconf_core.c                          |   8 +
 net/ipv6/af_inet6.c                               |   1 +
 net/ipv6/route.c                                  |  48 ++
 net/ipv6/seg6_local.c                             | 190 ++++++-
 net/xdp/Makefile                                  |   1 -
 net/xdp/xdp_umem.c                                |  96 ++--
 net/xdp/xdp_umem.h                                |  18 +-
 net/xdp/xdp_umem_props.h                          |  13 +-
 net/xdp/xsk.c                                     | 150 +++---
 net/xdp/xsk_queue.c                               |  12 +-
 net/xdp/xsk_queue.h                               |  34 +-
 samples/bpf/Makefile                              |   4 +
 samples/bpf/task_fd_query_kern.c                  |  19 +
 samples/bpf/task_fd_query_user.c                  | 382 +++++++++++++++
 samples/bpf/xdp_monitor_kern.c                    |  49 ++
 samples/bpf/xdp_monitor_user.c                    |  69 ++-
 samples/bpf/xdpsock_user.c                        | 135 ++---
 scripts/bpf_helpers_doc.py                        |   8 +-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst  |  81 +++
 tools/bpf/bpftool/Documentation/bpftool.rst       |   5 +-
 tools/bpf/bpftool/bash-completion/bpftool         |   9 +
 tools/bpf/bpftool/main.c                          |   3 +-
 tools/bpf/bpftool/main.h                          |   1 +
 tools/bpf/bpftool/perf.c                          | 246 ++++++++++
 tools/bpf/bpftool/prog.c                          |  97 +++-
 tools/bpf/bpftool/xlated_dumper.c                 |  14 +-
 tools/bpf/bpftool/xlated_dumper.h                 |   3 +
 tools/include/uapi/linux/bpf.h                    | 143 +++++-
 tools/include/uapi/linux/btf.h                    |  37 +-
 tools/lib/bpf/bpf.c                               |  27 +-
 tools/lib/bpf/bpf.h                               |   7 +-
 tools/lib/bpf/btf.c                               |   5 +-
 tools/lib/bpf/libbpf.c                            |  43 +-
 tools/lib/bpf/libbpf.h                            |   4 +-
 tools/testing/selftests/bpf/Makefile              |  16 +-
 tools/testing/selftests/bpf/bpf_helpers.h         |  12 +
 tools/testing/selftests/bpf/test_btf.c            | 521 ++++++++++++++++----
 tools/testing/selftests/bpf/test_lwt_seg6local.c  | 437 +++++++++++++++++
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 ++++++
 tools/testing/selftests/bpf/test_progs.c          | 158 ++++++
 tools/testing/selftests/bpf/test_verifier.c       | 115 +++++
 tools/testing/selftests/bpf/trace_helpers.c       |  12 +
 tools/testing/selftests/bpf/trace_helpers.h       |   1 +
 90 files changed, 5213 insertions(+), 812 deletions(-)
 create mode 100644 include/net/seg6_local.h
 create mode 100644 samples/bpf/task_fd_query_kern.c
 create mode 100644 samples/bpf/task_fd_query_user.c
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-perf.rst
 create mode 100644 tools/bpf/bpftool/perf.c
 create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

^ permalink raw reply

* Re: [pull request][net 0/2] Mellanox, mlx5 fixes 2018-05-24
From: David Miller @ 2018-05-25  2:02 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20180524215313.7605-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 24 May 2018 14:53:11 -0700

> This series includes two mlx5 fixes.
> 
> 1) add FCS data to checksum complete when required, from Eran Ben
> Elisha.
> 
> 2) Fix a race in IPSec sandbox QP commands, from Yossi Kuperman.
> 
> Please pull and let me know if there's any problem.

Pulled.

> for -stable v4.15
> ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")

Queued up.

^ permalink raw reply

* Re: [PATCH net] packet: fix reserve calculation
From: David Miller @ 2018-05-25  1:56 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <20180524221030.158150-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Thu, 24 May 2018 18:10:30 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
> layer allocation") ensures that packet_snd always starts writing
> the link layer header in reserved headroom allocated for this
> purpose.
> 
> This is needed because packets may be shorter than hard_header_len,
> in which case the space up to hard_header_len may be zeroed. But
> that necessary padding is not accounted for in skb->len.
> 
> The fix, however, is buggy. It calls skb_push, which grows skb->len
> when moving skb->data back. But in this case packet length should not
> change.
> 
> Instead, call skb_reserve, which moves both skb->data and skb->tail
> back, without changing length.
> 
> Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
> Reported-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
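The difference between the two helpers can be modelled with offsets instead of pointers. struct skb_model, model_push() and model_reserve() are hypothetical; they mirror only the fields that skb_push() and skb_reserve() touch on a linear skb. skb_push() moves data back and grows len, while skb_reserve() shifts data and tail together and leaves len alone; in this model, a negative amount moves both back.

```c
#include <assert.h>

/* Hypothetical, simplified linear skb: offsets into a buffer, not pointers. */
struct skb_model {
	int data;	/* offset of skb->data */
	int tail;	/* offset of skb->tail */
	int len;	/* skb->len */
};

/* Mirrors skb_push(): data moves back by n and len grows by n; tail stays. */
static void model_push(struct skb_model *skb, int n)
{
	skb->data -= n;
	skb->len += n;
}

/* Mirrors skb_reserve(): data and tail shift together, len is unchanged.
 * A negative n moves both back, keeping the packet length as it was. */
static void model_reserve(struct skb_model *skb, int n)
{
	skb->data += n;
	skb->tail += n;
}
```

Starting from an empty skb at offset 64, pushing 14 bytes leaves len at 14 even though no payload was written, which is the unwanted length change; reserving -14 moves the start of the write area back by the same amount with len still 0.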

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: Check for kvzalloc allocation failure
From: David Miller @ 2018-05-25  1:52 UTC (permalink / raw)
  To: yuehaibing; +Cc: ganeshgr, linux-kernel, netdev
In-Reply-To: <f26c9a69-2d48-ddc7-4f12-53d0cf65b4f1@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Fri, 25 May 2018 09:39:20 +0800

> On 2018/5/24 23:07, David Miller wrote:
>> From: YueHaibing <yuehaibing@huawei.com>
>> Date: Tue, 22 May 2018 15:07:18 +0800
>> 
>>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>> index 130d1ee..019cffe 100644
>>> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>> @@ -4135,6 +4135,10 @@ static int adap_init0(struct adapter *adap)
>>>  		 * card
>>>  		 */
>>>  		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
>>> +		if (!card_fw) {
>>> +			ret = -ENOMEM;
>>> +			goto bye;
>>> +		}
>>>  
>> 
>> On error, this leaks fw_info.
> 
> Hi David,
> 
> I checked: fw_info is an element of fw_info_array, and none of the members of struct fw_info need to be freed.

Aha, I misread the code, sorry.

Applied, thanks.

^ permalink raw reply

* Re: [bpf-next V5 PATCH 0/8] xdp: introduce bulking for ndo_xdp_xmit API
From: Alexei Starovoitov @ 2018-05-25  1:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Daniel Borkmann, Christoph Hellwig, BjörnTöpel,
	John Fastabend, Magnus Karlsson, makita.toshiaki
In-Reply-To: <152717306303.4777.4205616217877503311.stgit@firesoul>

On Thu, May 24, 2018 at 04:45:41PM +0200, Jesper Dangaard Brouer wrote:
> This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames.
> 
> When the kernel is compiled with CONFIG_RETPOLINE, every indirect function
> pointer (branch) call hurts performance. For XDP this has a huge
> negative performance impact.
> 
> This patchset reduces the number of (indirect) calls needed to ndo_xdp_xmit,
> but it also prepares for further optimizations. The DMA API's use of indirect
> function pointer calls is the primary source of the regression. Using
> bulking calls towards the DMA API (via the scatter-gather calls) is
> left for a followup patchset.
> 
> The other advantage of this API change is that drivers can more easily
> amortize the cost of any sync/locking scheme over the bulk of
> packets. The current API assumes that the driver implementing the NDO
> will also allocate a dedicated XDP TX queue for every CPU in the
> system, which is not always possible or practical to configure. For
> example, ixgbe cannot load an XDP program on a machine with more than
> 96 CPUs, due to limited hardware TX queues; virtio_net is hard to
> configure as it requires manually increasing the number of queues; and
> the tun driver uses a per-XDP-frame producer lock, picking a queue as
> smp_processor_id modulo the number of available queues.
> 
> I considered adding 'flags' to ndo_xdp_xmit, but it's not part of
> this patchset.  This will be a followup patchset, once we know if this
> will be needed (e.g. for non-map xdp_redirect flush-flag, and if
> AF_XDP chooses to use ndo_xdp_xmit for TX).
> 
> ---
> V5: Fixed up issues spotted by Daniel and John
> 
> V4: Split the series from 4 into 8 patches. I cannot split the
> driver changes from the NDO change, but I've tried to isolate the NDO
> change together with the driver change as much as possible.

Patch 6/8 would have benefited from a review by the Intel folks,
but the series has been pending for too long already, hence
applied to bpf-next.
Please address any follow up reviews if/when they come.
Thanks Jesper.
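A userspace sketch of the idea in the cover letter, assuming hypothetical types and helpers (frame, txq, dev, sketch_xmit_bulk, flush_bulk) rather than the real ndo_xdp_xmit signature. Handing the driver a bulk of frames through one function-pointer call means the indirect-call cost and the queue lock are paid once per bulk instead of once per frame. A device with fewer TX queues than CPUs can share them by picking a queue as cpu modulo the queue count, similar to what the cover letter describes for tun. Counters stand in for real locks so the cost can be observed.

```c
#include <assert.h>

#define MAX_TXQ 4

/* Hypothetical frame type; the real API passes xdp_frame pointers. */
struct frame {
	int len;
};

/* Hypothetical TX queue: counters stand in for a lock and the ring. */
struct txq {
	int lock_acquisitions;
	int sent;
};

struct dev;

/* Bulk transmit hook: one indirect call hands over n frames at once. */
typedef int (*xmit_bulk_fn)(struct dev *dev, struct frame **frames, int n,
			    int cpu);

struct dev {
	xmit_bulk_fn xmit;	/* called through a pointer, like an NDO */
	struct txq queues[MAX_TXQ];
	int nqueues;		/* may be smaller than the number of CPUs */
	int indirect_calls;
};

static int sketch_xmit_bulk(struct dev *dev, struct frame **frames, int n,
			    int cpu)
{
	struct txq *q;
	int sent = 0;

	assert(dev->nqueues > 0 && dev->nqueues <= MAX_TXQ);
	/* Share queues between CPUs: pick one by cpu modulo the queue count. */
	q = &dev->queues[cpu % dev->nqueues];

	dev->indirect_calls++;
	q->lock_acquisitions++;	/* one lock round-trip per bulk, not per frame */
	for (int i = 0; i < n; i++) {
		/* every frame in the bulk is enqueued under that single lock */
		if (frames[i]) {
			q->sent++;
			sent++;
		}
	}
	return sent;		/* frames accepted by the queue */
}

/* Caller side: flush a batch with a single indirect call. */
static int flush_bulk(struct dev *dev, struct frame **frames, int n, int cpu)
{
	return dev->xmit(dev, frames, n, cpu);
}
```

With 16 frames on CPU 6 and 4 queues, one flush_bulk() call makes a single indirect call and takes queue 2's lock once; flushing the same frames one at a time makes 16 indirect calls and 16 lock acquisitions.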

* Re: [PATCH net-next] cxgb4: Check for kvzalloc allocation failure
From: YueHaibing @ 2018-05-25  1:39 UTC (permalink / raw)
  To: David Miller; +Cc: ganeshgr, linux-kernel, netdev
In-Reply-To: <20180524.110743.522760687215216591.davem@davemloft.net>

On 2018/5/24 23:07, David Miller wrote:
> From: YueHaibing <yuehaibing@huawei.com>
> Date: Tue, 22 May 2018 15:07:18 +0800
> 
>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> index 130d1ee..019cffe 100644
>> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> @@ -4135,6 +4135,10 @@ static int adap_init0(struct adapter *adap)
>>  		 * card
>>  		 */
>>  		card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
>> +		if (!card_fw) {
>> +			ret = -ENOMEM;
>> +			goto bye;
>> +		}
>>  
> 
> On error, this leaks fw_info.

Hi David,

I checked: fw_info is an element of fw_info_array, and none of the members of struct fw_info need to be freed.

It looks like this:

static struct fw_info fw_info_array[] = {
	{
		.chip = CHELSIO_T4,
		.fs_name = FW4_CFNAME,
		.fw_mod_name = FW4_FNAME,
		.fw_hdr = {
			.chip = FW_HDR_CHIP_T4,
			.fw_ver = __cpu_to_be32(FW_VERSION(T4)),
			.intfver_nic = FW_INTFVER(T4, NIC),
			.intfver_vnic = FW_INTFVER(T4, VNIC),
			.intfver_ri = FW_INTFVER(T4, RI),
			.intfver_iscsi = FW_INTFVER(T4, ISCSI),
			.intfver_fcoe = FW_INTFVER(T4, FCOE),
		},
	}, {
		........

Am I missing something?
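A sketch of why there is nothing to free, assuming a hypothetical trimmed-down table and lookup helper (lookup_fw_info, CHIP_T4..CHIP_T6 and the file names are placeholders, not the driver's definitions). The lookup returns a pointer into an array with static storage duration, so the caller only borrows the entry and must not pass it to free() on any path, including an -ENOMEM one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip identifiers and placeholder firmware names. */
enum chip { CHIP_T4 = 4, CHIP_T5 = 5, CHIP_T6 = 6 };

struct fw_info {
	enum chip chip;
	const char *fw_mod_name;	/* string literal: static storage too */
};

static const struct fw_info fw_info_array[] = {
	{ .chip = CHIP_T4, .fw_mod_name = "fw-t4.bin" },
	{ .chip = CHIP_T5, .fw_mod_name = "fw-t5.bin" },
	{ .chip = CHIP_T6, .fw_mod_name = "fw-t6.bin" },
};

#define FW_INFO_COUNT (sizeof(fw_info_array) / sizeof(fw_info_array[0]))

/*
 * Return a pointer into the static table, or NULL for an unknown chip.
 * The caller borrows the entry and must never pass it to free().
 */
static const struct fw_info *lookup_fw_info(enum chip chip)
{
	for (size_t i = 0; i < FW_INFO_COUNT; i++) {
		if (fw_info_array[i].chip == chip)
			return &fw_info_array[i];
	}
	return NULL;
}
```

Because the table is initialized at compile time and never allocated, the only resources an error path has to release are the ones allocated earlier in the same function.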
