Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jonathan Lemon @ 2019-07-29 21:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <20190729142211.43b5ccd8@cakuba.netronome.com>



On 29 Jul 2019, at 14:22, Jakub Kicinski wrote:

> On Mon, 29 Jul 2019 14:02:21 -0700, Jonathan Lemon wrote:
>> On 29 Jul 2019, at 13:50, Jakub Kicinski wrote:
>>> On Mon, 29 Jul 2019 10:19:39 -0700, Jonathan Lemon wrote:
>>>> Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
>>>> and skb_frag_off_set_from() accessors for page_offset.
>>>>
>>>> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 718742b1c505..7d94a78067ee 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -331,7 +331,7 @@ static inline void skb_frag_size_set(skb_frag_t
>>>> *frag, unsigned int size)
>>>>  }
>>>>
>>>>  /**
>>>> - * skb_frag_size_add - Incrementes the size of a skb fragment by
>>>> %delta
>>>> + * skb_frag_size_add - Increments the size of a skb fragment by
>>>> %delta
>>>>   * @frag: skb fragment
>>>>   * @delta: value to add
>>>>   */
>>>> @@ -2857,6 +2857,46 @@ static inline void
>>>> skb_propagate_pfmemalloc(struct page *page,
>>>>  		skb->pfmemalloc = true;
>>>>  }
>>>>
>>>> +/**
>>>> + * skb_frag_off - Returns the offset of a skb fragment
>>>> + * @frag: the paged fragment
>>>> + */
>>>> +static inline unsigned int skb_frag_off(const skb_frag_t *frag)
>>>> +{
>>>> +	return frag->page_offset;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_add - Increments the offset of a skb fragment by
>>>> %delta
>>>
>>> I realize you're following the existing code, but should we perhaps
>>> use
>>> the latest kdoc syntax? '()' after function name, and args should 
>>> have
>>> '@' prefix, '%' would be for constants.
>>
>> That would be a task for a different cleanup.  Not that I disagree 
>> with
>> you, but there's also nothing worse than mixing styles in the same 
>> file.
>
> Funny you should say that given that (a) I'm commenting on the new 
> code
> you're adding, and (b) you did do an unrelated spelling fix above ;)
>
>>>> + * @frag: skb fragment
>>>> + * @delta: value to add
>>>> + */
>>>> +static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
>>>> +{
>>>> +	frag->page_offset += delta;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_set - Sets the offset of a skb fragment
>>>> + * @frag: skb fragment
>>>> + * @offset: offset of fragment
>>>> + */
>>>> +static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int
>>>> offset)
>>>> +{
>>>> +	frag->page_offset = offset;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_set_from - Sets the offset of a skb fragment from
>>>> another fragment
>>>> + * @fragto: skb fragment where offset is set
>>>> + * @fragfrom: skb fragment offset is copied from
>>>> + */
>>>> +static inline void skb_frag_off_set_from(skb_frag_t *fragto,
>>>> +					 const skb_frag_t *fragfrom)
>>>
>>> skb_frag_off_copy() ?
>>
>> That was my initial inclination, but due to the often overloaded
>> connotations of the word "copy", opted to use the same "set" verbiage
>> that existed in the other functions.
>
> There is no need to ponder the connotations of verbs. Please just
> look at other function names in skbuff.h, especially those which
> copy fields :)
>
> static inline void skb_copy_hash(struct sk_buff *to, const struct 
> sk_buff *from)
> static inline void skb_copy_secmark(struct sk_buff *to, const struct 
> sk_buff *from)
> static inline void skb_copy_queue_mapping(struct sk_buff *to, const 
> struct sk_buff *from)
> static inline void skb_ext_copy(struct sk_buff *dst, const struct 
> sk_buff *src)

Okay, I missed those, let me respin.
-- 
Jonathan

^ permalink raw reply

* [net-next:master 57/59] drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size
From: kbuild test robot @ 2019-07-29 21:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 8294 bytes --]

tree:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next.git master
head:   1cb9dfca39eb406960f8f84864ddd6ba329ec321
commit: 171a9bae68c72f2d1260c3825203760856e6793b [57/59] staging/octeon: Allow test build on !MIPS
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 171a9bae68c72f2d1260c3825203760856e6793b
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/staging/octeon/octeon-ethernet.h:41:0,
                    from drivers/staging/octeon/ethernet.c:22:
   drivers/staging/octeon/octeon-stubs.h: In function 'cvmx_phys_to_ptr':
>> drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (void *)(physical_address);
            ^
--
   In file included from drivers/staging/octeon/octeon-ethernet.h:41:0,
                    from drivers/staging/octeon/ethernet-mem.c:12:
   drivers/staging/octeon/octeon-stubs.h: In function 'cvmx_phys_to_ptr':
>> drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (void *)(physical_address);
            ^
   In file included from include/linux/scatterlist.h:9:0,
                    from include/linux/dma-mapping.h:11,
                    from include/linux/skbuff.h:31,
                    from include/linux/if_ether.h:19,
                    from include/uapi/linux/ethtool.h:19,
                    from include/linux/ethtool.h:18,
                    from include/linux/netdevice.h:37,
                    from drivers/staging/octeon/ethernet-mem.c:9:
   drivers/staging/octeon/ethernet-mem.c: In function 'cvm_oct_free_hw_memory':
   arch/sh/include/asm/io.h:244:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define phys_to_virt(address) ((void *)(address))
                                   ^
>> drivers/staging/octeon/ethernet-mem.c:123:18: note: in expansion of macro 'phys_to_virt'
       fpa = (char *)phys_to_virt(cvmx_ptr_to_phys(fpa));
                     ^~~~~~~~~~~~
--
   In file included from drivers/staging/octeon/octeon-ethernet.h:41:0,
                    from drivers/staging/octeon/ethernet-tx.c:25:
   drivers/staging/octeon/octeon-stubs.h: In function 'cvmx_phys_to_ptr':
>> drivers/staging/octeon/octeon-stubs.h:1205:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (void *)(physical_address);
            ^
   drivers/staging/octeon/ethernet-tx.c: In function 'cvm_oct_xmit':
>> drivers/staging/octeon/ethernet-tx.c:264:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
                                        ^
   drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro 'XKPHYS_TO_PHYS'
    #define XKPHYS_TO_PHYS(p)   (p)
                                 ^
   drivers/staging/octeon/ethernet-tx.c:268:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)skb->data);
                                        ^
   drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro 'XKPHYS_TO_PHYS'
    #define XKPHYS_TO_PHYS(p)   (p)
                                 ^
   drivers/staging/octeon/ethernet-tx.c:276:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
        XKPHYS_TO_PHYS((u64)skb_frag_address(fs));
                       ^
   drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro 'XKPHYS_TO_PHYS'
    #define XKPHYS_TO_PHYS(p)   (p)
                                 ^
   drivers/staging/octeon/ethernet-tx.c:280:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      hw_buffer.s.addr = XKPHYS_TO_PHYS((u64)CVM_OCT_SKB_CB(skb));
                                        ^
   drivers/staging/octeon/octeon-stubs.h:2:30: note: in definition of macro 'XKPHYS_TO_PHYS'
    #define XKPHYS_TO_PHYS(p)   (p)
                                 ^
--
   drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
>> drivers/net/phy/mdio-octeon.c:48:3: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      (u64)devm_ioremap(&pdev->dev, mdio_phys, regsize);
      ^
   In file included from include/linux/io.h:13:0,
                    from include/linux/of_address.h:7,
                    from drivers/net/phy/mdio-octeon.c:7:
>> drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   arch/sh/include/asm/io.h:33:71: note: in definition of macro '__raw_writeq'
    #define __raw_writeq(v,a) (__chk_io_ptr(a), *(volatile u64 __force *)(a) = (v))
                                                                          ^
>> arch/sh/include/asm/io.h:58:32: note: in expansion of macro 'writeq_relaxed'
    #define writeq(v,a)  ({ wmb(); writeq_relaxed((v),(a)); })
                                   ^~~~~~~~~~~~~~
>> drivers/net/phy/mdio-cavium.h:111:36: note: in expansion of macro 'writeq'
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                       ^~~~~~
>> drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
>> drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   arch/sh/include/asm/io.h:33:71: note: in definition of macro '__raw_writeq'
    #define __raw_writeq(v,a) (__chk_io_ptr(a), *(volatile u64 __force *)(a) = (v))
                                                                          ^
>> arch/sh/include/asm/io.h:58:32: note: in expansion of macro 'writeq_relaxed'
    #define writeq(v,a)  ({ wmb(); writeq_relaxed((v),(a)); })
                                   ^~~~~~~~~~~~~~
>> drivers/net/phy/mdio-cavium.h:111:36: note: in expansion of macro 'writeq'
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                       ^~~~~~
   drivers/net/phy/mdio-octeon.c:77:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
   drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_remove':
>> drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   arch/sh/include/asm/io.h:33:71: note: in definition of macro '__raw_writeq'
    #define __raw_writeq(v,a) (__chk_io_ptr(a), *(volatile u64 __force *)(a) = (v))
                                                                          ^
>> arch/sh/include/asm/io.h:58:32: note: in expansion of macro 'writeq_relaxed'
    #define writeq(v,a)  ({ wmb(); writeq_relaxed((v),(a)); })
                                   ^~~~~~~~~~~~~~
>> drivers/net/phy/mdio-cavium.h:111:36: note: in expansion of macro 'writeq'
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                       ^~~~~~
   drivers/net/phy/mdio-octeon.c:91:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~

vim +1205 drivers/staging/octeon/octeon-stubs.h

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 51905 bytes --]

^ permalink raw reply

* [PATCH bpf-next] tools: bpftool: add support for reporting the effective cgroup progs
From: Jakub Kicinski @ 2019-07-29 21:35 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: netdev, bpf, oss-drivers, ctakshak, kernel-team, Jakub Kicinski,
	Quentin Monnet

Takshak said in the original submission:

With different bpf attach_flags available to attach bpf programs specially
with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective
bpf-programs available to any sub-cgroups really needs to be available for
easy debugging.

Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached
bpf-programs to a cgroup but also the inherited ones from parent cgroup.

So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here
to list all the effective bpf-programs available for execution at a specified
cgroup.

Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf:
  # ./test_cgroup_attach

With old bpftool:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/
  ID       AttachType      AttachFlags     Name
  271      egress          multi           pkt_cntr_1
  272      egress          multi           pkt_cntr_2

Attached new program pkt_cntr_4 in cg2 gives following:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2
  ID       AttachType      AttachFlags     Name
  273      egress          override        pkt_cntr_4

And with new "effective" option it shows all effective programs for cg2:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective
  ID       AttachType      AttachFlags     Name
  273      egress          override        pkt_cntr_4
  271      egress          override        pkt_cntr_1
  272      egress          override        pkt_cntr_2

Compared to original submission use a local flag instead of global
option.

We need to clear query_flags on every command, in case batch mode
wants to use varying settings.

Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 .../bpftool/Documentation/bpftool-cgroup.rst  | 16 +++--
 tools/bpf/bpftool/bash-completion/bpftool     | 15 +++--
 tools/bpf/bpftool/cgroup.c                    | 65 ++++++++++++-------
 3 files changed, 63 insertions(+), 33 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
index 585f270c2d25..06a28b07787d 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -20,8 +20,8 @@ SYNOPSIS
 CGROUP COMMANDS
 ===============
 
-|	**bpftool** **cgroup { show | list }** *CGROUP*
-|	**bpftool** **cgroup tree** [*CGROUP_ROOT*]
+|	**bpftool** **cgroup { show | list }** *CGROUP* [**effective**]
+|	**bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**]
 |	**bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
 |	**bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
 |	**bpftool** **cgroup help**
@@ -35,13 +35,17 @@ CGROUP COMMANDS
 
 DESCRIPTION
 ===========
-	**bpftool cgroup { show | list }** *CGROUP*
+	**bpftool cgroup { show | list }** *CGROUP* [**effective**]
 		  List all programs attached to the cgroup *CGROUP*.
 
 		  Output will start with program ID followed by attach type,
 		  attach flags and program name.
 
-	**bpftool cgroup tree** [*CGROUP_ROOT*]
+		  If **effective** is specified retrieve effective programs that
+		  will execute for events within a cgroup. This includes
+		  inherited along with attached ones.
+
+	**bpftool cgroup tree** [*CGROUP_ROOT*] [**effective**]
 		  Iterate over all cgroups in *CGROUP_ROOT* and list all
 		  attached programs. If *CGROUP_ROOT* is not specified,
 		  bpftool uses cgroup v2 mountpoint.
@@ -50,6 +54,10 @@ DESCRIPTION
 		  commands: it starts with absolute cgroup path, followed by
 		  program ID, attach type, attach flags and program name.
 
+		  If **effective** is specified retrieve effective programs that
+		  will execute for events within a cgroup. This includes
+		  inherited along with attached ones.
+
 	**bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
 		  Attach program *PROG* to the cgroup *CGROUP* with attach type
 		  *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 6b961a5ed100..df16c5415444 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -710,12 +710,15 @@ _bpftool()
             ;;
         cgroup)
             case $command in
-                show|list)
-                    _filedir
-                    return 0
-                    ;;
-                tree)
-                    _filedir
+                show|list|tree)
+                    case $cword in
+                        3)
+                            _filedir
+                            ;;
+                        4)
+                            COMPREPLY=( $( compgen -W 'effective' -- "$cur" ) )
+                            ;;
+                    esac
                     return 0
                     ;;
                 attach|detach)
diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index f3c05b08c68c..339c2c78b8e4 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -29,6 +29,8 @@
 	"                        recvmsg4 | recvmsg6 | sysctl |\n"	       \
 	"                        getsockopt | setsockopt }"
 
+static unsigned int query_flags;
+
 static const char * const attach_type_strings[] = {
 	[BPF_CGROUP_INET_INGRESS] = "ingress",
 	[BPF_CGROUP_INET_EGRESS] = "egress",
@@ -107,7 +109,8 @@ static int count_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type)
 	__u32 prog_cnt = 0;
 	int ret;
 
-	ret = bpf_prog_query(cgroup_fd, type, 0, NULL, NULL, &prog_cnt);
+	ret = bpf_prog_query(cgroup_fd, type, query_flags, NULL,
+			     NULL, &prog_cnt);
 	if (ret)
 		return -1;
 
@@ -125,8 +128,8 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 	int ret;
 
 	prog_cnt = ARRAY_SIZE(prog_ids);
-	ret = bpf_prog_query(cgroup_fd, type, 0, &attach_flags, prog_ids,
-			     &prog_cnt);
+	ret = bpf_prog_query(cgroup_fd, type, query_flags, &attach_flags,
+			     prog_ids, &prog_cnt);
 	if (ret)
 		return ret;
 
@@ -158,20 +161,30 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 static int do_show(int argc, char **argv)
 {
 	enum bpf_attach_type type;
+	const char *path;
 	int cgroup_fd;
 	int ret = -1;
 
-	if (argc < 1) {
-		p_err("too few parameters for cgroup show");
-		goto exit;
-	} else if (argc > 1) {
-		p_err("too many parameters for cgroup show");
-		goto exit;
+	query_flags = 0;
+
+	if (!REQ_ARGS(1))
+		return -1;
+	path = GET_ARG();
+
+	while (argc) {
+		if (is_prefix(*argv, "effective")) {
+			query_flags |= BPF_F_QUERY_EFFECTIVE;
+			NEXT_ARG();
+		} else {
+			p_err("expected no more arguments, 'effective', got: '%s'?",
+			      *argv);
+			return -1;
+		}
 	}
 
-	cgroup_fd = open(argv[0], O_RDONLY);
+	cgroup_fd = open(path, O_RDONLY);
 	if (cgroup_fd < 0) {
-		p_err("can't open cgroup %s", argv[0]);
+		p_err("can't open cgroup %s", path);
 		goto exit;
 	}
 
@@ -297,23 +310,29 @@ static int do_show_tree(int argc, char **argv)
 	char *cgroup_root;
 	int ret;
 
-	switch (argc) {
-	case 0:
+	query_flags = 0;
+
+	if (!argc) {
 		cgroup_root = find_cgroup_root();
 		if (!cgroup_root) {
 			p_err("cgroup v2 isn't mounted");
 			return -1;
 		}
-		break;
-	case 1:
-		cgroup_root = argv[0];
-		break;
-	default:
-		p_err("too many parameters for cgroup tree");
-		return -1;
+	} else {
+		cgroup_root = GET_ARG();
+
+		while (argc) {
+			if (is_prefix(*argv, "effective")) {
+				query_flags |= BPF_F_QUERY_EFFECTIVE;
+				NEXT_ARG();
+			} else {
+				p_err("expected no more arguments, 'effective', got: '%s'?",
+				      *argv);
+				return -1;
+			}
+		}
 	}
 
-
 	if (json_output)
 		jsonw_start_array(json_wtr);
 	else
@@ -459,8 +478,8 @@ static int do_help(int argc, char **argv)
 	}
 
 	fprintf(stderr,
-		"Usage: %s %s { show | list } CGROUP\n"
-		"       %s %s tree [CGROUP_ROOT]\n"
+		"Usage: %s %s { show | list } CGROUP [**effective**]\n"
+		"       %s %s tree [CGROUP_ROOT] [**effective**]\n"
 		"       %s %s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
 		"       %s %s detach CGROUP ATTACH_TYPE PROG\n"
 		"       %s %s help\n"
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-29 21:33 UTC (permalink / raw)
  To: Jose Abreu, Robin Murphy, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas,
	Will Deacon
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller
In-Reply-To: <MN2PR12MB3279ABF628C52883021123C5D3DD0@MN2PR12MB3279.namprd12.prod.outlook.com>


On 29/07/2019 15:08, Jose Abreu wrote:

...

>>> Hi Catalin and Will,
>>>
>>> Sorry to add you in such a long thread but we are seeing a DMA issue
>>> with stmmac driver in an ARM64 platform with IOMMU enabled.
>>>
>>> The issue seems to be solved when buffers allocation for DMA based
>>> transfers are *not* mapped with the DMA_ATTR_SKIP_CPU_SYNC flag *OR*
>>> when IOMMU is disabled.
>>>
>>> Notice that after transfer is done we do use
>>> dma_sync_single_for_{cpu,device} and then we reuse *the same* page for
>>> another transfer.
>>>
>>> Can you please comment on whether DMA_ATTR_SKIP_CPU_SYNC can not be used
>>> in ARM64 platforms with IOMMU ?
>>
>> In terms of what they do, there should be no difference on arm64 between:
>>
>> dma_map_page(..., dir);
>> ...
>> dma_unmap_page(..., dir);
>>
>> and:
>>
>> dma_map_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);
>> dma_sync_single_for_device(..., dir);
>> ...
>> dma_sync_single_for_cpu(..., dir);
>> dma_unmap_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);
>>
>> provided that the first sync covers the whole buffer and any subsequent 
>> ones cover at least the parts of the buffer which may have changed. Plus 
>> for coherent hardware it's entirely moot either way.
> 
> Thanks for confirming. That's indeed what stmmac is doing when buffer is 
> received by syncing the packet size to CPU.
> 
>>
>> Given Jon's previous findings, I would lean towards the idea that 
>> performing the extra (redundant) cache maintenance plus barrier in 
>> dma_unmap is mostly just perturbing timing in the same way as the debug 
>> print which also made things seem OK.
> 
> Mikko said that Tegra186 is not coherent so we have to explicit flush 
> pipeline but I don't understand why sync_single() is not doing it ...
> 
> Jon, can you please remove *all* debug prints, hacks, etc ... and test 
> this one in attach with plain -net tree ?

So far I have just been testing on the mainline kernel branch. The issue
still persists after applying this on mainline. I can test on the -net
tree, but I am not sure that will make a difference.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH net] net/mlx5e: Fix unnecessary flow_block_cb_is_busy call
From: Saeed Mahameed @ 2019-07-29 21:30 UTC (permalink / raw)
  To: wenxu@ucloud.cn, netdev@vger.kernel.org
In-Reply-To: <1564239595-23786-1-git-send-email-wenxu@ucloud.cn>

On Sat, 2019-07-27 at 22:59 +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> When call flow_block_cb_is_busy. The indr_priv is guaranteed to
> NULL ptr. So there is no need to call flow_bock_cb_is_busy.
> 
> Fixes: 0d4fd02e7199 ("net: flow_offload: add flow_block_cb_is_busy()
> and use it")
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Applied to net-next-mlx5 branch.

Thanks,
Saeed.

^ permalink raw reply

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jakub Kicinski @ 2019-07-29 21:25 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <20190729142211.43b5ccd8@cakuba.netronome.com>

On Mon, 29 Jul 2019 14:22:11 -0700, Jakub Kicinski wrote:
> > > I realize you're following the existing code, but should we perhaps 
> > > use
> > > the latest kdoc syntax? '()' after function name, and args should have
> > > '@' prefix, '%' would be for constants.    
> > 
> > That would be a task for a different cleanup.  Not that I disagree with 
> > you, but there's also nothing worse than mixing styles in the same file.  
> 
> Funny you should say that given that (a) I'm commenting on the new code
> you're adding, and (b) you did do an unrelated spelling fix above ;)

Ah, sorry I misread your comment there.

Some code already uses '()' in this file, as for the '%' skb_frag_
functions are the only one which have this mistake, the rest of kdoc 
is correct.

^ permalink raw reply

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: David Miller @ 2019-07-29 21:25 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: jonathan.lemon, kernel-team, netdev, willy
In-Reply-To: <20190729142211.43b5ccd8@cakuba.netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon, 29 Jul 2019 14:22:11 -0700

> There is no need to ponder the connotations of verbs. Please just 
> look at other function names in skbuff.h, especially those which 
> copy fields :)
> 
> static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
> static inline void skb_copy_secmark(struct sk_buff *to, const struct sk_buff *from)
> static inline void skb_copy_queue_mapping(struct sk_buff *to, const struct sk_buff *from)
> static inline void skb_ext_copy(struct sk_buff *dst, const struct sk_buff *src)

I have to agree :-)

^ permalink raw reply

* linux-next: Fixes tag needs some work in the net-next tree
From: Stephen Rothwell @ 2019-07-29 21:25 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Oliver Hartkopp

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

Hi all,

In commit

  473d924d7d46 ("can: fix ioctl function removal")

Fixes tag

  Fixes: 60649d4e0af ("can: remove obsolete empty ioctl() handler")

has these problem(s):

  - SHA1 should be at least 12 digits long
    Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
    or later) just making sure it is not set (or set to "auto").

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 00/16] bnxt_en: Add TPA (GRO_HW and LRO) on 57500 chips.
From: David Miller @ 2019-07-29 21:24 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev
In-Reply-To: <1564395033-19511-1-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Mon, 29 Jul 2019 06:10:17 -0400

> This patchset adds TPA v2 support on the 57500 chips.  TPA v2 is
> different from the legacy TPA scheme on older chips and requires major
> refactoring and restructuring of the existing TPA logic.  The main
> difference is that the new TPA v2 has on-the-fly aggregation buffer
> completions before a TPA packet is completed.  The larger aggregation
> ID space also requires a new ID mapping logic to make it more
> memory efficient.

Series applied, but please explain something to me.

I thought initially while reviewing this that patch #5 makes the series
non-bisectable because this only includes the logic that appends new
entries to the agg array but lacks the changes to reset the agg count
at TPE end time (which occurs in patch #8).

However I then realized that you haven't turned on the logic yet that
can result in CMP_TYPE_RX_TPA_AGG_CMP entries in this context.

Am I right?

^ permalink raw reply

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jakub Kicinski @ 2019-07-29 21:22 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <932D725D-62F1-47D6-807A-45F81E66B1C6@gmail.com>

On Mon, 29 Jul 2019 14:02:21 -0700, Jonathan Lemon wrote:
> On 29 Jul 2019, at 13:50, Jakub Kicinski wrote:
> > On Mon, 29 Jul 2019 10:19:39 -0700, Jonathan Lemon wrote:  
> >> Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
> >> and skb_frag_off_set_from() accessors for page_offset.
> >>
> >> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 718742b1c505..7d94a78067ee 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -331,7 +331,7 @@ static inline void skb_frag_size_set(skb_frag_t 
> >> *frag, unsigned int size)
> >>  }
> >>
> >>  /**
> >> - * skb_frag_size_add - Incrementes the size of a skb fragment by 
> >> %delta
> >> + * skb_frag_size_add - Increments the size of a skb fragment by 
> >> %delta
> >>   * @frag: skb fragment
> >>   * @delta: value to add
> >>   */
> >> @@ -2857,6 +2857,46 @@ static inline void 
> >> skb_propagate_pfmemalloc(struct page *page,
> >>  		skb->pfmemalloc = true;
> >>  }
> >>
> >> +/**
> >> + * skb_frag_off - Returns the offset of a skb fragment
> >> + * @frag: the paged fragment
> >> + */
> >> +static inline unsigned int skb_frag_off(const skb_frag_t *frag)
> >> +{
> >> +	return frag->page_offset;
> >> +}
> >> +
> >> +/**
> >> + * skb_frag_off_add - Increments the offset of a skb fragment by 
> >> %delta  
> >
> > I realize you're following the existing code, but should we perhaps 
> > use
> > the latest kdoc syntax? '()' after function name, and args should have
> > '@' prefix, '%' would be for constants.  
> 
> That would be a task for a different cleanup.  Not that I disagree with 
> you, but there's also nothing worse than mixing styles in the same file.

Funny you should say that given that (a) I'm commenting on the new code
you're adding, and (b) you did do an unrelated spelling fix above ;)

> >> + * @frag: skb fragment
> >> + * @delta: value to add
> >> + */
> >> +static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
> >> +{
> >> +	frag->page_offset += delta;
> >> +}
> >> +
> >> +/**
> >> + * skb_frag_off_set - Sets the offset of a skb fragment
> >> + * @frag: skb fragment
> >> + * @offset: offset of fragment
> >> + */
> >> +static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int 
> >> offset)
> >> +{
> >> +	frag->page_offset = offset;
> >> +}
> >> +
> >> +/**
> >> + * skb_frag_off_set_from - Sets the offset of a skb fragment from 
> >> another fragment
> >> + * @fragto: skb fragment where offset is set
> >> + * @fragfrom: skb fragment offset is copied from
> >> + */
> >> +static inline void skb_frag_off_set_from(skb_frag_t *fragto,
> >> +					 const skb_frag_t *fragfrom)  
> >
> > skb_frag_off_copy() ?  
> 
> That was my initial inclination, but due to the often overloaded
> connotations of the word "copy", opted to use the same "set" verbiage
> that existed in the other functions.

There is no need to ponder the connotations of verbs. Please just 
look at other function names in skbuff.h, especially those which 
copy fields :)

static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
static inline void skb_copy_secmark(struct sk_buff *to, const struct sk_buff *from)
static inline void skb_copy_queue_mapping(struct sk_buff *to, const struct sk_buff *from)
static inline void skb_ext_copy(struct sk_buff *dst, const struct sk_buff *src)

^ permalink raw reply

* Re: [PATCH bpf-next 10/10] selftests/bpf: add CO-RE relocs ints tests
From: Song Liu @ 2019-07-29 21:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-11-andriin@fb.com>

On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Add various tests validating handling compatible/incompatible integer
> types.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [PATCH net] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Miller @ 2019-07-29 21:17 UTC (permalink / raw)
  To: suyj.fnst; +Cc: kuznet, yoshfuji, netdev, dsahern
In-Reply-To: <1564368591-42301-1-git-send-email-suyj.fnst@cn.fujitsu.com>

From: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Date: Mon, 29 Jul 2019 10:49:51 +0800

> When we send mpls packets and the interface only has a
> manual global ipv6 address, then the two hosts cant communicate.
> I find that in ndisc_send_ns it only tries to get a ll address.
> In my case, the executive path is as below.
> ip6_output
>  ->ip6_finish_output
>   ->lwtunnel_xmit
>    ->mpls_xmit
>     ->neigh_resolve_output
>      ->neigh_probe
>       ->ndisc_solicit
>        ->ndisc_send_ns
> 
> In RFC4861, 7.2.2 says
> "If the source address of the packet prompting the solicitation is the
> same as one of the addresses assigned to the outgoing interface, that
> address SHOULD be placed in the IP Source Address of the outgoing
> solicitation.  Otherwise, any one of the addresses assigned to the
> interface should be used."
> 
> In this patch we try get a global address if we get ll address failed.
> 
> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>

David, can you take a quick look at this?

Thank you.

^ permalink raw reply

* Re: ip route JSON format is unparseable for "unreachable" routes
From: David Ahern @ 2019-07-29 21:16 UTC (permalink / raw)
  To: Michael Ziegler, netdev
In-Reply-To: <6e88311b-5edc-4c62-1581-0f5b160a5f4e@michaelziegler.name>

On 7/28/19 5:09 AM, Michael Ziegler wrote:
> Hi,
> 
> I created a couple "unreachable" routes on one of my systems, like such:
> 
>> ip route add unreachable 10.0.0.0/8     metric 255
>> ip route add unreachable 192.168.0.0/16 metric 255
> 
> Unfortunately this results in unparseable JSON output from "ip":
> 
>> # ip -j route show  | jq .
>> parse error: Objects must consist of key:value pairs at line 1, column 84
> 
> The offending JSON objects are these:
> 
>> {"unreachable","dst":"10.0.0.0/8","metric":255,"flags":[]}
>> {"unreachable","dst":"192.168.0.0/16","metric":255,"flags":[]}
> "unreachable" cannot appear on its own here, it needs to be some kind of
> field.
> 
> The manpage says to report here, thus I do :) I've searched the
> archives, but I wasn't able to find any existing bug reports about this.
> I'm running version
> 

actually that was fixed by:

073661773872 ip route: print route type in JSON output

^ permalink raw reply

* Re: [PATCH bpf-next 09/10] selftest/bpf: add CO-RE relocs ptr-as-array tests
From: Song Liu @ 2019-07-29 21:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-10-andriin@fb.com>

On Wed, Jul 24, 2019 at 12:31 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Add test validating correct relocation handling for cases where pointer
> to something is used as an array. E.g.:
>
>   int *ptr = ...;
>   int x = ptr[42];
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* [PATCH mlx5-next 11/11] net/mlx5: E-switch, Tide up eswitch config sequence
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized in below flow.

mlx5_eswitch_enable()
  esw_offloads_init()
    esw_offloads_load_all_reps()
[..]
esw_enable_vport()

However for VFs, vports are initialized before creating their
respective netdev represnetors in event handling context.

Similarly while disabling eswitch, first hardware vports are disabled,
followed by destroying their representors.
Here while underlying vports gets destroyed but its respective user
facing netdevice can still exist on which user can continue to perform
more offload operations.

Instead, its more accurate to do
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport

disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 54 +++++++++++--------
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  4 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     |  8 ++-
 3 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 90d150be237b..d4465dd18c11 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -452,6 +452,22 @@ static int esw_create_legacy_table(struct mlx5_eswitch *esw)
 	return err;
 }
 
+#define MLX5_LEGACY_SRIOV_VPORT_EVENTS (MLX5_VPORT_UC_ADDR_CHANGE | \
+					MLX5_VPORT_MC_ADDR_CHANGE | \
+					MLX5_VPORT_PROMISC_CHANGE)
+
+static int esw_legacy_enable(struct mlx5_eswitch *esw)
+{
+	int ret;
+
+	ret = esw_create_legacy_table(esw);
+	if (ret)
+		return ret;
+
+	mlx5_eswitch_enable_pf_vf_vports(esw, MLX5_LEGACY_SRIOV_VPORT_EVENTS);
+	return 0;
+}
+
 static void esw_destroy_legacy_table(struct mlx5_eswitch *esw)
 {
 	esw_cleanup_vepa_rules(esw);
@@ -459,6 +475,19 @@ static void esw_destroy_legacy_table(struct mlx5_eswitch *esw)
 	esw_destroy_legacy_vepa_table(esw);
 }
 
+static void esw_legacy_disable(struct mlx5_eswitch *esw)
+{
+	struct esw_mc_addr *mc_promisc;
+
+	mlx5_eswitch_disable_pf_vf_vports(esw);
+
+	mc_promisc = &esw->mc_promisc;
+	if (mc_promisc->uplink_rule)
+		mlx5_del_flow_rules(mc_promisc->uplink_rule);
+
+	esw_destroy_legacy_table(esw);
+}
+
 /* E-Switch vport UC/MC lists management */
 typedef int (*vport_addr_action)(struct mlx5_eswitch *esw,
 				 struct vport_addr *vaddr);
@@ -1826,13 +1855,8 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
 		esw_disable_vport(esw, vport);
 }
 
-#define MLX5_LEGACY_SRIOV_VPORT_EVENTS (MLX5_VPORT_UC_ADDR_CHANGE | \
-					MLX5_VPORT_MC_ADDR_CHANGE | \
-					MLX5_VPORT_PROMISC_CHANGE)
-
 int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 {
-	int enabled_events;
 	int err;
 
 	if (!ESW_ALLOWED(esw) ||
@@ -1854,21 +1878,16 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 	mlx5_lag_update(esw->dev);
 
 	if (mode == MLX5_ESWITCH_LEGACY) {
-		err = esw_create_legacy_table(esw);
+		err = esw_legacy_enable(esw);
 	} else {
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_ETH);
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
-		err = esw_offloads_init(esw);
+		err = esw_offloads_enable(esw);
 	}
 
 	if (err)
 		goto abort;
 
-	enabled_events = (mode == MLX5_ESWITCH_LEGACY) ? MLX5_LEGACY_SRIOV_VPORT_EVENTS :
-		MLX5_VPORT_UC_ADDR_CHANGE;
-
-	mlx5_eswitch_enable_pf_vf_vports(esw, enabled_events);
-
 	mlx5_eswitch_event_handlers_register(esw);
 
 	esw_info(esw->dev, "Enable: mode(%s), nvfs(%d), active vports(%d)\n",
@@ -1890,7 +1909,6 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 
 void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 {
-	struct esw_mc_addr *mc_promisc;
 	int old_mode;
 
 	if (!ESW_ALLOWED(esw) || esw->mode == MLX5_ESWITCH_NONE)
@@ -1902,16 +1920,10 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 
 	mlx5_eswitch_event_handlers_unregister(esw);
 
-	mlx5_eswitch_disable_pf_vf_vports(esw);
-
-	mc_promisc = &esw->mc_promisc;
-	if (mc_promisc->uplink_rule)
-		mlx5_del_flow_rules(mc_promisc->uplink_rule);
-
 	if (esw->mode == MLX5_ESWITCH_LEGACY)
-		esw_destroy_legacy_table(esw);
+		esw_legacy_disable(esw);
 	else if (esw->mode == MLX5_ESWITCH_OFFLOADS)
-		esw_offloads_cleanup(esw);
+		esw_offloads_disable(esw);
 
 	esw_destroy_tsar(esw);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 51b6d29466f1..d447e1e44d59 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -242,8 +242,8 @@ struct mlx5_eswitch {
 	struct mlx5_esw_functions esw_funcs;
 };
 
-void esw_offloads_cleanup(struct mlx5_eswitch *esw);
-int esw_offloads_init(struct mlx5_eswitch *esw);
+void esw_offloads_disable(struct mlx5_eswitch *esw);
+int esw_offloads_enable(struct mlx5_eswitch *esw);
 void esw_offloads_cleanup_reps(struct mlx5_eswitch *esw);
 int esw_offloads_init_reps(struct mlx5_eswitch *esw);
 void esw_vport_cleanup_ingress_rules(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 4be19890f725..db01b8ee9385 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2104,7 +2104,7 @@ int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type
 	return NOTIFY_OK;
 }
 
-int esw_offloads_init(struct mlx5_eswitch *esw)
+int esw_offloads_enable(struct mlx5_eswitch *esw)
 {
 	int err;
 
@@ -2122,6 +2122,8 @@ int esw_offloads_init(struct mlx5_eswitch *esw)
 	if (err)
 		goto err_vport_metadata;
 
+	mlx5_eswitch_enable_pf_vf_vports(esw, MLX5_VPORT_UC_ADDR_CHANGE);
+
 	err = esw_offloads_load_all_reps(esw);
 	if (err)
 		goto err_reps;
@@ -2134,6 +2136,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw)
 	return 0;
 
 err_reps:
+	mlx5_eswitch_disable_pf_vf_vports(esw);
 	esw_set_passing_vport_metadata(esw, false);
 err_vport_metadata:
 	esw_offloads_steering_cleanup(esw);
@@ -2159,11 +2162,12 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
 	return err;
 }
 
-void esw_offloads_cleanup(struct mlx5_eswitch *esw)
+void esw_offloads_disable(struct mlx5_eswitch *esw)
 {
 	mlx5_rdma_disable_roce(esw->dev);
 	esw_offloads_devcom_cleanup(esw);
 	esw_offloads_unload_all_reps(esw);
+	mlx5_eswitch_disable_pf_vf_vports(esw);
 	esw_set_passing_vport_metadata(esw, false);
 	esw_offloads_steering_cleanup(esw);
 	esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 08/11] net/mlx5: E-switch, Introduce helper function to enable/disable vports
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

vports needs to be enabled in switchdev and legacy mode.

In switchdev mode, vports should be enabled after initializing
the FDB tables and before creating their represntors so that
representor works on an initialized vport object.

Prepare a helper function which can be called when enabling either of
the eswitch modes.

Similarly, have disable vports helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 96 +++++++++++--------
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 19 +++-
 2 files changed, 73 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 820970911f8b..6d82aefae6e1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -58,20 +58,9 @@ struct vport_addr {
 	bool mc_promisc;
 };
 
-enum {
-	UC_ADDR_CHANGE = BIT(0),
-	MC_ADDR_CHANGE = BIT(1),
-	PROMISC_CHANGE = BIT(3),
-};
-
 static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw);
 static void esw_cleanup_vepa_rules(struct mlx5_eswitch *esw);
 
-/* Vport context events */
-#define SRIOV_VPORT_EVENTS (UC_ADDR_CHANGE | \
-			    MC_ADDR_CHANGE | \
-			    PROMISC_CHANGE)
-
 struct mlx5_vport *__must_check
 mlx5_eswitch_get_vport(struct mlx5_eswitch *esw, u16 vport_num)
 {
@@ -108,13 +97,13 @@ static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
 
 	MLX5_SET(nic_vport_context, nic_vport_ctx, arm_change_event, 1);
 
-	if (events_mask & UC_ADDR_CHANGE)
+	if (events_mask & MLX5_VPORT_UC_ADDR_CHANGE)
 		MLX5_SET(nic_vport_context, nic_vport_ctx,
 			 event_on_uc_address_change, 1);
-	if (events_mask & MC_ADDR_CHANGE)
+	if (events_mask & MLX5_VPORT_MC_ADDR_CHANGE)
 		MLX5_SET(nic_vport_context, nic_vport_ctx,
 			 event_on_mc_address_change, 1);
-	if (events_mask & PROMISC_CHANGE)
+	if (events_mask & MLX5_VPORT_PROMISC_CHANGE)
 		MLX5_SET(nic_vport_context, nic_vport_ctx,
 			 event_on_promisc_change, 1);
 
@@ -901,21 +890,21 @@ static void esw_vport_change_handle_locked(struct mlx5_vport *vport)
 	esw_debug(dev, "vport[%d] Context Changed: perm mac: %pM\n",
 		  vport->vport, mac);
 
-	if (vport->enabled_events & UC_ADDR_CHANGE) {
+	if (vport->enabled_events & MLX5_VPORT_UC_ADDR_CHANGE) {
 		esw_update_vport_addr_list(esw, vport, MLX5_NVPRT_LIST_TYPE_UC);
 		esw_apply_vport_addr_list(esw, vport, MLX5_NVPRT_LIST_TYPE_UC);
 	}
 
-	if (vport->enabled_events & MC_ADDR_CHANGE)
+	if (vport->enabled_events & MLX5_VPORT_MC_ADDR_CHANGE)
 		esw_update_vport_addr_list(esw, vport, MLX5_NVPRT_LIST_TYPE_MC);
 
-	if (vport->enabled_events & PROMISC_CHANGE) {
+	if (vport->enabled_events & MLX5_VPORT_PROMISC_CHANGE) {
 		esw_update_vport_rx_mode(esw, vport);
 		if (!IS_ERR_OR_NULL(vport->allmulti_rule))
 			esw_update_vport_mc_promisc(esw, vport);
 	}
 
-	if (vport->enabled_events & (PROMISC_CHANGE | MC_ADDR_CHANGE))
+	if (vport->enabled_events & (MLX5_VPORT_PROMISC_CHANGE | MLX5_VPORT_MC_ADDR_CHANGE))
 		esw_apply_vport_addr_list(esw, vport, MLX5_NVPRT_LIST_TYPE_MC);
 
 	esw_debug(esw->dev, "vport[%d] Context Changed: Done\n", vport->vport);
@@ -1649,7 +1638,7 @@ static void esw_vport_destroy_drop_counters(struct mlx5_vport *vport)
 }
 
 static void esw_enable_vport(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
-			     int enable_events)
+			     enum mlx5_eswitch_vport_event enabled_events)
 {
 	u16 vport_num = vport->vport;
 
@@ -1671,7 +1660,7 @@ static void esw_enable_vport(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 		esw_warn(esw->dev, "Failed to attach vport %d to eswitch rate limiter", vport_num);
 
 	/* Sync with current vport context */
-	vport->enabled_events = enable_events;
+	vport->enabled_events = enabled_events;
 	vport->enabled = true;
 
 	/* Esw manager is trusted by default. Host PF (vport 0) is trusted as well
@@ -1800,11 +1789,51 @@ static void mlx5_eswitch_event_handlers_unregister(struct mlx5_eswitch *esw)
 /* Public E-Switch API */
 #define ESW_ALLOWED(esw) ((esw) && MLX5_ESWITCH_MANAGER((esw)->dev))
 
-int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
+/* mlx5_eswitch_enable_pf_vf_vports() enables vports of PF, ECPF and VFs
+ * whichever are present on the eswitch.
+ */
+void
+mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
+				 enum mlx5_eswitch_vport_event enabled_events)
+{
+	struct mlx5_vport *vport;
+	int i;
+
+	/* Enable PF vport */
+	vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
+	esw_enable_vport(esw, vport, enabled_events);
+
+	/* Enable ECPF vports */
+	if (mlx5_ecpf_vport_exists(esw->dev)) {
+		vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF);
+		esw_enable_vport(esw, vport, enabled_events);
+	}
+
+	/* Enable VF vports */
+	mlx5_esw_for_each_vf_vport(esw, i, vport, esw->esw_funcs.num_vfs)
+		esw_enable_vport(esw, vport, enabled_events);
+}
+
+/* mlx5_eswitch_disable_pf_vf_vports() disables vports of PF, ECPF and VFs
+ * whichever are previously enabled on the eswitch.
+ */
+void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
 {
 	struct mlx5_vport *vport;
+	int i;
+
+	mlx5_esw_for_all_vports_reverse(esw, i, vport)
+		esw_disable_vport(esw, vport);
+}
+
+#define MLX5_LEGACY_SRIOV_VPORT_EVENTS (MLX5_VPORT_UC_ADDR_CHANGE | \
+					MLX5_VPORT_MC_ADDR_CHANGE | \
+					MLX5_VPORT_PROMISC_CHANGE)
+
+int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
+{
+	int enabled_events;
 	int err;
-	int i, enabled_events;
 
 	if (!ESW_ALLOWED(esw) ||
 	    !MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ft_support)) {
@@ -1837,22 +1866,10 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 	if (err)
 		goto abort;
 
-	enabled_events = (mode == MLX5_ESWITCH_LEGACY) ? SRIOV_VPORT_EVENTS :
-		UC_ADDR_CHANGE;
+	enabled_events = (mode == MLX5_ESWITCH_LEGACY) ? MLX5_LEGACY_SRIOV_VPORT_EVENTS :
+		MLX5_VPORT_UC_ADDR_CHANGE;
 
-	/* Enable PF vport */
-	vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
-	esw_enable_vport(esw, vport, enabled_events);
-
-	/* Enable ECPF vports */
-	if (mlx5_ecpf_vport_exists(esw->dev)) {
-		vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_ECPF);
-		esw_enable_vport(esw, vport, enabled_events);
-	}
-
-	/* Enable VF vports */
-	mlx5_esw_for_each_vf_vport(esw, i, vport, esw->esw_funcs.num_vfs)
-		esw_enable_vport(esw, vport, enabled_events);
+	mlx5_eswitch_enable_pf_vf_vports(esw, enabled_events);
 
 	mlx5_eswitch_event_handlers_register(esw);
 
@@ -1876,9 +1893,7 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 {
 	struct esw_mc_addr *mc_promisc;
-	struct mlx5_vport *vport;
 	int old_mode;
-	int i;
 
 	if (!ESW_ALLOWED(esw) || esw->mode == MLX5_ESWITCH_NONE)
 		return;
@@ -1890,8 +1905,7 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 	mc_promisc = &esw->mc_promisc;
 	mlx5_eswitch_event_handlers_unregister(esw);
 
-	mlx5_esw_for_all_vports(esw, i, vport)
-		esw_disable_vport(esw, vport);
+	mlx5_eswitch_disable_pf_vf_vports(esw);
 
 	if (mc_promisc && mc_promisc->uplink_rule)
 		mlx5_del_flow_rules(mc_promisc->uplink_rule);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a38e8a3c7c9a..3103a34c619c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -101,6 +101,13 @@ struct mlx5_vport_info {
 	bool                    trusted;
 };
 
+/* Vport context events */
+enum mlx5_eswitch_vport_event {
+	MLX5_VPORT_UC_ADDR_CHANGE = BIT(0),
+	MLX5_VPORT_MC_ADDR_CHANGE = BIT(1),
+	MLX5_VPORT_PROMISC_CHANGE = BIT(3),
+};
+
 struct mlx5_vport {
 	struct mlx5_core_dev    *dev;
 	int                     vport;
@@ -122,7 +129,7 @@ struct mlx5_vport {
 	} qos;
 
 	bool                    enabled;
-	u16                     enabled_events;
+	enum mlx5_eswitch_vport_event enabled_events;
 };
 
 enum offloads_fdb_flags {
@@ -513,6 +520,11 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw);
 	     (vport) = &(esw)->vports[i],		\
 	     (i) < (esw)->total_vports; (i)++)
 
+#define mlx5_esw_for_all_vports_reverse(esw, i, vport)	\
+	for ((i) = (esw)->total_vports - 1;		\
+	     (vport) = &(esw)->vports[i],		\
+	     (i) >= MLX5_VPORT_PF; (i)--)
+
 #define mlx5_esw_for_each_vf_vport(esw, i, vport, nvfs)	\
 	for ((i) = MLX5_VPORT_FIRST_VF;			\
 	     (vport) = &(esw)->vports[(i)],		\
@@ -574,6 +586,11 @@ bool mlx5_eswitch_is_vf_vport(const struct mlx5_eswitch *esw, u16 vport_num);
 void mlx5_eswitch_update_num_of_vfs(struct mlx5_eswitch *esw, const int num_vfs);
 int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type, void *data);
 
+void
+mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
+				 enum mlx5_eswitch_vport_event enabled_events);
+void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw);
+
 #else  /* CONFIG_MLX5_ESWITCH */
 /* eswitch API stubs */
 static inline int  mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 10/11] net/mlx5: E-Switch, Remove redundant mc_promisc NULL check
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

mc_promisc pointer points to an instance of struct esw_mc_addr allocated
as part of the esw structure.
Hence it cannot be NULL.
Removed such redundant check and assign where it is actually used.

While at it, add comment around legacy mode fields and move mc_promisc
close to other legacy mode structures to improve code redability.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 17fb982b3489..90d150be237b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1900,12 +1900,12 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 		 esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
 		 esw->esw_funcs.num_vfs, esw->enabled_vports);
 
-	mc_promisc = &esw->mc_promisc;
 	mlx5_eswitch_event_handlers_unregister(esw);
 
 	mlx5_eswitch_disable_pf_vf_vports(esw);
 
-	if (mc_promisc && mc_promisc->uplink_rule)
+	mc_promisc = &esw->mc_promisc;
+	if (mc_promisc->uplink_rule)
 		mlx5_del_flow_rules(mc_promisc->uplink_rule);
 
 	if (esw->mode == MLX5_ESWITCH_LEGACY)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 3103a34c619c..51b6d29466f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -214,8 +214,11 @@ enum {
 struct mlx5_eswitch {
 	struct mlx5_core_dev    *dev;
 	struct mlx5_nb          nb;
+	/* legacy data structures */
 	struct mlx5_eswitch_fdb fdb_table;
 	struct hlist_head       mc_table[MLX5_L2_ADDR_HASH_SIZE];
+	struct esw_mc_addr mc_promisc;
+	/* end of legacy */
 	struct workqueue_struct *work_queue;
 	struct mlx5_vport       *vports;
 	u32 flags;
@@ -225,7 +228,6 @@ struct mlx5_eswitch {
 	 * and async SRIOV admin state changes
 	 */
 	struct mutex            state_lock;
-	struct esw_mc_addr	mc_promisc;
 
 	struct {
 		bool            enabled;
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 09/11] net/mlx5: E-Switch, remove redundant error handling
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Roi Dayan
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

We don't need to handle error flow of esw_create_legacy_table() in the
same branch, it is already being handled directly after the if statement,
for both legacy and switchdev modes in one place.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6d82aefae6e1..17fb982b3489 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1855,8 +1855,6 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 
 	if (mode == MLX5_ESWITCH_LEGACY) {
 		err = esw_create_legacy_table(esw);
-		if (err)
-			goto abort;
 	} else {
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_ETH);
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 07/11] net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

First enable TSAR Qos hardware block in device before enabling its
user vports.

This refactor is needed so that vports can be enabled before their
representor netdevice can be created.

While at it, esw_create_tsar() returns error code which was used only to
print error. However esw_create_tsar() already prints warning if it hits
an error.
Hence, remove the redundant warning.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 21 ++++++++-----------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 2927fa1da92f..820970911f8b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1415,7 +1415,7 @@ static bool element_type_supported(struct mlx5_eswitch *esw, int type)
 }
 
 /* Vport QoS management */
-static int esw_create_tsar(struct mlx5_eswitch *esw)
+static void esw_create_tsar(struct mlx5_eswitch *esw)
 {
 	u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0};
 	struct mlx5_core_dev *dev = esw->dev;
@@ -1423,13 +1423,13 @@ static int esw_create_tsar(struct mlx5_eswitch *esw)
 	int err;
 
 	if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling))
-		return 0;
+		return;
 
 	if (!element_type_supported(esw, SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR))
-		return 0;
+		return;
 
 	if (esw->qos.enabled)
-		return -EEXIST;
+		return;
 
 	MLX5_SET(scheduling_context, tsar_ctx, element_type,
 		 SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
@@ -1443,11 +1443,10 @@ static int esw_create_tsar(struct mlx5_eswitch *esw)
 						 &esw->qos.root_tsar_id);
 	if (err) {
 		esw_warn(esw->dev, "E-Switch create TSAR failed (%d)\n", err);
-		return err;
+		return;
 	}
 
 	esw->qos.enabled = true;
-	return 0;
 }
 
 static void esw_destroy_tsar(struct mlx5_eswitch *esw)
@@ -1819,6 +1818,8 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 	if (!MLX5_CAP_ESW_EGRESS_ACL(esw->dev, ft_support))
 		esw_warn(esw->dev, "engress ACL is not supported by FW\n");
 
+	esw_create_tsar(esw);
+
 	esw->mode = mode;
 
 	mlx5_lag_update(esw->dev);
@@ -1836,10 +1837,6 @@ int mlx5_eswitch_enable(struct mlx5_eswitch *esw, int mode)
 	if (err)
 		goto abort;
 
-	err = esw_create_tsar(esw);
-	if (err)
-		esw_warn(esw->dev, "Failed to create eswitch TSAR");
-
 	enabled_events = (mode == MLX5_ESWITCH_LEGACY) ? SRIOV_VPORT_EVENTS :
 		UC_ADDR_CHANGE;
 
@@ -1899,13 +1896,13 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
 	if (mc_promisc && mc_promisc->uplink_rule)
 		mlx5_del_flow_rules(mc_promisc->uplink_rule);
 
-	esw_destroy_tsar(esw);
-
 	if (esw->mode == MLX5_ESWITCH_LEGACY)
 		esw_destroy_legacy_table(esw);
 	else if (esw->mode == MLX5_ESWITCH_OFFLOADS)
 		esw_offloads_cleanup(esw);
 
+	esw_destroy_tsar(esw);
+
 	old_mode = esw->mode;
 	esw->mode = MLX5_ESWITCH_NONE;
 
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 06/11] net/mlx5: E-switch, Combine metadata enable/disable functionality
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

Except bit toggling code, rest of the code is same to enable/disable
metadata passing functionality.
Hence, combine them to single function and control using enable flag.

Also instead of checking metadata supported at multiple places,
fold into the helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 48 +++++--------------
 1 file changed, 12 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 089ae4d48a82..4be19890f725 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -587,38 +587,15 @@ void mlx5_eswitch_del_send_to_vport_rule(struct mlx5_flow_handle *rule)
 	mlx5_del_flow_rules(rule);
 }
 
-static int mlx5_eswitch_enable_passing_vport_metadata(struct mlx5_eswitch *esw)
+static int esw_set_passing_vport_metadata(struct mlx5_eswitch *esw, bool enable)
 {
 	u32 out[MLX5_ST_SZ_DW(query_esw_vport_context_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)] = {};
 	u8 fdb_to_vport_reg_c_id;
 	int err;
 
-	err = mlx5_eswitch_query_esw_vport_context(esw, esw->manager_vport,
-						   out, sizeof(out));
-	if (err)
-		return err;
-
-	fdb_to_vport_reg_c_id = MLX5_GET(query_esw_vport_context_out, out,
-					 esw_vport_context.fdb_to_vport_reg_c_id);
-
-	fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0;
-	MLX5_SET(modify_esw_vport_context_in, in,
-		 esw_vport_context.fdb_to_vport_reg_c_id, fdb_to_vport_reg_c_id);
-
-	MLX5_SET(modify_esw_vport_context_in, in,
-		 field_select.fdb_to_vport_reg_c_id, 1);
-
-	return mlx5_eswitch_modify_esw_vport_context(esw, esw->manager_vport,
-						     in, sizeof(in));
-}
-
-static int mlx5_eswitch_disable_passing_vport_metadata(struct mlx5_eswitch *esw)
-{
-	u32 out[MLX5_ST_SZ_DW(query_esw_vport_context_out)] = {};
-	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)] = {};
-	u8 fdb_to_vport_reg_c_id;
-	int err;
+	if (!mlx5_eswitch_vport_match_metadata_enabled(esw))
+		return 0;
 
 	err = mlx5_eswitch_query_esw_vport_context(esw, esw->manager_vport,
 						   out, sizeof(out));
@@ -628,7 +605,10 @@ static int mlx5_eswitch_disable_passing_vport_metadata(struct mlx5_eswitch *esw)
 	fdb_to_vport_reg_c_id = MLX5_GET(query_esw_vport_context_out, out,
 					 esw_vport_context.fdb_to_vport_reg_c_id);
 
-	fdb_to_vport_reg_c_id &= ~MLX5_FDB_TO_VPORT_REG_C_0;
+	if (enable)
+		fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0;
+	else
+		fdb_to_vport_reg_c_id &= ~MLX5_FDB_TO_VPORT_REG_C_0;
 
 	MLX5_SET(modify_esw_vport_context_in, in,
 		 esw_vport_context.fdb_to_vport_reg_c_id, fdb_to_vport_reg_c_id);
@@ -2138,11 +2118,9 @@ int esw_offloads_init(struct mlx5_eswitch *esw)
 	if (err)
 		return err;
 
-	if (mlx5_eswitch_vport_match_metadata_enabled(esw)) {
-		err = mlx5_eswitch_enable_passing_vport_metadata(esw);
-		if (err)
-			goto err_vport_metadata;
-	}
+	err = esw_set_passing_vport_metadata(esw, true);
+	if (err)
+		goto err_vport_metadata;
 
 	err = esw_offloads_load_all_reps(esw);
 	if (err)
@@ -2156,8 +2134,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw)
 	return 0;
 
 err_reps:
-	if (mlx5_eswitch_vport_match_metadata_enabled(esw))
-		mlx5_eswitch_disable_passing_vport_metadata(esw);
+	esw_set_passing_vport_metadata(esw, false);
 err_vport_metadata:
 	esw_offloads_steering_cleanup(esw);
 	return err;
@@ -2187,8 +2164,7 @@ void esw_offloads_cleanup(struct mlx5_eswitch *esw)
 	mlx5_rdma_disable_roce(esw->dev);
 	esw_offloads_devcom_cleanup(esw);
 	esw_offloads_unload_all_reps(esw);
-	if (mlx5_eswitch_vport_match_metadata_enabled(esw))
-		mlx5_eswitch_disable_passing_vport_metadata(esw);
+	esw_set_passing_vport_metadata(esw, false);
 	esw_offloads_steering_cleanup(esw);
 	esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 }
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 04/11] net/mlx5: Make load_one() and unload_one() symmetric
From: Saeed Mahameed @ 2019-07-29 21:12 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Parav Pandit
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Parav Pandit <parav@mellanox.com>

Currently mlx5_load_one() perform device registration using
mlx5_register_device(). But mlx5_unload_one() doesn't unregister.

Make them symmetric by doing device unregistration in
mlx5_unload_one().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b15b27a497fc..fa0e991f1983 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1217,8 +1217,10 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
 {
 	int err = 0;
 
-	if (cleanup)
+	if (cleanup) {
+		mlx5_unregister_device(dev);
 		mlx5_drain_health_wq(dev);
+	}
 
 	mutex_lock(&dev->intf_state_mutex);
 	if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) {
@@ -1369,7 +1371,6 @@ static void remove_one(struct pci_dev *pdev)
 
 	mlx5_crdump_disable(dev);
 	mlx5_devlink_unregister(devlink);
-	mlx5_unregister_device(dev);
 
 	if (mlx5_unload_one(dev, true)) {
 		mlx5_core_err(dev, "mlx5_unload_one failed\n");
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 05/11] net/mlx5: E-Switch, Verify support QoS element type
From: Saeed Mahameed @ 2019-07-29 21:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Eli Cohen, Paul Blakey
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Eli Cohen <eli@mellanox.com>

Check if firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the request element type and tsar type.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 31 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                 |  7 +++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 1f3891fde2eb..2927fa1da92f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1393,19 +1393,50 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
 	return err;
 }
 
+static bool element_type_supported(struct mlx5_eswitch *esw, int type)
+{
+	struct mlx5_core_dev *dev = esw->dev = esw->dev;
+
+	switch (type) {
+	case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
+		return MLX5_CAP_QOS(dev, esw_element_type) &
+		       ELEMENT_TYPE_CAP_MASK_TASR;
+	case SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT:
+		return MLX5_CAP_QOS(dev, esw_element_type) &
+		       ELEMENT_TYPE_CAP_MASK_VPORT;
+	case SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC:
+		return MLX5_CAP_QOS(dev, esw_element_type) &
+		       ELEMENT_TYPE_CAP_MASK_VPORT_TC;
+	case SCHEDULING_CONTEXT_ELEMENT_TYPE_PARA_VPORT_TC:
+		return MLX5_CAP_QOS(dev, esw_element_type) &
+		       ELEMENT_TYPE_CAP_MASK_PARA_VPORT_TC;
+	}
+	return false;
+}
+
 /* Vport QoS management */
 static int esw_create_tsar(struct mlx5_eswitch *esw)
 {
 	u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {0};
 	struct mlx5_core_dev *dev = esw->dev;
+	__be32 *attr;
 	int err;
 
 	if (!MLX5_CAP_GEN(dev, qos) || !MLX5_CAP_QOS(dev, esw_scheduling))
 		return 0;
 
+	if (!element_type_supported(esw, SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR))
+		return 0;
+
 	if (esw->qos.enabled)
 		return -EEXIST;
 
+	MLX5_SET(scheduling_context, tsar_ctx, element_type,
+		 SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
+
+	attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes);
+	*attr = cpu_to_be32(TSAR_ELEMENT_TSAR_TYPE_DWRR << 16);
+
 	err = mlx5_create_scheduling_element_cmd(dev,
 						 SCHEDULING_HIERARCHY_E_SWITCH,
 						 tsar_ctx,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 9265c84ad353..30d15e80bcc7 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2957,6 +2957,13 @@ enum {
 	SCHEDULING_CONTEXT_ELEMENT_TYPE_PARA_VPORT_TC = 0x3,
 };
 
+enum {
+	ELEMENT_TYPE_CAP_MASK_TASR		= 1 << 0,
+	ELEMENT_TYPE_CAP_MASK_VPORT		= 1 << 1,
+	ELEMENT_TYPE_CAP_MASK_VPORT_TC		= 1 << 2,
+	ELEMENT_TYPE_CAP_MASK_PARA_VPORT_TC	= 1 << 3,
+};
+
 struct mlx5_ifc_scheduling_context_bits {
 	u8         element_type[0x8];
 	u8         reserved_at_8[0x18];
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 03/11] net/mlx5: Fix offset of tisc bits reserved field
From: Saeed Mahameed @ 2019-07-29 21:12 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

First reserved field is off by one instead of reserved_at_1 it should be
reserved_at_2, fix that.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 196987f14a3f..9265c84ad353 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2782,7 +2782,7 @@ struct mlx5_ifc_traffic_counter_bits {
 struct mlx5_ifc_tisc_bits {
 	u8         strict_lag_tx_port_affinity[0x1];
 	u8         tls_en[0x1];
-	u8         reserved_at_1[0x2];
+	u8         reserved_at_2[0x2];
 	u8         lag_tx_port_affinity[0x04];
 
 	u8         reserved_at_8[0x4];
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 02/11] net/mlx5: Add flow counter bulk allocation hardware bits and command
From: Saeed Mahameed @ 2019-07-29 21:12 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Gavi Teitz, Vlad Buslov
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Gavi Teitz <gavi@mellanox.com>

Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  | 10 ++++++++-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.h  |  3 +++
 include/linux/mlx5/mlx5_ifc.h                 | 21 +++++++++++++++++--
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 51f6972f4c70..b84a225bbe86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -566,7 +566,9 @@ static int mlx5_cmd_delete_fte(struct mlx5_flow_root_namespace *ns,
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
-int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id)
+int mlx5_cmd_fc_bulk_alloc(struct mlx5_core_dev *dev,
+			   enum mlx5_fc_bulk_alloc_bitmask alloc_bitmask,
+			   u32 *id)
 {
 	u32 in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
 	u32 out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
@@ -574,6 +576,7 @@ int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id)
 
 	MLX5_SET(alloc_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, alloc_bitmask);
 
 	err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 	if (!err)
@@ -581,6 +584,11 @@ int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id)
 	return err;
 }
 
+int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id)
+{
+	return mlx5_cmd_fc_bulk_alloc(dev, 0, id);
+}
+
 int mlx5_cmd_fc_free(struct mlx5_core_dev *dev, u32 id)
 {
 	u32 in[MLX5_ST_SZ_DW(dealloc_flow_counter_in)]   = {0};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index db49eabba98d..bc4606306009 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -78,6 +78,9 @@ struct mlx5_flow_cmds {
 };
 
 int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id);
+int mlx5_cmd_fc_bulk_alloc(struct mlx5_core_dev *dev,
+			   enum mlx5_fc_bulk_alloc_bitmask alloc_bitmask,
+			   u32 *id);
 int mlx5_cmd_fc_free(struct mlx5_core_dev *dev, u32 id);
 int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
 		      u64 *packets, u64 *bytes);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index b3d5752657d9..196987f14a3f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1040,6 +1040,21 @@ enum {
 	MLX5_UCTX_CAP_INTERNAL_DEV_RES = 1UL << 1,
 };
 
+#define MLX5_FC_BULK_SIZE_FACTOR 128
+
+enum mlx5_fc_bulk_alloc_bitmask {
+	MLX5_FC_BULK_128   = (1 << 0),
+	MLX5_FC_BULK_256   = (1 << 1),
+	MLX5_FC_BULK_512   = (1 << 2),
+	MLX5_FC_BULK_1024  = (1 << 3),
+	MLX5_FC_BULK_2048  = (1 << 4),
+	MLX5_FC_BULK_4096  = (1 << 5),
+	MLX5_FC_BULK_8192  = (1 << 6),
+	MLX5_FC_BULK_16384 = (1 << 7),
+};
+
+#define MLX5_FC_BULK_NUM_FCS(fc_enum) (MLX5_FC_BULK_SIZE_FACTOR * (fc_enum))
+
 struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_0[0x30];
 	u8         vhca_id[0x10];
@@ -1244,7 +1259,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_2e0[0x7];
 	u8         max_qp_mcg[0x19];
 
-	u8         reserved_at_300[0x18];
+	u8         reserved_at_300[0x10];
+	u8         flow_counter_bulk_alloc[0x8];
 	u8         log_max_mcg[0x8];
 
 	u8         reserved_at_320[0x3];
@@ -7815,7 +7831,8 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x40];
+	u8         reserved_at_40[0x38];
+	u8         flow_counter_bulk[0x8];
 };
 
 struct mlx5_ifc_add_vxlan_udp_dport_out_bits {
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 01/11] net/mlx5: Refactor and optimize flow counter bulk query
From: Saeed Mahameed @ 2019-07-29 21:12 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Gavi Teitz, Vlad Buslov
In-Reply-To: <20190729211209.14772-1-saeedm@mellanox.com>

From: Gavi Teitz <gavi@mellanox.com>

Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.

Additionally, optimize the bulk query process by:
 * Extracting the memory used for the query to mlx5_fc_stats so
   that it is only allocated once, and not for each bulk query.
 * Querying all the counters in one function call.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  |  61 ++-------
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.h  |  13 +-
 .../ethernet/mellanox/mlx5/core/fs_counters.c | 125 ++++++++++--------
 include/linux/mlx5/driver.h                   |   1 +
 4 files changed, 81 insertions(+), 119 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 7ac1249eadc3..51f6972f4c70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -615,67 +615,24 @@ int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
 	return 0;
 }
 
-struct mlx5_cmd_fc_bulk {
-	u32 id;
-	int num;
-	int outlen;
-	u32 out[0];
-};
-
-struct mlx5_cmd_fc_bulk *
-mlx5_cmd_fc_bulk_alloc(struct mlx5_core_dev *dev, u32 id, int num)
+int mlx5_cmd_fc_get_bulk_query_out_len(int bulk_len)
 {
-	struct mlx5_cmd_fc_bulk *b;
-	int outlen =
-		MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-		MLX5_ST_SZ_BYTES(traffic_counter) * num;
-
-	b = kzalloc(sizeof(*b) + outlen, GFP_KERNEL);
-	if (!b)
-		return NULL;
-
-	b->id = id;
-	b->num = num;
-	b->outlen = outlen;
-
-	return b;
+	return MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+		MLX5_ST_SZ_BYTES(traffic_counter) * bulk_len;
 }
 
-void mlx5_cmd_fc_bulk_free(struct mlx5_cmd_fc_bulk *b)
-{
-	kfree(b);
-}
-
-int
-mlx5_cmd_fc_bulk_query(struct mlx5_core_dev *dev, struct mlx5_cmd_fc_bulk *b)
+int mlx5_cmd_fc_bulk_query(struct mlx5_core_dev *dev, u32 base_id, int bulk_len,
+			   u32 *out)
 {
+	int outlen = mlx5_cmd_fc_get_bulk_query_out_len(bulk_len);
 	u32 in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
 
 	MLX5_SET(query_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
 	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
-	MLX5_SET(query_flow_counter_in, in, flow_counter_id, b->id);
-	MLX5_SET(query_flow_counter_in, in, num_of_counters, b->num);
-	return mlx5_cmd_exec(dev, in, sizeof(in), b->out, b->outlen);
-}
-
-void mlx5_cmd_fc_bulk_get(struct mlx5_core_dev *dev,
-			  struct mlx5_cmd_fc_bulk *b, u32 id,
-			  u64 *packets, u64 *bytes)
-{
-	int index = id - b->id;
-	void *stats;
-
-	if (index < 0 || index >= b->num) {
-		mlx5_core_warn(dev, "Flow counter id (0x%x) out of range (0x%x..0x%x). Counter ignored.\n",
-			       id, b->id, b->id + b->num - 1);
-		return;
-	}
-
-	stats = MLX5_ADDR_OF(query_flow_counter_out, b->out,
-			     flow_statistics[index]);
-	*packets = MLX5_GET64(traffic_counter, stats, packets);
-	*bytes = MLX5_GET64(traffic_counter, stats, octets);
+	MLX5_SET(query_flow_counter_in, in, flow_counter_id, base_id);
+	MLX5_SET(query_flow_counter_in, in, num_of_counters, bulk_len);
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
 }
 
 int mlx5_packet_reformat_alloc(struct mlx5_core_dev *dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index e340f9af2f5a..db49eabba98d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -82,16 +82,9 @@ int mlx5_cmd_fc_free(struct mlx5_core_dev *dev, u32 id);
 int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
 		      u64 *packets, u64 *bytes);
 
-struct mlx5_cmd_fc_bulk;
-
-struct mlx5_cmd_fc_bulk *
-mlx5_cmd_fc_bulk_alloc(struct mlx5_core_dev *dev, u32 id, int num);
-void mlx5_cmd_fc_bulk_free(struct mlx5_cmd_fc_bulk *b);
-int
-mlx5_cmd_fc_bulk_query(struct mlx5_core_dev *dev, struct mlx5_cmd_fc_bulk *b);
-void mlx5_cmd_fc_bulk_get(struct mlx5_core_dev *dev,
-			  struct mlx5_cmd_fc_bulk *b, u32 id,
-			  u64 *packets, u64 *bytes);
+int mlx5_cmd_fc_get_bulk_query_out_len(int bulk_len);
+int mlx5_cmd_fc_bulk_query(struct mlx5_core_dev *dev, u32 base_id, int bulk_len,
+			   u32 *out);
 
 const struct mlx5_flow_cmds *mlx5_fs_cmd_get_default(enum fs_flow_table_type type);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index b3762123a69c..067a4b56498b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -75,7 +75,7 @@ struct mlx5_fc {
  * access to counter list:
  * - create (user context)
  *   - mlx5_fc_create() only adds to an addlist to be used by
- *     mlx5_fc_stats_query_work(). addlist is a lockless single linked list
+ *     mlx5_fc_stats_work(). addlist is a lockless single linked list
  *     that doesn't require any additional synchronization when adding single
  *     node.
  *   - spawn thread to do the actual destroy
@@ -136,72 +136,69 @@ static void mlx5_fc_stats_remove(struct mlx5_core_dev *dev,
 	spin_unlock(&fc_stats->counters_idr_lock);
 }
 
-/* The function returns the last counter that was queried so the caller
- * function can continue calling it till all counters are queried.
- */
-static struct mlx5_fc *mlx5_fc_stats_query(struct mlx5_core_dev *dev,
-					   struct mlx5_fc *first,
-					   u32 last_id)
+static int get_max_bulk_query_len(struct mlx5_core_dev *dev)
 {
-	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
-	struct mlx5_fc *counter = NULL;
-	struct mlx5_cmd_fc_bulk *b;
-	bool more = false;
-	u32 afirst_id;
-	int num;
-	int err;
+	return min_t(int, MLX5_SW_MAX_COUNTERS_BULK,
+			  (1 << MLX5_CAP_GEN(dev, log_max_flow_counter_bulk)));
+}
 
-	int max_bulk = min_t(int, MLX5_SW_MAX_COUNTERS_BULK,
-			     (1 << MLX5_CAP_GEN(dev, log_max_flow_counter_bulk)));
+static void update_counter_cache(int index, u32 *bulk_raw_data,
+				 struct mlx5_fc_cache *cache)
+{
+	void *stats = MLX5_ADDR_OF(query_flow_counter_out, bulk_raw_data,
+			     flow_statistics[index]);
+	u64 packets = MLX5_GET64(traffic_counter, stats, packets);
+	u64 bytes = MLX5_GET64(traffic_counter, stats, octets);
 
-	/* first id must be aligned to 4 when using bulk query */
-	afirst_id = first->id & ~0x3;
+	if (cache->packets == packets)
+		return;
 
-	/* number of counters to query inc. the last counter */
-	num = ALIGN(last_id - afirst_id + 1, 4);
-	if (num > max_bulk) {
-		num = max_bulk;
-		last_id = afirst_id + num - 1;
-	}
+	cache->packets = packets;
+	cache->bytes = bytes;
+	cache->lastuse = jiffies;
+}
 
-	b = mlx5_cmd_fc_bulk_alloc(dev, afirst_id, num);
-	if (!b) {
-		mlx5_core_err(dev, "Error allocating resources for bulk query\n");
-		return NULL;
-	}
+static void mlx5_fc_stats_query_counter_range(struct mlx5_core_dev *dev,
+					      struct mlx5_fc *first,
+					      u32 last_id)
+{
+	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
+	bool query_more_counters = (first->id <= last_id);
+	int max_bulk_len = get_max_bulk_query_len(dev);
+	u32 *data = fc_stats->bulk_query_out;
+	struct mlx5_fc *counter = first;
+	u32 bulk_base_id;
+	int bulk_len;
+	int err;
 
-	err = mlx5_cmd_fc_bulk_query(dev, b);
-	if (err) {
-		mlx5_core_err(dev, "Error doing bulk query: %d\n", err);
-		goto out;
-	}
+	while (query_more_counters) {
+		/* first id must be aligned to 4 when using bulk query */
+		bulk_base_id = counter->id & ~0x3;
 
-	counter = first;
-	list_for_each_entry_from(counter, &fc_stats->counters, list) {
-		struct mlx5_fc_cache *c = &counter->cache;
-		u64 packets;
-		u64 bytes;
+		/* number of counters to query inc. the last counter */
+		bulk_len = min_t(int, max_bulk_len,
+				 ALIGN(last_id - bulk_base_id + 1, 4));
 
-		if (counter->id > last_id) {
-			more = true;
-			break;
+		err = mlx5_cmd_fc_bulk_query(dev, bulk_base_id, bulk_len,
+					     data);
+		if (err) {
+			mlx5_core_err(dev, "Error doing bulk query: %d\n", err);
+			return;
 		}
+		query_more_counters = false;
 
-		mlx5_cmd_fc_bulk_get(dev, b,
-				     counter->id, &packets, &bytes);
+		list_for_each_entry_from(counter, &fc_stats->counters, list) {
+			int counter_index = counter->id - bulk_base_id;
+			struct mlx5_fc_cache *cache = &counter->cache;
 
-		if (c->packets == packets)
-			continue;
+			if (counter->id >= bulk_base_id + bulk_len) {
+				query_more_counters = true;
+				break;
+			}
 
-		c->packets = packets;
-		c->bytes = bytes;
-		c->lastuse = jiffies;
+			update_counter_cache(counter_index, data, cache);
+		}
 	}
-
-out:
-	mlx5_cmd_fc_bulk_free(b);
-
-	return more ? counter : NULL;
 }
 
 static void mlx5_free_fc(struct mlx5_core_dev *dev,
@@ -244,8 +241,8 @@ static void mlx5_fc_stats_work(struct work_struct *work)
 
 	counter = list_first_entry(&fc_stats->counters, struct mlx5_fc,
 				   list);
-	while (counter)
-		counter = mlx5_fc_stats_query(dev, counter, last->id);
+	if (counter)
+		mlx5_fc_stats_query_counter_range(dev, counter, last->id);
 
 	fc_stats->next_query = now + fc_stats->sampling_interval;
 }
@@ -324,6 +321,8 @@ EXPORT_SYMBOL(mlx5_fc_destroy);
 int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
 {
 	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
+	int max_bulk_len;
+	int max_out_len;
 
 	spin_lock_init(&fc_stats->counters_idr_lock);
 	idr_init(&fc_stats->counters_idr);
@@ -331,14 +330,24 @@ int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
 	init_llist_head(&fc_stats->addlist);
 	init_llist_head(&fc_stats->dellist);
 
+	max_bulk_len = get_max_bulk_query_len(dev);
+	max_out_len = mlx5_cmd_fc_get_bulk_query_out_len(max_bulk_len);
+	fc_stats->bulk_query_out = kzalloc(max_out_len, GFP_KERNEL);
+	if (!fc_stats->bulk_query_out)
+		return -ENOMEM;
+
 	fc_stats->wq = create_singlethread_workqueue("mlx5_fc");
 	if (!fc_stats->wq)
-		return -ENOMEM;
+		goto err_wq_create;
 
 	fc_stats->sampling_interval = MLX5_FC_STATS_PERIOD;
 	INIT_DELAYED_WORK(&fc_stats->work, mlx5_fc_stats_work);
 
 	return 0;
+
+err_wq_create:
+	kfree(fc_stats->bulk_query_out);
+	return -ENOMEM;
 }
 
 void mlx5_cleanup_fc_stats(struct mlx5_core_dev *dev)
@@ -352,6 +361,8 @@ void mlx5_cleanup_fc_stats(struct mlx5_core_dev *dev)
 	destroy_workqueue(dev->priv.fc_stats.wq);
 	dev->priv.fc_stats.wq = NULL;
 
+	kfree(fc_stats->bulk_query_out);
+
 	idr_destroy(&fc_stats->counters_idr);
 
 	tmplist = llist_del_all(&fc_stats->addlist);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0e6da1840c7d..267b2bc0ca4a 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,7 @@ struct mlx5_fc_stats {
 	struct delayed_work work;
 	unsigned long next_query;
 	unsigned long sampling_interval; /* jiffies */
+	u32 *bulk_query_out;
 };
 
 struct mlx5_events;
-- 
2.21.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox