Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jonathan Lemon @ 2019-07-29 21:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <20190729142211.43b5ccd8@cakuba.netronome.com>



On 29 Jul 2019, at 14:22, Jakub Kicinski wrote:

> On Mon, 29 Jul 2019 14:02:21 -0700, Jonathan Lemon wrote:
>> On 29 Jul 2019, at 13:50, Jakub Kicinski wrote:
>>> On Mon, 29 Jul 2019 10:19:39 -0700, Jonathan Lemon wrote:
>>>> Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
>>>> and skb_frag_off_set_from() accessors for page_offset.
>>>>
>>>> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 718742b1c505..7d94a78067ee 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -331,7 +331,7 @@ static inline void skb_frag_size_set(skb_frag_t
>>>> *frag, unsigned int size)
>>>>  }
>>>>
>>>>  /**
>>>> - * skb_frag_size_add - Incrementes the size of a skb fragment by
>>>> %delta
>>>> + * skb_frag_size_add - Increments the size of a skb fragment by
>>>> %delta
>>>>   * @frag: skb fragment
>>>>   * @delta: value to add
>>>>   */
>>>> @@ -2857,6 +2857,46 @@ static inline void
>>>> skb_propagate_pfmemalloc(struct page *page,
>>>>  		skb->pfmemalloc = true;
>>>>  }
>>>>
>>>> +/**
>>>> + * skb_frag_off - Returns the offset of a skb fragment
>>>> + * @frag: the paged fragment
>>>> + */
>>>> +static inline unsigned int skb_frag_off(const skb_frag_t *frag)
>>>> +{
>>>> +	return frag->page_offset;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_add - Increments the offset of a skb fragment by
>>>> %delta
>>>
>>> I realize you're following the existing code, but should we perhaps
>>> use
>>> the latest kdoc syntax? '()' after function name, and args should 
>>> have
>>> '@' prefix, '%' would be for constants.
>>
>> That would be a task for a different cleanup.  Not that I disagree 
>> with
>> you, but there's also nothing worse than mixing styles in the same 
>> file.
>
> Funny you should say that given that (a) I'm commenting on the new 
> code
> you're adding, and (b) you did do an unrelated spelling fix above ;)
>
>>>> + * @frag: skb fragment
>>>> + * @delta: value to add
>>>> + */
>>>> +static inline void skb_frag_off_add(skb_frag_t *frag, int delta)
>>>> +{
>>>> +	frag->page_offset += delta;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_set - Sets the offset of a skb fragment
>>>> + * @frag: skb fragment
>>>> + * @offset: offset of fragment
>>>> + */
>>>> +static inline void skb_frag_off_set(skb_frag_t *frag, unsigned int
>>>> offset)
>>>> +{
>>>> +	frag->page_offset = offset;
>>>> +}
>>>> +
>>>> +/**
>>>> + * skb_frag_off_set_from - Sets the offset of a skb fragment from
>>>> another fragment
>>>> + * @fragto: skb fragment where offset is set
>>>> + * @fragfrom: skb fragment offset is copied from
>>>> + */
>>>> +static inline void skb_frag_off_set_from(skb_frag_t *fragto,
>>>> +					 const skb_frag_t *fragfrom)
>>>
>>> skb_frag_off_copy() ?
>>
>> That was my initial inclination, but due to the often overloaded
>> connotations of the word "copy", opted to use the same "set" verbiage
>> that existed in the other functions.
>
> There is no need to ponder the connotations of verbs. Please just
> look at other function names in skbuff.h, especially those which
> copy fields :)
>
> static inline void skb_copy_hash(struct sk_buff *to, const struct 
> sk_buff *from)
> static inline void skb_copy_secmark(struct sk_buff *to, const struct 
> sk_buff *from)
> static inline void skb_copy_queue_mapping(struct sk_buff *to, const 
> struct sk_buff *from)
> static inline void skb_ext_copy(struct sk_buff *dst, const struct 
> sk_buff *src)

Okay, I missed those, let me respin.
-- 
Jonathan

^ permalink raw reply

* Re: [PATCH V36 23/29] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Matthew Garrett @ 2019-07-29 21:47 UTC (permalink / raw)
  To: James Morris
  Cc: LSM List, Linux Kernel Mailing List, Linux API, David Howells,
	Alexei Starovoitov, Network Development, Chun-Yi Lee,
	Daniel Borkmann
In-Reply-To: <20190718194415.108476-24-matthewgarrett@google.com>

On Thu, Jul 18, 2019 at 12:45 PM Matthew Garrett
<matthewgarrett@google.com> wrote:
> bpf_read() and bpf_read_str() could potentially be abused to (eg) allow
> private keys in kernel memory to be leaked. Disable them if the kernel
> has been locked down in confidentiality mode.
>
> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> cc: netdev@vger.kernel.org
> cc: Chun-Yi Lee <jlee@suse.com>
> cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>

Any further feedback on this?

^ permalink raw reply

* [net-next:master 57/59] drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'?
From: kbuild test robot @ 2019-07-29 21:45 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 4206 bytes --]

tree:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next.git master
head:   1cb9dfca39eb406960f8f84864ddd6ba329ec321
commit: 171a9bae68c72f2d1260c3825203760856e6793b [57/59] staging/octeon: Allow test build on !MIPS
config: c6x-allyesconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 171a9bae68c72f2d1260c3825203760856e6793b
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=c6x 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
   drivers/net/phy/mdio-octeon.c:48:3: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      (u64)devm_ioremap(&pdev->dev, mdio_phys, regsize);
      ^
   In file included from drivers/net/phy/mdio-octeon.c:14:0:
>> drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                       ^
   drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
   drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
   drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   drivers/net/phy/mdio-octeon.c:77:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
   drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_remove':
   drivers/net/phy/mdio-cavium.h:111:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
                                                   ^
   drivers/net/phy/mdio-octeon.c:91:2: note: in expansion of macro 'oct_mdio_writeq'
     oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
     ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +111 drivers/net/phy/mdio-cavium.h

1eefee901fca020 David Daney 2016-03-11  105  
1eefee901fca020 David Daney 2016-03-11  106  static inline u64 oct_mdio_readq(u64 addr)
1eefee901fca020 David Daney 2016-03-11  107  {
1eefee901fca020 David Daney 2016-03-11  108  	return cvmx_read_csr(addr);
1eefee901fca020 David Daney 2016-03-11  109  }
1eefee901fca020 David Daney 2016-03-11  110  #else
1eefee901fca020 David Daney 2016-03-11 @111  #define oct_mdio_writeq(val, addr)	writeq(val, (void *)addr)
1eefee901fca020 David Daney 2016-03-11  112  #define oct_mdio_readq(addr)		readq((void *)addr)
1eefee901fca020 David Daney 2016-03-11  113  #endif
1eefee901fca020 David Daney 2016-03-11  114  
1eefee901fca020 David Daney 2016-03-11  115  int cavium_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum);
1eefee901fca020 David Daney 2016-03-11  116  int cavium_mdiobus_write(struct mii_bus *bus, int phy_id, int regnum, u16 val);

:::::: The code at line 111 was first introduced by commit
:::::: 1eefee901fca0208b8a56f20cdc134e2b8638ae7 phy: mdio-octeon: Refactor into two files/modules

:::::: TO: David Daney <david.daney@cavium.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49823 bytes --]

^ permalink raw reply

* [PATCH bpf-next 0/2] bpf: allocate extra memory for setsockopt hook buffer
From: Stanislav Fomichev @ 2019-07-29 21:51 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Current setsockopt hook is limited to the size of the buffer that
user had supplied. Since we always allocate memory and copy the value
into kernel space, allocate just a little bit more in case BPF
program needs to override input data with a larger value.

The canonical example is TCP_CONGESTION socket option where
input buffer is a string and if user calls it with a short string,
BPF program has no way of extending it.

The tests are extended with TCP_CONGESTION use case.

Stanislav Fomichev (2):
  bpf: always allocate at least 16 bytes for setsockopt hook
  selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use case

 kernel/bpf/cgroup.c                           | 17 ++++++++++---
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 22 ++++++++++++++++
 tools/testing/selftests/bpf/test_sockopt_sk.c | 25 +++++++++++++++++++
 3 files changed, 60 insertions(+), 4 deletions(-)

-- 
2.22.0.709.g102302147b-goog

^ permalink raw reply

* [PATCH bpf-next 1/2] bpf: always allocate at least 16 bytes for setsockopt hook
From: Stanislav Fomichev @ 2019-07-29 21:51 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
In-Reply-To: <20190729215111.209219-1-sdf@google.com>

Since we always allocate memory, allocate just a little bit more
for the BPF program in case it need to override user input with
bigger value. The canonical example is TCP_CONGESTION where
input string might be too small to override (nv -> bbr or cubic).

16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can
be extended in the future if needed.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/cgroup.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 0a00eaca6fae..6a6a154cfa7b 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -964,7 +964,6 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 		return -ENOMEM;
 
 	ctx->optval_end = ctx->optval + max_optlen;
-	ctx->optlen = max_optlen;
 
 	return 0;
 }
@@ -984,7 +983,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		.level = *level,
 		.optname = *optname,
 	};
-	int ret;
+	int ret, max_optlen;
 
 	/* Opportunistic check to see whether we have any BPF program
 	 * attached to the hook so we don't waste time allocating
@@ -994,10 +993,18 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	    __cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT))
 		return 0;
 
-	ret = sockopt_alloc_buf(&ctx, *optlen);
+	/* Allocate a bit more than the initial user buffer for
+	 * BPF program. The canonical use case is overriding
+	 * TCP_CONGESTION(nv) to TCP_CONGESTION(cubic).
+	 */
+	max_optlen = max_t(int, 16, *optlen);
+
+	ret = sockopt_alloc_buf(&ctx, max_optlen);
 	if (ret)
 		return ret;
 
+	ctx.optlen = *optlen;
+
 	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
 		ret = -EFAULT;
 		goto out;
@@ -1016,7 +1023,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	if (ctx.optlen == -1) {
 		/* optlen set to -1, bypass kernel */
 		ret = 1;
-	} else if (ctx.optlen > *optlen || ctx.optlen < -1) {
+	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
 		/* optlen is out of bounds */
 		ret = -EFAULT;
 	} else {
@@ -1063,6 +1070,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	if (ret)
 		return ret;
 
+	ctx.optlen = max_optlen;
+
 	if (!retval) {
 		/* If kernel getsockopt finished successfully,
 		 * copy whatever was returned to the user back
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply related

* [PATCH bpf-next 2/2] selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use case
From: Stanislav Fomichev @ 2019-07-29 21:51 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
In-Reply-To: <20190729215111.209219-1-sdf@google.com>

Ignore SOL_TCP:TCP_CONGESTION in getsockopt and always override
SOL_TCP:TCP_CONGESTION with "cubic" in setsockopt hook.

Call setsockopt(SOL_TCP, TCP_CONGESTION) with short optval ("nv")
to make sure BPF program has enough buffer space to replace it
with "cubic".

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 22 ++++++++++++++++
 tools/testing/selftests/bpf/test_sockopt_sk.c | 25 +++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index 076122c898e9..9a3d1c79e6fe 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <string.h>
 #include <netinet/in.h>
+#include <netinet/tcp.h>
 #include <linux/bpf.h>
 #include "bpf_helpers.h"
 
@@ -42,6 +44,14 @@ int _getsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION) {
+		/* Not interested in SOL_TCP:TCP_CONGESTION;
+		 * let next BPF program in the cgroup chain or kernel
+		 * handle it.
+		 */
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
@@ -91,6 +101,18 @@ int _setsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION) {
+		/* Always use cubic */
+
+		if (optval + 5 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		memcpy(optval, "cubic", 5);
+		ctx->optlen = 5;
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
diff --git a/tools/testing/selftests/bpf/test_sockopt_sk.c b/tools/testing/selftests/bpf/test_sockopt_sk.c
index 036b652e5ca9..e4f6055d92e9 100644
--- a/tools/testing/selftests/bpf/test_sockopt_sk.c
+++ b/tools/testing/selftests/bpf/test_sockopt_sk.c
@@ -6,6 +6,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
+#include <netinet/tcp.h>
 
 #include <linux/filter.h>
 #include <bpf/bpf.h>
@@ -25,6 +26,7 @@ static int getsetsockopt(void)
 	union {
 		char u8[4];
 		__u32 u32;
+		char cc[16]; /* TCP_CA_NAME_MAX */
 	} buf = {};
 	socklen_t optlen;
 
@@ -115,6 +117,29 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	/* TCP_CONGESTION can extend the string */
+
+	strcpy(buf.cc, "nv");
+	err = setsockopt(fd, SOL_TCP, TCP_CONGESTION, &buf, strlen("nv"));
+	if (err) {
+		log_err("Failed to call setsockopt(TCP_CONGESTION)");
+		goto err;
+	}
+
+
+	optlen = sizeof(buf.cc);
+	err = getsockopt(fd, SOL_TCP, TCP_CONGESTION, &buf, &optlen);
+	if (err) {
+		log_err("Failed to call getsockopt(TCP_CONGESTION)");
+		goto err;
+	}
+
+	if (strcmp(buf.cc, "cubic") != 0) {
+		log_err("Unexpected getsockopt(TCP_CONGESTION) %s != %s",
+			buf.cc, "cubic");
+		goto err;
+	}
+
 	close(fd);
 	return 0;
 err:
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply related

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jonathan Lemon @ 2019-07-29 21:53 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <20190729142548.028d5a2b@cakuba.netronome.com>



On 29 Jul 2019, at 14:25, Jakub Kicinski wrote:

> On Mon, 29 Jul 2019 14:22:11 -0700, Jakub Kicinski wrote:
>>>> I realize you're following the existing code, but should we perhaps
>>>> use
>>>> the latest kdoc syntax? '()' after function name, and args should have
>>>> '@' prefix, '%' would be for constants.
>>>
>>> That would be a task for a different cleanup.  Not that I disagree with
>>> you, but there's also nothing worse than mixing styles in the same file.
>>
>> Funny you should say that given that (a) I'm commenting on the new code
>> you're adding, and (b) you did do an unrelated spelling fix above ;)
>
> Ah, sorry I misread your comment there.
>
> Some code already uses '()' in this file, as for the '%' skb_frag_
> functions are the only one which have this mistake, the rest of kdoc
> is correct.

The kernel-doc.rst guide seems to indicate that function names should
have () at the end - but none of them do so within this file.  (only when
talking about the function in the document).

The %CONST indicates name of a constant - I'm unclear whether this is
supposed to refer to a constant parameter.  For example:

/**
 *      __skb_peek - peek at the head of a non-empty &sk_buff_head
 *      @list_: list to peek at
 *
 *      Like skb_peek(), but the caller knows that the list is not empty.
 */
static inline struct sk_buff *__skb_peek(const struct sk_buff_head *list_)
{
        return list_->next;
}

^ permalink raw reply

* Re: [PATCH net-next 00/16] bnxt_en: Add TPA (GRO_HW and LRO) on 57500 chips.
From: Michael Chan @ 2019-07-29 22:00 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev
In-Reply-To: <20190729.142455.1728471766679878919.davem@davemloft.net>

On Mon, Jul 29, 2019 at 2:24 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Mon, 29 Jul 2019 06:10:17 -0400
>
> > This patchset adds TPA v2 support on the 57500 chips.  TPA v2 is
> > different from the legacy TPA scheme on older chips and requires major
> > refactoring and restructuring of the existing TPA logic.  The main
> > difference is that the new TPA v2 has on-the-fly aggregation buffer
> > completions before a TPA packet is completed.  The larger aggregation
> > ID space also requires a new ID mapping logic to make it more
> > memory efficient.
>
> Series applied, but please explain something to me.
>
> I thought initially while reviewing this that patch #5 makes the series
> non-bisectable because this only includes the logic that appends new
> entries to the agg array but lacks the changes to reset the agg count
> at TPE end time (which occurs in patch #8).
>
> However I then realized that you haven't turned on the logic yet that
> can result in CMP_TYPE_RX_TPA_AGG_CMP entries in this context.
>
> Am I right?

Yes, correct.  Everything is built up incrementally and the new GRO_HW
and LRO features on the new chip can only be enabled after patch #14.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/3 net-next] linux: Add skb_frag_t page_offset accessors
From: Jakub Kicinski @ 2019-07-29 22:02 UTC (permalink / raw)
  To: Jonathan Lemon; +Cc: davem, kernel-team, netdev, Matthew Wilcox
In-Reply-To: <94802E4A-536A-4249-BEA3-5D89E8073738@gmail.com>

On Mon, 29 Jul 2019 14:53:45 -0700, Jonathan Lemon wrote:
> On 29 Jul 2019, at 14:25, Jakub Kicinski wrote:
> 
> > On Mon, 29 Jul 2019 14:22:11 -0700, Jakub Kicinski wrote:  
> >>>> I realize you're following the existing code, but should we perhaps
> >>>> use
> >>>> the latest kdoc syntax? '()' after function name, and args should have
> >>>> '@' prefix, '%' would be for constants.  
> >>>
> >>> That would be a task for a different cleanup.  Not that I disagree with
> >>> you, but there's also nothing worse than mixing styles in the same file.  
> >>
> >> Funny you should say that given that (a) I'm commenting on the new code
> >> you're adding, and (b) you did do an unrelated spelling fix above ;)  
> >
> > Ah, sorry I misread your comment there.
> >
> > Some code already uses '()' in this file, as for the '%' skb_frag_
> > functions are the only one which have this mistake, the rest of kdoc
> > is correct.  
> 
> The kernel-doc.rst guide seems to indicate that function names should
> have () at the end - but none of them do so within this file.  (only when
> talking about the function in the document).

/**
 * skb_complete_tx_timestamp() - deliver cloned skb with tx timestamps

/**
 * skb_tx_timestamp() - Driver hook for transmit timestamping

> The %CONST indicates name of a constant - I'm unclear whether this is
> supposed to refer to a constant parameter.  For example:
> 
> /**
>  *      __skb_peek - peek at the head of a non-empty &sk_buff_head
>  *      @list_: list to peek at
>  *
>  *      Like skb_peek(), but the caller knows that the list is not empty.
>  */
> static inline struct sk_buff *__skb_peek(const struct sk_buff_head *list_)
> {
>         return list_->next;
> }

Hmm.. I'm not sure I follow, this example does not use %, but & which
is for types. Quoting from:

https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#highlights-and-cross-references

@parameter
    Name of a function parameter. (No cross-referencing, just formatting.)
%CONST
    Name of a constant. (No cross-referencing, just formatting.)


So in your case you should use @delta, rather than %delta.

^ permalink raw reply

* [PATCH] net: smc911x: Mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-07-29 22:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, Gustavo A. R. Silva, Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warning (Building: arm):

drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_phy_detect’:
drivers/net/ethernet/smsc/smc911x.c:677:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (cfg & HW_CFG_EXT_PHY_DET_) {
       ^
drivers/net/ethernet/smsc/smc911x.c:715:3: note: here
   default:
   ^~~~~~~

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/ethernet/smsc/smc911x.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/smsc/smc911x.c b/drivers/net/ethernet/smsc/smc911x.c
index bd14803545de..8d88e4083456 100644
--- a/drivers/net/ethernet/smsc/smc911x.c
+++ b/drivers/net/ethernet/smsc/smc911x.c
@@ -712,6 +712,7 @@ static void smc911x_phy_detect(struct net_device *dev)
 					/* Found an external PHY */
 					break;
 			}
+			/* Else, fall through */
 		default:
 			/* Internal media only */
 			SMC_GET_PHY_ID1(lp, 1, id1);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH] net: smc911x: Mark expected switch fall-through
From: David Miller @ 2019-07-29 22:13 UTC (permalink / raw)
  To: gustavo; +Cc: netdev, linux-kernel, keescook
In-Reply-To: <20190729221016.GA17610@embeddedor>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Mon, 29 Jul 2019 17:10:16 -0500

> Mark switch cases where we are expecting to fall through.
> 
> This patch fixes the following warning (Building: arm):
> 
> drivers/net/ethernet/smsc/smc911x.c: In function ‘smc911x_phy_detect’:
> drivers/net/ethernet/smsc/smc911x.c:677:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
>     if (cfg & HW_CFG_EXT_PHY_DET_) {
>        ^
> drivers/net/ethernet/smsc/smc911x.c:715:3: note: here
>    default:
>    ^~~~~~~
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Applied.

^ permalink raw reply

* [PATCH] tipc: compat: allow tipc commands without arguments
From: Taras Kondratiuk @ 2019-07-29 22:15 UTC (permalink / raw)
  To: Jon Maloy, Ying Xue, David S. Miller
  Cc: netdev, tipc-discussion, linux-kernel, xe-linux-external, stable

Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
broke older tipc tools that use compat interface (e.g. tipc-config from
tipcutils package):

% tipc-config -p
operation not supported

The commit started to reject TIPC netlink compat messages that do not
have attributes. It is too restrictive because some of such messages are
valid (they don't need any arguments):

% grep 'tx none' include/uapi/linux/tipc_config.h
#define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
#define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
#define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
#define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
#define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
#define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
#define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */

This patch relaxes the original fix and rejects messages without
arguments only if such arguments are expected by a command (reg_type is
non zero).

Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit")
Cc: stable@vger.kernel.org
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
---
The patch is based on v5.3-rc2.

 net/tipc/netlink_compat.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
index d86030ef1232..e135d4e11231 100644
--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -55,6 +55,7 @@ struct tipc_nl_compat_msg {
 	int rep_type;
 	int rep_size;
 	int req_type;
+	int req_size;
 	struct net *net;
 	struct sk_buff *rep;
 	struct tlv_desc *req;
@@ -257,7 +258,8 @@ static int tipc_nl_compat_dumpit(struct tipc_nl_compat_cmd_dump *cmd,
 	int err;
 	struct sk_buff *arg;
 
-	if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type))
+	if (msg->req_type && (!msg->req_size ||
+			      !TLV_CHECK_TYPE(msg->req, msg->req_type)))
 		return -EINVAL;
 
 	msg->rep = tipc_tlv_alloc(msg->rep_size);
@@ -354,7 +356,8 @@ static int tipc_nl_compat_doit(struct tipc_nl_compat_cmd_doit *cmd,
 {
 	int err;
 
-	if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type))
+	if (msg->req_type && (!msg->req_size ||
+			      !TLV_CHECK_TYPE(msg->req, msg->req_type)))
 		return -EINVAL;
 
 	err = __tipc_nl_compat_doit(cmd, msg);
@@ -1278,8 +1281,8 @@ static int tipc_nl_compat_recv(struct sk_buff *skb, struct genl_info *info)
 		goto send;
 	}
 
-	len = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN);
-	if (!len || !TLV_OK(msg.req, len)) {
+	msg.req_size = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN);
+	if (msg.req_size && !TLV_OK(msg.req, msg.req_size)) {
 		msg.rep = tipc_get_err_tlv(TIPC_CFG_NOT_SUPPORTED);
 		err = -EOPNOTSUPP;
 		goto send;
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH net] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Ahern @ 2019-07-29 22:28 UTC (permalink / raw)
  To: David Miller, suyj.fnst; +Cc: kuznet, yoshfuji, netdev
In-Reply-To: <20190729.141752.457438545178811941.davem@davemloft.net>

On 7/29/19 3:17 PM, David Miller wrote:
> David, can you take a quick look at this?

will do. I'll get back to you by tomorrow.

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
From: Vivien Didelot @ 2019-07-29 22:46 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Rasmus Villemoes,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190722233713.31396-1-rasmus.villemoes@prevas.dk>

Hi Rasmus,

On Mon, 22 Jul 2019 23:37:26 +0000, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> We have an ERPS (Ethernet Ring Protection Switching) setup involving
> mv88e6250 switches which we're in the process of switching to a BSP
> based on the mainline driver. Breaking any link in the ring works as
> expected, with the ring reconfiguring itself quickly and traffic
> continuing with almost no noticable drops. However, when plugging back
> the cable, we see 5+ second stalls.
> 
> This has been tracked down to the userspace application in charge of
> the protocol missing a few CCM messages on the good link (the one that
> was not unplugged), causing it to broadcast a "signal fail". That
> message eventually reaches its link partner, which responds by
> blocking the port. Meanwhile, the first node has continued to block
> the port with the just plugged-in cable, breaking the network. And the
> reason for those missing CCM messages has in turn been tracked down to
> the VTU apparently being too busy servicing load/purge operations that
> the normal lookups are delayed.
> 
> Initial state, the link between C and D is blocked in software.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Unplug the cable between C and D.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *   * D
> 
> Reestablish the link between C and D.
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Somehow, enough VTU/ATU operations happen inside C that prevents
> the application from receving the CCM messages from B in a timely
> manner, so a Signal Fail message is sent by C. When B receives
> that, it responds by blocking its port.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B *---* C *---- D
> 
> Very shortly after this, the signal fail condition clears on the
> BC link (some CCM messages finally make it through), so C
> unblocks the port. However, a guard timer inside B prevents it
> from removing the blocking before 5 seconds have elapsed.
> 
> It is not unlikely that our userspace ERPS implementation could be
> smarter and/or is simply buggy. However, this patch fixes the symptoms
> we see, and is a small optimization that should not break anything
> (knock wood). The idea is simply to avoid doing an VTU load of an
> entry identical to the one already present. To do that, we need to
> know whether mv88e6xxx_vtu_get() actually found an existing entry, or
> has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that
> end, let vlan->valid be an output parameter. The other two callers of
> mv88e6xxx_vtu_get() are not affected by this patch since they pass
> new=false.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 6b17cd961d06..2e500428670c 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -1423,7 +1423,6 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,
>  
>  		/* Initialize a fresh VLAN entry */
>  		memset(entry, 0, sizeof(*entry));
> -		entry->valid = true;
>  		entry->vid = vid;
>  
>  		/* Exclude all ports */
> @@ -1618,6 +1617,9 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
>  	if (err)
>  		return err;
>  
> +	if (vlan.valid && vlan.member[port] == member)
> +		return 0;
> +	vlan.valid = true;
>  	vlan.member[port] = member;
>  
>  	err = mv88e6xxx_vtu_loadpurge(chip, &vlan);

I'd prefer not to use the mv88e6xxx_vtu_entry structure for output
parameters, this can be confusing. As you correctly mentioned, this
initialization is only done for _mv88e6xxx_port_vlan_add, so I'll
prepare a patch which gets rid of the boolean parameter and move that
logic up.


Thanks,
Vivien

^ permalink raw reply

* [PATCH net] selftests/tls: fix TLS tests with CONFIG_TLS=n
From: Jakub Kicinski @ 2019-07-29 23:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski, kernel test robot

Build bot reports some recent TLS tests are failing
with CONFIG_TLS=n. Correct the expected return code
and skip TLS installation if not supported.

Tested with CONFIG_TLS=n and CONFIG_TLS=m.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: cf32526c8842 ("selftests/tls: add a test for ULP but no keys")
Fixes: 65d41fb317c6 ("selftests/tls: add a bidirectional test")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/testing/selftests/net/tls.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 630c5b884d43..d995e6503b1a 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -69,7 +69,7 @@ FIXTURE_SETUP(tls_basic)
 
 	ret = setsockopt(self->fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
 	if (ret != 0) {
-		ASSERT_EQ(errno, ENOTSUPP);
+		ASSERT_EQ(errno, ENOENT);
 		self->notls = true;
 		printf("Failure setting TCP_ULP, testing without tls\n");
 		return;
@@ -696,21 +696,26 @@ TEST_F(tls, recv_lowat)
 
 TEST_F(tls, bidir)
 {
-	struct tls12_crypto_info_aes_gcm_128 tls12;
 	char const *test_str = "test_read";
 	int send_len = 10;
 	char buf[10];
 	int ret;
 
-	memset(&tls12, 0, sizeof(tls12));
-	tls12.info.version = TLS_1_3_VERSION;
-	tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	if (!self->notls) {
+		struct tls12_crypto_info_aes_gcm_128 tls12;
 
-	ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12, sizeof(tls12));
-	ASSERT_EQ(ret, 0);
+		memset(&tls12, 0, sizeof(tls12));
+		tls12.info.version = TLS_1_3_VERSION;
+		tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
 
-	ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12, sizeof(tls12));
-	ASSERT_EQ(ret, 0);
+		ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12,
+				 sizeof(tls12));
+		ASSERT_EQ(ret, 0);
+
+		ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12,
+				 sizeof(tls12));
+		ASSERT_EQ(ret, 0);
+	}
 
 	ASSERT_EQ(strlen(test_str) + 1, send_len);
 
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next 00/10] CO-RE offset relocations
From: Andrii Nakryiko @ 2019-07-29 23:09 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <CAPhsuW5H2QQjuASV2iXTdA73E7AQnj73b77x4FmJomc-gJy-Cg@mail.gmail.com>

On Mon, Jul 29, 2019 at 1:37 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 1:20 PM Song Liu <liu.song.a23@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
> > >
> > > This patch set implements central part of CO-RE (Compile Once - Run
> > > Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> > > Most of the details are written down as comments to corresponding parts of the
> > > code.
> > >
> > > Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> > > work with its contents.
> > > Patch #2 implements CO-RE relocations algorithm in libbpf.
> > > Patches #3-#10 adds selftests validating various parts of relocation handling,
> > > type compatibility, etc.
> > >
> > > For all tests to work, you'll need latest Clang/LLVM supporting
> > > __builtin_preserve_access_index intrinsic, used for recording offset
> > > relocations. Kernel on which selftests run should have BTF information built
> > > in (CONFIG_DEBUG_INFO_BTF=y).
> > >
> > >   [0] http://vger.kernel.org/bpfconf2019.html#session-2
> > >   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2CO-RE relocations
> > >
> > > This patch set implements central part of CO-RE (Compile Once - Run
> > > Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> > > Most of the details are written down as comments to corresponding parts of the
> > > code.
> > >
> > > Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> > > work with its contents.
> > > Patch #2 implements CO-RE relocations algorithm in libbpf.
> > > Patches #3-#10 adds selftests validating various parts of relocation handling,
> > > type compatibility, etc.
> > >
> > > For all tests to work, you'll need latest Clang/LLVM supporting
> > > __builtin_preserve_access_index intrinsic, used for recording offset
> > > relocations. Kernel on which selftests run should have BTF information built
> > > in (CONFIG_DEBUG_INFO_BTF=y).
> > >
> > >   [0] http://vger.kernel.org/bpfconf2019.html#session-2
> > >   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2
> > >
> > > Andrii Nakryiko (10):
> > >   libbpf: add .BTF.ext offset relocation section loading
> > >   libbpf: implement BPF CO-RE offset relocation algorithm
> > >   selftests/bpf: add CO-RE relocs testing setup
> > >   selftests/bpf: add CO-RE relocs struct flavors tests
> > >   selftests/bpf: add CO-RE relocs nesting tests
> > >   selftests/bpf: add CO-RE relocs array tests
> > >   selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
> > >   selftests/bpf: add CO-RE relocs modifiers/typedef tests
> > >   selftest/bpf: add CO-RE relocs ptr-as-array tests
> > >   selftests/bpf: add CO-RE relocs ints tests
> > >
> > >  tools/lib/bpf/btf.c                           |  64 +-
> > >  tools/lib/bpf/btf.h                           |   4 +
> > >  tools/lib/bpf/libbpf.c                        | 866 +++++++++++++++++-
> > >  tools/lib/bpf/libbpf.h                        |   1 +
> > >  tools/lib/bpf/libbpf_internal.h               |  91 ++
> > >  .../selftests/bpf/prog_tests/core_reloc.c     | 363 ++++++++
> > >  .../bpf/progs/btf__core_reloc_arrays.c        |   3 +
> > >  .../btf__core_reloc_arrays___diff_arr_dim.c   |   3 +
> > >  ...btf__core_reloc_arrays___diff_arr_val_sz.c |   3 +
> > >  .../btf__core_reloc_arrays___err_non_array.c  |   3 +
> > >  ...btf__core_reloc_arrays___err_too_shallow.c |   3 +
> > >  .../btf__core_reloc_arrays___err_too_small.c  |   3 +
> > >  ..._core_reloc_arrays___err_wrong_val_type1.c |   3 +
> > >  ..._core_reloc_arrays___err_wrong_val_type2.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_flavors.c       |   3 +
> > >  .../btf__core_reloc_flavors__err_wrong_name.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_ints.c          |   3 +
> > >  .../bpf/progs/btf__core_reloc_ints___bool.c   |   3 +
> > >  .../btf__core_reloc_ints___err_bitfield.c     |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_16.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_32.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_64.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_8.c   |   3 +
> > >  .../btf__core_reloc_ints___reverse_sign.c     |   3 +
> > >  .../bpf/progs/btf__core_reloc_mods.c          |   3 +
> > >  .../progs/btf__core_reloc_mods___mod_swap.c   |   3 +
> > >  .../progs/btf__core_reloc_mods___typedefs.c   |   3 +
> > >  .../bpf/progs/btf__core_reloc_nesting.c       |   3 +
> > >  .../btf__core_reloc_nesting___anon_embed.c    |   3 +
> > >  ...f__core_reloc_nesting___dup_compat_types.c |   5 +
> > >  ...core_reloc_nesting___err_array_container.c |   3 +
> > >  ...tf__core_reloc_nesting___err_array_field.c |   3 +
> > >  ...e_reloc_nesting___err_dup_incompat_types.c |   4 +
> > >  ...re_reloc_nesting___err_missing_container.c |   3 +
> > >  ...__core_reloc_nesting___err_missing_field.c |   3 +
> > >  ..._reloc_nesting___err_nonstruct_container.c |   3 +
> > >  ...e_reloc_nesting___err_partial_match_dups.c |   4 +
> > >  .../btf__core_reloc_nesting___err_too_deep.c  |   3 +
> > >  .../btf__core_reloc_nesting___extra_nesting.c |   3 +
> > >  ..._core_reloc_nesting___struct_union_mixup.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_primitives.c    |   3 +
> > >  ...f__core_reloc_primitives___diff_enum_def.c |   3 +
> > >  ..._core_reloc_primitives___diff_func_proto.c |   3 +
> > >  ...f__core_reloc_primitives___diff_ptr_type.c |   3 +
> > >  ...tf__core_reloc_primitives___err_non_enum.c |   3 +
> > >  ...btf__core_reloc_primitives___err_non_int.c |   3 +
> > >  ...btf__core_reloc_primitives___err_non_ptr.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_ptr_as_arr.c    |   3 +
> > >  .../btf__core_reloc_ptr_as_arr___diff_sz.c    |   3 +
> > >  .../selftests/bpf/progs/core_reloc_types.h    | 642 +++++++++++++
> > >  .../bpf/progs/test_core_reloc_arrays.c        |  58 ++
> > >  .../bpf/progs/test_core_reloc_flavors.c       |  65 ++
> > >  .../bpf/progs/test_core_reloc_ints.c          |  48 +
> > >  .../bpf/progs/test_core_reloc_kernel.c        |  39 +
> > >  .../bpf/progs/test_core_reloc_mods.c          |  68 ++
> > >  .../bpf/progs/test_core_reloc_nesting.c       |  48 +
> > >  .../bpf/progs/test_core_reloc_primitives.c    |  50 +
> > >  .../bpf/progs/test_core_reloc_ptr_as_arr.c    |  34 +
> > >  58 files changed, 2527 insertions(+), 47 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c
> >
> > We have created a lot of small files. Would it be cleaner if we can
> > somehow put these
> > data in one file (maybe different sections?).
>
> After reading more, I guess you have tried this and end up with current
> design: keep most struct defines in core_reloc_types.h.

Yeah, I have all the definition in one header file, but then I need
individual combinations as separate BTFs, so I essentially "pick"
desired types using function declarations. Creating those BTFs by hand
would be a nightmare to create and maintain.

>
> >
> > Alternatively, maybe create a folder for these files:
> >   tools/testing/selftests/bpf/progs/core/
>
> I guess this would still make it cleaner.

There is nothing too special about core tests to split them. Also it
would require Makefile changes and would deviate test_progs
definitions from analogous test_maps, test_verifier, test_btf, etc, so
I'm not sure about that. I though about putting those btf__* files
under separate directory, but I'm on the fence there as well, as I'd
rather have related files to stay together...


>
> Thanks,
> Song

^ permalink raw reply

* [PATCH v2 net-next 1/1] tc-testing: Clarify the use of tdc's -d option
From: Lucas Bates @ 2019-07-29 23:18 UTC (permalink / raw)
  To: davem
  Cc: netdev, nicolas.dichtel, jhs, xiyou.wangcong, jiri, mleitner,
	vladbu, dcaratti, kernel, Lucas Bates

The -d command line argument to tdc requires the name of a physical device
on the system where the tests will be run. If -d has not been used, tdc
will skip tests that require a physical device.

This patch is intended to better document what the -d option does and how
it is used.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
---
 tools/testing/selftests/tc-testing/README |  4 +++-
 tools/testing/selftests/tc-testing/tdc.py | 12 ++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/README b/tools/testing/selftests/tc-testing/README
index 22e5da9..b0954c8 100644
--- a/tools/testing/selftests/tc-testing/README
+++ b/tools/testing/selftests/tc-testing/README
@@ -128,7 +128,9 @@ optional arguments:
   -v, --verbose         Show the commands that are being run
   -N, --notap           Suppress tap results for command under test
   -d DEVICE, --device DEVICE
-                        Execute the test case in flower category
+                        Execute test cases that use a physical device, where
+                        DEVICE is its name. (If not defined, tests that require
+                        a physical device will be skipped)
   -P, --pause           Pause execution just before post-suite stage
 
 selection:
diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py
index f04321a..e566c70 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -356,12 +356,14 @@ def test_runner(pm, args, filtered_tests):
     time.sleep(2)
     for tidx in testlist:
         if "flower" in tidx["category"] and args.device == None:
+            errmsg = "Tests using the DEV2 variable must define the name of a "
+            errmsg += "physical NIC with the -d option when running tdc.\n"
+            errmsg += "Test has been skipped."
             if args.verbose > 1:
-                print('Not executing test {} {} because DEV2 not defined'.
-                      format(tidx['id'], tidx['name']))
+                print(errmsg)
             res = TestResult(tidx['id'], tidx['name'])
             res.set_result(ResultState.skip)
-            res.set_errormsg('Not executed because DEV2 is not defined')
+            res.set_errormsg(errmsg)
             tsr.add_resultdata(res)
             continue
         try:
@@ -499,7 +501,9 @@ def set_args(parser):
         choices=['none', 'xunit', 'tap'],
         help='Specify the format for test results. (Default: TAP)')
     parser.add_argument('-d', '--device',
-                        help='Execute the test case in flower category')
+                        help='Execute test cases that use a physical device, ' +
+                        'where DEVICE is its name. (If not defined, tests ' +
+                        'that require a physical device will be skipped)')
     parser.add_argument(
         '-P', '--pause', action='store_true',
         help='Pause execution just before post-suite stage')
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH 1/4] dt-bindings: net: Add aspeed,ast2600-mdio binding
From: Rob Herring @ 2019-07-29 23:37 UTC (permalink / raw)
  To: Andrew Jeffery
  Cc: netdev, David Miller, Mark Rutland, Joel Stanley, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-aspeed, linux-kernel@vger.kernel.org
In-Reply-To: <20190729043926.32679-2-andrew@aj.id.au>

On Sun, Jul 28, 2019 at 10:39 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> The AST2600 splits out the MDIO bus controller from the MAC into its own
> IP block and rearranges the register layout. Add a new binding to
> describe the new hardware.
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>  .../bindings/net/aspeed,ast2600-mdio.yaml     | 61 +++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
>
> diff --git a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> new file mode 100644
> index 000000000000..fa86f6438473
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> @@ -0,0 +1,61 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/aspeed,ast2600-mdio.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: ASPEED AST2600 MDIO Controller
> +
> +maintainers:
> +  - Andrew Jeffery <andrew@aj.id.au>
> +
> +description: |+
> +  The ASPEED AST2600 MDIO controller is the third iteration of ASPEED's MDIO
> +  bus register interface, this time also separating out the controller from the
> +  MAC.
> +

Should have a:

allOf:
  - $ref: "mdio.yaml#"

> +properties:
> +  compatible:
> +    const: aspeed,ast2600-mdio
> +  reg:
> +    maxItems: 1
> +    description: The register range of the MDIO controller instance

> +  "#address-cells":
> +    const: 1
> +  "#size-cells":
> +    const: 0

Then you can drop these 2.

> +
> +patternProperties:
> +  "^ethernet-phy@[a-f0-9]$":
> +    properties:
> +      reg:
> +        description:
> +          The MDIO bus index of the PHY

And this.

> +      compatible:
> +        enum:
> +          - ethernet-phy-ieee802.3-c22
> +          - ethernet-phy-ieee802.3-c45

This isn't specific to ASpeed either and is already covered by
ethernet-phy.yaml.

So that means none of the child node schema is needed here.

> +    required:
> +      - reg
> +
> +required:
> +  - compatible
> +  - reg
> +  - "#address-cells"
> +  - "#size-cells"
> +
> +examples:
> +  - |
> +    mdio0: mdio@1e650000 {
> +            compatible = "aspeed,ast2600-mdio";
> +            reg = <0x1e650000 0x8>;
> +            #address-cells = <1>;
> +            #size-cells = <0>;
> +
> +            status = "okay";

Don't show status in examples.

> +
> +            ethphy0: ethernet-phy@0 {
> +                    compatible = "ethernet-phy-ieee802.3-c22";
> +                    reg = <0>;
> +            };
> +    };
> --
> 2.20.1
>

^ permalink raw reply

* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-07-29 23:39 UTC (permalink / raw)
  To: Arseny Solokha, Claudiu Manoil, Russell King
  Cc: Ioana Ciornei, Andrew Lunn, netdev, Florian Fainelli
In-Reply-To: <20190723151702.14430-2-asolokha@kb.kras.ru>

Hi Arseny,

Nice project!

On Wed, 24 Jul 2019 at 03:38, Arseny Solokha <asolokha@kb.kras.ru> wrote:
>
> Convert gianfar to use the phylink API for better SFP modules support.
>
> The driver still uses phylib for serdes configuration over the TBI
> interface, as there seems to be no functionally equivalent API present
> in phylink (yet). phylib usage is basically confined in two functions.
>
> One needs to change their Device Tree accordingly to get working SFP
> support:
>
>  enet1: ethernet@25000 {
>  <...>
>         device_type = "network";
>         model = "eTSEC";
>         compatible = "gianfar";
>         tbi-handle = <&tbi0>;
> +       phy-connection-type = "sgmii";
> +       managed = "in-band-status";
> +       sfp = <&sfp>;
>         max-speed = <1000>;
>
>         mdio@520 {
>                 #address-cells = <1>;
>                 #size-cells = <0>;
>                 compatible = "fsl,gianfar-tbi";
>                 reg = <0x520 0x20>;
>
>                 tbi0: tbi-phy@1f {
>                         reg = <0x1f>;
>                         device_type = "tbi-phy";
>                 };
>         };
> +
> +       sfp: sfp0 {
> +               compatible = "sff,sfp";
> +               i2c-bus = <&i2c1>;
> +               mod-def0-gpios = <&gpio 4 GPIO_ACTIVE_LOW>;
> +               rate-select0-gpios = <&gpio 13 GPIO_ACTIVE_HIGH>;
> +               tx-disable-gpios = <&gpio 5 GPIO_ACTIVE_LOW>;
> +               tx-fault-gpios = <&gpio 12 GPIO_ACTIVE_HIGH>;
> +               los-gpios = <&gpio 3 GPIO_ACTIVE_HIGH>;
> +       };
>  };
>
> Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
> ---
>  drivers/net/ethernet/freescale/Kconfig        |   2 +-
>  drivers/net/ethernet/freescale/gianfar.c      | 409 +++++++++---------
>  drivers/net/ethernet/freescale/gianfar.h      |  26 +-
>  .../net/ethernet/freescale/gianfar_ethtool.c  |  79 ++--
>  4 files changed, 251 insertions(+), 265 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
> index 6a7e8993119f..8b51d423b61d 100644
> --- a/drivers/net/ethernet/freescale/Kconfig
> +++ b/drivers/net/ethernet/freescale/Kconfig
> @@ -89,7 +89,7 @@ config GIANFAR
>         tristate "Gianfar Ethernet"
>         depends on HAS_DMA
>         select FSL_PQ_MDIO
> -       select PHYLIB
> +       select PHYLINK
>         select CRC32
>         ---help---
>           This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx,
> diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
> index 7ea19e173339..64c7b174e591 100644
> --- a/drivers/net/ethernet/freescale/gianfar.c
> +++ b/drivers/net/ethernet/freescale/gianfar.c
> @@ -96,6 +96,7 @@
>  #include <linux/mii.h>
>  #include <linux/phy.h>
>  #include <linux/phy_fixed.h>
> +#include <linux/phylink.h>

You can remove linux/phy.h.
You (PHYLINK, actually) broke .ndo_change_carrier, and therefore
commit 6211d46713c5 ("gianfar: Add change_carrier() for Fixed PHYs").
If we were honest, we should probably revert that commit entirely now.
Once you do that, you can remove linux/phy_fixed.h as well.

>  #include <linux/of.h>
>  #include <linux/of_net.h>
>
> @@ -117,8 +118,18 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu);
>  static irqreturn_t gfar_error(int irq, void *dev_id);
>  static irqreturn_t gfar_transmit(int irq, void *dev_id);
>  static irqreturn_t gfar_interrupt(int irq, void *dev_id);
> -static void adjust_link(struct net_device *dev);
> -static noinline void gfar_update_link_state(struct gfar_private *priv);
> +static void gfar_phylink_validate(struct phylink_config *config,
> +                                 unsigned long *supported,
> +                                 struct phylink_link_state *state);
> +static int gfar_mac_link_state(struct phylink_config *config,
> +                              struct phylink_link_state *state);
> +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> +                           const struct phylink_link_state *state);
> +static void gfar_mac_an_restart(struct phylink_config *config);
> +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> +                              phy_interface_t interface);
> +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> +                            phy_interface_t interface, struct phy_device *phy);

Please do not add to this forward declaration madness. A cleanup patch
in this area would be highly appreciated.

>  static int init_phy(struct net_device *dev);
>  static int gfar_probe(struct platform_device *ofdev);
>  static int gfar_remove(struct platform_device *ofdev);
> @@ -504,6 +515,15 @@ static const struct net_device_ops gfar_netdev_ops = {
>  #endif
>  };
>
> +static const struct phylink_mac_ops gfar_phylink_ops = {
> +       .validate = gfar_phylink_validate,
> +       .mac_link_state = gfar_mac_link_state,
> +       .mac_config = gfar_mac_config,
> +       .mac_an_restart = gfar_mac_an_restart,
> +       .mac_link_down = gfar_mac_link_down,
> +       .mac_link_up = gfar_mac_link_up,
> +};
> +
>  static void gfar_ints_disable(struct gfar_private *priv)
>  {
>         int i;
> @@ -733,6 +753,7 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>         struct gfar_private *priv = NULL;
>         struct device_node *np = ofdev->dev.of_node;
>         struct device_node *child = NULL;
> +       struct phylink *phylink;
>         u32 stash_len = 0;
>         u32 stash_idx = 0;
>         unsigned int num_tx_qs, num_rx_qs;
> @@ -891,11 +912,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>
>         err = of_property_read_string(np, "phy-connection-type", &ctype);
>
> -       /* We only care about rgmii-id.  The rest are autodetected */
> -       if (err == 0 && !strcmp(ctype, "rgmii-id"))
> -               priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> -       else
> +       /* We only care about rgmii-id and sgmii - the former
> +        * is indistinguishable from rgmii in hardware, and phylink needs
> +        * the latter to be set appropriately for correct phy configuration.
> +        * The rest are autodetected
> +        */
> +       if (err == 0) {
> +               if (!strcmp(ctype, "rgmii-id"))
> +                       priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> +               else if (!strcmp(ctype, "sgmii"))
> +                       priv->interface = PHY_INTERFACE_MODE_SGMII;
> +               else
> +                       priv->interface = PHY_INTERFACE_MODE_MII;
> +       } else {
>                 priv->interface = PHY_INTERFACE_MODE_MII;
> +       }
>

No. Don't do this. Just do:

    err = of_get_phy_mode(np);
    if (err < 0)
        goto err_grp_init;

    priv->interface = err;

>         if (of_find_property(np, "fsl,magic-packet", NULL))
>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_MAGIC_PACKET;
> @@ -903,19 +934,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>         if (of_get_property(np, "fsl,wake-on-filer", NULL))
>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_WAKE_ON_FILER;
>
> -       priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
> +       priv->device_node = np;
> +       priv->speed = SPEED_UNKNOWN;
>
> -       /* In the case of a fixed PHY, the DT node associated
> -        * to the PHY is the Ethernet MAC DT node.
> -        */
> -       if (!priv->phy_node && of_phy_is_fixed_link(np)) {
> -               err = of_phy_register_fixed_link(np);
> -               if (err)
> -                       goto err_grp_init;
> +       priv->phylink_config.dev = &priv->ndev->dev;
> +       priv->phylink_config.type = PHYLINK_NETDEV;
>
> -               priv->phy_node = of_node_get(np);
> +       phylink = phylink_create(&priv->phylink_config, of_fwnode_handle(np),
> +                                priv->interface, &gfar_phylink_ops);

You introduced a bug here.
of_phy_connect used to take the PHY interface type (for good or bad)
from gfar_get_interface() (which is reconstructing it from the MAC
registers).
You are now passing the PHY interface type to phylink_create from the
"phy-connection-type" DT property.
At the very least, you are breaking LS1021A which uses phy-mode
instead of phy-connection-type (hence my comment above to use the
generic OF helper).
Actually I think you just uncovered a latent bug, in that the DT
bindings for phy-mode didn't mean much at all to the driver - it would
rely on what the bootloader had set up.
Actually DT bindings for phy-connection-type were most likely simply
bolt on on top of gianfar when they figured they couldn't just
auto-detect the various species of required RGMII delays.
But gfar_get_interface is a piece of history that was introduced in
the same commit as the enum phy_interface_t itself: e8a2b6a42073
("[PATCH] PHY: Add support for configuring the PHY connection
interface"). Its time has come.

> +       if (IS_ERR(phylink)) {
> +               err = PTR_ERR(phylink);
> +               goto err_grp_init;
>         }
>
> +       priv->phylink = phylink;
> +
>         /* Find the TBI PHY.  If it's not there, we don't support SGMII */
>         priv->tbi_node = of_parse_phandle(np, "tbi-handle", 0);
>
> @@ -994,7 +1027,7 @@ static int gfar_hwtstamp_get(struct net_device *netdev, struct ifreq *ifr)
>
>  static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>  {
> -       struct phy_device *phydev = dev->phydev;
> +       struct gfar_private *priv = netdev_priv(dev);
>
>         if (!netif_running(dev))
>                 return -EINVAL;
> @@ -1004,10 +1037,7 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>         if (cmd == SIOCGHWTSTAMP)
>                 return gfar_hwtstamp_get(dev, rq);
>
> -       if (!phydev)
> -               return -ENODEV;
> -
> -       return phy_mii_ioctl(phydev, rq, cmd);
> +       return phylink_mii_ioctl(priv->phylink, rq, cmd);
>  }
>
>  static u32 cluster_entry_per_class(struct gfar_private *priv, u32 rqfar,
> @@ -1307,7 +1337,6 @@ static void gfar_init_addr_hash_table(struct gfar_private *priv)
>   */
>  static int gfar_probe(struct platform_device *ofdev)
>  {
> -       struct device_node *np = ofdev->dev.of_node;
>         struct net_device *dev = NULL;
>         struct gfar_private *priv = NULL;
>         int err = 0, i;
> @@ -1463,12 +1492,10 @@ static int gfar_probe(struct platform_device *ofdev)
>         return 0;
>
>  register_fail:
> -       if (of_phy_is_fixed_link(np))
> -               of_phy_deregister_fixed_link(np);
>         unmap_group_regs(priv);
>         gfar_free_rx_queues(priv);
>         gfar_free_tx_queues(priv);
> -       of_node_put(priv->phy_node);
> +       phylink_destroy(priv->phylink);
>         of_node_put(priv->tbi_node);
>         free_gfar_dev(priv);
>         return err;
> @@ -1477,19 +1504,15 @@ static int gfar_probe(struct platform_device *ofdev)
>  static int gfar_remove(struct platform_device *ofdev)
>  {
>         struct gfar_private *priv = platform_get_drvdata(ofdev);
> -       struct device_node *np = ofdev->dev.of_node;
>
> -       of_node_put(priv->phy_node);
>         of_node_put(priv->tbi_node);
>
>         unregister_netdev(priv->ndev);
>
> -       if (of_phy_is_fixed_link(np))
> -               of_phy_deregister_fixed_link(np);
> -
>         unmap_group_regs(priv);
>         gfar_free_rx_queues(priv);
>         gfar_free_tx_queues(priv);
> +       phylink_destroy(priv->phylink);
>         free_gfar_dev(priv);
>
>         return 0;
> @@ -1643,9 +1666,11 @@ static int gfar_suspend(struct device *dev)
>                 gfar_start_wol_filer(priv);
>
>         } else {
> -               phy_stop(ndev->phydev);
> +               phylink_stop(phy->phylink);
>         }
>
> +       priv->speed = SPEED_UNKNOWN;
> +
>         return 0;
>  }
>
> @@ -1672,7 +1697,7 @@ static int gfar_resume(struct device *dev)
>                 gfar_filer_restore_table(priv);
>
>         } else {
> -               phy_start(ndev->phydev);
> +               phylink_start(priv->phylink);
>         }
>
>         gfar_start(priv);
> @@ -1702,12 +1727,7 @@ static int gfar_restore(struct device *dev)
>
>         gfar_start(priv);
>
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       if (ndev->phydev)
> -               phy_start(ndev->phydev);
> +       phylink_start(priv->phylink);
>
>         netif_device_attach(ndev);
>         enable_napi(priv);
> @@ -1781,46 +1801,26 @@ static phy_interface_t gfar_get_interface(struct net_device *dev)
>   */
>  static int init_phy(struct net_device *dev)
>  {
> -       __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
>         struct gfar_private *priv = netdev_priv(dev);
>         phy_interface_t interface;
> -       struct phy_device *phydev;
>         struct ethtool_eee edata;
> +       int flags = 0;
> +       int err;
>
> -       linkmode_set_bit_array(phy_10_100_features_array,
> -                              ARRAY_SIZE(phy_10_100_features_array),
> -                              mask);
> -       linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, mask);
> -       linkmode_set_bit(ETHTOOL_LINK_MODE_MII_BIT, mask);
> -       if (priv->device_flags & FSL_GIANFAR_DEV_HAS_GIGABIT)
> -               linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, mask);
> -
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       interface = gfar_get_interface(dev);
> -
> -       phydev = of_phy_connect(dev, priv->phy_node, &adjust_link, 0,
> -                               interface);
> -       if (!phydev) {
> -               dev_err(&dev->dev, "could not attach to PHY\n");
> -               return -ENODEV;
> +       err = phylink_of_phy_connect(priv->phylink, priv->device_node, flags);
> +       if (err) {
> +               netdev_err(dev, "could not attach to PHY: %d\n", err);
> +               return err;

While you're at it, can you say "connect" instead of "attach" here?

>         }
>
> +       priv->tbi_phy = NULL;
> +       interface = gfar_get_interface(dev);

Be consistent and just go for priv->interface. Nobody's changing it anyway.

>         if (interface == PHY_INTERFACE_MODE_SGMII)
>                 gfar_configure_serdes(dev);
>

My gut feeling says that this should be moved to gfar_mac_config.

> -       /* Remove any features not supported by the controller */
> -       linkmode_and(phydev->supported, phydev->supported, mask);
> -       linkmode_copy(phydev->advertising, phydev->supported);
> -
> -       /* Add support for flow control */
> -       phy_support_asym_pause(phydev);
> -
>         /* disable EEE autoneg, EEE not supported by eTSEC */
>         memset(&edata, 0, sizeof(struct ethtool_eee));
> -       phy_ethtool_set_eee(phydev, &edata);
> +       phylink_ethtool_set_eee(priv->phylink, &edata);
>
>         return 0;
>  }
> @@ -1850,6 +1850,8 @@ static void gfar_configure_serdes(struct net_device *dev)
>                 return;
>         }
>
> +       priv->tbi_phy = tbiphy;
> +
>         /* If the link is already up, we must already be ok, and don't need to
>          * configure and reset the TBI<->SerDes link.  Maybe U-Boot configured
>          * everything for us?  Resetting it takes the link down and requires
> @@ -1964,7 +1966,7 @@ void stop_gfar(struct net_device *dev)
>         /* disable ints and gracefully shut down Rx/Tx DMA */
>         gfar_halt(priv);
>
> -       phy_stop(dev->phydev);
> +       phylink_stop(priv->phylink);
>
>         free_skb_resources(priv);
>  }
> @@ -2219,12 +2221,7 @@ int startup_gfar(struct net_device *ndev)
>         /* Start Rx/Tx DMA and enable the interrupts */
>         gfar_start(priv);
>
> -       /* force link state update after mac reset */
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       phy_start(ndev->phydev);
> +       phylink_start(priv->phylink);
>
>         enable_napi(priv);
>
> @@ -2593,7 +2590,7 @@ static int gfar_close(struct net_device *dev)
>         stop_gfar(dev);
>
>         /* Disconnect from the PHY */
> -       phy_disconnect(dev->phydev);
> +       phylink_disconnect_phy(priv->phylink);

It is very odd to disconnect from the PHY on ndo_close and connect
back on ndo_open. I don't know of any other driver that does that.
Can't you change the behavior to simply start and stop phylink here?

>
>         gfar_free_irq(priv);
>
> @@ -3387,23 +3384,6 @@ static irqreturn_t gfar_interrupt(int irq, void *grp_id)
>         return IRQ_HANDLED;
>  }
>
> -/* Called every time the controller might need to be made
> - * aware of new link state.  The PHY code conveys this
> - * information through variables in the phydev structure, and this
> - * function converts those variables into the appropriate
> - * register values, and can bring down the device if needed.
> - */
> -static void adjust_link(struct net_device *dev)
> -{
> -       struct gfar_private *priv = netdev_priv(dev);
> -       struct phy_device *phydev = dev->phydev;
> -
> -       if (unlikely(phydev->link != priv->oldlink ||
> -                    (phydev->link && (phydev->duplex != priv->oldduplex ||
> -                                      phydev->speed != priv->oldspeed))))
> -               gfar_update_link_state(priv);
> -}

Getting rid of the cruft from this function deserves its own patch.

> -
>  /* Update the hash table based on the current list of multicast
>   * addresses we subscribe to.  Also, change the promiscuity of
>   * the device based on the flags (this function is called
> @@ -3635,132 +3615,169 @@ static irqreturn_t gfar_error(int irq, void *grp_id)
>         return IRQ_HANDLED;
>  }
>
> -static u32 gfar_get_flowctrl_cfg(struct gfar_private *priv)
> +static void gfar_phylink_validate(struct phylink_config *config,
> +                                 unsigned long *supported,
> +                                 struct phylink_link_state *state)
> +{
> +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
> +       __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> +
> +       if (state->interface != PHY_INTERFACE_MODE_NA &&
> +           state->interface != PHY_INTERFACE_MODE_SGMII &&
> +           state->interface != PHY_INTERFACE_MODE_1000BASEX &&

Russell is right, you shouldn't advertise 1000Base-X if there is no
way of making sure that it works.
His question was:

> How is the difference between SGMII and 1000BASE-X handled?

To be honest I don't have a complete answer to that question. The
literature recommends writing 0x01a0 to the MII_ADVERTISE (0x4)
register of the MAC PCS for 1000Base-X, and 0x4001 for SGMII.
The FMan driver which uses the TSEC MAC does exactly that:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/freescale/fman/fman_dtsec.c#n58
However I can't seem to be able to trace down the definition of bit 14
from 0x4001 - it's reserved in all the manuals I see. I have a hunch
that it selects the format of the Config_Reg base page between
1000Base-X and SGMII.
But the bad news is that gianfar is only configuring the TBI PHY for
the 1000Base-X Config_Reg, even for SGMII PHYs. I'm not quite sure,
but I think that the only reason that this doesn't break in-band AN is
that we don't use in-band AN.

> +           !phy_interface_mode_is_rgmii(state->interface)) {
> +               phylink_zero(supported);
> +               return;
> +       }
> +
> +       phylink_set(mask, Autoneg);
> +       phylink_set(mask, Pause);
> +       phylink_set(mask, Asym_Pause);
> +       phylink_set_port_modes(mask);
> +
> +       phylink_set(mask, 10baseT_Half);
> +       phylink_set(mask, 10baseT_Full);
> +       phylink_set(mask, 100baseT_Half);
> +       phylink_set(mask, 100baseT_Full);
> +
> +       if (priv->device_flags & FSL_GIANFAR_DEV_HAS_GIGABIT) {
> +               phylink_set(mask, 1000baseX_Full);
> +               phylink_set(mask, 1000baseT_Full);
> +       }
> +
> +       linkmode_and(supported, supported, mask);
> +       linkmode_and(state->advertising, state->advertising, mask);
> +}
> +
> +static int gfar_mac_link_state(struct phylink_config *config,
> +                              struct phylink_link_state *state)
> +{
> +       if (state->interface == PHY_INTERFACE_MODE_SGMII ||
> +           state->interface == PHY_INTERFACE_MODE_1000BASEX) {

What if you reduce the indentation level by 1 here, by just exiting if
the interface mode is not SGMII?

> +               struct gfar_private *priv =
> +                       netdev_priv(to_net_dev(config->dev));
> +               u16 tbi_cr;
> +
> +               if (!priv->tbi_phy)
> +                       return -ENODEV;
> +
> +               tbi_cr = phy_read(priv->tbi_phy, MII_TBI_CR);
> +
> +               state->duplex = !!(tbi_cr & TBI_CR_FULL_DUPLEX);

Woah there. Aren't you supposed to first ensure state->an_complete is
ok, based on TBI_MII_Register_Set_SR[AN_Done]? There's also a
Link_Status bit in that register that you could retrieve.

> +               if ((tbi_cr & TBI_CR_SPEED_1000_MASK) == TBI_CR_SPEED_1000_MASK)
> +                       state->speed = SPEED_1000;
> +       }

See the Speed_Bit table from TBI_MII_Register_Set_ANLPBPA_SGMII for
the link partner (aka SGMII PHY) advertisement. You have to do a
logical-and between that and your own. Also please make sure you
really are in SGMII AN and not 1000 Base-X when interpreting registers
4 & 5 as one way or another.

> +
> +       return 1;
> +}
> +
> +static u32 gfar_get_flowctrl_cfg(const struct phylink_link_state *state)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
>         u32 val = 0;
>
> -       if (!phydev->duplex)
> +       if (!state->duplex)
>                 return val;
>
> -       if (!priv->pause_aneg_en) {
> -               if (priv->tx_pause_en)
> -                       val |= MACCFG1_TX_FLOW;
> -               if (priv->rx_pause_en)
> -                       val |= MACCFG1_RX_FLOW;
> -       } else {
> -               u16 lcl_adv, rmt_adv;
> -               u8 flowctrl;
> -               /* get link partner capabilities */
> -               rmt_adv = 0;
> -               if (phydev->pause)
> -                       rmt_adv = LPA_PAUSE_CAP;
> -               if (phydev->asym_pause)
> -                       rmt_adv |= LPA_PAUSE_ASYM;
> -
> -               lcl_adv = linkmode_adv_to_lcl_adv_t(phydev->advertising);
> -               flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
> -               if (flowctrl & FLOW_CTRL_TX)
> -                       val |= MACCFG1_TX_FLOW;
> -               if (flowctrl & FLOW_CTRL_RX)
> -                       val |= MACCFG1_RX_FLOW;
> -       }
> +       if (state->pause & MLO_PAUSE_TX)
> +               val |= MACCFG1_TX_FLOW;
> +       if (state->pause & MLO_PAUSE_RX)
> +               val |= MACCFG1_RX_FLOW;
>
>         return val;
>  }
>
> -static noinline void gfar_update_link_state(struct gfar_private *priv)
> +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> +                           const struct phylink_link_state *state)
>  {
> +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
>         struct gfar __iomem *regs = priv->gfargrp[0].regs;
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
> -       struct gfar_priv_rx_q *rx_queue = NULL;
> -       int i;
> +       u32 maccfg1, new_maccfg1;
> +       u32 maccfg2, new_maccfg2;
> +       u32 ecntrl, new_ecntrl;
> +       u32 tx_flow, new_tx_flow;

Don't introduce new_ variables. Is there any issue if you
unconditionally write to the MAC registers?

>
>         if (unlikely(test_bit(GFAR_RESETTING, &priv->state)))
>                 return;
>
> -       if (phydev->link) {
> -               u32 tempval1 = gfar_read(&regs->maccfg1);
> -               u32 tempval = gfar_read(&regs->maccfg2);
> -               u32 ecntrl = gfar_read(&regs->ecntrl);
> -               u32 tx_flow_oldval = (tempval1 & MACCFG1_TX_FLOW);
> +       if (unlikely(phylink_autoneg_inband(mode)))
> +               return;
>
> -               if (phydev->duplex != priv->oldduplex) {
> -                       if (!(phydev->duplex))
> -                               tempval &= ~(MACCFG2_FULL_DUPLEX);
> -                       else
> -                               tempval |= MACCFG2_FULL_DUPLEX;
> +       maccfg1 = gfar_read(&regs->maccfg1);
> +       maccfg2 = gfar_read(&regs->maccfg2);
> +       ecntrl = gfar_read(&regs->ecntrl);
>
> -                       priv->oldduplex = phydev->duplex;
> -               }
> +       new_maccfg2 = maccfg2 & ~(MACCFG2_FULL_DUPLEX | MACCFG2_IF);
> +       new_ecntrl = ecntrl & ~ECNTRL_R100;
>
> -               if (phydev->speed != priv->oldspeed) {
> -                       switch (phydev->speed) {
> -                       case 1000:
> -                               tempval =
> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_GMII);
> +       if (state->duplex)
> +               new_maccfg2 |= MACCFG2_FULL_DUPLEX;
>
> -                               ecntrl &= ~(ECNTRL_R100);
> -                               break;
> -                       case 100:
> -                       case 10:
> -                               tempval =
> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_MII);
> -
> -                               /* Reduced mode distinguishes
> -                                * between 10 and 100
> -                                */
> -                               if (phydev->speed == SPEED_100)
> -                                       ecntrl |= ECNTRL_R100;
> -                               else
> -                                       ecntrl &= ~(ECNTRL_R100);
> -                               break;
> -                       default:
> -                               netif_warn(priv, link, priv->ndev,
> -                                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> -                                          phydev->speed);

Please 1. remove "Ack!" 2. treat SPEED_UNKNOWN here by setting the MAC
into a low-power state (e.g. 10 Mbps - the power savings are real).
Don't print that Speed -1 is not 10/100/1000, we know that.

> -                               break;
> -                       }
> -
> -                       priv->oldspeed = phydev->speed;
> -               }
> -
> -               tempval1 &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> -               tempval1 |= gfar_get_flowctrl_cfg(priv);
> -
> -               /* Turn last free buffer recording on */
> -               if ((tempval1 & MACCFG1_TX_FLOW) && !tx_flow_oldval) {
> -                       for (i = 0; i < priv->num_rx_queues; i++) {
> -                               u32 bdp_dma;
> -
> -                               rx_queue = priv->rx_queue[i];
> -                               bdp_dma = gfar_rxbd_dma_lastfree(rx_queue);
> -                               gfar_write(rx_queue->rfbptr, bdp_dma);
> -                       }
> -
> -                       priv->tx_actual_en = 1;
> -               }
> -
> -               if (unlikely(!(tempval1 & MACCFG1_TX_FLOW) && tx_flow_oldval))
> -                       priv->tx_actual_en = 0;
> -
> -               gfar_write(&regs->maccfg1, tempval1);
> -               gfar_write(&regs->maccfg2, tempval);
> -               gfar_write(&regs->ecntrl, ecntrl);
> -
> -               if (!priv->oldlink)
> -                       priv->oldlink = 1;
> -
> -       } else if (priv->oldlink) {
> -               priv->oldlink = 0;
> -               priv->oldspeed = 0;
> -               priv->oldduplex = -1;
> +       switch (state->speed) {
> +       case SPEED_1000:
> +               new_maccfg2 |= MACCFG2_GMII;
> +               break;
> +       case SPEED_100:
> +               new_maccfg2 |= MACCFG2_MII;
> +               new_ecntrl = ecntrl | ECNTRL_R100;
> +               break;
> +       case SPEED_10:
> +               new_maccfg2 |= MACCFG2_MII;
> +               break;
> +       default:
> +               netif_warn(priv, link, priv->ndev,
> +                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> +                          state->speed);
> +               return;
>         }
>
> -       if (netif_msg_link(priv))
> -               phy_print_status(phydev);
> +       priv->speed = state->speed;
> +
> +       new_maccfg1 = maccfg1 & ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> +       new_maccfg1 |= gfar_get_flowctrl_cfg(state);
> +
> +       /* Turn last free buffer recording on */
> +       tx_flow = maccfg1 & MACCFG1_TX_FLOW;
> +       new_tx_flow = new_maccfg1 & MACCFG1_TX_FLOW;
> +       if (new_tx_flow && !tx_flow) {
> +               int i;
> +
> +               for (i = 0; i < priv->num_rx_queues; i++) {
> +                       struct gfar_priv_rx_q *rx_queue;
> +                       u32 bdp_dma;
> +
> +                       rx_queue = priv->rx_queue[i];
> +                       bdp_dma = gfar_rxbd_dma_lastfree(rx_queue);
> +                       gfar_write(rx_queue->rfbptr, bdp_dma);
> +               }
> +
> +               priv->tx_actual_en = 1;
> +       } else if (unlikely(!new_tx_flow && tx_flow)) {
> +               priv->tx_actual_en = 0;
> +       }
> +
> +       if (new_maccfg1 != maccfg1)
> +               gfar_write(&regs->maccfg1, new_maccfg1);
> +       if (new_maccfg2 != maccfg2)
> +               gfar_write(&regs->maccfg2, new_maccfg2);
> +       if (new_ecntrl != ecntrl)
> +               gfar_write(&regs->ecntrl, new_ecntrl);
> +}
> +
> +static void gfar_mac_an_restart(struct phylink_config *config)
> +{
> +       /* Not supported */
> +}

What about running gfar_configure_serdes again?

> +
> +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> +                              phy_interface_t interface)
> +{
> +       /* Not supported */
> +}
> +

What about disabling RX_EN and TX_EN from MACCFG1?

> +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> +                            phy_interface_t interface, struct phy_device *phy)
> +{
> +       /* Not supported */
>  }
>

What about enabling RX_EN and TX_EN from MACCFG1?

>  static const struct of_device_id gfar_match[] =
> diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
> index f2af96349c7b..0b28b1f60f2d 100644
> --- a/drivers/net/ethernet/freescale/gianfar.h
> +++ b/drivers/net/ethernet/freescale/gianfar.h
> @@ -31,8 +31,7 @@
>  #include <linux/skbuff.h>
>  #include <linux/spinlock.h>
>  #include <linux/mm.h>
> -#include <linux/mii.h>
> -#include <linux/phy.h>
> +#include <linux/phylink.h>
>
>  #include <asm/io.h>
>  #include <asm/irq.h>
> @@ -149,8 +148,13 @@ extern const char gfar_driver_version[];
>  #define GFAR_SUPPORTED_GBIT SUPPORTED_1000baseT_Full
>
>  /* TBI register addresses */
> +#define MII_TBI_CR             0x00
>  #define MII_TBICON             0x11
>
> +/* TBI_CR register bit fields */
> +#define TBI_CR_FULL_DUPLEX     0x0100
> +#define TBI_CR_SPEED_1000_MASK 0x0040
> +

I think BIT() definitions are preferred, even if that means you have
to convert existing code first.

>  /* TBICON register bit fields */
>  #define TBICON_CLK_SELECT      0x0020
>
> @@ -1148,12 +1152,12 @@ struct gfar_private {
>
>         /* PHY stuff */
>         phy_interface_t interface;
> -       struct device_node *phy_node;
> +       struct device_node *device_node;
>         struct device_node *tbi_node;
> -       struct mii_bus *mii_bus;
> -       int oldspeed;
> -       int oldduplex;
> -       int oldlink;
> +       struct phylink *phylink;
> +       struct phylink_config phylink_config;
> +       struct phy_device *tbi_phy;
> +       int speed;
>
>         uint32_t msg_enable;
>
> @@ -1165,11 +1169,7 @@ struct gfar_private {
>                 bd_stash_en:1,
>                 rx_filer_enable:1,
>                 /* Enable priorty based Tx scheduling in Hw */
> -               prio_sched_en:1,
> -               /* Flow control flags */
> -               pause_aneg_en:1,
> -               tx_pause_en:1,
> -               rx_pause_en:1;
> +               prio_sched_en:1;
>
>         /* The total tx and rx ring size for the enabled queues */
>         unsigned int total_tx_ring_size;
> @@ -1333,8 +1333,6 @@ void reset_gfar(struct net_device *dev);
>  void gfar_mac_reset(struct gfar_private *priv);
>  void gfar_halt(struct gfar_private *priv);
>  void gfar_start(struct gfar_private *priv);
> -void gfar_phy_test(struct mii_bus *bus, struct phy_device *phydev, int enable,
> -                  u32 regnum, u32 read);
>  void gfar_configure_coalescing_all(struct gfar_private *priv);
>  int gfar_set_features(struct net_device *dev, netdev_features_t features);
>
> diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> index 3433b46b90c1..146b30d07789 100644
> --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> @@ -35,7 +35,7 @@
>  #include <asm/types.h>
>  #include <linux/ethtool.h>
>  #include <linux/mii.h>
> -#include <linux/phy.h>
> +#include <linux/phylink.h>
>  #include <linux/sort.h>
>  #include <linux/if_vlan.h>
>  #include <linux/of_platform.h>
> @@ -207,12 +207,10 @@ static void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs,
>  static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
>                                      unsigned int usecs)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;

Are you sure this still works? You missed a ndev->phydev check from
gfar_gcoalesce, where this is called from. Technically you can still
check ndev->phydev, it's just that PHYLINK doesn't guarantee you'll
have one (e.g. fixed-link interfaces).

>         unsigned int count;
>
>         /* The timer is different, depending on the interface speed */
> -       switch (phydev->speed) {
> +       switch (priv->speed) {

What about retrieving the speed from phylink_ethtool_ksettings_get,
and using that here? This would save you from introducing a cached
priv->speed.

>         case SPEED_1000:
>                 count = GFAR_GBIT_TIME;
>                 break;
> @@ -234,12 +232,10 @@ static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
>  static unsigned int gfar_ticks2usecs(struct gfar_private *priv,
>                                      unsigned int ticks)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
>         unsigned int count;
>
>         /* The timer is different, depending on the interface speed */
> -       switch (phydev->speed) {
> +       switch (priv->speed) {

Same comments as above, but for gfar_scoalesce.

>         case SPEED_1000:
>                 count = GFAR_GBIT_TIME;
>                 break;
> @@ -489,58 +485,15 @@ static void gfar_gpauseparam(struct net_device *dev,
>  {
>         struct gfar_private *priv = netdev_priv(dev);
>
> -       epause->autoneg = !!priv->pause_aneg_en;
> -       epause->rx_pause = !!priv->rx_pause_en;
> -       epause->tx_pause = !!priv->tx_pause_en;
> +       phylink_ethtool_get_pauseparam(priv->phylink, epause);
>  }
>
>  static int gfar_spauseparam(struct net_device *dev,
>                             struct ethtool_pauseparam *epause)
>  {
>         struct gfar_private *priv = netdev_priv(dev);
> -       struct phy_device *phydev = dev->phydev;
> -       struct gfar __iomem *regs = priv->gfargrp[0].regs;
>
> -       if (!phydev)
> -               return -ENODEV;
> -
> -       if (!phy_validate_pause(phydev, epause))
> -               return -EINVAL;
> -
> -       priv->rx_pause_en = priv->tx_pause_en = 0;
> -       phy_set_asym_pause(phydev, epause->rx_pause, epause->tx_pause);
> -       if (epause->rx_pause) {
> -               priv->rx_pause_en = 1;
> -
> -               if (epause->tx_pause) {
> -                       priv->tx_pause_en = 1;
> -               }
> -       } else if (epause->tx_pause) {
> -               priv->tx_pause_en = 1;
> -       }
> -
> -       if (epause->autoneg)
> -               priv->pause_aneg_en = 1;
> -       else
> -               priv->pause_aneg_en = 0;
> -
> -       if (!epause->autoneg) {
> -               u32 tempval = gfar_read(&regs->maccfg1);
> -
> -               tempval &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> -
> -               priv->tx_actual_en = 0;
> -               if (priv->tx_pause_en) {
> -                       priv->tx_actual_en = 1;
> -                       tempval |= MACCFG1_TX_FLOW;
> -               }
> -
> -               if (priv->rx_pause_en)
> -                       tempval |= MACCFG1_RX_FLOW;
> -               gfar_write(&regs->maccfg1, tempval);
> -       }
> -
> -       return 0;
> +       return phylink_ethtool_set_pauseparam(priv->phylink, epause);
>  }
>
>  int gfar_set_features(struct net_device *dev, netdev_features_t features)
> @@ -1519,6 +1472,24 @@ static int gfar_get_ts_info(struct net_device *dev,
>         return 0;
>  }
>
> +/* Set link ksettings (phy address, speed) for ethtools */

ethtool, not ethtools. Also, I'm not quite sure what you mean by
setting the "phy address" with ethtool.

> +static int gfar_get_link_ksettings(struct net_device *dev,
> +                                  struct ethtool_link_ksettings *cmd)
> +{
> +       struct gfar_private *priv = netdev_priv(dev);
> +
> +       return phylink_ethtool_ksettings_get(priv->phylink, cmd);
> +}
> +
> +/* Get link ksettings for ethtools */
> +static int gfar_set_link_ksettings(struct net_device *dev,
> +                                  const struct ethtool_link_ksettings *cmd)
> +{
> +       struct gfar_private *priv = netdev_priv(dev);
> +
> +       return phylink_ethtool_ksettings_set(priv->phylink, cmd);
> +}
> +
>  const struct ethtool_ops gfar_ethtool_ops = {
>         .get_drvinfo = gfar_gdrvinfo,
>         .get_regs_len = gfar_reglen,
> @@ -1542,6 +1513,6 @@ const struct ethtool_ops gfar_ethtool_ops = {
>         .set_rxnfc = gfar_set_nfc,
>         .get_rxnfc = gfar_get_nfc,
>         .get_ts_info = gfar_get_ts_info,
> -       .get_link_ksettings = phy_ethtool_get_link_ksettings,
> -       .set_link_ksettings = phy_ethtool_set_link_ksettings,
> +       .get_link_ksettings = gfar_get_link_ksettings,
> +       .set_link_ksettings = gfar_set_link_ksettings,
>  };
> --
> 2.22.0
>

Thanks a lot for doing this!
-Vladimir

^ permalink raw reply

* Re: [bpf-next,v2 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-29 23:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Petar Penkov, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, lmb,
	Stanislav Fomichev, Toke Høiland-Jørgensen
In-Reply-To: <20190729204755.iu5wp3xisu42vkky@ast-mbp>

On Mon, Jul 29, 2019 at 1:48 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 09:59:12AM -0700, Petar Penkov wrote:
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This patch series introduces a BPF helper function that allows generating SYN
> > cookies from BPF. Currently, this helper is enabled at both the TC hook and the
> > XDP hook.
> >
> > The first two patches in the series add/modify several TCP helper functions to
> > allow for SKB-less operation, as is the case at the XDP hook.
> >
> > The third patch introduces the bpf_tcp_gen_syncookie helper function which
> > generates a SYN cookie for either XDP or TC programs. The return value of
> > this function contains both the MSS value, encoded in the cookie, and the
> > cookie itself.
> >
> > The last three patches sync tools/ and add a test.
> >
> > Performance evaluation:
> > I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
> > random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
> > sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
> > carry no options. If I attached a simple program that checks if a packet is
> > IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
> > swapping src/dest, recomputing the checksum, and setting the ACK flag, I
> > observed 10Mpps being sent back out.
>

Thank you for reviewing this so quickly!
> Is it 10m because trafic gen is 10m?
Yes, I believe so.

> What is cpu utilization at this rate?
> Is it cpu or nic limited if you crank up the syn flood?
> Original 7M with all cores or single core?
My receiver was configured with 16rx queues and 16 cores. 7M all cores
are at 100% so I believe this case is CPU limited. At XDP, all cores
are at roughly 40%. I couldn't reliably generate higher SYN flood rate
than that, and the highest numbers I could see for XDP did not go past
10.65Mpps with ~42% utilization on each core. I think I am hitting a
NIC limit here since the CPUs are free.


>
> The patch set looks good to me.
> I'd like Eric to review it one more time before applying.
>

^ permalink raw reply

* [pull request][net-next 00/13] Mellanox, mlx5 tc flow handling for concurrent execution (Part 1)
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed

Hi Dave,

This series, mostly from Vlad, is the first part of ongoing work to
improve mlx5 tc flow handling by removing dependency on rtnl_lock and
providing a more fine-grained locking and rcu safe data structures.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit 85fd8011475e86265beff7b7617c493c247f5356:

  Merge branch 'bnxt_en-TPA-57500' (2019-07-29 14:19:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-07-29

for you to fetch changes up to b6fac0b46a1a76024698d240f0f9aac552f897b7:

  net/mlx5e: Protect tc flow table with mutex (2019-07-29 16:40:26 -0700)

----------------------------------------------------------------
mlx5-updates-2019-07-29

This series includes updates to mlx5 driver,
1) Simplifications, cleanup and warning prints improvements

2) From Vlad Buslov:
Refactor mlx5 tc flow handling for unlocked execution (Part 1)

Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. Cls API has already been updated to
update software filters in parallel (on classifiers that support
unlocked execution), however hardware offloads code still obtains rtnl
lock before calling driver tc callbacks. This set implements partial
support for unlocked execution that is leveraged by follow up
refactorings in specific mlx5 tc subsystems and patch to cls API that
implements API that allows drivers to register their callbacks as
rtnl-unlocked.

In mlx5 tc code mlx5e_tc_flow is the main structure that is used to
represent tc filter. Currently, code the structure itself and its
handlers in both tc and eswitch layers do not implement any kind of
synchronizations and assume external global synchronization provided by
rtnl lock instead. Implement following changes to remove dependency on
rtnl lock in flow handling code that are intended to be used a
groundwork for following changes to provide fully rtnl-independent mlx5
tc:

- Extend struct mlx5e_tc_flow with atomic reference counter and rcu to
  allow concurrent access from multiple tc and neigh update workqueue
  instances without introducing any additional locks specific to the
  structure. Its 'flags' field type is changed to atomic bitmask ops which
  is necessary for tc to interact with other concurrent tc instances or
  concurrent neigh update that need to skip flows that are not fully
  initialized (new INIT_DONE flow flag) and can change the flags
  according to neighbor state (flipping OFFLOADED flag).

- Protect unready flows list by new uplink_priv->unready_flows_lock
  mutex.

- Convert calls to netdev APIs that require rtnl lock in flow handling
  code to their rcu counterparts.

- Modify eswitch code that is called from tc layer and assume implicit
  external synchronization to be concurrency safe: change
  esw->offloads.num_flows type to atomic integer and re-arrange
  esw->state_lock usage to protect additional data.

Some of approaches to synchronizations presented in this patch set are
quite complicated (lockless concurrent usage of data structures with rcu
and reference counting, using fine-grained locking when necessary, retry
mechanisms to handle concurrent insertion of another instance of data
structure with same key, etc.). This is necessary to allow calling the
firmware in parallel in most cases, which is the main motivation of this
change since firmware calls are mach heavier operation than atomic
operations, multitude of locks and potential multiple retries during
concurrent accesses to same elements.

----------------------------------------------------------------
Eli Britstein (1):
      net/mlx5e: Simplify get_route_and_out_devs helper function

Huy Nguyen (1):
      net/mlx5e: Print a warning when LRO feature is dropped or not allowed

Saeed Mahameed (2):
      net/mlx5e: Avoid warning print when not required
      net/mlx5e: Improve ethtool rxnfc callback structure

Vlad Buslov (8):
      net/mlx5e: Extend tc flow struct with reference counter
      net/mlx5e: Change flow flags type to unsigned long
      net/mlx5e: Protect tc flows hashtable with rcu
      net/mlx5e: Protect unready flows with dedicated lock
      net/mlx5e: Eswitch, change offloads num_flows type to atomic64
      net/mlx5e: Eswitch, use state_lock to synchronize vlan change
      net/mlx5e: Rely on rcu instead of rtnl lock when getting upper dev
      net/mlx5e: Protect tc flow table with mutex

wenxu (1):
      net/mlx5e: Fix unnecessary flow_block_cb_is_busy call

 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    |  33 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  28 +-
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  16 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 492 +++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  16 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  27 +-
 12 files changed, 424 insertions(+), 255 deletions(-)

^ permalink raw reply

* [net-next 01/13] net/mlx5e: Print a warning when LRO feature is dropped or not allowed
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

When user enables LRO via ethtool and if the RQ mode is legacy,
mlx5e_fix_features drops the request without any explanation.
Add netdev_warn to cover this case.

Fixes: 6c3a823e1e9c ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 47eea6b3a1c3..776eb46d263d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3788,9 +3788,10 @@ static netdev_features_t mlx5e_fix_features(struct net_device *netdev,
 			netdev_warn(netdev, "Dropping C-tag vlan stripping offload due to S-tag vlan\n");
 	}
 	if (!MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ)) {
-		features &= ~NETIF_F_LRO;
-		if (params->lro_en)
+		if (features & NETIF_F_LRO) {
 			netdev_warn(netdev, "Disabling LRO, not supported in legacy RQ\n");
+			features &= ~NETIF_F_LRO;
+		}
 	}
 
 	if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS)) {
-- 
2.21.0


^ permalink raw reply related

* [net-next 02/13] net/mlx5e: Avoid warning print when not required
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

When disabling CQE compression in favor of time-stamping, don't show a
warning when CQE compression is already disabled.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 776eb46d263d..266783295124 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3958,7 +3958,8 @@ int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr)
 	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
 	case HWTSTAMP_FILTER_NTP_ALL:
 		/* Disable CQE compression */
-		netdev_warn(priv->netdev, "Disabling cqe compression");
+		if (MLX5E_GET_PFLAG(&priv->channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS))
+			netdev_warn(priv->netdev, "Disabling RX cqe compression\n");
 		err = mlx5e_modify_rx_cqe_compression_locked(priv, false);
 		if (err) {
 			netdev_err(priv->netdev, "Failed disabling cqe compression err=%d\n", err);
-- 
2.21.0


^ permalink raw reply related

* [net-next 03/13] net/mlx5e: Improve ethtool rxnfc callback structure
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

Don't choose who implements the rxnfc "get/set" callbacks according to
CONFIG_MLX5_EN_RXNFC, instead have the callbacks always available and
delegate to a function of a different driver module when needed
(en_fs_ethtool.c), have stubs in en/fs.h to fallback to when
en_fs_ethtool.c is compiled out, to avoid complications and ifdefs in
en_main.c.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   | 11 ++++++--
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 28 +++++++++++--------
 .../mellanox/mlx5/core/en_fs_ethtool.c        | 11 +++-----
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index be5961ff24cc..eb70ada89b09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -132,12 +132,17 @@ struct mlx5e_ethtool_steering {
 
 void mlx5e_ethtool_init_steering(struct mlx5e_priv *priv);
 void mlx5e_ethtool_cleanup_steering(struct mlx5e_priv *priv);
-int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd);
-int mlx5e_get_rxnfc(struct net_device *dev,
-		    struct ethtool_rxnfc *info, u32 *rule_locs);
+int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd);
+int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+			    struct ethtool_rxnfc *info, u32 *rule_locs);
 #else
 static inline void mlx5e_ethtool_init_steering(struct mlx5e_priv *priv)    { }
 static inline void mlx5e_ethtool_cleanup_steering(struct mlx5e_priv *priv) { }
+static inline int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{ return -EOPNOTSUPP; }
+static inline int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+					  struct ethtool_rxnfc *info, u32 *rule_locs)
+{ return -EOPNOTSUPP; }
 #endif /* CONFIG_MLX5_EN_RXNFC */
 
 #ifdef CONFIG_MLX5_EN_ARFS
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 126ec4181286..a6b0eda0bd1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1888,21 +1888,27 @@ static u32 mlx5e_get_priv_flags(struct net_device *netdev)
 	return priv->channels.params.pflags;
 }
 
-#ifndef CONFIG_MLX5_EN_RXNFC
-/* When CONFIG_MLX5_EN_RXNFC=n we only support ETHTOOL_GRXRINGS
- * otherwise this function will be defined from en_fs_ethtool.c
- */
 static int mlx5e_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info, u32 *rule_locs)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 
-	if (info->cmd != ETHTOOL_GRXRINGS)
-		return -EOPNOTSUPP;
-	/* ring_count is needed by ethtool -x */
-	info->data = priv->channels.params.num_channels;
-	return 0;
+	/* ETHTOOL_GRXRINGS is needed by ethtool -x which is not part
+	 * of rxnfc. We keep this logic out of mlx5e_ethtool_get_rxnfc,
+	 * to avoid breaking "ethtool -x" when mlx5e_ethtool_get_rxnfc
+	 * is compiled out via CONFIG_MLX5_EN_RXNFC=n.
+	 */
+	if (info->cmd == ETHTOOL_GRXRINGS) {
+		info->data = priv->channels.params.num_channels;
+		return 0;
+	}
+
+	return mlx5e_ethtool_get_rxnfc(dev, info, rule_locs);
+}
+
+static int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+	return mlx5e_ethtool_set_rxnfc(dev, cmd);
 }
-#endif
 
 const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_drvinfo       = mlx5e_get_drvinfo,
@@ -1923,9 +1929,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_rxfh          = mlx5e_get_rxfh,
 	.set_rxfh          = mlx5e_set_rxfh,
 	.get_rxnfc         = mlx5e_get_rxnfc,
-#ifdef CONFIG_MLX5_EN_RXNFC
 	.set_rxnfc         = mlx5e_set_rxnfc,
-#endif
 	.get_tunable       = mlx5e_get_tunable,
 	.set_tunable       = mlx5e_set_tunable,
 	.get_pauseparam    = mlx5e_get_pauseparam,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index ea3a490b569a..a66589816e21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -887,10 +887,10 @@ static int mlx5e_get_rss_hash_opt(struct mlx5e_priv *priv,
 	return 0;
 }
 
-int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 {
-	int err = 0;
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	int err = 0;
 
 	switch (cmd->cmd) {
 	case ETHTOOL_SRXCLSRLINS:
@@ -910,16 +910,13 @@ int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 	return err;
 }
 
-int mlx5e_get_rxnfc(struct net_device *dev,
-		    struct ethtool_rxnfc *info, u32 *rule_locs)
+int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+			    struct ethtool_rxnfc *info, u32 *rule_locs)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	int err = 0;
 
 	switch (info->cmd) {
-	case ETHTOOL_GRXRINGS:
-		info->data = priv->channels.params.num_channels;
-		break;
 	case ETHTOOL_GRXCLSRLCNT:
 		info->rule_cnt = priv->fs.ethtool.tot_num_rules;
 		break;
-- 
2.21.0


^ permalink raw reply related

* [net-next 05/13] net/mlx5e: Simplify get_route_and_out_devs helper function
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Eli Britstein, Roi Dayan, Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Eli Britstein <elibr@mellanox.com>

The helper function has "if" branches that do the same. Merge them to
simplify the code.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_tun.c   | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
index a6a52806be45..ae439d95f5a3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
@@ -40,20 +40,15 @@ static int get_route_and_out_devs(struct mlx5e_priv *priv,
 	/* if the egress device isn't on the same HW e-switch or
 	 * it's a LAG device, use the uplink
 	 */
+	*route_dev = dev;
 	if (!netdev_port_same_parent_id(priv->netdev, real_dev) ||
-	    dst_is_lag_dev) {
-		*route_dev = dev;
+	    dst_is_lag_dev || is_vlan_dev(*route_dev))
 		*out_dev = uplink_dev;
-	} else {
-		*route_dev = dev;
-		if (is_vlan_dev(*route_dev))
-			*out_dev = uplink_dev;
-		else if (mlx5e_eswitch_rep(dev) &&
-			 mlx5e_is_valid_eswitch_fwd_dev(priv, dev))
-			*out_dev = *route_dev;
-		else
-			return -EOPNOTSUPP;
-	}
+	else if (mlx5e_eswitch_rep(dev) &&
+		 mlx5e_is_valid_eswitch_fwd_dev(priv, dev))
+		*out_dev = *route_dev;
+	else
+		return -EOPNOTSUPP;
 
 	if (!(mlx5e_eswitch_rep(*out_dev) &&
 	      mlx5e_is_uplink_rep(netdev_priv(*out_dev))))
-- 
2.21.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox