Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next v10 5/7] bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
From: Peter Oskolkov @ 2019-02-13 20:41 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Peter Oskolkov,
	Willem de Bruijn
In-Reply-To: <80849fb5-c5de-ce6b-6c25-bd152326196c@gmail.com>

On Wed, Feb 13, 2019 at 12:11 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/13/19 12:57 PM, Peter Oskolkov wrote:
> > Thanks, David! I was not able to reproduce the leak, but based on your
> > suggestion and similar code elsewhere I made a change in v11 to explicitly
> > release a dst with error.
>
> ok. Did you run the test with a debug kernel - checking refcount, use
> after free, etc?

In my tests I was always getting ERR_PTR for unroutable packets,
not a full rt/dst with an error flag set. But I checked several
similar route lookups,
and they all release bad dsts, so I did not feel it was worth it to
investigate further.

^ permalink raw reply

* [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Jann Horn @ 2019-02-13 20:41 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, jannh
  Cc: linux-kernel, Michal Hocko, Vlastimil Babka, Pavel Tatashin,
	Oscar Salvador, Mel Gorman, Aaron Lu, netdev, Alexander Duyck

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..46285d28e43b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		page_ref_add(page, size - 1);
+		page_ref_add(page, size);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		nc->offset = size;
 	}
 
@@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		size = nc->size;
 #endif
 		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, size);
+		set_page_count(page, size + 1);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		offset = size - fragsz;
 	}
 
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

* Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
From: Jonathan Lemon @ 2019-02-13 20:49 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Magnus Karlsson, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer, xiaolong.ye
In-Reply-To: <CAJ8uoz19UjmEHTc28Qd_9KdY9D-ojXSBRTbmffRhUTX49mnWvg@mail.gmail.com>

On 13 Feb 2019, at 3:32, Magnus Karlsson wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon 
> <jonathan.lemon@gmail.com> wrote:
>>
>> On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
>>
>>> This patch proposes to add AF_XDP support to libbpf. The main reason
>>> for this is to facilitate writing applications that use AF_XDP by
>>> offering higher-level APIs that hide many of the details of the 
>>> AF_XDP
>>> uapi. This is in the same vein as libbpf facilitates XDP adoption by
>>> offering easy-to-use higher level interfaces of XDP
>>> functionality. Hopefully this will facilitate adoption of AF_XDP, 
>>> make
>>> applications using it simpler and smaller, and finally also make it
>>> possible for applications to benefit from optimizations in the 
>>> AF_XDP
>>> user space access code. Previously, people just copied and pasted 
>>> the
>>> code from the sample application into their application, which is 
>>> not
>>> desirable.
>>
>> I like the idea of encapsulating the boilerplate logic in a library.
>>
>> I do think there is an important missing piece though - there should 
>> be
>> some code which queries the netdev for how many queues are attached, 
>> and
>> create the appropriate number of umem/AF_XDP sockets.
>>
>> I ran into this issue when testing the current AF_XDP code - on my 
>> test
>> boxes, the mlx5 card has 55 channels (aka queues), so when the test 
>> program
>> binds only to channel 0, nothing works as expected, since not all 
>> traffic
>> is being intercepted.  While obvious in hindsight, this took a while 
>> to
>> track down.
>
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.

Has any investigation been done on using some variant of MPSC 
implementation
as an intermediate form for AF_XDP?  E.g.: something like LCRQ or the 
bulkQ
in bpf devmap/cpumap.  I'm aware that this would be slightly slower, as 
it
would introduce a lock in the path, but I'd think that having DEVMAP, 
CPUMAP
and XSKMAP all behave the same way would add more flexibility.

Ideally, if the configuration matches the underlying hardware, then the
implementation would reduce to the current setup (and allow ZC 
implementations),
but a non-matching configuration would still work - as opposed to the 
current
situation.
-- 
Jonathan

^ permalink raw reply

* Re: [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Andrew Morton @ 2019-02-13 20:59 UTC (permalink / raw)
  To: Jann Horn
  Cc: linux-mm, linux-kernel, Michal Hocko, Vlastimil Babka,
	Pavel Tatashin, Oscar Salvador, Mel Gorman, Aaron Lu, netdev,
	Alexander Duyck
In-Reply-To: <20190213204157.12570-1-jannh@google.com>

On Wed, 13 Feb 2019 21:41:57 +0100 Jann Horn <jannh@google.com> wrote:

> The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> number of references that we might need to create in the fastpath later,
> the bump-allocation fastpath only has to modify the non-atomic bias value
> that tracks the number of extra references we hold instead of the atomic
> refcount. The maximum number of allocations we can serve (under the
> assumption that no allocation is made with size 0) is nc->size, so that's
> the bias used.
> 
> However, even when all memory in the allocation has been given away, a
> reference to the page is still held; and in the `offset < 0` slowpath, the
> page may be reused if everyone else has dropped their references.
> This means that the necessary number of references is actually
> `nc->size+1`.
> 
> Luckily, from a quick grep, it looks like the only path that can call
> page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> requires CAP_NET_ADMIN in the init namespace and is only intended to be
> used for kernel testing and fuzzing.

For the net-naive, what is TAP?  It doesn't appear to mean
drivers/net/tap.c.

> To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> with a vector consisting of 15 elements containing 1 byte each.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>  		/* Even if we own the page, we do not use atomic_set().
>  		 * This would break get_page_unless_zero() users.
>  		 */
> -		page_ref_add(page, size - 1);
> +		page_ref_add(page, size);
>  
>  		/* reset page count bias and offset to start of new frag */
>  		nc->pfmemalloc = page_is_pfmemalloc(page);
> -		nc->pagecnt_bias = size;
> +		nc->pagecnt_bias = size + 1;
>  		nc->offset = size;
>  	}
>  
> @@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>  		size = nc->size;
>  #endif
>  		/* OK, page count is 0, we can safely set it */
> -		set_page_count(page, size);
> +		set_page_count(page, size + 1);
>  
>  		/* reset page count bias and offset to start of new frag */
> -		nc->pagecnt_bias = size;
> +		nc->pagecnt_bias = size + 1;
>  		offset = size - fragsz;
>  	}

This is probably more a davem patch than a -mm one.

^ permalink raw reply

* Re: [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Jann Horn @ 2019-02-13 21:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, kernel list, Michal Hocko, Vlastimil Babka,
	Pavel Tatashin, Oscar Salvador, Mel Gorman, Aaron Lu,
	Network Development, Alexander Duyck
In-Reply-To: <20190213125906.eae96c18fe585e060aaf0ef7@linux-foundation.org>

On Wed, Feb 13, 2019 at 9:59 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 13 Feb 2019 21:41:57 +0100 Jann Horn <jannh@google.com> wrote:
>
> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> > number of references that we might need to create in the fastpath later,
> > the bump-allocation fastpath only has to modify the non-atomic bias value
> > that tracks the number of extra references we hold instead of the atomic
> > refcount. The maximum number of allocations we can serve (under the
> > assumption that no allocation is made with size 0) is nc->size, so that's
> > the bias used.
> >
> > However, even when all memory in the allocation has been given away, a
> > reference to the page is still held; and in the `offset < 0` slowpath, the
> > page may be reused if everyone else has dropped their references.
> > This means that the necessary number of references is actually
> > `nc->size+1`.
> >
> > Luckily, from a quick grep, it looks like the only path that can call
> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> > used for kernel testing and fuzzing.
>
> For the net-naive, what is TAP?  It doesn't appear to mean
> drivers/net/tap.c.

It's implemented in drivers/net/tun.c; the combined functionality
implemented in there is called TUN/TAP. TUN refers to providing raw IP
packets to the kernel, TAP refers to providing raw ethernet packets.
It's documented in Documentation/networking/tuntap.txt. The code
that's interesting here is tun_get_user(), which calls into
tun_napi_alloc_frags() if tun_napi_frags_enabled(tfile) is true, which
in turn calls into netdev_alloc_frag(), which ends up in
page_frag_alloc(). This is how you can use it (except that if you were
using it legitimately, you'd be writing an ethernet header, a layer 3
header, and application data instead of writing "aaaaaaaaaaaaaaa" like
me):

================
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdarg.h>
#include <net/if.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <err.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>

void systemf(const char *command, ...) {
  char *full_command;
  va_list ap;
  va_start(ap, command);
  if (vasprintf(&full_command, command, ap) == -1)
    err(1, "vasprintf");
  va_end(ap);
  printf("systemf: <<<%s>>>\n", full_command);
  system(full_command);
}

char *devname;

int tun_alloc(char *name) {
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    err(1, "open tun dev");
  static struct ifreq req = { .ifr_flags =
IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI };
  strcpy(req.ifr_name, name);
  if (ioctl(fd, TUNSETIFF, &req))
    err(1, "TUNSETIFF");
  devname = req.ifr_name;
  printf("device name: %s\n", devname);
  return fd;
}

int main(void) {
  int tun_fd = tun_alloc("inject_dev%d");
  systemf("ip link set %s up", devname);

  while (1) {
    struct iovec iov[15];
    for (int i=0; i<sizeof(iov)/sizeof(iov[0]); i++) {
      iov[i].iov_base = "a";
      iov[i].iov_len = 1;
    }
    writev(tun_fd, iov, sizeof(iov)/sizeof(iov[0]));
  }
}
================

> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> > with a vector consisting of 15 elements containing 1 byte each.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> >               /* Even if we own the page, we do not use atomic_set().
> >                * This would break get_page_unless_zero() users.
> >                */
> > -             page_ref_add(page, size - 1);
> > +             page_ref_add(page, size);
> >
> >               /* reset page count bias and offset to start of new frag */
> >               nc->pfmemalloc = page_is_pfmemalloc(page);
> > -             nc->pagecnt_bias = size;
> > +             nc->pagecnt_bias = size + 1;
> >               nc->offset = size;
> >       }
> >
> > @@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> >               size = nc->size;
> >  #endif
> >               /* OK, page count is 0, we can safely set it */
> > -             set_page_count(page, size);
> > +             set_page_count(page, size + 1);
> >
> >               /* reset page count bias and offset to start of new frag */
> > -             nc->pagecnt_bias = size;
> > +             nc->pagecnt_bias = size + 1;
> >               offset = size - fragsz;
> >       }
>
> This is probably more a davem patch than a -mm one.

Ah, sorry. I assumed that I just should go by which directory the
patched code is in.

You did just add it to the -mm tree though, right? So I shouldn't
resend it to davem?

^ permalink raw reply

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Stefano Brivio @ 2019-02-13 21:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, netdev, Sabrina Dubroca, David Ahern
In-Reply-To: <dfdb5a99-d922-5be8-b110-e5f069600ecd@gmail.com>

On Wed, 13 Feb 2019 09:31:03 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 02/13/2019 12:37 AM, Stefano Brivio wrote:
> > On Tue, 12 Feb 2019 16:42:04 -0800
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >   
> >> I do not get it.
> >>
> >> "ss -emoi " uses almost 1KB per socket.
> >>
> >> 10,000,000 sockets -> we need about 10GB of memory  ???
> >>
> >> This is a serious regression.  
> > 
> > I guess this is rather subjective: the worst case I considered back then
> > was the output of 'ss -tei0' (less than 500 bytes) for one million
> > sockets, which gives 500M of memory, which should in turn be fine on a
> > machine handling one million sockets.
> > 
> > Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> > curiosity: how are you going to process that output? Would JSON help?),
> > I see two easy options to solve this:  
> 
> 
> ss -temoi | parser (written in shell or awk or whatever...)
> 
> This is a use case, I just got bitten because using ss command
> actually OOM my container, while trying to debug a busy GFE.
> 
> The host itself can have 10,000,000 TCP sockets, but usually sysadmin shells
> run in a container with no more than 500 MB available. 
> 
> Otherwise, it would be too easy for a buggy program to OOM the whole machine
> and have angry customers.
> 
> > 
> > 1. flush the output every time we reach a given buffer size (1M
> >    perhaps). This might make the resulting blocks slightly unaligned,
> >    with occasional loss of readability on lines occurring every 1k to
> >    10k sockets approximately, even though after 1k sockets column sizes
> >    won't change much (it looks anyway better than the original), and I
> >    don't expect anybody to actually scroll that output
> > 
> > 2. add a switch for unbuffered output, but then you need to remember to
> >    pass it manually, and the whole output would be as bad as the
> >    original in case you need the switch.
> > 
> > I'd rather go with 1., it's easy to implement (we already have partial
> > flushing with '--events') and it looks like a good compromise on
> > usability. Thoughts?
> >   
> 
> 1 seems fine, but a switch for 'please do not try to format' would be fine.
> 
> I wonder why we try to 'format' when stdout is a pipe or a regular file .

On a second thought: what about | less, or | grep [ports],
or > readable.log? I guess those might also be rather common use cases,
what do you think?

I'm tempted to skip this for the moment and just go with option 1.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH] net: phy: at803x: disable delay only for RGMII mode
From: Florian Fainelli @ 2019-02-13 21:38 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Marc Gonzalez, Andrew Lunn, Vinod Koul, David S Miller,
	linux-arm-msm, Bjorn Andersson, netdev, Nori, Sekhar,
	Peter Ujfalusi, hkallweit1
In-Reply-To: <20190213200738.GB460@centauri.lan>

On 2/13/19 12:07 PM, Niklas Cassel wrote:
> On Wed, Feb 13, 2019 at 09:59:43AM -0800, Florian Fainelli wrote:
>> On 2/13/19 9:40 AM, Niklas Cassel wrote:
>>> On Wed, Feb 13, 2019 at 02:40:18PM +0100, Marc Gonzalez wrote:
>>>> On 13/02/2019 14:29, Andrew Lunn wrote:
>>>>
>>>>>> So we have these modes:
>>>>>>
>>>>>> PHY_INTERFACE_MODE_RGMII: TX and RX delays disabled
>>>>>> PHY_INTERFACE_MODE_RGMII_ID: TX and RX delays enabled
>>>>>> PHY_INTERFACE_MODE_RGMII_RXID: RX delay enabled, TX delay disabled
>>>>>> PHY_INTERFACE_MODE_RGMII_TXID: TX delay enabled, RX delay disabled
>>>>>>
>>>>>> What I don't like with this patch, is that if we specify phy-mode
>>>>>> PHY_INTERFACE_MODE_RGMII_TXID, this patch will enable TX delay,
>>>>>> but RX delay will not be explicitly set.
>>>>>
>>>>> That is not the behaviour we want. It is best to assume the device is
>>>>> in a random state, and correctly enable/disable all delays as
>>>>> requested. Only leave the hardware alone if PHY_INTERFACE_MODE_NA is
>>>>> used.
>>>>
>>>> That's what my patch did:
>>>> https://www.spinics.net/lists/netdev/msg445053.html
>>>>
>>>> But see Florian's remarks:
>>>> https://www.spinics.net/lists/netdev/msg445133.html
>>>
>>> Hello Marc,
>>>
>>> I saw that comment from Florian. However that was way back in 2017.
>>> Maybe the phy-modes were not as well defined back then?
>>
>> The definition of the 'phy-mode' was clarified to be understood from the
>> perspective of the PHY device (hence the name) after we had several
>> fruitful exchanges with Marc (at least from my perspective), but since
>> the definition was not clear before, there is a high chance of finding
>> DTS/DTBs out there with the 'phy-mode' property understood from the
>> MAC's perspective, which would now be wrong.
> 
> Hello Florian,
> 
> 
> We have a specification:
> Documentation/devicetree/bindings/net/ethernet.txt
> 
> And several implementations: the PHY drivers.
> 
> Either we decide that all PHY drivers have to follow
> the specification for "phy-mode" in
> Documentation/devicetree/bindings/net/ethernet.txt
> or we decide that they don't.
> 
> If we decide that all PHY drivers have to follow the specification,
> then we can fix the PHY drivers that currently do not follow the
> specification.
> 
> If we decide that all PHY drivers do not have to follow the spec,
> then the "phy-mode" property is basically useless, and then we should
> introduce a new device tree property, e.g. "phy-mode2", that is
> guaranteed to respect the definitons in
> Documentation/devicetree/bindings/net/ethernet.txt

If the specification had been clear from day one, then we would not be
in the situation we are in today, so in that case it is not as simple
as: a) deprecating an existing property that was misused because the
spec was not well enough defined and b) go and fix all drivers. The
amount of breakage that can be introduced is just immense, and quite
frankly, for absolutely no good reason.

It's all well and good to introduce a 'phy-mode2' but let's think about
the future:

- what is depreciation path for 'phy-mode'/'phy-connection-type' looking
like then?
- do we have the manpower to review every new binding, DTS submission
that gets included in Linux, FreeBSD, Zephyr, for correctness?

> 
>>
>>
>>>
>>> Andrew recently suggested to fix the driver so that it conforms with the
>>> phy-modes, and fix any SoC that specified an incorrect phy-mode in DT
>>> and thus relied upon the broken behavior of the PHY driver:
>>> https://www.spinics.net/lists/netdev/msg445133.html
>>>
>>>
>>> So, I've rebased your old patch, see attachment.
>>> I suggest that Peter test it on am335x-evm.
>>>
>>> am335x-evm appears to rely on the current broken behavior of the PHY
>>> driver, so we will probably need to fix the am335x-evm according to this:
>>> https://www.spinics.net/lists/netdev/msg445117.html
>>> and merge that as well.
>>>
>>>
>>> Andrew, Florian, do you both agree?
>>
>> In my reply to Marc, there was a concern that while am335x-evm was
>> identified and reported to be broken after fixing the PHY driver, there
>> could be platforms out there that we have little to no visibility that
>> would most likely be equally broken. That concern still exists, and I
>> don't think there is anything we can do to even assess the size of the
>> problem unless we attempt to fix it, so maybe we should attempt to fix that.
>>
>> There was a suggestion to Marc that one way to possibly "ignore" an
>> incorrectly broken 'phy-mode' property would be to allow specifying
>> rx/tx delay properties such that if the driver obtained its
>> phy_interface_t, yet still parsed rx/tx delays, the rx/tx delays would
>> take precedence, and we could possibly derive some sort of a "more
>> correct" phy_interface_t that we could assign back to phydev->interface
>> and issue a warning about that.
> 
> You mean to add new device tree properties to
> Documentation/devicetree/bindings/net/ethernet.txt
> 
> - phy-id-tx: "true" if PHY should add internal delay on TX lines;
>              "false" or not specified if PHY should not add internal
> 	     delay on TX lines. This property overrides any delay
> 	     requested by "phy-mode".
> - phy-id-rx: "true" if PHY should add internal delay on RX lines;
>              "false" or not specified if PHY should not add internal
> 	     delay on RX lines. This property overrides any delay
> 	     requested by "phy-mode".
> 
> Perhaps something like that?

Not quite booleans, actual delay values, e.g.:

tx-delay-ps = <2000>
rx-delay-ps = <2000>

this is something that exists already:

Documentation/devicetree/bindings/net/apm-xgene-enet.txt
Documentation/devicetree/bindings/net/cavium-pip.txt
Documentation/devicetree/bindings/net/dwmac-sun8i.txt

because conceptually, telling the PHY driver that a TX or RX delay is
simply not enough, sometimes the standard 2ns (90 degree shift at
125Mhz) is not good enough and gets you out of spec because of some
board design.

> 
> Personally, I prefer making "phy-mode" strict,
> but whatever you guys decide:
> - making "phy-mode" strict
> - introducing a "phy-mode2"
> - introducing "phy-id-tx/phy-id-rx"
> - introducing "mac-mode"
> - some other solution
> 
> It is probably wise to introduce helper functions in phy.h
>  phy_wants_id_rx()
>  phy_wants_id_tx()
> so that PHY drivers can simply use e.g.:
> 
> if (phy_wants_id_rx(phydev))
> 	at803x_enable_rx_delay(phydev);
> else
> 	at803x_disable_rx_delay(phydev);
> 
> if (phy_wants_id_tx(phydev))
> 	at803x_enable_tx_delay(phydev);
> else
> 	at803x_disable_tx_delay(phydev);

Yes, that I think is pretty much orthogonal to the end solution we
decide to choose, having a way to tell what the PHY is currently
configured, or capable of supporting is step 1 in trying to find a
compatibility solution.

> 
>>
>> Another possible way to resolve that could be to introduce a 'mac-mode'
>> property, which must be strictly compatible with specifying a 'phy-mode'
>> property. For instance:
>>
>> - MAC specifies mac-mode = 'rgmii-id', then the PHY must have phy-mode =
>> 'rmgii' since the MAC is taking of inserting both RX and TX delays,
>> reverse also applies
>>
>> - MAC specifies mac-mode = 'rgmii-txid', then the PHY must have phy-mode
>> = 'rgmii-rxid' because the MAC adds the TX delay, but the PHY should
>> insert the delay on the RX lines, reverse also applies
>>
>> Because there is usually (not always, DSA is an exception) a 1:1 mapping
>> between MAC and PHY devices we could look up the 'mac-mode' property in
>> the MAC in the PHY library code and make sure that we have a compatible
>> matrix and if we do not, maybe pass something like PHY_INTERFACE_MODE_NA
>> such that the driver retains its settings.
> 
> Is there any advantage of creating a "mac-mode" over creating a
> "phy-mode2" ?
> 
> 
> Kind regards,
> Niklas
> 
>>
>> Maybe another way to approach this is if we assume that the PHY comes up
>> configured correctly by the boot loader, or upon power on reset, we add
>> some PHY driver methods that allow us to determine the RGMII mode in
>> which a PHY is and that tells us whether we are compatible with the
>> MAC's phy_interface_t upon connection. We check both at connect() time
>> and if something does not look right, we flip the meaning of
>> phy_interface_t.
>>
>> None of those solutions are entirely fool proof, but at least we might
>> be able to detect incorrect combinations, yet still make them work by
>> reversing the meaning of the 'phy-mode' property given information at hand.
>>
>> Let me know if none of that makes sense and this just looks like yet
>> another brain dump.
>>
>> Wonderful RGMII...
>> -- 
>> Florian


-- 
Florian

^ permalink raw reply

* Re: [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Andrew Morton @ 2019-02-13 21:40 UTC (permalink / raw)
  To: Jann Horn
  Cc: Linux-MM, kernel list, Michal Hocko, Vlastimil Babka,
	Pavel Tatashin, Oscar Salvador, Mel Gorman, Aaron Lu,
	Network Development, Alexander Duyck
In-Reply-To: <CAG48ez2Qo7N-+=y=eFhzw9HfYS3HODAY-zLaubFMGyXEV_nwpg@mail.gmail.com>

On Wed, 13 Feb 2019 22:11:58 +0100 Jann Horn <jannh@google.com> wrote:

> > This is probably more a davem patch than a -mm one.
> 
> Ah, sorry. I assumed that I just should go by which directory the
> patched code is in.
> 
> You did just add it to the -mm tree though, right? So I shouldn't
> resend it to davem?

Yes, please send to Dave.  I'll autodrop the -mm copy if/when it turns
up in -next.


^ permalink raw reply

* Re: [PATCH] NETWORKING: avoid use IPCB in cipso_v4_error
From: Paul Moore @ 2019-02-13 21:41 UTC (permalink / raw)
  To: Nazarov Sergey
  Cc: netdev@vger.kernel.org, linux-security-module@vger.kernel.org,
	davem, kuznet, yoshfuji
In-Reply-To: <6691891549984203@myt5-a323eb993ef7.qloud-c.yandex.net>

On Tue, Feb 12, 2019 at 10:10 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> Since cipso_v4_error might be called from different network stack layers, we can't safely use icmp_send there.
> icmp_send copies IP options with ip_option_echo, which uses IPCB to take access to IP header compiled data.
> But after commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses"), IPCB can't be used
> above IP layer.
> This patch fixes the problem by creating in cipso_v4_error a local copy of compiled IP options and using it with
> introduced __icmp_send function. This looks some overloaded, but in quite rare error conditions only.
>
> The original discussion is here:
> https://lore.kernel.org/linux-security-module/16659801547571984@sas1-890ba5c2334a.qloud-c.yandex.net/
>
> Signed-off-by: Sergey Nazarov <s-nazarov@yandex.ru>
> ---
>  include/net/icmp.h    |    9 ++++++++-
>  net/ipv4/cipso_ipv4.c |   18 ++++++++++++++++--
>  net/ipv4/icmp.c       |    7 ++++---
>  3 files changed, 28 insertions(+), 6 deletions(-)

Hi Sergey,

Thanks for your work on finding this and putting a fix together.  As
we discussed previously, I think this looks good, but can you describe
the testing you did to verify that this works correctly?

> diff --git a/include/net/icmp.h b/include/net/icmp.h
> index 6ac3a5b..e0f709d 100644
> --- a/include/net/icmp.h
> +++ b/include/net/icmp.h
> @@ -22,6 +22,7 @@
>
>  #include <net/inet_sock.h>
>  #include <net/snmp.h>
> +#include <net/ip.h>
>
>  struct icmp_err {
>    int          errno;
> @@ -39,7 +40,13 @@ struct icmp_err {
>  struct sk_buff;
>  struct net;
>
> -void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
> +void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> +                const struct ip_options *opt);
> +static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> +{
> +       __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
> +}
> +
>  int icmp_rcv(struct sk_buff *skb);
>  int icmp_err(struct sk_buff *skb, u32 info);
>  int icmp_init(void);
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 777fa3b..234d12e 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1735,13 +1735,27 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
>   */
>  void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
>  {
> +       unsigned char optbuf[sizeof(struct ip_options) + 40];
> +       struct ip_options *opt = (struct ip_options *)optbuf;
> +
>         if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
>                 return;
>
> +       /*
> +        * We might be called above the IP layer,
> +        * so we can not use icmp_send and IPCB here.
> +        */
> +
> +       memset(opt, 0, sizeof(struct ip_options));
> +       opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
> +       memcpy(opt->__data, (unsigned char *)&(ip_hdr(skb)[1]), opt->optlen);
> +       if (ip_options_compile(dev_net(skb->dev), opt, NULL))
> +               return;
> +
>         if (gateway)
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
>         else
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
>  }
>
>  /**
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 065997f..3f24414 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -570,7 +570,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
>   *                     MUST reply to only the first fragment.
>   */
>
> -void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> +void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> +                const struct ip_options *opt)
>  {
>         struct iphdr *iph;
>         int room;
> @@ -691,7 +692,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>                                           iph->tos;
>         mark = IP4_REPLY_MARK(net, skb_in->mark);
>
> -       if (ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in))
> +       if (__ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in, opt))
>                 goto out_unlock;
>
>
> @@ -742,7 +743,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>         local_bh_enable();
>  out:;
>  }
> -EXPORT_SYMBOL(icmp_send);
> +EXPORT_SYMBOL(__icmp_send);
>
>
>  static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
> --
>


-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Jann Horn @ 2019-02-13 21:45 UTC (permalink / raw)
  To: David S. Miller, netdev, jannh
  Cc: linux-mm, linux-kernel, Michal Hocko, Vlastimil Babka,
	Pavel Tatashin, Oscar Salvador, Mel Gorman, Aaron Lu,
	Alexander Duyck

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Signed-off-by: Jann Horn <jannh@google.com>
---
Resending to davem at the request of akpm.

 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..46285d28e43b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		page_ref_add(page, size - 1);
+		page_ref_add(page, size);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		nc->offset = size;
 	}
 
@@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		size = nc->size;
 #endif
 		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, size);
+		set_page_count(page, size + 1);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		offset = size - fragsz;
 	}
 
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

* Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table
From: Stephen Hemminger @ 2019-02-13 21:55 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Eric Dumazet, netdev, Sabrina Dubroca, David Ahern
In-Reply-To: <20190213221716.5f958c2a@redhat.com>

On Wed, 13 Feb 2019 22:17:16 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 13 Feb 2019 09:31:03 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On 02/13/2019 12:37 AM, Stefano Brivio wrote:  
> > > On Tue, 12 Feb 2019 16:42:04 -0800
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >     
> > >> I do not get it.
> > >>
> > >> "ss -emoi " uses almost 1KB per socket.
> > >>
> > >> 10,000,000 sockets -> we need about 10GB of memory  ???
> > >>
> > >> This is a serious regression.    
> > > 
> > > I guess this is rather subjective: the worst case I considered back then
> > > was the output of 'ss -tei0' (less than 500 bytes) for one million
> > > sockets, which gives 500M of memory, which should in turn be fine on a
> > > machine handling one million sockets.
> > > 
> > > Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> > > curiosity: how are you going to process that output? Would JSON help?),
> > > I see two easy options to solve this:    
> > 
> > 
> > ss -temoi | parser (written in shell or awk or whatever...)
> > 
> > This is a use case, I just got bitten because using ss command
> > actually OOM my container, while trying to debug a busy GFE.
> > 
> > The host itself can have 10,000,000 TCP sockets, but usually sysadmin shells
> > run in a container with no more than 500 MB available. 
> > 
> > Otherwise, it would be too easy for a buggy program to OOM the whole machine
> > and have angry customers.
> >   
> > > 
> > > 1. flush the output every time we reach a given buffer size (1M
> > >    perhaps). This might make the resulting blocks slightly unaligned,
> > >    with occasional loss of readability on lines occurring every 1k to
> > >    10k sockets approximately, even though after 1k sockets column sizes
> > >    won't change much (it looks anyway better than the original), and I
> > >    don't expect anybody to actually scroll that output
> > > 
> > > 2. add a switch for unbuffered output, but then you need to remember to
> > >    pass it manually, and the whole output would be as bad as the
> > >    original in case you need the switch.
> > > 
> > > I'd rather go with 1., it's easy to implement (we already have partial
> > > flushing with '--events') and it looks like a good compromise on
> > > usability. Thoughts?
> > >     
> > 
> > 1 seems fine, but a switch for 'please do not try to format' would be fine.
> > 
> > I wonder why we try to 'format' when stdout is a pipe or a regular file .  
> 
> On a second thought: what about | less, or | grep [ports],
> or > readable.log? I guess those might also be rather common use cases,
> what do you think?
> 
> I'm tempted to skip this for the moment and just go with option 1.
> 

What I would favor:
	* use big enough columns that for the common case everything lines up fine
	* if column is to wide just print that element wider (which is what print %Ns does)
and
	* add json output for programs that want to parse
	* use print_uint etc for that

The buffering patch (in iproute2-next) can/will be reverted.

^ permalink raw reply

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
From: Stephen Hemminger @ 2019-02-13 21:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Ahern, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter
In-Reply-To: <20190213015841.140383-1-edumazet@google.com>

On Tue, 12 Feb 2019 17:58:41 -0800
Eric Dumazet <edumazet@google.com> wrote:

> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.
> 
> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Hangbin Liu <liuhangbin@gmail.com>
> Cc: Phil Sutter <phil@nwl.cc>

Applied, although maybe we should bump it to 64K or bigger?

^ permalink raw reply

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
From: Eric Dumazet @ 2019-02-13 21:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Ahern, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter
In-Reply-To: <20190213135718.1ed23c3a@shemminger-XPS-13-9360>

On Wed, Feb 13, 2019 at 1:57 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 12 Feb 2019 17:58:41 -0800
> Eric Dumazet <edumazet@google.com> wrote:
>
> > In the past, we tried to increase the buffer size up to 32 KB in order
> > to reduce number of syscalls per dump.
> >
> > Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> > brought the size back to 4KB because the kernel can not know the application
> > is ready to receive bigger requests.
> >
> > See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> > d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> > for more details.
> >
> > Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Hangbin Liu <liuhangbin@gmail.com>
> > Cc: Phil Sutter <phil@nwl.cc>
>
> Applied, although maybe we should bump it to 64K or bigger?

Note the kernel does not yet try 64KB allocations, so I do not see an
urgent need for that :)

^ permalink raw reply

* Re: [PATCH iproute2] ss: add option --tos for requesting ipv4 tos and ipv6 tclass
From: Stephen Hemminger @ 2019-02-13 22:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: netdev, Eric Dumazet
In-Reply-To: <155006154185.449020.2783123004054072980.stgit@buzz>

On Wed, 13 Feb 2019 15:39:01 +0300
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:

> Also show socket class_id/priority used by classful qdisc.
> Kernel report this together with tclass since commit
> ("inet_diag: fix reporting cgroup classid and fallback to priority")
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Applied, this is useful even if diffserv is not.

^ permalink raw reply

* Re: [PATCH -next] net: ipvlan_l3s: fix kconfig dependency warning
From: Daniel Borkmann @ 2019-02-13 22:03 UTC (permalink / raw)
  To: Randy Dunlap, netdev@vger.kernel.org; +Cc: Mahesh Bandewar, David Miller
In-Reply-To: <204a7785-a1d2-e714-653e-2cb19e36f279@infradead.org>

On 02/13/2019 05:55 PM, Randy Dunlap wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Fix the kconfig warning in IPVLAN_L3S when neither INET nor IPV6
> is enabled:
> 
> WARNING: unmet direct dependencies detected for NET_L3_MASTER_DEV
>   Depends on [n]: NET [=y] && (INET [=n] || IPV6 [=n])
>   Selected by [y]:
>   - IPVLAN_L3S [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETFILTER [=y]
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH iproute2] iplink: document XDP subcommand to force the XDP mode.
From: Stephen Hemminger @ 2019-02-13 22:04 UTC (permalink / raw)
  To: Matteo Croce; +Cc: netdev, David Ahern, Stephen Hemminger, Jakub Kicinski
In-Reply-To: <20190213144030.15160-1-mcroce@redhat.com>

On Wed, 13 Feb 2019 15:40:30 +0100
Matteo Croce <mcroce@redhat.com> wrote:

> When attaching an eBPF program to a device, ip link can force the XDP mode
> by using the xdp{generic,drv,offload} keyword instead of just 'xdp'.
> Document this behaviour also in the help output.
> 
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
> Fixes: 14683814 ("bpf: add xdpdrv for requesting XDP driver mode")
> Fixes: 1b5e8094 ("bpf: allow requesting XDP HW offload")

Applied, thanks. 
The man page already has this as well.


^ permalink raw reply

* [PATCH net-next 0/9] net: Get rid of switchdev_port_attr_get()
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot

Hi all,

This patch series splits the removal of the switchdev_ops that was
proposed a few times before and first tackles the easy part which is the
removal of the single call to switchdev_port_attr_get() within the
bridge code.

As suggestd by Ido, this patch series adds a
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS which is used in the same
context as the caller of switchdev_port_attr_set(), so not deferred, and
then the operation is carried out in deferred context with setting a
support bridge port flag.

Follow-up patches will do the switchdev_ops removal after introducing
the proper helpers for the switchdev blocking notifier to work across
stacked devices (unlike the previous submissions).

Florian Fainelli (9):
  Documentation: networking: switchdev: Update port parent ID section
  net: switchdev: Add PORT_PRE_BRIDGE_FLAGS
  mlxsw: spectrum: Check bridge flags during prepare phase
  staging: fsl-dpaa2: ethsw: Check bridge port flags during prepare
  net: dsa: Add setter for SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
  rocker: Check bridge flags during prepare phase
  net: bridge: Stop calling switchdev_port_attr_get()
  net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT
  net: Get rid of switchdev_port_attr_get()

 Documentation/networking/switchdev.txt        | 15 ++---
 .../mellanox/mlxsw/spectrum_switchdev.c       | 30 +++------
 drivers/net/ethernet/rocker/rocker_main.c     | 67 +++++++++----------
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c       | 27 +++-----
 include/net/switchdev.h                       | 13 +---
 net/bridge/br_switchdev.c                     | 16 ++---
 net/dsa/dsa_priv.h                            |  3 +
 net/dsa/port.c                                | 11 +++
 net/dsa/slave.c                               | 22 ++----
 9 files changed, 88 insertions(+), 116 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH net-next 1/9] Documentation: networking: switchdev: Update port parent ID section
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

Update the section about switchdev drivers having to implement a
switchdev_port_attr_get() function to return
SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after
commit bccb30254a4a ("net: Get rid of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID").

Fixes: bccb30254a4a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/switchdev.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index f3244d87512a..ea90243340a9 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -92,11 +92,11 @@ device.
 Switch ID
 ^^^^^^^^^
 
-The switchdev driver must implement the switchdev op switchdev_port_attr_get
-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
-physical ID for each port of a switch.  The ID must be unique between switches
-on the same system.  The ID does not need to be unique between switches on
-different systems.
+The switchdev driver must implement the net_device operation
+ndo_get_port_parent_id for each port netdev, returning the same physical ID for
+each port of a switch. The ID must be unique between switches on the same
+system. The ID does not need to be unique between switches on different
+systems.
 
 The switch ID is used to locate ports on a switch and to know if aggregated
 ports belong to the same switch.
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 2/9] net: switchdev: Add PORT_PRE_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

In preparation for removing switchdev_port_attr_get(), introduce
PORT_PRE_BRIDGE_FLAGS which will be called through
switchdev_port_attr_set(), in the caller's context (possibly atomic) and
which must be checked by the switchdev driver in order to return whether
the operation is supported or not.

This is entirely analoguous to how the BRIDGE_FLAGS_SUPPORT works,
except it goes through a set() instead of get().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/net/switchdev.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 5e87b54c5dc5..de72b0a3867f 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -46,6 +46,7 @@ enum switchdev_attr_id {
 	SWITCHDEV_ATTR_ID_PORT_STP_STATE,
 	SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
 	SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
+	SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
 	SWITCHDEV_ATTR_ID_PORT_MROUTER,
 	SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
 	SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
@@ -61,7 +62,7 @@ struct switchdev_attr {
 	void (*complete)(struct net_device *dev, int err, void *priv);
 	union {
 		u8 stp_state;				/* PORT_STP_STATE */
-		unsigned long brport_flags;		/* PORT_BRIDGE_FLAGS */
+		unsigned long brport_flags;		/* PORT_{PRE}_BRIDGE_FLAGS */
 		unsigned long brport_flags_support;	/* PORT_BRIDGE_FLAGS_SUPPORT */
 		bool mrouter;				/* PORT_MROUTER */
 		clock_t ageing_time;			/* BRIDGE_AGEING_TIME */
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 3/9] mlxsw: spectrum: Check bridge flags during prepare phase
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

In preparation for getting rid of switchdev_port_attr_get(), have mlxsw
check for the bridge flags being set through switchdev_port_attr_set()
when the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier is
used.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 1f492b7dbea8..7616eab50035 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -598,13 +598,17 @@ mlxsw_sp_bridge_port_learning_set(struct mlxsw_sp_port *mlxsw_sp_port,
 static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
 					   struct switchdev_trans *trans,
 					   struct net_device *orig_dev,
-					   unsigned long brport_flags)
+					   unsigned long brport_flags,
+					   bool pre_set)
 {
 	struct mlxsw_sp_bridge_port *bridge_port;
 	int err;
 
-	if (switchdev_trans_ph_prepare(trans))
+	if (switchdev_trans_ph_prepare(trans) && pre_set) {
+		if (brport_flags & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+			return -EOPNOTSUPP;
 		return 0;
+	}
 
 	bridge_port = mlxsw_sp_bridge_port_find(mlxsw_sp_port->mlxsw_sp->bridge,
 						orig_dev);
@@ -833,6 +837,7 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
 				  struct switchdev_trans *trans)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+	bool pre_set = false;
 	int err;
 
 	switch (attr->id) {
@@ -841,10 +846,13 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
 						       attr->orig_dev,
 						       attr->u.stp_state);
 		break;
+	case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
+		pre_set = true;	/* fall through */
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
 		err = mlxsw_sp_port_attr_br_flags_set(mlxsw_sp_port, trans,
 						      attr->orig_dev,
-						      attr->u.brport_flags);
+						      attr->u.brport_flags,
+						      pre_set);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
 		err = mlxsw_sp_port_attr_br_ageing_set(mlxsw_sp_port, trans,
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 4/9] staging: fsl-dpaa2: ethsw: Check bridge port flags during prepare
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

In preparation for removing SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
have ethsw check that the bridge port flags that are being set are
supported when SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS is specified.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index 1b3943b71254..f788a9458b89 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -668,13 +668,16 @@ static int port_attr_stp_state_set(struct net_device *netdev,
 
 static int port_attr_br_flags_set(struct net_device *netdev,
 				  struct switchdev_trans *trans,
-				  unsigned long flags)
+				  unsigned long flags, bool pre_set)
 {
 	struct ethsw_port_priv *port_priv = netdev_priv(netdev);
 	int err = 0;
 
-	if (switchdev_trans_ph_prepare(trans))
+	if (switchdev_trans_ph_prepare(trans) && pre_set) {
+		if (flags & ~(BR_LEARNING | BR_FLOOD))
+			return -EOPNOTSUPP;
 		return 0;
+	}
 
 	/* Learning is enabled per switch */
 	err = ethsw_set_learning(port_priv->ethsw_data, flags & BR_LEARNING);
@@ -691,6 +694,7 @@ static int swdev_port_attr_set(struct net_device *netdev,
 			       const struct switchdev_attr *attr,
 			       struct switchdev_trans *trans)
 {
+	bool pre_set = false;
 	int err = 0;
 
 	switch (attr->id) {
@@ -698,9 +702,11 @@ static int swdev_port_attr_set(struct net_device *netdev,
 		err = port_attr_stp_state_set(netdev, trans,
 					      attr->u.stp_state);
 		break;
+	case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
+		pre_set = true; /* fall through */
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
 		err = port_attr_br_flags_set(netdev, trans,
-					     attr->u.brport_flags);
+					     attr->u.brport_flags, pre_set);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
 		/* VLANs are supported by default  */
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 5/9] net: dsa: Add setter for SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

In preparation for removing SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
add support for a function that processes the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attributes and returns not
supported for any flag set, since DSA does not currently support
toggling those bridge port attributes (yet).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/dsa_priv.h |  3 +++
 net/dsa/port.c     | 11 +++++++++++
 net/dsa/slave.c    |  7 +++++++
 3 files changed, 21 insertions(+)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 1f4972dab9f2..50d73698dfb3 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -150,6 +150,9 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct switchdev_trans *trans);
 int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
 			 struct switchdev_trans *trans);
+int dsa_port_bridge_port_flags_set(struct dsa_port *dp,
+				   unsigned long brport_flags,
+				   struct switchdev_trans *trans, bool pre_set);
 int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
 		     u16 vid);
 int dsa_port_fdb_del(struct dsa_port *dp, const unsigned char *addr,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 2d7e01b23572..b0c4cfd18da9 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -177,6 +177,17 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
 	return dsa_port_notify(dp, DSA_NOTIFIER_AGEING_TIME, &info);
 }
 
+int dsa_port_bridge_port_flags_set(struct dsa_port *dp,
+				   unsigned long brport_flags,
+				   struct switchdev_trans *trans,
+				   bool pre_set)
+{
+	if (brport_flags)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
 		     u16 vid)
 {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2e5e7c04821b..417388c9f1fa 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -282,6 +282,7 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 				   struct switchdev_trans *trans)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
+	bool pre_set = false;
 	int ret;
 
 	switch (attr->id) {
@@ -295,6 +296,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
 		ret = dsa_port_ageing_time(dp, attr->u.ageing_time, trans);
 		break;
+	case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
+		pre_set = true; /* fall through */
+	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+		ret = dsa_port_bridge_port_flags_set(dp, attr->u.brport_flags,
+						     trans, pre_set);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 6/9] rocker: Check bridge flags during prepare phase
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

In preparation for getting rid of switchdev_port_attr_get(), have rocker
check for the bridge flags being set through switchdev_port_attr_set()
with the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/rocker/rocker_main.c | 48 +++++++++++++++--------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 5ce8d5aba603..863a8b32e6e9 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1565,37 +1565,48 @@ static int rocker_world_port_attr_stp_state_set(struct rocker_port *rocker_port,
 	return wops->port_attr_stp_state_set(rocker_port, state);
 }
 
+static int
+rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
+						rocker_port,
+						unsigned long *
+						p_brport_flags_support)
+{
+	struct rocker_world_ops *wops = rocker_port->rocker->wops;
+
+	if (!wops->port_attr_bridge_flags_support_get)
+		return -EOPNOTSUPP;
+	return wops->port_attr_bridge_flags_support_get(rocker_port,
+							p_brport_flags_support);
+}
+
 static int
 rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
 					unsigned long brport_flags,
-					struct switchdev_trans *trans)
+					struct switchdev_trans *trans,
+					bool pre_set)
 {
 	struct rocker_world_ops *wops = rocker_port->rocker->wops;
+	unsigned long brport_flags_s;
+	int err;
 
 	if (!wops->port_attr_bridge_flags_set)
 		return -EOPNOTSUPP;
 
-	if (switchdev_trans_ph_prepare(trans))
+	if (switchdev_trans_ph_prepare(trans) && pre_set) {
+		err = rocker_world_port_attr_bridge_flags_support_get(rocker_port,
+							      &brport_flags_s);
+		if (err)
+			return err;
+
+		if (brport_flags & ~brport_flags_s)
+			return -EOPNOTSUPP;
 		return 0;
+	}
 
 	return wops->port_attr_bridge_flags_set(rocker_port, brport_flags,
 						trans);
 }
 
-static int
-rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
-						rocker_port,
-						unsigned long *
-						p_brport_flags_support)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_attr_bridge_flags_support_get)
-		return -EOPNOTSUPP;
-	return wops->port_attr_bridge_flags_support_get(rocker_port,
-							p_brport_flags_support);
-}
-
 static int
 rocker_world_port_attr_bridge_ageing_time_set(struct rocker_port *rocker_port,
 					      u32 ageing_time,
@@ -2066,6 +2077,7 @@ static int rocker_port_attr_set(struct net_device *dev,
 				struct switchdev_trans *trans)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
+	bool pre_set = false;
 	int err = 0;
 
 	switch (attr->id) {
@@ -2074,10 +2086,12 @@ static int rocker_port_attr_set(struct net_device *dev,
 							   attr->u.stp_state,
 							   trans);
 		break;
+	case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
+		pre_set = true; /* fall through */
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
 		err = rocker_world_port_attr_bridge_flags_set(rocker_port,
 							      attr->u.brport_flags,
-							      trans);
+							      trans, pre_set);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
 		err = rocker_world_port_attr_bridge_ageing_time_set(rocker_port,
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 7/9] net: bridge: Stop calling switchdev_port_attr_get()
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

Now that all switchdev drivers have been converted to checking the
bridge port flags during the prepare phase of the
switchdev_port_attr_set() when the process
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS, we can avoid calling
switchdev_port_attr_get() with
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/bridge/br_switchdev.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index db9e8ab96d48..8f88f8a1a7fa 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -64,29 +64,27 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
 {
 	struct switchdev_attr attr = {
 		.orig_dev = p->dev,
-		.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
+		.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
+		.u.brport_flags = flags,
 	};
 	int err;
 
 	if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
 		return 0;
 
-	err = switchdev_port_attr_get(p->dev, &attr);
-	if (err == -EOPNOTSUPP)
-		return 0;
-	if (err)
+	err = switchdev_port_attr_set(p->dev, &attr);
+	if (err && err != -EOPNOTSUPP)
 		return err;
 
-	/* Check if specific bridge flag attribute offload is supported */
-	if (!(attr.u.brport_flags_support & mask)) {
+	if (err == -EOPNOTSUPP) {
 		br_warn(p->br, "bridge flag offload is not supported %u(%s)\n",
 			(unsigned int)p->port_no, p->dev->name);
-		return -EOPNOTSUPP;
+		return err;
 	}
 
 	attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
 	attr.flags = SWITCHDEV_F_DEFER;
-	attr.u.brport_flags = flags;
+
 	err = switchdev_port_attr_set(p->dev, &attr);
 	if (err) {
 		br_warn(p->br, "error setting offload flag on port %u(%s)\n",
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 8/9] net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT
From: Florian Fainelli @ 2019-02-13 22:06 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190213220638.1552-1-f.fainelli@gmail.com>

Now that we have converted the bridge code and the drivers to check for
bridge port(s) flags at the time we try to set them, there is no need
for a get() -> set() sequence anymore and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 11 +----------
 drivers/net/ethernet/rocker/rocker_main.c          | 14 +-------------
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c            | 10 +---------
 include/net/switchdev.h                            |  2 --
 net/dsa/slave.c                                    | 10 +---------
 5 files changed, 4 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7616eab50035..c11cf7fa4863 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -434,16 +434,7 @@ static void mlxsw_sp_bridge_vlan_put(struct mlxsw_sp_bridge_vlan *bridge_vlan)
 static int mlxsw_sp_port_attr_get(struct net_device *dev,
 				  struct switchdev_attr *attr)
 {
-	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
-		attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD |
-					       BR_MCAST_FLOOD;
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 static int
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 863a8b32e6e9..8e80301eae7b 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2057,19 +2057,7 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 static int rocker_port_attr_get(struct net_device *dev,
 				struct switchdev_attr *attr)
 {
-	const struct rocker_port *rocker_port = netdev_priv(dev);
-	int err = 0;
-
-	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
-		err = rocker_world_port_attr_bridge_flags_support_get(rocker_port,
-								      &attr->u.brport_flags_support);
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	return err;
+	return -EOPNOTSUPP;
 }
 
 static int rocker_port_attr_set(struct net_device *dev,
diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index f788a9458b89..5f58c7df67bb 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -643,15 +643,7 @@ static void ethsw_teardown_irqs(struct fsl_mc_device *sw_dev)
 static int swdev_port_attr_get(struct net_device *netdev,
 			       struct switchdev_attr *attr)
 {
-	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
-		attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD;
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 static int port_attr_stp_state_set(struct net_device *netdev,
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index de72b0a3867f..0f352019ef99 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -45,7 +45,6 @@ enum switchdev_attr_id {
 	SWITCHDEV_ATTR_ID_UNDEFINED,
 	SWITCHDEV_ATTR_ID_PORT_STP_STATE,
 	SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
-	SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
 	SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
 	SWITCHDEV_ATTR_ID_PORT_MROUTER,
 	SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
@@ -63,7 +62,6 @@ struct switchdev_attr {
 	union {
 		u8 stp_state;				/* PORT_STP_STATE */
 		unsigned long brport_flags;		/* PORT_{PRE}_BRIDGE_FLAGS */
-		unsigned long brport_flags_support;	/* PORT_BRIDGE_FLAGS_SUPPORT */
 		bool mrouter;				/* PORT_MROUTER */
 		clock_t ageing_time;			/* BRIDGE_AGEING_TIME */
 		bool vlan_filtering;			/* BRIDGE_VLAN_FILTERING */
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 417388c9f1fa..a176d3ba3b7a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -388,15 +388,7 @@ static int dsa_slave_get_port_parent_id(struct net_device *dev,
 static int dsa_slave_port_attr_get(struct net_device *dev,
 				   struct switchdev_attr *attr)
 {
-	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
-		attr->u.brport_flags_support = 0;
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	return 0;
+	return -EOPNOTSUPP;
 }
 
 static inline netdev_tx_t dsa_slave_netpoll_send_skb(struct net_device *dev,
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox