Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v2] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-25 14:11 UTC (permalink / raw)
  To: David Miller
  Cc: pshelar@ovn.org, netdev@vger.kernel.org, jpettit@nicira.com,
	simon.horman@netronome.com, marcelo.leitner@gmail.com,
	Vlad Buslov, Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim,
	Oz Shlomo
In-Reply-To: <20190821.205735.2069656948701231785.davem@davemloft.net>

On 8/22/2019 6:57 AM, David Miller wrote:

> From: Paul Blakey <paulb@mellanox.com>
> Date: Tue, 20 Aug 2019 15:30:51 +0300
>
>> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>>   #ifdef CONFIG_XFRM
>>   	SKB_EXT_SEC_PATH,
>>   #endif
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	TC_SKB_EXT,
>> +#endif
>>   	SKB_EXT_NUM, /* must be last */
>>   };
> Sorry, no.
>
> The SKB extensions are not a dumping ground for people to use when they can't
> figure out another more reasonable place to put their values.  Try to use
> the normal cb[], and if you can't you must explain in exhaustive detail
> why you cannot in any way whatsoever make that work.
>
> Again, SKB extensions are not a dumping ground.
Hi,
The general context of this skb extension patch is hardware offload of 
multi chain rules.
This patch shows only one usage of this extension which is by tc -> OvS 
miss path.
But we also plan to reuse this extension to pass information from 
HW/Driver -> tc miss path.

In tc multi chain rules scenarios, some of the rules might be offloaded
and some not (e.g skip_hw, unsupported rules by HW, vxlan encapsulation, 
offload order, etc).
Therefore, HW can miss at any point of the processing chain.
SW will need to continue processing in correct tc chain where the HW 
left off, as HW might have modified the packet and updated stats for it.
This scenario can reuse this tc SKB extension to restore the tc chain.

Skb control block acts a scratchpad area for storing temporary 
information and isn't suppose
to be used to pass around information between different layers of 
processing.
HW/Driver -> tc - >OvS  are different layers, and not necessarily 
processing the packet one after another.
There can be bridges, tunnel devices, VLAN devices, Netfilter 
(Conntrack) and a host of other entities processing the packet in between
so we can't guarantee the control block integrity between this main 
processing entities (HW/Driver, Tc, Ovs).
So if we'll use the control block, it will restrict such use cases.
For example, the napi API which we use, uses the control block and comes 
right after our driver layer.
This will overwrite any usage of CB by us.

Thanks,
Paul B.

^ permalink raw reply

* [PATCH net] nexthop: Fix nexthop_num_path for blackhole nexthops
From: David Ahern @ 2019-08-25 14:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, sharpd, David Ahern

From: David Ahern <dsahern@gmail.com>

Donald reported this sequence:
  ip next add id 1 blackhole
  ip next add id 2 blackhole
  ip ro add 1.1.1.1/32 nhid 1
  ip ro add 1.1.1.2/32 nhid 2

would cause a crash. Backtrace is:

[  151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
[  151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[  151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
[  151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
[  151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
[  151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
[  151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
[  151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
[  151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
[  151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
[  151.317707] FS:  00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[  151.319113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
[  151.321367] Call Trace:
[  151.321820]  ? fib_nexthop_info+0x635/0x635
[  151.322572]  fib_dump_info+0xaa4/0xde0
[  151.323247]  ? fib_create_info+0x2431/0x2431
[  151.324008]  ? napi_alloc_frag+0x2a/0x2a
[  151.324711]  rtmsg_fib+0x2c4/0x3be
[  151.325339]  fib_table_insert+0xe2f/0xeee
...

fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
believes the nexthop object is a multipath group (nhs != 1) and ends
up down the nexthop_mpath_fill_node() path which is wrong for a
blackhole.

The blackhole check in nexthop_num_path is leftover from early days
of the blackhole implementation which did not initialize the device.
In the end the design was simpler (fewer special case checks) to set
the device to loopback in nh_info, so the check in nexthop_num_path
should have been removed.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/nexthop.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 25f1f9a8419b..95f766c31c90 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -141,12 +141,6 @@ static inline unsigned int nexthop_num_path(const struct nexthop *nh)
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
 		rc = nh_grp->num_nh;
-	} else {
-		const struct nh_info *nhi;
-
-		nhi = rcu_dereference_rtnl(nh->nh_info);
-		if (nhi->reject_nh)
-			rc = 0;
 	}
 
 	return rc;
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Florian Fainelli @ 2019-08-25 15:00 UTC (permalink / raw)
  To: Marek Behun
  Cc: netdev, Andrew Lunn, Vivien Didelot, David Ahern,
	Stephen Hemminger, Chris Healy, Vladimir Oltean
In-Reply-To: <20190825091337.236ee73a@nic.cz>

On 8/25/2019 12:13 AM, Marek Behun wrote:
> On Sat, 24 Aug 2019 13:04:04 -0700
> Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> Now, the 4.9 kernel behavior actually works just fine because eth1 is
>> not a special interface, so no tagging is expected, and "wifi", although
>> it supports DSA tagging, represents another side of the CPU/host network
>> stack, so you never have to inject frames into the switch, because you
>> can use eth1 to do that and let MAC learning do its job to forward to
>> the correct port of the switch.
> 
> Hi Florian,
> 
> Sorry, I am having trouble understanding what you mean in the
> paragraph I quoted above (and paragraphs afterwards).
> 
> eth0 and eth1 are interfaces created by an ethernet driver.
> wlan0 is an interface created by wireless driver.
> wifi is a slave interface created by DSA for port 5 on the switch.
> eth1 is DSA slave or a DSA master connected to port 5?

"wifi" is a DSA slave network interface, and as you understood, eth1 is
a standard Ethernet driver (which happens to be the same as eth0, since
on that system there are two identical controllers, driver is
bcmsysport.c). The relevant part of the DTS for that system looks like this:

port@5 {
	reg = <5>;
	label = "wifi";
	ethernet = <&enet_1>;
};

port@8 {
	reg = <8>;
	label = "cpu";
	ethernet = <&enet_0>;
};

So as you can see, the devices are all described correctly, simply the
behavior on the Linux change changes whether you have commit
6d4e5c570c2d66c806ecc6bd851fcf881fe8a38e ("net: dsa: get port type at
parse time") included or not. With this commit, the criteria for
determining what is a DSA/CPU port changed: it used to be based on the
label (e.g.: "cpu", "dsa") and then it changed to be based on whether a
phandle property is either "ethernet" (port is CPU) or "link" (port is
DSA), which is the right thing to do, but it no longer allows us to be
specific about which port is going to be elected as the CPU port.
Fortunately the switch driver is coded with the assumption that either
port 5 or 8 could be used.

Note that when we do not have a network device that represents the
switch"side (e.g.: the CPU port or the DSA port), we had to jump through
many hoops in order to make some information visible to the user, or
even overlay specific operations onto the DSA master, that code is under
net/dsa/master.c.

> 
> How does DSA handle two interfaces with same reg property?

This is not happening, see DTS example above.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v3 2/6] net: dsa: mv88e6xxx: update code operating on hidden registers
From: Vivien Didelot @ 2019-08-25 15:07 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-3-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:11 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> This patch moves the functions operating on the hidden debug registers
> into it's own file, port_hidden.c. The functions prefix is renamed from
> mv88e6390_hidden_ to mv88e6xxx_port_hidden_, to be consistent with the
> rest of this driver. The macros are prefixed with MV88E6XXX_ prefix, and
> are changed not to use the BIT() macro nor bit shifts, since the rest of
> the port.h file does not use it.
> 
> We also add the support for setting the Block Address field when
> operating hidden registers. Marvell's mdio examples for SERDES settings
> on Topaz use Block Address 0x7 when reading/writing hidden registers,
> and although the specification says that block must be set to 0xf, those
> settings are reachable only with Block Address 0x7.
> 
> Signed-off-by: Marek Behún <marek.behun@nic.cz>

Even though there are several semantic changes here, bisectability is
preserved, and blaming the new functions will easily show which commit
introduced this new API and why. Much more readable and well documented:

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next v3 3/6] net: dsa: mv88e6xxx: create serdes_get_lane chip operation
From: Vivien Didelot @ 2019-08-25 15:48 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-4-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:12 +0200, Marek Behún <marek.behun@nic.cz> wrote:
>  void mv88e6390x_serdes_irq_free(struct mv88e6xxx_chip *chip, int port)
>  {
> -	int lane = mv88e6390x_serdes_get_lane(chip, port);
> +	int err;
> +	s8 lane;
>  
> -	if (lane == -ENODEV)
> +	err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
> +	if (err) {
> +		dev_err(chip->dev, "Unable to free SERDES irq: %d\n", err);
>  		return;
> -
> +	}
>  	if (lane < 0)
>  		return;

[...]

> -int mv88e6390x_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
> +/* Put the SERDES lane address a port is using into *lane. If a port has
> + * multiple lanes, should put the first lane the port is using. If a port does
> + * not have a lane, put -1 into *lane.
> + */
> +static inline int mv88e6xxx_serdes_get_lane(struct mv88e6xxx_chip *chip,
> +					    int port, s8 *lane)
> +{
> +	if (!chip->info->ops->serdes_get_lane)
> +		return -EOPNOTSUPP;
> +
> +	return chip->info->ops->serdes_get_lane(chip, port, lane);
> +}

Using an invalid value is only useful if it can be interpreted by the other
functions of the API. So I would've make lane an u8, assuming it gets modified
only on success, which would result in calls like this one:

    u8 lane;
    int err;

    err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
    if (err) {
        if (err == -ENODEV)
            err = 0;
        return err;
    }

But at least it is well documented, so we can eventually fine tuned later:

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>


Thank you!

	Vivien

^ permalink raw reply

* Re: [PATCH net-next v3 4/6] net: dsa: mv88e6xxx: simplify SERDES code for Topaz and Peridot
From: Vivien Didelot @ 2019-08-25 16:02 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-5-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:13 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> +int mv88e6341_serdes_get_lane(struct mv88e6xxx_chip *chip, int port, s8 *lane)
> +{
> +	u8 cmode = chip->ports[port].cmode;
> +
> +	*lane = -1;
> +
> +	if (port != 5)
> +		return 0;

Aren't you relying on -ENODEV as well?

> +
> +	if (cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X ||
> +	    cmode == MV88E6XXX_PORT_STS_CMODE_SGMII ||
> +	    cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX)
> +		*lane = MV88E6341_PORT5_LANE;
> +
> +	return 0;
> +}

^ permalink raw reply

* Re: [PATCH net-next v3 3/6] net: dsa: mv88e6xxx: create serdes_get_lane chip operation
From: Vivien Didelot @ 2019-08-25 16:12 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-4-marek.behun@nic.cz>

On Sun, 25 Aug 2019 05:59:12 +0200, Marek Behún <marek.behun@nic.cz> wrote:
>  int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
>  {
> -	int lane;
> -
> -	lane = mv88e6390_serdes_get_lane(chip, port);
> -	if (lane == -ENODEV)
> -		return 0;
> +	s8 lane;
> +	int err;
>  
> +	err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
> +	if (err)
> +		return err;
>  	if (lane < 0)
> -		return lane;
> +		return 0;

In fact you're also relying on -ENODEV, which is what you return here (and in
other places) instead of 0. So I'm afraid you have to address my comment now...

^ permalink raw reply

* Re: [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: David Ahern @ 2019-08-25 16:23 UTC (permalink / raw)
  To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1566582703-26567-1-git-send-email-alexey.kodanev@oracle.com>

On 8/23/19 11:51 AM, Alexey Kodanev wrote:
> When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
> same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
> in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
> doesn't work anymore.
> 
> This issue was found with LTP mpls03 tests.
> 
> Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/mpls/mpls_iptunnel.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

ok, I see now. This is a device only route with MPLS encap:

10.23.0.1  encap mpls  60 dev ltp_v0 scope link

and the change reverts to 5.1 behavior unless the gateway is IPv6 (new
behavior). Thanks for the patch.

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-25 16:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, roopa, nikolay, davem, UNGLinuxDriver,
	alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190824074204.GA15041@nanopsycho.orion>

The 08/24/2019 09:42, Jiri Pirko wrote:
> External E-Mail
> 
> 
> Sat, Aug 24, 2019 at 01:25:02AM CEST, f.fainelli@gmail.com wrote:
> >On 8/22/19 12:07 PM, Horatiu Vultur wrote:
> >> Current implementation of the SW bridge is setting the interfaces in
> >> promisc mode when they are added to bridge if learning of the frames is
> >> enabled.
> >> In case of Ocelot which has HW capabilities to switch frames, it is not
> >> needed to set the ports in promisc mode because the HW already capable of
> >> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> >> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> >> the ports in promisc mode to do the switching.
> >
> >Then do not do anything when the ndo_set_rx_mode() for the ocelot
> >network device is called and indicates that IFF_PROMISC is set and that
> >your network port is a bridge port member. That is what mlxsw does AFAICT.

Yes, but then if you want to monitor all the traffic on a bridge port
you will not be able to do that. And this seems to be a limitation.
This is the case for mlxsw and ocelot(it doesn't implement at all
promisc mode) and might be others.

> 
> Correct.
> 
> >
> >As other pointed out, the Linux bridge implements a software bridge by
> >default, and because it needs to operate on a wide variety of network
> >devices, all with different capabilities, the easiest way to make sure
> >that all management (IGMP, BPDU, etc. ) as well as non-management
> >traffic can make it to the bridge ports, is to put the network devices
> >in promiscuous mode.

What if the HW can copy all the management traffic to the SW bridge and
HW knows to learn and flood frames. Then there is no point to set a
network port in promisc mode just because it is a bridge port member.
> >If this is suboptimal for you, you can take
> >shortcuts in your driver that do not hinder the overall functionality.

If I add this check, I don't see how any other network drivers will be
affected by this. If a network driver will start to use this then it
needs to know that the HW should be configure to include CPU in the
flood mask and to know which addresses can be reached through SW bridge.

> >
> >> This optimization takes places only if all the interfaces that are part
> >> of the bridge have this flag and have the same network driver.
> >> 
> >> If the bridge interfaces is added in promisc mode then also the ports part
> >> of the bridge are set in promisc mode.
> >> 
> >> Horatiu Vultur (3):
> >>   net: Add HW_BRIDGE offload feature
> >>   net: mscc: Use NETIF_F_HW_BRIDGE
> >>   net: mscc: Implement promisc mode.
> >> 
> >>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
> >>  include/linux/netdev_features.h    |  3 +++
> >>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
> >>  net/core/ethtool.c                 |  1 +
> >>  4 files changed, 56 insertions(+), 3 deletions(-)
> >> 
> >
> >
> >-- 
> >Florian
> 

-- 
/Horatiu

^ permalink raw reply

* Re: [PATCH net-next v3 4/6] net: dsa: mv88e6xxx: simplify SERDES code for Topaz and Peridot
From: Marek Behun @ 2019-08-25 16:36 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190825120232.GG6729@t480s.localdomain>

On Sun, 25 Aug 2019 12:02:32 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> Aren't you relying on -ENODEV as well?

Vivien, I am not relying o -ENODEV. I changed the serdes_get_lane
semantics:
 - previously:
   - if port has a lane for current cmode, return given lane number
   - otherwise return -ENODEV
   - if other error occured during serdes_get_lane, return that error
     (this never happened, because all implementations only need port
     number and cmode, and cmode is cached, so no function was called
     that could err)
 - after this commit:
   - if port has a lane for current cmode, return 0 and put lane number
     into *lane
   - otherwise return 0 and put -1 into *lane
   - if error occured, return that error number

I removed the -ENODEV semantics for "no lane on port" event.
There are two reasons for this:
  1. once you requested lane number to be put into a place pointed to
     by a pointer, rather than the return value, the code seemed better
     to me (you may of course disagree, this is a personal opinion) when
     I did:
       if (err)
           return err;
       if (lane < 0)
           return 0;
     rather than
       if (err == -ENODEV)
           return 0;
       if (err)
           return err;
  2. some future implementation may actually need to call some MDIO
     read/write functions, which may or may not return -ENODEV. That
     could conflict with the -ENODEV returned when there is no lane.

Marek

^ permalink raw reply

* Re: [PATCH net] openvswitch: Fix log message in ovs conntrack
From: Pravin Shelar @ 2019-08-25 16:48 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1566432970-13377-1-git-send-email-yihung.wei@gmail.com>

On Wed, Aug 21, 2019 at 5:27 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  net/openvswitch/conntrack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 45498fcf540d..0d5ab4957ec0 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -1574,7 +1574,7 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
>                 case OVS_CT_ATTR_TIMEOUT:
>                         memcpy(info->timeout, nla_data(a), nla_len(a));
>                         if (!memchr(info->timeout, '\0', nla_len(a))) {
> -                               OVS_NLERR(log, "Invalid conntrack helper");
> +                               OVS_NLERR(log, "Invalid conntrack timeout");
>                                 return -EINVAL;
>                         }
>                         break;
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net v2] openvswitch: Fix conntrack cache with timeout
From: Pravin Shelar @ 2019-08-25 16:53 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAG1aQhLkrvVADEtAFcdO+GX03SGx7GMW1YyLVTGrPjiCz1HMyQ@mail.gmail.com>

On Fri, Aug 23, 2019 at 9:40 AM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> On Thu, Aug 22, 2019 at 11:51 PM Pravin Shelar <pshelar@ovn.org> wrote:
> >
> > On Thu, Aug 22, 2019 at 1:28 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> > >
> > > This patch addresses a conntrack cache issue with timeout policy.
> > > Currently, we do not check if the timeout extension is set properly in the
> > > cached conntrack entry.  Thus, after packet recirculate from conntrack
> > > action, the timeout policy is not applied properly.  This patch fixes the
> > > aforementioned issue.
> > >
> > > Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> > > Reported-by: kbuild test robot <lkp@intel.com>
> > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > > ---
> > > v1->v2: Fix rcu dereference issue reported by kbuild test robot.
> > > ---
> > >  net/openvswitch/conntrack.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > > index 848c6eb55064..4d7896135e73 100644
> > > --- a/net/openvswitch/conntrack.c
> > > +++ b/net/openvswitch/conntrack.c
> > > @@ -1657,6 +1666,10 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
> > >                                       ct_info.timeout))
> > >                         pr_info_ratelimited("Failed to associated timeout "
> > >                                             "policy `%s'\n", ct_info.timeout);
> > > +               else
> > > +                       ct_info.nf_ct_timeout = rcu_dereference(
> > > +                               nf_ct_timeout_find(ct_info.ct)->timeout);
> > Is this dereference safe from NULL pointer?
>
> Hi Pravin,
>
> Thanks for your review.  I am not sure if
> nf_ct_timeout_find(ct_info.ct) will return NULL in this case.
>
> We only run into this statement when ct_info.timeout[0] is set, and it
> is only set in parse_ct() when CONFIG_NF_CONNTRACK_TIMEOUT is
> configured.  Also, in this else condition the timeout extension is
> supposed to be set properly by nf_ct_set_timeout().
>

Sounds good.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments.
From: Pravin Shelar @ 2019-08-25 16:54 UTC (permalink / raw)
  To: Justin Pettit; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <20190824165846.79627-1-jpettit@ovn.org>

On Sat, Aug 24, 2019 at 9:58 AM Justin Pettit <jpettit@ovn.org> wrote:
>
> When IP fragments are reassembled before being sent to conntrack, the
> key from the last fragment is used.  Unless there are reordering
> issues, the last fragment received will not contain the L4 ports, so the
> key for the reassembled datagram won't contain them.  This patch updates
> the key once we have a reassembled datagram.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>
> ---
>  net/openvswitch/conntrack.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 848c6eb55064..f40ad2a42086 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -524,6 +524,10 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
>                 return -EPFNOSUPPORT;
>         }
>
> +       /* The key extracted from the fragment that completed this datagram
> +        * likely didn't have an L4 header, so regenerate it. */
> +       ovs_flow_key_update(skb, key);
> +
>         key->ip.frag = OVS_FRAG_TYPE_NONE;
>         skb_clear_hash(skb);
>         skb->ignore_df = 1;
> --

Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Pravin Shelar @ 2019-08-25 17:00 UTC (permalink / raw)
  To: Justin Pettit; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <20190824165846.79627-2-jpettit@ovn.org>

On Sat, Aug 24, 2019 at 9:58 AM Justin Pettit <jpettit@ovn.org> wrote:
>
> Only the first fragment in a datagram contains the L4 headers.  When the
> Open vSwitch module parses a packet, it always sets the IP protocol
> field in the key, but can only set the L4 fields on the first fragment.
> The original behavior would not clear the L4 portion of the key, so
> garbage values would be sent in the key for "later" fragments.  This
> patch clears the L4 fields in that circumstance to prevent sending those
> garbage values as part of the upcall.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>
> ---
>  net/openvswitch/flow.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index bc89e16e0505..0fb2cec08523 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -623,6 +623,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                 offset = nh->frag_off & htons(IP_OFFSET);
>                 if (offset) {
>                         key->ip.frag = OVS_FRAG_TYPE_LATER;
> +                       memset(&key->tp, 0, sizeof(key->tp));
>                         return 0;
>                 }
>                 if (nh->frag_off & htons(IP_MF) ||
> @@ -740,8 +741,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                         return error;
>                 }
>
> -               if (key->ip.frag == OVS_FRAG_TYPE_LATER)
> +               if (key->ip.frag == OVS_FRAG_TYPE_LATER) {
> +                       memset(&key->tp, 0, sizeof(key->tp));
>                         return 0;
> +               }
>                 if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
>                         key->ip.frag = OVS_FRAG_TYPE_FIRST;
>

Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* [Patch net] net_sched: fix a NULL pointer deref in ipt action
From: Cong Wang @ 2019-08-25 17:01 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, itugrok, Jamal Hadi Salim, Jiri Pirko

The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().

The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.

Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/act_api.h      |  4 +++-
 net/sched/act_bpf.c        |  2 +-
 net/sched/act_connmark.c   |  2 +-
 net/sched/act_csum.c       |  2 +-
 net/sched/act_ct.c         |  2 +-
 net/sched/act_ctinfo.c     |  2 +-
 net/sched/act_gact.c       |  2 +-
 net/sched/act_ife.c        |  2 +-
 net/sched/act_ipt.c        | 11 ++++++-----
 net/sched/act_mirred.c     |  2 +-
 net/sched/act_mpls.c       |  2 +-
 net/sched/act_nat.c        |  2 +-
 net/sched/act_pedit.c      |  2 +-
 net/sched/act_police.c     |  2 +-
 net/sched/act_sample.c     |  2 +-
 net/sched/act_simple.c     |  2 +-
 net/sched/act_skbedit.c    |  2 +-
 net/sched/act_skbmod.c     |  2 +-
 net/sched/act_tunnel_key.c |  2 +-
 net/sched/act_vlan.c       |  2 +-
 20 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c61a1bf4e3de..3a1a72990fce 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -15,6 +15,7 @@
 struct tcf_idrinfo {
 	struct mutex	lock;
 	struct idr	action_idr;
+	struct net	*net;
 };
 
 struct tc_action_ops;
@@ -108,7 +109,7 @@ struct tc_action_net {
 };
 
 static inline
-int tc_action_net_init(struct tc_action_net *tn,
+int tc_action_net_init(struct net *net, struct tc_action_net *tn,
 		       const struct tc_action_ops *ops)
 {
 	int err = 0;
@@ -117,6 +118,7 @@ int tc_action_net_init(struct tc_action_net *tn,
 	if (!tn->idrinfo)
 		return -ENOMEM;
 	tn->ops = ops;
+	tn->idrinfo->net = net;
 	mutex_init(&tn->idrinfo->lock);
 	idr_init(&tn->idrinfo->action_idr);
 	return err;
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index fd1f7e799e23..04b7bd4ec751 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -422,7 +422,7 @@ static __net_init int bpf_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, bpf_net_id);
 
-	return tc_action_net_init(tn, &act_bpf_ops);
+	return tc_action_net_init(net, tn, &act_bpf_ops);
 }
 
 static void __net_exit bpf_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 32ac04d77a45..2b43cacf82af 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -231,7 +231,7 @@ static __net_init int connmark_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, connmark_net_id);
 
-	return tc_action_net_init(tn, &act_connmark_ops);
+	return tc_action_net_init(net, tn, &act_connmark_ops);
 }
 
 static void __net_exit connmark_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 9b9288267a54..d3cfad88dc3a 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -714,7 +714,7 @@ static __net_init int csum_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, csum_net_id);
 
-	return tc_action_net_init(tn, &act_csum_ops);
+	return tc_action_net_init(net, tn, &act_csum_ops);
 }
 
 static void __net_exit csum_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 33a1a7406e87..cdd6f3818097 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -939,7 +939,7 @@ static __net_init int ct_init_net(struct net *net)
 		tn->labels = true;
 	}
 
-	return tc_action_net_init(&tn->tn, &act_ct_ops);
+	return tc_action_net_init(net, &tn->tn, &act_ct_ops);
 }
 
 static void __net_exit ct_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 06ef74b74911..0dbcfd1dca7b 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -376,7 +376,7 @@ static __net_init int ctinfo_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
 
-	return tc_action_net_init(tn, &act_ctinfo_ops);
+	return tc_action_net_init(net, tn, &act_ctinfo_ops);
 }
 
 static void __net_exit ctinfo_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 8f0140c6ca58..324f1d1f6d47 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -278,7 +278,7 @@ static __net_init int gact_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, gact_net_id);
 
-	return tc_action_net_init(tn, &act_gact_ops);
+	return tc_action_net_init(net, tn, &act_gact_ops);
 }
 
 static void __net_exit gact_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 92ee853d43e6..3a31e241c647 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -890,7 +890,7 @@ static __net_init int ife_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, ife_net_id);
 
-	return tc_action_net_init(tn, &act_ife_ops);
+	return tc_action_net_init(net, tn, &act_ife_ops);
 }
 
 static void __net_exit ife_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index ce2c30a591d2..214a03d405cf 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -61,12 +61,13 @@ static int ipt_init_target(struct net *net, struct xt_entry_target *t,
 	return 0;
 }
 
-static void ipt_destroy_target(struct xt_entry_target *t)
+static void ipt_destroy_target(struct xt_entry_target *t, struct net *net)
 {
 	struct xt_tgdtor_param par = {
 		.target   = t->u.kernel.target,
 		.targinfo = t->data,
 		.family   = NFPROTO_IPV4,
+		.net      = net,
 	};
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
@@ -78,7 +79,7 @@ static void tcf_ipt_release(struct tc_action *a)
 	struct tcf_ipt *ipt = to_ipt(a);
 
 	if (ipt->tcfi_t) {
-		ipt_destroy_target(ipt->tcfi_t);
+		ipt_destroy_target(ipt->tcfi_t, a->idrinfo->net);
 		kfree(ipt->tcfi_t);
 	}
 	kfree(ipt->tcfi_tname);
@@ -180,7 +181,7 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 
 	spin_lock_bh(&ipt->tcf_lock);
 	if (ret != ACT_P_CREATED) {
-		ipt_destroy_target(ipt->tcfi_t);
+		ipt_destroy_target(ipt->tcfi_t, net);
 		kfree(ipt->tcfi_tname);
 		kfree(ipt->tcfi_t);
 	}
@@ -350,7 +351,7 @@ static __net_init int ipt_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, ipt_net_id);
 
-	return tc_action_net_init(tn, &act_ipt_ops);
+	return tc_action_net_init(net, tn, &act_ipt_ops);
 }
 
 static void __net_exit ipt_exit_net(struct list_head *net_list)
@@ -399,7 +400,7 @@ static __net_init int xt_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, xt_net_id);
 
-	return tc_action_net_init(tn, &act_xt_ops);
+	return tc_action_net_init(net, tn, &act_xt_ops);
 }
 
 static void __net_exit xt_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index be3f88dfc37e..9d1bf508075a 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -453,7 +453,7 @@ static __net_init int mirred_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, mirred_net_id);
 
-	return tc_action_net_init(tn, &act_mirred_ops);
+	return tc_action_net_init(net, tn, &act_mirred_ops);
 }
 
 static void __net_exit mirred_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index 0f299e3b618c..e168df0e008a 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -375,7 +375,7 @@ static __net_init int mpls_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, mpls_net_id);
 
-	return tc_action_net_init(tn, &act_mpls_ops);
+	return tc_action_net_init(net, tn, &act_mpls_ops);
 }
 
 static void __net_exit mpls_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 7b858c11b1b5..ea4c5359e7df 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -327,7 +327,7 @@ static __net_init int nat_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, nat_net_id);
 
-	return tc_action_net_init(tn, &act_nat_ops);
+	return tc_action_net_init(net, tn, &act_nat_ops);
 }
 
 static void __net_exit nat_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 17360c6faeaa..cdfaa79382a2 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -498,7 +498,7 @@ static __net_init int pedit_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, pedit_net_id);
 
-	return tc_action_net_init(tn, &act_pedit_ops);
+	return tc_action_net_init(net, tn, &act_pedit_ops);
 }
 
 static void __net_exit pedit_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 49cec3e64a4d..6315e0f8d26e 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -371,7 +371,7 @@ static __net_init int police_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, police_net_id);
 
-	return tc_action_net_init(tn, &act_police_ops);
+	return tc_action_net_init(net, tn, &act_police_ops);
 }
 
 static void __net_exit police_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 595308d60133..7eff363f9f03 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -265,7 +265,7 @@ static __net_init int sample_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, sample_net_id);
 
-	return tc_action_net_init(tn, &act_sample_ops);
+	return tc_action_net_init(net, tn, &act_sample_ops);
 }
 
 static void __net_exit sample_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 33aefa25b545..6120e56117ca 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -232,7 +232,7 @@ static __net_init int simp_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, simp_net_id);
 
-	return tc_action_net_init(tn, &act_simp_ops);
+	return tc_action_net_init(net, tn, &act_simp_ops);
 }
 
 static void __net_exit simp_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 37dced00b63d..6a8d3337c577 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -336,7 +336,7 @@ static __net_init int skbedit_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, skbedit_net_id);
 
-	return tc_action_net_init(tn, &act_skbedit_ops);
+	return tc_action_net_init(net, tn, &act_skbedit_ops);
 }
 
 static void __net_exit skbedit_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 7da3518e18ef..888437f97ba6 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -287,7 +287,7 @@ static __net_init int skbmod_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, skbmod_net_id);
 
-	return tc_action_net_init(tn, &act_skbmod_ops);
+	return tc_action_net_init(net, tn, &act_skbmod_ops);
 }
 
 static void __net_exit skbmod_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 6d0debdc9b97..2f83a79f76aa 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -600,7 +600,7 @@ static __net_init int tunnel_key_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
 
-	return tc_action_net_init(tn, &act_tunnel_key_ops);
+	return tc_action_net_init(net, tn, &act_tunnel_key_ops);
 }
 
 static void __net_exit tunnel_key_exit_net(struct list_head *net_list)
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index a3c9eea1ee8a..287a30bf8930 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -334,7 +334,7 @@ static __net_init int vlan_init_net(struct net *net)
 {
 	struct tc_action_net *tn = net_generic(net, vlan_net_id);
 
-	return tc_action_net_init(tn, &act_vlan_ops);
+	return tc_action_net_init(net, tn, &act_vlan_ops);
 }
 
 static void __net_exit vlan_exit_net(struct list_head *net_list)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
From: Björn Töpel @ 2019-08-25 17:06 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Alexei Starovoitov, Daniel Borkmann, Netdev,
	Björn Töpel, Karlsson, Magnus, Magnus Karlsson, bpf,
	syzbot+c82697e3043781e08802, Hillf Danton, Ilya Maximets
In-Reply-To: <E18E14E3-3EC2-4A74-BB51-726FCDDA3881@gmail.com>

On Fri, 23 Aug 2019 at 18:46, Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
>
>
>
> On 22 Aug 2019, at 2:13, Björn Töpel wrote:
>
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The state variable was read, and written outside the control mutex
> > (struct xdp_sock, mutex), without proper barriers and {READ,
> > WRITE}_ONCE correctness.
> >
> > In this commit this issue is addressed, and the state member is now
> > used a point of synchronization whether the socket is setup correctly
> > or not.
> >
> > This also fixes a race, found by syzcaller, in xsk_poll() where umem
> > could be accessed when stale.
> >
> > Suggested-by: Hillf Danton <hdanton@sina.com>
> > Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> > Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP
> > rings")
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
> >  net/xdp/xsk.c | 57
> > ++++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 41 insertions(+), 16 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f3351013c2a5..31236e61069b 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
> > struct xdp_buff *xdp, u32 len)
> >       return err;
> >  }
> >
> > +static bool xsk_is_bound(struct xdp_sock *xs)
> > +{
> > +     if (READ_ONCE(xs->state) == XSK_BOUND) {
> > +             /* Matches smp_wmb() in bind(). */
> > +             smp_rmb();
> > +             return true;
> > +     }
> > +     return false;
> > +}
> > +
> >  int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> >  {
> >       u32 len;
> >
> > +     if (!xsk_is_bound(xs))
> > +             return -EINVAL;
> > +
> >       if (xs->dev != xdp->rxq->dev || xs->queue_id !=
> > xdp->rxq->queue_index)
> >               return -EINVAL;
> >
> > @@ -362,6 +375,8 @@ static int xsk_sendmsg(struct socket *sock, struct
> > msghdr *m, size_t total_len)
> >       struct sock *sk = sock->sk;
> >       struct xdp_sock *xs = xdp_sk(sk);
> >
> > +     if (unlikely(!xsk_is_bound(xs)))
> > +             return -ENXIO;
> >       if (unlikely(!xs->dev))
> >               return -ENXIO;
>
> Can probably remove the xs->dev check now, replaced by checking
> xs->state, right?
>

Indeed! Thanks for catching; I'll send out a v2.


Björn

>
> >       if (unlikely(!(xs->dev->flags & IFF_UP)))
> > @@ -378,10 +393,15 @@ static unsigned int xsk_poll(struct file *file,
> > struct socket *sock,
> >                            struct poll_table_struct *wait)
> >  {
> >       unsigned int mask = datagram_poll(file, sock, wait);
> > -     struct sock *sk = sock->sk;
> > -     struct xdp_sock *xs = xdp_sk(sk);
> > -     struct net_device *dev = xs->dev;
> > -     struct xdp_umem *umem = xs->umem;
> > +     struct xdp_sock *xs = xdp_sk(sock->sk);
> > +     struct net_device *dev;
> > +     struct xdp_umem *umem;
> > +
> > +     if (unlikely(!xsk_is_bound(xs)))
> > +             return mask;
> > +
> > +     dev = xs->dev;
> > +     umem = xs->umem;
> >
> >       if (umem->need_wakeup)
> >               dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> > @@ -417,10 +437,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> >  {
> >       struct net_device *dev = xs->dev;
> >
> > -     if (!dev || xs->state != XSK_BOUND)
> > +     if (xs->state != XSK_BOUND)
> >               return;
> > -
> > -     xs->state = XSK_UNBOUND;
> > +     WRITE_ONCE(xs->state, XSK_UNBOUND);
> >
> >       /* Wait for driver to stop using the xdp socket. */
> >       xdp_del_sk_umem(xs->umem, xs);
> > @@ -495,7 +514,9 @@ static int xsk_release(struct socket *sock)
> >       local_bh_enable();
> >
> >       xsk_delete_from_maps(xs);
> > +     mutex_lock(&xs->mutex);
> >       xsk_unbind_dev(xs);
> > +     mutex_unlock(&xs->mutex);
> >
> >       xskq_destroy(xs->rx);
> >       xskq_destroy(xs->tx);
> > @@ -589,19 +610,18 @@ static int xsk_bind(struct socket *sock, struct
> > sockaddr *addr, int addr_len)
> >               }
> >
> >               umem_xs = xdp_sk(sock->sk);
> > -             if (!umem_xs->umem) {
> > -                     /* No umem to inherit. */
> > +             if (!xsk_is_bound(umem_xs)) {
> >                       err = -EBADF;
> >                       sockfd_put(sock);
> >                       goto out_unlock;
> > -             } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> > +             }
> > +             if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> >                       err = -EINVAL;
> >                       sockfd_put(sock);
> >                       goto out_unlock;
> >               }
> > -
> >               xdp_get_umem(umem_xs->umem);
> > -             xs->umem = umem_xs->umem;
> > +             WRITE_ONCE(xs->umem, umem_xs->umem);
> >               sockfd_put(sock);
> >       } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> >               err = -EINVAL;
> > @@ -626,10 +646,15 @@ static int xsk_bind(struct socket *sock, struct
> > sockaddr *addr, int addr_len)
> >       xdp_add_sk_umem(xs->umem, xs);
> >
> >  out_unlock:
> > -     if (err)
> > +     if (err) {
> >               dev_put(dev);
> > -     else
> > -             xs->state = XSK_BOUND;
> > +     } else {
> > +             /* Matches smp_rmb() in bind() for shared umem
> > +              * sockets, and xsk_is_bound().
> > +              */
> > +             smp_wmb();
> > +             WRITE_ONCE(xs->state, XSK_BOUND);
> > +     }
> >  out_release:
> >       mutex_unlock(&xs->mutex);
> >       rtnl_unlock();
> > @@ -869,7 +894,7 @@ static int xsk_mmap(struct file *file, struct
> > socket *sock,
> >       unsigned long pfn;
> >       struct page *qpg;
> >
> > -     if (xs->state != XSK_READY)
> > +     if (READ_ONCE(xs->state) != XSK_READY)
> >               return -EBUSY;
> >
> >       if (offset == XDP_PGOFF_RX_RING) {
> > --
> > 2.20.1

^ permalink raw reply

* [PATCH net-next v2 0/6] net: dsa: explicit programmation of VLAN on CPU ports
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot

When a VLAN is programmed on a user port, every switch of the fabric also
program the CPU ports and the DSA links as part of the VLAN. To do that,
DSA makes use of bitmaps to prepare all members of a VLAN.

While this is expected for DSA links which are used as conduit between
interconnected switches, only the dedicated CPU port of the slave must be
programmed, not all CPU ports of the fabric. This may also cause problems in
other corners of DSA such as the tag_8021q.c driver, which needs to program
its ports manually, CPU port included.

We need the dsa_port_vlan_{add,del} functions and its dsa_port_vid_{add,del}
variants to simply trigger the VLAN programmation without any logic in them,
but they may currently skip the operation based on the bridge device state.

This patchset gets rid of the bitmap operations, and moves the bridge device
check as well as the explicit programmation of CPU ports where they belong,
in the slave code.

While at it, clear the VLAN flags before programming a CPU port, as it
doesn't make sense to forward the PVID flag for example for such ports.

Changes in v2: only clear the PVID flag.

Vivien Didelot (6):
  net: dsa: remove bitmap operations
  net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
  net: dsa: add slave VLAN helpers
  net: dsa: check bridge VLAN in slave operations
  net: dsa: program VLAN on CPU port from slave
  net: dsa: clear VLAN PVID flag for CPU port

 include/net/dsa.h |   3 --
 net/dsa/dsa2.c    |  14 -----
 net/dsa/port.c    |  14 ++---
 net/dsa/slave.c   |  79 +++++++++++++++++++++++----
 net/dsa/switch.c  | 135 +++++++++++++++++++++-------------------------
 5 files changed, 136 insertions(+), 109 deletions(-)

-- 
2.23.0

^ permalink raw reply

* [PATCH net-next v2 1/6] net: dsa: remove bitmap operations
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

The bitmap operations were introduced to simplify the switch drivers
in the future, since most of them could implement the common VLAN and
MDB operations (add, del, dump) with simple functions taking all target
ports at once, and thus limiting the number of hardware accesses.

Programming an MDB or VLAN this way in a single operation would clearly
simplify the drivers a lot but would require a new get-set interface
in DSA. The usage of such bitmap from the stack also raised concerned
in the past, leading to the dynamic allocation of a new ds->_bitmap
member in the dsa_switch structure. So let's get rid of them for now.

This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add}
switch operations into new dsa_switch_{mdb,vlan}_{prepare,add}
variants not using any bitmap argument anymore.

New dsa_switch_{mdb,vlan}_match helpers have been introduced to make
clear which local port of a switch must be programmed with the target
object. While the targeted user port is an obvious candidate, the
DSA links must also be programmed, as well as the CPU port for VLANs.

While at it, also remove local variables that are only used once.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 include/net/dsa.h |   3 --
 net/dsa/dsa2.c    |  14 -----
 net/dsa/switch.c  | 132 +++++++++++++++++++++-------------------------
 3 files changed, 59 insertions(+), 90 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 147b757ef8ea..96acb14ec1a8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -275,9 +275,6 @@ struct dsa_switch {
 	 */
 	bool			vlan_filtering;
 
-	unsigned long		*bitmap;
-	unsigned long		_bitmap;
-
 	/* Dynamically allocated ports, keep last */
 	size_t num_ports;
 	struct dsa_port ports[];
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8c4eccb0cfe6..f8445fa73448 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -834,20 +834,6 @@ struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n)
 	if (!ds)
 		return NULL;
 
-	/* We avoid allocating memory outside dsa_switch
-	 * if it is not needed.
-	 */
-	if (n <= sizeof(ds->_bitmap) * 8) {
-		ds->bitmap = &ds->_bitmap;
-	} else {
-		ds->bitmap = devm_kcalloc(dev,
-					  BITS_TO_LONGS(n),
-					  sizeof(unsigned long),
-					  GFP_KERNEL);
-		if (unlikely(!ds->bitmap))
-			return NULL;
-	}
-
 	ds->dev = dev;
 	ds->num_ports = n;
 
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 09d9286b27cc..489eb7b430a4 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -128,57 +128,51 @@ static int dsa_switch_fdb_del(struct dsa_switch *ds,
 	return ds->ops->port_fdb_del(ds, port, info->addr, info->vid);
 }
 
-static int
-dsa_switch_mdb_prepare_bitmap(struct dsa_switch *ds,
-			      const struct switchdev_obj_port_mdb *mdb,
-			      const unsigned long *bitmap)
+static bool dsa_switch_mdb_match(struct dsa_switch *ds, int port,
+				 struct dsa_notifier_mdb_info *info)
+{
+	if (ds->index == info->sw_index && port == info->port)
+		return true;
+
+	if (dsa_is_dsa_port(ds, port))
+		return true;
+
+	return false;
+}
+
+static int dsa_switch_mdb_prepare(struct dsa_switch *ds,
+				  struct dsa_notifier_mdb_info *info)
 {
 	int port, err;
 
 	if (!ds->ops->port_mdb_prepare || !ds->ops->port_mdb_add)
 		return -EOPNOTSUPP;
 
-	for_each_set_bit(port, bitmap, ds->num_ports) {
-		err = ds->ops->port_mdb_prepare(ds, port, mdb);
-		if (err)
-			return err;
+	for (port = 0; port < ds->num_ports; port++) {
+		if (dsa_switch_mdb_match(ds, port, info)) {
+			err = ds->ops->port_mdb_prepare(ds, port, info->mdb);
+			if (err)
+				return err;
+		}
 	}
 
 	return 0;
 }
 
-static void dsa_switch_mdb_add_bitmap(struct dsa_switch *ds,
-				      const struct switchdev_obj_port_mdb *mdb,
-				      const unsigned long *bitmap)
-{
-	int port;
-
-	if (!ds->ops->port_mdb_add)
-		return;
-
-	for_each_set_bit(port, bitmap, ds->num_ports)
-		ds->ops->port_mdb_add(ds, port, mdb);
-}
-
 static int dsa_switch_mdb_add(struct dsa_switch *ds,
 			      struct dsa_notifier_mdb_info *info)
 {
-	const struct switchdev_obj_port_mdb *mdb = info->mdb;
-	struct switchdev_trans *trans = info->trans;
 	int port;
 
-	/* Build a mask of Multicast group members */
-	bitmap_zero(ds->bitmap, ds->num_ports);
-	if (ds->index == info->sw_index)
-		set_bit(info->port, ds->bitmap);
-	for (port = 0; port < ds->num_ports; port++)
-		if (dsa_is_dsa_port(ds, port))
-			set_bit(port, ds->bitmap);
+	if (switchdev_trans_ph_prepare(info->trans))
+		return dsa_switch_mdb_prepare(ds, info);
 
-	if (switchdev_trans_ph_prepare(trans))
-		return dsa_switch_mdb_prepare_bitmap(ds, mdb, ds->bitmap);
+	if (!ds->ops->port_mdb_add)
+		return 0;
 
-	dsa_switch_mdb_add_bitmap(ds, mdb, ds->bitmap);
+	for (port = 0; port < ds->num_ports; port++)
+		if (dsa_switch_mdb_match(ds, port, info))
+			ds->ops->port_mdb_add(ds, port, info->mdb);
 
 	return 0;
 }
@@ -186,13 +180,11 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 static int dsa_switch_mdb_del(struct dsa_switch *ds,
 			      struct dsa_notifier_mdb_info *info)
 {
-	const struct switchdev_obj_port_mdb *mdb = info->mdb;
-
 	if (!ds->ops->port_mdb_del)
 		return -EOPNOTSUPP;
 
 	if (ds->index == info->sw_index)
-		return ds->ops->port_mdb_del(ds, info->port, mdb);
+		return ds->ops->port_mdb_del(ds, info->port, info->mdb);
 
 	return 0;
 }
@@ -234,59 +226,55 @@ static int dsa_port_vlan_check(struct dsa_switch *ds, int port,
 			     (void *)vlan);
 }
 
-static int
-dsa_switch_vlan_prepare_bitmap(struct dsa_switch *ds,
-			       const struct switchdev_obj_port_vlan *vlan,
-			       const unsigned long *bitmap)
+static bool dsa_switch_vlan_match(struct dsa_switch *ds, int port,
+				  struct dsa_notifier_vlan_info *info)
+{
+	if (ds->index == info->sw_index && port == info->port)
+		return true;
+
+	if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+		return true;
+
+	return false;
+}
+
+static int dsa_switch_vlan_prepare(struct dsa_switch *ds,
+				   struct dsa_notifier_vlan_info *info)
 {
 	int port, err;
 
 	if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
 		return -EOPNOTSUPP;
 
-	for_each_set_bit(port, bitmap, ds->num_ports) {
-		err = dsa_port_vlan_check(ds, port, vlan);
-		if (err)
-			return err;
+	for (port = 0; port < ds->num_ports; port++) {
+		if (dsa_switch_vlan_match(ds, port, info)) {
+			err = dsa_port_vlan_check(ds, port, info->vlan);
+			if (err)
+				return err;
 
-		err = ds->ops->port_vlan_prepare(ds, port, vlan);
-		if (err)
-			return err;
+			err = ds->ops->port_vlan_prepare(ds, port, info->vlan);
+			if (err)
+				return err;
+		}
 	}
 
 	return 0;
 }
 
-static void
-dsa_switch_vlan_add_bitmap(struct dsa_switch *ds,
-			   const struct switchdev_obj_port_vlan *vlan,
-			   const unsigned long *bitmap)
-{
-	int port;
-
-	for_each_set_bit(port, bitmap, ds->num_ports)
-		ds->ops->port_vlan_add(ds, port, vlan);
-}
-
 static int dsa_switch_vlan_add(struct dsa_switch *ds,
 			       struct dsa_notifier_vlan_info *info)
 {
-	const struct switchdev_obj_port_vlan *vlan = info->vlan;
-	struct switchdev_trans *trans = info->trans;
 	int port;
 
-	/* Build a mask of VLAN members */
-	bitmap_zero(ds->bitmap, ds->num_ports);
-	if (ds->index == info->sw_index)
-		set_bit(info->port, ds->bitmap);
-	for (port = 0; port < ds->num_ports; port++)
-		if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
-			set_bit(port, ds->bitmap);
+	if (switchdev_trans_ph_prepare(info->trans))
+		return dsa_switch_vlan_prepare(ds, info);
 
-	if (switchdev_trans_ph_prepare(trans))
-		return dsa_switch_vlan_prepare_bitmap(ds, vlan, ds->bitmap);
+	if (!ds->ops->port_vlan_add)
+		return 0;
 
-	dsa_switch_vlan_add_bitmap(ds, vlan, ds->bitmap);
+	for (port = 0; port < ds->num_ports; port++)
+		if (dsa_switch_vlan_match(ds, port, info))
+			ds->ops->port_vlan_add(ds, port, info->vlan);
 
 	return 0;
 }
@@ -294,13 +282,11 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
 static int dsa_switch_vlan_del(struct dsa_switch *ds,
 			       struct dsa_notifier_vlan_info *info)
 {
-	const struct switchdev_obj_port_vlan *vlan = info->vlan;
-
 	if (!ds->ops->port_vlan_del)
 		return -EOPNOTSUPP;
 
 	if (ds->index == info->sw_index)
-		return ds->ops->port_vlan_del(ds, info->port, vlan);
+		return ds->ops->port_vlan_del(ds, info->port, info->vlan);
 
 	return 0;
 }
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.

This function is used in the tag_8021q.c code to offload the PVID of
ports, which would simply not work if .port_vlan_add is not supported
by the underlying switch.

Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
that is to say in dsa_slave_vlan_rx_add_vid.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/port.c  | 4 ++--
 net/dsa/slave.c | 7 +++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index f75301456430..ef28df7ecbde 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -382,8 +382,8 @@ int dsa_port_vid_add(struct dsa_port *dp, u16 vid, u16 flags)
 
 	trans.ph_prepare = true;
 	err = dsa_port_vlan_add(dp, &vlan, &trans);
-	if (err == -EOPNOTSUPP)
-		return 0;
+	if (err)
+		return err;
 
 	trans.ph_prepare = false;
 	return dsa_port_vlan_add(dp, &vlan, &trans);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 33f41178afcc..9d61d9dbf001 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1082,8 +1082,11 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 			return -EBUSY;
 	}
 
-	/* This API only allows programming tagged, non-PVID VIDs */
-	return dsa_port_vid_add(dp, vid, 0);
+	ret = dsa_port_vid_add(dp, vid, 0);
+	if (ret && ret != -EOPNOTSUPP)
+		return ret;
+
+	return 0;
 }
 
 static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 4/6] net: dsa: check bridge VLAN in slave operations
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is
not bridged or if its bridge is not VLAN aware.

This is a good thing but other corners of DSA, such as the tag_8021q
driver, may need to program VLANs regardless the bridge state.

And also because bridge_dev is specific to user ports anyway, move
these checks were it belongs, one layer up in the slave code.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
---
 net/dsa/port.c  | 10 ++--------
 net/dsa/slave.c | 12 ++++++++++++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index ef28df7ecbde..9b54e5a76297 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -348,10 +348,7 @@ int dsa_port_vlan_add(struct dsa_port *dp,
 		.vlan = vlan,
 	};
 
-	if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
-		return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
-
-	return 0;
+	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
 }
 
 int dsa_port_vlan_del(struct dsa_port *dp,
@@ -363,10 +360,7 @@ int dsa_port_vlan_del(struct dsa_port *dp,
 		.vlan = vlan,
 	};
 
-	if (!dp->bridge_dev || br_vlan_enabled(dp->bridge_dev))
-		return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
-
-	return 0;
+	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
 }
 
 int dsa_port_vid_add(struct dsa_port *dp, u16 vid, u16 flags)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8f5126c41282..82e48d247b81 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -323,6 +323,9 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 	if (obj->orig_dev != dev)
 		return -EOPNOTSUPP;
 
+	if (dp->bridge_dev && !br_vlan_enabled(dp->bridge_dev))
+		return 0;
+
 	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
 
 	err = dsa_port_vlan_add(dp, &vlan, trans);
@@ -377,6 +380,9 @@ static int dsa_slave_vlan_del(struct net_device *dev,
 	if (obj->orig_dev != dev)
 		return -EOPNOTSUPP;
 
+	if (dp->bridge_dev && !br_vlan_enabled(dp->bridge_dev))
+		return 0;
+
 	return dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
 }
 
@@ -1099,6 +1105,9 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 	 * need to emulate the switchdev prepare + commit phase.
 	 */
 	if (dp->bridge_dev) {
+		if (!br_vlan_enabled(dp->bridge_dev))
+			return 0;
+
 		/* br_vlan_get_info() returns -EINVAL or -ENOENT if the
 		 * device, respectively the VID is not found, returning
 		 * 0 means success, which is a failure for us here.
@@ -1126,6 +1135,9 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
 	 * need to emulate the switchdev prepare + commit phase.
 	 */
 	if (dp->bridge_dev) {
+		if (!br_vlan_enabled(dp->bridge_dev))
+			return 0;
+
 		/* br_vlan_get_info() returns -EINVAL or -ENOENT if the
 		 * device, respectively the VID is not found, returning
 		 * 0 means success, which is a failure for us here.
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 5/6] net: dsa: program VLAN on CPU port from slave
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

DSA currently programs a VLAN on the CPU port implicitly after the
related notifier is received by a switch.

While we still need to do this transparent programmation of the DSA
links in the fabric, programming the CPU port this way may cause
problems in some corners such as the tag_8021q driver.

Because the dedicated CPU port is specific to a slave, make their
programmation explicit a few layers up, in the slave code.

Note that technically, DSA links have a dedicated CPU port as well,
but since they are only used as conduit between interconnected switches
of a fabric, programming them transparently this way is what we want.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/slave.c  | 14 ++++++++++++++
 net/dsa/switch.c |  5 ++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 82e48d247b81..8267c156a51a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -332,6 +332,10 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 	if (err)
 		return err;
 
+	err = dsa_port_vlan_add(dp->cpu_dp, &vlan, trans);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -383,6 +387,9 @@ static int dsa_slave_vlan_del(struct net_device *dev,
 	if (dp->bridge_dev && !br_vlan_enabled(dp->bridge_dev))
 		return 0;
 
+	/* Do not deprogram the CPU port as it may be shared with other user
+	 * ports which can be members of this VLAN as well.
+	 */
 	return dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
 }
 
@@ -1121,6 +1128,10 @@ static int dsa_slave_vlan_rx_add_vid(struct net_device *dev, __be16 proto,
 	if (ret && ret != -EOPNOTSUPP)
 		return ret;
 
+	ret = dsa_port_vid_add(dp->cpu_dp, vid, 0);
+	if (ret && ret != -EOPNOTSUPP)
+		return ret;
+
 	return 0;
 }
 
@@ -1151,6 +1162,9 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
 	if (ret == -EOPNOTSUPP)
 		ret = 0;
 
+	/* Do not deprogram the CPU port as it may be shared with other user
+	 * ports which can be members of this VLAN as well.
+	 */
 	return ret;
 }
 
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 489eb7b430a4..6a9607518823 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -232,7 +232,7 @@ static bool dsa_switch_vlan_match(struct dsa_switch *ds, int port,
 	if (ds->index == info->sw_index && port == info->port)
 		return true;
 
-	if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+	if (dsa_is_dsa_port(ds, port))
 		return true;
 
 	return false;
@@ -288,6 +288,9 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
 	if (ds->index == info->sw_index)
 		return ds->ops->port_vlan_del(ds, info->port, info->vlan);
 
+	/* Do not deprogram the DSA links as they may be used as conduit
+	 * for other VLAN members in the fabric.
+	 */
 	return 0;
 }
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 3/6] net: dsa: add slave VLAN helpers
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle
SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the
switchdev_obj_port_vlan structure on add since we will modify it in
future patches.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/slave.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9d61d9dbf001..8f5126c41282 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -312,6 +312,26 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 	return ret;
 }
 
+static int dsa_slave_vlan_add(struct net_device *dev,
+			      const struct switchdev_obj *obj,
+			      struct switchdev_trans *trans)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct switchdev_obj_port_vlan vlan;
+	int err;
+
+	if (obj->orig_dev != dev)
+		return -EOPNOTSUPP;
+
+	vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
+
+	err = dsa_port_vlan_add(dp, &vlan, trans);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 static int dsa_slave_port_obj_add(struct net_device *dev,
 				  const struct switchdev_obj *obj,
 				  struct switchdev_trans *trans,
@@ -339,10 +359,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 				       trans);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		if (obj->orig_dev != dev)
-			return -EOPNOTSUPP;
-		err = dsa_port_vlan_add(dp, SWITCHDEV_OBJ_PORT_VLAN(obj),
-					trans);
+		err = dsa_slave_vlan_add(dev, obj, trans);
 		break;
 	default:
 		err = -EOPNOTSUPP;
@@ -352,6 +369,17 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 	return err;
 }
 
+static int dsa_slave_vlan_del(struct net_device *dev,
+			      const struct switchdev_obj *obj)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	if (obj->orig_dev != dev)
+		return -EOPNOTSUPP;
+
+	return dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
+}
+
 static int dsa_slave_port_obj_del(struct net_device *dev,
 				  const struct switchdev_obj *obj)
 {
@@ -371,9 +399,7 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
 		err = dsa_port_mdb_del(dp->cpu_dp, SWITCHDEV_OBJ_PORT_MDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		if (obj->orig_dev != dev)
-			return -EOPNOTSUPP;
-		err = dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
+		err = dsa_slave_vlan_del(dev, obj);
 		break;
 	default:
 		err = -EOPNOTSUPP;
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 6/6] net: dsa: clear VLAN PVID flag for CPU port
From: Vivien Didelot @ 2019-08-25 17:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, andrew, olteanv, Vivien Didelot
In-Reply-To: <20190825172520.22798-1-vivien.didelot@gmail.com>

When the bridge offloads a VLAN on a slave port, we also need to
program its dedicated CPU port as a member of the VLAN.

Drivers may handle the CPU port's membership as they want. For example,
Marvell as a special "Unmodified" mode to pass frames as is through
such ports.

Even though DSA expects the drivers to handle the CPU port membership,
it does not make sense to program user VLANs as PVID on the CPU port.
This patch clears this flag before programming the CPU port.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
---
 net/dsa/slave.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8267c156a51a..d84225125099 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -332,6 +332,12 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 	if (err)
 		return err;

+	/* We need the dedicated CPU port to be a member of the VLAN as well.
+	 * Even though drivers often handle CPU membership in special ways,
+	 * it doesn't make sense to program a PVID, so clear this flag.
+	 */
+	vlan.flags &= ~BRIDGE_VLAN_INFO_PVID;
+
 	err = dsa_port_vlan_add(dp->cpu_dp, &vlan, trans);
 	if (err)
 		return err;
-- 
2.23.0

^ permalink raw reply related

* Re: [PATCH net-next 6/6] net: dsa: clear VLAN flags for CPU port
From: Vivien Didelot @ 2019-08-25 17:28 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Florian Fainelli, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpML8GLQ-n5AsJ4+BAYy8dwTQuAGYRwcZrwHxY9wy=6aQ@mail.gmail.com>

On Sat, 24 Aug 2019 22:53:08 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> Vivien, can't you just unset the PVID flag? Keeping the same
> tagged/untagged setting on ingress as on egress does make more sense.

Why not. I've sent a v2 with this single change.

Thanks,

	Vivien

^ permalink raw reply

* [PATCH net-next v4 3/3] dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the new phylink API
From: René van Dorst @ 2019-08-25 17:43 UTC (permalink / raw)
  To: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger
  Cc: netdev, linux-arm-kernel, linux-mediatek, linux-mips,
	Russell King, Frank Wunderlich, Stefan Roese, René van Dorst
In-Reply-To: <20190825174341.20750-1-opensource@vdorst.com>

This patch the removes the recently added mediatek,physpeed property.
Use the fixed-link property speed = <2500> to set the phy in 2.5Gbit.
See mt7622-bananapi-bpi-r64.dts for a working example.

Signed-off-by: René van Dorst <opensource@vdorst.com>
--
v3->v4:
* no change
v2->v3:
* no change
v1->v2:
* SGMII port only support BASE-X at 2.5Gbit.
---
 .../arm/mediatek/mediatek,sgmiisys.txt        |  2 --
 .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  | 28 +++++++++++++------
 arch/arm64/boot/dts/mediatek/mt7622.dtsi      |  1 -
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
index f5518f26a914..30cb645c0e54 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sgmiisys.txt
@@ -9,8 +9,6 @@ Required Properties:
 	- "mediatek,mt7622-sgmiisys", "syscon"
 	- "mediatek,mt7629-sgmiisys", "syscon"
 - #clock-cells: Must be 1
-- mediatek,physpeed: Should be one of "auto", "1000" or "2500" to match up
-		     the capability of the target PHY.
 
 The SGMIISYS controller uses the common clk binding from
 Documentation/devicetree/bindings/clock/clock-bindings.txt
diff --git a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
index 710c5c3d87d3..83e10591e0e5 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
+++ b/arch/arm64/boot/dts/mediatek/mt7622-bananapi-bpi-r64.dts
@@ -115,24 +115,34 @@
 };
 
 &eth {
-	pinctrl-names = "default";
-	pinctrl-0 = <&eth_pins>;
 	status = "okay";
+	gmac0: mac@0 {
+		compatible = "mediatek,eth-mac";
+		reg = <0>;
+		phy-mode = "2500base-x";
+
+		fixed-link {
+			speed = <2500>;
+			full-duplex;
+			pause;
+		};
+	};
 
 	gmac1: mac@1 {
 		compatible = "mediatek,eth-mac";
 		reg = <1>;
-		phy-handle = <&phy5>;
+		phy-mode = "rgmii";
+
+		fixed-link {
+			speed = <1000>;
+			full-duplex;
+			pause;
+		};
 	};
 
-	mdio-bus {
+	mdio: mdio-bus {
 		#address-cells = <1>;
 		#size-cells = <0>;
-
-		phy5: ethernet-phy@5 {
-			reg = <5>;
-			phy-mode = "sgmii";
-		};
 	};
 };
 
diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index d1e13d340e26..dac51e98204c 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -931,6 +931,5 @@
 			     "syscon";
 		reg = <0 0x1b128000 0 0x3000>;
 		#clock-cells = <1>;
-		mediatek,physpeed = "2500";
 	};
 };
-- 
2.20.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox