Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/4] dt-bindings: net: Add aspeed,ast2600-mdio binding
From: Andrew Jeffery @ 2019-07-30  0:47 UTC (permalink / raw)
  To: Rob Herring
  Cc: netdev, David Miller, Mark Rutland, Joel Stanley, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-aspeed, linux-kernel@vger.kernel.org
In-Reply-To: <CAL_JsqLytwfsoyS6TSnpPgTjRTOR0TeQwroX21AHqj3A1mPJ5Q@mail.gmail.com>



On Tue, 30 Jul 2019, at 09:07, Rob Herring wrote:
> On Sun, Jul 28, 2019 at 10:39 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > The AST2600 splits out the MDIO bus controller from the MAC into its own
> > IP block and rearranges the register layout. Add a new binding to
> > describe the new hardware.
> >
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > ---
> >  .../bindings/net/aspeed,ast2600-mdio.yaml     | 61 +++++++++++++++++++
> >  1 file changed, 61 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> > new file mode 100644
> > index 000000000000..fa86f6438473
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> > @@ -0,0 +1,61 @@
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/net/aspeed,ast2600-mdio.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: ASPEED AST2600 MDIO Controller
> > +
> > +maintainers:
> > +  - Andrew Jeffery <andrew@aj.id.au>
> > +
> > +description: |+
> > +  The ASPEED AST2600 MDIO controller is the third iteration of ASPEED's MDIO
> > +  bus register interface, this time also separating out the controller from the
> > +  MAC.
> > +
> 
> Should have a:
> 
> allOf:
>   - $ref: "mdio.yaml#"

Ack.

> 
> > +properties:
> > +  compatible:
> > +    const: aspeed,ast2600-mdio
> > +  reg:
> > +    maxItems: 1
> > +    description: The register range of the MDIO controller instance
> 
> > +  "#address-cells":
> > +    const: 1
> > +  "#size-cells":
> > +    const: 0
> 
> Then you can drop these 2.

Great.

> 
> > +
> > +patternProperties:
> > +  "^ethernet-phy@[a-f0-9]$":
> > +    properties:
> > +      reg:
> > +        description:
> > +          The MDIO bus index of the PHY
> 
> And this.
> 
> > +      compatible:
> > +        enum:
> > +          - ethernet-phy-ieee802.3-c22
> > +          - ethernet-phy-ieee802.3-c45
> 
> This isn't specific to ASpeed either and is already covered by
> ethernet-phy.yaml.
> 
> So that means none of the child node schema is needed here.

Bah, I'd developed the patches on v5.2 while waiting for v5.3-rc1 to come
out.

> 
> > +    required:
> > +      - reg
> > +
> > +required:
> > +  - compatible
> > +  - reg
> > +  - "#address-cells"
> > +  - "#size-cells"
> > +
> > +examples:
> > +  - |
> > +    mdio0: mdio@1e650000 {
> > +            compatible = "aspeed,ast2600-mdio";
> > +            reg = <0x1e650000 0x8>;
> > +            #address-cells = <1>;
> > +            #size-cells = <0>;
> > +
> > +            status = "okay";
> 
> Don't show status in examples.

Sorry, copy/paste at fault.

Thanks for the feedback.

Andrew

> 
> > +
> > +            ethphy0: ethernet-phy@0 {
> > +                    compatible = "ethernet-phy-ieee802.3-c22";
> > +                    reg = <0>;
> > +            };
> > +    };
> > --
> > 2.20.1
> >
>

^ permalink raw reply

* Re: [PATCH net] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Ahern @ 2019-07-30  0:23 UTC (permalink / raw)
  To: Su Yanjun, davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel
In-Reply-To: <1564368591-42301-1-git-send-email-suyj.fnst@cn.fujitsu.com>

On 7/28/19 8:49 PM, Su Yanjun wrote:
> When we send mpls packets and the interface only has a
> manual global ipv6 address, then the two hosts cant communicate.
> I find that in ndisc_send_ns it only tries to get a ll address.
> In my case, the executive path is as below.
> ip6_output
>  ->ip6_finish_output
>   ->lwtunnel_xmit
>    ->mpls_xmit
>     ->neigh_resolve_output
>      ->neigh_probe
>       ->ndisc_solicit
>        ->ndisc_send_ns

for the archives, this is not an MPLS problem but a general IPv6
forwarding problem when the egress interface does not have a link local
address.

> 
> In RFC4861, 7.2.2 says
> "If the source address of the packet prompting the solicitation is the
> same as one of the addresses assigned to the outgoing interface, that
> address SHOULD be placed in the IP Source Address of the outgoing
> solicitation.  Otherwise, any one of the addresses assigned to the
> interface should be used."
> 
> In this patch we try get a global address if we get ll address failed.
> 
> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
> ---
>  include/net/addrconf.h |  4 ++++
>  net/ipv6/addrconf.c    | 34 ++++++++++++++++++++++++++++++++++
>  net/ipv6/ndisc.c       |  8 ++++++--
>  3 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index becdad5..006db8e 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -107,6 +107,10 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr,
>  		      u32 banned_flags);
>  int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
>  		    u32 banned_flags);
> +int __ipv6_get_addr(struct inet6_dev *idev, struct in6_addr *addr,
> +		      u32 banned_flags);

no reason to export __ipv6_get_addr. I suspect you copied
__ipv6_get_lladdr but it has an external (to addrconf.c) user. In this
case only ipv6_get_addr needs to be exported.


> +int ipv6_get_addr(struct net_device *dev, struct in6_addr *addr,
> +		    u32 banned_flags);
>  bool inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
>  			  bool match_wildcard);
>  bool inet_rcv_saddr_any(const struct sock *sk);
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 521e320..4c0a43f 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1870,6 +1870,40 @@ int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr,
>  	return err;
>  }
>  
> +int __ipv6_get_addr(struct inet6_dev *idev, struct in6_addr *addr,
> +		    u32 banned_flags)
> +{
> +	struct inet6_ifaddr *ifp;
> +	int err = -EADDRNOTAVAIL;
> +
> +	list_for_each_entry_reverse(ifp, &idev->addr_list, if_list) {

Addresses are ordered by scope. __ipv6_get_lladdr uses
list_for_each_entry_reverse because the LLA's are after the globals.
Since this is falling back to 'give an address' from this interface, I
think you can just use list_for_each_entry.


> +		if (ifp->scope == 0 &&
> +		    !(ifp->flags & banned_flags)) {
> +			*addr = ifp->addr;
> +			err = 0;
> +			break;
> +		}
> +	}
> +	return err;
> +}
> +
> +int ipv6_get_addr(struct net_device *dev, struct in6_addr *addr,
> +		  u32 banned_flags)
> +{
> +	struct inet6_dev *idev;
> +	int err = -EADDRNOTAVAIL;
> +
> +	rcu_read_lock();
> +	idev = __in6_dev_get(dev);
> +	if (idev) {
> +		read_lock_bh(&idev->lock);
> +		err = __ipv6_get_addr(idev, addr, banned_flags);
> +		read_unlock_bh(&idev->lock);
> +	}
> +	rcu_read_unlock();
> +	return err;
> +}
> +
>  static int ipv6_count_addresses(const struct inet6_dev *idev)
>  {
>  	const struct inet6_ifaddr *ifp;
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 083cc1c..18ac2fb 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -606,8 +606,12 @@ void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
>  
>  	if (!saddr) {

And since you are going to do a v2, another nit - define a local banned
flags and use it for both lookups just to make it clear.

		u32 banned_flags = IFA_F_TENTATIVE | IFA_F_OPTIMISTIC;

>  		if (ipv6_get_lladdr(dev, &addr_buf,
> -				   (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)))
> -			return;
> +				   (IFA_F_TENTATIVE | IFA_F_OPTIMISTIC))) {
> +			/* try global address */
> +			if (ipv6_get_addr(dev, &addr_buf,
> +					  (IFA_F_TENTATIVE | IFA_F_OPTIMISTIC)))
> +				return;
> +		}
>  		saddr = &addr_buf;
>  	}
>  
> 


^ permalink raw reply

* Re: [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: wenxu @ 2019-07-30  0:10 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: pablo, fw, netfilter-devel, netdev
In-Reply-To: <20190729095526.17214c4d@cakuba.netronome.com>


在 2019/7/30 0:55, Jakub Kicinski 写道:
> On Mon, 29 Jul 2019 15:18:03 +0800, wenxu wrote:
>> On 7/29/2019 12:42 PM, Jakub Kicinski wrote:
>>> On Mon, 29 Jul 2019 10:43:56 +0800, wenxu wrote:  
>>>> On 7/29/2019 4:16 AM, Jakub Kicinski wrote:  
>>>>> I don't know the nft code, but it seems unlikely it wouldn't have the
>>>>> same problem/need..    
>>>> nft don't have the same problem.  The offload rule can only attached
>>>> to offload base chain.
>>>>
>>>> Th  offload base chain is created after the device driver loaded (the
>>>> device exist).  
>>> For indirect blocks the block is on the tunnel device and the offload
>>> target is another device. E.g. you offload rules from a VXLAN device
>>> onto the ASIC. The ASICs driver does not have to be loaded when VXLAN
>>> device is created.
>>>
>>> So I feel like either the chain somehow directly references the offload
>>> target (in which case the indirect infrastructure with hash lookup etc
>>> is not needed for nft), or indirect infra is needed, and we need to take
>>> care of replays.  
>> I think the nft is different with tc. 
>>
>> In tc case we can create vxlan device add a ingress qdisc with a block success
>>
>> Then the ASIC driver loaded,  then register the vxlan indr-dev and get the block
>> adn replay it to hardware
>>
>> But in the nft case,  The base chain flags with offload. Create an offload netdev
>> base chain on vxlan device will fail if there is no indr-device to offload.
> Can you show us the offload chain spec? Does it specify offload to the
> vxlan device or the ASIC device?

nft add chain netdev firewall aclout { type filter hook ingress offload device vxlan0 priority - 300 \; }

>
> Indir-devs can come and go, how do you handle a situation where offload
> chain was installed with indir listener present, but then the ASIC
> driver got removed?

yes, I think nft is also need to get the default block in the indr-regster-cb

for the go aways and reload again case

>

^ permalink raw reply

* [net-next 11/13] net/mlx5e: Eswitch, use state_lock to synchronize vlan change
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

esw->state_lock is already used to protect vlan vport configuration change.
However, all preparation and correctness checks, and code that sets vport
data are not protected by this lock and assume external synchronization by
rtnl lock. In order to remove dependency on rtnl lock, extend
esw->state_lock protection to whole eswitch vlan add/del functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c   | 15 ++++++++-------
 .../mellanox/mlx5/core/eswitch_offloads.c       | 17 ++++++++++++-----
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d365551d2f10..7a0888470fae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2086,23 +2086,19 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 	if (vlan > 4095 || qos > 7)
 		return -EINVAL;
 
-	mutex_lock(&esw->state_lock);
-
 	err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags);
 	if (err)
-		goto unlock;
+		return err;
 
 	evport->info.vlan = vlan;
 	evport->info.qos = qos;
 	if (evport->enabled && esw->mode == MLX5_ESWITCH_LEGACY) {
 		err = esw_vport_ingress_config(esw, evport);
 		if (err)
-			goto unlock;
+			return err;
 		err = esw_vport_egress_config(esw, evport);
 	}
 
-unlock:
-	mutex_unlock(&esw->state_lock);
 	return err;
 }
 
@@ -2110,11 +2106,16 @@ int mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 				u16 vport, u16 vlan, u8 qos)
 {
 	u8 set_flags = 0;
+	int err;
 
 	if (vlan || qos)
 		set_flags = SET_VLAN_STRIP | SET_VLAN_INSERT;
 
-	return __mlx5_eswitch_set_vport_vlan(esw, vport, vlan, qos, set_flags);
+	mutex_lock(&esw->state_lock);
+	err = __mlx5_eswitch_set_vport_vlan(esw, vport, vlan, qos, set_flags);
+	mutex_unlock(&esw->state_lock);
+
+	return err;
 }
 
 int mlx5_eswitch_set_vport_spoofchk(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 244ad1893691..d502c91c148c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -442,9 +442,11 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 	fwd  = !!((attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) &&
 		   !attr->dest_chain);
 
+	mutex_lock(&esw->state_lock);
+
 	err = esw_add_vlan_action_check(attr, push, pop, fwd);
 	if (err)
-		return err;
+		goto unlock;
 
 	attr->vlan_handled = false;
 
@@ -457,11 +459,11 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 			attr->vlan_handled = true;
 		}
 
-		return 0;
+		goto unlock;
 	}
 
 	if (!push && !pop)
-		return 0;
+		goto unlock;
 
 	if (!(offloads->vlan_push_pop_refcount)) {
 		/* it's the 1st vlan rule, apply global vlan pop policy */
@@ -486,6 +488,8 @@ int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 out:
 	if (!err)
 		attr->vlan_handled = true;
+unlock:
+	mutex_unlock(&esw->state_lock);
 	return err;
 }
 
@@ -508,6 +512,8 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 	pop  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP);
 	fwd  = !!(attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
 
+	mutex_lock(&esw->state_lock);
+
 	vport = esw_vlan_action_get_vport(attr, push, pop);
 
 	if (!push && !pop && fwd) {
@@ -515,7 +521,7 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 		if (attr->dests[0].rep->vport == MLX5_VPORT_UPLINK)
 			vport->vlan_refcount--;
 
-		return 0;
+		goto out;
 	}
 
 	if (push) {
@@ -533,12 +539,13 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 skip_unset_push:
 	offloads->vlan_push_pop_refcount--;
 	if (offloads->vlan_push_pop_refcount)
-		return 0;
+		goto out;
 
 	/* no more vlan rules, stop global vlan pop policy */
 	err = esw_set_global_vlan_pop(esw, 0);
 
 out:
+	mutex_unlock(&esw->state_lock);
 	return err;
 }
 
-- 
2.21.0


^ permalink raw reply related

* [net-next 07/13] net/mlx5e: Change flow flags type to unsigned long
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

To remove dependency on rtnl lock and allow concurrent modification of
'flags' field of tc flow structure, change flow flag type to unsigned long
and use atomic bit ops for reading and changing the flags. Implement
auxiliary functions for setting, resetting and getting specific flag, and
for checking most often used flag values.

Always set flags with smp_mb__before_atomic() to ensure that all
mlx5e_tc_flow are updated before concurrent readers can read new flags
value. Rearrange all code paths to actually set flow->rule[] pointers
before setting the OFFLOADED flag. On read side, use smp_mb__after_atomic()
when accessing flags to ensure that offload-related flow fields are only
read after the flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 206 +++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_tc.h   |  26 ++-
 4 files changed, 149 insertions(+), 97 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 266783295124..b2618dd6dd10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3429,7 +3429,7 @@ static int mlx5e_setup_tc_mqprio(struct mlx5e_priv *priv,
 #ifdef CONFIG_MLX5_ESWITCH
 static int mlx5e_setup_tc_cls_flower(struct mlx5e_priv *priv,
 				     struct flow_cls_offload *cls_flower,
-				     int flags)
+				     unsigned long flags)
 {
 	switch (cls_flower->command) {
 	case FLOW_CLS_REPLACE:
@@ -3449,12 +3449,12 @@ static int mlx5e_setup_tc_cls_flower(struct mlx5e_priv *priv,
 static int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 				   void *cb_priv)
 {
+	unsigned long flags = MLX5_TC_FLAG(INGRESS) | MLX5_TC_FLAG(NIC_OFFLOAD);
 	struct mlx5e_priv *priv = cb_priv;
 
 	switch (type) {
 	case TC_SETUP_CLSFLOWER:
-		return mlx5e_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS |
-						 MLX5E_TC_NIC_OFFLOAD);
+		return mlx5e_setup_tc_cls_flower(priv, type_data, flags);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -3647,7 +3647,7 @@ static int set_feature_tc_num_filters(struct net_device *netdev, bool enable)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
-	if (!enable && mlx5e_tc_num_filters(priv, MLX5E_TC_NIC_OFFLOAD)) {
+	if (!enable && mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD))) {
 		netdev_err(netdev,
 			   "Active offloaded tc filters, can't turn hw_tc_offload off\n");
 		return -EINVAL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 496d3034e278..69f7ac8fc9be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -659,8 +659,8 @@ mlx5e_rep_indr_offload(struct net_device *netdev,
 		       struct flow_cls_offload *flower,
 		       struct mlx5e_rep_indr_block_priv *indr_priv)
 {
+	unsigned long flags = MLX5_TC_FLAG(EGRESS) | MLX5_TC_FLAG(ESW_OFFLOAD);
 	struct mlx5e_priv *priv = netdev_priv(indr_priv->rpriv->netdev);
-	int flags = MLX5E_TC_EGRESS | MLX5E_TC_ESW_OFFLOAD;
 	int err = 0;
 
 	switch (flower->command) {
@@ -1159,12 +1159,12 @@ mlx5e_rep_setup_tc_cls_flower(struct mlx5e_priv *priv,
 static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
 				 void *cb_priv)
 {
+	unsigned long flags = MLX5_TC_FLAG(INGRESS) | MLX5_TC_FLAG(ESW_OFFLOAD);
 	struct mlx5e_priv *priv = cb_priv;
 
 	switch (type) {
 	case TC_SETUP_CLSFLOWER:
-		return mlx5e_rep_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS |
-						     MLX5E_TC_ESW_OFFLOAD);
+		return mlx5e_rep_setup_tc_cls_flower(priv, type_data, flags);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index e2b87f723819..241157b699df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -66,19 +66,19 @@ struct mlx5_nic_flow_attr {
 	struct mlx5_fc		*counter;
 };
 
-#define MLX5E_TC_FLOW_BASE (MLX5E_TC_LAST_EXPORTED_BIT + 1)
+#define MLX5E_TC_FLOW_BASE (MLX5E_TC_FLAG_LAST_EXPORTED_BIT + 1)
 
 enum {
-	MLX5E_TC_FLOW_INGRESS	= MLX5E_TC_INGRESS,
-	MLX5E_TC_FLOW_EGRESS	= MLX5E_TC_EGRESS,
-	MLX5E_TC_FLOW_ESWITCH	= MLX5E_TC_ESW_OFFLOAD,
-	MLX5E_TC_FLOW_NIC	= MLX5E_TC_NIC_OFFLOAD,
-	MLX5E_TC_FLOW_OFFLOADED	= BIT(MLX5E_TC_FLOW_BASE),
-	MLX5E_TC_FLOW_HAIRPIN	= BIT(MLX5E_TC_FLOW_BASE + 1),
-	MLX5E_TC_FLOW_HAIRPIN_RSS = BIT(MLX5E_TC_FLOW_BASE + 2),
-	MLX5E_TC_FLOW_SLOW	  = BIT(MLX5E_TC_FLOW_BASE + 3),
-	MLX5E_TC_FLOW_DUP         = BIT(MLX5E_TC_FLOW_BASE + 4),
-	MLX5E_TC_FLOW_NOT_READY   = BIT(MLX5E_TC_FLOW_BASE + 5),
+	MLX5E_TC_FLOW_FLAG_INGRESS	= MLX5E_TC_FLAG_INGRESS_BIT,
+	MLX5E_TC_FLOW_FLAG_EGRESS	= MLX5E_TC_FLAG_EGRESS_BIT,
+	MLX5E_TC_FLOW_FLAG_ESWITCH	= MLX5E_TC_FLAG_ESW_OFFLOAD_BIT,
+	MLX5E_TC_FLOW_FLAG_NIC		= MLX5E_TC_FLAG_NIC_OFFLOAD_BIT,
+	MLX5E_TC_FLOW_FLAG_OFFLOADED	= MLX5E_TC_FLOW_BASE,
+	MLX5E_TC_FLOW_FLAG_HAIRPIN	= MLX5E_TC_FLOW_BASE + 1,
+	MLX5E_TC_FLOW_FLAG_HAIRPIN_RSS	= MLX5E_TC_FLOW_BASE + 2,
+	MLX5E_TC_FLOW_FLAG_SLOW		= MLX5E_TC_FLOW_BASE + 3,
+	MLX5E_TC_FLOW_FLAG_DUP		= MLX5E_TC_FLOW_BASE + 4,
+	MLX5E_TC_FLOW_FLAG_NOT_READY	= MLX5E_TC_FLOW_BASE + 5,
 };
 
 #define MLX5E_TC_MAX_SPLITS 1
@@ -109,7 +109,7 @@ struct mlx5e_tc_flow {
 	struct rhash_head	node;
 	struct mlx5e_priv	*priv;
 	u64			cookie;
-	u16			flags;
+	unsigned long		flags;
 	struct mlx5_flow_handle *rule[MLX5E_TC_MAX_SPLITS + 1];
 	/* Flow can be associated with multiple encap IDs.
 	 * The number of encaps is bounded by the number of supported
@@ -205,6 +205,47 @@ static void mlx5e_flow_put(struct mlx5e_priv *priv,
 	}
 }
 
+static void __flow_flag_set(struct mlx5e_tc_flow *flow, unsigned long flag)
+{
+	/* Complete all memory stores before setting bit. */
+	smp_mb__before_atomic();
+	set_bit(flag, &flow->flags);
+}
+
+#define flow_flag_set(flow, flag) __flow_flag_set(flow, MLX5E_TC_FLOW_FLAG_##flag)
+
+static void __flow_flag_clear(struct mlx5e_tc_flow *flow, unsigned long flag)
+{
+	/* Complete all memory stores before clearing bit. */
+	smp_mb__before_atomic();
+	clear_bit(flag, &flow->flags);
+}
+
+#define flow_flag_clear(flow, flag) __flow_flag_clear(flow, \
+						      MLX5E_TC_FLOW_FLAG_##flag)
+
+static bool __flow_flag_test(struct mlx5e_tc_flow *flow, unsigned long flag)
+{
+	bool ret = test_bit(flag, &flow->flags);
+
+	/* Read fields of flow structure only after checking flags. */
+	smp_mb__after_atomic();
+	return ret;
+}
+
+#define flow_flag_test(flow, flag) __flow_flag_test(flow, \
+						    MLX5E_TC_FLOW_FLAG_##flag)
+
+static bool mlx5e_is_eswitch_flow(struct mlx5e_tc_flow *flow)
+{
+	return flow_flag_test(flow, ESWITCH);
+}
+
+static bool mlx5e_is_offloaded_flow(struct mlx5e_tc_flow *flow)
+{
+	return flow_flag_test(flow, OFFLOADED);
+}
+
 static inline u32 hash_mod_hdr_info(struct mod_hdr_key *key)
 {
 	return jhash(key->actions,
@@ -226,9 +267,9 @@ static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	int num_actions, actions_size, namespace, err;
+	bool found = false, is_eswitch_flow;
 	struct mlx5e_mod_hdr_entry *mh;
 	struct mod_hdr_key key;
-	bool found = false;
 	u32 hash_key;
 
 	num_actions  = parse_attr->num_mod_hdr_actions;
@@ -239,7 +280,8 @@ static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
 
 	hash_key = hash_mod_hdr_info(&key);
 
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+	is_eswitch_flow = mlx5e_is_eswitch_flow(flow);
+	if (is_eswitch_flow) {
 		namespace = MLX5_FLOW_NAMESPACE_FDB;
 		hash_for_each_possible(esw->offloads.mod_hdr_tbl, mh,
 				       mod_hdr_hlist, hash_key) {
@@ -278,14 +320,14 @@ static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
 	if (err)
 		goto out_err;
 
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+	if (is_eswitch_flow)
 		hash_add(esw->offloads.mod_hdr_tbl, &mh->mod_hdr_hlist, hash_key);
 	else
 		hash_add(priv->fs.tc.mod_hdr_tbl, &mh->mod_hdr_hlist, hash_key);
 
 attach_flow:
 	list_add(&flow->mod_hdr, &mh->flows);
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+	if (is_eswitch_flow)
 		flow->esw_attr->mod_hdr_id = mh->mod_hdr_id;
 	else
 		flow->nic_attr->mod_hdr_id = mh->mod_hdr_id;
@@ -700,7 +742,7 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
 
 attach_flow:
 	if (hpe->hp->num_channels > 1) {
-		flow->flags |= MLX5E_TC_FLOW_HAIRPIN_RSS;
+		flow_flag_set(flow, HAIRPIN_RSS);
 		flow->nic_attr->hairpin_ft = hpe->hp->ttc.ft.t;
 	} else {
 		flow->nic_attr->hairpin_tirn = hpe->hp->tirn;
@@ -761,12 +803,12 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
 	flow_context->flow_tag = attr->flow_tag;
 
-	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN) {
+	if (flow_flag_test(flow, HAIRPIN)) {
 		err = mlx5e_hairpin_flow_add(priv, flow, parse_attr, extack);
 		if (err)
 			return err;
 
-		if (flow->flags & MLX5E_TC_FLOW_HAIRPIN_RSS) {
+		if (flow_flag_test(flow, HAIRPIN_RSS)) {
 			dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 			dest[dest_ix].ft = attr->hairpin_ft;
 		} else {
@@ -849,7 +891,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 		mlx5_del_flow_rules(flow->rule[0]);
 	mlx5_fc_destroy(priv->mdev, counter);
 
-	if (!mlx5e_tc_num_filters(priv, MLX5E_TC_NIC_OFFLOAD)  && priv->fs.tc.t) {
+	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) && priv->fs.tc.t) {
 		mlx5_destroy_flow_table(priv->fs.tc.t);
 		priv->fs.tc.t = NULL;
 	}
@@ -857,7 +899,7 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
 
-	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN)
+	if (flow_flag_test(flow, HAIRPIN))
 		mlx5e_hairpin_flow_del(priv, flow);
 }
 
@@ -892,7 +934,6 @@ mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
 		}
 	}
 
-	flow->flags |= MLX5E_TC_FLOW_OFFLOADED;
 	return rule;
 }
 
@@ -901,7 +942,7 @@ mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
 			     struct mlx5e_tc_flow *flow,
 			   struct mlx5_esw_flow_attr *attr)
 {
-	flow->flags &= ~MLX5E_TC_FLOW_OFFLOADED;
+	flow_flag_clear(flow, OFFLOADED);
 
 	if (attr->split_count)
 		mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
@@ -924,7 +965,7 @@ mlx5e_tc_offload_to_slow_path(struct mlx5_eswitch *esw,
 
 	rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, slow_attr);
 	if (!IS_ERR(rule))
-		flow->flags |= MLX5E_TC_FLOW_SLOW;
+		flow_flag_set(flow, SLOW);
 
 	return rule;
 }
@@ -939,7 +980,7 @@ mlx5e_tc_unoffload_from_slow_path(struct mlx5_eswitch *esw,
 	slow_attr->split_count = 0;
 	slow_attr->dest_chain = FDB_SLOW_PATH_CHAIN;
 	mlx5e_tc_unoffload_fdb_rules(esw, flow, slow_attr);
-	flow->flags &= ~MLX5E_TC_FLOW_SLOW;
+	flow_flag_clear(flow, SLOW);
 }
 
 static void add_unready_flow(struct mlx5e_tc_flow *flow)
@@ -952,14 +993,14 @@ static void add_unready_flow(struct mlx5e_tc_flow *flow)
 	rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
 	uplink_priv = &rpriv->uplink_priv;
 
-	flow->flags |= MLX5E_TC_FLOW_NOT_READY;
+	flow_flag_set(flow, NOT_READY);
 	list_add_tail(&flow->unready, &uplink_priv->unready_flows);
 }
 
 static void remove_unready_flow(struct mlx5e_tc_flow *flow)
 {
 	list_del(&flow->unready);
-	flow->flags &= ~MLX5E_TC_FLOW_NOT_READY;
+	flow_flag_clear(flow, NOT_READY);
 }
 
 static int
@@ -1049,6 +1090,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 	if (IS_ERR(flow->rule[0]))
 		return PTR_ERR(flow->rule[0]);
+	else
+		flow_flag_set(flow, OFFLOADED);
 
 	return 0;
 }
@@ -1074,14 +1117,14 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5_esw_flow_attr slow_attr;
 	int out_index;
 
-	if (flow->flags & MLX5E_TC_FLOW_NOT_READY) {
+	if (flow_flag_test(flow, NOT_READY)) {
 		remove_unready_flow(flow);
 		kvfree(attr->parse_attr);
 		return;
 	}
 
-	if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
-		if (flow->flags & MLX5E_TC_FLOW_SLOW)
+	if (mlx5e_is_offloaded_flow(flow)) {
+		if (flow_flag_test(flow, SLOW))
 			mlx5e_tc_unoffload_from_slow_path(esw, flow, &slow_attr);
 		else
 			mlx5e_tc_unoffload_fdb_rules(esw, flow, attr);
@@ -1166,8 +1209,9 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 		}
 
 		mlx5e_tc_unoffload_from_slow_path(esw, flow, &slow_attr);
-		flow->flags |= MLX5E_TC_FLOW_OFFLOADED; /* was unset when slow path rule removed */
 		flow->rule[0] = rule;
+		/* was unset when slow path rule removed */
+		flow_flag_set(flow, OFFLOADED);
 
 loop_cont:
 		mlx5e_flow_put(priv, flow);
@@ -1205,8 +1249,9 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 		}
 
 		mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->esw_attr);
-		flow->flags |= MLX5E_TC_FLOW_OFFLOADED; /* was unset when fast path rule removed */
 		flow->rule[0] = rule;
+		/* was unset when fast path rule removed */
+		flow_flag_set(flow, OFFLOADED);
 
 loop_cont:
 		mlx5e_flow_put(priv, flow);
@@ -1219,7 +1264,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 
 static struct mlx5_fc *mlx5e_tc_get_counter(struct mlx5e_tc_flow *flow)
 {
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+	if (mlx5e_is_eswitch_flow(flow))
 		return flow->esw_attr->counter;
 	else
 		return flow->nic_attr->counter;
@@ -1255,7 +1300,7 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 			if (IS_ERR(mlx5e_flow_get(flow)))
 				continue;
 
-			if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
+			if (mlx5e_is_offloaded_flow(flow)) {
 				counter = mlx5e_tc_get_counter(flow);
 				mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
 				if (time_after((unsigned long)lastuse, nhe->reported_lastuse)) {
@@ -1315,15 +1360,15 @@ static void __mlx5e_tc_del_fdb_peer_flow(struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_eswitch *esw = flow->priv->mdev->priv.eswitch;
 
-	if (!(flow->flags & MLX5E_TC_FLOW_ESWITCH) ||
-	    !(flow->flags & MLX5E_TC_FLOW_DUP))
+	if (!flow_flag_test(flow, ESWITCH) ||
+	    !flow_flag_test(flow, DUP))
 		return;
 
 	mutex_lock(&esw->offloads.peer_mutex);
 	list_del(&flow->peer);
 	mutex_unlock(&esw->offloads.peer_mutex);
 
-	flow->flags &= ~MLX5E_TC_FLOW_DUP;
+	flow_flag_clear(flow, DUP);
 
 	mlx5e_tc_del_fdb_flow(flow->peer_flow->priv, flow->peer_flow);
 	kvfree(flow->peer_flow);
@@ -1347,7 +1392,7 @@ static void mlx5e_tc_del_fdb_peer_flow(struct mlx5e_tc_flow *flow)
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 			      struct mlx5e_tc_flow *flow)
 {
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+	if (mlx5e_is_eswitch_flow(flow)) {
 		mlx5e_tc_del_fdb_peer_flow(flow);
 		mlx5e_tc_del_fdb_flow(priv, flow);
 	} else {
@@ -1845,11 +1890,13 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 	struct mlx5e_rep_priv *rpriv = priv->ppriv;
 	u8 match_level, tunnel_match_level = MLX5_MATCH_NONE;
 	struct mlx5_eswitch_rep *rep;
+	bool is_eswitch_flow;
 	int err;
 
 	err = __parse_cls_flower(priv, spec, f, filter_dev, &match_level, &tunnel_match_level);
 
-	if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH)) {
+	is_eswitch_flow = mlx5e_is_eswitch_flow(flow);
+	if (!err && is_eswitch_flow) {
 		rep = rpriv->rep;
 		if (rep->vport != MLX5_VPORT_UPLINK &&
 		    (esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE &&
@@ -1863,7 +1910,7 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 		}
 	}
 
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+	if (is_eswitch_flow) {
 		flow->esw_attr->match_level = match_level;
 		flow->esw_attr->tunnel_match_level = tunnel_match_level;
 	} else {
@@ -2384,12 +2431,12 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 {
 	u32 actions;
 
-	if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+	if (mlx5e_is_eswitch_flow(flow))
 		actions = flow->esw_attr->action;
 	else
 		actions = flow->nic_attr->action;
 
-	if (flow->flags & MLX5E_TC_FLOW_EGRESS &&
+	if (flow_flag_test(flow, EGRESS) &&
 	    !((actions & MLX5_FLOW_CONTEXT_ACTION_DECAP) ||
 	      (actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP)))
 		return false;
@@ -2541,7 +2588,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 			if (priv->netdev->netdev_ops == peer_dev->netdev_ops &&
 			    same_hw_devs(priv, netdev_priv(peer_dev))) {
 				parse_attr->mirred_ifindex[0] = peer_dev->ifindex;
-				flow->flags |= MLX5E_TC_FLOW_HAIRPIN;
+				flow_flag_set(flow, HAIRPIN);
 				action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
 					  MLX5_FLOW_CONTEXT_ACTION_COUNT;
 			} else {
@@ -3065,19 +3112,19 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 	return 0;
 }
 
-static void get_flags(int flags, u16 *flow_flags)
+static void get_flags(int flags, unsigned long *flow_flags)
 {
-	u16 __flow_flags = 0;
+	unsigned long __flow_flags = 0;
 
-	if (flags & MLX5E_TC_INGRESS)
-		__flow_flags |= MLX5E_TC_FLOW_INGRESS;
-	if (flags & MLX5E_TC_EGRESS)
-		__flow_flags |= MLX5E_TC_FLOW_EGRESS;
+	if (flags & MLX5_TC_FLAG(INGRESS))
+		__flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_INGRESS);
+	if (flags & MLX5_TC_FLAG(EGRESS))
+		__flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_EGRESS);
 
-	if (flags & MLX5E_TC_ESW_OFFLOAD)
-		__flow_flags |= MLX5E_TC_FLOW_ESWITCH;
-	if (flags & MLX5E_TC_NIC_OFFLOAD)
-		__flow_flags |= MLX5E_TC_FLOW_NIC;
+	if (flags & MLX5_TC_FLAG(ESW_OFFLOAD))
+		__flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_ESWITCH);
+	if (flags & MLX5_TC_FLAG(NIC_OFFLOAD))
+		__flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_NIC);
 
 	*flow_flags = __flow_flags;
 }
@@ -3089,12 +3136,13 @@ static const struct rhashtable_params tc_ht_params = {
 	.automatic_shrinking = true,
 };
 
-static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv, int flags)
+static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv,
+				    unsigned long flags)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5e_rep_priv *uplink_rpriv;
 
-	if (flags & MLX5E_TC_ESW_OFFLOAD) {
+	if (flags & MLX5_TC_FLAG(ESW_OFFLOAD)) {
 		uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
 		return &uplink_rpriv->uplink_priv.tc_ht;
 	} else /* NIC offload */
@@ -3105,7 +3153,7 @@ static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
 	bool is_rep_ingress = attr->in_rep->vport != MLX5_VPORT_UPLINK &&
-			      flow->flags & MLX5E_TC_FLOW_INGRESS;
+		flow_flag_test(flow, INGRESS);
 	bool act_is_encap = !!(attr->action &
 			       MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT);
 	bool esw_paired = mlx5_devcom_is_paired(attr->in_mdev->priv.devcom,
@@ -3124,7 +3172,7 @@ static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
 
 static int
 mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
-		 struct flow_cls_offload *f, u16 flow_flags,
+		 struct flow_cls_offload *f, unsigned long flow_flags,
 		 struct mlx5e_tc_flow_parse_attr **__parse_attr,
 		 struct mlx5e_tc_flow **__flow)
 {
@@ -3186,7 +3234,7 @@ mlx5e_flow_esw_attr_init(struct mlx5_esw_flow_attr *esw_attr,
 static struct mlx5e_tc_flow *
 __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 		     struct flow_cls_offload *f,
-		     u16 flow_flags,
+		     unsigned long flow_flags,
 		     struct net_device *filter_dev,
 		     struct mlx5_eswitch_rep *in_rep,
 		     struct mlx5_core_dev *in_mdev)
@@ -3197,7 +3245,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5e_tc_flow *flow;
 	int attr_size, err;
 
-	flow_flags |= MLX5E_TC_FLOW_ESWITCH;
+	flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_ESWITCH);
 	attr_size  = sizeof(struct mlx5_esw_flow_attr);
 	err = mlx5e_alloc_flow(priv, attr_size, f, flow_flags,
 			       &parse_attr, &flow);
@@ -3236,7 +3284,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 
 static int mlx5e_tc_add_fdb_peer_flow(struct flow_cls_offload *f,
 				      struct mlx5e_tc_flow *flow,
-				      u16 flow_flags)
+				      unsigned long flow_flags)
 {
 	struct mlx5e_priv *priv = flow->priv, *peer_priv;
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch, *peer_esw;
@@ -3274,7 +3322,7 @@ static int mlx5e_tc_add_fdb_peer_flow(struct flow_cls_offload *f,
 	}
 
 	flow->peer_flow = peer_flow;
-	flow->flags |= MLX5E_TC_FLOW_DUP;
+	flow_flag_set(flow, DUP);
 	mutex_lock(&esw->offloads.peer_mutex);
 	list_add_tail(&flow->peer, &esw->offloads.peer_flows);
 	mutex_unlock(&esw->offloads.peer_mutex);
@@ -3287,7 +3335,7 @@ static int mlx5e_tc_add_fdb_peer_flow(struct flow_cls_offload *f,
 static int
 mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 		   struct flow_cls_offload *f,
-		   u16 flow_flags,
+		   unsigned long flow_flags,
 		   struct net_device *filter_dev,
 		   struct mlx5e_tc_flow **__flow)
 {
@@ -3321,7 +3369,7 @@ mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 static int
 mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 		   struct flow_cls_offload *f,
-		   u16 flow_flags,
+		   unsigned long flow_flags,
 		   struct net_device *filter_dev,
 		   struct mlx5e_tc_flow **__flow)
 {
@@ -3335,7 +3383,7 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 	if (!tc_cls_can_offload_and_chain0(priv->netdev, &f->common))
 		return -EOPNOTSUPP;
 
-	flow_flags |= MLX5E_TC_FLOW_NIC;
+	flow_flags |= BIT(MLX5E_TC_FLOW_FLAG_NIC);
 	attr_size  = sizeof(struct mlx5_nic_flow_attr);
 	err = mlx5e_alloc_flow(priv, attr_size, f, flow_flags,
 			       &parse_attr, &flow);
@@ -3356,7 +3404,7 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 	if (err)
 		goto err_free;
 
-	flow->flags |= MLX5E_TC_FLOW_OFFLOADED;
+	flow_flag_set(flow, OFFLOADED);
 	kvfree(parse_attr);
 	*__flow = flow;
 
@@ -3372,12 +3420,12 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 static int
 mlx5e_tc_add_flow(struct mlx5e_priv *priv,
 		  struct flow_cls_offload *f,
-		  int flags,
+		  unsigned long flags,
 		  struct net_device *filter_dev,
 		  struct mlx5e_tc_flow **flow)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	u16 flow_flags;
+	unsigned long flow_flags;
 	int err;
 
 	get_flags(flags, &flow_flags);
@@ -3396,7 +3444,7 @@ mlx5e_tc_add_flow(struct mlx5e_priv *priv,
 }
 
 int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
-			   struct flow_cls_offload *f, int flags)
+			   struct flow_cls_offload *f, unsigned long flags)
 {
 	struct netlink_ext_ack *extack = f->common.extack;
 	struct rhashtable *tc_ht = get_tc_ht(priv, flags);
@@ -3430,19 +3478,17 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	return err;
 }
 
-#define DIRECTION_MASK (MLX5E_TC_INGRESS | MLX5E_TC_EGRESS)
-#define FLOW_DIRECTION_MASK (MLX5E_TC_FLOW_INGRESS | MLX5E_TC_FLOW_EGRESS)
-
 static bool same_flow_direction(struct mlx5e_tc_flow *flow, int flags)
 {
-	if ((flow->flags & FLOW_DIRECTION_MASK) == (flags & DIRECTION_MASK))
-		return true;
+	bool dir_ingress = !!(flags & MLX5_TC_FLAG(INGRESS));
+	bool dir_egress = !!(flags & MLX5_TC_FLAG(EGRESS));
 
-	return false;
+	return flow_flag_test(flow, INGRESS) == dir_ingress &&
+		flow_flag_test(flow, EGRESS) == dir_egress;
 }
 
 int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
-			struct flow_cls_offload *f, int flags)
+			struct flow_cls_offload *f, unsigned long flags)
 {
 	struct rhashtable *tc_ht = get_tc_ht(priv, flags);
 	struct mlx5e_tc_flow *flow;
@@ -3459,7 +3505,7 @@ int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
 }
 
 int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
-		       struct flow_cls_offload *f, int flags)
+		       struct flow_cls_offload *f, unsigned long flags)
 {
 	struct mlx5_devcom *devcom = priv->mdev->priv.devcom;
 	struct rhashtable *tc_ht = get_tc_ht(priv, flags);
@@ -3481,7 +3527,7 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 		goto errout;
 	}
 
-	if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
+	if (mlx5e_is_offloaded_flow(flow)) {
 		counter = mlx5e_tc_get_counter(flow);
 		if (!counter)
 			goto errout;
@@ -3496,8 +3542,8 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	if (!peer_esw)
 		goto out;
 
-	if ((flow->flags & MLX5E_TC_FLOW_DUP) &&
-	    (flow->peer_flow->flags & MLX5E_TC_FLOW_OFFLOADED)) {
+	if (flow_flag_test(flow, DUP) &&
+	    flow_flag_test(flow->peer_flow, OFFLOADED)) {
 		u64 bytes2;
 		u64 packets2;
 		u64 lastuse2;
@@ -3622,7 +3668,7 @@ void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
 	rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL);
 }
 
-int mlx5e_tc_num_filters(struct mlx5e_priv *priv, int flags)
+int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags)
 {
 	struct rhashtable *tc_ht = get_tc_ht(priv, flags);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 3ab39275ca7d..1cb66bf76997 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -40,13 +40,15 @@
 #ifdef CONFIG_MLX5_ESWITCH
 
 enum {
-	MLX5E_TC_INGRESS = BIT(0),
-	MLX5E_TC_EGRESS  = BIT(1),
-	MLX5E_TC_NIC_OFFLOAD = BIT(2),
-	MLX5E_TC_ESW_OFFLOAD = BIT(3),
-	MLX5E_TC_LAST_EXPORTED_BIT = 3,
+	MLX5E_TC_FLAG_INGRESS_BIT,
+	MLX5E_TC_FLAG_EGRESS_BIT,
+	MLX5E_TC_FLAG_NIC_OFFLOAD_BIT,
+	MLX5E_TC_FLAG_ESW_OFFLOAD_BIT,
+	MLX5E_TC_FLAG_LAST_EXPORTED_BIT = MLX5E_TC_FLAG_ESW_OFFLOAD_BIT,
 };
 
+#define MLX5_TC_FLAG(flag) BIT(MLX5E_TC_FLAG_##flag##_BIT)
+
 int mlx5e_tc_nic_init(struct mlx5e_priv *priv);
 void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv);
 
@@ -54,12 +56,12 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht);
 void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht);
 
 int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
-			   struct flow_cls_offload *f, int flags);
+			   struct flow_cls_offload *f, unsigned long flags);
 int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
-			struct flow_cls_offload *f, int flags);
+			struct flow_cls_offload *f, unsigned long flags);
 
 int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
-		       struct flow_cls_offload *f, int flags);
+		       struct flow_cls_offload *f, unsigned long flags);
 
 struct mlx5e_encap_entry;
 void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
@@ -70,7 +72,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 struct mlx5e_neigh_hash_entry;
 void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe);
 
-int mlx5e_tc_num_filters(struct mlx5e_priv *priv, int flags);
+int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
 
 void mlx5e_tc_reoffload_flows_work(struct work_struct *work);
 
@@ -80,7 +82,11 @@ bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
-static inline int  mlx5e_tc_num_filters(struct mlx5e_priv *priv, int flags) { return 0; }
+static inline int  mlx5e_tc_num_filters(struct mlx5e_priv *priv,
+					unsigned long flags)
+{
+	return 0;
+}
 #endif
 
 #endif /* __MLX5_EN_TC_H__ */
-- 
2.21.0


^ permalink raw reply related

* [net-next 13/13] net/mlx5e: Protect tc flow table with mutex
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

TC flow table is created when first flow is added, and destroyed when last
flow is removed. This assumes that all accesses to the table are externally
synchronized with rtnl lock. To remove dependency on rtnl lock, add new
mutex mlx5e_tc_table->t_lock and use it to protect the flow table.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index eb70ada89b09..4518ce19112e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -10,6 +10,8 @@ enum {
 };
 
 struct mlx5e_tc_table {
+	/* protects flow table */
+	struct mutex			t_lock;
 	struct mlx5_flow_table		*t;
 
 	struct rhashtable               ht;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 595a4c5667ea..f3ed028d5017 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -854,6 +854,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 			return err;
 	}
 
+	mutex_lock(&priv->fs.tc.t_lock);
 	if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
 		int tc_grp_size, tc_tbl_size;
 		u32 max_flow_counter;
@@ -873,6 +874,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 							    MLX5E_TC_TABLE_NUM_GROUPS,
 							    MLX5E_TC_FT_LEVEL, 0);
 		if (IS_ERR(priv->fs.tc.t)) {
+			mutex_unlock(&priv->fs.tc.t_lock);
 			NL_SET_ERR_MSG_MOD(extack,
 					   "Failed to create tc offload table\n");
 			netdev_err(priv->netdev,
@@ -886,6 +888,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 	flow->rule[0] = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
 					    &flow_act, dest, dest_ix);
+	mutex_unlock(&priv->fs.tc.t_lock);
 
 	if (IS_ERR(flow->rule[0]))
 		return PTR_ERR(flow->rule[0]);
@@ -904,10 +907,12 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 		mlx5_del_flow_rules(flow->rule[0]);
 	mlx5_fc_destroy(priv->mdev, counter);
 
+	mutex_lock(&priv->fs.tc.t_lock);
 	if (!mlx5e_tc_num_filters(priv, MLX5_TC_FLAG(NIC_OFFLOAD)) && priv->fs.tc.t) {
 		mlx5_destroy_flow_table(priv->fs.tc.t);
 		priv->fs.tc.t = NULL;
 	}
+	mutex_unlock(&priv->fs.tc.t_lock);
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		mlx5e_detach_mod_hdr(priv, flow);
@@ -3684,6 +3689,7 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	int err;
 
+	mutex_init(&tc->t_lock);
 	hash_init(tc->mod_hdr_tbl);
 	hash_init(tc->hairpin_tbl);
 
@@ -3722,6 +3728,7 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
 		mlx5_destroy_flow_table(tc->t);
 		tc->t = NULL;
 	}
+	mutex_destroy(&tc->t_lock);
 }
 
 int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
-- 
2.21.0


^ permalink raw reply related

* [net-next 12/13] net/mlx5e: Rely on rcu instead of rtnl lock when getting upper dev
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

Function netdev_master_upper_dev_get() generates warning if caller doesn't
hold rtnl lock. Modify rules update path to use rcu version of that
function.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    | 14 +++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  6 +++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
index ae439d95f5a3..4c4620db3d31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
@@ -31,11 +31,23 @@ static int get_route_and_out_devs(struct mlx5e_priv *priv,
 
 	real_dev = is_vlan_dev(dev) ? vlan_dev_real_dev(dev) : dev;
 	uplink_dev = mlx5_eswitch_uplink_get_proto_dev(esw, REP_ETH);
-	uplink_upper = netdev_master_upper_dev_get(uplink_dev);
+
+	rcu_read_lock();
+	uplink_upper = netdev_master_upper_dev_get_rcu(uplink_dev);
+	/* mlx5_lag_is_sriov() is a blocking function which can't be called
+	 * while holding rcu read lock. Take the net_device for correctness
+	 * sake.
+	 */
+	if (uplink_upper)
+		dev_hold(uplink_upper);
+	rcu_read_unlock();
+
 	dst_is_lag_dev = (uplink_upper &&
 			  netif_is_lag_master(uplink_upper) &&
 			  real_dev == uplink_upper &&
 			  mlx5_lag_is_sriov(priv->mdev));
+	if (uplink_upper)
+		dev_put(uplink_upper);
 
 	/* if the egress device isn't on the same HW e-switch or
 	 * it's a LAG device, use the uplink
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 714aa9d7180b..595a4c5667ea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2978,12 +2978,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			if (netdev_port_same_parent_id(priv->netdev, out_dev)) {
 				struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 				struct net_device *uplink_dev = mlx5_eswitch_uplink_get_proto_dev(esw, REP_ETH);
-				struct net_device *uplink_upper = netdev_master_upper_dev_get(uplink_dev);
+				struct net_device *uplink_upper;
 
+				rcu_read_lock();
+				uplink_upper =
+					netdev_master_upper_dev_get_rcu(uplink_dev);
 				if (uplink_upper &&
 				    netif_is_lag_master(uplink_upper) &&
 				    uplink_upper == out_dev)
 					out_dev = uplink_dev;
+				rcu_read_unlock();
 
 				if (is_vlan_dev(out_dev)) {
 					err = add_vlan_push_action(priv, attr,
-- 
2.21.0


^ permalink raw reply related

* [net-next 10/13] net/mlx5e: Eswitch, change offloads num_flows type to atomic64
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

Eswitch implements its own locking by means of state_lock mutex and
multiple fine-grained lock in containing data structures, and is supposed
to not rely on rtnl lock. However, eswitch offloads num_flows type is a
regular long long integer and cannot be modified concurrently. This is an
implicit assumptions that mlx5 tc is serialized (by rtnl lock or any other
means). In order to remove implicit dependency on rtnl lock, change
num_flows type to atomic64 to allow concurrent modifications.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c      |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h      |  3 ++-
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 10 +++++-----
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 1f3891fde2eb..d365551d2f10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1933,6 +1933,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 
 	hash_init(esw->offloads.encap_tbl);
 	hash_init(esw->offloads.mod_hdr_tbl);
+	atomic64_set(&esw->offloads.num_flows, 0);
 	mutex_init(&esw->state_lock);
 
 	mlx5_esw_for_all_vports(esw, i, vport) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a38e8a3c7c9a..60f0c62b447b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -35,6 +35,7 @@
 
 #include <linux/if_ether.h>
 #include <linux/if_link.h>
+#include <linux/atomic.h>
 #include <net/devlink.h>
 #include <linux/mlx5/device.h>
 #include <linux/mlx5/eswitch.h>
@@ -179,7 +180,7 @@ struct mlx5_esw_offload {
 	struct mutex termtbl_mutex; /* protects termtbl hash */
 	const struct mlx5_eswitch_rep_ops *rep_ops[NUM_REP_TYPES];
 	u8 inline_mode;
-	u64 num_flows;
+	atomic64_t num_flows;
 	enum devlink_eswitch_encap_mode encap;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 089ae4d48a82..244ad1893691 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -233,7 +233,7 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 	if (IS_ERR(rule))
 		goto err_add_rule;
 	else
-		esw->offloads.num_flows++;
+		atomic64_inc(&esw->offloads.num_flows);
 
 	return rule;
 
@@ -298,7 +298,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 	if (IS_ERR(rule))
 		goto add_err;
 
-	esw->offloads.num_flows++;
+	atomic64_inc(&esw->offloads.num_flows);
 
 	return rule;
 add_err:
@@ -326,7 +326,7 @@ __mlx5_eswitch_del_rule(struct mlx5_eswitch *esw,
 			mlx5_eswitch_termtbl_put(esw, attr->dests[i].termtbl);
 	}
 
-	esw->offloads.num_flows--;
+	atomic64_dec(&esw->offloads.num_flows);
 
 	if (fwd_rule)  {
 		esw_put_prio_table(esw, attr->chain, attr->prio, 1);
@@ -2349,7 +2349,7 @@ int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode,
 		break;
 	}
 
-	if (esw->offloads.num_flows > 0) {
+	if (atomic64_read(&esw->offloads.num_flows) > 0) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Can't set inline mode when flows are configured");
 		return -EOPNOTSUPP;
@@ -2459,7 +2459,7 @@ int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink,
 	if (esw->offloads.encap == encap)
 		return 0;
 
-	if (esw->offloads.num_flows > 0) {
+	if (atomic64_read(&esw->offloads.num_flows) > 0) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Can't set encapsulation when flows are configured");
 		return -EOPNOTSUPP;
-- 
2.21.0


^ permalink raw reply related

* [net-next 09/13] net/mlx5e: Protect unready flows with dedicated lock
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

In order to remove dependency on rtnl lock for protecting unready_flows
list when reoffloading unready flows on workqueue, extend representor
uplink private structure with dedicated 'unready_flows_lock' mutex. Take
the lock in all users of unready_flows list before accessing it. Implement
helper functions to add and delete unready flow.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_rep.h  |  2 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 43 ++++++++++++++++---
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 69f7ac8fc9be..6edf0aeb1e26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1560,6 +1560,7 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv)
 	if (rpriv->rep->vport == MLX5_VPORT_UPLINK) {
 		uplink_priv = &rpriv->uplink_priv;
 
+		mutex_init(&uplink_priv->unready_flows_lock);
 		INIT_LIST_HEAD(&uplink_priv->unready_flows);
 
 		/* init shared tc flow table */
@@ -1604,6 +1605,7 @@ static void mlx5e_cleanup_rep_tx(struct mlx5e_priv *priv)
 
 		/* delete shared tc flow table */
 		mlx5e_tc_esw_cleanup(&rpriv->uplink_priv.tc_ht);
+		mutex_destroy(&rpriv->uplink_priv.unready_flows_lock);
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index c56e6ee4350c..10fafd5fa17b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -75,6 +75,8 @@ struct mlx5_rep_uplink_priv {
 
 	struct mlx5_tun_entropy tun_entropy;
 
+	/* protects unready_flows */
+	struct mutex                unready_flows_lock;
 	struct list_head            unready_flows;
 	struct work_struct          reoffload_flows_work;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a39f8a07de0a..714aa9d7180b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -996,6 +996,25 @@ mlx5e_tc_unoffload_from_slow_path(struct mlx5_eswitch *esw,
 	flow_flag_clear(flow, SLOW);
 }
 
+/* Caller must obtain uplink_priv->unready_flows_lock mutex before calling this
+ * function.
+ */
+static void unready_flow_add(struct mlx5e_tc_flow *flow,
+			     struct list_head *unready_flows)
+{
+	flow_flag_set(flow, NOT_READY);
+	list_add_tail(&flow->unready, unready_flows);
+}
+
+/* Caller must obtain uplink_priv->unready_flows_lock mutex before calling this
+ * function.
+ */
+static void unready_flow_del(struct mlx5e_tc_flow *flow)
+{
+	list_del(&flow->unready);
+	flow_flag_clear(flow, NOT_READY);
+}
+
 static void add_unready_flow(struct mlx5e_tc_flow *flow)
 {
 	struct mlx5_rep_uplink_priv *uplink_priv;
@@ -1006,14 +1025,24 @@ static void add_unready_flow(struct mlx5e_tc_flow *flow)
 	rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
 	uplink_priv = &rpriv->uplink_priv;
 
-	flow_flag_set(flow, NOT_READY);
-	list_add_tail(&flow->unready, &uplink_priv->unready_flows);
+	mutex_lock(&uplink_priv->unready_flows_lock);
+	unready_flow_add(flow, &uplink_priv->unready_flows);
+	mutex_unlock(&uplink_priv->unready_flows_lock);
 }
 
 static void remove_unready_flow(struct mlx5e_tc_flow *flow)
 {
-	list_del(&flow->unready);
-	flow_flag_clear(flow, NOT_READY);
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *rpriv;
+	struct mlx5_eswitch *esw;
+
+	esw = flow->priv->mdev->priv.eswitch;
+	rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &rpriv->uplink_priv;
+
+	mutex_lock(&uplink_priv->unready_flows_lock);
+	unready_flow_del(flow);
+	mutex_unlock(&uplink_priv->unready_flows_lock);
 }
 
 static int
@@ -3723,10 +3752,10 @@ void mlx5e_tc_reoffload_flows_work(struct work_struct *work)
 			     reoffload_flows_work);
 	struct mlx5e_tc_flow *flow, *tmp;
 
-	rtnl_lock();
+	mutex_lock(&rpriv->unready_flows_lock);
 	list_for_each_entry_safe(flow, tmp, &rpriv->unready_flows, unready) {
 		if (!mlx5e_tc_add_fdb_flow(flow->priv, flow, NULL))
-			remove_unready_flow(flow);
+			unready_flow_del(flow);
 	}
-	rtnl_unlock();
+	mutex_unlock(&rpriv->unready_flows_lock);
 }
-- 
2.21.0


^ permalink raw reply related

* [net-next 04/13] net/mlx5e: Fix unnecessary flow_block_cb_is_busy call
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, wenxu, Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: wenxu <wenxu@ucloud.cn>

When call flow_block_cb_is_busy. The indr_priv is guaranteed to
NULL ptr. So there is no need to call flow_bock_cb_is_busy.

Fixes: 0d4fd02e7199 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb1a4f4..496d3034e278 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -722,10 +722,6 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		if (indr_priv)
 			return -EEXIST;
 
-		if (flow_block_cb_is_busy(mlx5e_rep_indr_setup_block_cb,
-					  indr_priv, &mlx5e_block_cb_list))
-			return -EBUSY;
-
 		indr_priv = kmalloc(sizeof(*indr_priv), GFP_KERNEL);
 		if (!indr_priv)
 			return -ENOMEM;
-- 
2.21.0


^ permalink raw reply related

* [net-next 08/13] net/mlx5e: Protect tc flows hashtable with rcu
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

In order to remove dependency on rtnl lock, access to tc flows hashtable
must be explicitly protected from concurrent flows removal.

Extend tc flow structure with rcu to allow concurrent parallel access. Use
rcu read lock to safely lookup flow in tc flows hash table, and take
reference to it. Use rcu free for flow deletion to accommodate concurrent
stats requests.

Add new DELETED flow flag. Imlement new flow_flag_test_and_set() helper
that is used to set a flag and return its previous value. Use it to
atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
only be deleted once, even when same flow is deleted concurrently by
multiple tasks.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 47 ++++++++++++++++---
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 241157b699df..a39f8a07de0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -79,6 +79,7 @@ enum {
 	MLX5E_TC_FLOW_FLAG_SLOW		= MLX5E_TC_FLOW_BASE + 3,
 	MLX5E_TC_FLOW_FLAG_DUP		= MLX5E_TC_FLOW_BASE + 4,
 	MLX5E_TC_FLOW_FLAG_NOT_READY	= MLX5E_TC_FLOW_BASE + 5,
+	MLX5E_TC_FLOW_FLAG_DELETED	= MLX5E_TC_FLOW_BASE + 6,
 };
 
 #define MLX5E_TC_MAX_SPLITS 1
@@ -122,6 +123,7 @@ struct mlx5e_tc_flow {
 	struct list_head	peer;    /* flows with peer flow */
 	struct list_head	unready; /* flows not ready to be offloaded (e.g due to missing route) */
 	refcount_t		refcnt;
+	struct rcu_head		rcu_head;
 	union {
 		struct mlx5_esw_flow_attr esw_attr[0];
 		struct mlx5_nic_flow_attr nic_attr[0];
@@ -201,7 +203,7 @@ static void mlx5e_flow_put(struct mlx5e_priv *priv,
 {
 	if (refcount_dec_and_test(&flow->refcnt)) {
 		mlx5e_tc_del_flow(priv, flow);
-		kfree(flow);
+		kfree_rcu(flow, rcu_head);
 	}
 }
 
@@ -214,6 +216,17 @@ static void __flow_flag_set(struct mlx5e_tc_flow *flow, unsigned long flag)
 
 #define flow_flag_set(flow, flag) __flow_flag_set(flow, MLX5E_TC_FLOW_FLAG_##flag)
 
+static bool __flow_flag_test_and_set(struct mlx5e_tc_flow *flow,
+				     unsigned long flag)
+{
+	/* test_and_set_bit() provides all necessary barriers */
+	return test_and_set_bit(flag, &flow->flags);
+}
+
+#define flow_flag_test_and_set(flow, flag)			\
+	__flow_flag_test_and_set(flow,				\
+				 MLX5E_TC_FLOW_FLAG_##flag)
+
 static void __flow_flag_clear(struct mlx5e_tc_flow *flow, unsigned long flag)
 {
 	/* Complete all memory stores before clearing bit. */
@@ -3451,7 +3464,9 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	struct mlx5e_tc_flow *flow;
 	int err = 0;
 
-	flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
+	rcu_read_lock();
+	flow = rhashtable_lookup(tc_ht, &f->cookie, tc_ht_params);
+	rcu_read_unlock();
 	if (flow) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "flow cookie already exists, ignoring");
@@ -3466,7 +3481,7 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	if (err)
 		goto out;
 
-	err = rhashtable_insert_fast(tc_ht, &flow->node, tc_ht_params);
+	err = rhashtable_lookup_insert_fast(tc_ht, &flow->node, tc_ht_params);
 	if (err)
 		goto err_free;
 
@@ -3492,16 +3507,32 @@ int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
 {
 	struct rhashtable *tc_ht = get_tc_ht(priv, flags);
 	struct mlx5e_tc_flow *flow;
+	int err;
 
+	rcu_read_lock();
 	flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
-	if (!flow || !same_flow_direction(flow, flags))
-		return -EINVAL;
+	if (!flow || !same_flow_direction(flow, flags)) {
+		err = -EINVAL;
+		goto errout;
+	}
 
+	/* Only delete the flow if it doesn't have MLX5E_TC_FLOW_DELETED flag
+	 * set.
+	 */
+	if (flow_flag_test_and_set(flow, DELETED)) {
+		err = -EINVAL;
+		goto errout;
+	}
 	rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
+	rcu_read_unlock();
 
 	mlx5e_flow_put(priv, flow);
 
 	return 0;
+
+errout:
+	rcu_read_unlock();
+	return err;
 }
 
 int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
@@ -3517,8 +3548,10 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	u64 bytes = 0;
 	int err = 0;
 
-	flow = mlx5e_flow_get(rhashtable_lookup_fast(tc_ht, &f->cookie,
-						     tc_ht_params));
+	rcu_read_lock();
+	flow = mlx5e_flow_get(rhashtable_lookup(tc_ht, &f->cookie,
+						tc_ht_params));
+	rcu_read_unlock();
 	if (IS_ERR(flow))
 		return PTR_ERR(flow);
 
-- 
2.21.0


^ permalink raw reply related

* [net-next 06/13] net/mlx5e: Extend tc flow struct with reference counter
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Vlad Buslov, Jianbo Liu, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

With new classifier type that doesn't require rtnl lock, following
invariant holds:
 - Filter with specified cookie created only once.
 - Filter with specified cookie deleted only once.
 - Stats updates can be performed in parallel to each other.

Extend tc flow with rcu and reference counter. To protect from concurrent
delete, get reference to tc flow when:
 - Reading flow stats.
 - Accessing flow in neigh update handler.
 - Accessing flow in neigh update used value handler.

Only free flow when reference counter reached zero. Modify flow cleanup to
account for flows that could be not fully initialized by checking if flow
is actually in the list of corresponding mod_hdr, hairpin and encap
entries. Don't cleanup flow directly in case of error to allow concurrent
neigh update (neigh update will be modified to always take reference to
flow when using it).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 193 ++++++++++--------
 1 file changed, 105 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index cc096f6011d9..e2b87f723819 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -38,6 +38,7 @@
 #include <linux/mlx5/fs.h>
 #include <linux/mlx5/device.h>
 #include <linux/rhashtable.h>
+#include <linux/refcount.h>
 #include <net/tc_act/tc_mirred.h>
 #include <net/tc_act/tc_vlan.h>
 #include <net/tc_act/tc_tunnel_key.h>
@@ -120,6 +121,7 @@ struct mlx5e_tc_flow {
 	struct list_head	hairpin; /* flows sharing the same hairpin */
 	struct list_head	peer;    /* flows with peer flow */
 	struct list_head	unready; /* flows not ready to be offloaded (e.g due to missing route) */
+	refcount_t		refcnt;
 	union {
 		struct mlx5_esw_flow_attr esw_attr[0];
 		struct mlx5_nic_flow_attr nic_attr[0];
@@ -184,6 +186,25 @@ struct mlx5e_mod_hdr_entry {
 
 #define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)
 
+static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
+			      struct mlx5e_tc_flow *flow);
+
+static struct mlx5e_tc_flow *mlx5e_flow_get(struct mlx5e_tc_flow *flow)
+{
+	if (!flow || !refcount_inc_not_zero(&flow->refcnt))
+		return ERR_PTR(-EINVAL);
+	return flow;
+}
+
+static void mlx5e_flow_put(struct mlx5e_priv *priv,
+			   struct mlx5e_tc_flow *flow)
+{
+	if (refcount_dec_and_test(&flow->refcnt)) {
+		mlx5e_tc_del_flow(priv, flow);
+		kfree(flow);
+	}
+}
+
 static inline u32 hash_mod_hdr_info(struct mod_hdr_key *key)
 {
 	return jhash(key->actions,
@@ -281,6 +302,10 @@ static void mlx5e_detach_mod_hdr(struct mlx5e_priv *priv,
 {
 	struct list_head *next = flow->mod_hdr.next;
 
+	/* flow wasn't fully initialized */
+	if (list_empty(&flow->mod_hdr))
+		return;
+
 	list_del(&flow->mod_hdr);
 
 	if (list_empty(next)) {
@@ -694,6 +719,10 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
 {
 	struct list_head *next = flow->hairpin.next;
 
+	/* flow wasn't fully initialized */
+	if (list_empty(&flow->hairpin))
+		return;
+
 	list_del(&flow->hairpin);
 
 	/* no more hairpin flows for us, release the hairpin pair */
@@ -727,7 +756,6 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		.flags    = FLOW_ACT_NO_APPEND,
 	};
 	struct mlx5_fc *counter = NULL;
-	bool table_created = false;
 	int err, dest_ix = 0;
 
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
@@ -735,9 +763,9 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN) {
 		err = mlx5e_hairpin_flow_add(priv, flow, parse_attr, extack);
-		if (err) {
-			goto err_add_hairpin_flow;
-		}
+		if (err)
+			return err;
+
 		if (flow->flags & MLX5E_TC_FLOW_HAIRPIN_RSS) {
 			dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 			dest[dest_ix].ft = attr->hairpin_ft;
@@ -754,10 +782,9 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		counter = mlx5_fc_create(dev, true);
-		if (IS_ERR(counter)) {
-			err = PTR_ERR(counter);
-			goto err_fc_create;
-		}
+		if (IS_ERR(counter))
+			return PTR_ERR(counter);
+
 		dest[dest_ix].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
 		dest[dest_ix].counter_id = mlx5_fc_id(counter);
 		dest_ix++;
@@ -769,7 +796,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 		flow_act.modify_id = attr->mod_hdr_id;
 		kfree(parse_attr->mod_hdr_actions);
 		if (err)
-			goto err_create_mod_hdr_id;
+			return err;
 	}
 
 	if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
@@ -795,11 +822,8 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 					   "Failed to create tc offload table\n");
 			netdev_err(priv->netdev,
 				   "Failed to create tc offload table\n");
-			err = PTR_ERR(priv->fs.tc.t);
-			goto err_create_ft;
+			return PTR_ERR(priv->fs.tc.t);
 		}
-
-		table_created = true;
 	}
 
 	if (attr->match_level != MLX5_MATCH_NONE)
@@ -808,28 +832,10 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	flow->rule[0] = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
 					    &flow_act, dest, dest_ix);
 
-	if (IS_ERR(flow->rule[0])) {
-		err = PTR_ERR(flow->rule[0]);
-		goto err_add_rule;
-	}
+	if (IS_ERR(flow->rule[0]))
+		return PTR_ERR(flow->rule[0]);
 
 	return 0;
-
-err_add_rule:
-	if (table_created) {
-		mlx5_destroy_flow_table(priv->fs.tc.t);
-		priv->fs.tc.t = NULL;
-	}
-err_create_ft:
-	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-		mlx5e_detach_mod_hdr(priv, flow);
-err_create_mod_hdr_id:
-	mlx5_fc_destroy(dev, counter);
-err_fc_create:
-	if (flow->flags & MLX5E_TC_FLOW_HAIRPIN)
-		mlx5e_hairpin_flow_del(priv, flow);
-err_add_hairpin_flow:
-	return err;
 }
 
 static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
@@ -839,7 +845,8 @@ static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
 	struct mlx5_fc *counter = NULL;
 
 	counter = attr->counter;
-	mlx5_del_flow_rules(flow->rule[0]);
+	if (!IS_ERR_OR_NULL(flow->rule[0]))
+		mlx5_del_flow_rules(flow->rule[0]);
 	mlx5_fc_destroy(priv->mdev, counter);
 
 	if (!mlx5e_tc_num_filters(priv, MLX5E_TC_NIC_OFFLOAD)  && priv->fs.tc.t) {
@@ -980,14 +987,12 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 	if (attr->chain > max_chain) {
 		NL_SET_ERR_MSG(extack, "Requested chain is out of supported range");
-		err = -EOPNOTSUPP;
-		goto err_max_prio_chain;
+		return -EOPNOTSUPP;
 	}
 
 	if (attr->prio > max_prio) {
 		NL_SET_ERR_MSG(extack, "Requested priority is out of supported range");
-		err = -EOPNOTSUPP;
-		goto err_max_prio_chain;
+		return -EOPNOTSUPP;
 	}
 
 	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) {
@@ -1002,7 +1007,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		err = mlx5e_attach_encap(priv, flow, out_dev, out_index,
 					 extack, &encap_dev, &encap_valid);
 		if (err)
-			goto err_attach_encap;
+			return err;
 
 		out_priv = netdev_priv(encap_dev);
 		rpriv = out_priv->ppriv;
@@ -1012,21 +1017,19 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 
 	err = mlx5_eswitch_add_vlan_action(esw, attr);
 	if (err)
-		goto err_add_vlan;
+		return err;
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
 		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
 		kfree(parse_attr->mod_hdr_actions);
 		if (err)
-			goto err_mod_hdr;
+			return err;
 	}
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
 		counter = mlx5_fc_create(attr->counter_dev, true);
-		if (IS_ERR(counter)) {
-			err = PTR_ERR(counter);
-			goto err_create_counter;
-		}
+		if (IS_ERR(counter))
+			return PTR_ERR(counter);
 
 		attr->counter = counter;
 	}
@@ -1044,27 +1047,10 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		flow->rule[0] = mlx5e_tc_offload_fdb_rules(esw, flow, &parse_attr->spec, attr);
 	}
 
-	if (IS_ERR(flow->rule[0])) {
-		err = PTR_ERR(flow->rule[0]);
-		goto err_add_rule;
-	}
+	if (IS_ERR(flow->rule[0]))
+		return PTR_ERR(flow->rule[0]);
 
 	return 0;
-
-err_add_rule:
-	mlx5_fc_destroy(attr->counter_dev, counter);
-err_create_counter:
-	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
-		mlx5e_detach_mod_hdr(priv, flow);
-err_mod_hdr:
-	mlx5_eswitch_del_vlan_action(esw, attr);
-err_add_vlan:
-	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++)
-		if (attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP)
-			mlx5e_detach_encap(priv, flow, out_index);
-err_attach_encap:
-err_max_prio_chain:
-	return err;
 }
 
 static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow)
@@ -1123,9 +1109,9 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr slow_attr, *esw_attr;
+	struct encap_flow_item *efi, *tmp;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_spec *spec;
-	struct encap_flow_item *efi;
 	struct mlx5e_tc_flow *flow;
 	int err;
 
@@ -1142,11 +1128,14 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 	e->flags |= MLX5_ENCAP_ENTRY_VALID;
 	mlx5e_rep_queue_neigh_stats_work(priv);
 
-	list_for_each_entry(efi, &e->flows, list) {
+	list_for_each_entry_safe(efi, tmp, &e->flows, list) {
 		bool all_flow_encaps_valid = true;
 		int i;
 
 		flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]);
+		if (IS_ERR(mlx5e_flow_get(flow)))
+			continue;
+
 		esw_attr = flow->esw_attr;
 		spec = &esw_attr->parse_attr->spec;
 
@@ -1166,19 +1155,22 @@ void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
 		}
 		/* Do not offload flows with unresolved neighbors */
 		if (!all_flow_encaps_valid)
-			continue;
+			goto loop_cont;
 		/* update from slow path rule to encap rule */
 		rule = mlx5e_tc_offload_fdb_rules(esw, flow, spec, esw_attr);
 		if (IS_ERR(rule)) {
 			err = PTR_ERR(rule);
 			mlx5_core_warn(priv->mdev, "Failed to update cached encapsulation flow, %d\n",
 				       err);
-			continue;
+			goto loop_cont;
 		}
 
 		mlx5e_tc_unoffload_from_slow_path(esw, flow, &slow_attr);
 		flow->flags |= MLX5E_TC_FLOW_OFFLOADED; /* was unset when slow path rule removed */
 		flow->rule[0] = rule;
+
+loop_cont:
+		mlx5e_flow_put(priv, flow);
 	}
 }
 
@@ -1187,14 +1179,17 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_esw_flow_attr slow_attr;
+	struct encap_flow_item *efi, *tmp;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_spec *spec;
-	struct encap_flow_item *efi;
 	struct mlx5e_tc_flow *flow;
 	int err;
 
-	list_for_each_entry(efi, &e->flows, list) {
+	list_for_each_entry_safe(efi, tmp, &e->flows, list) {
 		flow = container_of(efi, struct mlx5e_tc_flow, encaps[efi->index]);
+		if (IS_ERR(mlx5e_flow_get(flow)))
+			continue;
+
 		spec = &flow->esw_attr->parse_attr->spec;
 
 		/* update from encap rule to slow path rule */
@@ -1206,12 +1201,15 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 			err = PTR_ERR(rule);
 			mlx5_core_warn(priv->mdev, "Failed to update slow path (encap) flow, %d\n",
 				       err);
-			continue;
+			goto loop_cont;
 		}
 
 		mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->esw_attr);
 		flow->flags |= MLX5E_TC_FLOW_OFFLOADED; /* was unset when fast path rule removed */
 		flow->rule[0] = rule;
+
+loop_cont:
+		mlx5e_flow_put(priv, flow);
 	}
 
 	/* we know that the encap is valid */
@@ -1248,20 +1246,26 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 		return;
 
 	list_for_each_entry(e, &nhe->encap_list, encap_list) {
-		struct encap_flow_item *efi;
+		struct encap_flow_item *efi, *tmp;
 		if (!(e->flags & MLX5_ENCAP_ENTRY_VALID))
 			continue;
-		list_for_each_entry(efi, &e->flows, list) {
+		list_for_each_entry_safe(efi, tmp, &e->flows, list) {
 			flow = container_of(efi, struct mlx5e_tc_flow,
 					    encaps[efi->index]);
+			if (IS_ERR(mlx5e_flow_get(flow)))
+				continue;
+
 			if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
 				counter = mlx5e_tc_get_counter(flow);
 				mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
 				if (time_after((unsigned long)lastuse, nhe->reported_lastuse)) {
+					mlx5e_flow_put(netdev_priv(e->out_dev), flow);
 					neigh_used = true;
 					break;
 				}
 			}
+
+			mlx5e_flow_put(netdev_priv(e->out_dev), flow);
 		}
 		if (neigh_used)
 			break;
@@ -1287,6 +1291,10 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
 {
 	struct list_head *next = flow->encaps[out_index].list.next;
 
+	/* flow wasn't fully initialized */
+	if (list_empty(&flow->encaps[out_index].list))
+		return;
+
 	list_del(&flow->encaps[out_index].list);
 	if (list_empty(next)) {
 		struct mlx5e_encap_entry *e;
@@ -3122,7 +3130,7 @@ mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 {
 	struct mlx5e_tc_flow_parse_attr *parse_attr;
 	struct mlx5e_tc_flow *flow;
-	int err;
+	int out_index, err;
 
 	flow = kzalloc(sizeof(*flow) + attr_size, GFP_KERNEL);
 	parse_attr = kvzalloc(sizeof(*parse_attr), GFP_KERNEL);
@@ -3134,6 +3142,11 @@ mlx5e_alloc_flow(struct mlx5e_priv *priv, int attr_size,
 	flow->cookie = f->cookie;
 	flow->flags = flow_flags;
 	flow->priv = priv;
+	for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++)
+		INIT_LIST_HEAD(&flow->encaps[out_index].list);
+	INIT_LIST_HEAD(&flow->mod_hdr);
+	INIT_LIST_HEAD(&flow->hairpin);
+	refcount_set(&flow->refcnt, 1);
 
 	*__flow = flow;
 	*__parse_attr = parse_attr;
@@ -3216,8 +3229,7 @@ __mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
 	return flow;
 
 err_free:
-	kfree(flow);
-	kvfree(parse_attr);
+	mlx5e_flow_put(priv, flow);
 out:
 	return ERR_PTR(err);
 }
@@ -3351,7 +3363,7 @@ mlx5e_add_nic_flow(struct mlx5e_priv *priv,
 	return 0;
 
 err_free:
-	kfree(flow);
+	mlx5e_flow_put(priv, flow);
 	kvfree(parse_attr);
 out:
 	return err;
@@ -3413,8 +3425,7 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	return 0;
 
 err_free:
-	mlx5e_tc_del_flow(priv, flow);
-	kfree(flow);
+	mlx5e_flow_put(priv, flow);
 out:
 	return err;
 }
@@ -3442,9 +3453,7 @@ int mlx5e_delete_flower(struct net_device *dev, struct mlx5e_priv *priv,
 
 	rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
 
-	mlx5e_tc_del_flow(priv, flow);
-
-	kfree(flow);
+	mlx5e_flow_put(priv, flow);
 
 	return 0;
 }
@@ -3460,15 +3469,22 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	u64 lastuse = 0;
 	u64 packets = 0;
 	u64 bytes = 0;
+	int err = 0;
 
-	flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
-	if (!flow || !same_flow_direction(flow, flags))
-		return -EINVAL;
+	flow = mlx5e_flow_get(rhashtable_lookup_fast(tc_ht, &f->cookie,
+						     tc_ht_params));
+	if (IS_ERR(flow))
+		return PTR_ERR(flow);
+
+	if (!same_flow_direction(flow, flags)) {
+		err = -EINVAL;
+		goto errout;
+	}
 
 	if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
 		counter = mlx5e_tc_get_counter(flow);
 		if (!counter)
-			return 0;
+			goto errout;
 
 		mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
 	}
@@ -3500,8 +3516,9 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
 	mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
 out:
 	flow_stats_update(&f->stats, bytes, packets, lastuse);
-
-	return 0;
+errout:
+	mlx5e_flow_put(priv, flow);
+	return err;
 }
 
 static void mlx5e_tc_hairpin_update_dead_peer(struct mlx5e_priv *priv,
-- 
2.21.0


^ permalink raw reply related

* [net-next 05/13] net/mlx5e: Simplify get_route_and_out_devs helper function
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Eli Britstein, Roi Dayan, Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Eli Britstein <elibr@mellanox.com>

The helper function has "if" branches that do the same. Merge them to
simplify the code.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/tc_tun.c   | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
index a6a52806be45..ae439d95f5a3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
@@ -40,20 +40,15 @@ static int get_route_and_out_devs(struct mlx5e_priv *priv,
 	/* if the egress device isn't on the same HW e-switch or
 	 * it's a LAG device, use the uplink
 	 */
+	*route_dev = dev;
 	if (!netdev_port_same_parent_id(priv->netdev, real_dev) ||
-	    dst_is_lag_dev) {
-		*route_dev = dev;
+	    dst_is_lag_dev || is_vlan_dev(*route_dev))
 		*out_dev = uplink_dev;
-	} else {
-		*route_dev = dev;
-		if (is_vlan_dev(*route_dev))
-			*out_dev = uplink_dev;
-		else if (mlx5e_eswitch_rep(dev) &&
-			 mlx5e_is_valid_eswitch_fwd_dev(priv, dev))
-			*out_dev = *route_dev;
-		else
-			return -EOPNOTSUPP;
-	}
+	else if (mlx5e_eswitch_rep(dev) &&
+		 mlx5e_is_valid_eswitch_fwd_dev(priv, dev))
+		*out_dev = *route_dev;
+	else
+		return -EOPNOTSUPP;
 
 	if (!(mlx5e_eswitch_rep(*out_dev) &&
 	      mlx5e_is_uplink_rep(netdev_priv(*out_dev))))
-- 
2.21.0


^ permalink raw reply related

* [net-next 03/13] net/mlx5e: Improve ethtool rxnfc callback structure
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

Don't choose who implements the rxnfc "get/set" callbacks according to
CONFIG_MLX5_EN_RXNFC, instead have the callbacks always available and
delegate to a function of a different driver module when needed
(en_fs_ethtool.c), have stubs in en/fs.h to fallback to when
en_fs_ethtool.c is compiled out, to avoid complications and ifdefs in
en_main.c.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/fs.h   | 11 ++++++--
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 28 +++++++++++--------
 .../mellanox/mlx5/core/en_fs_ethtool.c        | 11 +++-----
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
index be5961ff24cc..eb70ada89b09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
@@ -132,12 +132,17 @@ struct mlx5e_ethtool_steering {
 
 void mlx5e_ethtool_init_steering(struct mlx5e_priv *priv);
 void mlx5e_ethtool_cleanup_steering(struct mlx5e_priv *priv);
-int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd);
-int mlx5e_get_rxnfc(struct net_device *dev,
-		    struct ethtool_rxnfc *info, u32 *rule_locs);
+int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd);
+int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+			    struct ethtool_rxnfc *info, u32 *rule_locs);
 #else
 static inline void mlx5e_ethtool_init_steering(struct mlx5e_priv *priv)    { }
 static inline void mlx5e_ethtool_cleanup_steering(struct mlx5e_priv *priv) { }
+static inline int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{ return -EOPNOTSUPP; }
+static inline int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+					  struct ethtool_rxnfc *info, u32 *rule_locs)
+{ return -EOPNOTSUPP; }
 #endif /* CONFIG_MLX5_EN_RXNFC */
 
 #ifdef CONFIG_MLX5_EN_ARFS
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 126ec4181286..a6b0eda0bd1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1888,21 +1888,27 @@ static u32 mlx5e_get_priv_flags(struct net_device *netdev)
 	return priv->channels.params.pflags;
 }
 
-#ifndef CONFIG_MLX5_EN_RXNFC
-/* When CONFIG_MLX5_EN_RXNFC=n we only support ETHTOOL_GRXRINGS
- * otherwise this function will be defined from en_fs_ethtool.c
- */
 static int mlx5e_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info, u32 *rule_locs)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 
-	if (info->cmd != ETHTOOL_GRXRINGS)
-		return -EOPNOTSUPP;
-	/* ring_count is needed by ethtool -x */
-	info->data = priv->channels.params.num_channels;
-	return 0;
+	/* ETHTOOL_GRXRINGS is needed by ethtool -x which is not part
+	 * of rxnfc. We keep this logic out of mlx5e_ethtool_get_rxnfc,
+	 * to avoid breaking "ethtool -x" when mlx5e_ethtool_get_rxnfc
+	 * is compiled out via CONFIG_MLX5_EN_RXNFC=n.
+	 */
+	if (info->cmd == ETHTOOL_GRXRINGS) {
+		info->data = priv->channels.params.num_channels;
+		return 0;
+	}
+
+	return mlx5e_ethtool_get_rxnfc(dev, info, rule_locs);
+}
+
+static int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+	return mlx5e_ethtool_set_rxnfc(dev, cmd);
 }
-#endif
 
 const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_drvinfo       = mlx5e_get_drvinfo,
@@ -1923,9 +1929,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_rxfh          = mlx5e_get_rxfh,
 	.set_rxfh          = mlx5e_set_rxfh,
 	.get_rxnfc         = mlx5e_get_rxnfc,
-#ifdef CONFIG_MLX5_EN_RXNFC
 	.set_rxnfc         = mlx5e_set_rxnfc,
-#endif
 	.get_tunable       = mlx5e_get_tunable,
 	.set_tunable       = mlx5e_set_tunable,
 	.get_pauseparam    = mlx5e_get_pauseparam,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index ea3a490b569a..a66589816e21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -887,10 +887,10 @@ static int mlx5e_get_rss_hash_opt(struct mlx5e_priv *priv,
 	return 0;
 }
 
-int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+int mlx5e_ethtool_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 {
-	int err = 0;
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	int err = 0;
 
 	switch (cmd->cmd) {
 	case ETHTOOL_SRXCLSRLINS:
@@ -910,16 +910,13 @@ int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 	return err;
 }
 
-int mlx5e_get_rxnfc(struct net_device *dev,
-		    struct ethtool_rxnfc *info, u32 *rule_locs)
+int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
+			    struct ethtool_rxnfc *info, u32 *rule_locs)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	int err = 0;
 
 	switch (info->cmd) {
-	case ETHTOOL_GRXRINGS:
-		info->data = priv->channels.params.num_channels;
-		break;
 	case ETHTOOL_GRXCLSRLCNT:
 		info->rule_cnt = priv->fs.ethtool.tot_num_rules;
 		break;
-- 
2.21.0


^ permalink raw reply related

* [net-next 02/13] net/mlx5e: Avoid warning print when not required
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

When disabling CQE compression in favor of time-stamping, don't show a
warning when CQE compression is already disabled.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 776eb46d263d..266783295124 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3958,7 +3958,8 @@ int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr)
 	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
 	case HWTSTAMP_FILTER_NTP_ALL:
 		/* Disable CQE compression */
-		netdev_warn(priv->netdev, "Disabling cqe compression");
+		if (MLX5E_GET_PFLAG(&priv->channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS))
+			netdev_warn(priv->netdev, "Disabling RX cqe compression\n");
 		err = mlx5e_modify_rx_cqe_compression_locked(priv, false);
 		if (err) {
 			netdev_err(priv->netdev, "Failed disabling cqe compression err=%d\n", err);
-- 
2.21.0


^ permalink raw reply related

* [net-next 01/13] net/mlx5e: Print a warning when LRO feature is dropped or not allowed
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20190729234934.23595-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

When user enables LRO via ethtool and if the RQ mode is legacy,
mlx5e_fix_features drops the request without any explanation.
Add netdev_warn to cover this case.

Fixes: 6c3a823e1e9c ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 47eea6b3a1c3..776eb46d263d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3788,9 +3788,10 @@ static netdev_features_t mlx5e_fix_features(struct net_device *netdev,
 			netdev_warn(netdev, "Dropping C-tag vlan stripping offload due to S-tag vlan\n");
 	}
 	if (!MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ)) {
-		features &= ~NETIF_F_LRO;
-		if (params->lro_en)
+		if (features & NETIF_F_LRO) {
 			netdev_warn(netdev, "Disabling LRO, not supported in legacy RQ\n");
+			features &= ~NETIF_F_LRO;
+		}
 	}
 
 	if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS)) {
-- 
2.21.0


^ permalink raw reply related

* [pull request][net-next 00/13] Mellanox, mlx5 tc flow handling for concurrent execution (Part 1)
From: Saeed Mahameed @ 2019-07-29 23:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed

Hi Dave,

This series, mostly from Vlad, is the first part of ongoing work to
improve mlx5 tc flow handling by removing dependency on rtnl_lock and
providing a more fine-grained locking and rcu safe data structures.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit 85fd8011475e86265beff7b7617c493c247f5356:

  Merge branch 'bnxt_en-TPA-57500' (2019-07-29 14:19:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-07-29

for you to fetch changes up to b6fac0b46a1a76024698d240f0f9aac552f897b7:

  net/mlx5e: Protect tc flow table with mutex (2019-07-29 16:40:26 -0700)

----------------------------------------------------------------
mlx5-updates-2019-07-29

This series includes updates to mlx5 driver,
1) Simplifications, cleanup and warning prints improvements

2) From Vlad Buslov:
Refactor mlx5 tc flow handling for unlocked execution (Part 1)

Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. Cls API has already been updated to
update software filters in parallel (on classifiers that support
unlocked execution), however hardware offloads code still obtains rtnl
lock before calling driver tc callbacks. This set implements partial
support for unlocked execution that is leveraged by follow up
refactorings in specific mlx5 tc subsystems and patch to cls API that
implements API that allows drivers to register their callbacks as
rtnl-unlocked.

In mlx5 tc code mlx5e_tc_flow is the main structure that is used to
represent tc filter. Currently, code the structure itself and its
handlers in both tc and eswitch layers do not implement any kind of
synchronizations and assume external global synchronization provided by
rtnl lock instead. Implement following changes to remove dependency on
rtnl lock in flow handling code that are intended to be used a
groundwork for following changes to provide fully rtnl-independent mlx5
tc:

- Extend struct mlx5e_tc_flow with atomic reference counter and rcu to
  allow concurrent access from multiple tc and neigh update workqueue
  instances without introducing any additional locks specific to the
  structure. Its 'flags' field type is changed to atomic bitmask ops which
  is necessary for tc to interact with other concurrent tc instances or
  concurrent neigh update that need to skip flows that are not fully
  initialized (new INIT_DONE flow flag) and can change the flags
  according to neighbor state (flipping OFFLOADED flag).

- Protect unready flows list by new uplink_priv->unready_flows_lock
  mutex.

- Convert calls to netdev APIs that require rtnl lock in flow handling
  code to their rcu counterparts.

- Modify eswitch code that is called from tc layer and assume implicit
  external synchronization to be concurrency safe: change
  esw->offloads.num_flows type to atomic integer and re-arrange
  esw->state_lock usage to protect additional data.

Some of approaches to synchronizations presented in this patch set are
quite complicated (lockless concurrent usage of data structures with rcu
and reference counting, using fine-grained locking when necessary, retry
mechanisms to handle concurrent insertion of another instance of data
structure with same key, etc.). This is necessary to allow calling the
firmware in parallel in most cases, which is the main motivation of this
change since firmware calls are mach heavier operation than atomic
operations, multitude of locks and potential multiple retries during
concurrent accesses to same elements.

----------------------------------------------------------------
Eli Britstein (1):
      net/mlx5e: Simplify get_route_and_out_devs helper function

Huy Nguyen (1):
      net/mlx5e: Print a warning when LRO feature is dropped or not allowed

Saeed Mahameed (2):
      net/mlx5e: Avoid warning print when not required
      net/mlx5e: Improve ethtool rxnfc callback structure

Vlad Buslov (8):
      net/mlx5e: Extend tc flow struct with reference counter
      net/mlx5e: Change flow flags type to unsigned long
      net/mlx5e: Protect tc flows hashtable with rcu
      net/mlx5e: Protect unready flows with dedicated lock
      net/mlx5e: Eswitch, change offloads num_flows type to atomic64
      net/mlx5e: Eswitch, use state_lock to synchronize vlan change
      net/mlx5e: Rely on rcu instead of rtnl lock when getting upper dev
      net/mlx5e: Protect tc flow table with mutex

wenxu (1):
      net/mlx5e: Fix unnecessary flow_block_cb_is_busy call

 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    |  33 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  28 +-
 .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  16 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 492 +++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  16 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |   3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  27 +-
 12 files changed, 424 insertions(+), 255 deletions(-)

^ permalink raw reply

* Re: [bpf-next,v2 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-29 23:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Petar Penkov, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, lmb,
	Stanislav Fomichev, Toke Høiland-Jørgensen
In-Reply-To: <20190729204755.iu5wp3xisu42vkky@ast-mbp>

On Mon, Jul 29, 2019 at 1:48 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 09:59:12AM -0700, Petar Penkov wrote:
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This patch series introduces a BPF helper function that allows generating SYN
> > cookies from BPF. Currently, this helper is enabled at both the TC hook and the
> > XDP hook.
> >
> > The first two patches in the series add/modify several TCP helper functions to
> > allow for SKB-less operation, as is the case at the XDP hook.
> >
> > The third patch introduces the bpf_tcp_gen_syncookie helper function which
> > generates a SYN cookie for either XDP or TC programs. The return value of
> > this function contains both the MSS value, encoded in the cookie, and the
> > cookie itself.
> >
> > The last three patches sync tools/ and add a test.
> >
> > Performance evaluation:
> > I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
> > random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
> > sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
> > carry no options. If I attached a simple program that checks if a packet is
> > IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
> > swapping src/dest, recomputing the checksum, and setting the ACK flag, I
> > observed 10Mpps being sent back out.
>

Thank you for reviewing this so quickly!
> Is it 10m because trafic gen is 10m?
Yes, I believe so.

> What is cpu utilization at this rate?
> Is it cpu or nic limited if you crank up the syn flood?
> Original 7M with all cores or single core?
My receiver was configured with 16rx queues and 16 cores. 7M all cores
are at 100% so I believe this case is CPU limited. At XDP, all cores
are at roughly 40%. I couldn't reliably generate higher SYN flood rate
than that, and the highest numbers I could see for XDP did not go past
10.65Mpps with ~42% utilization on each core. I think I am hitting a
NIC limit here since the CPUs are free.


>
> The patch set looks good to me.
> I'd like Eric to review it one more time before applying.
>

^ permalink raw reply

* Re: [RFC PATCH 1/2] gianfar: convert to phylink
From: Vladimir Oltean @ 2019-07-29 23:39 UTC (permalink / raw)
  To: Arseny Solokha, Claudiu Manoil, Russell King
  Cc: Ioana Ciornei, Andrew Lunn, netdev, Florian Fainelli
In-Reply-To: <20190723151702.14430-2-asolokha@kb.kras.ru>

Hi Arseny,

Nice project!

On Wed, 24 Jul 2019 at 03:38, Arseny Solokha <asolokha@kb.kras.ru> wrote:
>
> Convert gianfar to use the phylink API for better SFP modules support.
>
> The driver still uses phylib for serdes configuration over the TBI
> interface, as there seems to be no functionally equivalent API present
> in phylink (yet). phylib usage is basically confined in two functions.
>
> One needs to change their Device Tree accordingly to get working SFP
> support:
>
>  enet1: ethernet@25000 {
>  <...>
>         device_type = "network";
>         model = "eTSEC";
>         compatible = "gianfar";
>         tbi-handle = <&tbi0>;
> +       phy-connection-type = "sgmii";
> +       managed = "in-band-status";
> +       sfp = <&sfp>;
>         max-speed = <1000>;
>
>         mdio@520 {
>                 #address-cells = <1>;
>                 #size-cells = <0>;
>                 compatible = "fsl,gianfar-tbi";
>                 reg = <0x520 0x20>;
>
>                 tbi0: tbi-phy@1f {
>                         reg = <0x1f>;
>                         device_type = "tbi-phy";
>                 };
>         };
> +
> +       sfp: sfp0 {
> +               compatible = "sff,sfp";
> +               i2c-bus = <&i2c1>;
> +               mod-def0-gpios = <&gpio 4 GPIO_ACTIVE_LOW>;
> +               rate-select0-gpios = <&gpio 13 GPIO_ACTIVE_HIGH>;
> +               tx-disable-gpios = <&gpio 5 GPIO_ACTIVE_LOW>;
> +               tx-fault-gpios = <&gpio 12 GPIO_ACTIVE_HIGH>;
> +               los-gpios = <&gpio 3 GPIO_ACTIVE_HIGH>;
> +       };
>  };
>
> Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
> ---
>  drivers/net/ethernet/freescale/Kconfig        |   2 +-
>  drivers/net/ethernet/freescale/gianfar.c      | 409 +++++++++---------
>  drivers/net/ethernet/freescale/gianfar.h      |  26 +-
>  .../net/ethernet/freescale/gianfar_ethtool.c  |  79 ++--
>  4 files changed, 251 insertions(+), 265 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
> index 6a7e8993119f..8b51d423b61d 100644
> --- a/drivers/net/ethernet/freescale/Kconfig
> +++ b/drivers/net/ethernet/freescale/Kconfig
> @@ -89,7 +89,7 @@ config GIANFAR
>         tristate "Gianfar Ethernet"
>         depends on HAS_DMA
>         select FSL_PQ_MDIO
> -       select PHYLIB
> +       select PHYLINK
>         select CRC32
>         ---help---
>           This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx,
> diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
> index 7ea19e173339..64c7b174e591 100644
> --- a/drivers/net/ethernet/freescale/gianfar.c
> +++ b/drivers/net/ethernet/freescale/gianfar.c
> @@ -96,6 +96,7 @@
>  #include <linux/mii.h>
>  #include <linux/phy.h>
>  #include <linux/phy_fixed.h>
> +#include <linux/phylink.h>

You can remove linux/phy.h.
You (PHYLINK, actually) broke .ndo_change_carrier, and therefore
commit 6211d46713c5 ("gianfar: Add change_carrier() for Fixed PHYs").
If we were honest, we should probably revert that commit entirely now.
Once you do that, you can remove linux/phy_fixed.h as well.

>  #include <linux/of.h>
>  #include <linux/of_net.h>
>
> @@ -117,8 +118,18 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu);
>  static irqreturn_t gfar_error(int irq, void *dev_id);
>  static irqreturn_t gfar_transmit(int irq, void *dev_id);
>  static irqreturn_t gfar_interrupt(int irq, void *dev_id);
> -static void adjust_link(struct net_device *dev);
> -static noinline void gfar_update_link_state(struct gfar_private *priv);
> +static void gfar_phylink_validate(struct phylink_config *config,
> +                                 unsigned long *supported,
> +                                 struct phylink_link_state *state);
> +static int gfar_mac_link_state(struct phylink_config *config,
> +                              struct phylink_link_state *state);
> +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> +                           const struct phylink_link_state *state);
> +static void gfar_mac_an_restart(struct phylink_config *config);
> +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> +                              phy_interface_t interface);
> +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> +                            phy_interface_t interface, struct phy_device *phy);

Please do not add to this forward declaration madness. A cleanup patch
in this area would be highly appreciated.

>  static int init_phy(struct net_device *dev);
>  static int gfar_probe(struct platform_device *ofdev);
>  static int gfar_remove(struct platform_device *ofdev);
> @@ -504,6 +515,15 @@ static const struct net_device_ops gfar_netdev_ops = {
>  #endif
>  };
>
> +static const struct phylink_mac_ops gfar_phylink_ops = {
> +       .validate = gfar_phylink_validate,
> +       .mac_link_state = gfar_mac_link_state,
> +       .mac_config = gfar_mac_config,
> +       .mac_an_restart = gfar_mac_an_restart,
> +       .mac_link_down = gfar_mac_link_down,
> +       .mac_link_up = gfar_mac_link_up,
> +};
> +
>  static void gfar_ints_disable(struct gfar_private *priv)
>  {
>         int i;
> @@ -733,6 +753,7 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>         struct gfar_private *priv = NULL;
>         struct device_node *np = ofdev->dev.of_node;
>         struct device_node *child = NULL;
> +       struct phylink *phylink;
>         u32 stash_len = 0;
>         u32 stash_idx = 0;
>         unsigned int num_tx_qs, num_rx_qs;
> @@ -891,11 +912,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>
>         err = of_property_read_string(np, "phy-connection-type", &ctype);
>
> -       /* We only care about rgmii-id.  The rest are autodetected */
> -       if (err == 0 && !strcmp(ctype, "rgmii-id"))
> -               priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> -       else
> +       /* We only care about rgmii-id and sgmii - the former
> +        * is indistinguishable from rgmii in hardware, and phylink needs
> +        * the latter to be set appropriately for correct phy configuration.
> +        * The rest are autodetected
> +        */
> +       if (err == 0) {
> +               if (!strcmp(ctype, "rgmii-id"))
> +                       priv->interface = PHY_INTERFACE_MODE_RGMII_ID;
> +               else if (!strcmp(ctype, "sgmii"))
> +                       priv->interface = PHY_INTERFACE_MODE_SGMII;
> +               else
> +                       priv->interface = PHY_INTERFACE_MODE_MII;
> +       } else {
>                 priv->interface = PHY_INTERFACE_MODE_MII;
> +       }
>

No. Don't do this. Just do:

    err = of_get_phy_mode(np);
    if (err < 0)
        goto err_grp_init;

    priv->interface = err;

>         if (of_find_property(np, "fsl,magic-packet", NULL))
>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_MAGIC_PACKET;
> @@ -903,19 +934,21 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
>         if (of_get_property(np, "fsl,wake-on-filer", NULL))
>                 priv->device_flags |= FSL_GIANFAR_DEV_HAS_WAKE_ON_FILER;
>
> -       priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
> +       priv->device_node = np;
> +       priv->speed = SPEED_UNKNOWN;
>
> -       /* In the case of a fixed PHY, the DT node associated
> -        * to the PHY is the Ethernet MAC DT node.
> -        */
> -       if (!priv->phy_node && of_phy_is_fixed_link(np)) {
> -               err = of_phy_register_fixed_link(np);
> -               if (err)
> -                       goto err_grp_init;
> +       priv->phylink_config.dev = &priv->ndev->dev;
> +       priv->phylink_config.type = PHYLINK_NETDEV;
>
> -               priv->phy_node = of_node_get(np);
> +       phylink = phylink_create(&priv->phylink_config, of_fwnode_handle(np),
> +                                priv->interface, &gfar_phylink_ops);

You introduced a bug here.
of_phy_connect used to take the PHY interface type (for good or bad)
from gfar_get_interface() (which is reconstructing it from the MAC
registers).
You are now passing the PHY interface type to phylink_create from the
"phy-connection-type" DT property.
At the very least, you are breaking LS1021A which uses phy-mode
instead of phy-connection-type (hence my comment above to use the
generic OF helper).
Actually I think you just uncovered a latent bug, in that the DT
bindings for phy-mode didn't mean much at all to the driver - it would
rely on what the bootloader had set up.
Actually DT bindings for phy-connection-type were most likely simply
bolt on on top of gianfar when they figured they couldn't just
auto-detect the various species of required RGMII delays.
But gfar_get_interface is a piece of history that was introduced in
the same commit as the enum phy_interface_t itself: e8a2b6a42073
("[PATCH] PHY: Add support for configuring the PHY connection
interface"). Its time has come.

> +       if (IS_ERR(phylink)) {
> +               err = PTR_ERR(phylink);
> +               goto err_grp_init;
>         }
>
> +       priv->phylink = phylink;
> +
>         /* Find the TBI PHY.  If it's not there, we don't support SGMII */
>         priv->tbi_node = of_parse_phandle(np, "tbi-handle", 0);
>
> @@ -994,7 +1027,7 @@ static int gfar_hwtstamp_get(struct net_device *netdev, struct ifreq *ifr)
>
>  static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>  {
> -       struct phy_device *phydev = dev->phydev;
> +       struct gfar_private *priv = netdev_priv(dev);
>
>         if (!netif_running(dev))
>                 return -EINVAL;
> @@ -1004,10 +1037,7 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>         if (cmd == SIOCGHWTSTAMP)
>                 return gfar_hwtstamp_get(dev, rq);
>
> -       if (!phydev)
> -               return -ENODEV;
> -
> -       return phy_mii_ioctl(phydev, rq, cmd);
> +       return phylink_mii_ioctl(priv->phylink, rq, cmd);
>  }
>
>  static u32 cluster_entry_per_class(struct gfar_private *priv, u32 rqfar,
> @@ -1307,7 +1337,6 @@ static void gfar_init_addr_hash_table(struct gfar_private *priv)
>   */
>  static int gfar_probe(struct platform_device *ofdev)
>  {
> -       struct device_node *np = ofdev->dev.of_node;
>         struct net_device *dev = NULL;
>         struct gfar_private *priv = NULL;
>         int err = 0, i;
> @@ -1463,12 +1492,10 @@ static int gfar_probe(struct platform_device *ofdev)
>         return 0;
>
>  register_fail:
> -       if (of_phy_is_fixed_link(np))
> -               of_phy_deregister_fixed_link(np);
>         unmap_group_regs(priv);
>         gfar_free_rx_queues(priv);
>         gfar_free_tx_queues(priv);
> -       of_node_put(priv->phy_node);
> +       phylink_destroy(priv->phylink);
>         of_node_put(priv->tbi_node);
>         free_gfar_dev(priv);
>         return err;
> @@ -1477,19 +1504,15 @@ static int gfar_probe(struct platform_device *ofdev)
>  static int gfar_remove(struct platform_device *ofdev)
>  {
>         struct gfar_private *priv = platform_get_drvdata(ofdev);
> -       struct device_node *np = ofdev->dev.of_node;
>
> -       of_node_put(priv->phy_node);
>         of_node_put(priv->tbi_node);
>
>         unregister_netdev(priv->ndev);
>
> -       if (of_phy_is_fixed_link(np))
> -               of_phy_deregister_fixed_link(np);
> -
>         unmap_group_regs(priv);
>         gfar_free_rx_queues(priv);
>         gfar_free_tx_queues(priv);
> +       phylink_destroy(priv->phylink);
>         free_gfar_dev(priv);
>
>         return 0;
> @@ -1643,9 +1666,11 @@ static int gfar_suspend(struct device *dev)
>                 gfar_start_wol_filer(priv);
>
>         } else {
> -               phy_stop(ndev->phydev);
> +               phylink_stop(phy->phylink);
>         }
>
> +       priv->speed = SPEED_UNKNOWN;
> +
>         return 0;
>  }
>
> @@ -1672,7 +1697,7 @@ static int gfar_resume(struct device *dev)
>                 gfar_filer_restore_table(priv);
>
>         } else {
> -               phy_start(ndev->phydev);
> +               phylink_start(priv->phylink);
>         }
>
>         gfar_start(priv);
> @@ -1702,12 +1727,7 @@ static int gfar_restore(struct device *dev)
>
>         gfar_start(priv);
>
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       if (ndev->phydev)
> -               phy_start(ndev->phydev);
> +       phylink_start(priv->phylink);
>
>         netif_device_attach(ndev);
>         enable_napi(priv);
> @@ -1781,46 +1801,26 @@ static phy_interface_t gfar_get_interface(struct net_device *dev)
>   */
>  static int init_phy(struct net_device *dev)
>  {
> -       __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
>         struct gfar_private *priv = netdev_priv(dev);
>         phy_interface_t interface;
> -       struct phy_device *phydev;
>         struct ethtool_eee edata;
> +       int flags = 0;
> +       int err;
>
> -       linkmode_set_bit_array(phy_10_100_features_array,
> -                              ARRAY_SIZE(phy_10_100_features_array),
> -                              mask);
> -       linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, mask);
> -       linkmode_set_bit(ETHTOOL_LINK_MODE_MII_BIT, mask);
> -       if (priv->device_flags & FSL_GIANFAR_DEV_HAS_GIGABIT)
> -               linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, mask);
> -
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       interface = gfar_get_interface(dev);
> -
> -       phydev = of_phy_connect(dev, priv->phy_node, &adjust_link, 0,
> -                               interface);
> -       if (!phydev) {
> -               dev_err(&dev->dev, "could not attach to PHY\n");
> -               return -ENODEV;
> +       err = phylink_of_phy_connect(priv->phylink, priv->device_node, flags);
> +       if (err) {
> +               netdev_err(dev, "could not attach to PHY: %d\n", err);
> +               return err;

While you're at it, can you say "connect" instead of "attach" here?

>         }
>
> +       priv->tbi_phy = NULL;
> +       interface = gfar_get_interface(dev);

Be consistent and just go for priv->interface. Nobody's changing it anyway.

>         if (interface == PHY_INTERFACE_MODE_SGMII)
>                 gfar_configure_serdes(dev);
>

My gut feeling says that this should be moved to gfar_mac_config.

> -       /* Remove any features not supported by the controller */
> -       linkmode_and(phydev->supported, phydev->supported, mask);
> -       linkmode_copy(phydev->advertising, phydev->supported);
> -
> -       /* Add support for flow control */
> -       phy_support_asym_pause(phydev);
> -
>         /* disable EEE autoneg, EEE not supported by eTSEC */
>         memset(&edata, 0, sizeof(struct ethtool_eee));
> -       phy_ethtool_set_eee(phydev, &edata);
> +       phylink_ethtool_set_eee(priv->phylink, &edata);
>
>         return 0;
>  }
> @@ -1850,6 +1850,8 @@ static void gfar_configure_serdes(struct net_device *dev)
>                 return;
>         }
>
> +       priv->tbi_phy = tbiphy;
> +
>         /* If the link is already up, we must already be ok, and don't need to
>          * configure and reset the TBI<->SerDes link.  Maybe U-Boot configured
>          * everything for us?  Resetting it takes the link down and requires
> @@ -1964,7 +1966,7 @@ void stop_gfar(struct net_device *dev)
>         /* disable ints and gracefully shut down Rx/Tx DMA */
>         gfar_halt(priv);
>
> -       phy_stop(dev->phydev);
> +       phylink_stop(priv->phylink);
>
>         free_skb_resources(priv);
>  }
> @@ -2219,12 +2221,7 @@ int startup_gfar(struct net_device *ndev)
>         /* Start Rx/Tx DMA and enable the interrupts */
>         gfar_start(priv);
>
> -       /* force link state update after mac reset */
> -       priv->oldlink = 0;
> -       priv->oldspeed = 0;
> -       priv->oldduplex = -1;
> -
> -       phy_start(ndev->phydev);
> +       phylink_start(priv->phylink);
>
>         enable_napi(priv);
>
> @@ -2593,7 +2590,7 @@ static int gfar_close(struct net_device *dev)
>         stop_gfar(dev);
>
>         /* Disconnect from the PHY */
> -       phy_disconnect(dev->phydev);
> +       phylink_disconnect_phy(priv->phylink);

It is very odd to disconnect from the PHY on ndo_close and connect
back on ndo_open. I don't know of any other driver that does that.
Can't you change the behavior to simply start and stop phylink here?

>
>         gfar_free_irq(priv);
>
> @@ -3387,23 +3384,6 @@ static irqreturn_t gfar_interrupt(int irq, void *grp_id)
>         return IRQ_HANDLED;
>  }
>
> -/* Called every time the controller might need to be made
> - * aware of new link state.  The PHY code conveys this
> - * information through variables in the phydev structure, and this
> - * function converts those variables into the appropriate
> - * register values, and can bring down the device if needed.
> - */
> -static void adjust_link(struct net_device *dev)
> -{
> -       struct gfar_private *priv = netdev_priv(dev);
> -       struct phy_device *phydev = dev->phydev;
> -
> -       if (unlikely(phydev->link != priv->oldlink ||
> -                    (phydev->link && (phydev->duplex != priv->oldduplex ||
> -                                      phydev->speed != priv->oldspeed))))
> -               gfar_update_link_state(priv);
> -}

Getting rid of the cruft from this function deserves its own patch.

> -
>  /* Update the hash table based on the current list of multicast
>   * addresses we subscribe to.  Also, change the promiscuity of
>   * the device based on the flags (this function is called
> @@ -3635,132 +3615,169 @@ static irqreturn_t gfar_error(int irq, void *grp_id)
>         return IRQ_HANDLED;
>  }
>
> -static u32 gfar_get_flowctrl_cfg(struct gfar_private *priv)
> +static void gfar_phylink_validate(struct phylink_config *config,
> +                                 unsigned long *supported,
> +                                 struct phylink_link_state *state)
> +{
> +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
> +       __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> +
> +       if (state->interface != PHY_INTERFACE_MODE_NA &&
> +           state->interface != PHY_INTERFACE_MODE_SGMII &&
> +           state->interface != PHY_INTERFACE_MODE_1000BASEX &&

Russell is right, you shouldn't advertise 1000Base-X if there is no
way of making sure that it works.
His question was:

> How is the difference between SGMII and 1000BASE-X handled?

To be honest I don't have a complete answer to that question. The
literature recommends writing 0x01a0 to the MII_ADVERTISE (0x4)
register of the MAC PCS for 1000Base-X, and 0x4001 for SGMII.
The FMan driver which uses the TSEC MAC does exactly that:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/ethernet/freescale/fman/fman_dtsec.c#n58
However I can't seem to be able to trace down the definition of bit 14
from 0x4001 - it's reserved in all the manuals I see. I have a hunch
that it selects the format of the Config_Reg base page between
1000Base-X and SGMII.
But the bad news is that gianfar is only configuring the TBI PHY for
the 1000Base-X Config_Reg, even for SGMII PHYs. I'm not quite sure,
but I think that the only reason that this doesn't break in-band AN is
that we don't use in-band AN.

> +           !phy_interface_mode_is_rgmii(state->interface)) {
> +               phylink_zero(supported);
> +               return;
> +       }
> +
> +       phylink_set(mask, Autoneg);
> +       phylink_set(mask, Pause);
> +       phylink_set(mask, Asym_Pause);
> +       phylink_set_port_modes(mask);
> +
> +       phylink_set(mask, 10baseT_Half);
> +       phylink_set(mask, 10baseT_Full);
> +       phylink_set(mask, 100baseT_Half);
> +       phylink_set(mask, 100baseT_Full);
> +
> +       if (priv->device_flags & FSL_GIANFAR_DEV_HAS_GIGABIT) {
> +               phylink_set(mask, 1000baseX_Full);
> +               phylink_set(mask, 1000baseT_Full);
> +       }
> +
> +       linkmode_and(supported, supported, mask);
> +       linkmode_and(state->advertising, state->advertising, mask);
> +}
> +
> +static int gfar_mac_link_state(struct phylink_config *config,
> +                              struct phylink_link_state *state)
> +{
> +       if (state->interface == PHY_INTERFACE_MODE_SGMII ||
> +           state->interface == PHY_INTERFACE_MODE_1000BASEX) {

What if you reduce the indentation level by 1 here, by just exiting if
the interface mode is not SGMII?

> +               struct gfar_private *priv =
> +                       netdev_priv(to_net_dev(config->dev));
> +               u16 tbi_cr;
> +
> +               if (!priv->tbi_phy)
> +                       return -ENODEV;
> +
> +               tbi_cr = phy_read(priv->tbi_phy, MII_TBI_CR);
> +
> +               state->duplex = !!(tbi_cr & TBI_CR_FULL_DUPLEX);

Woah there. Aren't you supposed to first ensure state->an_complete is
ok, based on TBI_MII_Register_Set_SR[AN_Done]? There's also a
Link_Status bit in that register that you could retrieve.

> +               if ((tbi_cr & TBI_CR_SPEED_1000_MASK) == TBI_CR_SPEED_1000_MASK)
> +                       state->speed = SPEED_1000;
> +       }

See the Speed_Bit table from TBI_MII_Register_Set_ANLPBPA_SGMII for
the link partner (aka SGMII PHY) advertisement. You have to do a
logical-and between that and your own. Also please make sure you
really are in SGMII AN and not 1000 Base-X when interpreting registers
4 & 5 as one way or another.

> +
> +       return 1;
> +}
> +
> +static u32 gfar_get_flowctrl_cfg(const struct phylink_link_state *state)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
>         u32 val = 0;
>
> -       if (!phydev->duplex)
> +       if (!state->duplex)
>                 return val;
>
> -       if (!priv->pause_aneg_en) {
> -               if (priv->tx_pause_en)
> -                       val |= MACCFG1_TX_FLOW;
> -               if (priv->rx_pause_en)
> -                       val |= MACCFG1_RX_FLOW;
> -       } else {
> -               u16 lcl_adv, rmt_adv;
> -               u8 flowctrl;
> -               /* get link partner capabilities */
> -               rmt_adv = 0;
> -               if (phydev->pause)
> -                       rmt_adv = LPA_PAUSE_CAP;
> -               if (phydev->asym_pause)
> -                       rmt_adv |= LPA_PAUSE_ASYM;
> -
> -               lcl_adv = linkmode_adv_to_lcl_adv_t(phydev->advertising);
> -               flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
> -               if (flowctrl & FLOW_CTRL_TX)
> -                       val |= MACCFG1_TX_FLOW;
> -               if (flowctrl & FLOW_CTRL_RX)
> -                       val |= MACCFG1_RX_FLOW;
> -       }
> +       if (state->pause & MLO_PAUSE_TX)
> +               val |= MACCFG1_TX_FLOW;
> +       if (state->pause & MLO_PAUSE_RX)
> +               val |= MACCFG1_RX_FLOW;
>
>         return val;
>  }
>
> -static noinline void gfar_update_link_state(struct gfar_private *priv)
> +static void gfar_mac_config(struct phylink_config *config, unsigned int mode,
> +                           const struct phylink_link_state *state)
>  {
> +       struct gfar_private *priv = netdev_priv(to_net_dev(config->dev));
>         struct gfar __iomem *regs = priv->gfargrp[0].regs;
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
> -       struct gfar_priv_rx_q *rx_queue = NULL;
> -       int i;
> +       u32 maccfg1, new_maccfg1;
> +       u32 maccfg2, new_maccfg2;
> +       u32 ecntrl, new_ecntrl;
> +       u32 tx_flow, new_tx_flow;

Don't introduce new_ variables. Is there any issue if you
unconditionally write to the MAC registers?

>
>         if (unlikely(test_bit(GFAR_RESETTING, &priv->state)))
>                 return;
>
> -       if (phydev->link) {
> -               u32 tempval1 = gfar_read(&regs->maccfg1);
> -               u32 tempval = gfar_read(&regs->maccfg2);
> -               u32 ecntrl = gfar_read(&regs->ecntrl);
> -               u32 tx_flow_oldval = (tempval1 & MACCFG1_TX_FLOW);
> +       if (unlikely(phylink_autoneg_inband(mode)))
> +               return;
>
> -               if (phydev->duplex != priv->oldduplex) {
> -                       if (!(phydev->duplex))
> -                               tempval &= ~(MACCFG2_FULL_DUPLEX);
> -                       else
> -                               tempval |= MACCFG2_FULL_DUPLEX;
> +       maccfg1 = gfar_read(&regs->maccfg1);
> +       maccfg2 = gfar_read(&regs->maccfg2);
> +       ecntrl = gfar_read(&regs->ecntrl);
>
> -                       priv->oldduplex = phydev->duplex;
> -               }
> +       new_maccfg2 = maccfg2 & ~(MACCFG2_FULL_DUPLEX | MACCFG2_IF);
> +       new_ecntrl = ecntrl & ~ECNTRL_R100;
>
> -               if (phydev->speed != priv->oldspeed) {
> -                       switch (phydev->speed) {
> -                       case 1000:
> -                               tempval =
> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_GMII);
> +       if (state->duplex)
> +               new_maccfg2 |= MACCFG2_FULL_DUPLEX;
>
> -                               ecntrl &= ~(ECNTRL_R100);
> -                               break;
> -                       case 100:
> -                       case 10:
> -                               tempval =
> -                                   ((tempval & ~(MACCFG2_IF)) | MACCFG2_MII);
> -
> -                               /* Reduced mode distinguishes
> -                                * between 10 and 100
> -                                */
> -                               if (phydev->speed == SPEED_100)
> -                                       ecntrl |= ECNTRL_R100;
> -                               else
> -                                       ecntrl &= ~(ECNTRL_R100);
> -                               break;
> -                       default:
> -                               netif_warn(priv, link, priv->ndev,
> -                                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> -                                          phydev->speed);

Please 1. remove "Ack!" 2. treat SPEED_UNKNOWN here by setting the MAC
into a low-power state (e.g. 10 Mbps - the power savings are real).
Don't print that Speed -1 is not 10/100/1000, we know that.

> -                               break;
> -                       }
> -
> -                       priv->oldspeed = phydev->speed;
> -               }
> -
> -               tempval1 &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> -               tempval1 |= gfar_get_flowctrl_cfg(priv);
> -
> -               /* Turn last free buffer recording on */
> -               if ((tempval1 & MACCFG1_TX_FLOW) && !tx_flow_oldval) {
> -                       for (i = 0; i < priv->num_rx_queues; i++) {
> -                               u32 bdp_dma;
> -
> -                               rx_queue = priv->rx_queue[i];
> -                               bdp_dma = gfar_rxbd_dma_lastfree(rx_queue);
> -                               gfar_write(rx_queue->rfbptr, bdp_dma);
> -                       }
> -
> -                       priv->tx_actual_en = 1;
> -               }
> -
> -               if (unlikely(!(tempval1 & MACCFG1_TX_FLOW) && tx_flow_oldval))
> -                       priv->tx_actual_en = 0;
> -
> -               gfar_write(&regs->maccfg1, tempval1);
> -               gfar_write(&regs->maccfg2, tempval);
> -               gfar_write(&regs->ecntrl, ecntrl);
> -
> -               if (!priv->oldlink)
> -                       priv->oldlink = 1;
> -
> -       } else if (priv->oldlink) {
> -               priv->oldlink = 0;
> -               priv->oldspeed = 0;
> -               priv->oldduplex = -1;
> +       switch (state->speed) {
> +       case SPEED_1000:
> +               new_maccfg2 |= MACCFG2_GMII;
> +               break;
> +       case SPEED_100:
> +               new_maccfg2 |= MACCFG2_MII;
> +               new_ecntrl = ecntrl | ECNTRL_R100;
> +               break;
> +       case SPEED_10:
> +               new_maccfg2 |= MACCFG2_MII;
> +               break;
> +       default:
> +               netif_warn(priv, link, priv->ndev,
> +                          "Ack!  Speed (%d) is not 10/100/1000!\n",
> +                          state->speed);
> +               return;
>         }
>
> -       if (netif_msg_link(priv))
> -               phy_print_status(phydev);
> +       priv->speed = state->speed;
> +
> +       new_maccfg1 = maccfg1 & ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> +       new_maccfg1 |= gfar_get_flowctrl_cfg(state);
> +
> +       /* Turn last free buffer recording on */
> +       tx_flow = maccfg1 & MACCFG1_TX_FLOW;
> +       new_tx_flow = new_maccfg1 & MACCFG1_TX_FLOW;
> +       if (new_tx_flow && !tx_flow) {
> +               int i;
> +
> +               for (i = 0; i < priv->num_rx_queues; i++) {
> +                       struct gfar_priv_rx_q *rx_queue;
> +                       u32 bdp_dma;
> +
> +                       rx_queue = priv->rx_queue[i];
> +                       bdp_dma = gfar_rxbd_dma_lastfree(rx_queue);
> +                       gfar_write(rx_queue->rfbptr, bdp_dma);
> +               }
> +
> +               priv->tx_actual_en = 1;
> +       } else if (unlikely(!new_tx_flow && tx_flow)) {
> +               priv->tx_actual_en = 0;
> +       }
> +
> +       if (new_maccfg1 != maccfg1)
> +               gfar_write(&regs->maccfg1, new_maccfg1);
> +       if (new_maccfg2 != maccfg2)
> +               gfar_write(&regs->maccfg2, new_maccfg2);
> +       if (new_ecntrl != ecntrl)
> +               gfar_write(&regs->ecntrl, new_ecntrl);
> +}
> +
> +static void gfar_mac_an_restart(struct phylink_config *config)
> +{
> +       /* Not supported */
> +}

What about running gfar_configure_serdes again?

> +
> +static void gfar_mac_link_down(struct phylink_config *config, unsigned int mode,
> +                              phy_interface_t interface)
> +{
> +       /* Not supported */
> +}
> +

What about disabling RX_EN and TX_EN from MACCFG1?

> +static void gfar_mac_link_up(struct phylink_config *config, unsigned int mode,
> +                            phy_interface_t interface, struct phy_device *phy)
> +{
> +       /* Not supported */
>  }
>

What about enabling RX_EN and TX_EN from MACCFG1?

>  static const struct of_device_id gfar_match[] =
> diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
> index f2af96349c7b..0b28b1f60f2d 100644
> --- a/drivers/net/ethernet/freescale/gianfar.h
> +++ b/drivers/net/ethernet/freescale/gianfar.h
> @@ -31,8 +31,7 @@
>  #include <linux/skbuff.h>
>  #include <linux/spinlock.h>
>  #include <linux/mm.h>
> -#include <linux/mii.h>
> -#include <linux/phy.h>
> +#include <linux/phylink.h>
>
>  #include <asm/io.h>
>  #include <asm/irq.h>
> @@ -149,8 +148,13 @@ extern const char gfar_driver_version[];
>  #define GFAR_SUPPORTED_GBIT SUPPORTED_1000baseT_Full
>
>  /* TBI register addresses */
> +#define MII_TBI_CR             0x00
>  #define MII_TBICON             0x11
>
> +/* TBI_CR register bit fields */
> +#define TBI_CR_FULL_DUPLEX     0x0100
> +#define TBI_CR_SPEED_1000_MASK 0x0040
> +

I think BIT() definitions are preferred, even if that means you have
to convert existing code first.

>  /* TBICON register bit fields */
>  #define TBICON_CLK_SELECT      0x0020
>
> @@ -1148,12 +1152,12 @@ struct gfar_private {
>
>         /* PHY stuff */
>         phy_interface_t interface;
> -       struct device_node *phy_node;
> +       struct device_node *device_node;
>         struct device_node *tbi_node;
> -       struct mii_bus *mii_bus;
> -       int oldspeed;
> -       int oldduplex;
> -       int oldlink;
> +       struct phylink *phylink;
> +       struct phylink_config phylink_config;
> +       struct phy_device *tbi_phy;
> +       int speed;
>
>         uint32_t msg_enable;
>
> @@ -1165,11 +1169,7 @@ struct gfar_private {
>                 bd_stash_en:1,
>                 rx_filer_enable:1,
>                 /* Enable priorty based Tx scheduling in Hw */
> -               prio_sched_en:1,
> -               /* Flow control flags */
> -               pause_aneg_en:1,
> -               tx_pause_en:1,
> -               rx_pause_en:1;
> +               prio_sched_en:1;
>
>         /* The total tx and rx ring size for the enabled queues */
>         unsigned int total_tx_ring_size;
> @@ -1333,8 +1333,6 @@ void reset_gfar(struct net_device *dev);
>  void gfar_mac_reset(struct gfar_private *priv);
>  void gfar_halt(struct gfar_private *priv);
>  void gfar_start(struct gfar_private *priv);
> -void gfar_phy_test(struct mii_bus *bus, struct phy_device *phydev, int enable,
> -                  u32 regnum, u32 read);
>  void gfar_configure_coalescing_all(struct gfar_private *priv);
>  int gfar_set_features(struct net_device *dev, netdev_features_t features);
>
> diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> index 3433b46b90c1..146b30d07789 100644
> --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> @@ -35,7 +35,7 @@
>  #include <asm/types.h>
>  #include <linux/ethtool.h>
>  #include <linux/mii.h>
> -#include <linux/phy.h>
> +#include <linux/phylink.h>
>  #include <linux/sort.h>
>  #include <linux/if_vlan.h>
>  #include <linux/of_platform.h>
> @@ -207,12 +207,10 @@ static void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs,
>  static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
>                                      unsigned int usecs)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;

Are you sure this still works? You missed a ndev->phydev check from
gfar_gcoalesce, where this is called from. Technically you can still
check ndev->phydev, it's just that PHYLINK doesn't guarantee you'll
have one (e.g. fixed-link interfaces).

>         unsigned int count;
>
>         /* The timer is different, depending on the interface speed */
> -       switch (phydev->speed) {
> +       switch (priv->speed) {

What about retrieving the speed from phylink_ethtool_ksettings_get,
and using that here? This would save you from introducing a cached
priv->speed.

>         case SPEED_1000:
>                 count = GFAR_GBIT_TIME;
>                 break;
> @@ -234,12 +232,10 @@ static unsigned int gfar_usecs2ticks(struct gfar_private *priv,
>  static unsigned int gfar_ticks2usecs(struct gfar_private *priv,
>                                      unsigned int ticks)
>  {
> -       struct net_device *ndev = priv->ndev;
> -       struct phy_device *phydev = ndev->phydev;
>         unsigned int count;
>
>         /* The timer is different, depending on the interface speed */
> -       switch (phydev->speed) {
> +       switch (priv->speed) {

Same comments as above, but for gfar_scoalesce.

>         case SPEED_1000:
>                 count = GFAR_GBIT_TIME;
>                 break;
> @@ -489,58 +485,15 @@ static void gfar_gpauseparam(struct net_device *dev,
>  {
>         struct gfar_private *priv = netdev_priv(dev);
>
> -       epause->autoneg = !!priv->pause_aneg_en;
> -       epause->rx_pause = !!priv->rx_pause_en;
> -       epause->tx_pause = !!priv->tx_pause_en;
> +       phylink_ethtool_get_pauseparam(priv->phylink, epause);
>  }
>
>  static int gfar_spauseparam(struct net_device *dev,
>                             struct ethtool_pauseparam *epause)
>  {
>         struct gfar_private *priv = netdev_priv(dev);
> -       struct phy_device *phydev = dev->phydev;
> -       struct gfar __iomem *regs = priv->gfargrp[0].regs;
>
> -       if (!phydev)
> -               return -ENODEV;
> -
> -       if (!phy_validate_pause(phydev, epause))
> -               return -EINVAL;
> -
> -       priv->rx_pause_en = priv->tx_pause_en = 0;
> -       phy_set_asym_pause(phydev, epause->rx_pause, epause->tx_pause);
> -       if (epause->rx_pause) {
> -               priv->rx_pause_en = 1;
> -
> -               if (epause->tx_pause) {
> -                       priv->tx_pause_en = 1;
> -               }
> -       } else if (epause->tx_pause) {
> -               priv->tx_pause_en = 1;
> -       }
> -
> -       if (epause->autoneg)
> -               priv->pause_aneg_en = 1;
> -       else
> -               priv->pause_aneg_en = 0;
> -
> -       if (!epause->autoneg) {
> -               u32 tempval = gfar_read(&regs->maccfg1);
> -
> -               tempval &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
> -
> -               priv->tx_actual_en = 0;
> -               if (priv->tx_pause_en) {
> -                       priv->tx_actual_en = 1;
> -                       tempval |= MACCFG1_TX_FLOW;
> -               }
> -
> -               if (priv->rx_pause_en)
> -                       tempval |= MACCFG1_RX_FLOW;
> -               gfar_write(&regs->maccfg1, tempval);
> -       }
> -
> -       return 0;
> +       return phylink_ethtool_set_pauseparam(priv->phylink, epause);
>  }
>
>  int gfar_set_features(struct net_device *dev, netdev_features_t features)
> @@ -1519,6 +1472,24 @@ static int gfar_get_ts_info(struct net_device *dev,
>         return 0;
>  }
>
> +/* Set link ksettings (phy address, speed) for ethtools */

ethtool, not ethtools. Also, I'm not quite sure what you mean by
setting the "phy address" with ethtool.

> +static int gfar_get_link_ksettings(struct net_device *dev,
> +                                  struct ethtool_link_ksettings *cmd)
> +{
> +       struct gfar_private *priv = netdev_priv(dev);
> +
> +       return phylink_ethtool_ksettings_get(priv->phylink, cmd);
> +}
> +
> +/* Get link ksettings for ethtools */
> +static int gfar_set_link_ksettings(struct net_device *dev,
> +                                  const struct ethtool_link_ksettings *cmd)
> +{
> +       struct gfar_private *priv = netdev_priv(dev);
> +
> +       return phylink_ethtool_ksettings_set(priv->phylink, cmd);
> +}
> +
>  const struct ethtool_ops gfar_ethtool_ops = {
>         .get_drvinfo = gfar_gdrvinfo,
>         .get_regs_len = gfar_reglen,
> @@ -1542,6 +1513,6 @@ const struct ethtool_ops gfar_ethtool_ops = {
>         .set_rxnfc = gfar_set_nfc,
>         .get_rxnfc = gfar_get_nfc,
>         .get_ts_info = gfar_get_ts_info,
> -       .get_link_ksettings = phy_ethtool_get_link_ksettings,
> -       .set_link_ksettings = phy_ethtool_set_link_ksettings,
> +       .get_link_ksettings = gfar_get_link_ksettings,
> +       .set_link_ksettings = gfar_set_link_ksettings,
>  };
> --
> 2.22.0
>

Thanks a lot for doing this!
-Vladimir

^ permalink raw reply

* Re: [PATCH 1/4] dt-bindings: net: Add aspeed,ast2600-mdio binding
From: Rob Herring @ 2019-07-29 23:37 UTC (permalink / raw)
  To: Andrew Jeffery
  Cc: netdev, David Miller, Mark Rutland, Joel Stanley, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-aspeed, linux-kernel@vger.kernel.org
In-Reply-To: <20190729043926.32679-2-andrew@aj.id.au>

On Sun, Jul 28, 2019 at 10:39 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> The AST2600 splits out the MDIO bus controller from the MAC into its own
> IP block and rearranges the register layout. Add a new binding to
> describe the new hardware.
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>  .../bindings/net/aspeed,ast2600-mdio.yaml     | 61 +++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
>
> diff --git a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> new file mode 100644
> index 000000000000..fa86f6438473
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
> @@ -0,0 +1,61 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/aspeed,ast2600-mdio.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: ASPEED AST2600 MDIO Controller
> +
> +maintainers:
> +  - Andrew Jeffery <andrew@aj.id.au>
> +
> +description: |+
> +  The ASPEED AST2600 MDIO controller is the third iteration of ASPEED's MDIO
> +  bus register interface, this time also separating out the controller from the
> +  MAC.
> +

Should have a:

allOf:
  - $ref: "mdio.yaml#"

> +properties:
> +  compatible:
> +    const: aspeed,ast2600-mdio
> +  reg:
> +    maxItems: 1
> +    description: The register range of the MDIO controller instance

> +  "#address-cells":
> +    const: 1
> +  "#size-cells":
> +    const: 0

Then you can drop these 2.

> +
> +patternProperties:
> +  "^ethernet-phy@[a-f0-9]$":
> +    properties:
> +      reg:
> +        description:
> +          The MDIO bus index of the PHY

And this.

> +      compatible:
> +        enum:
> +          - ethernet-phy-ieee802.3-c22
> +          - ethernet-phy-ieee802.3-c45

This isn't specific to ASpeed either and is already covered by
ethernet-phy.yaml.

So that means none of the child node schema is needed here.

> +    required:
> +      - reg
> +
> +required:
> +  - compatible
> +  - reg
> +  - "#address-cells"
> +  - "#size-cells"
> +
> +examples:
> +  - |
> +    mdio0: mdio@1e650000 {
> +            compatible = "aspeed,ast2600-mdio";
> +            reg = <0x1e650000 0x8>;
> +            #address-cells = <1>;
> +            #size-cells = <0>;
> +
> +            status = "okay";

Don't show status in examples.

> +
> +            ethphy0: ethernet-phy@0 {
> +                    compatible = "ethernet-phy-ieee802.3-c22";
> +                    reg = <0>;
> +            };
> +    };
> --
> 2.20.1
>

^ permalink raw reply

* [PATCH v2 net-next 1/1] tc-testing: Clarify the use of tdc's -d option
From: Lucas Bates @ 2019-07-29 23:18 UTC (permalink / raw)
  To: davem
  Cc: netdev, nicolas.dichtel, jhs, xiyou.wangcong, jiri, mleitner,
	vladbu, dcaratti, kernel, Lucas Bates

The -d command line argument to tdc requires the name of a physical device
on the system where the tests will be run. If -d has not been used, tdc
will skip tests that require a physical device.

This patch is intended to better document what the -d option does and how
it is used.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
---
 tools/testing/selftests/tc-testing/README |  4 +++-
 tools/testing/selftests/tc-testing/tdc.py | 12 ++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/README b/tools/testing/selftests/tc-testing/README
index 22e5da9..b0954c8 100644
--- a/tools/testing/selftests/tc-testing/README
+++ b/tools/testing/selftests/tc-testing/README
@@ -128,7 +128,9 @@ optional arguments:
   -v, --verbose         Show the commands that are being run
   -N, --notap           Suppress tap results for command under test
   -d DEVICE, --device DEVICE
-                        Execute the test case in flower category
+                        Execute test cases that use a physical device, where
+                        DEVICE is its name. (If not defined, tests that require
+                        a physical device will be skipped)
   -P, --pause           Pause execution just before post-suite stage
 
 selection:
diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py
index f04321a..e566c70 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -356,12 +356,14 @@ def test_runner(pm, args, filtered_tests):
     time.sleep(2)
     for tidx in testlist:
         if "flower" in tidx["category"] and args.device == None:
+            errmsg = "Tests using the DEV2 variable must define the name of a "
+            errmsg += "physical NIC with the -d option when running tdc.\n"
+            errmsg += "Test has been skipped."
             if args.verbose > 1:
-                print('Not executing test {} {} because DEV2 not defined'.
-                      format(tidx['id'], tidx['name']))
+                print(errmsg)
             res = TestResult(tidx['id'], tidx['name'])
             res.set_result(ResultState.skip)
-            res.set_errormsg('Not executed because DEV2 is not defined')
+            res.set_errormsg(errmsg)
             tsr.add_resultdata(res)
             continue
         try:
@@ -499,7 +501,9 @@ def set_args(parser):
         choices=['none', 'xunit', 'tap'],
         help='Specify the format for test results. (Default: TAP)')
     parser.add_argument('-d', '--device',
-                        help='Execute the test case in flower category')
+                        help='Execute test cases that use a physical device, ' +
+                        'where DEVICE is its name. (If not defined, tests ' +
+                        'that require a physical device will be skipped)')
     parser.add_argument(
         '-P', '--pause', action='store_true',
         help='Pause execution just before post-suite stage')
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH bpf-next 00/10] CO-RE offset relocations
From: Andrii Nakryiko @ 2019-07-29 23:09 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <CAPhsuW5H2QQjuASV2iXTdA73E7AQnj73b77x4FmJomc-gJy-Cg@mail.gmail.com>

On Mon, Jul 29, 2019 at 1:37 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 1:20 PM Song Liu <liu.song.a23@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 1:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
> > >
> > > This patch set implements central part of CO-RE (Compile Once - Run
> > > Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> > > Most of the details are written down as comments to corresponding parts of the
> > > code.
> > >
> > > Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> > > work with its contents.
> > > Patch #2 implements CO-RE relocations algorithm in libbpf.
> > > Patches #3-#10 adds selftests validating various parts of relocation handling,
> > > type compatibility, etc.
> > >
> > > For all tests to work, you'll need latest Clang/LLVM supporting
> > > __builtin_preserve_access_index intrinsic, used for recording offset
> > > relocations. Kernel on which selftests run should have BTF information built
> > > in (CONFIG_DEBUG_INFO_BTF=y).
> > >
> > >   [0] http://vger.kernel.org/bpfconf2019.html#session-2
> > >   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2CO-RE relocations
> > >
> > > This patch set implements central part of CO-RE (Compile Once - Run
> > > Everywhere, see [0] and [1] for slides and video): relocating field offsets.
> > > Most of the details are written down as comments to corresponding parts of the
> > > code.
> > >
> > > Patch #1 adds loading of .BTF.ext offset relocations section and macros to
> > > work with its contents.
> > > Patch #2 implements CO-RE relocations algorithm in libbpf.
> > > Patches #3-#10 adds selftests validating various parts of relocation handling,
> > > type compatibility, etc.
> > >
> > > For all tests to work, you'll need latest Clang/LLVM supporting
> > > __builtin_preserve_access_index intrinsic, used for recording offset
> > > relocations. Kernel on which selftests run should have BTF information built
> > > in (CONFIG_DEBUG_INFO_BTF=y).
> > >
> > >   [0] http://vger.kernel.org/bpfconf2019.html#session-2
> > >   [1] http://vger.kernel.org/lpc-bpf2018.html#session-2
> > >
> > > Andrii Nakryiko (10):
> > >   libbpf: add .BTF.ext offset relocation section loading
> > >   libbpf: implement BPF CO-RE offset relocation algorithm
> > >   selftests/bpf: add CO-RE relocs testing setup
> > >   selftests/bpf: add CO-RE relocs struct flavors tests
> > >   selftests/bpf: add CO-RE relocs nesting tests
> > >   selftests/bpf: add CO-RE relocs array tests
> > >   selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
> > >   selftests/bpf: add CO-RE relocs modifiers/typedef tests
> > >   selftest/bpf: add CO-RE relocs ptr-as-array tests
> > >   selftests/bpf: add CO-RE relocs ints tests
> > >
> > >  tools/lib/bpf/btf.c                           |  64 +-
> > >  tools/lib/bpf/btf.h                           |   4 +
> > >  tools/lib/bpf/libbpf.c                        | 866 +++++++++++++++++-
> > >  tools/lib/bpf/libbpf.h                        |   1 +
> > >  tools/lib/bpf/libbpf_internal.h               |  91 ++
> > >  .../selftests/bpf/prog_tests/core_reloc.c     | 363 ++++++++
> > >  .../bpf/progs/btf__core_reloc_arrays.c        |   3 +
> > >  .../btf__core_reloc_arrays___diff_arr_dim.c   |   3 +
> > >  ...btf__core_reloc_arrays___diff_arr_val_sz.c |   3 +
> > >  .../btf__core_reloc_arrays___err_non_array.c  |   3 +
> > >  ...btf__core_reloc_arrays___err_too_shallow.c |   3 +
> > >  .../btf__core_reloc_arrays___err_too_small.c  |   3 +
> > >  ..._core_reloc_arrays___err_wrong_val_type1.c |   3 +
> > >  ..._core_reloc_arrays___err_wrong_val_type2.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_flavors.c       |   3 +
> > >  .../btf__core_reloc_flavors__err_wrong_name.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_ints.c          |   3 +
> > >  .../bpf/progs/btf__core_reloc_ints___bool.c   |   3 +
> > >  .../btf__core_reloc_ints___err_bitfield.c     |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_16.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_32.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_64.c  |   3 +
> > >  .../btf__core_reloc_ints___err_wrong_sz_8.c   |   3 +
> > >  .../btf__core_reloc_ints___reverse_sign.c     |   3 +
> > >  .../bpf/progs/btf__core_reloc_mods.c          |   3 +
> > >  .../progs/btf__core_reloc_mods___mod_swap.c   |   3 +
> > >  .../progs/btf__core_reloc_mods___typedefs.c   |   3 +
> > >  .../bpf/progs/btf__core_reloc_nesting.c       |   3 +
> > >  .../btf__core_reloc_nesting___anon_embed.c    |   3 +
> > >  ...f__core_reloc_nesting___dup_compat_types.c |   5 +
> > >  ...core_reloc_nesting___err_array_container.c |   3 +
> > >  ...tf__core_reloc_nesting___err_array_field.c |   3 +
> > >  ...e_reloc_nesting___err_dup_incompat_types.c |   4 +
> > >  ...re_reloc_nesting___err_missing_container.c |   3 +
> > >  ...__core_reloc_nesting___err_missing_field.c |   3 +
> > >  ..._reloc_nesting___err_nonstruct_container.c |   3 +
> > >  ...e_reloc_nesting___err_partial_match_dups.c |   4 +
> > >  .../btf__core_reloc_nesting___err_too_deep.c  |   3 +
> > >  .../btf__core_reloc_nesting___extra_nesting.c |   3 +
> > >  ..._core_reloc_nesting___struct_union_mixup.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_primitives.c    |   3 +
> > >  ...f__core_reloc_primitives___diff_enum_def.c |   3 +
> > >  ..._core_reloc_primitives___diff_func_proto.c |   3 +
> > >  ...f__core_reloc_primitives___diff_ptr_type.c |   3 +
> > >  ...tf__core_reloc_primitives___err_non_enum.c |   3 +
> > >  ...btf__core_reloc_primitives___err_non_int.c |   3 +
> > >  ...btf__core_reloc_primitives___err_non_ptr.c |   3 +
> > >  .../bpf/progs/btf__core_reloc_ptr_as_arr.c    |   3 +
> > >  .../btf__core_reloc_ptr_as_arr___diff_sz.c    |   3 +
> > >  .../selftests/bpf/progs/core_reloc_types.h    | 642 +++++++++++++
> > >  .../bpf/progs/test_core_reloc_arrays.c        |  58 ++
> > >  .../bpf/progs/test_core_reloc_flavors.c       |  65 ++
> > >  .../bpf/progs/test_core_reloc_ints.c          |  48 +
> > >  .../bpf/progs/test_core_reloc_kernel.c        |  39 +
> > >  .../bpf/progs/test_core_reloc_mods.c          |  68 ++
> > >  .../bpf/progs/test_core_reloc_nesting.c       |  48 +
> > >  .../bpf/progs/test_core_reloc_primitives.c    |  50 +
> > >  .../bpf/progs/test_core_reloc_ptr_as_arr.c    |  34 +
> > >  58 files changed, 2527 insertions(+), 47 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c
> >
> > We have created a lot of small files. Would it be cleaner if we can
> > somehow put these
> > data in one file (maybe different sections?).
>
> After reading more, I guess you have tried this and end up with current
> design: keep most struct defines in core_reloc_types.h.

Yeah, I have all the definition in one header file, but then I need
individual combinations as separate BTFs, so I essentially "pick"
desired types using function declarations. Creating those BTFs by hand
would be a nightmare to create and maintain.

>
> >
> > Alternatively, maybe create a folder for these files:
> >   tools/testing/selftests/bpf/progs/core/
>
> I guess this would still make it cleaner.

There is nothing too special about core tests to split them. Also it
would require Makefile changes and would deviate test_progs
definitions from analogous test_maps, test_verifier, test_btf, etc, so
I'm not sure about that. I though about putting those btf__* files
under separate directory, but I'm on the fence there as well, as I'd
rather have related files to stay together...


>
> Thanks,
> Song

^ permalink raw reply

* [PATCH net] selftests/tls: fix TLS tests with CONFIG_TLS=n
From: Jakub Kicinski @ 2019-07-29 23:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski, kernel test robot

Build bot reports some recent TLS tests are failing
with CONFIG_TLS=n. Correct the expected return code
and skip TLS installation if not supported.

Tested with CONFIG_TLS=n and CONFIG_TLS=m.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: cf32526c8842 ("selftests/tls: add a test for ULP but no keys")
Fixes: 65d41fb317c6 ("selftests/tls: add a bidirectional test")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/testing/selftests/net/tls.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 630c5b884d43..d995e6503b1a 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -69,7 +69,7 @@ FIXTURE_SETUP(tls_basic)
 
 	ret = setsockopt(self->fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
 	if (ret != 0) {
-		ASSERT_EQ(errno, ENOTSUPP);
+		ASSERT_EQ(errno, ENOENT);
 		self->notls = true;
 		printf("Failure setting TCP_ULP, testing without tls\n");
 		return;
@@ -696,21 +696,26 @@ TEST_F(tls, recv_lowat)
 
 TEST_F(tls, bidir)
 {
-	struct tls12_crypto_info_aes_gcm_128 tls12;
 	char const *test_str = "test_read";
 	int send_len = 10;
 	char buf[10];
 	int ret;
 
-	memset(&tls12, 0, sizeof(tls12));
-	tls12.info.version = TLS_1_3_VERSION;
-	tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	if (!self->notls) {
+		struct tls12_crypto_info_aes_gcm_128 tls12;
 
-	ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12, sizeof(tls12));
-	ASSERT_EQ(ret, 0);
+		memset(&tls12, 0, sizeof(tls12));
+		tls12.info.version = TLS_1_3_VERSION;
+		tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
 
-	ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12, sizeof(tls12));
-	ASSERT_EQ(ret, 0);
+		ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12,
+				 sizeof(tls12));
+		ASSERT_EQ(ret, 0);
+
+		ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12,
+				 sizeof(tls12));
+		ASSERT_EQ(ret, 0);
+	}
 
 	ASSERT_EQ(strlen(test_str) + 1, send_len);
 
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations
From: Vivien Didelot @ 2019-07-29 22:46 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andrew Lunn, Florian Fainelli, David S. Miller, Rasmus Villemoes,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190722233713.31396-1-rasmus.villemoes@prevas.dk>

Hi Rasmus,

On Mon, 22 Jul 2019 23:37:26 +0000, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> We have an ERPS (Ethernet Ring Protection Switching) setup involving
> mv88e6250 switches which we're in the process of switching to a BSP
> based on the mainline driver. Breaking any link in the ring works as
> expected, with the ring reconfiguring itself quickly and traffic
> continuing with almost no noticable drops. However, when plugging back
> the cable, we see 5+ second stalls.
> 
> This has been tracked down to the userspace application in charge of
> the protocol missing a few CCM messages on the good link (the one that
> was not unplugged), causing it to broadcast a "signal fail". That
> message eventually reaches its link partner, which responds by
> blocking the port. Meanwhile, the first node has continued to block
> the port with the just plugged-in cable, breaking the network. And the
> reason for those missing CCM messages has in turn been tracked down to
> the VTU apparently being too busy servicing load/purge operations that
> the normal lookups are delayed.
> 
> Initial state, the link between C and D is blocked in software.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Unplug the cable between C and D.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *   * D
> 
> Reestablish the link between C and D.
>      _____________________
>     /                     \
>    |                       |
>    A ----- B ----- C *---- D
> 
> Somehow, enough VTU/ATU operations happen inside C that prevents
> the application from receving the CCM messages from B in a timely
> manner, so a Signal Fail message is sent by C. When B receives
> that, it responds by blocking its port.
> 
>      _____________________
>     /                     \
>    |                       |
>    A ----- B *---* C *---- D
> 
> Very shortly after this, the signal fail condition clears on the
> BC link (some CCM messages finally make it through), so C
> unblocks the port. However, a guard timer inside B prevents it
> from removing the blocking before 5 seconds have elapsed.
> 
> It is not unlikely that our userspace ERPS implementation could be
> smarter and/or is simply buggy. However, this patch fixes the symptoms
> we see, and is a small optimization that should not break anything
> (knock wood). The idea is simply to avoid doing an VTU load of an
> entry identical to the one already present. To do that, we need to
> know whether mv88e6xxx_vtu_get() actually found an existing entry, or
> has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that
> end, let vlan->valid be an output parameter. The other two callers of
> mv88e6xxx_vtu_get() are not affected by this patch since they pass
> new=false.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 6b17cd961d06..2e500428670c 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -1423,7 +1423,6 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,
>  
>  		/* Initialize a fresh VLAN entry */
>  		memset(entry, 0, sizeof(*entry));
> -		entry->valid = true;
>  		entry->vid = vid;
>  
>  		/* Exclude all ports */
> @@ -1618,6 +1617,9 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
>  	if (err)
>  		return err;
>  
> +	if (vlan.valid && vlan.member[port] == member)
> +		return 0;
> +	vlan.valid = true;
>  	vlan.member[port] = member;
>  
>  	err = mv88e6xxx_vtu_loadpurge(chip, &vlan);

I'd prefer not to use the mv88e6xxx_vtu_entry structure for output
parameters, this can be confusing. As you correctly mentioned, this
initialization is only done for _mv88e6xxx_port_vlan_add, so I'll
prepare a patch which gets rid of the boolean parameter and move that
logic up.


Thanks,
Vivien

^ permalink raw reply

* Re: [PATCH net] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Ahern @ 2019-07-29 22:28 UTC (permalink / raw)
  To: David Miller, suyj.fnst; +Cc: kuznet, yoshfuji, netdev
In-Reply-To: <20190729.141752.457438545178811941.davem@davemloft.net>

On 7/29/19 3:17 PM, David Miller wrote:
> David, can you take a quick look at this?

will do. I'll get back to you by tomorrow.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox