* [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Tony Nguyen @ 2023-11-15 21:12 UTC
To: davem, kuba, pabeni, edumazet, netdev
Cc: Dave Ertman, anthony.l.nguyen, carolyn.wyborny, daniel.machon,
Przemek Kitszel, Sujai Buvaneswaran
From: Dave Ertman <david.m.ertman@intel.com>
There is an error when an interface has the following conditions:
- PF is in an aggregate (bond)
- PF has VFs created on it
- bond is in a state where it is failed-over to the secondary interface
- A VF reset is issued on one or more of those VFs
The issue is generated by the originating PF trying to rebuild or
reconfigure the VF resources. Since the bond is failed over to the
secondary interface the queue contexts are in a modified state.
To fix this issue, have the originating interface reclaim its resources
prior to the tear-down and rebuild or reconfigure. Then after the process
is complete, move the resources back to the currently active interface.
There are multiple paths that can be used depending on what triggered the
event, so create a helper function to move the queues and use paired calls
to the helper (back to origin, process, then move back to active interface)
under the same lag_mutex lock.
Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
This is the net patch mentioned yesterday:
https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-3f043734e0ee@intel.com/
drivers/net/ethernet/intel/ice/ice_lag.c | 42 +++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
4 files changed, 88 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
index cd065ec48c87..9eed93baa59b 100644
--- a/drivers/net/ethernet/intel/ice/ice_lag.c
+++ b/drivers/net/ethernet/intel/ice/ice_lag.c
@@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag *lag, u8 oldport, u8 newport)
ice_lag_move_single_vf_nodes(lag, oldport, newport, i);
}
+/**
+ * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev event context
+ * @lag: local lag struct
+ * @src_prt: lport value for source port
+ * @dst_prt: lport value for destination port
+ *
+ * This function is used to move nodes during an out-of-netdev-event situation,
+ * primarily when the driver needs to reconfigure or recreate resources.
+ *
+ * Must be called while holding the lag_mutex to avoid lag events from
+ * processing while out-of-sync moves are happening. Also, paired moves,
+ * such as used in a reset flow, should both be called under the same mutex
+ * lock to avoid changes between start of reset and end of reset.
+ */
+void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt)
+{
+ struct ice_lag_netdev_list ndlist, *nl;
+ struct list_head *tmp, *n;
+ struct net_device *tmp_nd;
+
+ INIT_LIST_HEAD(&ndlist.node);
+ rcu_read_lock();
+ for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
+ nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
+ if (!nl)
+ break;
+
+ nl->netdev = tmp_nd;
+ list_add(&nl->node, &ndlist.node);
+ }
+ rcu_read_unlock();
+ lag->netdev_head = &ndlist.node;
+ ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
+
+ list_for_each_safe(tmp, n, &ndlist.node) {
+ nl = list_entry(tmp, struct ice_lag_netdev_list, node);
+ list_del(&nl->node);
+ kfree(nl);
+ }
+ lag->netdev_head = NULL;
+}
+
#define ICE_LAG_SRIOV_CP_RECIPE 10
#define ICE_LAG_SRIOV_TRAIN_PKT_LEN 16
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.h b/drivers/net/ethernet/intel/ice/ice_lag.h
index 9557e8605a07..ede833dfa658 100644
--- a/drivers/net/ethernet/intel/ice/ice_lag.h
+++ b/drivers/net/ethernet/intel/ice/ice_lag.h
@@ -65,4 +65,5 @@ int ice_init_lag(struct ice_pf *pf);
void ice_deinit_lag(struct ice_pf *pf);
void ice_lag_rebuild(struct ice_pf *pf);
bool ice_lag_is_switchdev_running(struct ice_pf *pf);
+void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt);
#endif /* _ICE_LAG_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index aca1f2ea5034..b7ae09952156 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -829,12 +829,16 @@ static void ice_notify_vf_reset(struct ice_vf *vf)
int ice_reset_vf(struct ice_vf *vf, u32 flags)
{
struct ice_pf *pf = vf->pf;
+ struct ice_lag *lag;
struct ice_vsi *vsi;
+ u8 act_prt, pri_prt;
struct device *dev;
int err = 0;
bool rsd;
dev = ice_pf_to_dev(pf);
+ act_prt = ICE_LAG_INVALID_PORT;
+ pri_prt = pf->hw.port_info->lport;
if (flags & ICE_VF_RESET_NOTIFY)
ice_notify_vf_reset(vf);
@@ -845,6 +849,17 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
return 0;
}
+ lag = pf->lag;
+ mutex_lock(&pf->lag_mutex);
+ if (lag && lag->bonded && lag->primary) {
+ act_prt = lag->active_port;
+ if (act_prt != pri_prt && act_prt != ICE_LAG_INVALID_PORT &&
+ lag->upper_netdev)
+ ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt);
+ else
+ act_prt = ICE_LAG_INVALID_PORT;
+ }
+
if (flags & ICE_VF_RESET_LOCK)
mutex_lock(&vf->cfg_lock);
else
@@ -937,6 +952,11 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
if (flags & ICE_VF_RESET_LOCK)
mutex_unlock(&vf->cfg_lock);
+ if (lag && lag->bonded && lag->primary &&
+ act_prt != ICE_LAG_INVALID_PORT)
+ ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
+ mutex_unlock(&pf->lag_mutex);
+
return err;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index cdf17b1e2f25..de11b3186bd7 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -1603,9 +1603,24 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
(struct virtchnl_vsi_queue_config_info *)msg;
struct virtchnl_queue_pair_info *qpi;
struct ice_pf *pf = vf->pf;
+ struct ice_lag *lag;
struct ice_vsi *vsi;
+ u8 act_prt, pri_prt;
int i = -1, q_idx;
+ lag = pf->lag;
+ mutex_lock(&pf->lag_mutex);
+ act_prt = ICE_LAG_INVALID_PORT;
+ pri_prt = pf->hw.port_info->lport;
+ if (lag && lag->bonded && lag->primary) {
+ act_prt = lag->active_port;
+ if (act_prt != pri_prt && act_prt != ICE_LAG_INVALID_PORT &&
+ lag->upper_netdev)
+ ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt);
+ else
+ act_prt = ICE_LAG_INVALID_PORT;
+ }
+
if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states))
goto error_param;
@@ -1729,6 +1744,11 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
}
}
+ if (lag && lag->bonded && lag->primary &&
+ act_prt != ICE_LAG_INVALID_PORT)
+ ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
+ mutex_unlock(&pf->lag_mutex);
+
/* send the response to the VF */
return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -1743,6 +1763,11 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
vf->vf_id, i);
}
+ if (lag && lag->bonded && lag->primary &&
+ act_prt != ICE_LAG_INVALID_PORT)
+ ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
+ mutex_unlock(&pf->lag_mutex);
+
ice_lag_move_new_vf_nodes(vf);
/* send the response to the VF */
--
2.41.0
* Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Maciej Fijalkowski @ 2023-11-16 14:22 UTC
To: Tony Nguyen
Cc: davem, kuba, pabeni, edumazet, netdev, Dave Ertman,
carolyn.wyborny, daniel.machon, Przemek Kitszel,
Sujai Buvaneswaran
On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
>
> There is an error when an interface has the following conditions:
> - PF is in an aggregate (bond)
> - PF has VFs created on it
> - bond is in a state where it is failed-over to the secondary interface
> - A VF reset is issued on one or more of those VFs
>
> The issue is generated by the originating PF trying to rebuild or
> reconfigure the VF resources. Since the bond is failed over to the
> secondary interface the queue contexts are in a modified state.
>
> To fix this issue, have the originating interface reclaim its resources
> prior to the tear-down and rebuild or reconfigure. Then after the process
> is complete, move the resources back to the currently active interface.
>
> There are multiple paths that can be used depending on what triggered the
> event, so create a helper function to move the queues and use paired calls
> to the helper (back to origin, process, then move back to active interface)
> under the same lag_mutex lock.
>
> Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
> This is the net patch mentioned yesterday:
> https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-3f043734e0ee@intel.com/
>
> drivers/net/ethernet/intel/ice/ice_lag.c | 42 +++++++++++++++++++
> drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> 4 files changed, 88 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
> index cd065ec48c87..9eed93baa59b 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag *lag, u8 oldport, u8 newport)
> ice_lag_move_single_vf_nodes(lag, oldport, newport, i);
> }
>
> +/**
> + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev event context
> + * @lag: local lag struct
> + * @src_prt: lport value for source port
> + * @dst_prt: lport value for destination port
> + *
> + * This function is used to move nodes during an out-of-netdev-event situation,
> + * primarily when the driver needs to reconfigure or recreate resources.
> + *
> + * Must be called while holding the lag_mutex to avoid lag events from
> + * processing while out-of-sync moves are happening. Also, paired moves,
> + * such as used in a reset flow, should both be called under the same mutex
> + * lock to avoid changes between start of reset and end of reset.
> + */
> +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt)
> +{
> + struct ice_lag_netdev_list ndlist, *nl;
> + struct list_head *tmp, *n;
> + struct net_device *tmp_nd;
> +
> + INIT_LIST_HEAD(&ndlist.node);
> + rcu_read_lock();
> + for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
Why do you need rcu section for that?
under mutex? lacking context here.
> + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
do these have to be new allocations or could you just use list_move?
> + if (!nl)
> + break;
> +
> + nl->netdev = tmp_nd;
> + list_add(&nl->node, &ndlist.node);
list_add_rcu ?
> + }
> + rcu_read_unlock();
you have the very same chunk of code in ice_lag_move_new_vf_nodes(). pull
this out to common function?
...and in ice_lag_rebuild().
> + lag->netdev_head = &ndlist.node;
> + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> +
> + list_for_each_safe(tmp, n, &ndlist.node) {
use list_for_each_entry_safe()
> + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> + list_del(&nl->node);
> + kfree(nl);
> + }
> + lag->netdev_head = NULL;
> +}
> +
> #define ICE_LAG_SRIOV_CP_RECIPE 10
> #define ICE_LAG_SRIOV_TRAIN_PKT_LEN 16
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.h b/drivers/net/ethernet/intel/ice/ice_lag.h
> index 9557e8605a07..ede833dfa658 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lag.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lag.h
> @@ -65,4 +65,5 @@ int ice_init_lag(struct ice_pf *pf);
> void ice_deinit_lag(struct ice_pf *pf);
> void ice_lag_rebuild(struct ice_pf *pf);
> bool ice_lag_is_switchdev_running(struct ice_pf *pf);
> +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt);
> #endif /* _ICE_LAG_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> index aca1f2ea5034..b7ae09952156 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> @@ -829,12 +829,16 @@ static void ice_notify_vf_reset(struct ice_vf *vf)
> int ice_reset_vf(struct ice_vf *vf, u32 flags)
> {
> struct ice_pf *pf = vf->pf;
> + struct ice_lag *lag;
> struct ice_vsi *vsi;
> + u8 act_prt, pri_prt;
> struct device *dev;
> int err = 0;
> bool rsd;
>
> dev = ice_pf_to_dev(pf);
> + act_prt = ICE_LAG_INVALID_PORT;
> + pri_prt = pf->hw.port_info->lport;
>
> if (flags & ICE_VF_RESET_NOTIFY)
> ice_notify_vf_reset(vf);
> @@ -845,6 +849,17 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
> return 0;
> }
>
> + lag = pf->lag;
> + mutex_lock(&pf->lag_mutex);
> + if (lag && lag->bonded && lag->primary) {
> + act_prt = lag->active_port;
> + if (act_prt != pri_prt && act_prt != ICE_LAG_INVALID_PORT &&
> + lag->upper_netdev)
> + ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt);
> + else
> + act_prt = ICE_LAG_INVALID_PORT;
> + }
> +
> if (flags & ICE_VF_RESET_LOCK)
> mutex_lock(&vf->cfg_lock);
> else
> @@ -937,6 +952,11 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
> if (flags & ICE_VF_RESET_LOCK)
> mutex_unlock(&vf->cfg_lock);
>
> + if (lag && lag->bonded && lag->primary &&
> + act_prt != ICE_LAG_INVALID_PORT)
> + ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
> + mutex_unlock(&pf->lag_mutex);
> +
> return err;
> }
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index cdf17b1e2f25..de11b3186bd7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -1603,9 +1603,24 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
> (struct virtchnl_vsi_queue_config_info *)msg;
> struct virtchnl_queue_pair_info *qpi;
> struct ice_pf *pf = vf->pf;
> + struct ice_lag *lag;
> struct ice_vsi *vsi;
> + u8 act_prt, pri_prt;
> int i = -1, q_idx;
>
> + lag = pf->lag;
> + mutex_lock(&pf->lag_mutex);
> + act_prt = ICE_LAG_INVALID_PORT;
> + pri_prt = pf->hw.port_info->lport;
> + if (lag && lag->bonded && lag->primary) {
> + act_prt = lag->active_port;
> + if (act_prt != pri_prt && act_prt != ICE_LAG_INVALID_PORT &&
> + lag->upper_netdev)
> + ice_lag_move_vf_nodes_cfg(lag, act_prt, pri_prt);
> + else
> + act_prt = ICE_LAG_INVALID_PORT;
> + }
> +
> if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states))
> goto error_param;
>
> @@ -1729,6 +1744,11 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
> }
> }
>
> + if (lag && lag->bonded && lag->primary &&
> + act_prt != ICE_LAG_INVALID_PORT)
> + ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
> + mutex_unlock(&pf->lag_mutex);
> +
> /* send the response to the VF */
> return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> @@ -1743,6 +1763,11 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
> vf->vf_id, i);
> }
>
> + if (lag && lag->bonded && lag->primary &&
> + act_prt != ICE_LAG_INVALID_PORT)
> + ice_lag_move_vf_nodes_cfg(lag, pri_prt, act_prt);
> + mutex_unlock(&pf->lag_mutex);
> +
> ice_lag_move_new_vf_nodes(vf);
>
> /* send the response to the VF */
> --
> 2.41.0
>
>
* RE: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Ertman, David M @ 2023-11-16 21:36 UTC
To: Fijalkowski, Maciej, Nguyen, Anthony L
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com, netdev@vger.kernel.org, Wyborny, Carolyn,
daniel.machon@microchip.com, Kitszel, Przemyslaw,
Buvaneswaran, Sujai
> -----Original Message-----
> From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> Sent: Thursday, November 16, 2023 6:22 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> edumazet@google.com; netdev@vger.kernel.org; Ertman, David M
> <david.m.ertman@intel.com>; Wyborny, Carolyn
> <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> <sujai.buvaneswaran@intel.com>
> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> over aggregate
>
> On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> > From: Dave Ertman <david.m.ertman@intel.com>
> >
> > There is an error when an interface has the following conditions:
> > - PF is in an aggregate (bond)
> > - PF has VFs created on it
> > - bond is in a state where it is failed-over to the secondary interface
> > - A VF reset is issued on one or more of those VFs
> >
> > The issue is generated by the originating PF trying to rebuild or
> > reconfigure the VF resources. Since the bond is failed over to the
> > secondary interface the queue contexts are in a modified state.
> >
> > To fix this issue, have the originating interface reclaim its resources
> > prior to the tear-down and rebuild or reconfigure. Then after the process
> > is complete, move the resources back to the currently active interface.
> >
> > There are multiple paths that can be used depending on what triggered the
> > event, so create a helper function to move the queues and use paired calls
> > to the helper (back to origin, process, then move back to active interface)
> > under the same lag_mutex lock.
> >
> > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV
> on bonded interface")
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > ---
> > This is the net patch mentioned yesterday:
> > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-
> 3f043734e0ee@intel.com/
> >
> > drivers/net/ethernet/intel/ice/ice_lag.c | 42 +++++++++++++++++++
> > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> > 4 files changed, 88 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c
> b/drivers/net/ethernet/intel/ice/ice_lag.c
> > index cd065ec48c87..9eed93baa59b 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag
> *lag, u8 oldport, u8 newport)
> > ice_lag_move_single_vf_nodes(lag, oldport,
> newport, i);
> > }
> >
> > +/**
> > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev
> event context
> > + * @lag: local lag struct
> > + * @src_prt: lport value for source port
> > + * @dst_prt: lport value for destination port
> > + *
> > + * This function is used to move nodes during an out-of-netdev-event
> situation,
> > + * primarily when the driver needs to reconfigure or recreate resources.
> > + *
> > + * Must be called while holding the lag_mutex to avoid lag events from
> > + * processing while out-of-sync moves are happening. Also, paired
> moves,
> > + * such as used in a reset flow, should both be called under the same
> mutex
> > + * lock to avoid changes between start of reset and end of reset.
> > + */
> > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8
> dst_prt)
> > +{
> > + struct ice_lag_netdev_list ndlist, *nl;
> > + struct list_head *tmp, *n;
> > + struct net_device *tmp_nd;
> > +
> > + INIT_LIST_HEAD(&ndlist.node);
> > + rcu_read_lock();
> > + for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
>
> Why do you need rcu section for that?
>
> under mutex? lacking context here.
>
Mutex lock is to stop lag event thread from processing any other event until
after the asynchronous reset is processed. RCU lock is to stop upper kernel
bonding driver from changing elements while reset is building a list.
> > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
>
> do these have to be new allocations or could you just use list_move?
>
Building a list from scratch - nothing to move until it is created.
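(Side note for readers following along: the GFP_ATOMIC above is not optional
style - the allocation happens inside an RCU read-side critical section, where
sleeping is forbidden. A minimal sketch of the constraint, with a made-up
function name and the ice-specific details elided:)

static void snapshot_bond_members(struct net_device *bond_dev,
				  struct list_head *head)
{
	struct ice_lag_netdev_list *nl;
	struct net_device *tmp_nd;

	rcu_read_lock();	/* read-side critical section: no sleeping */
	for_each_netdev_in_bond_rcu(bond_dev, tmp_nd) {
		/* GFP_KERNEL may sleep and would be a bug here */
		nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
		if (!nl)
			break;
		nl->netdev = tmp_nd;
		list_add(&nl->node, head);
	}
	rcu_read_unlock();
}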
> > + if (!nl)
> > + break;
> > +
> > + nl->netdev = tmp_nd;
> > + list_add(&nl->node, &ndlist.node);
>
> list_add_rcu ?
>
I thought list_add_rcu was for internal list manipulation, when prev and next
are both known and defined?
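(For reference, the distinction is about concurrent readers rather than about
prev/next: list_add_rcu() is only needed when other CPUs may walk the list
under rcu_read_lock() without taking a lock, so new nodes must be published
with the right memory ordering. The ndlist built above is private to the
function until it is hung off lag->netdev_head under the lag_mutex, so plain
list_add() is correct. A generic sketch of the case that would need the RCU
variant, with illustrative names:)

/* A list walked by lockless readers; see <linux/rculist.h>. */
struct item {
	int val;
	struct list_head node;
};

static LIST_HEAD(shared_list);		/* traversed under RCU */
static DEFINE_SPINLOCK(shared_lock);	/* serializes writers */

static void publish_item(struct item *it)
{
	spin_lock(&shared_lock);
	/* ordered publish so a reader never sees a half-initialized node */
	list_add_rcu(&it->node, &shared_list);
	spin_unlock(&shared_lock);
}

static int sum_items(void)
{
	struct item *it;
	int sum = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(it, &shared_list, node)
		sum += it->val;
	rcu_read_unlock();
	return sum;
}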
> > + }
> > + rcu_read_unlock();
>
> you have the very same chunk of code in ice_lag_move_new_vf_nodes().
> pull
> this out to common function?
>
> ...and in ice_lag_rebuild().
>
Nice catch - for v2, pulled the code out into two helper functions:
ice_lag_build_netdev_list()
ice_lag_destroy_netdev_list()
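(For reference, one plausible shape for those helpers, sketched from the code
in this patch - the actual v2 implementation may differ:)

static void ice_lag_build_netdev_list(struct ice_lag *lag,
				      struct ice_lag_netdev_list *ndlist)
{
	struct ice_lag_netdev_list *nl;
	struct net_device *tmp_nd;

	INIT_LIST_HEAD(&ndlist->node);
	rcu_read_lock();
	for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
		nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
		if (!nl)
			break;
		nl->netdev = tmp_nd;
		list_add(&nl->node, &ndlist->node);
	}
	rcu_read_unlock();
	lag->netdev_head = &ndlist->node;
}

static void ice_lag_destroy_netdev_list(struct ice_lag *lag,
					struct ice_lag_netdev_list *ndlist)
{
	struct ice_lag_netdev_list *entry, *n;

	list_for_each_entry_safe(entry, n, &ndlist->node, node) {
		list_del(&entry->node);
		kfree(entry);
	}
	lag->netdev_head = NULL;
}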
> > + lag->netdev_head = &ndlist.node;
> > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> > +
> > + list_for_each_safe(tmp, n, &ndlist.node) {
>
> use list_for_each_entry_safe()
>
Changed in v2.
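(That is, the teardown loop collapses to something like the following, with
typed cursors instead of the raw list_head pointers:)

	struct ice_lag_netdev_list *entry, *n;

	list_for_each_entry_safe(entry, n, &ndlist.node, node) {
		list_del(&entry->node);
		kfree(entry);
	}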
> > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> > + list_del(&nl->node);
> > + kfree(nl);
> > + }
> > + lag->netdev_head = NULL;
Thanks for the review!
DaveE
* Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Maciej Fijalkowski @ 2023-11-17 17:44 UTC
To: Ertman, David M
Cc: Nguyen, Anthony L, davem@davemloft.net, kuba@kernel.org,
pabeni@redhat.com, edumazet@google.com, netdev@vger.kernel.org,
Wyborny, Carolyn, daniel.machon@microchip.com,
Kitszel, Przemyslaw, Buvaneswaran, Sujai
On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
> > -----Original Message-----
> > From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> > Sent: Thursday, November 16, 2023 6:22 AM
> > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> > Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> > edumazet@google.com; netdev@vger.kernel.org; Ertman, David M
> > <david.m.ertman@intel.com>; Wyborny, Carolyn
> > <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> > <sujai.buvaneswaran@intel.com>
> > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> > over aggregate
> >
> > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> > > From: Dave Ertman <david.m.ertman@intel.com>
> > >
> > > There is an error when an interface has the following conditions:
> > > - PF is in an aggregate (bond)
> > > - PF has VFs created on it
> > > - bond is in a state where it is failed-over to the secondary interface
> > > - A VF reset is issued on one or more of those VFs
> > >
> > > The issue is generated by the originating PF trying to rebuild or
> > > reconfigure the VF resources. Since the bond is failed over to the
> > > secondary interface the queue contexts are in a modified state.
> > >
> > > To fix this issue, have the originating interface reclaim its resources
> > > prior to the tear-down and rebuild or reconfigure. Then after the process
> > > is complete, move the resources back to the currently active interface.
> > >
> > > There are multiple paths that can be used depending on what triggered the
> > > event, so create a helper function to move the queues and use paired calls
> > > to the helper (back to origin, process, then move back to active interface)
> > > under the same lag_mutex lock.
> > >
> > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV
> > on bonded interface")
> > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
> > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > ---
> > > This is the net patch mentioned yesterday:
> > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-
> > 3f043734e0ee@intel.com/
> > >
> > > drivers/net/ethernet/intel/ice/ice_lag.c | 42 +++++++++++++++++++
> > > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> > > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> > > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> > > 4 files changed, 88 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c
> > b/drivers/net/ethernet/intel/ice/ice_lag.c
> > > index cd065ec48c87..9eed93baa59b 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag
> > *lag, u8 oldport, u8 newport)
> > > ice_lag_move_single_vf_nodes(lag, oldport,
> > newport, i);
> > > }
> > >
> > > +/**
> > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev
> > event context
> > > + * @lag: local lag struct
> > > + * @src_prt: lport value for source port
> > > + * @dst_prt: lport value for destination port
> > > + *
> > > + * This function is used to move nodes during an out-of-netdev-event
> > situation,
> > > + * primarily when the driver needs to reconfigure or recreate resources.
> > > + *
> > > + * Must be called while holding the lag_mutex to avoid lag events from
> > > + * processing while out-of-sync moves are happening. Also, paired
> > moves,
> > > + * such as used in a reset flow, should both be called under the same
> > mutex
> > > + * lock to avoid changes between start of reset and end of reset.
> > > + */
> > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8
> > dst_prt)
> > > +{
> > > + struct ice_lag_netdev_list ndlist, *nl;
> > > + struct list_head *tmp, *n;
> > > + struct net_device *tmp_nd;
> > > +
> > > + INIT_LIST_HEAD(&ndlist.node);
> > > + rcu_read_lock();
> > > + for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
> >
> > Why do you need rcu section for that?
> >
> > under mutex? lacking context here.
> >
>
> Mutex lock is to stop lag event thread from processing any other event until
> after the asynchronous reset is processed. RCU lock is to stop upper kernel
> bonding driver from changing elements while reset is building a list.
Can you point me to the relevant piece of code in the upper kernel bonding
driver? Is there synchronize_rcu() on updates?
>
> > > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
> >
> > do these have to be new allocations or could you just use list_move?
> >
>
> Building a list from scratch - nothing to move until it is created.
Sorry got confused.
Couldn't you keep an up-to-date list of netdevs instead, and avoid all of
the building of the list and then deleting it after processing the event?
>
> > > + if (!nl)
> > > + break;
> > > +
> > > + nl->netdev = tmp_nd;
> > > + list_add(&nl->node, &ndlist.node);
> >
> > list_add_rcu ?
> >
>
> I thought list_add_rcu was for internal list manipulation, when prev and next
> are both known and defined?
First time I hear this TBH but disregard the suggestion.
>
> > > + }
> > > + rcu_read_unlock();
> >
> > you have the very same chunk of code in ice_lag_move_new_vf_nodes().
> > pull
> > this out to common function?
> >
> > ...and in ice_lag_rebuild().
> >
>
> Nice catch - for v2, pulled the code out into two helper functions:
> ice_lag_build_netdev_list()
> ice_lag_destroy_netdev_list()
>
>
> > > + lag->netdev_head = &ndlist.node;
> > > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> > > +
> > > + list_for_each_safe(tmp, n, &ndlist.node) {
> >
> > use list_for_each_entry_safe()
> >
>
> Changed in v2.
>
> > > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> > > + list_del(&nl->node);
> > > + kfree(nl);
> > > + }
> > > + lag->netdev_head = NULL;
>
> Thanks for the review!
> DaveE
* RE: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Ertman, David M @ 2023-11-17 18:17 UTC
To: Fijalkowski, Maciej
Cc: Nguyen, Anthony L, davem@davemloft.net, kuba@kernel.org,
pabeni@redhat.com, edumazet@google.com, netdev@vger.kernel.org,
Wyborny, Carolyn, daniel.machon@microchip.com,
Kitszel, Przemyslaw, Buvaneswaran, Sujai
> -----Original Message-----
> From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> Sent: Friday, November 17, 2023 9:44 AM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
> davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> edumazet@google.com; netdev@vger.kernel.org; Wyborny, Carolyn
> <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> <sujai.buvaneswaran@intel.com>
> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> over aggregate
>
> On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
> > > -----Original Message-----
> > > From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> > > Sent: Thursday, November 16, 2023 6:22 AM
> > > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> > > Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> > > edumazet@google.com; netdev@vger.kernel.org; Ertman, David M
> > > <david.m.ertman@intel.com>; Wyborny, Carolyn
> > > <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
> > > Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> > > <sujai.buvaneswaran@intel.com>
> > > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> > > over aggregate
> > >
> > > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> > > > From: Dave Ertman <david.m.ertman@intel.com>
> > > >
> > > > There is an error when an interface has the following conditions:
> > > > - PF is in an aggregate (bond)
> > > > - PF has VFs created on it
> > > > - bond is in a state where it is failed-over to the secondary interface
> > > > - A VF reset is issued on one or more of those VFs
> > > >
> > > > The issue is generated by the originating PF trying to rebuild or
> > > > reconfigure the VF resources. Since the bond is failed over to the
> > > > secondary interface the queue contexts are in a modified state.
> > > >
> > > > To fix this issue, have the originating interface reclaim its resources
> > > > prior to the tear-down and rebuild or reconfigure. Then after the
> process
> > > > is complete, move the resources back to the currently active interface.
> > > >
> > > > There are multiple paths that can be used depending on what triggered
> the
> > > > event, so create a helper function to move the queues and use paired
> calls
> > > > to the helper (back to origin, process, then move back to active
> interface)
> > > > under the same lag_mutex lock.
> > > >
> > > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for
> SRIOV
> > > on bonded interface")
> > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > > ---
> > > > This is the net patch mentioned yesterday:
> > > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-
> > > 3f043734e0ee@intel.com/
> > > >
> > > > drivers/net/ethernet/intel/ice/ice_lag.c | 42
> +++++++++++++++++++
> > > > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> > > > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> > > > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> > > > 4 files changed, 88 insertions(+)
> > > >
> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c
> > > b/drivers/net/ethernet/intel/ice/ice_lag.c
> > > > index cd065ec48c87..9eed93baa59b 100644
> > > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> > > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct
> ice_lag
> > > *lag, u8 oldport, u8 newport)
> > > > ice_lag_move_single_vf_nodes(lag, oldport,
> > > newport, i);
> > > > }
> > > >
> > > > +/**
> > > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev
> > > event context
> > > > + * @lag: local lag struct
> > > > + * @src_prt: lport value for source port
> > > > + * @dst_prt: lport value for destination port
> > > > + *
> > > > + * This function is used to move nodes during an out-of-netdev-event
> > > situation,
> > > > + * primarily when the driver needs to reconfigure or recreate
> resources.
> > > > + *
> > > > + * Must be called while holding the lag_mutex to avoid lag events from
> > > > + * processing while out-of-sync moves are happening. Also, paired
> > > moves,
> > > > + * such as used in a reset flow, should both be called under the same
> > > mutex
> > > > + * lock to avoid changes between start of reset and end of reset.
> > > > + */
> > > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8
> > > dst_prt)
> > > > +{
> > > > + struct ice_lag_netdev_list ndlist, *nl;
> > > > + struct list_head *tmp, *n;
> > > > + struct net_device *tmp_nd;
> > > > +
> > > > + INIT_LIST_HEAD(&ndlist.node);
> > > > + rcu_read_lock();
> > > > + for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
> > >
> > > Why do you need rcu section for that?
> > >
> > > under mutex? lacking context here.
> > >
> >
> > Mutex lock is to stop lag event thread from processing any other event
> until
> > after the asynchronous reset is processed. RCU lock is to stop upper kernel
> > bonding driver from changing elements while reset is building a list.
>
> Can you point me to the relevant piece of code in the upper kernel bonding
> driver? Is there synchronize_rcu() on updates?
Here is the beginning of the bonding struct from include/net/bonding.h:
/*
* Here are the locking policies for the two bonding locks:
* Get rcu_read_lock when reading or RTNL when writing slave list.
*/
struct bonding {
struct net_device *dev; /* first - useful for panic debug */
struct slave __rcu *curr_active_slave;
struct slave __rcu *current_arp_slave;
struct slave __rcu *primary_slave;
struct bond_up_slave __rcu *usable_slaves;
struct bond_up_slave __rcu *all_slaves;
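(For readers: the comment quoted above implies the usual RCU discipline -
writers serialize on RTNL and swap pointers, readers run lockless under
rcu_read_lock(), and frees must be deferred past a grace period with
synchronize_rcu() or kfree_rcu(). A generic sketch of that pattern,
illustrative rather than bonding's literal code:)

struct cfg {
	int value;
	struct rcu_head rcu;
};

static struct cfg __rcu *active_cfg;

/* Writer side: caller holds RTNL. */
static void update_cfg(struct cfg *newc)
{
	struct cfg *old;

	old = rcu_dereference_protected(active_cfg, lockdep_rtnl_is_held());
	rcu_assign_pointer(active_cfg, newc);	/* publish the new version */
	if (old)
		kfree_rcu(old, rcu);	/* freed only after all readers finish */
}

/* Reader side: lockless and cheap. */
static int read_cfg_value(void)
{
	struct cfg *c;
	int val = 0;

	rcu_read_lock();
	c = rcu_dereference(active_cfg);
	if (c)
		val = c->value;
	rcu_read_unlock();
	return val;
}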
> >
> > > > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
> > >
> > > do these have to be new allocations or could you just use list_move?
> > >
> >
> > Building a list from scratch - nothing to move until it is created.
>
> Sorry got confused.
>
> Couldn't you keep an up-to-date list of netdevs instead, and avoid all of
> the building of the list and then deleting it after processing the event?
>
The bonding driver generates netdev events for things changing in the aggregate. The ice
driver's event handler takes a snapshot of the members of the bond, creates a work
item that gets put on the event-processing thread, and then returns. The events are processed
one at a time, in sequence, asynchronously to the event handler, on the processing thread. The
contents of the member list for a work item are only valid for that work item and cannot be used
for a reset event happening asynchronously to the processing queue.
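(Sketched generically, the flow described above looks like this - all names
here are illustrative, not the actual ice code:)

static struct workqueue_struct *lag_wq;	/* alloc_ordered_workqueue(): one item at a time */

struct lag_work {
	struct work_struct work;
	struct list_head netdev_list;	/* snapshot: valid only for this work item */
};

static void lag_process_event(struct work_struct *work)
{
	struct lag_work *lw = container_of(work, struct lag_work, work);

	/* ... act on the snapshot of bond members, then free its entries ... */
	kfree(lw);			/* the snapshot dies with the work item */
}

/* Netdev-event handler: snapshot and queue, then return immediately. */
static int lag_event_handler(struct net_device *bond_dev)
{
	struct lag_work *lw = kzalloc(sizeof(*lw), GFP_ATOMIC);

	if (!lw)
		return -ENOMEM;
	INIT_LIST_HEAD(&lw->netdev_list);
	/* walk bond_dev's members under rcu_read_lock() to fill netdev_list */
	INIT_WORK(&lw->work, lag_process_event);
	queue_work(lag_wq, &lw->work);	/* processed later, in sequence */
	return 0;
}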
Thanks,
DaveE
> >
> > > > + if (!nl)
> > > > + break;
> > > > +
> > > > + nl->netdev = tmp_nd;
> > > > + list_add(&nl->node, &ndlist.node);
> > >
> > > list_add_rcu ?
> > >
> >
> > I thought list_add_rcu was for internal list manipulation, when prev and next
> > are both known and defined?
>
> First time I hear this TBH but disregard the suggestion.
>
> >
> > > > + }
> > > > + rcu_read_unlock();
> > >
> > > you have the very same chunk of code in
> ice_lag_move_new_vf_nodes().
> > > pull
> > > this out to common function?
> > >
> > > ...and in ice_lag_rebuild().
> > >
> >
> > Nice catch - for v2, pulled the code out into two helper functions:
> > ice_lag_build_netdev_list()
> > ice_lag_destroy_netdev_list()
> >
> >
> > > > + lag->netdev_head = &ndlist.node;
> > > > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> > > > +
> > > > + list_for_each_safe(tmp, n, &ndlist.node) {
> > >
> > > use list_for_each_entry_safe()
> > >
> >
> > Changed in v2.
> >
> > > > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> > > > + list_del(&nl->node);
> > > > + kfree(nl);
> > > > + }
> > > > + lag->netdev_head = NULL;
> >
> > Thanks for the review!
> > DaveE
* Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Jay Vosburgh @ 2023-11-17 21:21 UTC
To: Ertman, David M
Cc: Fijalkowski, Maciej, Nguyen, Anthony L, davem@davemloft.net,
kuba@kernel.org, pabeni@redhat.com, edumazet@google.com,
netdev@vger.kernel.org, Wyborny, Carolyn,
daniel.machon@microchip.com, Kitszel, Przemyslaw,
Buvaneswaran, Sujai
Ertman, David M <david.m.ertman@intel.com> wrote:
>> -----Original Message-----
>> From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
>> Sent: Friday, November 17, 2023 9:44 AM
>> To: Ertman, David M <david.m.ertman@intel.com>
>> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
>> davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
>> edumazet@google.com; netdev@vger.kernel.org; Wyborny, Carolyn
>> <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
>> Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
>> <sujai.buvaneswaran@intel.com>
>> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
>> over aggregate
>>
>> On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
>> > > -----Original Message-----
>> > > From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
>> > > Sent: Thursday, November 16, 2023 6:22 AM
>> > > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
>> > > Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
>> > > edumazet@google.com; netdev@vger.kernel.org; Ertman, David M
>> > > <david.m.ertman@intel.com>; Wyborny, Carolyn
>> > > <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
>> > > Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
>> > > <sujai.buvaneswaran@intel.com>
>> > > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
>> > > over aggregate
>> > >
>> > > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
>> > > > From: Dave Ertman <david.m.ertman@intel.com>
>> > > >
>> > > > There is an error when an interface has the following conditions:
>> > > > - PF is in an aggregate (bond)
>> > > > - PF has VFs created on it
>> > > > - bond is in a state where it is failed-over to the secondary interface
>> > > > - A VF reset is issued on one or more of those VFs
>> > > >
>> > > > The issue is generated by the originating PF trying to rebuild or
>> > > > reconfigure the VF resources. Since the bond is failed over to the
>> > > > secondary interface the queue contexts are in a modified state.
>> > > >
>> > > > To fix this issue, have the originating interface reclaim its resources
>> > > > prior to the tear-down and rebuild or reconfigure. Then after the
>> process
>> > > > is complete, move the resources back to the currently active interface.
>> > > >
>> > > > There are multiple paths that can be used depending on what triggered
>> the
>> > > > event, so create a helper function to move the queues and use paired
>> calls
>> > > > to the helper (back to origin, process, then move back to active
>> interface)
>> > > > under the same lag_mutex lock.
>> > > >
>> > > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for
>> SRIOV
>> > > on bonded interface")
>> > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
>> > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
>> > > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
>> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>> > > > ---
>> > > > This is the net patch mentioned yesterday:
>> > > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-
>> > > 3f043734e0ee@intel.com/
>> > > >
>> > > > drivers/net/ethernet/intel/ice/ice_lag.c | 42
>> +++++++++++++++++++
>> > > > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
>> > > > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
>> > > > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
>> > > > 4 files changed, 88 insertions(+)
>> > > >
>> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > b/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > index cd065ec48c87..9eed93baa59b 100644
>> > > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct
>> ice_lag
>> > > *lag, u8 oldport, u8 newport)
>> > > > ice_lag_move_single_vf_nodes(lag, oldport,
>> > > newport, i);
>> > > > }
>> > > >
>> > > > +/**
>> > > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev
>> > > event context
>> > > > + * @lag: local lag struct
>> > > > + * @src_prt: lport value for source port
>> > > > + * @dst_prt: lport value for destination port
>> > > > + *
>> > > > + * This function is used to move nodes during an out-of-netdev-event
>> > > situation,
>> > > > + * primarily when the driver needs to reconfigure or recreate
>> resources.
>> > > > + *
>> > > > + * Must be called while holding the lag_mutex to avoid lag events from
>> > > > + * processing while out-of-sync moves are happening. Also, paired
>> > > moves,
>> > > > + * such as used in a reset flow, should both be called under the same
>> > > mutex
>> > > > + * lock to avoid changes between start of reset and end of reset.
>> > > > + */
>> > > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8
>> > > dst_prt)
>> > > > +{
>> > > > + struct ice_lag_netdev_list ndlist, *nl;
>> > > > + struct list_head *tmp, *n;
>> > > > + struct net_device *tmp_nd;
>> > > > +
>> > > > + INIT_LIST_HEAD(&ndlist.node);
>> > > > + rcu_read_lock();
>> > > > + for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
>> > >
>> > > Why do you need rcu section for that?
>> > >
>> > > under mutex? lacking context here.
>> > >
>> >
>> > Mutex lock is to stop lag event thread from processing any other event
>> until
>> > after the asynchronous reset is processed. RCU lock is to stop upper kernel
>> > bonding driver from changing elements while reset is building a list.
>>
>> Can you point me to the relevant piece of code in the upper kernel bonding
>> driver? Is there synchronize_rcu() on updates?
>
>Here is the beginning of the bonding struct from include/net/bonding.h:
>
>/*
> * Here are the locking policies for the two bonding locks:
> * Get rcu_read_lock when reading or RTNL when writing slave list.
> */
>struct bonding {
> struct net_device *dev; /* first - useful for panic debug */
> struct slave __rcu *curr_active_slave;
> struct slave __rcu *current_arp_slave;
> struct slave __rcu *primary_slave;
> struct bond_up_slave __rcu *usable_slaves;
> struct bond_up_slave __rcu *all_slaves;
>
>> >
>> > > > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
>> > >
>> > > do these have to be new allocations or could you just use list_move?
>> > >
>> >
>> > Building a list from scratch - nothing to move until it is created.
>>
>> Sorry got confused.
>>
>> Couldn't you keep an up-to-date list of netdevs instead, and avoid all of
>> the building of the list and then deleting it after processing the event?
>>
>
>The bonding driver generates netdev events for things changing in the aggregate. The ice
>driver's event handler takes a snapshot of the members of the bond, creates a work
>item that gets put on the event-processing thread, and then returns. The events are processed
>one at a time, in sequence, asynchronously to the event handler, on the processing thread. The
>contents of the member list for a work item are only valid for that work item and cannot be used
>for a reset event happening asynchronously to the processing queue.
Why is ice keeping track of the bonding state? I see there's
also concurrently a patch [0] to block PF reinitialization if attached
to a bond, as well as some upstream discussion regarding ice issues with
bonding and SR-IOV [1]; are these things related in some way?
Whatever that reason is, should the logic apply to other drivers
that do similar things? For example, team and openvswitch both have
functionality that is largely similar to what bonding does, so I'm
curious as to what specifically is going on that requires special
handling on the part of the device driver.
-J
[0]
https://lore.kernel.org/netdev/20231117164427.912563-1-sachin.bahadur@intel.com/
[1]
https://lore.kernel.org/lkml/CC024511-980A-4508-8ABF-659A04367C2B@gmail.com/T/#mde6cc7110fedf54771aa3c13044ae6c0936525fb
>Thanks,
>DaveE
>
>> >
>> > > > + if (!nl)
>> > > > + break;
>> > > > +
>> > > > + nl->netdev = tmp_nd;
>> > > > + list_add(&nl->node, &ndlist.node);
>> > >
>> > > list_add_rcu ?
>> > >
>> >
>> > I thought list_add_rcu was for internal list manipulation, when prev and next
>> > are both known and defined?
>>
>> First time I hear this TBH but disregard the suggestion.
>>
>> >
>> > > > + }
>> > > > + rcu_read_unlock();
>> > >
>> > > you have the very same chunk of code in
>> ice_lag_move_new_vf_nodes().
>> > > pull
>> > > this out to common function?
>> > >
>> > > ...and in ice_lag_rebuild().
>> > >
>> >
>> > Nice catch - for v2, pulled the code out into two helper functions:
>> > ice_lag_build_netdev_list()
>> > ice_lag_destroy_netdev_list()
>> >
>> >
>> > > > + lag->netdev_head = &ndlist.node;
>> > > > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
>> > > > +
>> > > > + list_for_each_safe(tmp, n, &ndlist.node) {
>> > >
>> > > use list_for_each_entry_safe()
>> > >
>> >
>> > Changed in v2.
>> >
>> > > > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
>> > > > + list_del(&nl->node);
>> > > > + kfree(nl);
>> > > > + }
>> > > > + lag->netdev_head = NULL;
>> >
>> > Thanks for the review!
>> > DaveE
>
---
-Jay Vosburgh, jay.vosburgh@canonical.com
* RE: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate
From: Ertman, David M @ 2023-11-17 22:03 UTC
To: Jay Vosburgh
Cc: Fijalkowski, Maciej, Nguyen, Anthony L, davem@davemloft.net,
kuba@kernel.org, pabeni@redhat.com, edumazet@google.com,
netdev@vger.kernel.org, Wyborny, Carolyn,
daniel.machon@microchip.com, Kitszel, Przemyslaw,
Buvaneswaran, Sujai
> -----Original Message-----
> From: Jay Vosburgh <jay.vosburgh@canonical.com>
> Sent: Friday, November 17, 2023 1:22 PM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; edumazet@google.com; netdev@vger.kernel.org;
> Wyborny, Carolyn <carolyn.wyborny@intel.com>;
> daniel.machon@microchip.com; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> <sujai.buvaneswaran@intel.com>
> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> over aggregate
>
> Ertman, David M <david.m.ertman@intel.com> wrote:
>
> >> -----Original Message-----
> >> From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> >> Sent: Friday, November 17, 2023 9:44 AM
> >> To: Ertman, David M <david.m.ertman@intel.com>
> >> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
> >> davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> >> edumazet@google.com; netdev@vger.kernel.org; Wyborny, Carolyn
> >> <carolyn.wyborny@intel.com>; daniel.machon@microchip.com; Kitszel,
> >> Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> >> <sujai.buvaneswaran@intel.com>
> >> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> >> over aggregate
> >>
> >> On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
> >> > > -----Original Message-----
> >> > > From: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>
> >> > > Sent: Thursday, November 16, 2023 6:22 AM
> >> > > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> >> > > Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> >> > > edumazet@google.com; netdev@vger.kernel.org; Ertman, David M
> >> > > <david.m.ertman@intel.com>; Wyborny, Carolyn
> >> > > <carolyn.wyborny@intel.com>; daniel.machon@microchip.com;
> Kitszel,
> >> > > Przemyslaw <przemyslaw.kitszel@intel.com>; Buvaneswaran, Sujai
> >> > > <sujai.buvaneswaran@intel.com>
> >> > > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a
> failed
> >> > > over aggregate
> >> > >
> >> > > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> >> > > > From: Dave Ertman <david.m.ertman@intel.com>
> >> > > >
> >> > > > There is an error when an interface has the following conditions:
> >> > > > - PF is in an aggregate (bond)
> >> > > > - PF has VFs created on it
> >> > > > - bond is in a state where it is failed-over to the secondary interface
> >> > > > - A VF reset is issued on one or more of those VFs
> >> > > >
> >> > > > The issue is generated by the originating PF trying to rebuild or
> >> > > > reconfigure the VF resources. Since the bond is failed over to the
> >> > > > secondary interface the queue contexts are in a modified state.
> >> > > >
> >> > > > To fix this issue, have the originating interface reclaim its resources
> >> > > > prior to the tear-down and rebuild or reconfigure. Then after the
> >> process
> >> > > > is complete, move the resources back to the currently active
> interface.
> >> > > >
> >> > > > There are multiple paths that can be used depending on what
> triggered
> >> the
> >> > > > event, so create a helper function to move the queues and use
> paired
> >> calls
> >> > > > to the helper (back to origin, process, then move back to active
> >> interface)
> >> > > > under the same lag_mutex lock.
> >> > > >
> >> > > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for
> >> SRIOV
> >> > > on bonded interface")
> >> > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> >> > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> >> > > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
> >> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> >> > > > ---
> >> > > > This is the net patch mentioned yesterday:
> >> > > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-
> >> > > 3f043734e0ee@intel.com/
> >> > > >
> >> > > > drivers/net/ethernet/intel/ice/ice_lag.c | 42
> >> +++++++++++++++++++
> >> > > > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> >> > > > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> >> > > > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> >> > > > 4 files changed, 88 insertions(+)
> >> > > >
> >> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > b/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > index cd065ec48c87..9eed93baa59b 100644
> >> > > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct
> >> ice_lag
> >> > > *lag, u8 oldport, u8 newport)
> >> > > > ice_lag_move_single_vf_nodes(lag, oldport,
> >> > > newport, i);
> >> > > > }
> >> > > >
> >> > > > +/**
> >> > > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG
> netdev
> >> > > event context
> >> > > > + * @lag: local lag struct
> >> > > > + * @src_prt: lport value for source port
> >> > > > + * @dst_prt: lport value for destination port
> >> > > > + *
> >> > > > + * This function is used to move nodes during an out-of-netdev-
> event
> >> > > situation,
> >> > > > + * primarily when the driver needs to reconfigure or recreate
> >> resources.
> >> > > > + *
> >> > > > + * Must be called while holding the lag_mutex to avoid lag events
> from
> >> > > > + * processing while out-of-sync moves are happening. Also, paired
> >> > > moves,
> >> > > > + * such as used in a reset flow, should both be called under the
> same
> >> > > mutex
> >> > > > + * lock to avoid changes between start of reset and end of reset.
> >> > > > + */
> >> > > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8
> >> > > dst_prt)
> >> > > > +{
> >> > > > + struct ice_lag_netdev_list ndlist, *nl;
> >> > > > + struct list_head *tmp, *n;
> >> > > > + struct net_device *tmp_nd;
> >> > > > +
> >> > > > + INIT_LIST_HEAD(&ndlist.node);
> >> > > > + rcu_read_lock();
> >> > > > + for_each_netdev_in_bond_rcu(lag->upper_netdev,
> tmp_nd) {
> >> > >
> >> > > Why do you need rcu section for that?
> >> > >
> >> > > under mutex? lacking context here.
> >> > >
> >> >
> >> > Mutex lock is to stop lag event thread from processing any other event
> >> until
> >> > after the asynchronous reset is processed. RCU lock is to stop upper
> kernel
> >> > bonding driver from changing elements while reset is building a list.
> >>
> >> Can you point me to the relevant piece of code in the upper kernel bonding
> >> driver? Is there synchronize_rcu() on updates?
> >
> >Here is the beginning of the bonding struct from include/net/bonding.h:
> >
> >/*
> > * Here are the locking policies for the two bonding locks:
> > * Get rcu_read_lock when reading or RTNL when writing slave list.
> > */
> >struct bonding {
> > struct net_device *dev; /* first - useful for panic debug */
> > struct slave __rcu *curr_active_slave;
> > struct slave __rcu *current_arp_slave;
> > struct slave __rcu *primary_slave;
> > struct bond_up_slave __rcu *usable_slaves;
> > struct bond_up_slave __rcu *all_slaves;
> >
> >> >
> >> > > > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
> >> > >
> >> > > do these have to be new allocations or could you just use list_move?
> >> > >
> >> >
> >> > Building a list from scratch - nothing to move until it is created.
> >>
> >> Sorry got confused.
> >>
> >> Couldn't you keep an up-to-date list of netdevs instead, and avoid all of
> >> the building of the list and then deleting it after processing the event?
> >>
> >
> >The bonding driver generates netdev events for things changing in the aggregate. The ice
> >driver's event handler takes a snapshot of the members of the bond, creates a work
> >item that gets put on the event-processing thread, and then returns. The events are processed
> >one at a time, in sequence, asynchronously to the event handler, on the processing thread. The
> >contents of the member list for a work item are only valid for that work item and cannot be used
> >for a reset event happening asynchronously to the processing queue.
>
> Why is ice keeping track of the bonding state? I see there's
> also concurrently a patch [0] to block PF reinitialization if attached
> to a bond, as well as some upstream discussion regarding ice issues with
> bonding and SR-IOV [1], are these things related in some way?
>
> Whatever that reason is, should the logic apply to other drivers
> that do similar things? For example, team and openvswitch both have
> functionality that is largely similar to what bonding does, so I'm
> curious as to what specifically is going on that requires special
> handling on the part of the device driver.
>
> -J
>
The original patches implement support for SRIOV VFs on an aggregate
interface (bonded PFs). This results in a single VF being backed by two PFs. The VF
and users of the VF are not even aware that an aggregate is involved. To accomplish this
in software, the ice driver has to manipulate resources (queues, scheduling nodes, etc.)
so that VFs can communicate through whichever interface is active. It has to move
resources back and forth based on events generated by the bonding driver. Also, as interfaces
are added to an aggregate, setup has to be done to prepare to support SRIOV VFs.
As for other entities that act like the bonding driver: they will only have support for SRIOV
VFs if they also generate events the way the bonding driver does.
> [0]
> https://lore.kernel.org/netdev/20231117164427.912563-1-
> sachin.bahadur@intel.com/
>
> [1]
> https://lore.kernel.org/lkml/CC024511-980A-4508-8ABF-
> 659A04367C2B@gmail.com/T/#mde6cc7110fedf54771aa3c13044ae6c0936525f
> b
>
> >Thanks,
> >DaveE
> >
> >> >
> >> > > > + if (!nl)
> >> > > > + break;
> >> > > > +
> >> > > > + nl->netdev = tmp_nd;
> >> > > > + list_add(&nl->node, &ndlist.node);
> >> > >
> >> > > list_add_rcu ?
> >> > >
> >> >
> >> > I thought list_add_rcu was for internal list manipulation, when prev and next
> >> > are both known and defined?
> >>
> >> First time I hear this TBH but disregard the suggestion.
> >>
> >> >
> >> > > > + }
> >> > > > + rcu_read_unlock();
> >> > >
> >> > > you have the very same chunk of code in
> >> ice_lag_move_new_vf_nodes().
> >> > > pull
> >> > > this out to common function?
> >> > >
> >> > > ...and in ice_lag_rebuild().
> >> > >
> >> >
> >> > Nice catch - for v2, pulled the code out into two helper functions:
> >> > ice_lag_build_netdev_list()
> >> > ice_lag_destroy_netdev_list()
> >> >
> >> >
> >> > > > + lag->netdev_head = &ndlist.node;
> >> > > > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> >> > > > +
> >> > > > + list_for_each_safe(tmp, n, &ndlist.node) {
> >> > >
> >> > > use list_for_each_entry_safe()
> >> > >
> >> >
> >> > Changed in v2.
> >> >
> >> > > > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> >> > > > + list_del(&nl->node);
> >> > > > + kfree(nl);
> >> > > > + }
> >> > > > + lag->netdev_head = NULL;
> >> >
> >> > Thanks for the review!
> >> > DaveE
> >
>
> ---
> -Jay Vosburgh, jay.vosburgh@canonical.com