netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
@ 2017-11-15 19:00 Eran Ben Elisha
  2017-11-15 19:00 ` [RFC PATCH net-next 1/2] ethtool: Add support for configuring PFC stall prevention in ethtool Eran Ben Elisha
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-15 19:00 UTC (permalink / raw)
  To: netdev, David S. Miller, John W. Linville, Saeed Mahameed
  Cc: Gal Pressman, Ariel Almog, Eran Ben Elisha, Inbar Karmy

From: Inbar Karmy <inbark@mellanox.com>

This RFC adds support for configuring PFC stall prevention through ethtool.

In the event where the device unexpectedly becomes unresponsive for a long
period of time, flow control mechanism may propagate pause frames which will
cause congestion spreading to the entire network.

To prevent this scenario, the device may implement a protection mechanism for
monitoring and resolving such state.  The following patches allow the user to
control the stall prevention functionality.

PFC stall prevention configuration is done via ethtool -a (pause).
Two modes are introduced:
Default - current behavior per driver.
Auto - protection mechanism controlled automatically by the driver.
Due to lack of extension ability of ethtool_ops set_pauseparam, a new
ethtool_ops get_pfc_prevention_mode is introduced.

I based this patch set on net-next commit 50895b9de1d3("tcp: highest_sack fix
").

Inbar Karmy (2):
  ethtool: Add support for configuring PFC stall prevention in ethtool
  net/mlx5e: PFC stall prevention support

 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 51 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/port.c     | 56 ++++++++++++++++++----
 include/linux/ethtool.h                            |  8 ++++
 include/linux/mlx5/mlx5_ifc.h                      | 22 +++++++--
 include/linux/mlx5/port.h                          |  5 ++
 include/uapi/linux/ethtool.h                       | 20 ++++++++
 net/core/ethtool.c                                 | 39 +++++++++++++++
 7 files changed, 188 insertions(+), 13 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH net-next 1/2] ethtool: Add support for configuring PFC stall prevention in ethtool
  2017-11-15 19:00 [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Eran Ben Elisha
@ 2017-11-15 19:00 ` Eran Ben Elisha
  2017-11-15 19:00 ` [RFC PATCH net-next 2/2] net/mlx5e: PFC stall prevention support Eran Ben Elisha
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-15 19:00 UTC (permalink / raw)
  To: netdev, David S. Miller, John W. Linville, Saeed Mahameed
  Cc: Gal Pressman, Ariel Almog, Eran Ben Elisha, Inbar Karmy

From: Inbar Karmy <inbark@mellanox.com>

In the event where the device unexpectedly becomes unresponsive
for a long period of time, flow control mechanism may propagate
pause frames which will cause congestion spreading to the entire
network.
To prevent this scenario, when the device is stalled for a period
longer than a pre-configured timeout, flow control mechanisms are
automatically disabled.
This patch defines a new ETHTOOL_GPFCPREVENTION/ETHTOOL_SPFCPREVENTION API,
handled by the new get_pfc_prevention_mode/set_pfc_prevention_mode
callbacks.
This API provides support for configuring flow control protection mechanism
into two deferent modes: default/auto.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 include/linux/ethtool.h      |  8 ++++++++
 include/uapi/linux/ethtool.h | 20 ++++++++++++++++++++
 net/core/ethtool.c           | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 2ec41a7..f6d5e26 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -310,6 +310,10 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
  *	fields should be ignored (use %__ETHTOOL_LINK_MODE_MASK_NBITS
  *	instead of the latter), any change to them will be overwritten
  *	by kernel. Returns a negative error code or zero.
+ * @get_pfc_prevention_mode: Report pfc storm prevention mode-
+ *	default/auto.
+ * @set_pfc_prevention_mode: Set pfc storm prevention mode. Returns
+ *	a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -400,5 +404,9 @@ struct ethtool_ops {
 				      struct ethtool_fecparam *);
 	int	(*set_fecparam)(struct net_device *,
 				      struct ethtool_fecparam *);
+	int	(*get_pfc_prevention_mode)(struct net_device *,
+					   struct ethtool_pfc_prevention *);
+	int	(*set_pfc_prevention_mode)(struct net_device *,
+					   struct ethtool_pfc_prevention *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index ac71559..6ba0d43 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -552,6 +552,24 @@ struct ethtool_pauseparam {
 #define ETH_GSTRING_LEN		32
 
 /**
+ * struct ethtool_pfc_prevention - flow control stall prevention mode
+ * @cmd: Command number = %ETHTOOL_GPFCPREVENTION or %ETHTOOL_SPFCPREVENTION
+ * @mode: Flag to define the time till activate pfc stall prevention:
+ *	  auto, default.
+ */
+
+enum pfc_prevention_type {
+	ETH_PFC_PREVENTION_DEFAULT		= 0,
+	ETH_PFC_PREVENTION_AUTO,
+};
+
+struct ethtool_pfc_prevention {
+	__u32	cmd;
+	__u32	mode;
+	__u32	reserved;
+};
+
+/**
  * enum ethtool_stringset - string set ID
  * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
  * @ETH_SS_STATS: Statistic names, for use with %ETHTOOL_GSTATS
@@ -1374,6 +1392,8 @@ enum ethtool_fec_config_bits {
 #define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
 #define ETHTOOL_GFECPARAM	0x00000050 /* Get FEC settings */
 #define ETHTOOL_SFECPARAM	0x00000051 /* Set FEC settings */
+#define ETHTOOL_GPFCPREVENTION  0x00000052 /* Get PFC stall prevention configuration */
+#define ETHTOOL_SPFCPREVENTION  0x00000053 /* Set PFC stall prevention configuration*/
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf45..a656ecc 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2556,6 +2556,38 @@ static int ethtool_set_fecparam(struct net_device *dev, void __user *useraddr)
 	return dev->ethtool_ops->set_fecparam(dev, &fecparam);
 }
 
+static int ethtool_get_pfc_prevention_mode(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_pfc_prevention preventionmod = { .cmd = ETHTOOL_GPFCPREVENTION };
+	int ret;
+
+	if (!dev->ethtool_ops->get_pfc_prevention_mode)
+		return -EOPNOTSUPP;
+
+	ret = dev->ethtool_ops->get_pfc_prevention_mode(dev, &preventionmod);
+
+	if (copy_to_user(useraddr, &preventionmod, sizeof(preventionmod)))
+		return -EFAULT;
+
+	return ret;
+}
+
+static int ethtool_set_pfc_prevention_mode(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_pfc_prevention preventionmod;
+
+	if (!dev->ethtool_ops->set_pfc_prevention_mode)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&preventionmod, useraddr, sizeof(preventionmod)))
+		return -EFAULT;
+
+	if (preventionmod.reserved)
+		return -EINVAL;
+
+	return dev->ethtool_ops->set_pfc_prevention_mode(dev, &preventionmod);
+}
+
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 int dev_ethtool(struct net *net, struct ifreq *ifr)
@@ -2615,6 +2647,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_PHY_GTUNABLE:
 	case ETHTOOL_GLINKSETTINGS:
 	case ETHTOOL_GFECPARAM:
+	case ETHTOOL_GPFCPREVENTION:
 		break;
 	default:
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
@@ -2830,6 +2863,12 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SFECPARAM:
 		rc = ethtool_set_fecparam(dev, useraddr);
 		break;
+	case ETHTOOL_GPFCPREVENTION:
+		rc = ethtool_get_pfc_prevention_mode(dev, useraddr);
+		break;
+	case ETHTOOL_SPFCPREVENTION:
+		rc = ethtool_set_pfc_prevention_mode(dev, useraddr);
+		break;
 	default:
 		rc = -EOPNOTSUPP;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH net-next 2/2] net/mlx5e: PFC stall prevention support
  2017-11-15 19:00 [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Eran Ben Elisha
  2017-11-15 19:00 ` [RFC PATCH net-next 1/2] ethtool: Add support for configuring PFC stall prevention in ethtool Eran Ben Elisha
@ 2017-11-15 19:00 ` Eran Ben Elisha
  2017-11-16  1:07 ` [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Andrew Lunn
  2017-11-16  8:44 ` Michal Kubecek
  3 siblings, 0 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-15 19:00 UTC (permalink / raw)
  To: netdev, David S. Miller, John W. Linville, Saeed Mahameed
  Cc: Gal Pressman, Ariel Almog, Eran Ben Elisha, Inbar Karmy

From: Inbar Karmy <inbark@mellanox.com>

Implement set/get functions to configure PFC stall prevention
mode by ethtool.
On default the stall prevention timeout is configured to 8 sec.
Auto mode will set the stall prevention timeout to be 100 msec.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 51 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/port.c     | 56 ++++++++++++++++++----
 include/linux/mlx5/mlx5_ifc.h                      | 22 +++++++--
 include/linux/mlx5/port.h                          |  5 ++
 4 files changed, 121 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 23425f0..74096f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1180,6 +1180,55 @@ static int mlx5e_set_pauseparam(struct net_device *netdev,
 	return err;
 }
 
+#define MLX5E_PFC_PREVEN_CRITICAL_AUTO_MSEC	100
+#define MLX5E_PFC_PREVEN_MINOR_AUTO_MSEC	85
+#define MLX5E_PFC_PREVEN_CRITICAL_DEFAULT_MSEC	8000
+#define MLX5E_PFC_PREVEN_MINOR_DEFAULT_MSEC	6800
+
+static int mlx5e_get_pfc_prevention_mode(struct net_device *netdev,
+					 struct ethtool_pfc_prevention *pfc_preven)
+{
+	struct mlx5e_priv *priv    = netdev_priv(netdev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u16 pfc_prevention_critical;
+	int err;
+
+	if (!MLX5_CAP_PCAM_FEATURE((priv)->mdev, pfcc_mask))
+		return -EOPNOTSUPP;
+
+	err = mlx5_query_port_pfc_prevention(mdev, &pfc_prevention_critical);
+
+	pfc_preven->mode = (pfc_prevention_critical ==
+			    MLX5E_PFC_PREVEN_CRITICAL_DEFAULT_MSEC) ?
+			    ETH_PFC_PREVENTION_DEFAULT : ETH_PFC_PREVENTION_AUTO;
+	return err;
+}
+
+static int mlx5e_set_pfc_prevention_mode(struct net_device *netdev,
+					 struct ethtool_pfc_prevention *pfc_preven)
+{
+	struct mlx5e_priv *priv    = netdev_priv(netdev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u16 pfc_prevention_critical;
+	u16 pfc_prevention_minor;
+	int err;
+
+	if (!MLX5_CAP_PCAM_FEATURE((priv)->mdev, pfcc_mask))
+		return -EOPNOTSUPP;
+
+	pfc_prevention_critical = (pfc_preven->mode == ETH_PFC_PREVENTION_DEFAULT) ?
+				   MLX5E_PFC_PREVEN_CRITICAL_DEFAULT_MSEC :
+				   MLX5E_PFC_PREVEN_CRITICAL_AUTO_MSEC;
+	pfc_prevention_minor = (pfc_prevention_critical ==
+				MLX5E_PFC_PREVEN_CRITICAL_DEFAULT_MSEC) ?
+				MLX5E_PFC_PREVEN_MINOR_DEFAULT_MSEC :
+				MLX5E_PFC_PREVEN_MINOR_AUTO_MSEC;
+	err = mlx5_set_port_pfc_prevention(mdev, pfc_prevention_critical,
+					   pfc_prevention_minor);
+
+	return err;
+}
+
 int mlx5e_ethtool_get_ts_info(struct mlx5e_priv *priv,
 			      struct ethtool_ts_info *info)
 {
@@ -1696,6 +1745,8 @@ static int mlx5e_flash_device(struct net_device *dev,
 	.set_tunable       = mlx5e_set_tunable,
 	.get_pauseparam    = mlx5e_get_pauseparam,
 	.set_pauseparam    = mlx5e_set_pauseparam,
+	.get_pfc_prevention_mode = mlx5e_get_pfc_prevention_mode,
+	.set_pfc_prevention_mode = mlx5e_set_pfc_prevention_mode,
 	.get_ts_info       = mlx5e_get_ts_info,
 	.set_phys_id       = mlx5e_set_phys_id,
 	.get_wol	   = mlx5e_get_wol,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index c37d00c..d2f7357 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -483,6 +483,16 @@ int mlx5_core_query_ib_ppcnt(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_core_query_ib_ppcnt);
 
+static int mlx5_query_pfcc_reg(struct mlx5_core_dev *dev, u32 *out, u32 out_size)
+{
+	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
+
+	MLX5_SET(pfcc_reg, in, local_port, 1);
+
+	return mlx5_core_access_reg(dev, in, sizeof(in), out,
+				    out_size, MLX5_REG_PFCC, 0, 0);
+}
+
 int mlx5_set_port_pause(struct mlx5_core_dev *dev, u32 rx_pause, u32 tx_pause)
 {
 	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
@@ -500,13 +510,10 @@ int mlx5_set_port_pause(struct mlx5_core_dev *dev, u32 rx_pause, u32 tx_pause)
 int mlx5_query_port_pause(struct mlx5_core_dev *dev,
 			  u32 *rx_pause, u32 *tx_pause)
 {
-	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
 	u32 out[MLX5_ST_SZ_DW(pfcc_reg)];
 	int err;
 
-	MLX5_SET(pfcc_reg, in, local_port, 1);
-	err = mlx5_core_access_reg(dev, in, sizeof(in), out,
-				   sizeof(out), MLX5_REG_PFCC, 0, 0);
+	err = mlx5_query_pfcc_reg(dev, out, sizeof(out));
 	if (err)
 		return err;
 
@@ -520,6 +527,42 @@ int mlx5_query_port_pause(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_pause);
 
+int mlx5_set_port_pfc_prevention(struct mlx5_core_dev *dev,
+				 u16 pfc_preven_critical, u16 pfc_preven_minor)
+{
+	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
+	u32 out[MLX5_ST_SZ_DW(pfcc_reg)];
+
+	MLX5_SET(pfcc_reg, in, local_port, 1);
+	MLX5_SET(pfcc_reg, in, pptx_mask_n, 1);
+	MLX5_SET(pfcc_reg, in, pprx_mask_n, 1);
+	MLX5_SET(pfcc_reg, in, ppan_mask_n, 1);
+	MLX5_SET(pfcc_reg, in, critical_stall_mask, 1);
+	MLX5_SET(pfcc_reg, in, minor_stall_mask, 1);
+	MLX5_SET(pfcc_reg, in, device_stall_critical_watermark, pfc_preven_critical);
+	MLX5_SET(pfcc_reg, in, device_stall_minor_watermark, pfc_preven_minor);
+
+	return mlx5_core_access_reg(dev, in, sizeof(in), out,
+				    sizeof(out), MLX5_REG_PFCC, 0, 1);
+}
+
+int mlx5_query_port_pfc_prevention(struct mlx5_core_dev *dev,
+				   u16 *pfc_preven_critical)
+{
+	u32 out[MLX5_ST_SZ_DW(pfcc_reg)];
+	int err;
+
+	err = mlx5_query_pfcc_reg(dev, out, sizeof(out));
+	if (err)
+		return err;
+
+	if (pfc_preven_critical)
+		*pfc_preven_critical = MLX5_GET(pfcc_reg, out,
+						device_stall_critical_watermark);
+
+	return 0;
+}
+
 int mlx5_set_port_pfc(struct mlx5_core_dev *dev, u8 pfc_en_tx, u8 pfc_en_rx)
 {
 	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
@@ -538,13 +581,10 @@ int mlx5_set_port_pfc(struct mlx5_core_dev *dev, u8 pfc_en_tx, u8 pfc_en_rx)
 
 int mlx5_query_port_pfc(struct mlx5_core_dev *dev, u8 *pfc_en_tx, u8 *pfc_en_rx)
 {
-	u32 in[MLX5_ST_SZ_DW(pfcc_reg)] = {0};
 	u32 out[MLX5_ST_SZ_DW(pfcc_reg)];
 	int err;
 
-	MLX5_SET(pfcc_reg, in, local_port, 1);
-	err = mlx5_core_access_reg(dev, in, sizeof(in), out,
-				   sizeof(out), MLX5_REG_PFCC, 0, 0);
+	err = mlx5_query_pfcc_reg(dev, out, sizeof(out));
 	if (err)
 		return err;
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 3e5363f..966cd5f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -7755,7 +7755,11 @@ struct mlx5_ifc_pifr_reg_bits {
 struct mlx5_ifc_pfcc_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         local_port[0x8];
-	u8         reserved_at_10[0x10];
+	u8         reserved_at_10[0xb];
+	u8         ppan_mask_n[0x1];
+	u8         minor_stall_mask[0x1];
+	u8         critical_stall_mask[0x1];
+	u8         reserved_at_1e[0x2];
 
 	u8         ppan[0x4];
 	u8         reserved_at_24[0x4];
@@ -7765,17 +7769,22 @@ struct mlx5_ifc_pfcc_reg_bits {
 
 	u8         pptx[0x1];
 	u8         aptx[0x1];
-	u8         reserved_at_42[0x6];
+	u8         pptx_mask_n[0x1];
+	u8         reserved_at_43[0x5];
 	u8         pfctx[0x8];
 	u8         reserved_at_50[0x10];
 
 	u8         pprx[0x1];
 	u8         aprx[0x1];
-	u8         reserved_at_62[0x6];
+	u8         pprx_mask_n[0x1];
+	u8         reserved_at_63[0x5];
 	u8         pfcrx[0x8];
 	u8         reserved_at_70[0x10];
 
-	u8         reserved_at_80[0x80];
+	u8         device_stall_minor_watermark[0x10];
+	u8         device_stall_critical_watermark[0x10];
+
+	u8         reserved_at_a0[0x60];
 };
 
 struct mlx5_ifc_pelc_reg_bits {
@@ -7816,7 +7825,10 @@ struct mlx5_ifc_peir_reg_bits {
 };
 
 struct mlx5_ifc_pcam_enhanced_features_bits {
-	u8         reserved_at_0[0x7b];
+	u8         reserved_at_0[0x76];
+ 
+	u8         pfcc_mask[0x1];
+	u8         reserved_at_77[0x4];
 
 	u8         rx_buffer_fullness_counters[0x1];
 	u8         ptys_connector_type[0x1];
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index 035f0d4..139a228 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -147,6 +147,11 @@ int mlx5_query_port_vl_hw_cap(struct mlx5_core_dev *dev,
 int mlx5_query_port_pause(struct mlx5_core_dev *dev,
 			  u32 *rx_pause, u32 *tx_pause);
 
+int mlx5_set_port_pfc_prevention(struct mlx5_core_dev *dev, u16 pfc_preven_critical,
+				 u16 pfc_preven_minor);
+int mlx5_query_port_pfc_prevention(struct mlx5_core_dev *dev,
+				   u16 *pfc_preven_critical);
+
 int mlx5_set_port_pfc(struct mlx5_core_dev *dev, u8 pfc_en_tx, u8 pfc_en_rx);
 int mlx5_query_port_pfc(struct mlx5_core_dev *dev, u8 *pfc_en_tx,
 			u8 *pfc_en_rx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-15 19:00 [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Eran Ben Elisha
  2017-11-15 19:00 ` [RFC PATCH net-next 1/2] ethtool: Add support for configuring PFC stall prevention in ethtool Eran Ben Elisha
  2017-11-15 19:00 ` [RFC PATCH net-next 2/2] net/mlx5e: PFC stall prevention support Eran Ben Elisha
@ 2017-11-16  1:07 ` Andrew Lunn
  2017-11-16  2:44   ` Andrew Lunn
  2017-11-16  8:44 ` Michal Kubecek
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2017-11-16  1:07 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Gal Pressman, Ariel Almog, Inbar Karmy

On Wed, Nov 15, 2017 at 09:00:09PM +0200, Eran Ben Elisha wrote:
> From: Inbar Karmy <inbark@mellanox.com>
> 
> This RFC adds support for configuring PFC stall prevention through ethtool.
> 
> In the event where the device unexpectedly becomes unresponsive for a long
> period of time, flow control mechanism may propagate pause frames which will
> cause congestion spreading to the entire network.
> 
> To prevent this scenario, the device may implement a protection mechanism for
> monitoring and resolving such state.  The following patches allow the user to
> control the stall prevention functionality.
> 
> PFC stall prevention configuration is done via ethtool -a (pause).
> Two modes are introduced:
> Default - current behavior per driver.
> Auto - protection mechanism controlled automatically by the driver.

Why Auto?

Down in the driver you seem to translate this to a time. And it looks
like your hardware is flexible on that time, it can probably do at
least 8s to 100ms.

Why not specify a time?

What do other vendors support? Time? Number of pause frames sent?

     Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16  1:07 ` [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Andrew Lunn
@ 2017-11-16  2:44   ` Andrew Lunn
  2017-11-16  9:17     ` Eran Ben Elisha
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2017-11-16  2:44 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Gal Pressman, Ariel Almog, Inbar Karmy

> What do other vendors support? Time? Number of pause frames sent?

So i checked a few Marvell Switches. You can also specify a time. It
is a little bit more complex than that, since the units of time depend
on the link speed. But converting a time in ms to what the register
wants is possible.

So i'm thinking rather than a poorly defined 'Auto', passing a time
would be better.

      Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-15 19:00 [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Eran Ben Elisha
                   ` (2 preceding siblings ...)
  2017-11-16  1:07 ` [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Andrew Lunn
@ 2017-11-16  8:44 ` Michal Kubecek
  2017-11-16 12:03   ` Eran Ben Elisha
  2017-11-16 15:42   ` Andrew Lunn
  3 siblings, 2 replies; 12+ messages in thread
From: Michal Kubecek @ 2017-11-16  8:44 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Gal Pressman, Ariel Almog, Inbar Karmy

On Wed, Nov 15, 2017 at 09:00:09PM +0200, Eran Ben Elisha wrote:
> From: Inbar Karmy <inbark@mellanox.com>
> 
> This RFC adds support for configuring PFC stall prevention through ethtool.
> 
> In the event where the device unexpectedly becomes unresponsive for a long
> period of time, flow control mechanism may propagate pause frames which will
> cause congestion spreading to the entire network.
> 
> To prevent this scenario, the device may implement a protection mechanism for
> monitoring and resolving such state.  The following patches allow the user to
> control the stall prevention functionality.
> 
> PFC stall prevention configuration is done via ethtool -a (pause).
> Two modes are introduced:
> Default - current behavior per driver.
> Auto - protection mechanism controlled automatically by the driver.
> Due to lack of extension ability of ethtool_ops set_pauseparam, a new
> ethtool_ops get_pfc_prevention_mode is introduced.

I don't like adding another ethtool_ops callback tightly tied to the
structures passed via ioctl() but when I started to think what to
suggest as an alternative, I started to wonder if it is really necessary
to add a new ethtool command at all. Couldn't this be handled as
a tunable?

Michal Kubecek

> I based this patch set on net-next commit 50895b9de1d3("tcp: highest_sack fix
> ").
> 
> Inbar Karmy (2):
>   ethtool: Add support for configuring PFC stall prevention in ethtool
>   net/mlx5e: PFC stall prevention support
> 
>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 51 ++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/port.c     | 56 ++++++++++++++++++----
>  include/linux/ethtool.h                            |  8 ++++
>  include/linux/mlx5/mlx5_ifc.h                      | 22 +++++++--
>  include/linux/mlx5/port.h                          |  5 ++
>  include/uapi/linux/ethtool.h                       | 20 ++++++++
>  net/core/ethtool.c                                 | 39 +++++++++++++++
>  7 files changed, 188 insertions(+), 13 deletions(-)
> 
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16  2:44   ` Andrew Lunn
@ 2017-11-16  9:17     ` Eran Ben Elisha
  2017-11-16 15:22       ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-16  9:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Eran Ben Elisha, Linux Netdev List, David S. Miller,
	John W. Linville, Saeed Mahameed, Gal Pressman, Ariel Almog,
	Inbar Karmy

On Thu, Nov 16, 2017 at 4:44 AM, Andrew Lunn <andrew@lunn.ch> wrote:
>> What do other vendors support? Time? Number of pause frames sent?
>
> So i checked a few Marvell Switches. You can also specify a time. It
> is a little bit more complex than that, since the units of time depend
> on the link speed. But converting a time in ms to what the register
> wants is possible.
>
> So i'm thinking rather than a poorly defined 'Auto', passing a time
> would be better.
>
>       Andrew
Hi Andrew,

We were using the term 'Auto' for few reasons.
1. Not confusing the user with the question of what is the correct
value (100 ms is good? Bad?)
2. Allowing exposure of new mechanism in the future without user need
to change its commands
3. Letting the device to decide on best approach according to its
capabilities, link speed, etc.

Our initial thought was to expose with timeout as you suggested, but
it felt very restrictive due to the reasons I mentioned.

Eran

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16  8:44 ` Michal Kubecek
@ 2017-11-16 12:03   ` Eran Ben Elisha
  2017-11-16 12:39     ` Michal Kubecek
  2017-11-16 15:42   ` Andrew Lunn
  1 sibling, 1 reply; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-16 12:03 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Eran Ben Elisha, Linux Netdev List, David S. Miller,
	John W. Linville, Saeed Mahameed, Gal Pressman, Ariel Almog,
	Inbar Karmy

On Thu, Nov 16, 2017 at 10:44 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> On Wed, Nov 15, 2017 at 09:00:09PM +0200, Eran Ben Elisha wrote:
>> From: Inbar Karmy <inbark@mellanox.com>
>>
>> This RFC adds support for configuring PFC stall prevention through ethtool.
>>
>> In the event where the device unexpectedly becomes unresponsive for a long
>> period of time, flow control mechanism may propagate pause frames which will
>> cause congestion spreading to the entire network.
>>
>> To prevent this scenario, the device may implement a protection mechanism for
>> monitoring and resolving such state.  The following patches allow the user to
>> control the stall prevention functionality.
>>
>> PFC stall prevention configuration is done via ethtool -a (pause).
>> Two modes are introduced:
>> Default - current behavior per driver.
>> Auto - protection mechanism controlled automatically by the driver.
>> Due to lack of extension ability of ethtool_ops set_pauseparam, a new
>> ethtool_ops get_pfc_prevention_mode is introduced.
>
> I don't like adding another ethtool_ops callback tightly tied to the
> structures passed via ioctl() but when I started to think what to
> suggest as an alternative, I started to wonder if it is really necessary
> to add a new ethtool command at all. Couldn't this be handled as
> a tunable?
>
> Michal Kubecek

tunable seems as a good infrastructure to PHY tuning, however this
feature is not a PHY feature.
To me, it's looks fit to ethtool -a where pause operations are being controlled.
Unfortunately set/get_pauseparam is not extensible and need a new operation.

Eran.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16 12:03   ` Eran Ben Elisha
@ 2017-11-16 12:39     ` Michal Kubecek
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Kubecek @ 2017-11-16 12:39 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Eran Ben Elisha, Linux Netdev List, David S. Miller,
	John W. Linville, Saeed Mahameed, Gal Pressman, Ariel Almog,
	Inbar Karmy

On Thu, Nov 16, 2017 at 02:03:21PM +0200, Eran Ben Elisha wrote:
> On Thu, Nov 16, 2017 at 10:44 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> >
> > I don't like adding another ethtool_ops callback tightly tied to the
> > structures passed via ioctl() but when I started to think what to
> > suggest as an alternative, I started to wonder if it is really necessary
> > to add a new ethtool command at all. Couldn't this be handled as
> > a tunable?
> 
> tunable seems as a good infrastructure to PHY tuning, however this
> feature is not a PHY feature.

There are two kinds: tunables (ETHTOOL_{G,S}TUNABLE) and phy tunables
(ETHTOOL_PHY_{G,S}TUNABLE). My understanding is that former is meant as
a generic interface for parameters related to net_device, latter for
parameters related to phydev.

It's only my guess but IMHO the idea behind (both kinds of) tunables was
to add at least some extensibility to the ioctl() interface so that we
don't have to add more and more new one-purpose commands (until we can
switch to netlink).

> To me, it's looks fit to ethtool -a where pause operations are being
> controlled.  Unfortunately set/get_pauseparam is not extensible and
> need a new operation.

Yes, logically it belongs there, no question about that. But the
relation between ethtool subcommands (on command line) and ioctl
commands (cmd member of the structures) is not 1:1 in general (this is
one of the reasons. After all, your patchset uses another command
anyway; my suggestion is to use an existing one (ETHTOOL_{G,S}TUNABLE)
rather than adding a new one (and new callback to ethtool_ops).

Michal Kubecek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16  9:17     ` Eran Ben Elisha
@ 2017-11-16 15:22       ` Andrew Lunn
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2017-11-16 15:22 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Eran Ben Elisha, Linux Netdev List, David S. Miller,
	John W. Linville, Saeed Mahameed, Gal Pressman, Ariel Almog,
	Inbar Karmy

On Thu, Nov 16, 2017 at 11:17:36AM +0200, Eran Ben Elisha wrote:
> On Thu, Nov 16, 2017 at 4:44 AM, Andrew Lunn <andrew@lunn.ch> wrote:
> >> What do other vendors support? Time? Number of pause frames sent?
> >
> > So i checked a few Marvell Switches. You can also specify a time. It
> > is a little bit more complex than that, since the units of time depend
> > on the link speed. But converting a time in ms to what the register
> > wants is possible.
> >
> > So i'm thinking rather than a poorly defined 'Auto', passing a time
> > would be better.
> >
> >       Andrew
> Hi Andrew,
> 
> We were using the term 'Auto' for few reasons.
> 1. Not confusing the user with the question of what is the correct
> value (100 ms is good? Bad?)
> 2. Allowing exposure of new mechanism in the future without user need
> to change its commands
> 3. Letting the device to decide on best approach according to its
> capabilities, link speed, etc.
> 
> Our initial thought was to expose with timeout as you suggested, but
> it felt very restrictive due to the reasons I mentioned.

I just find 'auto' to be very unclearly defined. Auto-negotiation is
well defined, it is specified in 802.3. But what does Auto mean here?
Why 8ms? Why not 42ms? or 420ms? Auto also generally means some sort
of dynamic behaviour. Make changes depending on the current
conditions. Where as your implementation seems to be fixed at 8ms.

Does 802.3 say anything about this at all? Does it list the 8 seconds
your driver defaults to?

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16  8:44 ` Michal Kubecek
  2017-11-16 12:03   ` Eran Ben Elisha
@ 2017-11-16 15:42   ` Andrew Lunn
  2017-11-20 11:47     ` Eran Ben Elisha
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2017-11-16 15:42 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Eran Ben Elisha, netdev, David S. Miller, John W. Linville,
	Saeed Mahameed, Gal Pressman, Ariel Almog, Inbar Karmy

> I don't like adding another ethtool_ops callback tightly tied to the
> structures passed via ioctl() but when I started to think what to
> suggest as an alternative, I started to wonder if it is really necessary
> to add a new ethtool command at all. Couldn't this be handled as
> a tunable?

I agree with Michal here.

And as he pointed out, there does not need to be a 1:1 mapping between
ethtool(1) and the kAPI. I suggest extending the existing -a option,
and have it make two system calls if needed.

    Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool
  2017-11-16 15:42   ` Andrew Lunn
@ 2017-11-20 11:47     ` Eran Ben Elisha
  0 siblings, 0 replies; 12+ messages in thread
From: Eran Ben Elisha @ 2017-11-20 11:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Michal Kubecek, Eran Ben Elisha, Linux Netdev List,
	David S. Miller, John W. Linville, Saeed Mahameed, Gal Pressman,
	Ariel Almog, Inbar Karmy

On Thu, Nov 16, 2017 at 5:42 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> I don't like adding another ethtool_ops callback tightly tied to the
>> structures passed via ioctl() but when I started to think what to
>> suggest as an alternative, I started to wonder if it is really necessary
>> to add a new ethtool command at all. Couldn't this be handled as
>> a tunable?
>
> I agree with Michal here.
>
> And as he pointed out, there does not need to be a 1:1 mapping between
> ethtool(1) and the kAPI. I suggest extending the existing -a option,
> and have it make two system calls if needed.
>
>     Andrew

Sound good to me. We will follow this suggestion to extend -a using
the tunable op.
In addition, we will come up with new API to use timeouts  and on/off
instead of auto/default.

Eran

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-11-20 11:48 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-15 19:00 [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Eran Ben Elisha
2017-11-15 19:00 ` [RFC PATCH net-next 1/2] ethtool: Add support for configuring PFC stall prevention in ethtool Eran Ben Elisha
2017-11-15 19:00 ` [RFC PATCH net-next 2/2] net/mlx5e: PFC stall prevention support Eran Ben Elisha
2017-11-16  1:07 ` [RFC PATCH net-next 0/2] Configuring PFC stall prevention via ethtool Andrew Lunn
2017-11-16  2:44   ` Andrew Lunn
2017-11-16  9:17     ` Eran Ben Elisha
2017-11-16 15:22       ` Andrew Lunn
2017-11-16  8:44 ` Michal Kubecek
2017-11-16 12:03   ` Eran Ben Elisha
2017-11-16 12:39     ` Michal Kubecek
2017-11-16 15:42   ` Andrew Lunn
2017-11-20 11:47     ` Eran Ben Elisha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).