Netdev List
 help / color / mirror / Atom feed
* [net 12/12] net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Aya Levin, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

Remove check of recovery bit, in the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when CQE error is detected.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index b91814ecfbc9..c7f86453c638 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -76,9 +76,6 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 	u8 state;
 	int err;
 
-	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
-		return 0;
-
 	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
 	if (err) {
 		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
-- 
2.21.0


^ permalink raw reply related

* [net 11/12] net/mlx5e: Fix error flow of CQE recovery on tx reporter
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Aya Levin, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 12 ++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |  1 -
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index b307234b4e05..b91814ecfbc9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -83,17 +83,17 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 	if (err) {
 		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
 			   sq->sqn, err);
-		return err;
+		goto out;
 	}
 
 	if (state != MLX5_SQC_STATE_ERR)
-		return 0;
+		goto out;
 
 	mlx5e_tx_disable_queue(sq->txq);
 
 	err = mlx5e_wait_for_sq_flush(sq);
 	if (err)
-		return err;
+		goto out;
 
 	/* At this point, no new packets will arrive from the stack as TXQ is
 	 * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
@@ -102,13 +102,17 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 
 	err = mlx5e_sq_to_ready(sq, state);
 	if (err)
-		return err;
+		goto out;
 
 	mlx5e_reset_txqsq_cc_pc(sq);
 	sq->stats->recover++;
+	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
 	mlx5e_activate_txqsq(sq);
 
 	return 0;
+out:
+	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
+	return err;
 }
 
 static int mlx5_tx_health_report(struct devlink_health_reporter *tx_reporter,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6c712c5be4d8..9d5f6e56188f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1321,7 +1321,6 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
 void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
 {
 	sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
-	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
 	set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
 	netdev_tx_reset_queue(sq->txq);
 	netif_tx_start_queue(sq->txq);
-- 
2.21.0


^ permalink raw reply related

* [net 10/12] net/mlx5e: Fix false negative indication on tx reporter CQE recovery
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Aya Levin, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If the sq is not in
error state, the sq is either in ready or reset state. Ready state is
good state which doesn't require recovery and reset state is a temporal
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index f3d98748b211..b307234b4e05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -86,10 +86,8 @@ static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
 		return err;
 	}
 
-	if (state != MLX5_SQC_STATE_ERR) {
-		netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
-		return -EINVAL;
-	}
+	if (state != MLX5_SQC_STATE_ERR)
+		return 0;
 
 	mlx5e_tx_disable_queue(sq->txq);
 
-- 
2.21.0


^ permalink raw reply related

* [net 09/12] net/mlx5e: kTLS, Fix tisn field placement
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

Shift the tisn field in the WQE control segment, per the
HW specification.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index cfc9e7d457e3..8b93101e1a09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -69,7 +69,7 @@ build_static_params(struct mlx5e_umr_wqe *wqe, u16 pc, u32 sqn,
 	cseg->qpn_ds           = cpu_to_be32((sqn << MLX5_WQE_CTRL_QPN_SHIFT) |
 					     STATIC_PARAMS_DS_CNT);
 	cseg->fm_ce_se         = fence ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
-	cseg->tisn             = cpu_to_be32(priv_tx->tisn);
+	cseg->tisn             = cpu_to_be32(priv_tx->tisn << 8);
 
 	ucseg->flags = MLX5_UMR_INLINE;
 	ucseg->bsf_octowords = cpu_to_be16(MLX5_ST_SZ_BYTES(tls_static_params) / 16);
@@ -278,7 +278,7 @@ tx_post_resync_dump(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8)  | MLX5_OPCODE_DUMP);
 	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | ds_cnt);
-	cseg->tisn             = cpu_to_be32(tisn);
+	cseg->tisn             = cpu_to_be32(tisn << 8);
 	cseg->fm_ce_se         = first ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
 
 	eseg->inline_hdr.sz = cpu_to_be16(ihs);
@@ -434,7 +434,7 @@ struct sk_buff *mlx5e_ktls_handle_tx_skb(struct net_device *netdev,
 	priv_tx->expected_seq = seq + datalen;
 
 	cseg = &(*wqe)->ctrl;
-	cseg->tisn = cpu_to_be32(priv_tx->tisn);
+	cseg->tisn = cpu_to_be32(priv_tx->tisn << 8);
 
 	stats->tls_encrypted_packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 	stats->tls_encrypted_bytes   += datalen;
-- 
2.21.0


^ permalink raw reply related

* [net 08/12] net/mlx5e: kTLS, Fix tisn field name
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 9f67bfb559f1..cfc9e7d457e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -69,7 +69,7 @@ build_static_params(struct mlx5e_umr_wqe *wqe, u16 pc, u32 sqn,
 	cseg->qpn_ds           = cpu_to_be32((sqn << MLX5_WQE_CTRL_QPN_SHIFT) |
 					     STATIC_PARAMS_DS_CNT);
 	cseg->fm_ce_se         = fence ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
-	cseg->imm              = cpu_to_be32(priv_tx->tisn);
+	cseg->tisn             = cpu_to_be32(priv_tx->tisn);
 
 	ucseg->flags = MLX5_UMR_INLINE;
 	ucseg->bsf_octowords = cpu_to_be16(MLX5_ST_SZ_BYTES(tls_static_params) / 16);
@@ -278,7 +278,7 @@ tx_post_resync_dump(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8)  | MLX5_OPCODE_DUMP);
 	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | ds_cnt);
-	cseg->imm              = cpu_to_be32(tisn);
+	cseg->tisn             = cpu_to_be32(tisn);
 	cseg->fm_ce_se         = first ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
 
 	eseg->inline_hdr.sz = cpu_to_be16(ihs);
@@ -434,7 +434,7 @@ struct sk_buff *mlx5e_ktls_handle_tx_skb(struct net_device *netdev,
 	priv_tx->expected_seq = seq + datalen;
 
 	cseg = &(*wqe)->ctrl;
-	cseg->imm = cpu_to_be32(priv_tx->tisn);
+	cseg->tisn = cpu_to_be32(priv_tx->tisn);
 
 	stats->tls_encrypted_packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 	stats->tls_encrypted_bytes   += datalen;
-- 
2.21.0


^ permalink raw reply related

* [net 07/12] net/mlx5e: kTLS, Fix progress params context WQE layout
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h             | 9 +++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h  | 6 ++++--
 .../net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c   | 4 ++--
 include/linux/mlx5/mlx5_ifc.h                            | 5 ++---
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ce1be2a84231..f6b64a03cd06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -184,8 +184,13 @@ static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
 
 struct mlx5e_tx_wqe {
 	struct mlx5_wqe_ctrl_seg ctrl;
-	struct mlx5_wqe_eth_seg  eth;
-	struct mlx5_wqe_data_seg data[0];
+	union {
+		struct {
+			struct mlx5_wqe_eth_seg  eth;
+			struct mlx5_wqe_data_seg data[0];
+		};
+		u8 tls_progress_params_ctx[0];
+	};
 };
 
 struct mlx5e_rx_wqe_ll {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
index 407da83474ef..b7298f9ee3d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.h
@@ -11,12 +11,14 @@
 #include "accel/tls.h"
 
 #define MLX5E_KTLS_STATIC_UMR_WQE_SZ \
-	(sizeof(struct mlx5e_umr_wqe) + MLX5_ST_SZ_BYTES(tls_static_params))
+	(offsetof(struct mlx5e_umr_wqe, tls_static_params_ctx) + \
+	 MLX5_ST_SZ_BYTES(tls_static_params))
 #define MLX5E_KTLS_STATIC_WQEBBS \
 	(DIV_ROUND_UP(MLX5E_KTLS_STATIC_UMR_WQE_SZ, MLX5_SEND_WQE_BB))
 
 #define MLX5E_KTLS_PROGRESS_WQE_SZ \
-	(sizeof(struct mlx5e_tx_wqe) + MLX5_ST_SZ_BYTES(tls_progress_params))
+	(offsetof(struct mlx5e_tx_wqe, tls_progress_params_ctx) + \
+	 MLX5_ST_SZ_BYTES(tls_progress_params))
 #define MLX5E_KTLS_PROGRESS_WQEBBS \
 	(DIV_ROUND_UP(MLX5E_KTLS_PROGRESS_WQE_SZ, MLX5_SEND_WQE_BB))
 #define MLX5E_KTLS_MAX_DUMP_WQEBBS 2
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index 3766545ce259..9f67bfb559f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -80,7 +80,7 @@ build_static_params(struct mlx5e_umr_wqe *wqe, u16 pc, u32 sqn,
 static void
 fill_progress_params_ctx(void *ctx, struct mlx5e_ktls_offload_context_tx *priv_tx)
 {
-	MLX5_SET(tls_progress_params, ctx, pd, priv_tx->tisn);
+	MLX5_SET(tls_progress_params, ctx, tisn, priv_tx->tisn);
 	MLX5_SET(tls_progress_params, ctx, record_tracker_state,
 		 MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_START);
 	MLX5_SET(tls_progress_params, ctx, auth_state,
@@ -104,7 +104,7 @@ build_progress_params(struct mlx5e_tx_wqe *wqe, u16 pc, u32 sqn,
 					     PROGRESS_PARAMS_DS_CNT);
 	cseg->fm_ce_se         = fence ? MLX5_FENCE_MODE_INITIATOR_SMALL : 0;
 
-	fill_progress_params_ctx(wqe->data, priv_tx);
+	fill_progress_params_ctx(wqe->tls_progress_params_ctx, priv_tx);
 }
 
 static void tx_fill_wi(struct mlx5e_txqsq *sq,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ec571fd7fcf8..b8b570c30b5e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -10054,9 +10054,8 @@ struct mlx5_ifc_tls_static_params_bits {
 };
 
 struct mlx5_ifc_tls_progress_params_bits {
-	u8         valid[0x1];
-	u8         reserved_at_1[0x7];
-	u8         pd[0x18];
+	u8         reserved_at_0[0x8];
+	u8         tisn[0x18];
 
 	u8         next_record_tcp_sn[0x20];
 
-- 
2.21.0


^ permalink raw reply related

* [net 06/12] net/mlx5: kTLS, Fix wrong TIS opmod constants
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

Fix the used constants for TLS TIS opmods, per the HW specification.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/device.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index ce9839c8bc1a..c2f056b5766d 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -446,11 +446,11 @@ enum {
 };
 
 enum {
-	MLX5_OPC_MOD_TLS_TIS_STATIC_PARAMS = 0x20,
+	MLX5_OPC_MOD_TLS_TIS_STATIC_PARAMS = 0x1,
 };
 
 enum {
-	MLX5_OPC_MOD_TLS_TIS_PROGRESS_PARAMS = 0x20,
+	MLX5_OPC_MOD_TLS_TIS_PROGRESS_PARAMS = 0x1,
 };
 
 enum {
-- 
2.21.0


^ permalink raw reply related

* [net 05/12] net/mlx5: crypto, Fix wrong offset in encryption key command
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>

Fix the 128b key offset in key encryption key creation command,
per the HW specification.

Fixes: 45d3b55dc665 ("net/mlx5: Add crypto library to support create/destroy encryption key")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
index ea9ee88491e5..ea1d4d26ece0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
@@ -27,6 +27,7 @@ int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 	case 128:
 		general_obj_key_size =
 			MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_KEY_SIZE_128;
+		key_p += sz_bytes;
 		break;
 	case 256:
 		general_obj_key_size =
-- 
2.21.0


^ permalink raw reply related

* [net 04/12] net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Mohamad Heib, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Mohamad Heib <mohamadh@mellanox.com>

Setting speed to 56GBASE is allowed only with auto-negotiation enabled.

This patch prevent setting speed to 56GBASE when auto-negotiation disabled.

Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ee9fa0c2c8b9..e89dba790a2d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1081,6 +1081,14 @@ int mlx5e_ethtool_set_link_ksettings(struct mlx5e_priv *priv,
 	link_modes = autoneg == AUTONEG_ENABLE ? ethtool2ptys_adver_func(adver) :
 		mlx5e_port_speed2linkmodes(mdev, speed, !ext);
 
+	if ((link_modes & MLX5E_PROT_MASK(MLX5E_56GBASE_R4)) &&
+	    autoneg != AUTONEG_ENABLE) {
+		netdev_err(priv->netdev, "%s: 56G link speed requires autoneg enabled\n",
+			   __func__);
+		err = -EINVAL;
+		goto out;
+	}
+
 	link_modes = link_modes & eproto.cap;
 	if (!link_modes) {
 		netdev_err(priv->netdev, "%s: Not supported link mode(s) requested",
-- 
2.21.0


^ permalink raw reply related

* [net 03/12] net/mlx5e: Only support tx/rx pause setting for port owner
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Huy Nguyen, Parav Pandit, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

Only support changing tx/rx pause frame setting if the net device
is the vport group manager.

Fixes: 3c2d18ef22df ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 03bed714bac3..ee9fa0c2c8b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1338,6 +1338,9 @@ int mlx5e_ethtool_set_pauseparam(struct mlx5e_priv *priv,
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int err;
 
+	if (!MLX5_CAP_GEN(mdev, vport_group_manager))
+		return -EOPNOTSUPP;
+
 	if (pauseparam->autoneg)
 		return -EINVAL;
 
-- 
2.21.0


^ permalink raw reply related

* [net 02/12] net/mlx5: Support inner header match criteria for non decap flow action
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Huy Nguyen, Roi Dayan, Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

We have an issue that OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. mlx5_core driver detects correctly the inner and outer
header match criteria but does not enable the inner header match criteria
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rule needs inner header criteria.

Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.

Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 31 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  4 +--
 .../mellanox/mlx5/core/eswitch_offloads.c     | 12 +++----
 3 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 7ecfc53cf5f6..deeb65da99f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1480,7 +1480,7 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 			      struct mlx5_flow_spec *spec,
 			      struct flow_cls_offload *f,
 			      struct net_device *filter_dev,
-			      u8 *match_level, u8 *tunnel_match_level)
+			      u8 *inner_match_level, u8 *outer_match_level)
 {
 	struct netlink_ext_ack *extack = f->common.extack;
 	void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
@@ -1495,8 +1495,9 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 	struct flow_dissector *dissector = rule->match.dissector;
 	u16 addr_type = 0;
 	u8 ip_proto = 0;
+	u8 *match_level;
 
-	*match_level = MLX5_MATCH_NONE;
+	match_level = outer_match_level;
 
 	if (dissector->used_keys &
 	    ~(BIT(FLOW_DISSECTOR_KEY_META) |
@@ -1524,12 +1525,14 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 	}
 
 	if (mlx5e_get_tc_tun(filter_dev)) {
-		if (parse_tunnel_attr(priv, spec, f, filter_dev, tunnel_match_level))
+		if (parse_tunnel_attr(priv, spec, f, filter_dev,
+				      outer_match_level))
 			return -EOPNOTSUPP;
 
-		/* In decap flow, header pointers should point to the inner
+		/* At this point, header pointers should point to the inner
 		 * headers, outer header were already set by parse_tunnel_attr
 		 */
+		match_level = inner_match_level;
 		headers_c = get_match_headers_criteria(MLX5_FLOW_CONTEXT_ACTION_DECAP,
 						       spec);
 		headers_v = get_match_headers_value(MLX5_FLOW_CONTEXT_ACTION_DECAP,
@@ -1831,35 +1834,41 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 			    struct flow_cls_offload *f,
 			    struct net_device *filter_dev)
 {
+	u8 inner_match_level, outer_match_level, non_tunnel_match_level;
 	struct netlink_ext_ack *extack = f->common.extack;
 	struct mlx5_core_dev *dev = priv->mdev;
 	struct mlx5_eswitch *esw = dev->priv.eswitch;
 	struct mlx5e_rep_priv *rpriv = priv->ppriv;
-	u8 match_level, tunnel_match_level = MLX5_MATCH_NONE;
 	struct mlx5_eswitch_rep *rep;
 	int err;
 
-	err = __parse_cls_flower(priv, spec, f, filter_dev, &match_level, &tunnel_match_level);
+	inner_match_level = MLX5_MATCH_NONE;
+	outer_match_level = MLX5_MATCH_NONE;
+
+	err = __parse_cls_flower(priv, spec, f, filter_dev, &inner_match_level,
+				 &outer_match_level);
+	non_tunnel_match_level = (inner_match_level == MLX5_MATCH_NONE) ?
+				 outer_match_level : inner_match_level;
 
 	if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH)) {
 		rep = rpriv->rep;
 		if (rep->vport != MLX5_VPORT_UPLINK &&
 		    (esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE &&
-		    esw->offloads.inline_mode < match_level)) {
+		    esw->offloads.inline_mode < non_tunnel_match_level)) {
 			NL_SET_ERR_MSG_MOD(extack,
 					   "Flow is not offloaded due to min inline setting");
 			netdev_warn(priv->netdev,
 				    "Flow is not offloaded due to min inline setting, required %d actual %d\n",
-				    match_level, esw->offloads.inline_mode);
+				    non_tunnel_match_level, esw->offloads.inline_mode);
 			return -EOPNOTSUPP;
 		}
 	}
 
 	if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
-		flow->esw_attr->match_level = match_level;
-		flow->esw_attr->tunnel_match_level = tunnel_match_level;
+		flow->esw_attr->inner_match_level = inner_match_level;
+		flow->esw_attr->outer_match_level = outer_match_level;
 	} else {
-		flow->nic_attr->match_level = match_level;
+		flow->nic_attr->match_level = non_tunnel_match_level;
 	}
 
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a38e8a3c7c9a..04685dbb280c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -377,8 +377,8 @@ struct mlx5_esw_flow_attr {
 		struct mlx5_termtbl_handle *termtbl;
 	} dests[MLX5_MAX_FLOW_FWD_VPORTS];
 	u32	mod_hdr_id;
-	u8	match_level;
-	u8	tunnel_match_level;
+	u8	inner_match_level;
+	u8	outer_match_level;
 	struct mlx5_fc *counter;
 	u32	chain;
 	u16	prio;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 089ae4d48a82..0323fd078271 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -207,14 +207,10 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 
 	mlx5_eswitch_set_rule_source_port(esw, spec, attr);
 
-	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DECAP) {
-		if (attr->tunnel_match_level != MLX5_MATCH_NONE)
-			spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
-		if (attr->match_level != MLX5_MATCH_NONE)
-			spec->match_criteria_enable |= MLX5_MATCH_INNER_HEADERS;
-	} else if (attr->match_level != MLX5_MATCH_NONE) {
+	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
-	}
+	if (attr->inner_match_level != MLX5_MATCH_NONE)
+		spec->match_criteria_enable |= MLX5_MATCH_INNER_HEADERS;
 
 	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		flow_act.modify_id = attr->mod_hdr_id;
@@ -290,7 +286,7 @@ mlx5_eswitch_add_fwd_rule(struct mlx5_eswitch *esw,
 	mlx5_eswitch_set_rule_source_port(esw, spec, attr);
 
 	spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS;
-	if (attr->match_level != MLX5_MATCH_NONE)
+	if (attr->outer_match_level != MLX5_MATCH_NONE)
 		spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS;
 
 	rule = mlx5_add_flow_rules(fast_fdb, spec, &flow_act, dest, i);
-- 
2.21.0


^ permalink raw reply related

* [net 01/12] net/mlx5e: Use flow keys dissector to parse packets for ARFS
From: Saeed Mahameed @ 2019-08-08 20:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Maxim Mikityanskiy, Tariq Toukan,
	Saeed Mahameed
In-Reply-To: <20190808202025.11303-1-saeedm@mellanox.com>

From: Maxim Mikityanskiy <maximmi@mellanox.com>

The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.

Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_arfs.c | 97 +++++++------------
 1 file changed, 34 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 8657e0f26995..2c75b2752f58 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -437,12 +437,6 @@ arfs_hash_bucket(struct arfs_table *arfs_t, __be16 src_port,
 	return &arfs_t->rules_hash[bucket_idx];
 }
 
-static u8 arfs_get_ip_proto(const struct sk_buff *skb)
-{
-	return (skb->protocol == htons(ETH_P_IP)) ?
-		ip_hdr(skb)->protocol : ipv6_hdr(skb)->nexthdr;
-}
-
 static struct arfs_table *arfs_get_table(struct mlx5e_arfs_tables *arfs,
 					 u8 ip_proto, __be16 etype)
 {
@@ -602,31 +596,9 @@ static void arfs_handle_work(struct work_struct *work)
 	arfs_may_expire_flow(priv);
 }
 
-/* return L4 destination port from ip4/6 packets */
-static __be16 arfs_get_dst_port(const struct sk_buff *skb)
-{
-	char *transport_header;
-
-	transport_header = skb_transport_header(skb);
-	if (arfs_get_ip_proto(skb) == IPPROTO_TCP)
-		return ((struct tcphdr *)transport_header)->dest;
-	return ((struct udphdr *)transport_header)->dest;
-}
-
-/* return L4 source port from ip4/6 packets */
-static __be16 arfs_get_src_port(const struct sk_buff *skb)
-{
-	char *transport_header;
-
-	transport_header = skb_transport_header(skb);
-	if (arfs_get_ip_proto(skb) == IPPROTO_TCP)
-		return ((struct tcphdr *)transport_header)->source;
-	return ((struct udphdr *)transport_header)->source;
-}
-
 static struct arfs_rule *arfs_alloc_rule(struct mlx5e_priv *priv,
 					 struct arfs_table *arfs_t,
-					 const struct sk_buff *skb,
+					 const struct flow_keys *fk,
 					 u16 rxq, u32 flow_id)
 {
 	struct arfs_rule *rule;
@@ -641,19 +613,19 @@ static struct arfs_rule *arfs_alloc_rule(struct mlx5e_priv *priv,
 	INIT_WORK(&rule->arfs_work, arfs_handle_work);
 
 	tuple = &rule->tuple;
-	tuple->etype = skb->protocol;
+	tuple->etype = fk->basic.n_proto;
+	tuple->ip_proto = fk->basic.ip_proto;
 	if (tuple->etype == htons(ETH_P_IP)) {
-		tuple->src_ipv4 = ip_hdr(skb)->saddr;
-		tuple->dst_ipv4 = ip_hdr(skb)->daddr;
+		tuple->src_ipv4 = fk->addrs.v4addrs.src;
+		tuple->dst_ipv4 = fk->addrs.v4addrs.dst;
 	} else {
-		memcpy(&tuple->src_ipv6, &ipv6_hdr(skb)->saddr,
+		memcpy(&tuple->src_ipv6, &fk->addrs.v6addrs.src,
 		       sizeof(struct in6_addr));
-		memcpy(&tuple->dst_ipv6, &ipv6_hdr(skb)->daddr,
+		memcpy(&tuple->dst_ipv6, &fk->addrs.v6addrs.dst,
 		       sizeof(struct in6_addr));
 	}
-	tuple->ip_proto = arfs_get_ip_proto(skb);
-	tuple->src_port = arfs_get_src_port(skb);
-	tuple->dst_port = arfs_get_dst_port(skb);
+	tuple->src_port = fk->ports.src;
+	tuple->dst_port = fk->ports.dst;
 
 	rule->flow_id = flow_id;
 	rule->filter_id = priv->fs.arfs.last_filter_id++ % RPS_NO_FILTER;
@@ -664,37 +636,33 @@ static struct arfs_rule *arfs_alloc_rule(struct mlx5e_priv *priv,
 	return rule;
 }
 
-static bool arfs_cmp_ips(struct arfs_tuple *tuple,
-			 const struct sk_buff *skb)
+static bool arfs_cmp(const struct arfs_tuple *tuple, const struct flow_keys *fk)
 {
-	if (tuple->etype == htons(ETH_P_IP) &&
-	    tuple->src_ipv4 == ip_hdr(skb)->saddr &&
-	    tuple->dst_ipv4 == ip_hdr(skb)->daddr)
-		return true;
-	if (tuple->etype == htons(ETH_P_IPV6) &&
-	    (!memcmp(&tuple->src_ipv6, &ipv6_hdr(skb)->saddr,
-		     sizeof(struct in6_addr))) &&
-	    (!memcmp(&tuple->dst_ipv6, &ipv6_hdr(skb)->daddr,
-		     sizeof(struct in6_addr))))
-		return true;
+	if (tuple->src_port != fk->ports.src || tuple->dst_port != fk->ports.dst)
+		return false;
+	if (tuple->etype != fk->basic.n_proto)
+		return false;
+	if (tuple->etype == htons(ETH_P_IP))
+		return tuple->src_ipv4 == fk->addrs.v4addrs.src &&
+		       tuple->dst_ipv4 == fk->addrs.v4addrs.dst;
+	if (tuple->etype == htons(ETH_P_IPV6))
+		return !memcmp(&tuple->src_ipv6, &fk->addrs.v6addrs.src,
+			       sizeof(struct in6_addr)) &&
+		       !memcmp(&tuple->dst_ipv6, &fk->addrs.v6addrs.dst,
+			       sizeof(struct in6_addr));
 	return false;
 }
 
 static struct arfs_rule *arfs_find_rule(struct arfs_table *arfs_t,
-					const struct sk_buff *skb)
+					const struct flow_keys *fk)
 {
 	struct arfs_rule *arfs_rule;
 	struct hlist_head *head;
-	__be16 src_port = arfs_get_src_port(skb);
-	__be16 dst_port = arfs_get_dst_port(skb);
 
-	head = arfs_hash_bucket(arfs_t, src_port, dst_port);
+	head = arfs_hash_bucket(arfs_t, fk->ports.src, fk->ports.dst);
 	hlist_for_each_entry(arfs_rule, head, hlist) {
-		if (arfs_rule->tuple.src_port == src_port &&
-		    arfs_rule->tuple.dst_port == dst_port &&
-		    arfs_cmp_ips(&arfs_rule->tuple, skb)) {
+		if (arfs_cmp(&arfs_rule->tuple, fk))
 			return arfs_rule;
-		}
 	}
 
 	return NULL;
@@ -707,20 +675,24 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
 	struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
 	struct arfs_table *arfs_t;
 	struct arfs_rule *arfs_rule;
+	struct flow_keys fk;
+
+	if (!skb_flow_dissect_flow_keys(skb, &fk, 0))
+		return -EPROTONOSUPPORT;
 
-	if (skb->protocol != htons(ETH_P_IP) &&
-	    skb->protocol != htons(ETH_P_IPV6))
+	if (fk.basic.n_proto != htons(ETH_P_IP) &&
+	    fk.basic.n_proto != htons(ETH_P_IPV6))
 		return -EPROTONOSUPPORT;
 
 	if (skb->encapsulation)
 		return -EPROTONOSUPPORT;
 
-	arfs_t = arfs_get_table(arfs, arfs_get_ip_proto(skb), skb->protocol);
+	arfs_t = arfs_get_table(arfs, fk.basic.ip_proto, fk.basic.n_proto);
 	if (!arfs_t)
 		return -EPROTONOSUPPORT;
 
 	spin_lock_bh(&arfs->arfs_lock);
-	arfs_rule = arfs_find_rule(arfs_t, skb);
+	arfs_rule = arfs_find_rule(arfs_t, &fk);
 	if (arfs_rule) {
 		if (arfs_rule->rxq == rxq_index) {
 			spin_unlock_bh(&arfs->arfs_lock);
@@ -728,8 +700,7 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
 		}
 		arfs_rule->rxq = rxq_index;
 	} else {
-		arfs_rule = arfs_alloc_rule(priv, arfs_t, skb,
-					    rxq_index, flow_id);
+		arfs_rule = arfs_alloc_rule(priv, arfs_t, &fk, rxq_index, flow_id);
 		if (!arfs_rule) {
 			spin_unlock_bh(&arfs->arfs_lock);
 			return -ENOMEM;
-- 
2.21.0


^ permalink raw reply related

* [pull request][net 00/12] Mellanox, mlx5 fixes 2019-08-08
From: Saeed Mahameed @ 2019-08-08 20:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed

Hi Dave,

This series introduces some fixes to mlx5 driver.

Highlights:
1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs.
2) From Aya, Fixes to mlx5 tx devlink health reporter.
3) From Maxim, aRFs parsing to use flow dissector to avoid relying on
invalid skb fields.

Please pull and let me know if there is any problem.

For -stable v4.3
 ('net/mlx5e: Only support tx/rx pause setting for port owner')
For -stable v4.9
 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS')
For -stable v5.1
 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery')
 ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter')
 ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off')

Note: when merged with net-next this minor conflict will pop up:
++<<<<<<< (net-next)
 +      if (is_eswitch_flow) {
 +              flow->esw_attr->match_level = match_level;
 +              flow->esw_attr->tunnel_match_level = tunnel_match_level;
++=======
+       if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+               flow->esw_attr->inner_match_level = inner_match_level;
+               flow->esw_attr->outer_match_level = outer_match_level;
++>>>>>>> (net)

To resolve, use hunks from net (2nd) and replace:
if (flow->flags & MLX5E_TC_FLOW_ESWITCH) 
with
if (is_eswitch_flow)

Thanks,
Saeed.

---
The following changes since commit f6649feb264ed10ce425455df48242c0e704cba2:

  Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge (2019-08-08 11:25:39 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2019-08-08

for you to fetch changes up to a4e508cab623951dc4754f346e5673714f3bbade:

  net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter (2019-08-08 13:01:20 -0700)

----------------------------------------------------------------
mlx5-fixes-2019-08-08

----------------------------------------------------------------
Aya Levin (3):
      net/mlx5e: Fix false negative indication on tx reporter CQE recovery
      net/mlx5e: Fix error flow of CQE recovery on tx reporter
      net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter

Huy Nguyen (2):
      net/mlx5: Support inner header match criteria for non decap flow action
      net/mlx5e: Only support tx/rx pause setting for port owner

Maxim Mikityanskiy (1):
      net/mlx5e: Use flow keys dissector to parse packets for ARFS

Mohamad Heib (1):
      net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off

Tariq Toukan (5):
      net/mlx5: crypto, Fix wrong offset in encryption key command
      net/mlx5: kTLS, Fix wrong TIS opmod constants
      net/mlx5e: kTLS, Fix progress params context WQE layout
      net/mlx5e: kTLS, Fix tisn field name
      net/mlx5e: kTLS, Fix tisn field placement

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  9 +-
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c   | 19 ++---
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.h    |  6 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 10 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 97 ++++++++--------------
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 11 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 31 ++++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  4 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 12 +--
 .../net/ethernet/mellanox/mlx5/core/lib/crypto.c   |  1 +
 include/linux/mlx5/device.h                        |  4 +-
 include/linux/mlx5/mlx5_ifc.h                      |  5 +-
 13 files changed, 101 insertions(+), 109 deletions(-)

^ permalink raw reply

* Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
From: Yonghong Song @ 2019-08-08 20:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov, daniel@iogearbox.net, Kernel Team,
	Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa,
	Sam Ravnborg
In-Reply-To: <CAEf4BzYAZ7x+PY0t90ty9RVSm1FSmc9XqY216DtJCA-giK3fUg@mail.gmail.com>



On 8/8/19 10:47 AM, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 9:24 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
>>> Make .BTF section allocated and expose its contents through sysfs.
>>>
>>> /sys/kernel/btf directory is created to contain all the BTFs present
>>> inside kernel. Currently there is only kernel's main BTF, represented as
>>> /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
>>> each module will expose its BTF as /sys/kernel/btf/<module-name> file.
>>>
>>> Current approach relies on a few pieces coming together:
>>> 1. pahole is used to take almost final vmlinux image (modulo .BTF and
>>>      kallsyms) and generate .BTF section by converting DWARF info into
>>>      BTF. This section is not allocated and not mapped to any segment,
>>>      though, so is not yet accessible from inside kernel at runtime.
>>> 2. objcopy dumps .BTF contents into binary file and subsequently
>>>      convert binary file into linkable object file with automatically
>>>      generated symbols _binary__btf_kernel_bin_start and
>>>      _binary__btf_kernel_bin_end, pointing to start and end, respectively,
>>>      of BTF raw data.
>>> 3. final vmlinux image is generated by linking this object file (and
>>>      kallsyms, if necessary). sysfs_btf.c then creates
>>>      /sys/kernel/btf/kernel file and exposes embedded BTF contents through
>>>      it. This allows, e.g., libbpf and bpftool access BTF info at
>>>      well-known location, without resorting to searching for vmlinux image
>>>      on disk (location of which is not standardized and vmlinux image
>>>      might not be even available in some scenarios, e.g., inside qemu
>>>      during testing).
>>>
>>> Alternative approach using .incbin assembler directive to embed BTF
>>> contents directly was attempted but didn't work, because sysfs_proc.o is
>>> not re-compiled during link-vmlinux.sh stage. This is required, though,
>>> to update embedded BTF data (initially empty data is embedded, then
>>> pahole generates BTF info and we need to regenerate sysfs_btf.o with
>>> updated contents, but it's too late at that point).
>>>
>>> If BTF couldn't be generated due to missing or too old pahole,
>>> sysfs_btf.c handles that gracefully by detecting that
>>> _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
>>> /sys/kernel/btf at all.
>>>
>>> v1->v2:
>>> - allow kallsyms stage to re-use vmlinux generated by gen_btf();
>>>
>>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>> Cc: Sam Ravnborg <sam@ravnborg.org>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>> ---
> 
> [...]
> 
>>> +
>>> +     # dump .BTF section into raw binary file to link with final vmlinux
>>> +     bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
>>> +             cut -d, -f1 | cut -d' ' -f2)
>>> +     ${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
>>> +     ${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
>>> +             --rename-section .data=.BTF .btf.kernel.bin ${2}
>>
>> Currently, the binary size on my config is about 2.6MB. Do you think
>> we could or need to compress it to make it smaller? I tried gzip
>> and the compressed size is 0.9MB.
> 
> I'd really prefer to keep it uncompressed for two main reasons:
> - by having this in uncompressed form, kernel itself can use this BTF
> data from inside with almost no additional memory (except maybe for
> index from type ID to actual location of type info), which opens up a
> lot of new and interesting opportunities, like kernel returning its
> own BTF and BTF type ID for various types (think about driver metdata,
> all those special maps, etc).
> - if we are doing compression, now we need to decide on best
> compression format, teach it libbpf (which will make libbpf also
> bigger and depending on extra libraries), etc.
> 
> So basically, in exchange of 1-1.5MB extra memory we get a bunch of
> new problems we normally don't have to deal with.

Yes, I am aware of this tradeoff. Just to make sure this has been 
discussed. I am totally fine with leaving it uncompressed.

> 
>>
>>>    }
>>>
>>>    # Create ${2} .o file with all symbols from the ${1} object file
>>> @@ -153,6 +164,7 @@ sortextable()
>>>    # Delete output files in case of error
>>>    cleanup()
>>>    {
> 
> [...]
> 

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125
From: Andrew Lunn @ 2019-08-08 20:20 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <f34d1117-510f-861f-59f0-51e0e87ead1e@gmail.com>

> I have a contact in Realtek who provided the information about
> the vendor-specific registers used in the patch. I also asked for
> a method to auto-detect 2.5Gbps support but have no feedback so far.
> What may contribute to the problem is that also the integrated 1Gbps
> PHY's (all with the same PHY ID) differ significantly from each other,
> depending on the network chip version.

Hi Heiner

Some of the PHYs embedded in Marvell switches have an OUI, but no
product ID. We work around this brokenness by trapping the reads to
the ID registers in the MDIO bus controller driver and inserting the
switch product ID. The Marvell PHY driver then recognises these IDs
and does the right thing.

Maybe you can do something similar here?

      Andrew

^ permalink raw reply

* Re: [PATCH net-next v1 1/8] netfilter: inlined four headers files into another one.
From: Jeremy Sowden @ 2019-08-08 20:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Netfilter Devel, Net Dev, Masahiro Yamada, Jozsef Kadlecsik
In-Reply-To: <20190808112355.w3ax3twuf6b7pwc7@salvia>

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On 2019-08-08, at 13:23:55 +0200, Pablo Neira Ayuso wrote:
> On Wed, Aug 07, 2019 at 03:16:58PM +0100, Jeremy Sowden wrote:
> [...]
> > +/* Called from uadd only, protected by the set spinlock.
> > + * The kadt functions don't use the comment extensions in any way.
> > + */
> > +static inline void
> > +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> > +		    const struct ip_set_ext *ext)
>
> Not related to this patch, but I think the number of inline functions
> could be reduced a bit by exporting symbols? Specifically for
> functions that are called from the netlink control plane, ie. _uadd()
> functions. I think forcing the compiler to inline this is not useful.
> This could be done in a follow up patchset.

I'll take a look.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: phy: prepare phylib to deal with PHY's extending Clause 22
From: Heiner Kallweit @ 2019-08-08 20:09 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <20190808193243.GK27917@lunn.ch>

On 08.08.2019 21:32, Andrew Lunn wrote:
> On Thu, Aug 08, 2019 at 09:03:42PM +0200, Heiner Kallweit wrote:
>> The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me)
>> PHY that uses standard Clause 22 for all modes up to 1Gbps and adds
>> 2.5Gbps control using vendor-specific registers. To use phylib for
>> the standard part little extensions are needed:
>> - Move most of genphy_config_aneg to a new function
>>   __genphy_config_aneg that takes a parameter whether restarting
>>   auto-negotiation is needed (depending on whether content of
>>   vendor-specific advertisement register changed).
>> - Don't clear phydev->lp_advertising in genphy_read_status so that
>>   we can set non-C22 mode flags before.
>>
>> Basically both changes mimic the behavior of the equivalent Clause 45
>> functions.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy_device.c | 35 +++++++++++++++--------------------
>>  include/linux/phy.h          |  8 +++++++-
>>  2 files changed, 22 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 7ddd91df9..bd7e7db8c 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -1571,11 +1571,9 @@ static int genphy_config_advert(struct phy_device *phydev)
>>  	/* Only allow advertising what this PHY supports */
>>  	linkmode_and(phydev->advertising, phydev->advertising,
>>  		     phydev->supported);
>> -	if (!ethtool_convert_link_mode_to_legacy_u32(&advertise,
>> -						     phydev->advertising))
>> -		phydev_warn(phydev, "PHY advertising (%*pb) more modes than genphy supports, some modes not advertised.\n",
>> -			    __ETHTOOL_LINK_MODE_MASK_NBITS,
>> -			    phydev->advertising);
>> +
>> +	ethtool_convert_link_mode_to_legacy_u32(&advertise,
>> +						phydev->advertising);
> 
> Hi Heiner
> 
> linkmode_adv_to_mii_adv_t() would remove the need to use
> ethtool_convert_link_mode_to_legacy_u32(), and this warning would also
> go away. 
> 
Thanks for the hint! I'll check and submit a v2.

>    Andrew
> 
Heiner


^ permalink raw reply

* Re: [PATCH net-next] r8169: make use of xmit_more
From: Heiner Kallweit @ 2019-08-08 20:08 UTC (permalink / raw)
  To: Holger Hoffstätte, Realtek linux nic maintainers,
	David Miller
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet
In-Reply-To: <868a1f4c-5fba-c64b-ea31-30a3770e6a2f@applied-asynchrony.com>

On 08.08.2019 21:52, Holger Hoffstätte wrote:
> On 8/8/19 8:17 PM, Heiner Kallweit wrote:
>> On 08.08.2019 17:53, Holger Hoffstätte wrote:
>>> On 8/8/19 4:37 PM, Holger Hoffstätte wrote:
>>>>
>>>> Hello Heiner -
>>>>
>>>> On 7/28/19 11:25 AM, Heiner Kallweit wrote:
>>>>> There was a previous attempt to use xmit_more, but the change had to be
>>>>> reverted because under load sometimes a transmit timeout occurred [0].
>>>>> Maybe this was caused by a missing memory barrier, the new attempt
>>>>> keeps the memory barrier before the call to netif_stop_queue like it
>>>>> is used by the driver as of today. The new attempt also changes the
>>>>> order of some calls as suggested by Eric.
>>>>>
>>>>> [0] https://lkml.org/lkml/2019/2/10/39
>>>>>
>>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>>
>>>> I decided to take one for the team and merged this into my 5.2.x tree (just
>>>> fixing up the path) and it has been working fine for the last 2 weeks in two
>>>> machines..until today, when for the first time in forever some random NFS traffic
>>>> made this old friend come out from under the couch:
>>>>
>>>> [Aug 8 14:13] ------------[ cut here ]------------
>>>> [  +0.000006] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
>>>> [  +0.000021] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x21f/0x230
>>>> [  +0.000001] Modules linked in: lz4 lz4_compress lz4_decompress nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel btrfs xor zstd_compress raid6_pq zstd_decompress bfq jitterentropy_rng nct6775 hwmon_vid coretemp hwmon x86_pkg_temp_thermal aesni_intel aes_x86_64 i915 glue_helper crypto_simd cryptd i2c_i801 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea usbhid sysfillrect r8169 sysimgblt fb_sys_fops realtek drm libphy drm_panel_orientation_quirks i2c_core video backlight mq_deadline
>>>> [  +0.000026] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.7 #1
>>>> [  +0.000001] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
>>>> [  +0.000004] RIP: 0010:dev_watchdog+0x21f/0x230
>>>> [  +0.000002] Code: 3b 00 75 ea eb ad 4c 89 ef c6 05 1c 45 bd 00 01 e8 66 35 fc ff 44 89 e1 4c 89 ee 48 c7 c7 e8 5e fc 81 48 89 c2 e8 90 df 92 ff <0f> 0b eb 8e 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 66 66 66 90
>>>> [  +0.000002] RSP: 0018:ffffc90000118e68 EFLAGS: 00010286
>>>> [  +0.000002] RAX: 0000000000000000 RBX: ffff8887f7837600 RCX: 0000000000000303
>>>> [  +0.000001] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff827a488c
>>>> [  +0.000001] RBP: ffff8887f9fbc440 R08: 0000000000000303 R09: 0000000000000003
>>>> [  +0.000001] R10: 000000000001004c R11: 0000000000000001 R12: 0000000000000000
>>>> [  +0.000009] R13: ffff8887f9fbc000 R14: ffffffff8173aa20 R15: dead000000000200
>>>> [  +0.000001] FS:  0000000000000000(0000) GS:ffff8887ff580000(0000) knlGS:0000000000000000
>>>> [  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  +0.000001] CR2: 00007f8d1c04d000 CR3: 0000000002209001 CR4: 00000000000606e0
>>>> [  +0.000000] Call Trace:
>>>> [  +0.000002]  <IRQ>
>>>> [  +0.000005]  call_timer_fn+0x2b/0x120
>>>> [  +0.000002]  expire_timers+0xa4/0x100
>>>> [  +0.000001]  run_timer_softirq+0x8c/0x170
>>>> [  +0.000002]  ? __hrtimer_run_queues+0x13a/0x290
>>>> [  +0.000003]  ? sched_clock_cpu+0xe/0x130
>>>> [  +0.000003]  __do_softirq+0xeb/0x2de
>>>> [  +0.000003]  irq_exit+0x9d/0xe0
>>>> [  +0.000002]  smp_apic_timer_interrupt+0x60/0x110
>>>> [  +0.000003]  apic_timer_interrupt+0xf/0x20
>>>> [  +0.000001]  </IRQ>
>>>> [  +0.000003] RIP: 0010:cpuidle_enter_state+0xad/0x930
>>>> [  +0.000001] Code: c5 66 66 66 66 90 31 ff e8 90 99 9e ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 39 08 00 00 31 ff e8 e7 26 a2 ff fb 45 85 e4 <0f> 88 34 02 00 00 49 63 cc 4c 2b 2c 24 48 8d 04 49 48 c1 e0 05 8b
>>>> [  +0.000000] RSP: 0018:ffffc9000008be50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>>>> [  +0.000001] RAX: ffff8887ff5a9180 RBX: ffffffff822b6c40 RCX: 000000000000001f
>>>> [  +0.000001] RDX: 0000000000000000 RSI: 0000000033087154 RDI: 0000000000000000
>>>> [  +0.000001] RBP: ffff8887ff5b1310 R08: 000030d021fae397 R09: ffff8887ff59c8c0
>>>> [  +0.000000] R10: ffff8887ff59c8c0 R11: 0000000000000006 R12: 0000000000000004
>>>> [  +0.000001] R13: 000030d021fae397 R14: 0000000000000004 R15: ffff8887fc281600
>>>> [  +0.000001]  cpuidle_enter+0x29/0x40
>>>> [  +0.000002]  do_idle+0x1e5/0x280
>>>> [  +0.000001]  cpu_startup_entry+0x19/0x20
>>>> [  +0.000002]  start_secondary+0x186/0x1c0
>>>> [  +0.000001]  secondary_startup_64+0xa4/0xb0
>>>> [  +0.000001] ---[ end trace 99493c768580f4fd ]---
>>>>
>>>> The device is:
>>>>
>>>> Aug  7 23:19:09 tux kernel: libphy: r8169: probed
>>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c9, IRQ 36
>>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>>>> Aug  7 23:19:12 tux kernel: RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
>>>> Aug  7 23:19:13 tux kernel: r8169 0000:04:00.0 eth0: No native access to PCI extended config space, falling back to CSI
>>>>
>>>> and using fq_codel, of course.
>>>>
>>>> This cpuidle hiccup used to be completely gone without xmit_more and this was
>>>> the first (and so far only) time since merging it (regardless of load).
>>>> Also, while I'm using BMQ as CPU scheduler, that hasn't made a difference for
>>>> this particular problem in the past (with MuQSS/PDS) either; way back when I had
>>>> Eric's previous attempt(s) it also hiccupped with CFS.
>>>>
>>>> Revert or wait for more reports when -next is merged in 5.4?
>>>
>>> Another question/data point: I've had the whole basket of offloads activated:
>>>
>>>    ethtool --offload eth0 rx on tx on gro on gso on sg on tso on
>>>
>>> and this caused zero problems without the xmit_more patch. However I just saw
>>> that net-next has a patch where TSO is disabled due to a known HW defect in
>>> RTL8168evl, which is of course what I have. Could this be the reason for the
>>> stall/hiccup when xmit_more has its fingers in the pie? I kind of know what
>>> xmit_more does, just not how it could interact with a possibly broken TSO that
>>> nevertheless seems to work fine otherwise..
>>>
>>
>> I was about to ask exactly that, whether you have TSO enabled. I don't know what
>> can trigger the HW issue, it was just confirmed by Realtek that this chip version
>> has a problem with TSO. So the logical conclusion is: test w/o TSO, ideally the
>> linux-next version.
> 
> So disabling TSO alone didn't work - it leads to reduced throughout (~70 MB/s in iperf).
> Instead I decided to backport 93681cd7d94f ("r8169: enable HW csum and TSO"), which
> wasn't easy due to cleanups/renamings of dependencies, but I managed to backport
> it and .. got the same problem of reduced throughout. wat?!
> 
> After lots of trial & error I started disabling all offloads and finally found
> that sg (Scatter-Gather) enabled alone - without TSO - will lead to the throughput
> drop. So the culprit seems 93681cd7d94f, which disabled TSO on my NIC, but left
> sg on by default. This weas repeatable - switch on sg, throughput drop; turn it
> off - smooth sailing, now with reduced buffers.
> 
> I modified the relevant bits to disable tso & sg like this:
> 
>     /* RTL8168e-vl has a HW issue with TSO */
>     if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
> +        dev->vlan_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
> +        dev->hw_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
> +        dev->features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
>     }
> 
> This seems to work since it restores performance without sg/tso by default
> and without any additional offloads, yet with xmit_more in the mix.
> We'll see whether that is stable over the next few days, but I strongly
> suspect it will be good and that the hiccups were due to xmit_more/TSO
> interaction.
> 
Thanks a lot for the analysis and testing. Then I'll submit the disabling
of SG on RTL8168evl (on your behalf), independent of whether it fixes
the timeout issue.

> thanks,
> Holger
> 
Heiner


^ permalink raw reply

* Re: BUG: soft lockup in tcp_delack_timer
From: Thomas Gleixner @ 2019-08-08 20:07 UTC (permalink / raw)
  To: syzbot
  Cc: Frederic Weisbecker, LKML, Ingo Molnar, syzkaller-bugs,
	Eric Dumazet, netdev
In-Reply-To: <00000000000094d2b5058f9df271@google.com>

On Thu, 8 Aug 2019, syzbot wrote:

Cc+ Eric, net-dev

> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    0d8b3265 Add linux-next specific files for 20190729
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1101fdc8600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ae96f3b8a7e885f7
> dashboard link: https://syzkaller.appspot.com/bug?extid=2d55fb97f42947bbcddd
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2d55fb97f42947bbcddd@syzkaller.appspotmail.com
> 
> net_ratelimit: 2 callbacks suppressed
> TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.
> Check SNMP counters.
> watchdog: BUG: soft lockup - CPU#0 stuck for 122s! [swapper/0:0]
> Modules linked in:
> irq event stamp: 92022
> hardirqs last  enabled at (92021): [<ffffffff81660331>]
> tick_nohz_idle_exit+0x181/0x2e0 kernel/time/tick-sched.c:1180
> hardirqs last disabled at (92022): [<ffffffff873d5d7d>]
> __schedule+0x1dd/0x15b0 kernel/sched/core.c:3862
> softirqs last  enabled at (90810): [<ffffffff876006cd>]
> __do_softirq+0x6cd/0x98c kernel/softirq.c:319
> softirqs last disabled at (90703): [<ffffffff8144fc1b>] invoke_softirq
> kernel/softirq.c:373 [inline]
> softirqs last disabled at (90703): [<ffffffff8144fc1b>] irq_exit+0x19b/0x1e0
> kernel/softirq.c:413
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc2-next-20190729 #54
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:656 [inline]
> RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:84 [inline]
> RIP: 0010:native_queued_spin_lock_slowpath+0x132/0x9f0
> kernel/locking/qspinlock.c:325
> Code: 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 0f 85 37 07 00 00 48 81
> c4 98 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f3 90 <e9> 73 ff ff ff 8b 45
> 98 4c 8d 65 d8 3d 00 01 00 00 0f 84 e5 00 00
> RSP: 0018:ffff8880ae809b48 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000000 RBX: ffff8880621cd088 RCX: ffffffff8158f117
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880621cd088
> RBP: ffff8880ae809c08 R08: 1ffff1100c439a11 R09: ffffed100c439a12
> R10: ffffed100c439a11 R11: ffff8880621cd08b R12: 0000000000000001
> R13: 0000000000000003 R14: ffffed100c439a11 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000001541e88 CR3: 0000000068089000 CR4: 00000000001406f0
> Call Trace:
> <IRQ>
> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:642 [inline]
> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
> queued_spin_lock include/asm-generic/qspinlock.h:81 [inline]
> do_raw_spin_lock+0x20e/0x2e0 kernel/locking/spinlock_debug.c:113
> __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
> _raw_spin_lock+0x37/0x40 kernel/locking/spinlock.c:151
> spin_lock include/linux/spinlock.h:338 [inline]
> tcp_delack_timer+0x2b/0x2a0 net/ipv4/tcp_timer.c:318
> call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1322
> expire_timers kernel/time/timer.c:1366 [inline]
> __run_timers kernel/time/timer.c:1685 [inline]
> __run_timers kernel/time/timer.c:1653 [inline]
> run_timer_softirq+0x697/0x17a0 kernel/time/timer.c:1698
> __do_softirq+0x262/0x98c kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:373 [inline]
> irq_exit+0x19b/0x1e0 kernel/softirq.c:413
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1095
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:828
> </IRQ>
> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
> Code: c8 75 6e fa eb 8a 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d c4 b2 49 00
> f4 c3 66 90 e9 07 00 00 00 0f 00 2d b4 b2 49 00 fb f4 <c3> 90 55 48 89 e5 41
> 57 41 56 41 55 41 54 53 e8 8e 56 21 fa e8 29
> RSP: 0018:ffffffff88c07ce8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: 1ffffffff11a5e87 RBX: ffffffff88c7a1c0 RCX: 1ffffffff134bca6
> RDX: dffffc0000000000 RSI: ffffffff81779dee RDI: ffffffff873e794c
> RBP: ffffffff88c07d18 R08: ffffffff88c7a1c0 R09: fffffbfff118f439
> R10: fffffbfff118f438 R11: ffffffff88c7a1c7 R12: dffffc0000000000
> R13: ffffffff89a5b340 R14: 0000000000000000 R15: 0000000000000000
> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
> default_idle_call+0x84/0xb0 kernel/sched/idle.c:94
> cpuidle_idle_call kernel/sched/idle.c:154 [inline]
> do_idle+0x413/0x760 kernel/sched/idle.c:263
> cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
> rest_init+0x245/0x37b init/main.c:451
> arch_call_rest_init+0xe/0x1b
> start_kernel+0x912/0x951 init/main.c:785
> x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:472
> x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:453
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
> Sending NMI from CPU 0 to CPUs 1:
> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.339
> msecs
> NMI backtrace for cpu 1
> CPU: 1 PID: 7447 Comm: syz-executor.5 Not tainted 5.3.0-rc2-next-20190729 #54
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:656 [inline]
> RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:84 [inline]
> RIP: 0010:native_queued_spin_lock_slowpath+0x132/0x9f0
> kernel/locking/qspinlock.c:325
> Code: 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 0f 85 37 07 00 00 48 81
> c4 98 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f3 90 <e9> 73 ff ff ff 8b 45
> 98 4c 8d 65 d8 3d 00 01 00 00 0f 84 e5 00 00
> RSP: 0018:ffff8880ae909210 EFLAGS: 00000202
> RAX: 0000000000000000 RBX: ffff8880621cd088 RCX: ffffffff8158f117
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880621cd088
> RBP: ffff8880ae9092d0 R08: 1ffff1100c439a11 R09: ffffed100c439a12
> R10: ffffed100c439a11 R11: ffff8880621cd08b R12: 0000000000000001
> R13: 0000000000000003 R14: ffffed100c439a11 R15: 0000000000000001
> FS:  0000555557246940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b32b29000 CR3: 00000000a4e3a000 CR4: 00000000001406e0
> Call Trace:
> <IRQ>
> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:642 [inline]
> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
> queued_spin_lock include/asm-generic/qspinlock.h:81 [inline]
> do_raw_spin_lock+0x20e/0x2e0 kernel/locking/spinlock_debug.c:113
> __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
> _raw_spin_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:175
> spin_lock_bh include/linux/spinlock.h:343 [inline]
> release_sock+0x20/0x1c0 net/core/sock.c:2932
> wait_on_pending_writer+0x20f/0x420 net/tls/tls_main.c:91
> tls_sk_proto_cleanup+0x2c5/0x3e0 net/tls/tls_main.c:295
> tls_sk_proto_unhash+0x90/0x3f0 net/tls/tls_main.c:330
> tcp_set_state+0x5b9/0x7d0 net/ipv4/tcp.c:2235
> tcp_done+0xe2/0x320 net/ipv4/tcp.c:3824
> tcp_reset+0x132/0x500 net/ipv4/tcp_input.c:4080
> tcp_validate_incoming+0xa2d/0x1660 net/ipv4/tcp_input.c:5440
> tcp_rcv_established+0x6b5/0x1e70 net/ipv4/tcp_input.c:5648
> tcp_v6_do_rcv+0x41e/0x12c0 net/ipv6/tcp_ipv6.c:1356
> tcp_v6_rcv+0x31f1/0x3500 net/ipv6/tcp_ipv6.c:1588
> ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> NF_HOOK include/linux/netfilter.h:305 [inline]
> NF_HOOK include/linux/netfilter.h:299 [inline]
> ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> dst_input include/net/dst.h:442 [inline]
> ip6_rcv_finish+0x1de/0x2f0 net/ipv6/ip6_input.c:76
> NF_HOOK include/linux/netfilter.h:305 [inline]
> NF_HOOK include/linux/netfilter.h:299 [inline]
> ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:272
> __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:4999
> __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5113
> process_backlog+0x206/0x750 net/core/dev.c:5924
> napi_poll net/core/dev.c:6347 [inline]
> net_rx_action+0x508/0x10c0 net/core/dev.c:6413
> __do_softirq+0x262/0x98c kernel/softirq.c:292
> do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1080
> </IRQ>
> do_softirq.part.0+0x11a/0x170 kernel/softirq.c:337
> do_softirq kernel/softirq.c:329 [inline]
> __local_bh_enable_ip+0x211/0x270 kernel/softirq.c:189
> local_bh_enable include/linux/bottom_half.h:32 [inline]
> inet_csk_listen_stop+0x1e0/0x850 net/ipv4/inet_connection_sock.c:993
> tcp_close+0xd5b/0x10e0 net/ipv4/tcp.c:2338
> inet_release+0xed/0x200 net/ipv4/af_inet.c:427
> inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
> __sock_release+0xce/0x280 net/socket.c:590
> sock_close+0x1e/0x30 net/socket.c:1268
> __fput+0x2ff/0x890 fs/file_table.c:280
> ____fput+0x16/0x20 fs/file_table.c:313
> task_work_run+0x145/0x1c0 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
> prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
> do_syscall_64+0x65f/0x760 arch/x86/entry/common.c:300
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413511
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83
> ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2
> e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:00007ffebfc402f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413511
> RDX: 0000000000000000 RSI: 000000000000183f RDI: 0000000000000004
> RBP: 0000000000000001 R08: 00000000c44ab83f R09: 00000000c44ab843
> R10: 00007ffebfc403d0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 0000000000760750 R15: ffffffffffffffff
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH bpf-next v5 0/6] xdp: Add devmap_hash map type
From: Jesper Dangaard Brouer @ 2019-08-08 20:05 UTC (permalink / raw)
  To: Y Song
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Alexei Starovoitov, Network Development,
	David Miller, Jakub Kicinski, Björn Töpel,
	Yonghong Song, brouer
In-Reply-To: <CAH3MdRWk_bZVpBUZ8=xsMNw2hUwnQ3Yv-otu9M+7f1Cwr-t1UA@mail.gmail.com>

On Thu, 8 Aug 2019 12:57:05 -0700
Y Song <ys114321@gmail.com> wrote:

> On Thu, Aug 8, 2019 at 12:43 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >  
> > > On Fri, Jul 26, 2019 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:  
> > >>
> > >> This series adds a new map type, devmap_hash, that works like the existing
> > >> devmap type, but using a hash-based indexing scheme. This is useful for the use
> > >> case where a devmap is indexed by ifindex (for instance for use with the routing
> > >> table lookup helper). For this use case, the regular devmap needs to be sized
> > >> after the maximum ifindex number, not the number of devices in it. A hash-based
> > >> indexing scheme makes it possible to size the map after the number of devices it
> > >> should contain instead.
> > >>
> > >> This was previously part of my patch series that also turned the regular
> > >> bpf_redirect() helper into a map-based one; for this series I just pulled out
> > >> the patches that introduced the new map type.
> > >>
> > >> Changelog:
> > >>
> > >> v5:
> > >>
> > >> - Dynamically set the number of hash buckets by rounding up max_entries to the
> > >>   nearest power of two (mirroring the regular hashmap), as suggested by Jesper.  
> > >
> > > fyi I'm waiting for Jesper to review this new version.  
> >
> > Ping Jesper? :)  
> 
> Toke, the patch set has been merged to net-next.
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d3406913561c322323ec2898cc58f55e79786be7
> 

Yes, and I did review this... :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] net: phy: rtl8211f: do a double read to get real time link status
From: Heiner Kallweit @ 2019-08-08 20:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Yonglong Liu, davem, netdev, linux-kernel, linuxarm, salil.mehta,
	yisen.zhuang, shiju.jose
In-Reply-To: <20190808194049.GM27917@lunn.ch>

On 08.08.2019 21:40, Andrew Lunn wrote:
>> @@ -568,6 +568,11 @@ int phy_start_aneg(struct phy_device *phydev)
>>  	if (err < 0)
>>  		goto out_unlock;
>>  
>> +	/* The PHY may not yet have cleared aneg-completed and link-up bit
>> +	 * w/o this delay when the following read is done.
>> +	 */
>> +	usleep_range(1000, 2000);
>> +
> 
> Hi Heiner
> 
> Does 802.3 C22 say anything about this?
> 
C22 says:
"The Auto-Negotiation process shall be restarted by setting bit 0.9 to a logic one. This bit is self-
clearing, and a PHY shall return a value of one in bit 0.9 until the Auto-Negotiation process has been
initiated."

Maybe we should read bit 0.9 in genphy_update_link() after having read BMSR and report
aneg-complete and link-up as false (no matter of their current value) if 0.9 is set.

> If this PHY is broken with respect to the standard, i would prefer the
> workaround is in the PHY specific driver code, not generic core code.
> 
Based on the C22 statement above the PHY may not be broken and the typical time between
two MDIO accesses is sufficient for the PHY to clear the bits. I think of MDIO bus access
functions in network chips that have a 10us-20us delay after each MDIO access.
On HNS3 this may not be the case.

> 	   Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH bpf-next v5 0/6] xdp: Add devmap_hash map type
From: Y Song @ 2019-08-08 19:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Alexei Starovoitov,
	Network Development, David Miller, Jesper Dangaard Brouer,
	Jakub Kicinski, Björn Töpel, Yonghong Song
In-Reply-To: <87k1bnsbds.fsf@toke.dk>

On Thu, Aug 8, 2019 at 12:43 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Fri, Jul 26, 2019 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> This series adds a new map type, devmap_hash, that works like the existing
> >> devmap type, but using a hash-based indexing scheme. This is useful for the use
> >> case where a devmap is indexed by ifindex (for instance for use with the routing
> >> table lookup helper). For this use case, the regular devmap needs to be sized
> >> after the maximum ifindex number, not the number of devices in it. A hash-based
> >> indexing scheme makes it possible to size the map after the number of devices it
> >> should contain instead.
> >>
> >> This was previously part of my patch series that also turned the regular
> >> bpf_redirect() helper into a map-based one; for this series I just pulled out
> >> the patches that introduced the new map type.
> >>
> >> Changelog:
> >>
> >> v5:
> >>
> >> - Dynamically set the number of hash buckets by rounding up max_entries to the
> >>   nearest power of two (mirroring the regular hashmap), as suggested by Jesper.
> >
> > fyi I'm waiting for Jesper to review this new version.
>
> Ping Jesper? :)

Toke, the patch set has been merged to net-next.
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d3406913561c322323ec2898cc58f55e79786be7

>
> -Toke

^ permalink raw reply

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Y Song @ 2019-08-08 19:52 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov, netdev,
	Jakub Kicinski
In-Reply-To: <20190807223807.00002740@gmail.com>

On Wed, Aug 7, 2019 at 1:40 PM Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Wed, 7 Aug 2019 13:12:17 -0700
> Y Song <ys114321@gmail.com> wrote:
>
> > On Wed, Aug 7, 2019 at 11:30 AM Maciej Fijalkowski
> > <maciejromanfijalkowski@gmail.com> wrote:
> > >
> > > On Wed, 7 Aug 2019 10:02:04 -0700
> > > Y Song <ys114321@gmail.com> wrote:
> > >
> > > > On Tue, Aug 6, 2019 at 7:25 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> > > > >
> > > > > By this commit, using `bpftool net detach`, the attached XDP prog can
> > > > > be detached. Detaching the BPF prog will be done through libbpf
> > > > > 'bpf_set_link_xdp_fd' with the progfd set to -1.
> > > > >
> > > > > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > > > ---
> > > > >  tools/bpf/bpftool/net.c | 42 ++++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 41 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > > > > index c05a3fac5cac..7be96acb08e0 100644
> > > > > --- a/tools/bpf/bpftool/net.c
> > > > > +++ b/tools/bpf/bpftool/net.c
> > > > > @@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv)
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static int do_detach(int argc, char **argv)
> > > > > +{
> > > > > +       enum net_attach_type attach_type;
> > > > > +       int progfd, ifindex, err = 0;
> > > > > +
> > > > > +       /* parse detach args */
> > > > > +       if (!REQ_ARGS(3))
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       attach_type = parse_attach_type(*argv);
> > > > > +       if (attach_type == max_net_attach_type) {
> > > > > +               p_err("invalid net attach/detach type");
> > > > > +               return -EINVAL;
> > > > > +       }
> > > > > +
> > > > > +       NEXT_ARG();
> > > > > +       ifindex = net_parse_dev(&argc, &argv);
> > > > > +       if (ifindex < 1)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       /* detach xdp prog */
> > > > > +       progfd = -1;
> > > > > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > > > > +               err = do_attach_detach_xdp(progfd, attach_type, ifindex, NULL);
> > > >
> > > > I found an issue here. This is probably related to do_attach_detach_xdp.
> > > >
> > > > -bash-4.4$ sudo ./bpftool net attach x pinned /sys/fs/bpf/xdp_example dev v1
> > > > -bash-4.4$ sudo ./bpftool net
> > > > xdp:
> > > > v1(4) driver id 1172
> > > >
> > > > tc:
> > > > eth0(2) clsact/ingress fbflow_icmp id 29 act []
> > > > eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> > > > eth0(2) clsact/egress fbflow_egress id 28
> > > > eth0(2) clsact/egress fbflow_sslwall_egress id 35
> > > >
> > > > flow_dissector:
> > > >
> > > > -bash-4.4$ sudo ./bpftool net detach x dev v2
> > >
> > > Shouldn't this be v1 as dev?
> >
> > I am testing a scenario where with wrong devname
> > we did not return an error.
>
> Ah ok. In this scenario if driver has a native xdp support we would be invoking
> its ndo_bpf even if there's no prog currently attached and it wouldn't return
> error value.
>
> Looking at dev_xdp_uninstall, setting driver's prog to NULL is being done only
> when prog is attached. Maybe we should consider querying the driver in
> dev_change_xdp_fd regardless of passed fd value? E.g. don't query only when
> prog >= 0.
>
> I don't recall whether this was brought up previously.

Thanks for explanation. I think return an error is better in
such error cases. Otherwise, people mistakenly write wrong
device name and they may think xdp is detached and it is
actually not.

But this probably should not prevent
this patch as it is more like a kernel issue.

>
> CCing Jakub so we have one thread.
>
> Maciej
>
> >
> > Yes, if dev "v1", it works as expected.
> >
> > >
> > > > -bash-4.4$ sudo ./bpftool net
> > > > xdp:
> > > > v1(4) driver id 1172
> > > >
> > > > tc:
> > > > eth0(2) clsact/ingress fbflow_icmp id 29 act []
> > > > eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
> > > > eth0(2) clsact/egress fbflow_egress id 28
> > > > eth0(2) clsact/egress fbflow_sslwall_egress id 35
> > > >
> > > > flow_dissector:
> > > >
> > > > -bash-4.4$
> > > >
> > > > Basically detaching may fail due to wrong dev name or wrong type, etc.
> > > > But the tool did not return an error. Is this expected?
> > > > This may be related to this funciton "bpf_set_link_xdp_fd()".
> > > > So this patch itself should be okay.
> > > >
> > > > > +
> > > > > +       if (err < 0) {
> > > > > +               p_err("interface %s detach failed",
> > > > > +                     attach_type_strings[attach_type]);
> > > > > +               return err;
> > > > > +       }
> > > > > +
> > > > > +       if (json_output)
> > > > > +               jsonw_null(json_wtr);
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > >  static int do_show(int argc, char **argv)
> > > > >  {
> > > > >         struct bpf_attach_info attach_info = {};
> > > > > @@ -419,6 +456,7 @@ static int do_help(int argc, char **argv)
> > > > >         fprintf(stderr,
> > > > >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> > > > >                 "       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
> > > > > +               "       %s %s detach ATTACH_TYPE dev <devname>\n"
> > > > >                 "       %s %s help\n"
> > > > >                 "\n"
> > > > >                 "       " HELP_SPEC_PROGRAM "\n"
> > > > > @@ -429,7 +467,8 @@ static int do_help(int argc, char **argv)
> > > > >                 "      to dump program attachments. For program types\n"
> > > > >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> > > > >                 "      consult iproute2.\n",
> > > > > -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> > > > > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > > > > +               bin_name, argv[-2]);
> > > > >
> > > > >         return 0;
> > > > >  }
> > > > > @@ -438,6 +477,7 @@ static const struct cmd cmds[] = {
> > > > >         { "show",       do_show },
> > > > >         { "list",       do_show },
> > > > >         { "attach",     do_attach },
> > > > > +       { "detach",     do_detach },
> > > > >         { "help",       do_help },
> > > > >         { 0 }
> > > > >  };
> > > > > --
> > > > > 2.20.1
> > > > >
> > >
>

^ permalink raw reply

* Re: [PATCH] pcan_usb_fd: zero out the common command buffer
From: Marc Kleine-Budde @ 2019-08-08 19:52 UTC (permalink / raw)
  To: David Miller, oneukum; +Cc: netdev, wg, linux-can
In-Reply-To: <20190808.110330.1163430543960132818.davem@davemloft.net>


[-- Attachment #1.1: Type: text/plain, Size: 863 bytes --]

On 8/8/19 8:03 PM, David Miller wrote:
> From: Oliver Neukum <oneukum@suse.com>
> Date: Thu,  8 Aug 2019 11:28:25 +0200
> 
>> Lest we leak kernel memory to a device we better zero out buffers.
>>
>> Reported-by: syzbot+513e4d0985298538bf9b@syzkaller.appspotmail.com
>> Signed-off-by: Oliver Neukum <oneukum@suse.com>
> 
> Please CC: the CAN subsystem maintainers, as this is clearly listed in the
> MAINTAINERS file.

The issue is already fixed and already mainline (stable@v.k.o is on Cc):

30a8beeb3042 can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] r8169: make use of xmit_more
From: Holger Hoffstätte @ 2019-08-08 19:52 UTC (permalink / raw)
  To: Heiner Kallweit, Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet
In-Reply-To: <a19bb3de-a866-d342-7cca-020fef219d03@gmail.com>

On 8/8/19 8:17 PM, Heiner Kallweit wrote:
> On 08.08.2019 17:53, Holger Hoffstätte wrote:
>> On 8/8/19 4:37 PM, Holger Hoffstätte wrote:
>>>
>>> Hello Heiner -
>>>
>>> On 7/28/19 11:25 AM, Heiner Kallweit wrote:
>>>> There was a previous attempt to use xmit_more, but the change had to be
>>>> reverted because under load sometimes a transmit timeout occurred [0].
>>>> Maybe this was caused by a missing memory barrier, the new attempt
>>>> keeps the memory barrier before the call to netif_stop_queue like it
>>>> is used by the driver as of today. The new attempt also changes the
>>>> order of some calls as suggested by Eric.
>>>>
>>>> [0] https://lkml.org/lkml/2019/2/10/39
>>>>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>
>>> I decided to take one for the team and merged this into my 5.2.x tree (just
>>> fixing up the path) and it has been working fine for the last 2 weeks in two
>>> machines..until today, when for the first time in forever some random NFS traffic
>>> made this old friend come out from under the couch:
>>>
>>> [Aug 8 14:13] ------------[ cut here ]------------
>>> [  +0.000006] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
>>> [  +0.000021] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x21f/0x230
>>> [  +0.000001] Modules linked in: lz4 lz4_compress lz4_decompress nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel btrfs xor zstd_compress raid6_pq zstd_decompress bfq jitterentropy_rng nct6775 hwmon_vid coretemp hwmon x86_pkg_temp_thermal aesni_intel aes_x86_64 i915 glue_helper crypto_simd cryptd i2c_i801 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea usbhid sysfillrect r8169 sysimgblt fb_sys_fops realtek drm libphy drm_panel_orientation_quirks i2c_core video backlight mq_deadline
>>> [  +0.000026] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.7 #1
>>> [  +0.000001] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
>>> [  +0.000004] RIP: 0010:dev_watchdog+0x21f/0x230
>>> [  +0.000002] Code: 3b 00 75 ea eb ad 4c 89 ef c6 05 1c 45 bd 00 01 e8 66 35 fc ff 44 89 e1 4c 89 ee 48 c7 c7 e8 5e fc 81 48 89 c2 e8 90 df 92 ff <0f> 0b eb 8e 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 66 66 66 90
>>> [  +0.000002] RSP: 0018:ffffc90000118e68 EFLAGS: 00010286
>>> [  +0.000002] RAX: 0000000000000000 RBX: ffff8887f7837600 RCX: 0000000000000303
>>> [  +0.000001] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff827a488c
>>> [  +0.000001] RBP: ffff8887f9fbc440 R08: 0000000000000303 R09: 0000000000000003
>>> [  +0.000001] R10: 000000000001004c R11: 0000000000000001 R12: 0000000000000000
>>> [  +0.000009] R13: ffff8887f9fbc000 R14: ffffffff8173aa20 R15: dead000000000200
>>> [  +0.000001] FS:  0000000000000000(0000) GS:ffff8887ff580000(0000) knlGS:0000000000000000
>>> [  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  +0.000001] CR2: 00007f8d1c04d000 CR3: 0000000002209001 CR4: 00000000000606e0
>>> [  +0.000000] Call Trace:
>>> [  +0.000002]  <IRQ>
>>> [  +0.000005]  call_timer_fn+0x2b/0x120
>>> [  +0.000002]  expire_timers+0xa4/0x100
>>> [  +0.000001]  run_timer_softirq+0x8c/0x170
>>> [  +0.000002]  ? __hrtimer_run_queues+0x13a/0x290
>>> [  +0.000003]  ? sched_clock_cpu+0xe/0x130
>>> [  +0.000003]  __do_softirq+0xeb/0x2de
>>> [  +0.000003]  irq_exit+0x9d/0xe0
>>> [  +0.000002]  smp_apic_timer_interrupt+0x60/0x110
>>> [  +0.000003]  apic_timer_interrupt+0xf/0x20
>>> [  +0.000001]  </IRQ>
>>> [  +0.000003] RIP: 0010:cpuidle_enter_state+0xad/0x930
>>> [  +0.000001] Code: c5 66 66 66 66 90 31 ff e8 90 99 9e ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 39 08 00 00 31 ff e8 e7 26 a2 ff fb 45 85 e4 <0f> 88 34 02 00 00 49 63 cc 4c 2b 2c 24 48 8d 04 49 48 c1 e0 05 8b
>>> [  +0.000000] RSP: 0018:ffffc9000008be50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
>>> [  +0.000001] RAX: ffff8887ff5a9180 RBX: ffffffff822b6c40 RCX: 000000000000001f
>>> [  +0.000001] RDX: 0000000000000000 RSI: 0000000033087154 RDI: 0000000000000000
>>> [  +0.000001] RBP: ffff8887ff5b1310 R08: 000030d021fae397 R09: ffff8887ff59c8c0
>>> [  +0.000000] R10: ffff8887ff59c8c0 R11: 0000000000000006 R12: 0000000000000004
>>> [  +0.000001] R13: 000030d021fae397 R14: 0000000000000004 R15: ffff8887fc281600
>>> [  +0.000001]  cpuidle_enter+0x29/0x40
>>> [  +0.000002]  do_idle+0x1e5/0x280
>>> [  +0.000001]  cpu_startup_entry+0x19/0x20
>>> [  +0.000002]  start_secondary+0x186/0x1c0
>>> [  +0.000001]  secondary_startup_64+0xa4/0xb0
>>> [  +0.000001] ---[ end trace 99493c768580f4fd ]---
>>>
>>> The device is:
>>>
>>> Aug  7 23:19:09 tux kernel: libphy: r8169: probed
>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c9, IRQ 36
>>> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>>> Aug  7 23:19:12 tux kernel: RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
>>> Aug  7 23:19:13 tux kernel: r8169 0000:04:00.0 eth0: No native access to PCI extended config space, falling back to CSI
>>>
>>> and using fq_codel, of course.
>>>
>>> This cpuidle hiccup used to be completely gone without xmit_more and this was
>>> the first (and so far only) time since merging it (regardless of load).
>>> Also, while I'm using BMQ as CPU scheduler, that hasn't made a difference for
>>> this particular problem in the past (with MuQSS/PDS) either; way back when I had
>>> Eric's previous attempt(s) it also hiccupped with CFS.
>>>
>>> Revert or wait for more reports when -next is merged in 5.4?
>>
>> Another question/data point: I've had the whole basket of offloads activated:
>>
>>    ethtool --offload eth0 rx on tx on gro on gso on sg on tso on
>>
>> and this caused zero problems without the xmit_more patch. However I just saw
>> that net-next has a patch where TSO is disabled due to a known HW defect in
>> RTL8168evl, which is of course what I have. Could this be the reason for the
>> stall/hiccup when xmit_more has its fingers in the pie? I kind of know what
>> xmit_more does, just not how it could interact with a possibly broken TSO that
>> nevertheless seems to work fine otherwise..
>>
> 
> I was about to ask exactly that, whether you have TSO enabled. I don't know what
> can trigger the HW issue, it was just confirmed by Realtek that this chip version
> has a problem with TSO. So the logical conclusion is: test w/o TSO, ideally the
> linux-next version.

So disabling TSO alone didn't work - it leads to reduced throughout (~70 MB/s in iperf).
Instead I decided to backport 93681cd7d94f ("r8169: enable HW csum and TSO"), which
wasn't easy due to cleanups/renamings of dependencies, but I managed to backport
it and .. got the same problem of reduced throughout. wat?!

After lots of trial & error I started disabling all offloads and finally found
that sg (Scatter-Gather) enabled alone - without TSO - will lead to the throughput
drop. So the culprit seems 93681cd7d94f, which disabled TSO on my NIC, but left
sg on by default. This weas repeatable - switch on sg, throughput drop; turn it
off - smooth sailing, now with reduced buffers.

I modified the relevant bits to disable tso & sg like this:

	/* RTL8168e-vl has a HW issue with TSO */
	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		dev->vlan_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
+		dev->hw_features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
+		dev->features &= ~(NETIF_F_ALL_TSO|NETIF_F_SG);
	}

This seems to work since it restores performance without sg/tso by default
and without any additional offloads, yet with xmit_more in the mix.
We'll see whether that is stable over the next few days, but I strongly
suspect it will be good and that the hiccups were due to xmit_more/TSO
interaction.

thanks,
Holger

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox