* [PATCH net V3 1/3] net: tls: Change async resync helpers argument
2025-10-26 20:03 [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
@ 2025-10-26 20:03 ` Tariq Toukan
2025-10-26 20:03 ` [PATCH net V3 2/3] net: tls: Cancel RX async resync request on rcd_delta overflow Tariq Toukan
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Tariq Toukan @ 2025-10-26 20:03 UTC (permalink / raw)
To: Sabrina Dubroca, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
Update tls_offload_rx_resync_async_request_start() and
tls_offload_rx_resync_async_request_end() to get a struct
tls_offload_resync_async parameter directly, rather than
extracting it from struct sock.
This change aligns the function signatures with the upcoming
tls_offload_rx_resync_async_request_cancel() helper, which
will be introduced in a subsequent patch.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en_accel/ktls_rx.c | 9 ++++++--
include/net/tls.h | 21 +++++++------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index d7a11ff9bbdb..5fbc92269585 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -425,12 +425,14 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
{
struct mlx5e_ktls_rx_resync_buf *buf = wi->tls_get_params.buf;
struct mlx5e_ktls_offload_context_rx *priv_rx;
+ struct tls_offload_context_rx *rx_ctx;
u8 tracker_state, auth_state, *ctx;
struct device *dev;
u32 hw_seq;
priv_rx = buf->priv_rx;
dev = mlx5_core_dma_dev(sq->channel->mdev);
+ rx_ctx = tls_offload_ctx_rx(tls_get_ctx(priv_rx->sk));
if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags)))
goto out;
@@ -447,7 +449,8 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
}
hw_seq = MLX5_GET(tls_progress_params, ctx, hw_resync_tcp_sn);
- tls_offload_rx_resync_async_request_end(priv_rx->sk, cpu_to_be32(hw_seq));
+ tls_offload_rx_resync_async_request_end(rx_ctx->resync_async,
+ cpu_to_be32(hw_seq));
priv_rx->rq_stats->tls_resync_req_end++;
out:
mlx5e_ktls_priv_rx_put(priv_rx);
@@ -482,6 +485,7 @@ static bool resync_queue_get_psv(struct sock *sk)
static void resync_update_sn(struct mlx5e_rq *rq, struct sk_buff *skb)
{
struct ethhdr *eth = (struct ethhdr *)(skb->data);
+ struct tls_offload_resync_async *resync_async;
struct net_device *netdev = rq->netdev;
struct net *net = dev_net(netdev);
struct sock *sk = NULL;
@@ -527,7 +531,8 @@ static void resync_update_sn(struct mlx5e_rq *rq, struct sk_buff *skb)
seq = th->seq;
datalen = skb->len - depth;
- tls_offload_rx_resync_async_request_start(sk, seq, datalen);
+ resync_async = tls_offload_ctx_rx(tls_get_ctx(sk))->resync_async;
+ tls_offload_rx_resync_async_request_start(resync_async, seq, datalen);
rq->stats->tls_resync_req_start++;
unref:
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..b90f3b675c3c 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -451,25 +451,20 @@ static inline void tls_offload_rx_resync_request(struct sock *sk, __be32 seq)
/* Log all TLS record header TCP sequences in [seq, seq+len] */
static inline void
-tls_offload_rx_resync_async_request_start(struct sock *sk, __be32 seq, u16 len)
+tls_offload_rx_resync_async_request_start(struct tls_offload_resync_async *resync_async,
+ __be32 seq, u16 len)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
-
- atomic64_set(&rx_ctx->resync_async->req, ((u64)ntohl(seq) << 32) |
+ atomic64_set(&resync_async->req, ((u64)ntohl(seq) << 32) |
((u64)len << 16) | RESYNC_REQ | RESYNC_REQ_ASYNC);
- rx_ctx->resync_async->loglen = 0;
- rx_ctx->resync_async->rcd_delta = 0;
+ resync_async->loglen = 0;
+ resync_async->rcd_delta = 0;
}
static inline void
-tls_offload_rx_resync_async_request_end(struct sock *sk, __be32 seq)
+tls_offload_rx_resync_async_request_end(struct tls_offload_resync_async *resync_async,
+ __be32 seq)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
-
- atomic64_set(&rx_ctx->resync_async->req,
- ((u64)ntohl(seq) << 32) | RESYNC_REQ);
+ atomic64_set(&resync_async->req, ((u64)ntohl(seq) << 32) | RESYNC_REQ);
}
static inline void
--
2.31.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH net V3 2/3] net: tls: Cancel RX async resync request on rcd_delta overflow
2025-10-26 20:03 [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
2025-10-26 20:03 ` [PATCH net V3 1/3] net: tls: Change async resync helpers argument Tariq Toukan
@ 2025-10-26 20:03 ` Tariq Toukan
2025-10-26 20:03 ` [PATCH net V3 3/3] net/mlx5e: kTLS, Cancel RX async resync request in error flows Tariq Toukan
2025-10-30 1:50 ` [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: Tariq Toukan @ 2025-10-26 20:03 UTC (permalink / raw)
To: Sabrina Dubroca, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
When a netdev issues a RX async resync request for a TLS connection,
the TLS module handles it by logging record headers and attempting to
match them to the tcp_sn provided by the device. If a match is found,
the TLS module approves the tcp_sn for resynchronization.
While waiting for a device response, the TLS module also increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request.
However, if the device response is delayed or fails (e.g due to
unstable connection and device getting out of tracking, hardware
errors, resource exhaustion etc.), the TLS module keeps logging and
incrementing, which can lead to a WARN() when rcd_delta exceeds the
threshold.
To address this, introduce tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure is
detected. Call this helper also as a final safeguard when rcd_delta
crosses its threshold, as reaching this point implies that earlier
cancellation did not occur.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
include/net/tls.h | 6 ++++++
net/tls/tls_device.c | 4 +++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index b90f3b675c3c..c7bcdb3afad7 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -467,6 +467,12 @@ tls_offload_rx_resync_async_request_end(struct tls_offload_resync_async *resync_
atomic64_set(&resync_async->req, ((u64)ntohl(seq) << 32) | RESYNC_REQ);
}
+static inline void
+tls_offload_rx_resync_async_request_cancel(struct tls_offload_resync_async *resync_async)
+{
+ atomic64_set(&resync_async->req, 0);
+}
+
static inline void
tls_offload_rx_resync_set_type(struct sock *sk, enum tls_offload_sync_type type)
{
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a64ae15b1a60..71734411ff4c 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -723,8 +723,10 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
/* shouldn't get to wraparound:
* too long in async stage, something bad happened
*/
- if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
+ if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
+ tls_offload_rx_resync_async_request_cancel(resync_async);
return false;
+ }
/* asynchronous stage: log all headers seq such that
* req_seq <= seq <= end_seq, and wait for real resync request
--
2.31.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH net V3 3/3] net/mlx5e: kTLS, Cancel RX async resync request in error flows
2025-10-26 20:03 [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
2025-10-26 20:03 ` [PATCH net V3 1/3] net: tls: Change async resync helpers argument Tariq Toukan
2025-10-26 20:03 ` [PATCH net V3 2/3] net: tls: Cancel RX async resync request on rcd_delta overflow Tariq Toukan
@ 2025-10-26 20:03 ` Tariq Toukan
2025-10-30 1:50 ` [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: Tariq Toukan @ 2025-10-26 20:03 UTC (permalink / raw)
To: Sabrina Dubroca, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
When device loses track of TLS records, it attempts to resync by
monitoring records and requests an asynchronous resynchronization
from software for this TLS connection.
The TLS module handles such device RX resync requests by logging record
headers and comparing them with the record tcp_sn when provided by the
device. It also increments rcd_delta to track how far the current
record tcp_sn is from the tcp_sn of the original resync request.
If the device later responds with a matching tcp_sn, the TLS module
approves the tcp_sn for resync.
However, the device response may be delayed or never arrive,
particularly due to traffic-related issues such as packet drops or
reordering. In such cases, the TLS module remains unaware that resync
will not complete, and continues performing unnecessary work by logging
headers and incrementing rcd_delta, which can eventually exceed the
threshold and trigger a WARN(). For example, this was observed when the
device got out of tracking, causing
mlx5e_ktls_handle_get_psv_completion() to fail and ultimately leading
to the rcd_delta warning.
To address this, call tls_offload_rx_resync_async_request_cancel()
to cancel the resync request and stop resync tracking in such error
cases. Also, increment the tls_resync_req_skip counter to track these
cancellations.
Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en_accel/ktls_rx.c | 34 ++++++++++++++++---
.../mellanox/mlx5/core/en_accel/ktls_txrx.h | 4 +++
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++
3 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index 5fbc92269585..da2d1eb52c13 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -320,7 +320,6 @@ resync_post_get_progress_params(struct mlx5e_icosq *sq,
err_free:
kfree(buf);
err_out:
- priv_rx->rq_stats->tls_resync_req_skip++;
return err;
}
@@ -339,14 +338,19 @@ static void resync_handle_work(struct work_struct *work)
if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags))) {
mlx5e_ktls_priv_rx_put(priv_rx);
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(&resync->core);
return;
}
c = resync->priv->channels.c[priv_rx->rxq];
sq = &c->async_icosq;
- if (resync_post_get_progress_params(sq, priv_rx))
+ if (resync_post_get_progress_params(sq, priv_rx)) {
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(&resync->core);
mlx5e_ktls_priv_rx_put(priv_rx);
+ }
}
static void resync_init(struct mlx5e_ktls_rx_resync_ctx *resync,
@@ -425,6 +429,7 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
{
struct mlx5e_ktls_rx_resync_buf *buf = wi->tls_get_params.buf;
struct mlx5e_ktls_offload_context_rx *priv_rx;
+ struct tls_offload_resync_async *async_resync;
struct tls_offload_context_rx *rx_ctx;
u8 tracker_state, auth_state, *ctx;
struct device *dev;
@@ -433,8 +438,12 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
priv_rx = buf->priv_rx;
dev = mlx5_core_dma_dev(sq->channel->mdev);
rx_ctx = tls_offload_ctx_rx(tls_get_ctx(priv_rx->sk));
- if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags)))
+ async_resync = rx_ctx->resync_async;
+ if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags))) {
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(async_resync);
goto out;
+ }
dma_sync_single_for_cpu(dev, buf->dma_addr, PROGRESS_PARAMS_PADDED_SIZE,
DMA_FROM_DEVICE);
@@ -445,11 +454,12 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
if (tracker_state != MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_TRACKING ||
auth_state != MLX5E_TLS_PROGRESS_PARAMS_AUTH_STATE_NO_OFFLOAD) {
priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(async_resync);
goto out;
}
hw_seq = MLX5_GET(tls_progress_params, ctx, hw_resync_tcp_sn);
- tls_offload_rx_resync_async_request_end(rx_ctx->resync_async,
+ tls_offload_rx_resync_async_request_end(async_resync,
cpu_to_be32(hw_seq));
priv_rx->rq_stats->tls_resync_req_end++;
out:
@@ -475,8 +485,10 @@ static bool resync_queue_get_psv(struct sock *sk)
resync = &priv_rx->resync;
mlx5e_ktls_priv_rx_get(priv_rx);
- if (unlikely(!queue_work(resync->priv->tls->rx_wq, &resync->work)))
+ if (unlikely(!queue_work(resync->priv->tls->rx_wq, &resync->work))) {
mlx5e_ktls_priv_rx_put(priv_rx);
+ return false;
+ }
return true;
}
@@ -561,6 +573,18 @@ void mlx5e_ktls_rx_resync(struct net_device *netdev, struct sock *sk,
resync_handle_seq_match(priv_rx, c);
}
+void
+mlx5e_ktls_rx_resync_async_request_cancel(struct mlx5e_icosq_wqe_info *wi)
+{
+ struct mlx5e_ktls_offload_context_rx *priv_rx;
+ struct mlx5e_ktls_rx_resync_buf *buf;
+
+ buf = wi->tls_get_params.buf;
+ priv_rx = buf->priv_rx;
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(&priv_rx->resync.core);
+}
+
/* End of resync section */
void mlx5e_ktls_handle_rx_skb(struct mlx5e_rq *rq, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
index f87b65c560ea..cb08799769ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
@@ -29,6 +29,10 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
void mlx5e_ktls_tx_handle_resync_dump_comp(struct mlx5e_txqsq *sq,
struct mlx5e_tx_wqe_info *wi,
u32 *dma_fifo_cc);
+
+void
+mlx5e_ktls_rx_resync_async_request_cancel(struct mlx5e_icosq_wqe_info *wi);
+
static inline bool
mlx5e_ktls_tx_try_handle_resync_dump_comp(struct mlx5e_txqsq *sq,
struct mlx5e_tx_wqe_info *wi,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1c79adc51a04..26621a2972ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1036,6 +1036,10 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
netdev_WARN_ONCE(cq->netdev,
"Bad OP in ICOSQ CQE: 0x%x\n",
get_cqe_opcode(cqe));
+#ifdef CONFIG_MLX5_EN_TLS
+ if (wi->wqe_type == MLX5E_ICOSQ_WQE_GET_PSV_TLS)
+ mlx5e_ktls_rx_resync_async_request_cancel(wi);
+#endif
mlx5e_dump_error_cqe(&sq->cq, sq->sqn,
(struct mlx5_err_cqe *)cqe);
mlx5_wq_cyc_wqe_dump(&sq->wq, ci, wi->num_wqebbs);
--
2.31.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function
2025-10-26 20:03 [PATCH net V3 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
` (2 preceding siblings ...)
2025-10-26 20:03 ` [PATCH net V3 3/3] net/mlx5e: kTLS, Cancel RX async resync request in error flows Tariq Toukan
@ 2025-10-30 1:50 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-30 1:50 UTC (permalink / raw)
To: Tariq Toukan
Cc: sd, edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
mbloch, john.fastabend, netdev, linux-rdma, linux-kernel, gal,
shshitrit
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 26 Oct 2025 22:03:00 +0200 you wrote:
> Hi,
>
> This is V3. Find previous one here:
> https://lore.kernel.org/all/1760943954-909301-1-git-send-email-tariqt@nvidia.com/
>
> This series by Shahar introduces RX async resync request cancel function
> in tls module, and uses it in mlx5e driver.
>
> [...]
Here is the summary with links:
- [net,V3,1/3] net: tls: Change async resync helpers argument
https://git.kernel.org/netdev/net/c/34892cfec0c2
- [net,V3,2/3] net: tls: Cancel RX async resync request on rcd_delta overflow
https://git.kernel.org/netdev/net/c/c15d5c62ab31
- [net,V3,3/3] net/mlx5e: kTLS, Cancel RX async resync request in error flows
https://git.kernel.org/netdev/net/c/426e9da3b284
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 5+ messages in thread