* [PATCH net 0/3] tls: Introduce and use RX async resync request cancel function
@ 2025-09-10 6:47 Tariq Toukan
2025-09-10 6:47 ` [PATCH net 1/3] net: tls: Introduce " Tariq Toukan
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-09-10 6:47 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, Sabrina Dubroca, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny, Shahar Shitrit
Hi,
This series by Shahar introduces RX async resync request cancel function
in tls module, and uses it in mlx5e driver.
For a device-offloaded TLS RX connection, the TLS module increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request. In the meanwhile, the device is
queried and is expected to respond, asynchronously.
However, if the device response is delayed or fails (e.g due to unstable
connection and device getting out of tracking, hardware errors, resource
exhaustion etc.), the TLS module keeps logging and incrementing
rcd_delta, which can lead to a WARN() when rcd_delta exceeds the
threshold.
This series improves this code area by canceling the resync request when
spotting an issue with the device response.
Regards,
Tariq
Shahar Shitrit (3):
net: tls: Introduce RX async resync request cancel function
net: tls: Cancel RX async resync request on rdc_delta overflow
net/mlx5e: kTLS, cancel RX async resync request in error flows
.../mellanox/mlx5/core/en_accel/ktls_rx.c | 29 +++++++++++++++++--
.../mellanox/mlx5/core/en_accel/ktls_txrx.h | 4 +++
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++
include/net/tls.h | 9 ++++++
net/tls/tls_device.c | 5 +++-
5 files changed, 47 insertions(+), 4 deletions(-)
base-commit: 78dd8ad62cad4f5af22afc842890d531312bbb8a
--
2.31.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net 1/3] net: tls: Introduce RX async resync request cancel function
2025-09-10 6:47 [PATCH net 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
@ 2025-09-10 6:47 ` Tariq Toukan
2025-09-10 6:47 ` [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow Tariq Toukan
2025-09-10 6:47 ` [PATCH net 3/3] net/mlx5e: kTLS, cancel RX async resync request in error flows Tariq Toukan
2 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-09-10 6:47 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, Sabrina Dubroca, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny, Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
When a netdev requests a RX async resync for a TLS connection, the TLS
module handles it by logging record headers and attempting to match
them to the tcp_sn provided by the device. If a match is found, the
TLS module approves the tcp_sn for resynchronization.
While waiting for a device response, the TLS module also increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request.
However, if the device response is delayed or fails (e.g due to
unstable connection and device getting out of tracking, hardware
errors, resource exhaustion etc.), the TLS module keeps logging and
incrementing, which can lead to a WARN() when rcd_delta exceeds the
threshold.
To address that, introduce tls_offload_rx_resync_async_request_cancel()
to be called when device response fails.
A follow-up patch will use this function to cancel async resync
requests in mlx5 when the device is no longer able to complete the
resync.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
include/net/tls.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..380492effe3f 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -472,6 +472,15 @@ tls_offload_rx_resync_async_request_end(struct sock *sk, __be32 seq)
((u64)ntohl(seq) << 32) | RESYNC_REQ);
}
+static inline void
+tls_offload_rx_resync_async_request_cancel(struct sock *sk)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
+
+ atomic64_set(&rx_ctx->resync_async->req, 0);
+}
+
static inline void
tls_offload_rx_resync_set_type(struct sock *sk, enum tls_offload_sync_type type)
{
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-10 6:47 [PATCH net 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
2025-09-10 6:47 ` [PATCH net 1/3] net: tls: Introduce " Tariq Toukan
@ 2025-09-10 6:47 ` Tariq Toukan
2025-09-12 15:14 ` Sabrina Dubroca
2025-09-14 18:53 ` Jakub Kicinski
2025-09-10 6:47 ` [PATCH net 3/3] net/mlx5e: kTLS, cancel RX async resync request in error flows Tariq Toukan
2 siblings, 2 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-09-10 6:47 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, Sabrina Dubroca, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny, Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
When a netdev issues an RX async resync request, the TLS module
increments rcd_delta for each new record that arrives. This tracks
how far the current record is from the point where synchronization
was lost.
When rcd_delta reaches its threshold, it indicates that the device
response is either excessively delayed or unlikely to arrive at all
(at that point, tcp_sn may have wrapped around, so a match would no
longer be valid anyway).
Previous patch introduced tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure
is detected.
This patch adds a final safeguard: cancel the async resync request when
rcd_delta crosses its threshold, as reaching this point implies that
earlier cancellation did not occur.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
net/tls/tls_device.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..56c14f1647a4 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
/* shouldn't get to wraparound:
* too long in async stage, something bad happened
*/
- if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
+ if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
+ /* cancel resync request */
+ atomic64_set(&resync_async->req, 0);
return false;
+ }
/* asynchronous stage: log all headers seq such that
* req_seq <= seq <= end_seq, and wait for real resync request
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net 3/3] net/mlx5e: kTLS, cancel RX async resync request in error flows
2025-09-10 6:47 [PATCH net 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
2025-09-10 6:47 ` [PATCH net 1/3] net: tls: Introduce " Tariq Toukan
2025-09-10 6:47 ` [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow Tariq Toukan
@ 2025-09-10 6:47 ` Tariq Toukan
2 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-09-10 6:47 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
John Fastabend, Sabrina Dubroca, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny, Shahar Shitrit
From: Shahar Shitrit <shshitrit@nvidia.com>
When packets are lost and the device loses track of TLS records,
the device attempts to resync by tracking TLS records and requests
an async resync from software for this TLS connection.
The TLS module handles such device RX resync requests by logging record
headers and comparing them with the record tcp_sn when provided by the
device. It also increments rcd_delta to track how far the current
record tcp_sn is from the tcp_sn of the original resync request.
If the device later responds with a matching tcp_sn, the TLS module
approves the tcp_sn for resync.
However, the device response may be delayed or never arrive,
particularly due to traffic-related issues such as packet drops or
reordering. In such cases, the TLS module remains unaware that resync
will not complete, and continues performing unnecessary work by logging
headers and incrementing rcd_delta, which can eventually exceed the
threshold and trigger a WARN(). For example, this was observed when the
device got out of tracking, causing
mlx5e_ktls_handle_get_psv_completion() to fail and ultimately leading
to the rcd_delta warning.
To address this, call tls_offload_rx_resync_async_request_cancel()
to cancel the resync request and stop resync tracking in such error
cases. Also, increment the tls_resync_req_skip counter to track these
cancellations.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en_accel/ktls_rx.c | 29 +++++++++++++++++--
.../mellanox/mlx5/core/en_accel/ktls_txrx.h | 4 +++
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +++
3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index 65ccb33edafb..ec9400de95c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -339,14 +339,19 @@ static void resync_handle_work(struct work_struct *work)
if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags))) {
mlx5e_ktls_priv_rx_put(priv_rx);
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(priv_rx->sk);
return;
}
c = resync->priv->channels.c[priv_rx->rxq];
sq = &c->async_icosq;
- if (resync_post_get_progress_params(sq, priv_rx))
+ if (resync_post_get_progress_params(sq, priv_rx)) {
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(priv_rx->sk);
mlx5e_ktls_priv_rx_put(priv_rx);
+ }
}
static void resync_init(struct mlx5e_ktls_rx_resync_ctx *resync,
@@ -431,8 +436,11 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
priv_rx = buf->priv_rx;
dev = mlx5_core_dma_dev(sq->channel->mdev);
- if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags)))
+ if (unlikely(test_bit(MLX5E_PRIV_RX_FLAG_DELETING, priv_rx->flags))) {
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(priv_rx->sk);
goto out;
+ }
dma_sync_single_for_cpu(dev, buf->dma_addr, PROGRESS_PARAMS_PADDED_SIZE,
DMA_FROM_DEVICE);
@@ -443,6 +451,7 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
if (tracker_state != MLX5E_TLS_PROGRESS_PARAMS_RECORD_TRACKER_STATE_TRACKING ||
auth_state != MLX5E_TLS_PROGRESS_PARAMS_AUTH_STATE_NO_OFFLOAD) {
priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(priv_rx->sk);
goto out;
}
@@ -472,8 +481,10 @@ static bool resync_queue_get_psv(struct sock *sk)
resync = &priv_rx->resync;
mlx5e_ktls_priv_rx_get(priv_rx);
- if (unlikely(!queue_work(resync->priv->tls->rx_wq, &resync->work)))
+ if (unlikely(!queue_work(resync->priv->tls->rx_wq, &resync->work))) {
mlx5e_ktls_priv_rx_put(priv_rx);
+ return false;
+ }
return true;
}
@@ -557,6 +568,18 @@ void mlx5e_ktls_rx_resync(struct net_device *netdev, struct sock *sk,
resync_handle_seq_match(priv_rx, c);
}
+void
+mlx5e_ktls_rx_resync_async_request_cancel(struct mlx5e_icosq_wqe_info *wi)
+{
+ struct mlx5e_ktls_offload_context_rx *priv_rx;
+ struct mlx5e_ktls_rx_resync_buf *buf;
+
+ buf = wi->tls_get_params.buf;
+ priv_rx = buf->priv_rx;
+ priv_rx->rq_stats->tls_resync_req_skip++;
+ tls_offload_rx_resync_async_request_cancel(priv_rx->sk);
+}
+
/* End of resync section */
void mlx5e_ktls_handle_rx_skb(struct mlx5e_rq *rq, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
index f87b65c560ea..cb08799769ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
@@ -29,6 +29,10 @@ void mlx5e_ktls_handle_get_psv_completion(struct mlx5e_icosq_wqe_info *wi,
void mlx5e_ktls_tx_handle_resync_dump_comp(struct mlx5e_txqsq *sq,
struct mlx5e_tx_wqe_info *wi,
u32 *dma_fifo_cc);
+
+void
+mlx5e_ktls_rx_resync_async_request_cancel(struct mlx5e_icosq_wqe_info *wi);
+
static inline bool
mlx5e_ktls_tx_try_handle_resync_dump_comp(struct mlx5e_txqsq *sq,
struct mlx5e_tx_wqe_info *wi,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b8c609d91d11..c713e683ef1d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1035,6 +1035,10 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
netdev_WARN_ONCE(cq->netdev,
"Bad OP in ICOSQ CQE: 0x%x\n",
get_cqe_opcode(cqe));
+#ifdef CONFIG_MLX5_EN_TLS
+ if (wi->wqe_type == MLX5E_ICOSQ_WQE_GET_PSV_TLS)
+ mlx5e_ktls_rx_resync_async_request_cancel(wi);
+#endif
mlx5e_dump_error_cqe(&sq->cq, sq->sqn,
(struct mlx5_err_cqe *)cqe);
mlx5_wq_cyc_wqe_dump(&sq->wq, ci, wi->num_wqebbs);
--
2.31.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-10 6:47 ` [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow Tariq Toukan
@ 2025-09-12 15:14 ` Sabrina Dubroca
2025-09-22 7:16 ` Shahar Shitrit
2025-09-14 18:53 ` Jakub Kicinski
1 sibling, 1 reply; 13+ messages in thread
From: Sabrina Dubroca @ 2025-09-12 15:14 UTC (permalink / raw)
To: Tariq Toukan
Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny, Shahar Shitrit
2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
>
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
>
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
>
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
>
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.
>
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> net/tls/tls_device.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> /* shouldn't get to wraparound:
> * too long in async stage, something bad happened
> */
> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
Do we still need to WARN here? It's a condition that can actually
happen (even if it's rare), and that the stack can handle, so maybe
not?
> + /* cancel resync request */
> + atomic64_set(&resync_async->req, 0);
> return false;
> + }
>
> /* asynchronous stage: log all headers seq such that
> * req_seq <= seq <= end_seq, and wait for real resync request
> --
> 2.31.1
>
--
Sabrina
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-10 6:47 ` [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow Tariq Toukan
2025-09-12 15:14 ` Sabrina Dubroca
@ 2025-09-14 18:53 ` Jakub Kicinski
2025-09-22 7:18 ` Shahar Shitrit
1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-09-14 18:53 UTC (permalink / raw)
To: Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, John Fastabend,
Sabrina Dubroca, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny, Shahar Shitrit
On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
>
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
>
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
>
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.
Missing a Fixes tag
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> /* shouldn't get to wraparound:
> * too long in async stage, something bad happened
> */
> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> + /* cancel resync request */
> + atomic64_set(&resync_async->req, 0);
we should probably use the helper added by the previous patch (I'd
probably squash them TBH)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-12 15:14 ` Sabrina Dubroca
@ 2025-09-22 7:16 ` Shahar Shitrit
2025-09-23 19:16 ` Sabrina Dubroca
0 siblings, 1 reply; 13+ messages in thread
From: Shahar Shitrit @ 2025-09-22 7:16 UTC (permalink / raw)
To: Sabrina Dubroca, Tariq Toukan
Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny
On 12/09/2025 18:14, Sabrina Dubroca wrote:
> 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
>>
>> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>> net/tls/tls_device.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>> /* shouldn't get to wraparound:
>> * too long in async stage, something bad happened
>> */
>> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>
> Do we still need to WARN here? It's a condition that can actually
> happen (even if it's rare), and that the stack can handle, so maybe
> not?
>
You are right that now the stack handles this, but removing the WARN
without any alternative, will remove any indication that something went
wrong and will prevent us from improving by searching the error flow
where we didn't cancel the request before reaching here. We can maybe
replace the WARN with a counter. what do you think?
>> + /* cancel resync request */
>> + atomic64_set(&resync_async->req, 0);
>> return false;
>> + }
>>
>> /* asynchronous stage: log all headers seq such that
>> * req_seq <= seq <= end_seq, and wait for real resync request
>> --
>> 2.31.1
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-14 18:53 ` Jakub Kicinski
@ 2025-09-22 7:18 ` Shahar Shitrit
2025-09-22 15:54 ` Sabrina Dubroca
0 siblings, 1 reply; 13+ messages in thread
From: Shahar Shitrit @ 2025-09-22 7:18 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: Eric Dumazet, Paolo Abeni, Andrew Lunn, David S. Miller,
Saeed Mahameed, Leon Romanovsky, Mark Bloch, John Fastabend,
Sabrina Dubroca, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny
On 14/09/2025 21:53, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
>
> Missing a Fixes tag
Will add
>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>> /* shouldn't get to wraparound:
>> * too long in async stage, something bad happened
>> */
>> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>> + /* cancel resync request */
>> + atomic64_set(&resync_async->req, 0);
>
> we should probably use the helper added by the previous patch (I'd
> probably squash them TBH)
It's not trivial to use the helper here, since we don't have the socket.
We can maybe add another inner helper that performs the 0 setting and
call it here and inside the helper introduced in previous patch.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-22 7:18 ` Shahar Shitrit
@ 2025-09-22 15:54 ` Sabrina Dubroca
2025-09-28 6:35 ` Shahar Shitrit
0 siblings, 1 reply; 13+ messages in thread
From: Sabrina Dubroca @ 2025-09-22 15:54 UTC (permalink / raw)
To: Shahar Shitrit
Cc: Jakub Kicinski, Tariq Toukan, Eric Dumazet, Paolo Abeni,
Andrew Lunn, David S. Miller, Saeed Mahameed, Leon Romanovsky,
Mark Bloch, John Fastabend, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny
2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
>
>
> On 14/09/2025 21:53, Jakub Kicinski wrote:
> > On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> >
> > Missing a Fixes tag
> Will add
> >
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >> /* shouldn't get to wraparound:
> >> * too long in async stage, something bad happened
> >> */
> >> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >> + /* cancel resync request */
> >> + atomic64_set(&resync_async->req, 0);
> >
> > we should probably use the helper added by the previous patch (I'd
> > probably squash them TBH)
>
> It's not trivial to use the helper here, since we don't have the socket.
tls_device_rx_resync_async doesn't currently get the socket, but it
has only one caller, tls_device_rx_resync_new_rec, which does. So
tls_device_rx_resync_async could easily get the socket. Or just pass
resync_async to tls_offload_rx_resync_async_request_cancel, since
that's what it really needs?
--
Sabrina
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-22 7:16 ` Shahar Shitrit
@ 2025-09-23 19:16 ` Sabrina Dubroca
2025-10-20 8:02 ` Shahar Shitrit
0 siblings, 1 reply; 13+ messages in thread
From: Sabrina Dubroca @ 2025-09-23 19:16 UTC (permalink / raw)
To: Shahar Shitrit, Jakub Kicinski
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny
2025-09-22, 10:16:21 +0300, Shahar Shitrit wrote:
>
>
> On 12/09/2025 18:14, Sabrina Dubroca wrote:
> > 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> >> From: Shahar Shitrit <shshitrit@nvidia.com>
> >>
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> >>
> >> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> >> ---
> >> net/tls/tls_device.c | 5 ++++-
> >> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >> /* shouldn't get to wraparound:
> >> * too long in async stage, something bad happened
> >> */
> >> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >
> > Do we still need to WARN here? It's a condition that can actually
> > happen (even if it's rare), and that the stack can handle, so maybe
> > not?
> >
> You are right that now the stack handles this, but removing the WARN
> without any alternative, will remove any indication that something went
> wrong and will prevent us from improving by searching the error flow
> where we didn't cancel the request before reaching here. We can maybe
> replace the WARN with a counter. what do you think?
Do you use CONFIG_DEBUG_NET in your devel/test kernels? If so,
DEBUG_NET_WARN_ONCE would be an option. Or is it more so that
users/customers can report the problem (ie on production kernels
without CONFIG_DEBUG_NET) - in that case, the counter would work
better.
But if you really think that this condition indicates a driver bug,
maybe the WARN is still appropriate. Jakub, what do you think?
BTW, I was also thinking that the documentation
(Documentation/networking/tls-offload.rst) could maybe be improved a
bit with a description of how async resync works and how the driver is
expected to use the tls_offload_rx_resync_async_request_{start,end}
(and now _cancel) helpers. The section on "Stream scan
resynchronization" is pretty abstract.
--
Sabrina
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-22 15:54 ` Sabrina Dubroca
@ 2025-09-28 6:35 ` Shahar Shitrit
2025-09-29 9:54 ` Sabrina Dubroca
0 siblings, 1 reply; 13+ messages in thread
From: Shahar Shitrit @ 2025-09-28 6:35 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Jakub Kicinski, Tariq Toukan, Eric Dumazet, Paolo Abeni,
Andrew Lunn, David S. Miller, Saeed Mahameed, Leon Romanovsky,
Mark Bloch, John Fastabend, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny
On 22/09/2025 18:54, Sabrina Dubroca wrote:
> 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
>>
>>
>> On 14/09/2025 21:53, Jakub Kicinski wrote:
>>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>>>> When a netdev issues an RX async resync request, the TLS module
>>>> increments rcd_delta for each new record that arrives. This tracks
>>>> how far the current record is from the point where synchronization
>>>> was lost.
>>>>
>>>> When rcd_delta reaches its threshold, it indicates that the device
>>>> response is either excessively delayed or unlikely to arrive at all
>>>> (at that point, tcp_sn may have wrapped around, so a match would no
>>>> longer be valid anyway).
>>>>
>>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure
>>>> is detected.
>>>>
>>>> This patch adds a final safeguard: cancel the async resync request when
>>>> rcd_delta crosses its threshold, as reaching this point implies that
>>>> earlier cancellation did not occur.
>>>
>>> Missing a Fixes tag
>> Will add
>>>
>>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>>> index f672a62a9a52..56c14f1647a4 100644
>>>> --- a/net/tls/tls_device.c
>>>> +++ b/net/tls/tls_device.c
>>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>>> /* shouldn't get to wraparound:
>>>> * too long in async stage, something bad happened
>>>> */
>>>> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>>>> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>>>> + /* cancel resync request */
>>>> + atomic64_set(&resync_async->req, 0);
>>>
>>> we should probably use the helper added by the previous patch (I'd
>>> probably squash them TBH)
>>
>> It's not trivial to use the helper here, since we don't have the socket.
>
> tls_device_rx_resync_async doesn't currently get the socket, but it
> has only one caller, tls_device_rx_resync_new_rec, which does. So
> tls_device_rx_resync_async could easily get the socket. Or just pass
> resync_async to tls_offload_rx_resync_async_request_cancel, since
> that's what it really needs?
>
yes these are options, but we don't like too much passing the socket to
tls_device_rx_resync_new_rec() merely for this matter. Also we wanted to
keep tls_offload_rx_resync_async_request_cancel in the same format of
tls_offload_rx_resync_async_request_start/end meaning to have the socket
as a parameter.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-28 6:35 ` Shahar Shitrit
@ 2025-09-29 9:54 ` Sabrina Dubroca
0 siblings, 0 replies; 13+ messages in thread
From: Sabrina Dubroca @ 2025-09-29 9:54 UTC (permalink / raw)
To: Shahar Shitrit
Cc: Jakub Kicinski, Tariq Toukan, Eric Dumazet, Paolo Abeni,
Andrew Lunn, David S. Miller, Saeed Mahameed, Leon Romanovsky,
Mark Bloch, John Fastabend, netdev, linux-rdma, linux-kernel,
Gal Pressman, Boris Pismenny
2025-09-28, 09:35:48 +0300, Shahar Shitrit wrote:
>
>
> On 22/09/2025 18:54, Sabrina Dubroca wrote:
> > 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
> >>
> >>
> >> On 14/09/2025 21:53, Jakub Kicinski wrote:
> >>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >>>> When a netdev issues an RX async resync request, the TLS module
> >>>> increments rcd_delta for each new record that arrives. This tracks
> >>>> how far the current record is from the point where synchronization
> >>>> was lost.
> >>>>
> >>>> When rcd_delta reaches its threshold, it indicates that the device
> >>>> response is either excessively delayed or unlikely to arrive at all
> >>>> (at that point, tcp_sn may have wrapped around, so a match would no
> >>>> longer be valid anyway).
> >>>>
> >>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >>>> to explicitly cancel resync requests when a device response failure
> >>>> is detected.
> >>>>
> >>>> This patch adds a final safeguard: cancel the async resync request when
> >>>> rcd_delta crosses its threshold, as reaching this point implies that
> >>>> earlier cancellation did not occur.
> >>>
> >>> Missing a Fixes tag
> >> Will add
> >>>
> >>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >>>> index f672a62a9a52..56c14f1647a4 100644
> >>>> --- a/net/tls/tls_device.c
> >>>> +++ b/net/tls/tls_device.c
> >>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>>> /* shouldn't get to wraparound:
> >>>> * too long in async stage, something bad happened
> >>>> */
> >>>> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >>>> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >>>> + /* cancel resync request */
> >>>> + atomic64_set(&resync_async->req, 0);
> >>>
> >>> we should probably use the helper added by the previous patch (I'd
> >>> probably squash them TBH)
> >>
> >> It's not trivial to use the helper here, since we don't have the socket.
> >
> > tls_device_rx_resync_async doesn't currently get the socket, but it
> > has only one caller, tls_device_rx_resync_new_rec, which does. So
> > tls_device_rx_resync_async could easily get the socket. Or just pass
> > resync_async to tls_offload_rx_resync_async_request_cancel, since
> > that's what it really needs?
> >
> yes these are options, but we don't like too much passing the socket to
> tls_device_rx_resync_new_rec() merely for this matter.
Why not? If you felt the need to add a comment saying we're canceling
the request, using a helper instead that says it does the canceling is
a pretty decent reason to add whatever argument
tls_device_rx_resync_async needs (or swap resync_async for the socket
if you don't want to add another argument).
> Also we wanted to
> keep tls_offload_rx_resync_async_request_cancel in the same format of
> tls_offload_rx_resync_async_request_start/end meaning to have the socket
> as a parameter.
Then they could easily be changed to make the 3 helpers consistent
(all taking resync_async), since
tls_offload_rx_resync_async_request_start/end are used exactly once
each.
--
Sabrina
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
2025-09-23 19:16 ` Sabrina Dubroca
@ 2025-10-20 8:02 ` Shahar Shitrit
0 siblings, 0 replies; 13+ messages in thread
From: Shahar Shitrit @ 2025-10-20 8:02 UTC (permalink / raw)
To: Sabrina Dubroca, Jakub Kicinski
Cc: Tariq Toukan, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Saeed Mahameed, Leon Romanovsky, Mark Bloch,
John Fastabend, netdev, linux-rdma, linux-kernel, Gal Pressman,
Boris Pismenny
On 23/09/2025 22:16, Sabrina Dubroca wrote:
> 2025-09-22, 10:16:21 +0300, Shahar Shitrit wrote:
>>
>>
>> On 12/09/2025 18:14, Sabrina Dubroca wrote:
>>> 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
>>>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>>>
>>>> When a netdev issues an RX async resync request, the TLS module
>>>> increments rcd_delta for each new record that arrives. This tracks
>>>> how far the current record is from the point where synchronization
>>>> was lost.
>>>>
>>>> When rcd_delta reaches its threshold, it indicates that the device
>>>> response is either excessively delayed or unlikely to arrive at all
>>>> (at that point, tcp_sn may have wrapped around, so a match would no
>>>> longer be valid anyway).
>>>>
>>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure
>>>> is detected.
>>>>
>>>> This patch adds a final safeguard: cancel the async resync request when
>>>> rcd_delta crosses its threshold, as reaching this point implies that
>>>> earlier cancellation did not occur.
>>>>
>>>> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
>>>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>>>> ---
>>>> net/tls/tls_device.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>>> index f672a62a9a52..56c14f1647a4 100644
>>>> --- a/net/tls/tls_device.c
>>>> +++ b/net/tls/tls_device.c
>>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>>> /* shouldn't get to wraparound:
>>>> * too long in async stage, something bad happened
>>>> */
>>>> - if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>>>> + if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>>>
>>> Do we still need to WARN here? It's a condition that can actually
>>> happen (even if it's rare), and that the stack can handle, so maybe
>>> not?
>>>
>> You are right that now the stack handles this, but removing the WARN
>> without any alternative, will remove any indication that something went
>> wrong and will prevent us from improving by searching the error flow
>> where we didn't cancel the request before reaching here. We can maybe
>> replace the WARN with a counter. what do you think?
>
> Do you use CONFIG_DEBUG_NET in your devel/test kernels? If so,
> DEBUG_NET_WARN_ONCE would be an option. Or is it more so that
> users/customers can report the problem (ie on production kernels
> without CONFIG_DEBUG_NET) - in that case, the counter would work
> better.
> But if you really think that this condition indicates a driver bug,
> maybe the WARN is still appropriate. Jakub, what do you think?
>
>
> BTW, I was also thinking that the documentation
> (Documentation/networking/tls-offload.rst) could maybe be improved a
> bit with a description of how async resync works and how the driver is
> expected to use the tls_offload_rx_resync_async_request_{start,end}
> (and now _cancel) helpers. The section on "Stream scan
> resynchronization" is pretty abstract.
>
We will submit documentation enhancement in a separate patch to net-next.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-10-20 8:02 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-10 6:47 [PATCH net 0/3] tls: Introduce and use RX async resync request cancel function Tariq Toukan
2025-09-10 6:47 ` [PATCH net 1/3] net: tls: Introduce " Tariq Toukan
2025-09-10 6:47 ` [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow Tariq Toukan
2025-09-12 15:14 ` Sabrina Dubroca
2025-09-22 7:16 ` Shahar Shitrit
2025-09-23 19:16 ` Sabrina Dubroca
2025-10-20 8:02 ` Shahar Shitrit
2025-09-14 18:53 ` Jakub Kicinski
2025-09-22 7:18 ` Shahar Shitrit
2025-09-22 15:54 ` Sabrina Dubroca
2025-09-28 6:35 ` Shahar Shitrit
2025-09-29 9:54 ` Sabrina Dubroca
2025-09-10 6:47 ` [PATCH net 3/3] net/mlx5e: kTLS, cancel RX async resync request in error flows Tariq Toukan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).