* [PATCH net] net/tls: Fix race in TLS device down flow
@ 2022-07-15 8:42 Tariq Toukan
2022-07-15 23:38 ` Jakub Kicinski
2022-07-18 10:50 ` patchwork-bot+netdevbpf
From: Tariq Toukan @ 2022-07-15 8:42 UTC
To: Boris Pismenny, John Fastabend, Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
Saeed Mahameed, Gal Pressman, Tariq Toukan, Maxim Mikityanskiy
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization. This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
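The two orderings described above can be replayed in a small userspace
sketch. This is only an illustrative model, not kernel code: model_ctx,
model_device_down, fixed_destruct and the other names are hypothetical
stand-ins, the lock is represented only by which steps are allowed to
interleave, and the device-down logic is heavily simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_ctx {
	int refcount;     /* stands in for the context refcount */
	bool on_gc_list;  /* queued for the gc work */
	bool dev_freed;   /* device resources released */
};

/* Same contract as refcount_dec_and_test(): true when the count hits zero. */
static bool model_dec_and_test(struct model_ctx *ctx)
{
	assert(ctx->refcount > 0);
	return --ctx->refcount == 0;
}

/* gc work: release the device resources of a queued context. */
static void model_flush_gc(struct model_ctx *ctx)
{
	if (ctx->on_gc_list) {
		ctx->on_gc_list = false;
		ctx->dev_freed = true;
	}
}

/* Device-down flow, simplified: a context that is still referenced is
 * handled directly; a context at zero is skipped and left to the gc work.
 */
static void model_device_down(struct model_ctx *ctx)
{
	if (ctx->refcount != 0) {
		ctx->refcount++;        /* take a temporary reference */
		ctx->dev_freed = true;  /* release device resources directly */
		ctx->refcount--;        /* drop the temporary reference */
	}
	model_flush_gc(ctx);
}

/* Old destructor flow, with device-down running in the unlocked gap
 * between the decrement and the queueing.
 */
bool old_flow_freed_when_down_returns(void)
{
	struct model_ctx ctx = { .refcount = 1 };
	bool last_ref = model_dec_and_test(&ctx);  /* outside the lock */
	bool freed;

	model_device_down(&ctx);        /* skips ctx, finds nothing to flush */
	freed = ctx.dev_freed;
	if (last_ref)
		ctx.on_gc_list = true;  /* queued only after device-down returned */
	return freed;
}

/* Fixed destructor: decrement and queueing form one critical section,
 * so device-down can only run entirely before or entirely after it.
 */
static void fixed_destruct(struct model_ctx *ctx)
{
	if (model_dec_and_test(ctx))
		ctx->on_gc_list = true;
}

bool fixed_flow_freed_when_down_returns(bool down_first)
{
	struct model_ctx ctx = { .refcount = 1 };
	bool freed;

	if (down_first) {
		model_device_down(&ctx);
		freed = ctx.dev_freed;
		fixed_destruct(&ctx);
	} else {
		fixed_destruct(&ctx);
		model_device_down(&ctx);
		freed = ctx.dev_freed;
	}
	return freed;
}
```

In this model, old_flow_freed_when_down_returns() reports that the
resources are still held when device-down returns, while
fixed_flow_freed_when_down_returns() reports them freed for both
orderings.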
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
net/tls/tls_device.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index ce827e79c66a..879b9024678e 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -97,13 +97,16 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
 	unsigned long flags;

 	spin_lock_irqsave(&tls_device_lock, flags);
+	if (unlikely(!refcount_dec_and_test(&ctx->refcount)))
+		goto unlock;
+
 	list_move_tail(&ctx->list, &tls_device_gc_list);

 	/* schedule_work inside the spinlock
 	 * to make sure tls_device_down waits for that work.
 	 */
 	schedule_work(&tls_device_gc_work);
-
+unlock:
 	spin_unlock_irqrestore(&tls_device_lock, flags);
 }

@@ -194,8 +197,7 @@ void tls_device_sk_destruct(struct sock *sk)
 		clean_acked_data_disable(inet_csk(sk));
 	}

-	if (refcount_dec_and_test(&tls_ctx->refcount))
-		tls_device_queue_ctx_destruction(tls_ctx);
+	tls_device_queue_ctx_destruction(tls_ctx);
 }
 EXPORT_SYMBOL_GPL(tls_device_sk_destruct);

--
2.21.0
* Re: [PATCH net] net/tls: Fix race in TLS device down flow
2022-07-15 8:42 [PATCH net] net/tls: Fix race in TLS device down flow Tariq Toukan
@ 2022-07-15 23:38 ` Jakub Kicinski
2022-07-18 10:50 ` patchwork-bot+netdevbpf
From: Jakub Kicinski @ 2022-07-15 23:38 UTC
To: Tariq Toukan
Cc: Boris Pismenny, John Fastabend, David S. Miller, Eric Dumazet,
Paolo Abeni, netdev, Saeed Mahameed, Gal Pressman,
Maxim Mikityanskiy
On Fri, 15 Jul 2022 11:42:16 +0300 Tariq Toukan wrote:
> Socket destruction flow and tls_device_down function sync against each
> other using tls_device_lock and the context refcount, to guarantee the
> device resources are freed via tls_dev_del() by the end of
> tls_device_down.
>
> In the following unfortunate flow, this won't happen:
> - refcount is decreased to zero in tls_device_sk_destruct.
> - tls_device_down starts, skips the context as refcount is zero, going
> all the way until it flushes the gc work, and returns without freeing
> the device resources.
> - only then, tls_device_queue_ctx_destruction is called, queues the gc
> work and frees the context's device resources.
>
> Solve it by decreasing the refcount in the socket's destruction flow
> under the tls_device_lock, for perfect synchronization. This does not
> slow down the common likely destructor flow, in which both the refcount
> is decreased and the spinlock is acquired, anyway.
>
> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Oh, so it was already racy? Sad this has missed the PR, another delay
for your -next patches :S
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
* Re: [PATCH net] net/tls: Fix race in TLS device down flow
2022-07-15 8:42 [PATCH net] net/tls: Fix race in TLS device down flow Tariq Toukan
2022-07-15 23:38 ` Jakub Kicinski
@ 2022-07-18 10:50 ` patchwork-bot+netdevbpf
From: patchwork-bot+netdevbpf @ 2022-07-18 10:50 UTC
To: Tariq Toukan
Cc: borisp, john.fastabend, kuba, davem, edumazet, pabeni, netdev,
saeedm, gal, maximmi
Hello:
This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:
On Fri, 15 Jul 2022 11:42:16 +0300 you wrote:
> Socket destruction flow and tls_device_down function sync against each
> other using tls_device_lock and the context refcount, to guarantee the
> device resources are freed via tls_dev_del() by the end of
> tls_device_down.
>
> In the following unfortunate flow, this won't happen:
> - refcount is decreased to zero in tls_device_sk_destruct.
> - tls_device_down starts, skips the context as refcount is zero, going
> all the way until it flushes the gc work, and returns without freeing
> the device resources.
> - only then, tls_device_queue_ctx_destruction is called, queues the gc
> work and frees the context's device resources.
>
> [...]
Here is the summary with links:
- [net] net/tls: Fix race in TLS device down flow
https://git.kernel.org/netdev/net/c/f08d8c1bb97c
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html