* [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
@ 2024-04-23 12:56 Eric Dumazet
2024-04-25 18:46 ` Jakub Kicinski
2024-04-25 19:30 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2024-04-23 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Soheil Hassas Yeganeh, Neal Cardwell, netdev, eric.dumazet,
Eric Dumazet
While testing TCP performance with latest trees,
I saw suspicious SOCKET_BACKLOG drops.

tcp_add_backlog() computes its limit with:

    limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
            (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
    limit += 64 * 1024;
This does not take into account that sk->sk_backlog.len
is reset only at the very end of __release_sock().
Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
sk_rcvbuf in normal conditions.
We should double sk->sk_rcvbuf contribution in the formula
to absorb bubbles in the backlog, which happen more often
for very fast flows.
This change maintains decent protection against abuses.
Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_ipv4.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 88c83ac4212957f19efad0f967952d2502bdbc7f..e06f0cd04f7eee2b00fcaebe17cbd23c26f1d28f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1995,7 +1995,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
enum skb_drop_reason *reason)
{
- u32 limit, tail_gso_size, tail_gso_segs;
+ u32 tail_gso_size, tail_gso_segs;
struct skb_shared_info *shinfo;
const struct tcphdr *th;
struct tcphdr *thtail;
@@ -2004,6 +2004,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
bool fragstolen;
u32 gso_segs;
u32 gso_size;
+ u64 limit;
int delta;
/* In case all data was pulled from skb frags (in __pskb_pull_tail()),
@@ -2099,7 +2100,13 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
__skb_push(skb, hdrlen);
no_coalesce:
- limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
+ /* sk->sk_backlog.len is reset only at the end of __release_sock().
+ * Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
+ * sk_rcvbuf in normal conditions.
+ */
+ limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+
+ limit += ((u32)READ_ONCE(sk->sk_sndbuf)) >> 1;
/* Only socket owner can try to collapse/prune rx queues
* to reduce memory overhead, so add a little headroom here.
@@ -2107,6 +2114,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
*/
limit += 64 * 1024;
+ limit = min_t(u64, limit, UINT_MAX);
+
if (unlikely(sk_add_backlog(sk, skb, limit))) {
bh_unlock_sock(sk);
*reason = SKB_DROP_REASON_SOCKET_BACKLOG;
--
2.44.0.769.g3c40516874-goog
* Re: [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
2024-04-23 12:56 [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog() Eric Dumazet
@ 2024-04-25 18:46 ` Jakub Kicinski
2024-04-25 19:02 ` Eric Dumazet
2024-04-25 19:30 ` patchwork-bot+netdevbpf
1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2024-04-25 18:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, Soheil Hassas Yeganeh,
Neal Cardwell, netdev, eric.dumazet
On Tue, 23 Apr 2024 12:56:20 +0000 Eric Dumazet wrote:
> Subject: [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
This is intentionally for net-next?
* Re: [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
2024-04-25 18:46 ` Jakub Kicinski
@ 2024-04-25 19:02 ` Eric Dumazet
0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2024-04-25 19:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, Soheil Hassas Yeganeh,
Neal Cardwell, netdev, eric.dumazet
On Thu, Apr 25, 2024 at 8:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 23 Apr 2024 12:56:20 +0000 Eric Dumazet wrote:
> > Subject: [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
>
> This is intentionally for net-next?
Yes, this has been broken for a long time.

We can soak this a bit, then it will reach stable trees when this
reaches Linus' tree?
* Re: [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog()
2024-04-23 12:56 [PATCH net-next] tcp: avoid premature drops in tcp_add_backlog() Eric Dumazet
2024-04-25 18:46 ` Jakub Kicinski
@ 2024-04-25 19:30 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-25 19:30 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, kuba, pabeni, soheil, ncardwell, netdev, eric.dumazet
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 23 Apr 2024 12:56:20 +0000 you wrote:
> While testing TCP performance with latest trees,
> I saw suspicious SOCKET_BACKLOG drops.
>
> tcp_add_backlog() computes its limit with:
>
> limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
> (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
> limit += 64 * 1024;
>
> [...]
Here is the summary with links:
- [net-next] tcp: avoid premature drops in tcp_add_backlog()
https://git.kernel.org/netdev/net-next/c/ec00ed472bdb
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html