[PATCH bpf 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set

From: John Fastabend <john.fastabend@gmail.com>
To: daniel@iogearbox.net, ast@kernel.org, andrii@kernel.org,
	jakub@cloudflare.com
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH bpf 2/3] bpf: sockmap, do not inc copied_seq when PEEK flag set
Date: Wed, 20 Sep 2023 16:27:05 -0700
Message-ID: <20230920232706.498747-3-john.fastabend@gmail.com>
In-Reply-To: <20230920232706.498747-1-john.fastabend@gmail.com>

When data is peek'd off the receive queue we shouldn't consider it
copied from the tcp_sock side. Incrementing copied_seq on a peek confuses
tcp_data_ready() because copied_seq can be arbitrarily increased. From the
application side this results in poll() operations not waking up when
expected.

Note that the TCP stack without BPF recvmsg programs also does not
increment copied_seq when MSG_PEEK is set.

We broke this when we moved the copied_seq update into recvmsg so that it
only happens when an actual copy takes place. But it was not working
correctly before that either, because tcp_data_ready() tried to use the
copied_seq value to see whether data had been read by the user yet. See
the Fixes tags.

Fixes: e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq")
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..327268203001 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -222,6 +222,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 				  int *addr_len)
 {
 	struct tcp_sock *tcp = tcp_sk(sk);
+	int peek = flags & MSG_PEEK;
 	u32 seq = tcp->copied_seq;
 	struct sk_psock *psock;
 	int copied = 0;
@@ -311,7 +312,8 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 		copied = -EAGAIN;
 	}
 out:
-	WRITE_ONCE(tcp->copied_seq, seq);
+	if (!peek)
+		WRITE_ONCE(tcp->copied_seq, seq);
 	tcp_rcv_space_adjust(sk);
 	if (copied > 0)
 		__tcp_cleanup_rbuf(sk, copied);
-- 
2.33.0
