linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket
@ 2025-05-21 11:44 Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:44 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

v2 changes:
  - Split out the preload operation to a separate routine from
    tcp_sendmsg_locked() and restrict from looping over the supplied
    iovec

------
Support write to a listen TCP socket, for immediate
transmission on all later passive connection establishments
parented by the listen socket.

On a normal connection transmission of the data is triggered by the receipt
of the 3rd-ack. On a fastopen (with accepted cookie) connection the data
is sent in the synack packet.

The data preload is done using a sendmsg with a newly-defined flag
(MSG_PRELOAD); the amount of data limited to a single linear sk_buff.
Note that this definition is the last-but-two bit available if "int"
is 32 bits.

Intent: lower latency for server-first protocols using TCP.
  Known cases of this use are SMTP and MySQL.

  Measurements:
    Packet capture (laptop, loopback, TFO requeste) for initial SYN to first
    client data packet (5 samples):

    - baseline   TFO-C      1064 1470 1455 1547 1595  usec
    - patched    non-TFO     140  150  159  144  153  usec
    - patched    TFO-C       142  149  149  125  125  usec

  Out of scope:
  - Client-first protocols
  - TLS-on-connect

Testing:

A) packetdrill scripts for
   - normal non-TFO
   - normal TFO
   - synack lost
   - 3rd-ack acks only the SYN
   - 3rd-ack acks partial data
     (NB: packetdrill can only check the data size, not actual content)

B) Application use, running the application testsuite
   and manual check of specific cases via packet capture

C) Daily-driver laptop use (not expected to trigger the feature;
   only regression-test)

D) KASAN/syzkaller
   - enable_syscalls "socket$inet_tcp", "listen", "sendmsg", "accept",
      "read", "write", "close", "syz_emit_ethernet", "syz_extract_tcp_res"
   - the coverage seems rather limited; the sendmsg onto a listen socket
     is there, but I am not convinced actual TCP connections are being
     excercised.  tcp_minisocks.c is entirely uncovered.
   - A need for limiting iteration in the above sendmesg was found (RCU
     timeouts), hence v2, but no hint of locking problems.
     Eric: could you expand on your previous comment?  If it referred to
     the listening socket, tcp_sendmsg_locked() is called with the sk
     locked.

Jeremy Harris (6):
  tcp: support writing to a socket in listening state
  tcp: copy write-data from listen socket to accept child socket
  tcp: fastopen: add write-data to fastopen synack packet
  tcp: transmit any pending data on receipt of 3rd-ack
  tcp: fastopen: retransmit data when only the SYN of a synack-with-data
    is acked
  tcp: fastopen: extend retransmit-queue trimming to handle linear
    sk_buff

 include/linux/socket.h                        |   1 +
 net/ipv4/tcp.c                                | 115 ++++++++++++++++++
 net/ipv4/tcp_fastopen.c                       |   3 +-
 net/ipv4/tcp_input.c                          |  15 ++-
 net/ipv4/tcp_ipv4.c                           |   4 +-
 net/ipv4/tcp_minisocks.c                      |  58 ++++++++-
 net/ipv4/tcp_output.c                         |  50 +++++++-
 .../perf/trace/beauty/include/linux/socket.h  |   1 +
 tools/perf/trace/beauty/msg_flags.c           |   3 +
 9 files changed, 237 insertions(+), 13 deletions(-)


base-commit: f685204c57e87d2a88b159c7525426d70ee745c9
-- 
2.49.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 1/6]     tcp: support writing to a socket in listening state
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  2025-05-22  7:36   ` kernel test robot
  2025-05-21 11:45 ` [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

    In the tcp sendmsg handler, permit a write in LISTENING state if
    a MSG_PRELOAD flag is used.  Copy from iovec to a linear sk_buff
    for placement on the socket write queue.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 include/linux/socket.h                        |   1 +
 net/ipv4/tcp.c                                | 115 ++++++++++++++++++
 .../perf/trace/beauty/include/linux/socket.h  |   1 +
 tools/perf/trace/beauty/msg_flags.c           |   3 +
 4 files changed, 120 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3b262487ec06..b41f4cd4dc97 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -330,6 +330,7 @@ struct ucred {
 #define MSG_SOCK_DEVMEM 0x2000000	/* Receive devmem skbs as cmsg */
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
 #define MSG_SPLICE_PAGES 0x8000000	/* Splice the pages from the iterator in sendmsg() */
+#define MSG_PRELOAD	0x10000000	/* Preload tx data while listening */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
 					   descriptor received through
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b7b6ab41b496..9a5daf27c980 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1057,6 +1057,118 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
 	return err;
 }
 
+/* Cut-down version of tcp_sendmsg_locked(), for writing on a listen socket
+ */
+static int tcp_sendmsg_preload(struct sock *sk, struct msghdr *msg)
+{
+	struct sk_buff *skb;
+	struct sockcm_cookie sockc;
+	int flags, err, copied = 0;
+	int size_goal;
+	int process_backlog = 0;
+	long timeo;
+
+	if (sk->sk_state != TCP_LISTEN)
+		return -EINVAL;
+
+	flags = msg->msg_flags;
+
+	sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags) };
+
+	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+
+	/* Ok commence sending. */
+restart:
+	/* Use a arbitrary "mss" value */
+	size_goal = 1000;
+
+	err = -EPIPE;
+	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
+		goto do_error;
+
+	while (msg_data_left(msg)) {
+		ssize_t copy = 0;
+
+		skb = tcp_write_queue_tail(sk);
+		if (skb)
+			copy = size_goal - skb->len;
+
+		trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
+
+		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
+			bool first_skb = !skb;
+
+			/* Limit to only one skb on the sk write queue */
+
+			if (!first_skb)
+				goto out_nopush;
+
+			if (!sk_stream_memory_free(sk))
+				goto wait_for_space;
+
+			if (unlikely(process_backlog >= 16)) {
+				process_backlog = 0;
+				if (sk_flush_backlog(sk))
+					goto restart;
+			}
+
+			skb = tcp_stream_alloc_skb(sk, sk->sk_allocation,
+						   first_skb);
+			if (!skb)
+				goto wait_for_space;
+
+			process_backlog++;
+
+#ifdef CONFIG_SKB_DECRYPTED
+			skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
+#endif
+			tcp_skb_entail(sk, skb);
+			copy = size_goal;
+		}
+
+		/* Try to append data to the end of skb. */
+		if (copy > msg_data_left(msg))
+			copy = msg_data_left(msg);
+
+		copy = min_t(int, copy, skb_tailroom(skb));
+		err = skb_add_data_nocache(sk, skb, &msg->msg_iter, copy);
+		if (err)
+			goto do_error;
+
+		TCP_SKB_CB(skb)->end_seq += copy;
+		tcp_skb_pcount_set(skb, 0);
+
+		copied += copy;
+		goto out_nopush;
+
+wait_for_space:
+		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+		tcp_remove_empty_skb(sk);
+
+		err = sk_stream_wait_memory(sk, &timeo);
+		if (err != 0)
+			goto do_error;
+	}
+
+out_nopush:
+	return copied;
+
+do_error:
+	tcp_remove_empty_skb(sk);
+
+	if (copied)
+		goto out_nopush;
+
+	err = sk_stream_error(sk, flags, err);
+	/* make sure we wake any epoll edge trigger waiter */
+	if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
+		sk->sk_write_space(sk);
+		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+	}
+
+	return err;
+}
+
 int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	struct net_devmem_dmabuf_binding *binding = NULL;
@@ -1132,6 +1244,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out_err;
 	}
 
+	if (unlikely(flags & MSG_PRELOAD))
+		return tcp_sendmsg_preload(sk, msg);
+
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
 
 	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
diff --git a/tools/perf/trace/beauty/include/linux/socket.h b/tools/perf/trace/beauty/include/linux/socket.h
index c3322eb3d686..e9ea498169f3 100644
--- a/tools/perf/trace/beauty/include/linux/socket.h
+++ b/tools/perf/trace/beauty/include/linux/socket.h
@@ -330,6 +330,7 @@ struct ucred {
 #define MSG_SOCK_DEVMEM 0x2000000	/* Receive devmem skbs as cmsg */
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
 #define MSG_SPLICE_PAGES 0x8000000	/* Splice the pages from the iterator in sendmsg() */
+#define MSG_PRELOAD	0x10000000	/* Preload tx data while listening */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
 					   descriptor received through
diff --git a/tools/perf/trace/beauty/msg_flags.c b/tools/perf/trace/beauty/msg_flags.c
index 2da581ff0c80..27e40da9b02d 100644
--- a/tools/perf/trace/beauty/msg_flags.c
+++ b/tools/perf/trace/beauty/msg_flags.c
@@ -20,6 +20,9 @@
 #ifndef MSG_SPLICE_PAGES
 #define MSG_SPLICE_PAGES	0x8000000
 #endif
+#ifndef MSG_PRELOAD
+#define MSG_PRELOAD		0x10000000
+#endif
 #ifndef MSG_FASTOPEN
 #define MSG_FASTOPEN		0x20000000
 #endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

Set the request_sock flag for fastopen earlier, making it available
to the af_ops SYN-handler function.

In that function copy data from the listen socket write queue into an
sk_buff, allocating if needed and adding to the write queue of the
newly-created child socket.
Set sequence number values depending on the fastopen status.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 net/ipv4/tcp_fastopen.c  |  3 ++-
 net/ipv4/tcp_ipv4.c      |  4 +--
 net/ipv4/tcp_minisocks.c | 58 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 9b83d639b5ac..03a86d0b87ba 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -245,6 +245,8 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 	struct sock *child;
 	bool own_req;
 
+	tcp_rsk(req)->tfo_listener = true;
+
 	child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
 							 NULL, &own_req);
 	if (!child)
@@ -261,7 +263,6 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 	tp = tcp_sk(child);
 
 	rcu_assign_pointer(tp->fastopen_rsk, req);
-	tcp_rsk(req)->tfo_listener = true;
 
 	/* RFC1323: The window in SYN & SYN/ACK segments is never
 	 * scaled. So correct it appropriately.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..e488effdbdb2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1747,8 +1747,8 @@ EXPORT_IPV6_MOD(tcp_v4_conn_request);
 
 
 /*
- * The three way handshake has completed - we got a valid synack -
- * now create the new socket.
+ * The three way handshake has completed - we got a valid synack
+ * (or a FASTOPEN syn) - now create the new socket.
  */
 struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 				  struct request_sock *req,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 43d7852ce07e..d471531b4a78 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -529,7 +529,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	struct inet_connection_sock *newicsk;
 	const struct tcp_sock *oldtp;
 	struct tcp_sock *newtp;
-	u32 seq;
+	u32 seq, a_seq, n_seq;
 
 	if (!newsk)
 		return NULL;
@@ -550,9 +550,55 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->segs_in = 1;
 
 	seq = treq->snt_isn + 1;
-	newtp->snd_sml = newtp->snd_una = seq;
-	WRITE_ONCE(newtp->snd_nxt, seq);
-	newtp->snd_up = seq;
+	n_seq = seq;
+	a_seq = seq;
+	newtp->write_seq = seq;
+	newtp->snd_una = seq;
+
+	/* If there is write-data sitting on the listen socket, copy it to
+	 * the accept socket. If FASTOPEN we will send it on the synack,
+	 * otherwise it sits there until 3rd-ack arrives.
+	 */
+
+	if (unlikely(!skb_queue_empty(&sk->sk_write_queue))) {
+		struct sk_buff *l_skb = tcp_send_head(sk),
+				*a_skb = tcp_write_queue_tail(newsk);
+		ssize_t copy = 0;
+
+		if (a_skb)
+			copy = l_skb->len - a_skb->len;
+
+		if (copy <= 0 || !tcp_skb_can_collapse_to(a_skb)) {
+			bool first_skb = tcp_rtx_and_write_queues_empty(newsk);
+
+			a_skb = tcp_stream_alloc_skb(newsk,
+						     newsk->sk_allocation,
+						     first_skb);
+			if (!a_skb) {
+				/* is this the correct free? */
+				bh_unlock_sock(newsk);
+				sk_free(newsk);
+				return NULL;
+			}
+
+			tcp_skb_entail(newsk, a_skb);
+		}
+		copy = min_t(int, l_skb->len, skb_tailroom(a_skb));
+		skb_put_data(a_skb, l_skb->data, copy);
+
+		TCP_SKB_CB(a_skb)->end_seq += copy;
+
+		a_seq += l_skb->len;
+
+		if (treq->tfo_listener)
+			seq = a_seq;
+
+		/* assumes only one skb on the listen write queue */
+	}
+
+	newtp->snd_sml = seq;
+	WRITE_ONCE(newtp->snd_nxt, a_seq);
+	newtp->snd_up = n_seq;
 
 	INIT_LIST_HEAD(&newtp->tsq_node);
 	INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
@@ -567,7 +613,9 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->total_retrans = req->num_retrans;
 
 	tcp_init_xmit_timers(newsk);
-	WRITE_ONCE(newtp->write_seq, newtp->pushed_seq = treq->snt_isn + 1);
+
+	newtp->pushed_seq = n_seq;
+	WRITE_ONCE(newtp->write_seq, a_seq);
 
 	if (sock_flag(newsk, SOCK_KEEPOPEN))
 		tcp_reset_keepalive_timer(newsk, keepalive_time_when(newtp));
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

While building the synack packet, for a fastopen socket
copy data from write queue to the packet.
Move the data from write queue to retransmit queue.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 net/ipv4/tcp_output.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3ac8d2d17e1f..c50553c1c795 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3702,7 +3702,7 @@ int tcp_send_synack(struct sock *sk)
 
 /**
  * tcp_make_synack - Allocate one skb and build a SYNACK packet.
- * @sk: listener socket
+ * @sk: listener socket (or child socket for fastopen)
  * @dst: dst entry attached to the SYNACK. It is consumed and caller
  *       should not use it again.
  * @req: request_sock pointer
@@ -3719,6 +3719,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	struct inet_request_sock *ireq = inet_rsk(req);
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_out_options opts;
+	struct sock *fastopen_sk = (struct sock *)sk;
 	struct tcp_key key = {};
 	struct sk_buff *skb;
 	int tcp_header_size;
@@ -3748,7 +3749,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 		 * cpu might call us concurrently.
 		 * sk->sk_wmem_alloc in an atomic, we can promote to rw.
 		 */
-		skb_set_owner_w(skb, (struct sock *)sk);
+		skb_set_owner_w(skb, fastopen_sk);
 		break;
 	}
 	skb_dst_set(skb, dst);
@@ -3831,6 +3832,33 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	th->window = htons(min(req->rsk_rcv_wnd, 65535U));
 	tcp_options_write(th, NULL, tcp_rsk(req), &opts, &key);
 	th->doff = (tcp_header_size >> 2);
+
+	/* If this is a FASTOPEN, and there is write-data on the accept socket,
+	 * re-copy it to the synack segment. If not FASTOPEN. any data waits
+	 * until 3rd-ack arrival.
+	 */
+
+	if (synack_type == TCP_SYNACK_FASTOPEN &&
+	    !skb_queue_empty(&sk->sk_write_queue)) {
+		struct sk_buff *a_skb = tcp_write_queue_tail(sk);
+		int copy = min_t(int, a_skb->len, skb_tailroom(skb));
+
+		skb_put_data(skb, a_skb->data, copy);
+		TCP_SKB_CB(skb)->end_seq += copy;
+
+		tcp_skb_pcount_set(a_skb, 1);
+		WRITE_ONCE(tcp_sk(fastopen_sk)->write_seq,
+			   TCP_SKB_CB(a_skb)->end_seq);
+
+		skb_set_delivery_time(a_skb, now, SKB_CLOCK_MONOTONIC);
+
+		/* Move the data to the retransmit queue.
+		 * Code elsewhere implies this is a full child socket and
+		 * can be treated as writeable - permitting the cast.
+		 */
+		tcp_event_new_data_sent(fastopen_sk, a_skb);
+	}
+
 	TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
 
 	/* Okay, we have all we need - do the md5 hash if needed */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
                   ` (2 preceding siblings ...)
  2025-05-21 11:45 ` [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris
  5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

For the non-fastopen case of prelaod, when the 3rd-ack arrives there
will be data on the write queue. Transmit it immediately
by allowing the SYN_SENT state to run the xmit-recovery code.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8ec92dec321a..345a08baaf02 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3900,7 +3900,8 @@ static void tcp_xmit_recovery(struct sock *sk, int rexmit)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (rexmit == REXMIT_NONE || sk->sk_state == TCP_SYN_SENT)
+	if ((rexmit == REXMIT_NONE && sk->sk_state != TCP_SYN_RECV) ||
+	    sk->sk_state == TCP_SYN_SENT)
 		return;
 
 	if (unlikely(rexmit == REXMIT_NEW)) {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
                   ` (3 preceding siblings ...)
  2025-05-21 11:45 ` [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  2025-05-21 11:45 ` [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris
  5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

A corner-case for the 3rd-ack after a data-on-synack is for only
the SYN to be acked. Handle this by, in ack processing, when in
SYN_RECV state (the state is not yet updated to ESTABLISHED)
marking the retransmit-queue sk_buff as having been lost.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 net/ipv4/tcp_input.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 345a08baaf02..a53021edddd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4069,6 +4069,18 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 				      &rexmit);
 	}
 
+	/* On receiving a 3rd-ack, if we never sent a packet via
+	 * the normal means (which counts them), yet there is data
+	 * remaining for retransmit, it was data-on-synack not acked;
+	 * mark the skb for retransmission.
+	 */
+	if (sk->sk_state == TCP_SYN_RECV && tp->segs_out == 0) {
+		struct sk_buff *skb = tcp_rtx_queue_head(sk);
+
+		if (skb)
+			tcp_mark_skb_lost(sk, skb);
+	}
+
 	/* If needed, reset TLP/RTO timer when RACK doesn't set. */
 	if (flag & FLAG_SET_XMIT_TIMER)
 		tcp_set_xmit_timer(sk);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff
  2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
                   ` (4 preceding siblings ...)
  2025-05-21 11:45 ` [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
  5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

A corner-case for the 3rd-ack after a data-on-synack is for
some but not all of the data to be acked. Support this by
adding to the retransmit-queue trim routine to handle a
linear sk_buff.

Signed-off-by: Jeremy Harris <jgh@exim.org>
---
 net/ipv4/tcp_output.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c50553c1c795..bff5934ff04b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1708,8 +1708,22 @@ static int __pskb_trim_head(struct sk_buff *skb, int len)
 	struct skb_shared_info *shinfo;
 	int i, k, eat;
 
-	DEBUG_NET_WARN_ON_ONCE(skb_headlen(skb));
-	eat = len;
+	eat = skb_headlen(skb);
+	if (unlikely(eat)) {
+		if (len < eat)
+			eat = len;
+		skb->head += eat;
+		skb->len -= eat;
+		if (skb->data_len)
+			skb->data_len -= eat;
+
+		eat = len - eat;
+		if (eat == 0)
+			return len;
+	} else {
+		eat = len;
+	}
+
 	k = 0;
 	shinfo = skb_shinfo(skb);
 	for (i = 0; i < shinfo->nr_frags; i++) {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2 1/6]     tcp: support writing to a socket in listening state
  2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
@ 2025-05-22  7:36   ` kernel test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-05-22  7:36 UTC (permalink / raw)
  To: Jeremy Harris, netdev
  Cc: llvm, oe-kbuild-all, linux-api, edumazet, ncardwell,
	Jeremy Harris

Hi Jeremy,

kernel test robot noticed the following build warnings:

[auto build test WARNING on f685204c57e87d2a88b159c7525426d70ee745c9]

url:    https://github.com/intel-lab-lkp/linux/commits/Jeremy-Harris/tcp-support-writing-to-a-socket-in-listening-state/20250521-195234
base:   f685204c57e87d2a88b159c7525426d70ee745c9
patch link:    https://lore.kernel.org/r/d3f47c9b5b08237b6e76f7b0739d59089683c86e.1747826775.git.jgh%40exim.org
patch subject: [PATCH net-next v2 1/6]     tcp: support writing to a socket in listening state
config: i386-buildonly-randconfig-001-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221529.hEVx1YPV-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221529.hEVx1YPV-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505221529.hEVx1YPV-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp.c:1065:23: warning: variable 'sockc' set but not used [-Wunused-but-set-variable]
    1065 |         struct sockcm_cookie sockc;
         |                              ^
   1 warning generated.


vim +/sockc +1065 net/ipv4/tcp.c

  1059	
  1060	/* Cut-down version of tcp_sendmsg_locked(), for writing on a listen socket
  1061	 */
  1062	static int tcp_sendmsg_preload(struct sock *sk, struct msghdr *msg)
  1063	{
  1064		struct sk_buff *skb;
> 1065		struct sockcm_cookie sockc;
  1066		int flags, err, copied = 0;
  1067		int size_goal;
  1068		int process_backlog = 0;
  1069		long timeo;
  1070	
  1071		if (sk->sk_state != TCP_LISTEN)
  1072			return -EINVAL;
  1073	
  1074		flags = msg->msg_flags;
  1075	
  1076		sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags) };
  1077	
  1078		timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
  1079	
  1080		/* Ok commence sending. */
  1081	restart:
  1082		/* Use a arbitrary "mss" value */
  1083		size_goal = 1000;
  1084	
  1085		err = -EPIPE;
  1086		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
  1087			goto do_error;
  1088	
  1089		while (msg_data_left(msg)) {
  1090			ssize_t copy = 0;
  1091	
  1092			skb = tcp_write_queue_tail(sk);
  1093			if (skb)
  1094				copy = size_goal - skb->len;
  1095	
  1096			trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
  1097	
  1098			if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
  1099				bool first_skb = !skb;
  1100	
  1101				/* Limit to only one skb on the sk write queue */
  1102	
  1103				if (!first_skb)
  1104					goto out_nopush;
  1105	
  1106				if (!sk_stream_memory_free(sk))
  1107					goto wait_for_space;
  1108	
  1109				if (unlikely(process_backlog >= 16)) {
  1110					process_backlog = 0;
  1111					if (sk_flush_backlog(sk))
  1112						goto restart;
  1113				}
  1114	
  1115				skb = tcp_stream_alloc_skb(sk, sk->sk_allocation,
  1116							   first_skb);
  1117				if (!skb)
  1118					goto wait_for_space;
  1119	
  1120				process_backlog++;
  1121	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-22  7:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
2025-05-22  7:36   ` kernel test robot
2025-05-21 11:45 ` [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).