* [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket
@ 2025-05-21 11:44 Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:44 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
v2 changes:
- Split out the preload operation to a separate routine from
tcp_sendmsg_locked() and restrict from looping over the supplied
iovec
------
Support writing to a listening TCP socket, for immediate
transmission on every passive connection later established
from that listen socket.
On a normal connection, transmission of the data is triggered by receipt
of the 3rd-ack. On a fastopen connection (with an accepted cookie), the data
is sent in the synack packet.
The data preload is done using a sendmsg with a newly-defined flag
(MSG_PRELOAD); the amount of data is limited to a single linear sk_buff.
Note that this definition is the last-but-two bit available if "int"
is 32 bits.
Intent: lower latency for server-first protocols using TCP.
Known cases of this use are SMTP and MySQL.
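A minimal userspace sketch of the intended usage, assuming a kernel with
this series applied. MSG_PRELOAD is defined locally with the value from
patch 1/6, since no released header carries it; the helper name
preload_banner() is hypothetical. MSG_NOSIGNAL is added here only so that
a kernel without the series returns an error rather than raising SIGPIPE.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef MSG_PRELOAD
#define MSG_PRELOAD 0x10000000	/* value proposed in patch 1/6 */
#endif

/* Hypothetical helper: queue a server greeting on a listening socket so
 * that it is sent on each later passive connection.
 * Returns the number of bytes queued, or -1 with errno set.
 */
static ssize_t preload_banner(int lfd, const char *banner)
{
	struct iovec iov = {
		.iov_base = (void *)banner,
		.iov_len = strlen(banner),
	};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	/* Keep the payload small: the series limits the preload to a
	 * single linear sk_buff.
	 */
	return sendmsg(lfd, &msg, MSG_PRELOAD | MSG_NOSIGNAL);
}
```

A server would call this once after listen() and before accept(), for
example with an SMTP greeting line, and fall back to writing the greeting
on each accepted socket if the call fails.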
Measurements:
Packet capture (laptop, loopback, TFO requested) for initial SYN to first
client data packet (5 samples):
- baseline TFO-C 1064 1470 1455 1547 1595 usec
- patched non-TFO 140 150 159 144 153 usec
- patched TFO-C 142 149 149 125 125 usec
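For a quick comparison, the per-case means of the samples above can be
computed with a short sketch like the following. The array and function
names are illustrative only; the numbers are copied from the table.

```c
#include <stddef.h>

/* Samples copied from the measurements above, in microseconds. */
static const int baseline_tfo_c[] = { 1064, 1470, 1455, 1547, 1595 };
static const int patched_non_tfo[] = { 140, 150, 159, 144, 153 };
static const int patched_tfo_c[] = { 142, 149, 149, 125, 125 };

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double mean_usec(const int *samples, size_t n)
{
	long sum = 0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += samples[i];
	return (double)sum / (double)n;
}
```

The means work out to 1426.2 usec (baseline TFO-C), 149.2 usec (patched
non-TFO) and 138.0 usec (patched TFO-C), roughly a tenfold reduction on
this loopback setup.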
Out of scope:
- Client-first protocols
- TLS-on-connect
Testing:
A) packetdrill scripts for
- normal non-TFO
- normal TFO
- synack lost
- 3rd-ack acks only the SYN
- 3rd-ack acks partial data
(NB: packetdrill can only check the data size, not actual content)
B) Application use, running the application testsuite
and manual check of specific cases via packet capture
C) Daily-driver laptop use (not expected to trigger the feature;
only regression-test)
D) KASAN/syzkaller
- enable_syscalls "socket$inet_tcp", "listen", "sendmsg", "accept",
"read", "write", "close", "syz_emit_ethernet", "syz_extract_tcp_res"
- the coverage seems rather limited; the sendmsg onto a listen socket
is there, but I am not convinced actual TCP connections are being
exercised. tcp_minisocks.c is entirely uncovered.
- A need for limiting iteration in the above sendmsg was found (RCU
timeouts), hence v2, but no hint of locking problems.
Eric: could you expand on your previous comment? If it referred to
the listening socket, tcp_sendmsg_locked() is called with the sk
locked.
Jeremy Harris (6):
tcp: support writing to a socket in listening state
tcp: copy write-data from listen socket to accept child socket
tcp: fastopen: add write-data to fastopen synack packet
tcp: transmit any pending data on receipt of 3rd-ack
tcp: fastopen: retransmit data when only the SYN of a synack-with-data
is acked
tcp: fastopen: extend retransmit-queue trimming to handle linear
sk_buff
include/linux/socket.h | 1 +
net/ipv4/tcp.c | 115 ++++++++++++++++++
net/ipv4/tcp_fastopen.c | 3 +-
net/ipv4/tcp_input.c | 15 ++-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_minisocks.c | 58 ++++++++-
net/ipv4/tcp_output.c | 50 +++++++-
.../perf/trace/beauty/include/linux/socket.h | 1 +
tools/perf/trace/beauty/msg_flags.c | 3 +
9 files changed, 237 insertions(+), 13 deletions(-)
base-commit: f685204c57e87d2a88b159c7525426d70ee745c9
--
2.49.0
* [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
2025-05-22 7:36 ` kernel test robot
2025-05-21 11:45 ` [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
` (4 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
In the tcp sendmsg handler, permit a write in LISTENING state if
a MSG_PRELOAD flag is used. Copy from iovec to a linear sk_buff
for placement on the socket write queue.
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
include/linux/socket.h | 1 +
net/ipv4/tcp.c | 115 ++++++++++++++++++
.../perf/trace/beauty/include/linux/socket.h | 1 +
tools/perf/trace/beauty/msg_flags.c | 3 +
4 files changed, 120 insertions(+)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3b262487ec06..b41f4cd4dc97 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -330,6 +330,7 @@ struct ucred {
#define MSG_SOCK_DEVMEM 0x2000000 /* Receive devmem skbs as cmsg */
#define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */
#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */
+#define MSG_PRELOAD 0x10000000 /* Preload tx data while listening */
#define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
#define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file
descriptor received through
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b7b6ab41b496..9a5daf27c980 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1057,6 +1057,118 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
return err;
}
+/* Cut-down version of tcp_sendmsg_locked(), for writing on a listen socket
+ */
+static int tcp_sendmsg_preload(struct sock *sk, struct msghdr *msg)
+{
+ struct sk_buff *skb;
+ struct sockcm_cookie sockc;
+ int flags, err, copied = 0;
+ int size_goal;
+ int process_backlog = 0;
+ long timeo;
+
+ if (sk->sk_state != TCP_LISTEN)
+ return -EINVAL;
+
+ flags = msg->msg_flags;
+
+ sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags) };
+
+ timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+
+ /* Ok commence sending. */
+restart:
+ /* Use an arbitrary "mss" value */
+ size_goal = 1000;
+
+ err = -EPIPE;
+ if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
+ goto do_error;
+
+ while (msg_data_left(msg)) {
+ ssize_t copy = 0;
+
+ skb = tcp_write_queue_tail(sk);
+ if (skb)
+ copy = size_goal - skb->len;
+
+ trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
+
+ if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
+ bool first_skb = !skb;
+
+ /* Limit to only one skb on the sk write queue */
+
+ if (!first_skb)
+ goto out_nopush;
+
+ if (!sk_stream_memory_free(sk))
+ goto wait_for_space;
+
+ if (unlikely(process_backlog >= 16)) {
+ process_backlog = 0;
+ if (sk_flush_backlog(sk))
+ goto restart;
+ }
+
+ skb = tcp_stream_alloc_skb(sk, sk->sk_allocation,
+ first_skb);
+ if (!skb)
+ goto wait_for_space;
+
+ process_backlog++;
+
+#ifdef CONFIG_SKB_DECRYPTED
+ skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
+#endif
+ tcp_skb_entail(sk, skb);
+ copy = size_goal;
+ }
+
+ /* Try to append data to the end of skb. */
+ if (copy > msg_data_left(msg))
+ copy = msg_data_left(msg);
+
+ copy = min_t(int, copy, skb_tailroom(skb));
+ err = skb_add_data_nocache(sk, skb, &msg->msg_iter, copy);
+ if (err)
+ goto do_error;
+
+ TCP_SKB_CB(skb)->end_seq += copy;
+ tcp_skb_pcount_set(skb, 0);
+
+ copied += copy;
+ goto out_nopush;
+
+wait_for_space:
+ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+ tcp_remove_empty_skb(sk);
+
+ err = sk_stream_wait_memory(sk, &timeo);
+ if (err != 0)
+ goto do_error;
+ }
+
+out_nopush:
+ return copied;
+
+do_error:
+ tcp_remove_empty_skb(sk);
+
+ if (copied)
+ goto out_nopush;
+
+ err = sk_stream_error(sk, flags, err);
+ /* make sure we wake any epoll edge trigger waiter */
+ if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
+ sk->sk_write_space(sk);
+ tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+ }
+
+ return err;
+}
+
int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
{
struct net_devmem_dmabuf_binding *binding = NULL;
@@ -1132,6 +1244,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
goto out_err;
}
+ if (unlikely(flags & MSG_PRELOAD))
+ return tcp_sendmsg_preload(sk, msg);
+
timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
tcp_rate_check_app_limited(sk); /* is sending application-limited? */
diff --git a/tools/perf/trace/beauty/include/linux/socket.h b/tools/perf/trace/beauty/include/linux/socket.h
index c3322eb3d686..e9ea498169f3 100644
--- a/tools/perf/trace/beauty/include/linux/socket.h
+++ b/tools/perf/trace/beauty/include/linux/socket.h
@@ -330,6 +330,7 @@ struct ucred {
#define MSG_SOCK_DEVMEM 0x2000000 /* Receive devmem skbs as cmsg */
#define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */
#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */
+#define MSG_PRELOAD 0x10000000 /* Preload tx data while listening */
#define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
#define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file
descriptor received through
diff --git a/tools/perf/trace/beauty/msg_flags.c b/tools/perf/trace/beauty/msg_flags.c
index 2da581ff0c80..27e40da9b02d 100644
--- a/tools/perf/trace/beauty/msg_flags.c
+++ b/tools/perf/trace/beauty/msg_flags.c
@@ -20,6 +20,9 @@
#ifndef MSG_SPLICE_PAGES
#define MSG_SPLICE_PAGES 0x8000000
#endif
+#ifndef MSG_PRELOAD
+#define MSG_PRELOAD 0x10000000
+#endif
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000
#endif
--
2.49.0
* [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
Set the request_sock flag for fastopen earlier, making it available
to the af_ops SYN-handler function.
In that function copy data from the listen socket write queue into an
sk_buff, allocating if needed and adding to the write queue of the
newly-created child socket.
Set sequence number values depending on the fastopen status.
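As a reading aid, the sequence-number assignments in the
tcp_create_openreq_child() hunk of this patch can be restated as a
standalone model. This is only a sketch of the arithmetic as it appears in
the diff; struct child_seq and child_seq_init() are hypothetical names and
not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of tcp_sock send-side fields touched by the hunk. */
struct child_seq {
	uint32_t snd_una, snd_sml, snd_nxt, snd_up, pushed_seq, write_seq;
};

/* snt_isn: ISN sent in the synack; preload_len: bytes copied from the
 * listen socket (0 if none); tfo: value of treq->tfo_listener.
 */
static struct child_seq child_seq_init(uint32_t snt_isn, uint32_t preload_len,
				       bool tfo)
{
	uint32_t seq = snt_isn + 1;	/* the SYN consumes one sequence number */
	uint32_t n_seq = seq;		/* first sequence number after the SYN */
	uint32_t a_seq = seq;		/* sequence number after any preload */
	struct child_seq c;

	c.snd_una = seq;
	if (preload_len) {
		a_seq += preload_len;
		if (tfo)
			seq = a_seq;	/* data goes out on the synack */
	}
	c.snd_sml = seq;
	c.snd_nxt = a_seq;
	c.snd_up = n_seq;
	c.pushed_seq = n_seq;
	c.write_seq = a_seq;
	return c;
}
```

With no preloaded data every field equals snt_isn + 1, which matches the
behaviour before the patch.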
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
net/ipv4/tcp_fastopen.c | 3 ++-
net/ipv4/tcp_ipv4.c | 4 +--
net/ipv4/tcp_minisocks.c | 58 ++++++++++++++++++++++++++++++++++++----
3 files changed, 57 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 9b83d639b5ac..03a86d0b87ba 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -245,6 +245,8 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
struct sock *child;
bool own_req;
+ tcp_rsk(req)->tfo_listener = true;
+
child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
NULL, &own_req);
if (!child)
@@ -261,7 +263,6 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
tp = tcp_sk(child);
rcu_assign_pointer(tp->fastopen_rsk, req);
- tcp_rsk(req)->tfo_listener = true;
/* RFC1323: The window in SYN & SYN/ACK segments is never
* scaled. So correct it appropriately.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a14f9e6fef6..e488effdbdb2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1747,8 +1747,8 @@ EXPORT_IPV6_MOD(tcp_v4_conn_request);
/*
- * The three way handshake has completed - we got a valid synack -
- * now create the new socket.
+ * The three way handshake has completed - we got a valid synack
+ * (or a FASTOPEN syn) - now create the new socket.
*/
struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
struct request_sock *req,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 43d7852ce07e..d471531b4a78 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -529,7 +529,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
struct inet_connection_sock *newicsk;
const struct tcp_sock *oldtp;
struct tcp_sock *newtp;
- u32 seq;
+ u32 seq, a_seq, n_seq;
if (!newsk)
return NULL;
@@ -550,9 +550,55 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
newtp->segs_in = 1;
seq = treq->snt_isn + 1;
- newtp->snd_sml = newtp->snd_una = seq;
- WRITE_ONCE(newtp->snd_nxt, seq);
- newtp->snd_up = seq;
+ n_seq = seq;
+ a_seq = seq;
+ newtp->write_seq = seq;
+ newtp->snd_una = seq;
+
+ /* If there is write-data sitting on the listen socket, copy it to
+ * the accept socket. If FASTOPEN we will send it on the synack,
+ * otherwise it sits there until 3rd-ack arrives.
+ */
+
+ if (unlikely(!skb_queue_empty(&sk->sk_write_queue))) {
+ struct sk_buff *l_skb = tcp_send_head(sk),
+ *a_skb = tcp_write_queue_tail(newsk);
+ ssize_t copy = 0;
+
+ if (a_skb)
+ copy = l_skb->len - a_skb->len;
+
+ if (copy <= 0 || !tcp_skb_can_collapse_to(a_skb)) {
+ bool first_skb = tcp_rtx_and_write_queues_empty(newsk);
+
+ a_skb = tcp_stream_alloc_skb(newsk,
+ newsk->sk_allocation,
+ first_skb);
+ if (!a_skb) {
+ /* is this the correct free? */
+ bh_unlock_sock(newsk);
+ sk_free(newsk);
+ return NULL;
+ }
+
+ tcp_skb_entail(newsk, a_skb);
+ }
+ copy = min_t(int, l_skb->len, skb_tailroom(a_skb));
+ skb_put_data(a_skb, l_skb->data, copy);
+
+ TCP_SKB_CB(a_skb)->end_seq += copy;
+
+ a_seq += l_skb->len;
+
+ if (treq->tfo_listener)
+ seq = a_seq;
+
+ /* assumes only one skb on the listen write queue */
+ }
+
+ newtp->snd_sml = seq;
+ WRITE_ONCE(newtp->snd_nxt, a_seq);
+ newtp->snd_up = n_seq;
INIT_LIST_HEAD(&newtp->tsq_node);
INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
@@ -567,7 +613,9 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
newtp->total_retrans = req->num_retrans;
tcp_init_xmit_timers(newsk);
- WRITE_ONCE(newtp->write_seq, newtp->pushed_seq = treq->snt_isn + 1);
+
+ newtp->pushed_seq = n_seq;
+ WRITE_ONCE(newtp->write_seq, a_seq);
if (sock_flag(newsk, SOCK_KEEPOPEN))
tcp_reset_keepalive_timer(newsk, keepalive_time_when(newtp));
--
2.49.0
* [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
While building the synack packet, for a fastopen socket
copy data from write queue to the packet.
Move the data from write queue to retransmit queue.
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
net/ipv4/tcp_output.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3ac8d2d17e1f..c50553c1c795 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3702,7 +3702,7 @@ int tcp_send_synack(struct sock *sk)
/**
* tcp_make_synack - Allocate one skb and build a SYNACK packet.
- * @sk: listener socket
+ * @sk: listener socket (or child socket for fastopen)
* @dst: dst entry attached to the SYNACK. It is consumed and caller
* should not use it again.
* @req: request_sock pointer
@@ -3719,6 +3719,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
struct inet_request_sock *ireq = inet_rsk(req);
const struct tcp_sock *tp = tcp_sk(sk);
struct tcp_out_options opts;
+ struct sock *fastopen_sk = (struct sock *)sk;
struct tcp_key key = {};
struct sk_buff *skb;
int tcp_header_size;
@@ -3748,7 +3749,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
* cpu might call us concurrently.
* sk->sk_wmem_alloc in an atomic, we can promote to rw.
*/
- skb_set_owner_w(skb, (struct sock *)sk);
+ skb_set_owner_w(skb, fastopen_sk);
break;
}
skb_dst_set(skb, dst);
@@ -3831,6 +3832,33 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
th->window = htons(min(req->rsk_rcv_wnd, 65535U));
tcp_options_write(th, NULL, tcp_rsk(req), &opts, &key);
th->doff = (tcp_header_size >> 2);
+
+ /* If this is a FASTOPEN, and there is write-data on the accept socket,
+ * re-copy it to the synack segment. If not FASTOPEN, any data waits
+ * until 3rd-ack arrival.
+ */
+
+ if (synack_type == TCP_SYNACK_FASTOPEN &&
+ !skb_queue_empty(&sk->sk_write_queue)) {
+ struct sk_buff *a_skb = tcp_write_queue_tail(sk);
+ int copy = min_t(int, a_skb->len, skb_tailroom(skb));
+
+ skb_put_data(skb, a_skb->data, copy);
+ TCP_SKB_CB(skb)->end_seq += copy;
+
+ tcp_skb_pcount_set(a_skb, 1);
+ WRITE_ONCE(tcp_sk(fastopen_sk)->write_seq,
+ TCP_SKB_CB(a_skb)->end_seq);
+
+ skb_set_delivery_time(a_skb, now, SKB_CLOCK_MONOTONIC);
+
+ /* Move the data to the retransmit queue.
+ * Code elsewhere implies this is a full child socket and
+ * can be treated as writeable - permitting the cast.
+ */
+ tcp_event_new_data_sent(fastopen_sk, a_skb);
+ }
+
TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
/* Okay, we have all we need - do the md5 hash if needed */
--
2.49.0
* [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
` (2 preceding siblings ...)
2025-05-21 11:45 ` [PATCH net-next v2 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris
5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
For the non-fastopen case of preload, when the 3rd-ack arrives there
will be data on the write queue. Transmit it immediately
by allowing the SYN_RECV state to run the xmit-recovery code.
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
net/ipv4/tcp_input.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8ec92dec321a..345a08baaf02 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3900,7 +3900,8 @@ static void tcp_xmit_recovery(struct sock *sk, int rexmit)
{
struct tcp_sock *tp = tcp_sk(sk);
- if (rexmit == REXMIT_NONE || sk->sk_state == TCP_SYN_SENT)
+ if ((rexmit == REXMIT_NONE && sk->sk_state != TCP_SYN_RECV) ||
+ sk->sk_state == TCP_SYN_SENT)
return;
if (unlikely(rexmit == REXMIT_NEW)) {
--
2.49.0
* [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
` (3 preceding siblings ...)
2025-05-21 11:45 ` [PATCH net-next v2 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
2025-05-21 11:45 ` [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris
5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
A corner-case for the 3rd-ack after a data-on-synack is for only
the SYN to be acked. Handle this in ack processing: when in
SYN_RECV state (the state is not yet updated to ESTABLISHED),
mark the retransmit-queue sk_buff as lost.
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
net/ipv4/tcp_input.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 345a08baaf02..a53021edddd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4069,6 +4069,18 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
&rexmit);
}
+ /* On receiving a 3rd-ack, if we never sent a packet via
+ * the normal means (which counts them), yet there is data
+ * remaining for retransmit, it was data-on-synack not acked;
+ * mark the skb for retransmission.
+ */
+ if (sk->sk_state == TCP_SYN_RECV && tp->segs_out == 0) {
+ struct sk_buff *skb = tcp_rtx_queue_head(sk);
+
+ if (skb)
+ tcp_mark_skb_lost(sk, skb);
+ }
+
/* If needed, reset TLP/RTO timer when RACK doesn't set. */
if (flag & FLAG_SET_XMIT_TIMER)
tcp_set_xmit_timer(sk);
--
2.49.0
* [PATCH net-next v2 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff
2025-05-21 11:44 [PATCH net-next v2 0/6] tcp: support preloading data on a listening socket Jeremy Harris
` (4 preceding siblings ...)
2025-05-21 11:45 ` [PATCH net-next v2 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
@ 2025-05-21 11:45 ` Jeremy Harris
5 siblings, 0 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-05-21 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris
A corner-case for the 3rd-ack after a data-on-synack is for
some but not all of the data to be acked. Support this by
extending the retransmit-queue trim routine to handle a
linear sk_buff.
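The idea can be illustrated with a simplified userspace model: acked bytes
are removed from the linear area first, and any remainder from the paged
fragments. This is a conceptual sketch only; struct model_buf and
model_trim_head() are hypothetical and do not mirror sk_buff field
handling.

```c
#include <stddef.h>

#define MODEL_MAX_FRAGS 4

struct model_buf {
	size_t head_len;			/* bytes in the linear area */
	size_t frag_len[MODEL_MAX_FRAGS];	/* bytes in each paged fragment */
	size_t nr_frags;
};

/* Total payload bytes still held by the buffer. */
static size_t model_total(const struct model_buf *b)
{
	size_t i, sum = b->head_len;

	for (i = 0; i < b->nr_frags; i++)
		sum += b->frag_len[i];
	return sum;
}

/* Drop up to len acked bytes from the front of the buffer.
 * Returns the number of bytes actually removed.
 */
static size_t model_trim_head(struct model_buf *b, size_t len)
{
	size_t eat = len < b->head_len ? len : b->head_len;
	size_t left = len - eat;
	size_t i, k = 0;

	b->head_len -= eat;		/* linear area is consumed first */

	for (i = 0; i < b->nr_frags; i++) {
		size_t size = b->frag_len[i];

		if (size <= left) {
			left -= size;	/* fragment fully acked: drop it */
		} else {
			b->frag_len[k++] = size - left;
			left = 0;
		}
	}
	b->nr_frags = k;
	return len - left;
}
```

A partial ack of a data-on-synack corresponds to the first case here: the
buffer is purely linear and len is smaller than head_len.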
Signed-off-by: Jeremy Harris <jgh@exim.org>
---
net/ipv4/tcp_output.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c50553c1c795..bff5934ff04b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1708,8 +1708,22 @@ static int __pskb_trim_head(struct sk_buff *skb, int len)
struct skb_shared_info *shinfo;
int i, k, eat;
- DEBUG_NET_WARN_ON_ONCE(skb_headlen(skb));
- eat = len;
+ eat = skb_headlen(skb);
+ if (unlikely(eat)) {
+ if (len < eat)
+ eat = len;
+ skb->head += eat;
+ skb->len -= eat;
+ if (skb->data_len)
+ skb->data_len -= eat;
+
+ eat = len - eat;
+ if (eat == 0)
+ return len;
+ } else {
+ eat = len;
+ }
+
k = 0;
shinfo = skb_shinfo(skb);
for (i = 0; i < shinfo->nr_frags; i++) {
--
2.49.0
* Re: [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state
2025-05-21 11:45 ` [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state Jeremy Harris
@ 2025-05-22 7:36 ` kernel test robot
0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-05-22 7:36 UTC (permalink / raw)
To: Jeremy Harris, netdev
Cc: llvm, oe-kbuild-all, linux-api, edumazet, ncardwell,
Jeremy Harris
Hi Jeremy,
kernel test robot noticed the following build warnings:
[auto build test WARNING on f685204c57e87d2a88b159c7525426d70ee745c9]
url: https://github.com/intel-lab-lkp/linux/commits/Jeremy-Harris/tcp-support-writing-to-a-socket-in-listening-state/20250521-195234
base: f685204c57e87d2a88b159c7525426d70ee745c9
patch link: https://lore.kernel.org/r/d3f47c9b5b08237b6e76f7b0739d59089683c86e.1747826775.git.jgh%40exim.org
patch subject: [PATCH net-next v2 1/6] tcp: support writing to a socket in listening state
config: i386-buildonly-randconfig-001-20250522 (https://download.01.org/0day-ci/archive/20250522/202505221529.hEVx1YPV-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505221529.hEVx1YPV-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505221529.hEVx1YPV-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> net/ipv4/tcp.c:1065:23: warning: variable 'sockc' set but not used [-Wunused-but-set-variable]
1065 | struct sockcm_cookie sockc;
| ^
1 warning generated.
vim +/sockc +1065 net/ipv4/tcp.c
1059
1060 /* Cut-down version of tcp_sendmsg_locked(), for writing on a listen socket
1061 */
1062 static int tcp_sendmsg_preload(struct sock *sk, struct msghdr *msg)
1063 {
1064 struct sk_buff *skb;
> 1065 struct sockcm_cookie sockc;
1066 int flags, err, copied = 0;
1067 int size_goal;
1068 int process_backlog = 0;
1069 long timeo;
1070
1071 if (sk->sk_state != TCP_LISTEN)
1072 return -EINVAL;
1073
1074 flags = msg->msg_flags;
1075
1076 sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags) };
1077
1078 timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
1079
1080 /* Ok commence sending. */
1081 restart:
1082 /* Use an arbitrary "mss" value */
1083 size_goal = 1000;
1084
1085 err = -EPIPE;
1086 if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
1087 goto do_error;
1088
1089 while (msg_data_left(msg)) {
1090 ssize_t copy = 0;
1091
1092 skb = tcp_write_queue_tail(sk);
1093 if (skb)
1094 copy = size_goal - skb->len;
1095
1096 trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
1097
1098 if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
1099 bool first_skb = !skb;
1100
1101 /* Limit to only one skb on the sk write queue */
1102
1103 if (!first_skb)
1104 goto out_nopush;
1105
1106 if (!sk_stream_memory_free(sk))
1107 goto wait_for_space;
1108
1109 if (unlikely(process_backlog >= 16)) {
1110 process_backlog = 0;
1111 if (sk_flush_backlog(sk))
1112 goto restart;
1113 }
1114
1115 skb = tcp_stream_alloc_skb(sk, sk->sk_allocation,
1116 first_skb);
1117 if (!skb)
1118 goto wait_for_space;
1119
1120 process_backlog++;
1121
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki