linux-api.vger.kernel.org archive mirror
* [PATCH net-next v3 0/6] tcp: support preloading data on a listening socket
@ 2025-06-09 15:56 Jeremy Harris
  2025-06-09 16:05 ` [PATCH net-next v3 1/6] tcp: support writing to a socket in listening state Jeremy Harris
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Jeremy Harris @ 2025-06-09 15:56 UTC
  To: netdev; +Cc: linux-api, edumazet, ncardwell, Jeremy Harris

I didn't get any comments on v2 apart from the kernel test robot,
so I'm repeating the same responses to the v1 comments here.
I figured I should do a v3 to fix the compiler warnings the robot
pointed out.

v2 changes:
  - Split out the preload operation into a separate routine from
    tcp_sendmsg_locked() and restrict it from looping over the
    supplied iovec

v3 changes:
  - Fix compiler warnings

------
Support writing to a listening TCP socket, for immediate
transmission on all later passive connection establishments
parented by the listening socket.

On a normal connection, transmission of the data is triggered by the
receipt of the 3rd-ack. On a fastopen (with accepted cookie) connection,
the data is sent in the synack packet.

The data preload is done using a sendmsg with a newly-defined flag
(MSG_PRELOAD); the amount of data is limited to a single linear sk_buff.
Note that this definition is the last-but-two bit available if "int"
is 32 bits.
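
A minimal userspace sketch of the preload call described above, under
stated assumptions. The helper names make_listener() and preload_banner()
are hypothetical. The numeric value of MSG_PRELOAD is not given in this
cover letter, so the caller passes it in as preload_flag, taken from the
patched include/linux/socket.h. A single iovec is used because the preload
path is restricted from looping over the supplied iovec. On a kernel
without this series the sendmsg() is expected to fail.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on an ephemeral
 * loopback port.  Returns the fd, or -1 on failure. */
int make_listener(void)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: hand one small greeting to the listening socket.
 * preload_flag is an ASSUMPTION: it must be MSG_PRELOAD as defined by
 * the patched kernel headers; its value is not stated in this thread.
 * MSG_NOSIGNAL avoids SIGPIPE if the kernel rejects the write.
 * Returns the sendmsg() result: bytes accepted, or -1 with errno set. */
ssize_t preload_banner(int listen_fd, const char *banner, int preload_flag)
{
	struct iovec iov;
	struct msghdr msg;

	iov.iov_base = (void *)banner;
	iov.iov_len = strlen(banner);

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;	/* exactly one iovec element */
	msg.msg_iovlen = 1;

	return sendmsg(listen_fd, &msg, preload_flag | MSG_NOSIGNAL);
}
```

A server-first daemon would call preload_banner() once after listen(),
with its greeting line and MSG_PRELOAD, and then accept() as usual; the
greeting has to be small enough to fit the single linear sk_buff limit.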

Intent: lower latency for server-first protocols using TCP.
  Known cases of this use are SMTP and MySQL.

  Measurements:
    Packet capture (laptop, loopback, TFO requested) from initial SYN to
    first client data packet (5 samples):

    - baseline   TFO-C      1064 1470 1455 1547 1595  usec
    - patched    non-TFO     140  150  159  144  153  usec
    - patched    TFO-C       142  149  149  125  125  usec
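
    As a rough summary of the five samples per row, the means can be
    computed with a small helper. This is only a sketch; the function
    and array names are illustrative, and the arrays simply repeat the
    figures listed above.

```c
#include <stddef.h>

/* Samples copied from the measurement rows above, in microseconds. */
const int baseline_tfo_c[5]  = { 1064, 1470, 1455, 1547, 1595 };
const int patched_non_tfo[5] = {  140,  150,  159,  144,  153 };
const int patched_tfo_c[5]   = {  142,  149,  149,  125,  125 };

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
double mean_usec(const int *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += samples[i];
	return n ? sum / (double)n : 0.0;
}
```

    With these inputs mean_usec() gives 1426.2 usec for the baseline
    TFO-C row, 149.2 usec for patched non-TFO and 138.0 usec for patched
    TFO-C. Five loopback samples are too few to say more than that the
    difference is roughly an order of magnitude.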

  Out of scope:
  - Client-first protocols
  - TLS-on-connect

Testing:

A) packetdrill scripts for
   - normal non-TFO
   - normal TFO
   - synack lost
   - 3rd-ack acks only the SYN
   - 3rd-ack acks partial data
     (NB: packetdrill can only check the data size, not actual content)

B) Application use, running the application testsuite
   and manual check of specific cases via packet capture

C) Daily-driver laptop use (not expected to trigger the feature;
   regression test only)

D) KASAN/syzkaller

   - enable_syscalls: "socket$inet_tcp", "listen", "sendmsg", "accept",
      "read", "write", "close", "syz_emit_ethernet", "syz_extract_tcp_res"

   - the coverage seems rather limited; the sendmsg onto a listen socket
     is there, but I am not convinced actual TCP connections are being
     exercised.  tcp_input.c shows only 2% coverage; tcp_minisocks.c is
     entirely uncovered.

   - A need for limiting iteration in the sendmsg handling was found (RCU
     timeouts), hence v2, but there was no hint of locking problems.

     Eric: could you expand on your previous comment "I do not see any
     locking"?  If it referred to the syscall write operation on the listening
     socket, tcp_sendmsg_locked() is called with the sk locked - so I'm
     unsure where you're looking.

Jeremy Harris (6):
  tcp: support writing to a socket in listening state
  tcp: copy write-data from listen socket to accept child socket
  tcp: fastopen: add write-data to fastopen synack packet
  tcp: transmit any pending data on receipt of 3rd-ack
  tcp: fastopen: retransmit data when only the SYN of a synack-with-data
    is acked
  tcp: fastopen: extend retransmit-queue trimming to handle linear
    sk_buff

 include/linux/socket.h                        |   1 +
 net/ipv4/tcp.c                                | 112 ++++++++++++++++++
 net/ipv4/tcp_fastopen.c                       |   3 +-
 net/ipv4/tcp_input.c                          |  15 ++-
 net/ipv4/tcp_ipv4.c                           |   4 +-
 net/ipv4/tcp_minisocks.c                      |  58 ++++++++-
 net/ipv4/tcp_output.c                         |  50 +++++++-
 .../perf/trace/beauty/include/linux/socket.h  |   1 +
 tools/perf/trace/beauty/msg_flags.c           |   3 +
 9 files changed, 234 insertions(+), 13 deletions(-)


base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26
-- 
2.49.0


Thread overview: 8+ messages
2025-06-09 15:56 [PATCH net-next v3 0/6] tcp: support preloading data on a listening socket Jeremy Harris
2025-06-09 16:05 ` [PATCH net-next v3 1/6] tcp: support writing to a socket in listening state Jeremy Harris
2025-06-09 16:05 ` [PATCH net-next v3 2/6] tcp: copy write-data from listen socket to accept child socket Jeremy Harris
2025-06-09 16:26   ` Eric Dumazet
2025-06-09 16:05 ` [PATCH net-next v3 3/6] tcp: fastopen: add write-data to fastopen synack packet Jeremy Harris
2025-06-09 16:05 ` [PATCH net-next v3 4/6] tcp: transmit any pending data on receipt of 3rd-ack Jeremy Harris
2025-06-09 16:05 ` [PATCH net-next v3 5/6] tcp: fastopen: retransmit data when only the SYN of a synack-with-data is acked Jeremy Harris
2025-06-09 16:05 ` [PATCH net-next v3 6/6] tcp: fastopen: extend retransmit-queue trimming to handle linear sk_buff Jeremy Harris