From: Jeremy Harris <jgh@exim.org>
To: netdev@vger.kernel.org
Cc: linux-api@vger.kernel.org, edumazet@google.com,
	ncardwell@google.com, Jeremy Harris <jgh@exim.org>
Subject: [PATCH net-next v3 0/6] tcp: support preloading data on a listening socket
Date: Mon,  9 Jun 2025 16:56:26 +0100
Message-ID: <cover.1749466540.git.jgh@exim.org>

I didn't get any comments on v2 apart from the kernel test robot,
so I'm repeating the same responses to the v1 comments here.
I figured I should do a v3 to fix the compiler warnings the robot
pointed out.

v2 changes:
  - Split the preload operation out of tcp_sendmsg_locked() into a
    separate routine, and stop it from looping over the supplied
    iovec

v3 changes:
  - Fix compiler warnings

------
Support writing to a listening TCP socket; the data is transmitted
immediately on every subsequent passive connection established
(parented) by that listening socket.

On a normal connection, transmission of the data is triggered by the
receipt of the 3rd-ack. On a fastopen connection (with an accepted
cookie), the data is sent in the synack packet.
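In the fastopen case, the 3rd-ack decides how much of the synack data still needs sending, which is what the packetdrill cases under Testing exercise ("3rd-ack acks only the SYN", "3rd-ack acks partial data"). A minimal sketch of that sequence arithmetic, assuming the preloaded data sits immediately after the SYN in sequence space; the helper name is hypothetical and this is not the kernel's implementation:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from this patch series): for a fastopen
 * synack that carried `data_len` preloaded bytes after the server's
 * initial sequence number `isn`, return how many of those bytes the
 * client's 3rd-ack (`ack_seq`) leaves unacknowledged, or -1 if the
 * ACK does not cover at least the SYN or lies beyond the data.
 */
static int64_t synack_data_unacked(uint32_t isn, uint32_t data_len,
                                   uint32_t ack_seq)
{
    /* The SYN occupies sequence number isn; data starts at isn + 1. */
    uint32_t first_data_seq = isn + 1u;

    /* Unsigned subtraction is modulo 2^32, so sequence wraparound is
     * handled; an ACK below first_data_seq becomes a huge value. */
    uint32_t acked = ack_seq - first_data_seq;

    if (acked > data_len)
        return -1;                 /* not an acceptable ACK */
    return (int64_t)(data_len - acked);
}
```

A result equal to `data_len` means only the SYN was acked and all of the data must be retransmitted, the situation patch 5/6 handles; a smaller positive result is the partial-ack case that needs the retransmit queue trimmed (patch 6/6).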

The data preload is done using sendmsg() with a newly defined flag
(MSG_PRELOAD); the amount of data is limited to a single linear
sk_buff. Note that this flag uses the last-but-two available bit
if "int" is 32 bits.
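A minimal userspace sketch of the preload step, assuming headers from a kernel with this series applied, which define MSG_PRELOAD (its numeric value is not given here, so the call is only compiled when the macro exists). The function name is hypothetical. Since v2 the preload routine does not loop over the iovec, so the sketch passes a single buffer:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: queue `len` bytes on a socket already in LISTEN
 * state so that each connection later accepted from it starts with this
 * data. Assumes patched headers providing MSG_PRELOAD.
 */
static ssize_t preload_listen_data(int listen_fd, const void *buf, size_t len)
{
#ifdef MSG_PRELOAD
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;   /* single element: v2+ does not loop over the iovec */

    /* Data is limited to one linear sk_buff; whether an oversized
     * buffer is truncated or rejected is not stated in this cover
     * letter, so callers should compare the result with len. */
    return sendmsg(listen_fd, &msg, MSG_PRELOAD);
#else
    (void)listen_fd;
    (void)buf;
    (void)len;
    errno = ENOTSUP;      /* headers lack MSG_PRELOAD */
    return -1;
#endif
}
```

A server-first daemon would call this once after listen() and before its accept() loop, for example with an SMTP "220" greeting, so the banner no longer waits for the application to accept the connection and write it.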

Intent: lower latency for server-first protocols using TCP.
  Known cases of this use are SMTP and MySQL.

  Measurements:
    Packet capture (laptop, loopback, TFO requested) from the initial SYN
    to the first client data packet (5 samples):

    - baseline   TFO-C      1064 1470 1455 1547 1595  usec
    - patched    non-TFO     140  150  159  144  153  usec
    - patched    TFO-C       142  149  149  125  125  usec
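  To summarize the samples above as one number per row, a small sketch computing the arithmetic mean; the array and function names are illustrative only, and the values are copied from the table:

```c
#include <stddef.h>

/* Samples from the table above: usec from initial SYN to first
 * client data packet, 5 samples per configuration. */
static const double baseline_tfo_c[]  = { 1064, 1470, 1455, 1547, 1595 };
static const double patched_non_tfo[] = { 140, 150, 159, 144, 153 };
static const double patched_tfo_c[]   = { 142, 149, 149, 125, 125 };

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double mean_usec(const double *s, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += s[i];
    return n ? sum / (double)n : 0.0;
}
```

  This gives 1426.2 usec for the baseline against 149.2 usec (non-TFO) and 138.0 usec (TFO-C) with the patches, roughly a tenfold reduction; with only five loopback samples per row the figures are indicative rather than a benchmark.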

  Out of scope:
  - Client-first protocols
  - TLS-on-connect

Testing:

A) packetdrill scripts for
   - normal non-TFO
   - normal TFO
   - synack lost
   - 3rd-ack acks only the SYN
   - 3rd-ack acks partial data
     (NB: packetdrill can only check the data size, not actual content)

B) Application use, running the application testsuite
   and manual check of specific cases via packet capture

C) Daily-driver laptop use (not expected to trigger the feature;
   a regression test only)

D) KASAN/syzkaller

   - enable_syscalls: "socket$inet_tcp", "listen", "sendmsg", "accept",
      "read", "write", "close", "syz_emit_ethernet", "syz_extract_tcp_res"

   - the coverage seems rather limited; the sendmsg onto a listen socket
     is there, but I am not convinced actual TCP connections are being
     exercised.  Coverage of tcp_input.c is only 2%; tcp_minisocks.c is
     entirely uncovered.

   - A need to limit iteration in the sendmsg handling was found (RCU
     timeouts), hence v2, but there was no hint of locking problems.

     Eric: could you expand on your previous comment "I do not see any
     locking"?  If it referred to the syscall write operation on the listening
     socket, tcp_sendmsg_locked() is called with the sk locked - so I'm
     unsure where you're looking.

Jeremy Harris (6):
  tcp: support writing to a socket in listening state
  tcp: copy write-data from listen socket to accept child socket
  tcp: fastopen: add write-data to fastopen synack packet
  tcp: transmit any pending data on receipt of 3rd-ack
  tcp: fastopen: retransmit data when only the SYN of a synack-with-data
    is acked
  tcp: fastopen: extend retransmit-queue trimming to handle linear
    sk_buff

 include/linux/socket.h                        |   1 +
 net/ipv4/tcp.c                                | 112 ++++++++++++++++++
 net/ipv4/tcp_fastopen.c                       |   3 +-
 net/ipv4/tcp_input.c                          |  15 ++-
 net/ipv4/tcp_ipv4.c                           |   4 +-
 net/ipv4/tcp_minisocks.c                      |  58 ++++++++-
 net/ipv4/tcp_output.c                         |  50 +++++++-
 .../perf/trace/beauty/include/linux/socket.h  |   1 +
 tools/perf/trace/beauty/msg_flags.c           |   3 +
 9 files changed, 234 insertions(+), 13 deletions(-)


base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26
-- 
2.49.0