TJ wrote:
> client SYN > server LISTENING
> client < SYN ACK server SYN_RECEIVED (time-out 3s)
> server: inet_rsk(req)->acked = 1
>
> client ACK > server (discarded)
>
> client < SYN ACK (DUP) server (time-out 6s)
> client ACK (DUP) > server (discarded)
>
> client < SYN ACK (DUP) server (time-out 12s)
> client ACK (DUP) > server (discarded)
>
> client < SYN ACK (DUP) server (time-out 24s)
> client ACK (DUP) > server (discarded)
>
> client < SYN ACK (DUP) server (time-out 48s)
> client ACK (DUP) > server (discarded)
>
> client < SYN ACK (DUP) server (time-out 96s)
> client ACK (DUP) > server (discarded)
>
> server: half-open socket closed.
>
> With each client ACK being dropped by the kernel's TCP_DEFER_ACCEPT
> mechanism eventually the handshake fails after the 'SYN ACK' retries and
> time-outs expire.
>
> There is a case for arguing the kernel should be operating in an
> enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an
> alternative mode, and therefore should accept *both* RFC 793 and
> TCP_DEFER_ACCEPT. I've been unable to find a specification or RFC for
> implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER to give me firm
> guidance.
>
> It seems incorrect to penalise a client that is trying to complete the
> handshake according to the RFC 793 specification, especially as the
> client has no way of knowing ahead of time whether or not the server is
> operating deferred accept.

Interesting problem. TCP_DEFER_ACCEPT does not conform to any standard
I'm aware of. (In fact, I'd say it's in violation of RFC 793.) The
implementation does exactly what it claims, though -- it "allows a
listener to be awakened only when data arrives on the socket."

I think a more useful spec might have been "allows a listener to be
awakened only when data arrives on the socket, unless the specified
timeout has expired." Once the timeout expires, it should process the
embryonic connection as if TCP_DEFER_ACCEPT is not set.
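
For a sense of scale, the time-outs in the quoted trace can be added up.
The following is only a rough sketch in C, assuming the 3s initial
time-out and the doubling shown in TJ's trace. The function name
synack_total_wait is made up for illustration; it is not a kernel
function and does not reflect how the kernel tracks its timers.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: sums the SYN-ACK time-outs
 * listed in the trace, assuming each one is double the previous.
 */
unsigned int synack_total_wait(unsigned int initial_secs,
			       unsigned int timeouts)
{
	unsigned int total = 0;
	unsigned int cur = initial_secs;
	unsigned int i;

	/* Keep the illustration well away from integer overflow. */
	assert(timeouts <= 16);

	for (i = 0; i < timeouts; i++) {
		total += cur;	/* wait out this time-out */
		cur *= 2;	/* the next one is twice as long */
	}
	return total;
}
```

With the values from the trace, synack_total_wait(3, 6) comes to
3 + 6 + 12 + 24 + 48 + 96 = 189 seconds, so a client that never sends
data is left hanging for a bit over three minutes before the half-open
socket is closed.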

Unfortunately, I don't think we can retroactively change this
definition, as an application might depend on data being available and
do a non-blocking read() after the accept(), expecting data to be there.
Is this worth trying to fix?

Also, a listen socket with a backlog and TCP_DEFER_ACCEPT will have reqs
sit in the backlog for the full defer timeout, even if they've received
data, which is not really the right thing to do.

I've attached a patch implementing this suggestion (compile tested only
-- I think I got the logic right but it's late ;). Kind of ugly, and
uses up a bit in struct inet_request_sock. Maybe can be done better...

  -John