netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECV From: Pablo Neira Ayuso This patch fixes the out of sync scenarios while in SYN_RECV state. Quoting Jozsef, what it happens if we are out of sync if the following: > b. conntrack entry is outdated, new SYN received > - (b1) we ignore it but save the initialization data from it > - (b2) when the reply SYN/ACK receives and it matches the saved data, > we pick up the new connection This is what it should happen if we are in SYN_RECV state. Initially, the SYN packet hits b1, thus we save data from it. But the SYN/ACK packet is considered a retransmission given that we're in SYN_RECV state. Therefore, we never hit b2 and we don't get in sync. To fix this, we check if we were already in SYN_RECV and the previous packet was a SYN. In that case, we enter the ignore case that get us in sync. This patch helps a lot to conntrackd in stress scenarios (assumming a client that generates lots of small TCP connections). During the failover, consider that the new primary has injected one outdated flow in SYN_RECV state (this is likely to happen if the conntrack event rate is high because the backup will be a bit delayed from the primary). With the current code, if the client starts a new fresh connection that matches the tuple, the SYN packet will be ignored without updating the state tracking, and the SYN+ACK in reply will blocked as it will not pass checkings III or IV (since all state tracking in the original direction is not initialized because of the SYN packet was ignored and the ignore case that get us in sync is not applied). Cc: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_proto_tcp.c | 15 +++++++++++++++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 3fb2b73..baa9450 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -842,6 +842,20 @@ static int tcp_packet(struct nf_conn *ct, tuple = &ct->tuplehash[dir].tuple; switch (new_state) { + case TCP_CONNTRACK_SYN_RECV: + /* Special case: we were in SYN_RECV state but we are out of + * sync. We have previously seen a SYN packet in the original + * direction which has been ignored, now we see the SYN/ACK + * packet but the TCP state transition considers that this is + * a retransmission. However, this is not. Jump to the code + * that get us in sync. */ + if (old_state == TCP_CONNTRACK_SYN_RECV && + index == TCP_SYNACK_SET && + ct->proto.tcp.last_index == TCP_SYN_SET && + ct->proto.tcp.last_dir != dir && + ntohl(th->ack_seq) == ct->proto.tcp.last_end) + goto syn_recv_out_of_sync; + break; case TCP_CONNTRACK_SYN_SENT: if (old_state < TCP_CONNTRACK_TIME_WAIT) break; @@ -899,6 +913,7 @@ static int tcp_packet(struct nf_conn *ct, * not. We get in sync from the previously annotated * values. */ +syn_recv_out_of_sync: old_state = TCP_CONNTRACK_SYN_SENT; new_state = TCP_CONNTRACK_SYN_RECV; ct->proto.tcp.seen[ct->proto.tcp.last_dir].td_end =