* DCCP conntrack/NAT
@ 2008-04-04 15:41 Patrick McHardy
2008-04-04 19:59 ` Jan Engelhardt
` (13 more replies)
0 siblings, 14 replies; 15+ messages in thread
From: Patrick McHardy @ 2008-04-04 15:41 UTC (permalink / raw)
To: dccp
[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]
These two patches contain my old conntrack/NAT helper for DCCP,
updated to net-2.6.26.git and the missing parts (almost entirely)
added.
They both depend on some other netfilter patches, I've attached
them only hoping for some review :) A git tree which contains
the full set of patches is (once upload finishes) located at:
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26.git
A few words on the patches:
The NAT part is pretty uninteresting, it simply rewrites the
ports and updates the checksum. The connection tracking module
performs tracking of the connection, its states (mainly for
appropriate timeout selection) and sequence number validation
for Response/Ack packets during connection establishment.
Window tracking and full sequence number validation are not
implemented yet, connection pickup (creating valid states for
connections not tracked from the beginning) is also not working
yet. The connection tracker is comparable to the 2.4 TCP
connection tracking helper.
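To make the NAT part a bit more concrete: rewriting a port only needs an
incremental checksum update, not a full recomputation. The following is a
minimal userspace sketch of that idea along the lines of RFC 1624. It is
not code from the patches (those use the kernel's inet_proto_csum_replace2
and inet_proto_csum_replace4 helpers), and the names csum_fold, csum_full
and csum_replace16 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over 16-bit words; used here only as a reference. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~csum_fold(sum);
}

/*
 * Incremental update after one 16-bit field changes from oldval to newval.
 * RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')
 */
static void csum_replace16(uint16_t *check, uint16_t oldval, uint16_t newval)
{
	uint32_t sum;

	assert(check != NULL);
	sum  = (uint16_t)~*check;
	sum += (uint16_t)~oldval;
	sum += newval;
	*check = (uint16_t)~csum_fold(sum);
}
```

Since the DCCP checksum also covers the pseudo-header, an address rewrite
needs the same treatment for both 16-bit halves of the changed IP address.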
The part that could really use some review by someone more familiar
with DCCP than me is the state transition table. The transitions
should be mostly similar to those of a DCCP endpoint, but the acceptable
packets in each state differ slightly since the firewall sits in the
middle and packets might get lost after passing through it. If someone
spots incorrect transitions, please let me know since chances are
good that it's a bug.
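For reviewers who have not looked at the TCP tracker: the table is indexed
by direction, packet type and current state, and yields either the next
state or one of two verdicts (ignore, invalid). Here is a cut-down
userspace sketch covering only Request, Response and Ack during the
handshake. The entries follow the comments in the patch's table; the enum
names and the track() helper are hypothetical and do not appear in the
patch.

```c
#include <assert.h>

enum dir { DIR_ORIGINAL, DIR_REPLY, DIR_MAX };
enum pkt { PKT_REQUEST, PKT_RESPONSE, PKT_ACK, PKT_MAX };
enum state {
	ST_NONE, ST_REQUEST, ST_RESPOND, ST_PARTOPEN, ST_OPEN,
	ST_TRACKED,			/* number of real states */
	ST_IGNORE = ST_TRACKED,		/* verdict: validity unknown */
	ST_INVALID,			/* verdict: never acceptable */
};

#define sNO ST_NONE
#define sRQ ST_REQUEST
#define sRS ST_RESPOND
#define sPO ST_PARTOPEN
#define sOP ST_OPEN
#define sIG ST_IGNORE
#define sIV ST_INVALID

/* Columns are the current state: sNO, sRQ, sRS, sPO, sOP */
static const unsigned char table[DIR_MAX][PKT_MAX][ST_TRACKED] = {
	[DIR_ORIGINAL] = {
		[PKT_REQUEST]  = { sRQ, sRQ, sRS, sIG, sIG },
		[PKT_RESPONSE] = { sIV, sIV, sIV, sIV, sIV },
		[PKT_ACK]      = { sIV, sIV, sPO, sPO, sOP },
	},
	[DIR_REPLY] = {
		[PKT_REQUEST]  = { sIV, sIV, sIV, sIV, sIV },
		[PKT_RESPONSE] = { sIV, sRS, sRS, sIG, sIG },
		[PKT_ACK]      = { sIV, sIV, sIV, sOP, sOP },
	},
};

/*
 * Look up one packet. Returns the table entry; the tracked state in *cur
 * is only updated when the entry is a real state, not a verdict.
 */
static enum state track(enum state *cur, enum dir d, enum pkt p)
{
	enum state next;

	assert(*cur < ST_TRACKED);
	next = table[d][p][*cur];
	if (next != ST_IGNORE && next != ST_INVALID)
		*cur = next;
	return next;
}
```

A stray Request arriving in PARTOPEN comes back as the ignore verdict and
leaves the tracked state alone, which is the case where the firewall
cannot tell whether the endpoint will accept the packet.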
For testing, simply load the nf_conntrack_proto_dccp and
nf_nat_proto_dccp modules and set up NAT rules as usual.
Comments welcome.
[-- Attachment #2: dccp-ct.diff --]
[-- Type: text/x-diff, Size: 26734 bytes --]
commit f448af3823a8de4260b0f45d7acc19db513a70fd
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Mar 20 15:15:55 2008 +0100
[NETFILTER]: nf_conntrack: add DCCP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
diff --git a/include/linux/netfilter/nf_conntrack_dccp.h b/include/linux/netfilter/nf_conntrack_dccp.h
new file mode 100644
index 0000000..73c4b68
--- /dev/null
+++ b/include/linux/netfilter/nf_conntrack_dccp.h
@@ -0,0 +1,27 @@
+#ifndef _NF_CONNTRACK_DCCP_H
+#define _NF_CONNTRACK_DCCP_H
+
+/* Exposed to userspace over nfnetlink */
+enum ct_dccp_states {
+ CT_DCCP_NONE,
+ CT_DCCP_REQUEST,
+ CT_DCCP_RESPOND,
+ CT_DCCP_PARTOPEN,
+ CT_DCCP_OPEN,
+ CT_DCCP_CLOSEREQ,
+ CT_DCCP_CLOSING,
+ CT_DCCP_TIMEWAIT,
+ CT_DCCP_IGNORE,
+ CT_DCCP_MAX,
+};
+
+#ifdef __KERNEL__
+
+struct nf_ct_dccp {
+ u_int8_t state;
+ u_int64_t handshake_seq;
+};
+
+#endif /* __KERNEL__ */
+
+#endif /* _NF_CONNTRACK_DCCP_H */
diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index e3e1533..0a383ac 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -80,6 +80,7 @@ enum ctattr_l4proto {
enum ctattr_protoinfo {
CTA_PROTOINFO_UNSPEC,
CTA_PROTOINFO_TCP,
+ CTA_PROTOINFO_DCCP,
__CTA_PROTOINFO_MAX
};
#define CTA_PROTOINFO_MAX (__CTA_PROTOINFO_MAX - 1)
@@ -95,6 +96,13 @@ enum ctattr_protoinfo_tcp {
};
#define CTA_PROTOINFO_TCP_MAX (__CTA_PROTOINFO_TCP_MAX - 1)
+enum ctattr_protoinfo_dccp {
+ CTA_PROTOINFO_DCCP_UNSPEC,
+ CTA_PROTOINFO_DCCP_STATE,
+ __CTA_PROTOINFO_DCCP_MAX,
+};
+#define CTA_PROTOINFO_DCCP_MAX (__CTA_PROTOINFO_DCCP_MAX - 1)
+
enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
CTA_COUNTERS_PACKETS, /* old 64bit counters */
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index a3567a7..bb9fc85 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -20,6 +20,7 @@
#include <asm/atomic.h>
#include <linux/netfilter/nf_conntrack_tcp.h>
+#include <linux/netfilter/nf_conntrack_dccp.h>
#include <linux/netfilter/nf_conntrack_sctp.h>
#include <linux/netfilter/nf_conntrack_proto_gre.h>
#include <net/netfilter/ipv4/nf_conntrack_icmp.h>
@@ -30,6 +31,7 @@
/* per conntrack: protocol private data */
union nf_conntrack_proto {
/* insert conntrack proto private data here */
+ struct nf_ct_dccp dccp;
struct ip_ct_sctp sctp;
struct ip_ct_tcp tcp;
struct ip_ct_icmp icmp;
diff --git a/include/net/netfilter/nf_conntrack_tuple.h b/include/net/netfilter/nf_conntrack_tuple.h
index 168c917..bdeec34 100644
--- a/include/net/netfilter/nf_conntrack_tuple.h
+++ b/include/net/netfilter/nf_conntrack_tuple.h
@@ -41,6 +41,9 @@ union nf_conntrack_man_proto
} icmp;
struct {
__be16 port;
+ } dccp;
+ struct {
+ __be16 port;
} sctp;
struct {
__be16 key; /* GRE key is 32bit, PPtP only uses 16bit */
@@ -79,6 +82,9 @@ struct nf_conntrack_tuple
} icmp;
struct {
__be16 port;
+ } dccp;
+ struct {
+ __be16 port;
} sctp;
struct {
__be16 key;
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index daf5b88..c1fc0f1 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -86,6 +86,16 @@ config NF_CONNTRACK_EVENTS
If unsure, say `N'.
+config NF_CT_PROTO_DCCP
+ tristate 'DCCP protocol connection tracking support (EXPERIMENTAL)'
+ depends on EXPERIMENTAL && NF_CONNTRACK
+ depends on NETFILTER_ADVANCED
+ help
+ With this option enabled, the layer 3 independent connection
+ tracking code will be able to do state tracking on DCCP connections.
+
+ If unsure, say 'N'.
+
config NF_CT_PROTO_GRE
tristate
depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ea75083..5c4b183 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_NETFILTER_NETLINK_LOG) += nfnetlink_log.o
obj-$(CONFIG_NF_CONNTRACK) += nf_conntrack.o
# SCTP protocol connection tracking
+obj-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o
obj-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o
obj-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
obj-$(CONFIG_NF_CT_PROTO_UDPLITE) += nf_conntrack_proto_udplite.o
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
new file mode 100644
index 0000000..57d62b4
--- /dev/null
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -0,0 +1,761 @@
+/*
+ * DCCP connection tracking protocol helper
+ *
+ * Copyright (c) 2005, 2006, 2008 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sysctl.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/dccp.h>
+
+#include <linux/netfilter/nfnetlink_conntrack.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
+#include <net/netfilter/nf_log.h>
+
+static DEFINE_RWLOCK(dccp_lock);
+
+static int nf_ct_dccp_loose __read_mostly = 1;
+
+/* Timeouts are based on values from RFC4340:
+ *
+ * - REQUEST:
+ *
+ * 8.1.2. Client Request
+ *
+ * A client MAY give up on its DCCP-Requests after some time
+ * (3 minutes, for example).
+ *
+ * - RESPOND:
+ *
+ * 8.1.3. Server Response
+ *
+ * It MAY also leave the RESPOND state for CLOSED after a timeout of
+ * not less than 4MSL (8 minutes);
+ *
+ * - PARTOPEN:
+ *
+ * 8.1.5. Handshake Completion
+ *
+ * If the client remains in PARTOPEN for more than 4MSL (8 minutes),
+ * it SHOULD reset the connection with Reset Code 2, "Aborted".
+ *
+ * - CLOSEREQ/CLOSING:
+ *
+ * 8.3. Termination
+ *
+ * The retransmission timer should initially be set to go off in two
+ * round-trip times and should back off to not less than once every
+ * 64 seconds ...
+ *
+ * - TIMEWAIT:
+ *
+ * 4.3. States
+ *
+ * A server or client socket remains in this state for 2MSL (4 minutes)
+ * after the connection has been torn down, ...
+ */
+
+#define DCCP_MSL (2 * 60 * HZ)
+
+static unsigned int dccp_timeout[CT_DCCP_MAX] __read_mostly = {
+ [CT_DCCP_REQUEST] = 2 * DCCP_MSL,
+ [CT_DCCP_RESPOND] = 4 * DCCP_MSL,
+ [CT_DCCP_PARTOPEN] = 4 * DCCP_MSL,
+ [CT_DCCP_OPEN] = 5 * 86400 * HZ,
+ [CT_DCCP_CLOSEREQ] = 64 * HZ,
+ [CT_DCCP_CLOSING] = 64 * HZ,
+ [CT_DCCP_TIMEWAIT] = 2 * DCCP_MSL,
+};
+
+static const char *dccp_state_names[] = {
+ [CT_DCCP_NONE] = "NONE",
+ [CT_DCCP_REQUEST] = "REQUEST",
+ [CT_DCCP_RESPOND] = "RESPOND",
+ [CT_DCCP_PARTOPEN] = "PARTOPEN",
+ [CT_DCCP_OPEN] = "OPEN",
+ [CT_DCCP_CLOSEREQ] = "CLOSEREQ",
+ [CT_DCCP_CLOSING] = "CLOSING",
+ [CT_DCCP_TIMEWAIT] = "TIMEWAIT",
+ [CT_DCCP_IGNORE] = "IGNORE",
+ [CT_DCCP_MAX] = "INVALID",
+};
+
+#define sNO CT_DCCP_NONE
+#define sRQ CT_DCCP_REQUEST
+#define sRS CT_DCCP_RESPOND
+#define sPO CT_DCCP_PARTOPEN
+#define sOP CT_DCCP_OPEN
+#define sCR CT_DCCP_CLOSEREQ
+#define sCG CT_DCCP_CLOSING
+#define sTW CT_DCCP_TIMEWAIT
+#define sIG CT_DCCP_IGNORE
+#define sIV CT_DCCP_MAX
+
+/*
+ * DCCP state transition table
+ *
+ * The assumption is the same as for TCP tracking:
+ *
+ * We are the man in the middle. All the packets go through us but might
+ * get lost in transit to the destination. It is assumed that the destination
+ * can't receive segments we haven't seen.
+ *
+ * The following states exist:
+ *
+ * NONE: Initial state
+ * REQUEST: Request seen, waiting for Response from server
+ * RESPOND: Response from server seen, waiting for Ack from client
+ * PARTOPEN: Ack after Response seen, waiting for packet other than Response,
+ * Reset or Sync from server
+ * OPEN: Packet other than Response, Reset or Sync seen
+ * CLOSEREQ: CloseReq from server seen, expecting Close from client
+ * CLOSING: Close seen, expecting Reset
+ * TIMEWAIT: Reset seen
+ * IGNORE: Not determinable whether packet is valid
+ *
+ * Some states exist only on one side of the connection: REQUEST, RESPOND,
+ * PARTOPEN, CLOSEREQ. For the other side these states are equivalent to
+ * the one it was in before.
+ *
+ * Packets are marked as ignored (sIG) if we don't know if they're valid
+ * (for example a reincarnation of a connection we didn't notice is dead
+ * already) and the server may send back a connection closing DCCP_RESET
+ * or a DCCP_RESPONSE.
+ */
+static u_int8_t dccp_state_table[IP_CT_DIR_MAX][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX] = {
+ [IP_CT_DIR_ORIGINAL] = {
+ [DCCP_PKT_REQUEST] = {
+ /*
+ * sNO -> sRQ Regular Request
+ * sRQ -> sRQ Retransmitted Request or reincarnation
+ * sRS -> sRS Retransmitted Request (apparently Response
+ * got lost after we saw it) or reincarnation
+ * sPO -> sIG Request during PARTOPEN state, server will ignore it
+ * sOP -> sIG Request during OPEN state: server will ignore it
+ * sCR -> sIG MUST respond with Close to CloseReq (8.3.)
+ * sCG -> sIG
+ * sTW -> sIG Time-wait
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sIG,
+ },
+ [DCCP_PKT_RESPONSE] = {
+ /*
+ * A Response in the original direction is always invalid.
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV,
+ },
+ [DCCP_PKT_ACK] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sPO Ack for Response, move to PARTOPEN (8.1.5.)
+ * sPO -> sPO Retransmitted Ack for Response, remain in PARTOPEN
+ * sOP -> sOP Regular ACK, remain in OPEN
+ * sCR -> sCR Ack in CLOSEREQ MAY be processed (8.3.)
+ * sCG -> sCG Ack in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sOP, sIV, sPO, sPO, sOP, sCR, sCG, sIV
+ },
+ [DCCP_PKT_DATA] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sIV MUST use DataAck in PARTOPEN state (8.1.5.)
+ * sOP -> sOP Regular Data packet
+ * sCR -> sCR Data in CLOSEREQ MAY be processed (8.3.)
+ * sCG -> sCG Data in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sIV, sOP, sCR, sCG, sIV,
+ },
+ [DCCP_PKT_DATAACK] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sPO Remain in PARTOPEN state
+ * sOP -> sOP Regular DataAck packet in OPEN state
+ * sCR -> sCR DataAck in CLOSEREQ MAY be processed (8.3.)
+ * sCG -> sCG DataAck in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sPO, sOP, sCR, sCG, sIV
+ },
+ [DCCP_PKT_CLOSEREQ] = {
+ /*
+ * CLOSEREQ may only be sent by the server.
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV
+ },
+ [DCCP_PKT_CLOSE] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sCG Client-initiated close
+ * sOP -> sCG Client-initiated close
+ * sCR -> sCG Close in response to CloseReq (8.3.)
+ * sCG -> sCG Retransmit
+ * sTW -> sIV Late retransmit, already in TIME_WAIT
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sCG, sCG, sCG, sIV, sIV
+ },
+ [DCCP_PKT_RESET] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sTW Sync received or timeout, SHOULD send Reset (8.1.1.)
+ * sRS -> sTW Response received without Request
+ * sPO -> sTW Timeout, SHOULD send Reset (8.1.5.)
+ * sOP -> sTW Connection reset
+ * sCR -> sTW Connection reset
+ * sCG -> sTW Connection reset
+ * sTW -> sIG Ignore (don't refresh timer)
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sTW, sTW, sTW, sTW, sTW, sTW, sIG
+ },
+ [DCCP_PKT_SYNC] = {
+ /*
+ * We currently ignore Sync packets
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG,
+ },
+ [DCCP_PKT_SYNCACK] = {
+ /*
+ * We currently ignore SyncAck packets
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG,
+ },
+ },
+ [IP_CT_DIR_REPLY] = {
+ [DCCP_PKT_REQUEST] = {
+ /*
+ * A Request in the reply direction is always invalid.
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV
+ },
+ [DCCP_PKT_RESPONSE] = {
+ /*
+ * sNO -> sIV Response without Request
+ * sRQ -> sRS Response to client's Request
+ * sRS -> sRS Retransmitted Response (8.1.3. SHOULD NOT)
+ * sPO -> sIG Response to an ignored Request or late retransmit
+ * sOP -> sIG Invalid
+ * sCR -> sIG Invalid
+ * sCG -> sIG Invalid
+ * sTW -> sIG Invalid
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sRS, sRS, sIG, sIG, sIG, sIG, sIG
+ },
+ [DCCP_PKT_ACK] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sOP Enter OPEN state (8.1.5.)
+ * sOP -> sOP Regular Ack in OPEN state
+ * sCR -> sIV Waiting for Close from client
+ * sCG -> sCG Ack in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV
+ },
+ [DCCP_PKT_DATA] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sOP Enter OPEN state (8.1.5.)
+ * sOP -> sOP Regular Data packet in OPEN state
+ * sCR -> sIV Waiting for Close from client
+ * sCG -> sCG Data in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV
+ },
+ [DCCP_PKT_DATAACK] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sOP Enter OPEN state (8.1.5.)
+ * sOP -> sOP Regular DataAck in OPEN state
+ * sCR -> sIV Waiting for Close from client
+ * sCG -> sCG DataAck in CLOSING MAY be processed (8.3.)
+ * sTW -> sIV
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sOP, sOP, sIV, sCG, sIV
+ },
+ [DCCP_PKT_CLOSEREQ] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sOP -> sCR Move directly to CLOSE_REQ (8.1.5.)
+ * sOP -> sCR CloseReq in OPEN state
+ * sCR -> sCR Retransmit
+ * sCG -> sIV Already closing
+ * sTW -> sIV Already closed
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sCR, sCR, sCR, sIV, sIV
+ },
+ [DCCP_PKT_CLOSE] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sIV No connection
+ * sRS -> sIV No connection
+ * sPO -> sOP -> sCG Move directly to CLOSING
+ * sOP -> sCG Move to CLOSING
+ * sCR -> sCG Waiting for close from client
+ * sCG -> sCG Retransmit
+ * sTW -> sIV Already closed
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sIV, sIV, sCG, sCG, sCG, sCG, sIV
+ },
+ [DCCP_PKT_RESET] = {
+ /*
+ * sNO -> sIV No connection
+ * sRQ -> sTW Reset in response to Request
+ * sRS -> sTW Timeout, SHOULD send Reset (8.1.3.)
+ * sPO -> sTW Timeout, SHOULD send Reset (8.1.3.)
+ * sOP -> sTW
+ * sCR -> sTW
+ * sCG -> sTW
+ * sTW -> sIG Ignore (don't refresh timer)
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIV, sTW, sTW, sTW, sTW, sTW, sTW, sIG
+ },
+ [DCCP_PKT_SYNC] = {
+ /*
+ * We currently ignore Sync packets
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG,
+ },
+ [DCCP_PKT_SYNCACK] = {
+ /*
+ * We currently ignore SyncAck packets
+ *
+ * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
+ sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG,
+ },
+ },
+};
+
+static int dccp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
+ struct nf_conntrack_tuple *tuple)
+{
+ struct dccp_hdr _hdr, *dh;
+
+ dh = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
+ if (dh == NULL)
+ return 0;
+
+ tuple->src.u.dccp.port = dh->dccph_sport;
+ tuple->dst.u.dccp.port = dh->dccph_dport;
+ return 1;
+}
+
+static int dccp_invert_tuple(struct nf_conntrack_tuple *inv,
+ const struct nf_conntrack_tuple *tuple)
+{
+ inv->src.u.dccp.port = tuple->dst.u.dccp.port;
+ inv->dst.u.dccp.port = tuple->src.u.dccp.port;
+ return 1;
+}
+
+static int dccp_new(struct nf_conn *ct, const struct sk_buff *skb,
+ unsigned int dataoff)
+{
+ int pf = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+ struct dccp_hdr _dh, *dh;
+ char *msg;
+ u_int8_t state;
+
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
+ BUG_ON(dh == NULL);
+
+ state = dccp_state_table[IP_CT_DIR_ORIGINAL][dh->dccph_type][CT_DCCP_NONE];
+ switch (state) {
+ default:
+ if (nf_ct_dccp_loose == 0) {
+ msg = "nf_ct_dccp: not picking up existing connection ";
+ goto out_invalid;
+ }
+ case CT_DCCP_REQUEST:
+ break;
+ case CT_DCCP_MAX:
+ msg = "nf_ct_dccp: invalid state transition ";
+ goto out_invalid;
+ }
+
+ ct->proto.dccp.state = CT_DCCP_NONE;
+ return 1;
+
+out_invalid:
+ if (LOG_INVALID(IPPROTO_DCCP))
+ nf_log_packet(pf, 0, skb, NULL, NULL, NULL, msg);
+ return 0;
+}
+
+static u64 dccp_ack_seq(const struct dccp_hdr *dh)
+{
+ const struct dccp_hdr_ack_bits *dhack;
+
+ dhack = (void *)dh + __dccp_basic_hdr_len(dh);
+ return ((u64)ntohs(dhack->dccph_ack_nr_high) << 32) +
+ ntohl(dhack->dccph_ack_nr_low);
+}
+
+static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb,
+ unsigned int dataoff, enum ip_conntrack_info ctinfo,
+ int pf, unsigned int hooknum)
+{
+ struct dccp_hdr _dh, *dh;
+ u_int8_t type, old_state, new_state;
+
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
+ BUG_ON(dh == NULL);
+ type = dh->dccph_type;
+
+ if (type == DCCP_PKT_RESET &&
+ !test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
+ /* Tear down connection immediately if only reply is a RESET */
+ if (del_timer(&ct->timeout))
+ ct->timeout.function((unsigned long)ct);
+ return NF_ACCEPT;
+ }
+
+ write_lock_bh(&dccp_lock);
+
+ old_state = ct->proto.dccp.state;
+ new_state = dccp_state_table[CTINFO2DIR(ctinfo)][type][old_state];
+
+ switch (new_state) {
+ case CT_DCCP_RESPOND:
+ if (old_state == CT_DCCP_REQUEST)
+ ct->proto.dccp.handshake_seq = dccp_hdr_seq(dh);
+ break;
+ case CT_DCCP_PARTOPEN:
+ if (old_state == CT_DCCP_RESPOND &&
+ type == DCCP_PKT_ACK &&
+ dccp_ack_seq(dh) == ct->proto.dccp.handshake_seq)
+ set_bit(IPS_ASSURED_BIT, &ct->status);
+ break;
+ case CT_DCCP_IGNORE:
+ write_unlock_bh(&dccp_lock);
+ if (LOG_INVALID(IPPROTO_DCCP))
+ nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
+ "nf_ct_dccp: invalid packet ignored ");
+ return NF_ACCEPT;
+ case CT_DCCP_MAX:
+ write_unlock_bh(&dccp_lock);
+ if (LOG_INVALID(IPPROTO_DCCP))
+ nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
+ "nf_ct_dccp: invalid state transition ");
+ return -NF_ACCEPT;
+ }
+
+ ct->proto.dccp.state = new_state;
+ write_unlock_bh(&dccp_lock);
+ nf_ct_refresh_acct(ct, ctinfo, skb, dccp_timeout[new_state]);
+
+ return NF_ACCEPT;
+}
+
+static int dccp_error(struct sk_buff *skb, unsigned int dataoff,
+ enum ip_conntrack_info *ctinfo, int pf,
+ unsigned int hooknum)
+{
+ struct dccp_hdr _dh, *dh;
+ unsigned int dccp_len = skb->len - dataoff;
+ unsigned int cscov;
+ const char *msg;
+
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
+ if (dh == NULL) {
+ msg = "nf_ct_dccp: short packet ";
+ goto out_invalid;
+ }
+
+ if (dh->dccph_doff * 4 < sizeof(struct dccp_hdr) ||
+ dh->dccph_doff * 4 > dccp_len) {
+ msg = "nf_ct_dccp: truncated/malformed packet ";
+ goto out_invalid;
+ }
+
+ cscov = dccp_len;
+ if (dh->dccph_cscov) {
+ cscov = (dh->dccph_cscov - 1) * 4;
+ if (cscov > dccp_len) {
+ msg = "nf_ct_dccp: bad checksum coverage ";
+ goto out_invalid;
+ }
+ }
+
+ if (nf_conntrack_checksum && hooknum == NF_INET_PRE_ROUTING &&
+ nf_checksum_partial(skb, hooknum, dataoff, cscov, IPPROTO_DCCP,
+ pf)) {
+ msg = "nf_ct_dccp: bad checksum ";
+ goto out_invalid;
+ }
+
+ if (dh->dccph_type >= DCCP_PKT_INVALID) {
+ msg = "nf_ct_dccp: reserved packet type ";
+ goto out_invalid;
+ }
+
+ return NF_ACCEPT;
+
+out_invalid:
+ if (LOG_INVALID(IPPROTO_DCCP))
+ nf_log_packet(pf, 0, skb, NULL, NULL, NULL, msg);
+ return -NF_ACCEPT;
+}
+
+static int dccp_print_tuple(struct seq_file *s,
+ const struct nf_conntrack_tuple *tuple)
+{
+ return seq_printf(s, "sport=%hu dport=%hu ",
+ ntohs(tuple->src.u.dccp.port),
+ ntohs(tuple->dst.u.dccp.port));
+}
+
+static int dccp_print_conntrack(struct seq_file *s, const struct nf_conn *ct)
+{
+ return seq_printf(s, "%s ", dccp_state_names[ct->proto.dccp.state]);
+}
+
+#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
+static int dccp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
+ const struct nf_conn *ct)
+{
+ struct nlattr *nest_parms;
+
+ read_lock_bh(&dccp_lock);
+ nest_parms = nla_nest_start(skb, CTA_PROTOINFO_DCCP | NLA_F_NESTED);
+ if (!nest_parms)
+ goto nla_put_failure;
+ NLA_PUT_U8(skb, CTA_PROTOINFO_DCCP_STATE, ct->proto.dccp.state);
+ nla_nest_end(skb, nest_parms);
+ read_unlock_bh(&dccp_lock);
+ return 0;
+
+nla_put_failure:
+ read_unlock_bh(&dccp_lock);
+ return -1;
+}
+
+static const struct nla_policy dccp_nla_policy[CTA_PROTOINFO_DCCP_MAX + 1] = {
+ [CTA_PROTOINFO_DCCP_STATE] = { .type = NLA_U8 },
+};
+
+static int nlattr_to_dccp(struct nlattr *cda[], struct nf_conn *ct)
+{
+ struct nlattr *attr = cda[CTA_PROTOINFO_DCCP];
+ struct nlattr *tb[CTA_PROTOINFO_DCCP_MAX + 1];
+ int err;
+
+ if (!attr)
+ return 0;
+
+ err = nla_parse_nested(tb, CTA_PROTOINFO_DCCP_MAX, attr,
+ dccp_nla_policy);
+ if (err < 0)
+ return err;
+
+ if (!tb[CTA_PROTOINFO_DCCP_STATE] ||
+ nla_get_u8(tb[CTA_PROTOINFO_DCCP_STATE]) >= CT_DCCP_MAX)
+ return -EINVAL;
+
+ write_lock_bh(&dccp_lock);
+ ct->proto.dccp.state = nla_get_u8(tb[CTA_PROTOINFO_DCCP_STATE]);
+ write_unlock_bh(&dccp_lock);
+ return 0;
+}
+#endif
+
+#ifdef CONFIG_SYSCTL
+static unsigned int dccp_sysctl_table_users;
+static struct ctl_table_header *dccp_sysctl_header;
+static ctl_table dccp_sysctl_table[] = {
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_request",
+ .data = &dccp_timeout[CT_DCCP_REQUEST],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_respond",
+ .data = &dccp_timeout[CT_DCCP_RESPOND],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_partopen",
+ .data = &dccp_timeout[CT_DCCP_PARTOPEN],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_open",
+ .data = &dccp_timeout[CT_DCCP_OPEN],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_closereq",
+ .data = &dccp_timeout[CT_DCCP_CLOSEREQ],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_closing",
+ .data = &dccp_timeout[CT_DCCP_CLOSING],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_timeout_timewait",
+ .data = &dccp_timeout[CT_DCCP_TIMEWAIT],
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_jiffies,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "nf_conntrack_dccp_loose",
+ .data = &nf_ct_dccp_loose,
+ .maxlen = sizeof(nf_ct_dccp_loose),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .ctl_name = 0,
+ }
+};
+#endif /* CONFIG_SYSCTL */
+
+static struct nf_conntrack_l4proto dccp_proto4 __read_mostly = {
+ .l3proto = AF_INET,
+ .l4proto = IPPROTO_DCCP,
+ .name = "dccp",
+ .pkt_to_tuple = dccp_pkt_to_tuple,
+ .invert_tuple = dccp_invert_tuple,
+ .new = dccp_new,
+ .packet = dccp_packet,
+ .error = dccp_error,
+ .print_tuple = dccp_print_tuple,
+ .print_conntrack = dccp_print_conntrack,
+#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
+ .to_nlattr = dccp_to_nlattr,
+ .from_nlattr = nlattr_to_dccp,
+ .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr,
+ .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple,
+ .nla_policy = nf_ct_port_nla_policy,
+#endif
+#ifdef CONFIG_SYSCTL
+ .ctl_table_users = &dccp_sysctl_table_users,
+ .ctl_table_header = &dccp_sysctl_header,
+ .ctl_table = dccp_sysctl_table,
+#endif
+};
+
+static struct nf_conntrack_l4proto dccp_proto6 __read_mostly = {
+ .l3proto = AF_INET6,
+ .l4proto = IPPROTO_DCCP,
+ .name = "dccp",
+ .pkt_to_tuple = dccp_pkt_to_tuple,
+ .invert_tuple = dccp_invert_tuple,
+ .new = dccp_new,
+ .packet = dccp_packet,
+ .error = dccp_error,
+ .print_tuple = dccp_print_tuple,
+ .print_conntrack = dccp_print_conntrack,
+#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
+ .to_nlattr = dccp_to_nlattr,
+ .from_nlattr = nlattr_to_dccp,
+ .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr,
+ .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple,
+ .nla_policy = nf_ct_port_nla_policy,
+#endif
+#ifdef CONFIG_SYSCTL
+ .ctl_table_users = &dccp_sysctl_table_users,
+ .ctl_table_header = &dccp_sysctl_header,
+ .ctl_table = dccp_sysctl_table,
+#endif
+};
+
+static int __init nf_conntrack_proto_dccp_init(void)
+{
+ int err;
+
+ err = nf_conntrack_l4proto_register(&dccp_proto4);
+ if (err < 0)
+ goto err1;
+
+ err = nf_conntrack_l4proto_register(&dccp_proto6);
+ if (err < 0)
+ goto err2;
+ return 0;
+
+err2:
+ nf_conntrack_l4proto_unregister(&dccp_proto4);
+err1:
+ return err;
+}
+
+static void __exit nf_conntrack_proto_dccp_fini(void)
+{
+ nf_conntrack_l4proto_unregister(&dccp_proto6);
+ nf_conntrack_l4proto_unregister(&dccp_proto4);
+}
+
+module_init(nf_conntrack_proto_dccp_init);
+module_exit(nf_conntrack_proto_dccp_fini);
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("DCCP connection tracking protocol helper");
+MODULE_LICENSE("GPL");
[-- Attachment #3: dccp-nat.diff --]
[-- Type: text/x-diff, Size: 5413 bytes --]
commit d014df2fba76bae3affb8a9c5a91de704306d4a1
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Mar 20 15:15:57 2008 +0100
[NETFILTER]: nf_nat: add DCCP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index c5bd284..fde3eac 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -241,6 +241,11 @@ config NF_NAT_SNMP_BASIC
# <expr> '&&' <expr> (6)
#
# (6) Returns the result of min(/expr/, /expr/).
+config NF_NAT_PROTO_DCCP
+ tristate
+ depends on NF_NAT && NF_CT_PROTO_DCCP
+ default NF_NAT && NF_CT_PROTO_DCCP
+
config NF_NAT_PROTO_GRE
tristate
depends on NF_NAT && NF_CT_PROTO_GRE
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 332f46f..74d8dbd 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o
obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
# NAT protocols (nf_nat)
+obj-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o
obj-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
diff --git a/net/ipv4/netfilter/nf_nat_proto_dccp.c b/net/ipv4/netfilter/nf_nat_proto_dccp.c
new file mode 100644
index 0000000..caf4b19
--- /dev/null
+++ b/net/ipv4/netfilter/nf_nat_proto_dccp.c
@@ -0,0 +1,108 @@
+/*
+ * DCCP NAT protocol helper
+ *
+ * Copyright (c) 2005, 2006, 2008 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/dccp.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_nat.h>
+#include <net/netfilter/nf_nat_protocol.h>
+
+static u_int16_t dccp_port_rover;
+
+static int
+dccp_unique_tuple(struct nf_conntrack_tuple *tuple,
+ const struct nf_nat_range *range,
+ enum nf_nat_manip_type maniptype,
+ const struct nf_conn *ct)
+{
+ return nf_nat_proto_unique_tuple(tuple, range, maniptype, ct,
+ &dccp_port_rover);
+}
+
+static int
+dccp_manip_pkt(struct sk_buff *skb,
+ unsigned int iphdroff,
+ const struct nf_conntrack_tuple *tuple,
+ enum nf_nat_manip_type maniptype)
+{
+ struct iphdr *iph = (struct iphdr *)(skb->data + iphdroff);
+ struct dccp_hdr *hdr;
+ unsigned int hdroff = iphdroff + iph->ihl * 4;
+ __be32 oldip, newip;
+ __be16 *portptr, oldport, newport;
+ int hdrsize = 8; /* DCCP connection tracking guarantees this much */
+
+ if (skb->len >= hdroff + sizeof(struct dccp_hdr))
+ hdrsize = sizeof(struct dccp_hdr);
+
+ if (!skb_make_writable(skb, hdroff + hdrsize))
+ return 0;
+
+ iph = (struct iphdr *)(skb->data + iphdroff);
+ hdr = (struct dccp_hdr *)(skb->data + hdroff);
+
+ if (maniptype == IP_NAT_MANIP_SRC) {
+ oldip = iph->saddr;
+ newip = tuple->src.u3.ip;
+ newport = tuple->src.u.dccp.port;
+ portptr = &hdr->dccph_sport;
+ } else {
+ oldip = iph->daddr;
+ newip = tuple->dst.u3.ip;
+ newport = tuple->dst.u.dccp.port;
+ portptr = &hdr->dccph_dport;
+ }
+
+ oldport = *portptr;
+ *portptr = newport;
+
+ if (hdrsize < sizeof(*hdr))
+ return 1;
+
+ inet_proto_csum_replace4(&hdr->dccph_checksum, skb, oldip, newip, 1);
+ inet_proto_csum_replace2(&hdr->dccph_checksum, skb, oldport, newport,
+ 0);
+ return 1;
+}
+
+static const struct nf_nat_protocol nf_nat_protocol_dccp = {
+ .protonum = IPPROTO_DCCP,
+ .me = THIS_MODULE,
+ .manip_pkt = dccp_manip_pkt,
+ .in_range = nf_nat_proto_in_range,
+ .unique_tuple = dccp_unique_tuple,
+#if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
+ .range_to_nlattr = nf_nat_port_range_to_nlattr,
+ .nlattr_to_range = nf_nat_port_nlattr_to_range,
+#endif
+};
+
+static int __init nf_nat_proto_dccp_init(void)
+{
+ return nf_nat_protocol_register(&nf_nat_protocol_dccp);
+}
+
+static void __exit nf_nat_proto_dccp_fini(void)
+{
+ nf_nat_protocol_unregister(&nf_nat_protocol_dccp);
+}
+
+module_init(nf_nat_proto_dccp_init);
+module_exit(nf_nat_proto_dccp_fini);
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("DCCP NAT protocol helper");
+MODULE_LICENSE("GPL");
diff --git a/net/ipv4/netfilter/nf_nat_standalone.c b/net/ipv4/netfilter/nf_nat_standalone.c
index dc316b9..b759ffa 100644
--- a/net/ipv4/netfilter/nf_nat_standalone.c
+++ b/net/ipv4/netfilter/nf_nat_standalone.c
@@ -51,7 +51,8 @@ static void nat_decode_session(struct sk_buff *skb, struct flowi *fl)
fl->fl4_dst = t->dst.u3.ip;
if (t->dst.protonum == IPPROTO_TCP ||
t->dst.protonum == IPPROTO_UDP ||
- t->dst.protonum == IPPROTO_UDPLITE)
+ t->dst.protonum == IPPROTO_UDPLITE ||
+ t->dst.protonum == IPPROTO_DCCP)
fl->fl_ip_dport = t->dst.u.tcp.port;
}
@@ -61,7 +62,8 @@ static void nat_decode_session(struct sk_buff *skb, struct flowi *fl)
fl->fl4_src = t->src.u3.ip;
if (t->dst.protonum == IPPROTO_TCP ||
t->dst.protonum == IPPROTO_UDP ||
- t->dst.protonum == IPPROTO_UDPLITE)
+ t->dst.protonum == IPPROTO_UDPLITE ||
+ t->dst.protonum == IPPROTO_DCCP)
fl->fl_ip_sport = t->src.u.tcp.port;
}
}
* Re: DCCP conntrack/NAT
From: Jan Engelhardt @ 2008-04-04 19:59 UTC (permalink / raw)
To: dccp
On Friday 2008-04-04 17:41, Patrick McHardy wrote:
> These two patches contain my old conntrack/NAT helper for DCCP,
> updated to net-2.6.26.git and the missing parts (almost entirely)
> added.
>
> They both depend on some other netfilter patches, I've attached
> them only hoping for some review :) A git tree which contains
> the full set of patches is (once upload finishes) located at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26.git
Speaking of git... I noticed people.netfilter.org has a git-daemon,
so that would be fine for iptables, no?
> A few words on the patches:
(Where's the SCTP patch for review? :)
>+static int dccp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
>+ struct nf_conntrack_tuple *tuple)
>+{
>+ struct dccp_hdr _hdr, *dh;
>+
>+ dh = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
>+ if (dh == NULL)
>+ return 0;
>+
>+ tuple->src.u.dccp.port = dh->dccph_sport;
>+ tuple->dst.u.dccp.port = dh->dccph_dport;
>+ return 1;
>+}
Something related I have been wondering about ...
(actually nf_conntrack_l3proto_ipv4)
skb_header_pointer() is used for the case of a non-linear skb (has to
do with IP fragments?).
In ipv4_pkt_to_tuple in nf_conntrack_l3proto_ipv4.c,
skb_header_pointer() is used to get the [source address of the] IP
header. Since I figured the layer-3 header must always be
unfragmented, would not it be simpler to use ip_hdr(), or is there
something that mandates use of skb_header_pointer?
* Re: DCCP conntrack/NAT
From: Pablo Neira Ayuso @ 2008-04-05 10:09 UTC (permalink / raw)
To: dccp
Jan Engelhardt wrote:
>
> On Friday 2008-04-04 17:41, Patrick McHardy wrote:
>> These two patches contain my old conntrack/NAT helper for DCCP,
>> updated to net-2.6.26.git and the missing parts (almost entirely)
>> added.
>>
>> They both depend on some other netfilter patches, I've attached
>> them only hoping for some review :) A git tree which contains
>> the full set of patches is (once upload finishes) located at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26.git
>
> Speaking of git... I noticed people.netfilter.org has a git-daemon,
> so that would be fine for iptables, no?
>
>> A few words on the patches:
>
> (Where's the SCTP patch for review? :)
>
>
>> +static int dccp_pkt_to_tuple(const struct sk_buff *skb, unsigned int
>> dataoff,
>> + struct nf_conntrack_tuple *tuple)
>> +{
>> + struct dccp_hdr _hdr, *dh;
>> +
>> + dh = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
>> + if (dh == NULL)
>> + return 0;
>> +
>> + tuple->src.u.dccp.port = dh->dccph_sport;
>> + tuple->dst.u.dccp.port = dh->dccph_dport;
>> + return 1;
>> +}
>
> Something related I have been wondering about ...
> (actually nf_conntrack_l3proto_ipv4)
Well, this is not really related to this patch. I think it belongs in a
separate thread, since it has nothing to do with DCCP.
Anyway...
> skb_header_pointer() is used for the case of a non-linear skb (has to
> do with IP fragments?).
Indeed.
> In ipv4_pkt_to_tuple in nf_conntrack_l3proto_ipv4.c,
> skb_header_pointer() is used to get the [source address of the] IP
> header. Since I figured the layer-3 header must always be
> unfragmented, would not it be simpler to use ip_hdr(), or is there
> something that mandates use of skb_header_pointer?
Right. I think that we can assume that the IP header is always linear (I
remember this from a conversation with Davem or Rusty), Patrick?
--
"Los honestos son inadaptados sociales" -- Les Luthiers
* Re: DCCP conntrack/NAT
From: Pablo Neira Ayuso @ 2008-04-05 10:12 UTC (permalink / raw)
To: dccp
Jan Engelhardt wrote:
>
> On Friday 2008-04-04 17:41, Patrick McHardy wrote:
>> These two patches contain my old conntrack/NAT helper for DCCP,
>> updated to net-2.6.26.git and the missing parts (almost entirely)
>> added.
>>
>> They both depend on some other netfilter patches, I've attached
>> them only hoping for some review :) A git tree which contains
>> the full set of patches is (once upload finishes) located at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26.git
>
> Speaking of git... I noticed people.netfilter.org has a git-daemon,
> so that would be fine for iptables, no?
No, that's for a different purpose. We are setting up another git
repository on Vishnu that will soon host iptables.
* Re: DCCP conntrack/NAT
From: Patrick McHardy @ 2008-04-06 0:28 UTC (permalink / raw)
To: dccp
Pablo Neira Ayuso wrote:
> Jan Engelhardt wrote:
>
>>> +static int dccp_pkt_to_tuple(const struct sk_buff *skb, unsigned int
>>> dataoff,
>>> + struct nf_conntrack_tuple *tuple)
>>> +{
>>> + struct dccp_hdr _hdr, *dh;
>>> +
>>> + dh = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
>>> + if (dh == NULL)
>>> + return 0;
>>> +
>> skb_header_pointer() is used for the case of a non-linear skb (has to
>> do with IP fragments?).
>>
>
> Indeed.
>
>
>> In ipv4_pkt_to_tuple in nf_conntrack_l3proto_ipv4.c,
>> skb_header_pointer() is used to get the [source address of the] IP
>> header. Since I figured the layer-3 header must always be
>> unfragmented, would not it be simpler to use ip_hdr(), or is there
>> something that mandates use of skb_header_pointer?
>>
>
> Right. I think that we can assume that the IP header is always linear (I
> remember this from a conversation with Davem or Rusty), Patrick?
>
Right, the IP header is linearized by ip_rcv() (or bridge netfilter).
The pkt_to_tuple functions may be called for the inner packet of an
ICMP message however, which might be in the non-linear area.
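To illustrate why a copying accessor is still needed for the inner packet, here is a minimal userspace sketch of the idea behind skb_header_pointer(). It is not the kernel implementation: struct toy_buf, toy_copy_bits() and toy_header_pointer() are hypothetical names, and the "skb" is reduced to a linear head plus a single fragment. When the requested bytes lie entirely in the linear part, a direct pointer is returned; otherwise they are copied into the caller's scratch buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb: a linear head plus one extra fragment. */
struct toy_buf {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Copy len bytes starting at off into dst, crossing the head/frag boundary. */
static int toy_copy_bits(const struct toy_buf *b, size_t off,
			 void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t total = b->head_len + b->frag_len;
	size_t i;

	if (off > total || len > total - off)
		return -1;	/* request runs past the end of the packet */

	for (i = 0; i < len; i++) {
		size_t pos = off + i;

		out[i] = pos < b->head_len ? b->head[pos]
					   : b->frag[pos - b->head_len];
	}
	return 0;
}

/*
 * Return a pointer to len bytes at off: directly into the linear head when
 * possible, otherwise a copy in scratch. NULL if the packet is too short.
 */
static const void *toy_header_pointer(const struct toy_buf *b, size_t off,
				      size_t len, void *scratch)
{
	if (off <= b->head_len && len <= b->head_len - off)
		return b->head + off;
	if (toy_copy_bits(b, off, scratch, len) < 0)
		return NULL;
	return scratch;
}
```

A caller uses it the same way the pkt_to_tuple functions use the real helper: pass a local scratch struct, then check the result against NULL before touching any header field.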
* Re: DCCP conntrack/NAT
From: Gerrit Renker @ 2008-04-07 21:50 UTC (permalink / raw)
To: dccp
I have had a look at the connection-tracker and it proved a valuable exercise in re-reading the specs.
Some comments itemised below.
* nf_conntrack_proto_dccp.c:
- 8.1.2. Client Request
o reference is 8.1.1, not 8.1.2
o the Linux implementation uses same method as TCP handshake
(and is currently the only available DCCP implementation),
so can probably reuse TCP timeout values here -- the rules
in RFC 4340 are not exactly clear.
- typo: state transistion table
OPEN: RPcket
* just curious about timeout for OPEN state: it is set to
a full working week (5 * 24 * 3600 seconds). There is this
wrap-around of DCCP timestamps after 11.2 hours, so maybe
the Open state can be curtailed.
* Ignoring Sync/SyncAck packets: if this means they can get
through, then it is good, since for instance CCID-2 may use
a Sync for out-of-band information; RFC 4340 mentions Syncs
for similar purposes (e.g. feature negotiation); and Syncs
can appear in both directions.
* for the timeout sysctls, use proc_dointvec_ms_jiffies ?
* the state table has dimension DCCP_PKT_SYNCACK + 1, but what if
a value greater than that appears in the dccp header in the statement
state = dccp_state_table[IP_CT_DIR_ORIGINAL][dh->dccph_type][CT_DCCP_NONE];
* dccp_ack_seq() duplicates code from include/linux/dccp.h:
could use dccp_hdr_ack_seq() instead.
State Transitions in the original direction
===========================================
* DCCP-Request:
- in state Respond (sRS -> sRS), the Request is illegal (Respond is server state)
- also, the CLOSEREQ state transition (sCR -> sIG) is illegal: Requests are sent
by clients only, and CLOSEREQ can only be entered by servers
- timewait transition -- question: is it possible to re-incarnate a new connection
here instead of ignoring the (new) Request?
* DCCP-DataAck:
- the transition sRS should go to sPO (Partopen). This is because the client
can send data when it has received the Response from the server, i.e. it
is the same rule as for DCCP-Ack in state sRS (cf. RFC 4340, 8.1.5). The "Ack"
in the DataAck has the same effect as an Ack in sRS, it acknowledges the
Response and thus triggers transition to Partopen.
State Transitions in the reply direction
========================================
* DCCP-CloseReq:
- the transition from sCG is a simultaneous-close, which is possible
(both sides performing active-close, server sends CloseReq after client has
sent a Close) and has been seen on the wire, cf.
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/closing_states/ )
- use "ignore" here?
* DCCP-Close:
- the transition sCR -> sCG: I wonder if that is possible --
o if the client is behind the NAT, it means the server sent a Close after
a CloseReq, which is invalid
o but if (hopefully soon) a server is behind a NAT, this would mean that
the server had previously sent a CloseReq which now crosses paths with
a Close from the client in the reverse direction -- again a simultaneous
close, in this case sCR -> sCR would be possible
o simplest option - maybe better to use sCR -> sIG or drop packet.
The University of Aberdeen is a charity registered in Scotland, No SC013683.
* Re: DCCP conntrack/NAT
From: Patrick McHardy @ 2008-04-07 22:45 UTC (permalink / raw)
To: dccp
Gerrit Renker wrote:
> I have had a look at the connection-tracker and it proved a valuable exercise in re-reading the specs.
>
> Some comments itemised below.
Thanks a lot for your review :)
> * nf_conntrack_proto_dccp.c:
> - 8.1.2. Client Request
> o reference is 8.1.1, not 8.1.2
Fixed, thanks.
> o the Linux implementation uses same method as TCP handshake
> (and is currently the only available DCCP implementation),
> so can probably reuse TCP timeout values here -- the rules
> in RFC 4340 are not exactly clear.
We try not to be implementation specific (people might still be
using this once more implementations have appeared), so keeping
the larger timeout seems safer. It doesn't really hurt since
the connection will get evicted under pressure as long as it
hasn't moved to PARTOPEN with a correct ACK sequence number.
Something more specific in the RFC would be nice though.
> - typo: state transistion table
> OPEN: RPcket
Thanks, fixed already in my tree.
> * just curious about timeout for OPEN state: it is set to
> a full working week (5 * 24 * 3600 seconds). There is this
> wrap-around of DCCP timestamps after 11.2 hours, so maybe
> the Open state can be curtailed.
I just copied this part from the TCP helper because I didn't
find a better value. Does the wraparound affect the maximum
lifetime of a connection? Maybe it would make sense to decrease
it in any case, I would expect applications using DCCP not
to idle as long as TCP connections might.
> * Ignoring Sync/SyncAck packets: if this means they can get
> through, then it is good, since for instance CCID-2 may use
> a Sync for out-of-band information; RFC 4340 mentions Syncs
> for similar purposes (e.g. feature negotiation); and Syncs
> can appear in both directions.
Yes, ignored packets just never cause state transitions.
> * for the timeout sysctls, use proc_dointvec_ms_jiffies ?
I chose seconds because it's consistent with what other conntrack
protocols use and less likely to confuse users.
> * the state table has dimension DCCP_PKT_SYNCACK + 1, but what if
> a value greater than that appears in the dccp header in the statement
> state = dccp_state_table[IP_CT_DIR_ORIGINAL][dh->dccph_type][CT_DCCP_NONE];
All packets go through dccp_error() first, which catches invalid
packet types.
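For illustration only, the kind of guard this relies on can be sketched as below. This is not the dccp_error() code from the patch: toy_table, toy_lookup() and the constants are hypothetical, and the table is a zero-filled placeholder. The point is that the 4-bit type field can carry the reserved values 10-15 (RFC 4340, 5.1), so the type has to be range-checked before it is used as an index.

```c
#include <stddef.h>

/* Highest defined DCCP packet type (SyncAck); 10-15 are reserved. */
#define TOY_PKT_SYNCACK	9u
/* Hypothetical number of tracked states (sNO .. sTW). */
#define TOY_NUM_STATES	8u

/* Zero-filled placeholder standing in for the real state table. */
static const unsigned char toy_table[TOY_PKT_SYNCACK + 1][TOY_NUM_STATES] = {
	{ 0 }
};

/* Return the table entry, or -1 if type or state would index out of bounds. */
static int toy_lookup(unsigned int type, unsigned int state)
{
	if (type > TOY_PKT_SYNCACK || state >= TOY_NUM_STATES)
		return -1;
	return toy_table[type][state];
}
```

Doing the check once, before any state lookup, keeps the lookup itself free of per-call validation.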
> * dccp_ack_seq() duplicates code from include/linux/dccp.h:
> could use dccp_hdr_ack_seq() instead.
Yes, this is something I still wanted to look into. The reason
why it's duplicated is that the DCCP protocol functions assume
skb->transport_header to point to the DCCP header. This is
only true for packets in the protocol layer.
> State Transitions in the original direction
> ===========================================
>
> * DCCP-Request:
> - in state Respond (sRS -> sRS), the Request is illegal (Respond is server state)
Yes, this is one of the differences that comes from sitting in
the middle :) In the reply direction we transition from sRQ to
sRS when receiving a Response. However, that response might not
make it to the client or simply be late, in which case the request
is retransmitted.
Generally, for all states that exist only on one side (and we
transition to it even if the packet might never reach the other
side), we must accept all packets in the other direction as in
the previous state.
> - also, the CLOSEREQ state transition (sCR -> sIG) is illegal: Requests are sent
> by clients only, and CLOSEREQ can only be entered by servers
We track both sides, so we must also define which client packets
are valid in which server state. This particular one is part of
the unfinished resync feature. The firewall might be out of sync
with both endpoints. If connection pickup is enabled it should
let packets that might establish a new connection pass and resync
when the other side responds with a valid Response. I need to
think about this a bit more, but I've marked it with FIXME for
now :)
> - timewait transition -- question: is it possible to re-incarnate a new connection
> here instead of ignoring the (new) Request?
Yes, I'll change this. The trickier case is a reincarnation in the
reverse direction. The conntrack entry must be killed and recreated
since the state table is directional and the client/server roles
change. Also needs a bit more thought.
> * DCCP-DataAck:
> - the transition sRS should go to sPO (Partopen). This is because the client
> can send data when it has received the Response from the server, i.e. it
> is the same rule as for DCCP-Ack in state sRS (cf. RFC 4340, 8.1.5). The "Ack"
> in the DataAck has the same effect as an Ack in sRS, it acknowledges the
> Response and thus triggers transition to Partopen.
Thanks, fixed.
> State Transitions in the reply direction
> ========================================
>
> * DCCP-CloseReq:
> - the transition from sCG is a simultaneous-close, which is possible
> (both sides performing active-close, server sends CloseReq after client has
> sent a Close) and has been seen on the wire, cf.
> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/closing_states/ )
> - use "ignore" here?
In case the client needs to respond with another Close it should
probably move to sCR. Otherwise I'd change it to stay (explicitly)
in sCG. Ignore is mainly for resyncing.
> * DCCP-Close:
> - the transition sCR -> sCG: I wonder if that is possible --
> o if the client is behind the NAT, it means the server sent a Close after
> a CloseReq, which is invalid
> o but if (hopefully soon) a server is behind a NAT, this would mean that
> the server had previously sent a CloseReq which now crosses paths with
> a Close from the client in the reverse direction -- again a simultaneous
> close, in this case sCR -> sCR would be possible
NAT shouldn't make any difference for the states. The table is
directional, so the Close transition here is always a Close
after a CloseReq from the server, so it's also invalid.
> o simplest option - maybe better to use sCR -> sIG or drop packet.
IIRC I used sCG because I couldn't figure out whether it's
valid :) I'll change it to INVALID.
Thanks again for your review.
* Re: DCCP conntrack/NAT
From: Gerrit Renker @ 2008-04-08 9:27 UTC (permalink / raw)
To: dccp
I have not a big understanding of netfilter and so got several things
wrong in the last posting. Thank you for patience in clarifying these.
>> * just curious about timeout for OPEN state: it is set to a full
>> working week (5 * 24 * 3600 seconds). There is this
>> wrap-around of DCCP timestamps after 11.2 hours, so maybe
>> the Open state can be curtailed.
>
> I just copied this part from the TCP helper because I didn't
> find a better value. Does the wraparound affect the maximum
> lifetime of a connection? Maybe it would make sense to decrease
> it in any case, I would expect applications using DCCP not
> to idle as long as TCP connections might.
>
DCCP uses a clock with a resolution of 0.00001 seconds (RFC 4340, 13.1).
It thus wraps around much faster than the TCP suggestion of using 1ms
timestamps (RFC 1323, 4.2.2(b)) that wrap around every 24.8 days.
It seems like this: timestamp is a 4-byte number, the 2^32 numbers need
to be split into two halves ("before", "after"), each number stands for
10 microseconds, so the maximum timespan without wrap-around is about
5.96 hours. When the timespan is longer than that, "after" can become
"before", i.e. there will be a glitch in RTT estimation and other parts
that rely on timestamps. The full wrap-around, where the clock reaches
the same value again, is after 11.9 hours.
However, the question is already resolved by the module's sysctl for
the Open state.
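The arithmetic above can be checked with a small sketch. The helper names are hypothetical and the code assumes only what is stated here: a 32-bit timestamp counting 10-microsecond ticks, compared with the usual signed-difference trick.

```c
#include <stdint.h>

/* One timestamp tick is 10 microseconds (RFC 4340, 13.1). */
#define TOY_TICK_USEC	10ULL

/* Hours until a counter with the given number of bits reaches 2^bits ticks. */
static double toy_wrap_hours(unsigned int bits)
{
	uint64_t ticks = 1ULL << bits;

	return (double)(ticks * TOY_TICK_USEC) / 1e6 / 3600.0;
}

/*
 * Serial-number comparison: a is "before" b if the signed 32-bit difference
 * is negative. The unsigned-to-signed conversion is implementation-defined
 * in C, but behaves as two's complement on common platforms.
 */
static int toy_ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

toy_wrap_hours(31) gives the half-range of roughly 5.96 hours and toy_wrap_hours(32) the full wrap of roughly 11.9 hours. toy_ts_before(0, 100) is true, while toy_ts_before(0, 0x80000001u) is false: once two timestamps are more than 2^31 ticks apart, "before" and "after" swap.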
>> State Transitions in the original direction
>> ===========================================
>>
>> * DCCP-Request:
>> - in state Respond (sRS -> sRS), the Request is illegal (Respond is server state)
>
> Yes, this is one of the differences that comes from sitting in
> the middle :) In the reply direction we transition from sRQ to
> sRS when receiving a Response. However, that response might not
> make it to the client or simply be late, in which case the request
> is retransmitted.
>
Yes that was my error and the transition is clearly correct.
I have a question regarding the original direction - currently it is
linked to the client which actively initiates a connection. DCCP
suffers from the problem that peer-to-peer NAT traversal is not
really possible just because of this client/server division. There
is a proposal which effects a pseudo simultaneous open, by letting
the server send an initiation packet, to fix this problem (TCP
peer-to-peer NAT traversal also favours simultaneous-open). I wonder
if this would be possible, but it is really a future-work question.
>> State Transitions in the reply direction
>> ========================================
>>
>> * DCCP-CloseReq:
>> - the transition from sCG is a simultaneous-close, which is possible
>> (both sides performing active-close, server sends CloseReq after client has
>> sent a Close) and has been seen on the wire, cf.
>> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/closing_states/ )
>> - use "ignore" here?
>
> In case the client needs to respond with another Close it should
> probably move to sCR. Otherwise I'd change it to stay (explicitly)
> in sCG. Ignore is mainly for resyncing.
>
Staying in sCG makes sense, in particular since RFC4340, 8.3 asks that a
Close must be sent in reply to each CloseReq (even when in state Closing).
So the client would retransmit its Close, which again would leave it in
sCG. When the server gets the second Close, it may already have received
the first one, thus it will respond with a Reset, Code 3 ("No Connection"),
which would then resolve the simultaneous-close into sTW.
Gerrit
* Re: DCCP conntrack/NAT
From: Patrick McHardy @ 2008-04-08 10:30 UTC (permalink / raw)
To: dccp
[-- Attachment #1: Type: text/plain, Size: 4080 bytes --]
Gerrit Renker wrote:
> I have not a big understanding of netfilter and so got several things
> wrong in the last posting. Thank you for patience in clarifying these.
>
>>> * just curious about timeout for OPEN state: it is set to a full
>>> working week (5 * 24 * 3600 seconds). There is this
>>> wrap-around of DCCP timestamps after 11.2 hours, so maybe
>>> the Open state can be curtailed.
>> I just copied this part from the TCP helper because I didn't
>> find a better value. Does the wraparound affect the maximum
>> lifetime of a connection? Maybe it would make sense to decrease
>> it in any case, I would expect applications using DCCP not
>> to idle as long as TCP connections might.
>>
> DCCP uses a clock with a resolution of 0.00001 seconds (RFC 4340, 13.1).
> It thus wraps around much faster than the TCP suggestion of using 1ms
> timestamps (RFC 1323, 4.2.2(b)) that wrap around every 24.8 days.
>
> It seems like this: timestamp is a 4-byte number, the 2^32 numbers need
> to be split into two halves ("before", "after"), each number stands for
> 10 microseconds, so the maximum timespan without wrap-around is about
> 5.96 hours. When the timespan is longer than that, "after" can become
> "before", i.e. there will be a glitch in RTT estimation and other parts
> that rely on timestamps. The full wrap-around, where the clock reaches
> the same value again, is after 11.9 hours.
>
> However, the question is already resolved by the module's sysctl for
> the Open state.
I've changed the default timeout for OPEN to 12 hours.
>>> State Transitions in the original direction
>>> ===========================================
>>>
>>> * DCCP-Request:
>>> - in state Respond (sRS -> sRS), the Request is illegal (Respond is server state)
>> Yes, this is one of the differences that comes from sitting in
>> the middle :) In the reply direction we transition from sRQ to
>> sRS when receiving a Response. However, that response might not
>> make it to the client or simply be late, in which case the request
>> is retransmitted.
>>
> Yes that was my error and the transition is clearly correct.
>
> I have a question regarding the original direction - currently it is
> linked to the client which actively initiates a connection. DCCP
> suffers from the problem that peer-to-peer NAT traversal is not
> really possible just because of this client/server division. There
> is a proposal which effects a pseudo simultaneous open, by letting
> the server send an initiation packet, to fix this problem (TCP
> peer-to-peer NAT traversal also favours simultaneous-open). I wonder
> if this would be possible, but it is really a future-work question.
Yes, that should be possible. But how does the server know that
the client intends to initiate a connection?
>>> State Transitions in the reply direction
>>> ========================================
>>>
>>> * DCCP-CloseReq:
>>> - the transition from sCG is a simultaneous-close, which is possible
>>> (both sides performing active-close, server sends CloseReq after client has
>>> sent a Close) and has been seen on the wire, cf.
>>> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/closing_states/ )
>>> - use "ignore" here?
>> In case the client needs to respond with another Close it should
>> probably move to sCR. Otherwise I'd change it to stay (explicitly)
>> in sCG. Ignore is mainly for resyncing.
>>
> Staying in sCG makes sense, in particular since RFC4340, 8.3 asks that a
> Close must be sent in reply to each CloseReq (even when in state Closing).
> So the client would retransmit its Close, which again would leave it in
> sCG. When the server gets the second Close, it may already have received
> the first one, thus it will respond with a Reset, Code 3 ("No Connection"),
> which would then resolve the simultaneous-close into sTW.
In that case sCR makes most sense since in that state we're
expecting a Close from the client.
The attached patch contains the changes I've made so far
based on your review. I'll go through the remaining points
now.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 2773 bytes --]
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 44c8aa6..8509278 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -70,7 +70,7 @@ static unsigned int dccp_timeout[CT_DCCP_MAX + 1] __read_mostly = {
[CT_DCCP_REQUEST] = 2 * DCCP_MSL,
[CT_DCCP_RESPOND] = 4 * DCCP_MSL,
[CT_DCCP_PARTOPEN] = 4 * DCCP_MSL,
- [CT_DCCP_OPEN] = 5 * 86400 * HZ,
+ [CT_DCCP_OPEN] = 12 * 3600 * HZ,
[CT_DCCP_CLOSEREQ] = 64 * HZ,
[CT_DCCP_CLOSING] = 64 * HZ,
[CT_DCCP_TIMEWAIT] = 2 * DCCP_MSL,
@@ -142,12 +142,12 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* got lost after we saw it) or reincarnation
* sPO -> sIG Request during PARTOPEN state, server will ignore it
* sOP -> sIG Request during OPEN state: server will ignore it
- * sCR -> sIG MUST respond with Close to CloseReq (8.3.)
+ * sCR -> sIG FIXME MUST respond with Close to CloseReq (8.3.)
* sCG -> sIG
- * sTW -> sIG Time-wait
+ * sTW -> sRQ Reincarnation
*
* sNO, sRQ, sRS, sPO. sOP, sCR, sCG, sTW, */
- sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sIG,
+ sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sRQ,
},
[DCCP_PKT_RESPONSE] = {
/*
@@ -188,7 +188,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
/*
* sNO -> sIV No connection
* sRQ -> sIV No connection
- * sRS -> sIV No connection
+ * sRS -> sPO Ack for Response, move to PARTOPEN (8.1.5.)
* sPO -> sPO Remain in PARTOPEN state
* sOP -> sOP Regular DataAck packet in OPEN state
* sCR -> sCR DataAck in CLOSEREQ MAY be processed (8.3.)
@@ -196,7 +196,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sTW -> sIV
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sPO, sOP, sCR, sCG, sIV
+ sIV, sIV, sPO, sPO, sOP, sCR, sCG, sIV
},
[DCCP_PKT_CLOSEREQ] = {
/*
@@ -320,7 +320,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sPO -> sOP -> sCR Move directly to CLOSEREQ (8.1.5.)
* sOP -> sCR CloseReq in OPEN state
* sCR -> sCR Retransmit
- * sCG -> sIV Already closing
+ * sCG -> sCR Simultaneous close, client sends another Close
* sTW -> sIV Already closed
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
@@ -333,7 +333,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sRS -> sIV No connection
* sPO -> sOP -> sCG Move direcly to CLOSING
* sOP -> sCG Move to CLOSING
- * sCR -> sCG Waiting for close from client
+ * sCR -> sIV Close after CloseReq is invalid
* sCG -> sCG Retransmit
* sTW -> sIV Already closed
*
* Re: DCCP conntrack/NAT
From: Patrick McHardy @ 2008-04-08 10:33 UTC (permalink / raw)
To: dccp
[-- Attachment #1: Type: text/plain, Size: 255 bytes --]
Patrick McHardy wrote:
> The attached patch contains the changes I've made so far
> based on your review. I'll go through the remaining points
> now.
>
New version attached that doesn't only update the comments
but also the actual state transitions :)
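As a quick way to read the two updated closing rows, here is a standalone sketch of a lookup over just those rows. It assumes the CloseReq and Close hunks belong to the reply-direction half of the table, as the review discussion suggests. The enum values, toy_reply_rows and toy_reply_next() are hypothetical names for illustration; only the row contents are taken from the patch.

```c
/* Tracked states in table order, followed by the two pseudo-states. */
enum toy_state { sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW, sIV, sIG };

/* Only the two packet types whose rows changed are modelled here. */
enum toy_pkt { TOY_CLOSEREQ, TOY_CLOSE, TOY_PKT_MAX };

static const enum toy_state toy_reply_rows[TOY_PKT_MAX][sTW + 1] = {
	/*                sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
	[TOY_CLOSEREQ] = { sIV, sIV, sIV, sCR, sCR, sCR, sCR, sIV },
	[TOY_CLOSE]    = { sIV, sIV, sIV, sCG, sCG, sIV, sCG, sIV },
};

/* Next state for a reply-direction packet; out-of-range input is invalid. */
static enum toy_state toy_reply_next(enum toy_pkt pkt, enum toy_state cur)
{
	if ((unsigned int)pkt >= TOY_PKT_MAX || (unsigned int)cur > sTW)
		return sIV;
	return toy_reply_rows[pkt][cur];
}
```

With these rows, a CloseReq seen in sCG moves the entry to sCR (simultaneous close), and a Close seen in sCR is rejected as invalid.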
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 3069 bytes --]
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 44c8aa6..e17bd4f 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -70,7 +70,7 @@ static unsigned int dccp_timeout[CT_DCCP_MAX + 1] __read_mostly = {
[CT_DCCP_REQUEST] = 2 * DCCP_MSL,
[CT_DCCP_RESPOND] = 4 * DCCP_MSL,
[CT_DCCP_PARTOPEN] = 4 * DCCP_MSL,
- [CT_DCCP_OPEN] = 5 * 86400 * HZ,
+ [CT_DCCP_OPEN] = 12 * 3600 * HZ,
[CT_DCCP_CLOSEREQ] = 64 * HZ,
[CT_DCCP_CLOSING] = 64 * HZ,
[CT_DCCP_TIMEWAIT] = 2 * DCCP_MSL,
@@ -142,12 +142,12 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* got lost after we saw it) or reincarnation
* sPO -> sIG Request during PARTOPEN state, server will ignore it
* sOP -> sIG Request during OPEN state: server will ignore it
- * sCR -> sIG MUST respond with Close to CloseReq (8.3.)
+ * sCR -> sIG FIXME MUST respond with Close to CloseReq (8.3.)
* sCG -> sIG
- * sTW -> sIG Time-wait
+ * sTW -> sRQ Reincarnation
*
* sNO, sRQ, sRS, sPO. sOP, sCR, sCG, sTW, */
- sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sIG,
+ sRQ, sRQ, sRS, sIG, sIG, sIG, sIG, sRQ,
},
[DCCP_PKT_RESPONSE] = {
/*
@@ -188,7 +188,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
/*
* sNO -> sIV No connection
* sRQ -> sIV No connection
- * sRS -> sIV No connection
+ * sRS -> sPO Ack for Response, move to PARTOPEN (8.1.5.)
* sPO -> sPO Remain in PARTOPEN state
* sOP -> sOP Regular DataAck packet in OPEN state
* sCR -> sCR DataAck in CLOSEREQ MAY be processed (8.3.)
@@ -196,7 +196,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sTW -> sIV
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sPO, sOP, sCR, sCG, sIV
+ sIV, sIV, sPO, sPO, sOP, sCR, sCG, sIV
},
[DCCP_PKT_CLOSEREQ] = {
/*
@@ -320,11 +320,11 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sPO -> sOP -> sCR Move directly to CLOSEREQ (8.1.5.)
* sOP -> sCR CloseReq in OPEN state
* sCR -> sCR Retransmit
- * sCG -> sIV Already closing
+ * sCG -> sCR Simultaneous close, client sends another Close
* sTW -> sIV Already closed
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sCR, sCR, sCR, sIV, sIV
+ sIV, sIV, sIV, sCR, sCR, sCR, sCR, sIV
},
[DCCP_PKT_CLOSE] = {
/*
@@ -333,12 +333,12 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
* sRS -> sIV No connection
* sPO -> sOP -> sCG Move direcly to CLOSING
* sOP -> sCG Move to CLOSING
- * sCR -> sCG Waiting for close from client
+ * sCR -> sIV Close after CloseReq is invalid
* sCG -> sCG Retransmit
* sTW -> sIV Already closed
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sCG, sCG, sCG, sCG, sIV
+ sIV, sIV, sIV, sCG, sCG, sIV, sCG, sIV
},
[DCCP_PKT_RESET] = {
/*
* Re: DCCP conntrack/NAT
2008-04-04 15:41 DCCP conntrack/NAT Patrick McHardy
` (8 preceding siblings ...)
2008-04-08 10:33 ` Patrick McHardy
@ 2008-04-08 11:18 ` Patrick McHardy
2008-04-08 13:38 ` Gerrit Renker
` (3 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2008-04-08 11:18 UTC (permalink / raw)
To: dccp
[-- Attachment #1: Type: text/plain, Size: 817 bytes --]
Patrick McHardy wrote:
> Gerrit Renker wrote:
>>
>> - timewait transition -- question: is it possible to re-incarnate a
>>   new connection here instead of ignoring the (new) Request?
>
> Yes, I'll change this. The trickier case is a reincarnation in the
> reverse direction. The conntrack entry must be killed and recreated
> since the state table is directional and the client/server roles
> change. Also needs a bit more thought.
The last patch handled reincarnations in the original direction,
this one adds support for reopening a connection in the reverse
direction. I used role reversal instead of recreating the conntrack
entry since that should make it easier to add DCCP_LISTEN support
later on.
(The patch might not apply because of minor cleanups I made locally;
I'll push out a new tree later.)
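
To make the role-reversal idea concrete, here is a minimal standalone sketch in plain C. It is not the kernel code: the names (`dir`, `role`, `track`, `track_request`) are hypothetical stand-ins for `ip_conntrack_dir`, `ct_dccp_roles` and `struct nf_ct_dccp`, and only Request handling around TIMEWAIT is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum dir   { DIR_ORIGINAL, DIR_REPLY, DIR_MAX };
enum role  { ROLE_CLIENT, ROLE_SERVER };
enum state { ST_NONE, ST_REQUEST, ST_TIMEWAIT, ST_INVALID };

struct track {
	uint8_t state;
	uint8_t role[DIR_MAX];	/* role of the sender in each direction */
};

/* New entry: whoever sent the first packet is treated as the client. */
void track_init(struct track *t)
{
	t->state = ST_NONE;
	t->role[DIR_ORIGINAL] = ROLE_CLIENT;
	t->role[DIR_REPLY] = ROLE_SERVER;
}

/*
 * Process a Request seen in direction 'dir' and return the resulting state.
 * The decision depends on the sender's role, not on the direction, so a
 * Request from the current server in TIMEWAIT can reopen the connection
 * with the roles swapped.
 */
enum state track_request(struct track *t, enum dir dir)
{
	enum role role;
	enum state old;

	assert(dir < DIR_MAX);
	role = t->role[dir];
	old = t->state;

	if (role == ROLE_CLIENT) {
		if (old != ST_NONE && old != ST_REQUEST && old != ST_TIMEWAIT)
			return ST_INVALID;
	} else {
		if (old != ST_TIMEWAIT)
			return ST_INVALID;	/* tracked state is left unchanged */
		/* Reincarnation in the reverse direction: swap the roles. */
		t->role[dir] = ROLE_CLIENT;
		t->role[!dir] = ROLE_SERVER;
	}
	t->state = ST_REQUEST;
	return ST_REQUEST;
}
```

Calling `track_request(&t, DIR_REPLY)` on an entry in `ST_TIMEWAIT` moves it to `ST_REQUEST` and leaves `t.role[DIR_REPLY] == ROLE_CLIENT`, so later packets are looked up with the swapped roles while the entry itself is kept.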
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 4319 bytes --]
diff --git a/include/linux/netfilter/nf_conntrack_dccp.h b/include/linux/netfilter/nf_conntrack_dccp.h
index 33e57c8..f3b9ce8 100644
--- a/include/linux/netfilter/nf_conntrack_dccp.h
+++ b/include/linux/netfilter/nf_conntrack_dccp.h
@@ -15,12 +15,21 @@ enum ct_dccp_states {
CT_DCCP_INVALID,
__CT_DCCP_MAX
};
-#define CT_DCCP_MAX (__CT_DCCP_MAX - 1)
+#define CT_DCCP_MAX (__CT_DCCP_MAX - 1)
+
+enum ct_dccp_roles {
+ CT_DCCP_ROLE_CLIENT,
+ CT_DCCP_ROLE_SERVER,
+ __CT_DCCP_ROLE_MAX
+};
+#define CT_DCCP_ROLE_MAX (__CT_DCCP_ROLE_MAX - 1)
#ifdef __KERNEL__
+#include <net/netfilter/nf_conntrack_tuple.h>
struct nf_ct_dccp {
u_int8_t state;
+ u_int8_t role[IP_CT_DIR_MAX];
u_int64_t handshake_seq;
};
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index bad3c6a..f768936 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -138,8 +138,8 @@ static const char * const dccp_state_names[] = {
* or a DCCP_RESPONSE.
*/
static const u_int8_t
-dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
- [IP_CT_DIR_ORIGINAL] = {
+dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
+ [CT_DCCP_ROLE_CLIENT] = {
[DCCP_PKT_REQUEST] = {
/*
* sNO -> sRQ Regular Request
@@ -157,7 +157,7 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
},
[DCCP_PKT_RESPONSE] = {
/*
- * A Response in the original direction is always invalid.
+ * A Response from the client is always invalid.
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV,
@@ -254,13 +254,23 @@ dccp_state_table[IP_CT_DIR_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = {
sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG,
},
},
- [IP_CT_DIR_REPLY] = {
+ [CT_DCCP_ROLE_SERVER] = {
[DCCP_PKT_REQUEST] = {
/*
- * A Request in the reply direction is always invalid.
+ * A Request from the server is only valid for reopening a
+ * connection in TIMEWAIT state.
+ *
+ * sNO -> sIV
+ * sRQ -> sIV
+ * sRS -> sIV
+ * sPO -> sIV
+ * sOP -> sIV
+ * sCR -> sIV
+ * sCG -> sIV
+ * sTW -> sRQ Reincarnation, must reverse roles
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV
+ sIV, sIV, sIV, sIV, sIV, sIV, sIV, sRQ
},
[DCCP_PKT_RESPONSE] = {
/*
@@ -410,7 +420,7 @@ static int dccp_new(struct nf_conn *ct, const struct sk_buff *skb,
dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
BUG_ON(dh == NULL);
- state = dccp_state_table[IP_CT_DIR_ORIGINAL][dh->dccph_type][CT_DCCP_NONE];
+ state = dccp_state_table[CT_DCCP_ROLE_CLIENT][dh->dccph_type][CT_DCCP_NONE];
switch (state) {
default:
if (nf_ct_dccp_loose == 0) {
@@ -425,6 +435,8 @@ static int dccp_new(struct nf_conn *ct, const struct sk_buff *skb,
}
ct->proto.dccp.state = CT_DCCP_NONE;
+ ct->proto.dccp.role[IP_CT_DIR_ORIGINAL] = CT_DCCP_ROLE_CLIENT;
+ ct->proto.dccp.role[IP_CT_DIR_REPLY] = CT_DCCP_ROLE_SERVER;
return 1;
out_invalid:
@@ -446,8 +458,10 @@ static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb,
unsigned int dataoff, enum ip_conntrack_info ctinfo,
int pf, unsigned int hooknum)
{
+ enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
struct dccp_hdr _dh, *dh;
u_int8_t type, old_state, new_state;
+ enum ct_dccp_roles role;
dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
BUG_ON(dh == NULL);
@@ -463,10 +477,20 @@ static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb,
write_lock_bh(&dccp_lock);
+ role = ct->proto.dccp.role[dir];
old_state = ct->proto.dccp.state;
- new_state = dccp_state_table[CTINFO2DIR(ctinfo)][type][old_state];
+ new_state = dccp_state_table[role][type][old_state];
switch (new_state) {
+ case CT_DCCP_REQUEST:
+ if (old_state == CT_DCCP_TIMEWAIT &&
+ role == CT_DCCP_ROLE_SERVER) {
+ /* Reincarnation in the reverse direction: reopen and
+ * reverse client/server roles. */
+ ct->proto.dccp.role[dir] = CT_DCCP_ROLE_CLIENT;
+ ct->proto.dccp.role[!dir] = CT_DCCP_ROLE_SERVER;
+ }
+ break;
case CT_DCCP_RESPOND:
if (old_state == CT_DCCP_REQUEST)
ct->proto.dccp.handshake_seq = dccp_hdr_seq(dh);
* Re: DCCP conntrack/NAT
2008-04-04 15:41 DCCP conntrack/NAT Patrick McHardy
` (9 preceding siblings ...)
2008-04-08 11:18 ` Patrick McHardy
@ 2008-04-08 13:38 ` Gerrit Renker
2008-04-08 14:12 ` Patrick McHardy
` (2 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Gerrit Renker @ 2008-04-08 13:38 UTC (permalink / raw)
To: dccp
>> I have a question regarding the original direction - currently it is
>> linked to the client which actively initiates a connection. DCCP
>> suffers from the problem that peer-to-peer NAT traversal is not
>> really possible just because of this client/server division. There
>> is a proposal which effects a pseudo simultaneous open, by letting
>> the server send an initiation packet, to fix this problem (TCP
>> peer-to-peer NAT traversal also favours simultaneous-open). I wonder
>> if this would be possible, but it is really a future-work question.
>
> Yes, that should be possible. But how does the server know that
> the client intends to initiate a connection?
>
This happens outside the actual NAT implementation, via out-of-band
signalling, e.g. using SIP or Session Traversal Utilities for NAT
(draft-ietf-behave-rfc3489bis).
In this regard, the role-reversal patch will be a great help, since it
will allow support for peer-to-peer NAT traversal (i.e. NAT-ed server).
* Re: DCCP conntrack/NAT
2008-04-04 15:41 DCCP conntrack/NAT Patrick McHardy
` (10 preceding siblings ...)
2008-04-08 13:38 ` Gerrit Renker
@ 2008-04-08 14:12 ` Patrick McHardy
2008-04-08 14:26 ` Patrick McHardy
2008-04-08 16:21 ` Patrick McHardy
13 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2008-04-08 14:12 UTC (permalink / raw)
To: dccp
[-- Attachment #1: Type: text/plain, Size: 1288 bytes --]
Patrick McHardy wrote:
> Gerrit Renker wrote:
>> State Transitions in the original direction
>> ===========================================
>>
>> * DCCP-Request:
>> - in state Respond (sRS -> sRS), the Request is illegal (Respond
>> is server state)
>
>> - also, the CLOSEREQ state transition (sCR -> sIG) is illegal:
>> Requests are sent by clients only, and CLOSEREQ can only be
>> entered by servers
>
> We track both sides, so we must also define which client packets
> are valid in which server state. This particular one is part of
> the unfinished resync feature. The firewall might be out of sync
> with both endpoints. If connection pickup is enabled it should
> let packets that might establish a new connection pass and resync
> when the other side responds with a valid Response. I need to
> think about this a bit more, but I've marked it with FIXME for
> now :)
I've added this patch on top to handle the out-of-sync case by
letting Requests pass in (almost) any state and resyncing when
seeing a valid Response.
The last TODO before the DCCP_LISTEN support is to fix connection
pickup for established connections. Since conntrack doesn't see the
initial Request/Response in that case, it doesn't know which side has
which role and also can't properly pick an initial state.
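
As a rough illustration of the resync step, here is a minimal standalone sketch in plain C. It is not the kernel code: all names (`track`, `track_ignored`, the enums) are hypothetical, and the sketch assumes that an ignored packet is also recorded as the last packet seen, so that a following Response can be matched against it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum dir   { DIR_ORIGINAL, DIR_REPLY, DIR_MAX };
enum role  { ROLE_CLIENT, ROLE_SERVER };
enum pkt   { PKT_REQUEST, PKT_RESPONSE, PKT_OTHER };
enum state { ST_OPEN, ST_RESPOND };

struct track {
	uint8_t role[DIR_MAX];
	uint8_t state;
	uint8_t last_pkt;
	uint8_t last_dir;
	uint64_t handshake_seq;
};

/*
 * Called when the state table says "ignore" for a packet of 'type' seen in
 * direction 'dir'. If the previous packet was a Request from the other side
 * and this one is a Response, the tracker assumes it was out of sync and
 * restarts from RESPOND with the Request's sender as client.
 * Returns 1 on resync, 0 if the packet was simply ignored.
 */
int track_ignored(struct track *t, enum dir dir, enum pkt type, uint64_t seq)
{
	int resync;

	assert(dir < DIR_MAX);
	resync = t->last_dir == !dir &&
		 t->last_pkt == PKT_REQUEST &&
		 type == PKT_RESPONSE;

	if (resync) {
		t->role[!dir] = ROLE_CLIENT;
		t->role[dir] = ROLE_SERVER;
		t->handshake_seq = seq;
		t->state = ST_RESPOND;
	}
	/* Assumption of this sketch: remember ignored packets as well. */
	t->last_dir = dir;
	t->last_pkt = type;
	return resync;
}
```

For example, with an entry in `ST_OPEN`, an ignored Request in the reply direction followed by a Response in the original direction makes `track_ignored()` return 1, moves the entry to `ST_RESPOND` and swaps the roles.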
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 5224 bytes --]
diff --git a/include/linux/netfilter/nf_conntrack_dccp.h b/include/linux/netfilter/nf_conntrack_dccp.h
index 41ffdf8..40dcc82 100644
--- a/include/linux/netfilter/nf_conntrack_dccp.h
+++ b/include/linux/netfilter/nf_conntrack_dccp.h
@@ -30,6 +30,8 @@ enum ct_dccp_roles {
struct nf_ct_dccp {
u_int8_t role[IP_CT_DIR_MAX];
u_int8_t state;
+ u_int8_t last_pkt;
+ u_int8_t last_dir;
u_int64_t handshake_seq;
};
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index a89113d..96bd70c 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -147,10 +147,10 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] =
* sRQ -> sRQ Retransmitted Request or reincarnation
* sRS -> sRS Retransmitted Request (apparently Response
* got lost after we saw it) or reincarnation
- * sPO -> sIG Request during PARTOPEN state, server will ignore it
- * sOP -> sIG Request during OPEN state: server will ignore it
- * sCR -> sIG FIXME MUST respond with Close to CloseReq (8.3.)
- * sCG -> sIG
+ * sPO -> sIG Ignore, conntrack might be out of sync
+ * sOP -> sIG Ignore, conntrack might be out of sync
+ * sCR -> sIG Ignore, conntrack might be out of sync
+ * sCG -> sIG Ignore, conntrack might be out of sync
* sTW -> sRQ Reincarnation
*
* sNO, sRQ, sRS, sPO. sOP, sCR, sCG, sTW, */
@@ -158,10 +158,18 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] =
},
[DCCP_PKT_RESPONSE] = {
/*
- * A Response from the client is always invalid.
+ * sNO -> sIV Invalid
+ * sRQ -> sIG Ignore, might be response to ignored Request
+ * sRS -> sIG Ignore, might be response to ignored Request
+ * sPO -> sIG Ignore, might be response to ignored Request
+ * sOP -> sIG Ignore, might be response to ignored Request
+ * sCR -> sIG Ignore, might be response to ignored Request
+ * sCG -> sIG Ignore, might be response to ignored Request
+ * sTW -> sIV Invalid, reincarnation in reverse direction
+ * goes through sRQ
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV,
+ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIV,
},
[DCCP_PKT_ACK] = {
/*
@@ -258,20 +266,17 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] =
[CT_DCCP_ROLE_SERVER] = {
[DCCP_PKT_REQUEST] = {
/*
- * A Request from the server is only valid for reopening a
- * connection in TIMEWAIT state.
- *
- * sNO -> sIV
- * sRQ -> sIV
- * sRS -> sIV
- * sPO -> sIV
- * sOP -> sIV
- * sCR -> sIV
- * sCG -> sIV
+ * sNO -> sIV Invalid
+ * sRQ -> sIG Ignore, conntrack might be out of sync
+ * sRS -> sIG Ignore, conntrack might be out of sync
+ * sPO -> sIG Ignore, conntrack might be out of sync
+ * sOP -> sIG Ignore, conntrack might be out of sync
+ * sCR -> sIG Ignore, conntrack might be out of sync
+ * sCG -> sIG Ignore, conntrack might be out of sync
* sTW -> sRQ Reincarnation, must reverse roles
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sIV, sIV, sIV, sIV, sIV, sIV, sRQ
+ sIV, sIG, sIG, sIG, sIG, sIG, sIG, sRQ
},
[DCCP_PKT_RESPONSE] = {
/*
@@ -279,13 +284,13 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] =
* sRQ -> sRS Response to clients Request
* sRS -> sRS Retransmitted Response (8.1.3. SHOULD NOT)
* sPO -> sIG Response to an ignored Request or late retransmit
- * sOP -> sIG Invalid
- * sCR -> sIG Invalid
- * sCG -> sIG Invalid
- * sTW -> sIG Invalid
+ * sOP -> sIG Ignore, might be response to ignored Request
+ * sCR -> sIG Ignore, might be response to ignored Request
+ * sCG -> sIG Ignore, might be response to ignored Request
+ * sTW -> sIV Invalid, Request from client in sTW moves to sRQ
*
* sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */
- sIV, sRS, sRS, sIG, sIG, sIG, sIG, sIG
+ sIV, sRS, sRS, sIG, sIG, sIG, sIG, sIV
},
[DCCP_PKT_ACK] = {
/*
@@ -503,6 +508,20 @@ static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb,
set_bit(IPS_ASSURED_BIT, &ct->status);
break;
case CT_DCCP_IGNORE:
+ /*
+ * Connection tracking might be out of sync, so we ignore
+ * packets that might establish a new connection and resync
+ * if the server responds with a valid Response.
+ */
+ if (ct->proto.dccp.last_dir == !dir &&
+ ct->proto.dccp.last_pkt == DCCP_PKT_REQUEST &&
+ type == DCCP_PKT_RESPONSE) {
+ ct->proto.dccp.role[!dir] = CT_DCCP_ROLE_CLIENT;
+ ct->proto.dccp.role[dir] = CT_DCCP_ROLE_SERVER;
+ ct->proto.dccp.handshake_seq = dccp_hdr_seq(dh);
+ new_state = CT_DCCP_RESPOND;
+ break;
+ }
write_unlock_bh(&dccp_lock);
if (LOG_INVALID(IPPROTO_DCCP))
nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
@@ -516,6 +535,8 @@ static int dccp_packet(struct nf_conn *ct, const struct sk_buff *skb,
return -NF_ACCEPT;
}
+ ct->proto.dccp.last_dir = dir;
+ ct->proto.dccp.last_pkt = type;
ct->proto.dccp.state = new_state;
write_unlock_bh(&dccp_lock);
nf_ct_refresh_acct(ct, ctinfo, skb, dccp_timeout[new_state]);
* Re: DCCP conntrack/NAT
2008-04-04 15:41 DCCP conntrack/NAT Patrick McHardy
` (11 preceding siblings ...)
2008-04-08 14:12 ` Patrick McHardy
@ 2008-04-08 14:26 ` Patrick McHardy
2008-04-08 16:21 ` Patrick McHardy
13 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2008-04-08 14:26 UTC (permalink / raw)
To: dccp
Gerrit Renker wrote:
>>> I have a question regarding the original direction - currently it is
>>> linked to the client which actively initiates a connection. DCCP
>>> suffers from the problem that peer-to-peer NAT traversal is not
>>> really possible just because of this client/server division. There
>>> is a proposal which effects a pseudo simultaneous open, by letting
>>> the server send an initiation packet, to fix this problem (TCP
>>> peer-to-peer NAT traversal also favours simultaneous-open). I wonder
>>> if this would be possible, but it is really a future-work question.
>>>
>> Yes, that should be possible. But how does the server know that
>> the client intends to initiate a connection?
>>
>>
> This happens outside the actual NAT implementation, via out-of-band
> signalling, e.g. using SIP or Session Traversal Utilities for NAT
> (draft-ietf-behave-rfc3489bis).
>
> In this regard, the role-reversal patch will be a great help, since it
> will allow support for peer-to-peer NAT traversal (i.e. NAT-ed server).
This sounds like it would already work when using the netfilter
SIP helper. But the DCCP_LISTEN method is a lot simpler.
* Re: DCCP conntrack/NAT
2008-04-04 15:41 DCCP conntrack/NAT Patrick McHardy
` (12 preceding siblings ...)
2008-04-08 14:26 ` Patrick McHardy
@ 2008-04-08 16:21 ` Patrick McHardy
13 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2008-04-08 16:21 UTC (permalink / raw)
To: dccp
Patrick McHardy wrote:
> I've added this patch on top to handle the out-of-sync case by
> letting Requests pass in (almost) any state and resyncing when
> seeing a valid Response.
>
> Last TODO before the DCCP_LISTEN support is to fix connection
> pickup for established connections. Since it doesn't see the
> initial Request/Response it doesn't know which side has which
> role and also can't properly pick an initial state.
I skipped this for now since it gets a bit ugly and is
not very important for an initial version. I'm uploading
my current tree to kernel.org now; it should be there within
the next 20 minutes.
I'll post a patch for DCCP_LISTEN soon (not sure if I will
manage to get it ready today though).