* [PATCH,RFC] explicit connection confirmation
@ 2002-11-07 9:32 Lennert Buytenhek
2002-11-07 11:27 ` bert hubert
2003-08-14 13:11 ` Lennert Buytenhek
0 siblings, 2 replies; 14+ messages in thread
From: Lennert Buytenhek @ 2002-11-07 9:32 UTC (permalink / raw)
To: netdev
(please CC on replies, I am not on this list)
Hi,
This patch gives userland the ability to decide whether to react
to an incoming TCP SYN with a SYN-ACK or a RST. It was hacked
up after Linux Kongress 2001 and has been sitting on my patch
pile since April this year or something.
The basic idea is this:
- Put the listening TCP socket in TCP_CONFIRM_CONNECT mode.
- Sockets returned from accept() on this socket after this will be
  sockets in the SYN_RECV state instead of the ESTABLISHED state
  (unless syncookies had to be used). By writing to the socket,
  you cause a SYN-ACK to be sent, and by immediately closing the
  socket you cause a RST to be sent.
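The two steps above translate into roughly the following userland code. This is only a sketch under stated assumptions: it presumes a kernel carrying this patch, it takes the option value 13 from the patch itself (on mainline kernels 13 is TCP_CONGESTION, so the setsockopt() must not be relied on there), and the helper names enable_confirm_connect(), confirm_connection() and reject_connection() are made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Value taken from the patch in this mail. ASSUMPTION: the running kernel
 * carries that patch. On mainline kernels option 13 means TCP_CONGESTION. */
#ifndef TCP_CONFIRM_CONNECT
#define TCP_CONFIRM_CONNECT 13
#endif

/* Put a listening socket into explicit-confirmation mode. Returns 0 or -1. */
int enable_confirm_connect(int listen_fd)
{
	int on = 1;

	return setsockopt(listen_fd, IPPROTO_TCP, TCP_CONFIRM_CONNECT,
			  &on, sizeof(on));
}

/* Accept the peer: per the description above, writing to the socket is what
 * makes the kernel send the SYN-ACK. Returns 0 if all bytes were written. */
int confirm_connection(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(fd >= 0 && buf != NULL);
	n = send(fd, buf, len, 0);
	return (n == (ssize_t)len) ? 0 : -1;
}

/* Refuse the peer: closing without having written anything is what makes
 * the kernel send a RST. Returns 0 or -1. */
int reject_connection(int fd)
{
	assert(fd >= 0);
	return close(fd);
}
```

A server would call enable_confirm_connect() once on the listening socket, accept() as usual, look at the peer address, and then call exactly one of the other two helpers. Judging from the patch, the confirm hook sits in the wait-for-connect path of the send code, so the first write appears to block until the handshake completes.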
There are two issues left, AFAICS:
- SYN_RECV sockets currently don't time out for some reason
- it deadlocks instantly on SMP
It's against 2.4.18. Could someone have a look at it please? I
unfortunately haven't had any time at all lately, so I would be
really happy if someone else could take this over. (Well, I can
dream, can't I?)
cheers,
Lennert
--- linux-2.4.18-11umpr/include/linux/tcp.h.orig Thu Nov 22 20:47:11 2001
+++ linux-2.4.18-11umpr/include/linux/tcp.h Thu Apr 18 19:33:19 2002
@@ -127,6 +127,7 @@
#define TCP_WINDOW_CLAMP 10 /* Bound advertised window */
#define TCP_INFO 11 /* Information about this connection. */
#define TCP_QUICKACK 12 /* Block/reenable quick acks */
+#define TCP_CONFIRM_CONNECT 13 /* Let user control connection acceptance */
#define TCPI_OPT_TIMESTAMPS 1
#define TCPI_OPT_SACK 2
--- linux-2.4.18-11umpr/include/net/sock.h.orig Fri Dec 21 18:42:04 2001
+++ linux-2.4.18-11umpr/include/net/sock.h Thu Apr 18 19:37:52 2002
@@ -302,6 +302,7 @@
__u8 reordering; /* Packet reordering metric. */
__u8 queue_shrunk; /* Write queue has been shrunk recently.*/
__u8 defer_accept; /* User waits for some data after accept() */
+ __u8 confirm_connect;/* User wants control over conn. acceptance */
/* RTT measurement */
__u8 backoff; /* backoff */
@@ -411,6 +412,11 @@
struct open_request *accept_queue;
struct open_request *accept_queue_tail;
+ /* Our corresponding open_request if this socket is unconfirmed
+ * (i.e. if we haven't sent SYN-ACK or RST yet)
+ */
+ struct open_request *unconfirmed_openreq;
+
int write_pending; /* A write to socket waits to start. */
unsigned int keepalive_time; /* time before keep alive takes place */
--- linux-2.4.18-11umpr/include/net/tcp.h.orig Thu Nov 22 20:47:22 2001
+++ linux-2.4.18-11umpr/include/net/tcp.h Fri Apr 19 10:42:51 2002
@@ -505,7 +505,8 @@
sack_ok : 1,
wscale_ok : 1,
ecn_ok : 1,
- acked : 1;
+ acked : 1,
+ unconfirmed : 1;
/* The following two fields can be easily recomputed I think -AK */
__u32 window_clamp; /* window clamp at creation time */
__u32 rcv_wnd; /* rcv_wnd offered first time */
@@ -533,6 +534,17 @@
tcp_openreq_fastfree(req);
}
+static inline int tcp_is_unconfirmed(struct tcp_opt *tp)
+{
+ struct open_request *req;
+
+ req = tp->unconfirmed_openreq;
+ if (req != NULL && req->unconfirmed)
+ return 1;
+
+ return 0;
+}
+
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
#define TCP_INET_FAMILY(fam) ((fam) == AF_INET)
#else
@@ -1661,6 +1673,7 @@
req->acked = 0;
req->ecn_ok = 0;
req->rmt_port = skb->h.th->source;
+ req->unconfirmed = 0;
}
#define TCP_MEM_QUANTUM ((int)PAGE_SIZE)
--- linux-2.4.18-11umpr/net/ipv4/tcp.c.orig Fri Dec 21 18:42:05 2001
+++ linux-2.4.18-11umpr/net/ipv4/tcp.c Fri Apr 19 20:50:29 2002
@@ -204,6 +204,7 @@
* Andi Kleen : Make poll agree with SIGIO
* Salvatore Sanfilippo : Support SO_LINGER with linger == 1 and
* lingertime == 0 (RFC 793 ABORT Call)
+ * Lennert Buytenhek : Explicit connection confirmation
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -366,6 +367,15 @@
return sk->tp_pinfo.af_tcp.accept_queue ? (POLLIN | POLLRDNORM) : 0;
}
+static void tcp_confirm(struct sock *sk)
+{
+ struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
+ struct open_request *req = tp->unconfirmed_openreq;
+
+ req->unconfirmed = 0;
+ req->class->rtx_syn_ack(sk, req, NULL);
+}
+
/*
* Wait for a TCP event.
*
@@ -650,6 +660,9 @@
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
+ if (tcp_is_unconfirmed(tp))
+ tcp_confirm(sk);
+
while((1 << sk->state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
if(sk->err)
return sock_error(sk);
@@ -1814,7 +1827,7 @@
void tcp_close(struct sock *sk, long timeout)
{
struct sk_buff *skb;
- int data_was_unread = 0;
+ int should_send_rst = 0;
lock_sock(sk);
sk->shutdown = SHUTDOWN_MASK;
@@ -1834,12 +1847,19 @@
*/
while((skb=__skb_dequeue(&sk->receive_queue))!=NULL) {
u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq - skb->h.th->fin;
- data_was_unread += len;
+ should_send_rst += len;
__kfree_skb(skb);
}
tcp_mem_reclaim(sk);
+ if (sk->tp_pinfo.af_tcp.unconfirmed_openreq != NULL) {
+ if (tcp_is_unconfirmed(&(sk->tp_pinfo.af_tcp)))
+ should_send_rst = 1;
+ tcp_openreq_free(sk->tp_pinfo.af_tcp.unconfirmed_openreq);
+ sk->tp_pinfo.af_tcp.unconfirmed_openreq = NULL;
+ }
+
/* As outlined in draft-ietf-tcpimpl-prob-03.txt, section
* 3.10, we send a RST here because data was lost. To
* witness the awful effects of the old behavior of always
@@ -1849,7 +1869,7 @@
* the FTP client, wheee... Note: timeout is always zero
* in such a case.
*/
- if(data_was_unread != 0) {
+ if(should_send_rst) {
/* Unread data was tossed, zap the connection. */
NET_INC_STATS_USER(TCPAbortOnClose);
tcp_set_state(sk, TCP_CLOSE);
@@ -2026,6 +2046,11 @@
#endif
}
+ if (tp->unconfirmed_openreq) {
+ tcp_openreq_free(tp->unconfirmed_openreq);
+ tp->unconfirmed_openreq = NULL;
+ }
+
sk->shutdown = 0;
sk->done = 0;
tp->srtt = 0;
@@ -2139,8 +2164,10 @@
newsk = req->sk;
tcp_acceptq_removed(sk);
- tcp_openreq_fastfree(req);
- BUG_TRAP(newsk->state != TCP_SYN_RECV);
+ if (newsk->tp_pinfo.af_tcp.unconfirmed_openreq == NULL)
+ tcp_openreq_fastfree(req);
+ BUG_TRAP(newsk->tp_pinfo.af_tcp.unconfirmed_openreq ||
+ newsk->state != TCP_SYN_RECV);
release_sock(sk);
return newsk;
@@ -2305,6 +2332,10 @@
}
break;
+ case TCP_CONFIRM_CONNECT:
+ tp->confirm_connect = !!val;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -2429,6 +2460,9 @@
case TCP_QUICKACK:
val = !tp->ack.pingpong;
break;
+ case TCP_CONFIRM_CONNECT:
+ val = tp->confirm_connect || tcp_is_unconfirmed(tp);
+ break;
default:
return -ENOPROTOOPT;
};
--- linux-2.4.18-11umpr/net/ipv4/tcp_input.c.orig Mon Feb 25 20:38:14 2002
+++ linux-2.4.18-11umpr/net/ipv4/tcp_input.c Fri Apr 19 10:52:27 2002
@@ -3749,6 +3749,11 @@
switch(sk->state) {
case TCP_SYN_RECV:
if (acceptable) {
+ if (tp->unconfirmed_openreq != NULL) {
+ tcp_openreq_free(tp->unconfirmed_openreq);
+ tp->unconfirmed_openreq = NULL;
+ }
+
tp->copied_seq = tp->rcv_nxt;
mb();
tcp_set_state(sk, TCP_ESTABLISHED);
--- linux-2.4.18-11umpr/net/ipv4/tcp_minisocks.c.orig Mon Oct 1 18:19:57 2001
+++ linux-2.4.18-11umpr/net/ipv4/tcp_minisocks.c Fri Apr 19 10:24:22 2002
@@ -696,6 +696,7 @@
tcp_init_wl(newtp, req->snt_isn, req->rcv_isn);
newtp->retransmits = 0;
+ newtp->confirm_connect = 0;
newtp->backoff = 0;
newtp->srtt = 0;
newtp->mdev = TCP_TIMEOUT_INIT;
@@ -839,7 +840,8 @@
* Enforce "SYN-ACK" according to figure 8, figure 6
* of RFC793, fixed by RFC1122.
*/
- req->class->rtx_syn_ack(sk, req, NULL);
+ if (!req->unconfirmed)
+ req->class->rtx_syn_ack(sk, req, NULL);
return NULL;
}
@@ -864,7 +866,7 @@
if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
req->rcv_isn+1, req->rcv_isn+1+req->rcv_wnd)) {
/* Out of window: send ACK and drop. */
- if (!(flg & TCP_FLAG_RST))
+ if (!req->unconfirmed && !(flg & TCP_FLAG_RST))
req->class->send_ack(skb, req);
if (paws_reject)
NET_INC_STATS_BH(PAWSEstabRejected);
@@ -907,6 +909,12 @@
return NULL;
}
+ /* @@@ If we are in SYN_RECV and haven't confirmed/rejected
+ * the connection yet, this ACK is acking a never-sent packet.
+ */
+ if (tcp_is_unconfirmed(tp))
+ return NULL;
+
/* OK, ACK is valid, create big socket and
* feed this segment to it. It will repeat all
* the tests. THIS SEGMENT MUST MOVE SOCKET TO
--- linux-2.4.18-11umpr/net/ipv4/tcp_ipv4.c.orig Mon Feb 25 20:38:14 2002
+++ linux-2.4.18-11umpr/net/ipv4/tcp_ipv4.c Fri Apr 19 18:56:45 2002
@@ -1270,12 +1270,14 @@
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
+ struct tcp_opt *master_tp = &(sk->tp_pinfo.af_tcp);
struct tcp_opt tp;
struct open_request *req;
__u32 saddr = skb->nh.iph->saddr;
__u32 daddr = skb->nh.iph->daddr;
__u32 isn = TCP_SKB_CB(skb)->when;
struct dst_entry *dst = NULL;
+ int dont_confirm = 0;
#ifdef CONFIG_SYN_COOKIES
int want_cookie = 0;
#else
@@ -1312,6 +1314,9 @@
if (req == NULL)
goto drop;
+ if (!want_cookie && master_tp->confirm_connect)
+ dont_confirm = 1;
+
tcp_clear_options(&tp);
tp.mss_clamp = 536;
tp.user_mss = sk->tp_pinfo.af_tcp.user_mss;
@@ -1396,11 +1401,31 @@
}
req->snt_isn = isn;
- if (tcp_v4_send_synack(sk, req, dst))
+ if (!dont_confirm && tcp_v4_send_synack(sk, req, dst))
goto drop_and_free;
if (want_cookie) {
tcp_openreq_free(req);
+ } else if (dont_confirm) {
+ struct sock *child;
+ __u8 rcv_wscale;
+
+ req->window_clamp = dst?dst->window:0;
+ tcp_select_initial_window(tcp_full_space(sk), req->mss,
+ &req->rcv_wnd, &req->window_clamp,
+ 0, &rcv_wscale);
+ req->rcv_wscale = rcv_wscale;
+
+ child = tcp_v4_syn_recv_sock(sk, skb, req, NULL);
+ if (child != NULL) {
+ req->unconfirmed = 1;
+ child->tp_pinfo.af_tcp.unconfirmed_openreq = req;
+ tcp_acceptq_queue(sk, req, child);
+ sk->data_ready(sk, 0);
+ sock_put(child);
+ } else {
+ tcp_openreq_free(req);
+ }
} else {
tcp_v4_synq_add(sk, req);
}
--- linux-2.4.18-11umpr/net/ipv4/tcp_timer.c.orig Mon Oct 1 18:19:57 2001
+++ linux-2.4.18-11umpr/net/ipv4/tcp_timer.c Thu Apr 18 19:49:06 2002
@@ -512,7 +512,8 @@
if ((long)(now - req->expires) >= 0) {
if ((req->retrans < thresh ||
(req->acked && req->retrans < max_retries))
- && !req->class->rtx_syn_ack(sk, req, NULL)) {
+ && (req->unconfirmed ||
+ !req->class->rtx_syn_ack(sk, req, NULL))) {
unsigned long timeo;
if (req->retrans++ == 0)
--- linux-2.4.18-11umpr/net/ipv4/af_inet.c.orig Fri Dec 21 18:42:05 2001
+++ linux-2.4.18-11umpr/net/ipv4/af_inet.c Wed Apr 17 20:45:06 2002
@@ -693,7 +693,7 @@
lock_sock(sk2);
- BUG_TRAP((1<<sk2->state)&(TCPF_ESTABLISHED|TCPF_CLOSE_WAIT|TCPF_CLOSE));
+ BUG_TRAP((1<<sk2->state)&(TCPF_SYN_RECV|TCPF_ESTABLISHED|TCPF_CLOSE_WAIT|TCPF_CLOSE));
sock_graft(sk2, newsock);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 9:32 [PATCH,RFC] explicit connection confirmation Lennert Buytenhek
@ 2002-11-07 11:27 ` bert hubert
2002-11-07 12:09 ` Lennert Buytenhek
2003-08-14 13:11 ` Lennert Buytenhek
1 sibling, 1 reply; 14+ messages in thread
From: bert hubert @ 2002-11-07 11:27 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev

On Thu, Nov 07, 2002 at 04:32:08AM -0500, Lennert Buytenhek wrote:

> - Sockets returned from accept() on this socket after this will be
>   sockets in the SYN_RECV state instead of the ESTABLISHED state
>   (unless syncookies had to be used). By writing to the socket,
>   you cause a SYN-ACK to be sent, and by immediately closing the
>   socket you cause a RST to be sent.

And reading, like a webserver would do?

I think this approach smells, btw - doesn't this mean that processes
will now be woken up on receiving a SYN instead of after completion
of the handshake?

Would make a synflood all the more interesting..

Regards,

bert

-- 
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 11:27 ` bert hubert
@ 2002-11-07 12:09 ` Lennert Buytenhek
2002-11-07 13:36 ` jamal
2002-11-07 13:49 ` bert hubert
0 siblings, 2 replies; 14+ messages in thread
From: Lennert Buytenhek @ 2002-11-07 12:09 UTC (permalink / raw)
To: bert hubert, netdev

On Thu, Nov 07, 2002 at 12:27:33PM +0100, bert hubert wrote:

> > - Sockets returned from accept() on this socket after this will be
> >   sockets in the SYN_RECV state instead of the ESTABLISHED state
> >   (unless syncookies had to be used). By writing to the socket,
> >   you cause a SYN-ACK to be sent, and by immediately closing the
> >   socket you cause a RST to be sent.
>
> And reading, like a webserver would do?

Will do nothing, but this can easily be changed. I remind you of
the fact that this option has to be explicitly enabled on a
listening socket, so that apps have to be adapted to use the new
interface anyway.

> I think this approach smells, btw - doesn't this mean that processes
> will now be woken up on receiving a SYN instead of after completion
> of the handshake?

Yes, it does mean this. You are free to suggest alternatives.

> Would make a synflood all the more interesting..

In case of a synflood, the TCP stack will fall back to sending
syncookies as it normally does.

cheers,
Lennert
^ permalink raw reply [flat|nested] 14+ messages in thread
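The syncookie fallback mentioned in this reply corresponds to a three-way branch in the patched tcp_v4_conn_request(). The userspace model below restates that branch as read from the patch; the enum and function names are invented for illustration, and want_cookie stands for the kernel's own decision to fall back to syncookies, which this sketch simply takes as an input.

```c
#include <assert.h>

/* What happens to a fresh SYN on a listening socket, as read from the patch. */
enum syn_path {
	SYN_PATH_COOKIE,      /* SYN-ACK sent with a cookie, open_request freed */
	SYN_PATH_UNCONFIRMED, /* no SYN-ACK yet, child socket queued for accept() */
	SYN_PATH_NORMAL       /* SYN-ACK sent, request parked in the SYN queue */
};

/* Mirrors: dont_confirm = !want_cookie && master_tp->confirm_connect */
enum syn_path classify_syn(int want_cookie, int confirm_connect)
{
	int dont_confirm = !want_cookie && confirm_connect;

	if (want_cookie)
		return SYN_PATH_COOKIE;
	if (dont_confirm)
		return SYN_PATH_UNCONFIRMED;
	return SYN_PATH_NORMAL;
}

/* Does the kernel answer the SYN right away on this path? */
int sends_synack_immediately(enum syn_path p)
{
	assert(p == SYN_PATH_COOKIE || p == SYN_PATH_UNCONFIRMED ||
	       p == SYN_PATH_NORMAL);
	return p != SYN_PATH_UNCONFIRMED;
}
```

The point of the model is that syncookies take precedence: once the stack has decided to use them, a socket in TCP_CONFIRM_CONNECT mode behaves like an ordinary listener and userland is not woken per SYN.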
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 12:09 ` Lennert Buytenhek
@ 2002-11-07 13:36 ` jamal
2002-11-07 15:27 ` Lennert Buytenhek
2002-11-07 13:49 ` bert hubert
1 sibling, 1 reply; 14+ messages in thread
From: jamal @ 2002-11-07 13:36 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: bert hubert, netdev

Could you not have used netfilter for this? You have the app
sending controls to add netfilter policies and delete them when not
needed.

cheers,
jamal
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 13:36 ` jamal
@ 2002-11-07 15:27 ` Lennert Buytenhek
2002-11-08 11:22 ` jamal
0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2002-11-07 15:27 UTC (permalink / raw)
To: jamal, Marc Boucher; +Cc: bert hubert, netdev

Hi,

netfilter, yeah, sure, 'could have', but please.

'Make it a netfilter module' is generally what people say when
they are confronted with a feature they don't like.

There was a thread about this in private mail round April this year,
in which some good points were raised.
- From the kernel point of view, doing it in netfilter would require
  more state tracking and access to the socket hashes and would be
  uglier.
- From the application writer's point of view, doing it via a socket
  option is much more intuitive, since this flag is really a socket
  property, than doing it via some extra API which would make it
  way too difficult/complex to use in existing apps.

It's worth noting that selective TCP connection acceptance was also
intended to be implemented as a socket option by the original BSD
developers. See http://www.kohala.com/start/vanj.94jun27.txt (link
thanks to Marc Boucher).

From the accept(2) man page on Red Hat Linux (again thanks to Marc
Boucher):

    For certain protocols which require an explicit confirmation, such as
    DECNet, accept can be thought of as merely dequeuing the next
    connection request and not implying confirmation. Confirmation can be
    implied by a normal read or write on the new file descriptor, and
    rejection can be implied by closing the new socket. Currently only
    DECNet has these semantics on Linux.

cheers,
Lennert

On Thu, Nov 07, 2002 at 08:36:28AM -0500, jamal wrote:
> Could you not have used netfilter for this? You have the app
> sending controls to add netfilter policies and delete them when not
> needed.
>
> cheers,
> jamal
>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 15:27 ` Lennert Buytenhek
@ 2002-11-08 11:22 ` jamal
2002-11-08 11:52 ` bert hubert
2002-11-08 18:28 ` Lennert Buytenhek
0 siblings, 2 replies; 14+ messages in thread
From: jamal @ 2002-11-08 11:22 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: Marc Boucher, bert hubert, netdev

On Thu, 7 Nov 2002, Lennert Buytenhek wrote:

> Hi,
>
> netfilter, yeah, sure, 'could have', but please.

apology if i sounded like one of those adolescent netfilter dangerous
fools who show up with "mama, look what i can do with a packet now that
ive read netfilter docs"

>
> 'Make it a netfilter module' is generally what people say when
> they are confronted with a feature they don't like.

My angle was to avoid being intrusive to the tcp code.
you might get a fish sent to you in .nl in an armani suit;->

>
> There was a thread about this in private mail round April this year,
> in which some good points were raised.
>

There are some good points; however, whats the app for this feature?

cheers,
jamal
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-08 11:22 ` jamal
@ 2002-11-08 11:52 ` bert hubert
2002-11-08 11:56 ` Marc Boucher
2002-11-08 18:28 ` Lennert Buytenhek
1 sibling, 1 reply; 14+ messages in thread
From: bert hubert @ 2002-11-08 11:52 UTC (permalink / raw)
To: jamal; +Cc: Lennert Buytenhek, Marc Boucher, netdev

On Fri, Nov 08, 2002 at 06:22:00AM -0500, jamal wrote:

> > There was a thread about this in private mail round April this year,
> > in which some good points were raised.
>
> There are some good points; however, whats the app for this feature?

This came up a long time ago on bugtraq in a discussion how to easily
prevent certain IP addresses from DoSsing your TCP daemon. Right now,
userspace is always forced to complete the threeway handshake, and can only
then close the socket.

Even rather small amounts of SYN packets can thus easily saturate a server
which has decided to handle only 100 connections AND has decided to ignore a
certain IP address. Some inetd superservers contain code to ratelimit IP
addresses which sadly is not as effective from userspace as it could be with
the ability to RST a connection immediately.

It also allows userspace to simulate that a service isn't even there,
without root capabilities.

Regards,

bert

-- 
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
^ permalink raw reply [flat|nested] 14+ messages in thread
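With explicit confirmation, the per-address rate limiting bert describes could run before any SYN-ACK goes out. A minimal sketch of such a limiter follows. The table size, window length and limit are arbitrary assumptions, every name is hypothetical, and a hash collision simply evicts the previous address, which a real daemon would want to handle more carefully.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define RL_SLOTS  256 /* hypothetical table size, must match the hash below */
#define RL_WINDOW 10  /* seconds per counting window (assumption) */
#define RL_MAX    5   /* attempts allowed per address per window (assumption) */

struct rl_entry {
	uint32_t addr;   /* IPv4 address, any consistent byte order */
	time_t   window; /* start of the current counting window */
	unsigned count;  /* attempts seen from addr in this window */
};

struct rl_table {
	struct rl_entry slot[RL_SLOTS];
};

void rl_init(struct rl_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Returns 1 if a connection attempt from addr at time now should be
 * confirmed, 0 if it should be rejected. */
int rl_allow(struct rl_table *t, uint32_t addr, time_t now)
{
	struct rl_entry *e;

	assert(t != NULL);
	/* Multiplicative hash; the top 8 bits select one of 256 slots. */
	e = &t->slot[(uint32_t)(addr * 2654435761u) >> 24];

	if (e->addr != addr || now - e->window >= RL_WINDOW) {
		/* Different address in this slot, or window expired: restart. */
		e->addr = addr;
		e->window = now;
		e->count = 0;
	}
	if (e->count >= RL_MAX)
		return 0;
	e->count++;
	return 1;
}
```

In an accept loop on a TCP_CONFIRM_CONNECT listener, a return value of 0 would lead to an immediate close() of the accepted socket, which per the original mail results in a RST instead of a completed handshake.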
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-08 11:52 ` bert hubert
@ 2002-11-08 11:56 ` Marc Boucher
0 siblings, 0 replies; 14+ messages in thread
From: Marc Boucher @ 2002-11-08 11:56 UTC (permalink / raw)
To: bert hubert, jamal, Lennert Buytenhek, netdev

it would also be useful for transparent proxying. presently all
connections diverted to a proxy are immediately accepted, regardless of
whether the second connection (proxy->real destination) succeeds or not.

On Fri, Nov 08, 2002 at 12:52:05PM +0100, bert hubert wrote:
> On Fri, Nov 08, 2002 at 06:22:00AM -0500, jamal wrote:
>
> > > There was a thread about this in private mail round April this year,
> > > in which some good points were raised.
> >
> > There are some good points; however, whats the app for this feature?
>
> This came up a long time ago on bugtraq in a discussion how to easily
> prevent certain IP addresses from DoSsing your TCP daemon. Right now,
> userspace is always forced to complete the threeway handshake, and can only
> then close the socket.
>
> Even rather small amounts of SYN packets can thus easily saturate a server
> which has decided to handle only 100 connections AND has decided to ignore a
> certain IP address. Some inetd superservers contain code to ratelimit IP
> addresses which sadly is not as effective from userspace as it could be with
> the ability to RST a connection immediately.
>
> It also allows userspace to simulate that a service isn't even there,
> without root capabilities.
>
> Regards,
>
> bert
>
> -- 
> http://www.PowerDNS.com Versatile DNS Software & Services
> http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-08 11:22 ` jamal
2002-11-08 11:52 ` bert hubert
@ 2002-11-08 18:28 ` Lennert Buytenhek
1 sibling, 0 replies; 14+ messages in thread
From: Lennert Buytenhek @ 2002-11-08 18:28 UTC (permalink / raw)
To: jamal; +Cc: Marc Boucher, bert hubert, netdev

On Fri, Nov 08, 2002 at 06:22:00AM -0500, jamal wrote:

> > netfilter, yeah, sure, 'could have', but please.
>
> apology if i sounded like one of those adolescent netfilter dangerous
> fools who show up with "mama, look what i can do with a packet now that
> ive read netfilter docs"

No, you don't sound such, sorry for reacting the way i did.

> > 'Make it a netfilter module' is generally what people say when
> > they are confronted with a feature they don't like.
>
> My angle was to avoid being intrusive to the tcp code.
> you might get a fish sent to you in .nl in an armani suit;->

Sorry but I don't like fish nor armani suits :-)

> > There was a thread about this in private mail round April this year,
> > in which some good points were raised.
>
> There are some good points; however, whats the app for this feature?

My specific application is a proxy application that replaces the
in-kernel IP masquerading functionality, using a wildcard REDIRECT
rule plus SO_ORIGINAL_DST. The main reason I'm doing it in userspace
is because downstream bandwidth limiting becomes a whole lot easier
this way than doing it in-kernel -- it would need complicated state
tracking and nonobvious window field manipulations if done there.

The applications that Bert and Marc named sound sane too. There's
just a whole lot of things this thing can be used for.

cheers,
Lennert
^ permalink raw reply [flat|nested] 14+ messages in thread
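For the proxy case described in this reply, the original destination of a REDIRECTed connection is read with the SO_ORIGINAL_DST socket option. A minimal sketch, assuming an IPv4 TCP socket that was diverted by a netfilter REDIRECT rule: original_dst() is a hypothetical helper name, and the fallback constant 80 mirrors the definition in <linux/netfilter_ipv4.h>. Whether the option can be queried while the socket is still unconfirmed under this patch is not something the thread states.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_IP
#define SOL_IP 0           /* same value as IPPROTO_IP */
#endif
#ifndef SO_ORIGINAL_DST
#define SO_ORIGINAL_DST 80 /* as defined in <linux/netfilter_ipv4.h> */
#endif

/*
 * Ask netfilter where a redirected connection was originally headed.
 * Returns 0 and fills *dst on success, -1 (with errno set) otherwise,
 * for example when the socket was never redirected or is not an IPv4
 * TCP socket.
 */
int original_dst(int fd, struct sockaddr_in *dst)
{
	socklen_t len = sizeof(*dst);

	assert(dst != NULL);
	memset(dst, 0, sizeof(*dst));
	return getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, dst, &len) == 0 ? 0 : -1;
}
```

A proxy built along the lines Marc and Lennert describe might look up the original destination, attempt its own connection to it, and only then confirm or reject the client, so that a failed upstream connection shows up to the client as a refused one.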
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 12:09 ` Lennert Buytenhek
2002-11-07 13:36 ` jamal
@ 2002-11-07 13:49 ` bert hubert
2002-11-07 14:30 ` Lennert Buytenhek
1 sibling, 1 reply; 14+ messages in thread
From: bert hubert @ 2002-11-07 13:49 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev

On Thu, Nov 07, 2002 at 07:09:56AM -0500, Lennert Buytenhek wrote:

> > I think this approach smells, btw - doesn't this mean that processes
> > will now be woken up on receiving a SYN instead of after completion
> > of the handshake?
>
> Yes, it does mean this. You are free to suggest alternatives.

I like having this ability - I dislike offering it to an unsuspecting
userspace.

> > Would make a synflood all the more interesting..
>
> In case of a synflood, the TCP stack will fall back to sending
> syncookies as it normally does.

Yes, but in your setup, a spoofable SYN packet will spawn a process for many
daemons, unless they are modified to first try to read/write to the socket
(which might block!) before forking/pthread_create()ing.

Regards,

bert

-- 
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 13:49 ` bert hubert
@ 2002-11-07 14:30 ` Lennert Buytenhek
2002-11-07 16:24 ` bert hubert
0 siblings, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2002-11-07 14:30 UTC (permalink / raw)
To: bert hubert, netdev

On Thu, Nov 07, 2002 at 02:49:18PM +0100, bert hubert wrote:

> > > I think this approach smells, btw - doesn't this mean that processes
> > > will now be woken up on receiving a SYN instead of after completion
> > > of the handshake?
> >
> > Yes, it does mean this. You are free to suggest alternatives.
>
> I like having this ability - I dislike offering it to an unsuspecting
> userspace.

Userspace needs to turn on the option first, so your 'unsuspecting'
does not apply.

> > > Would make a synflood all the more interesting..
> >
> > In case of a synflood, the TCP stack will fall back to sending
> > syncookies as it normally does.
>
> Yes, but in your setup, a spoofable SYN packet will spawn a process for many
> daemons, unless they are modified to first try to read/write to the socket
> (which might block!) before forking/pthread_create()ing.

Again, if the app decides to turn on TCP_CONFIRM_CONNECT, then it's
up to the app to deal with it properly. There are very good reasons
for not turning on TCP_CONFIRM_CONNECT by default, which is why it
is not on by default, and why grafting a setsockopt onto every daemon
there is out there is definitely not a good idea.

cheers,
Lennert
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2002-11-07 14:30 ` Lennert Buytenhek
@ 2002-11-07 16:24 ` bert hubert
0 siblings, 0 replies; 14+ messages in thread
From: bert hubert @ 2002-11-07 16:24 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev

On Thu, Nov 07, 2002 at 09:30:02AM -0500, Lennert Buytenhek wrote:

> > I like having this ability - I dislike offering it to an unsuspecting
> > userspace.
>
> Userspace needs to turn on the option first, so your 'unsuspecting'
> does not apply.

Ah ok, I thought 'TCP_CONFIRM_CONNECT' was a TCP connection state like
SYN_RECV.

> Again, if the app decides to turn on TCP_CONFIRM_CONNECT, then it's
> up to the app to deal with it properly. There are very good reasons
> for not turning on TCP_CONFIRM_CONNECT by default, which is why it
> is not on by default, and why grafting a setsockopt onto every daemon
> there is out there is definitely not a good idea.

Exactly. Ok, cool.

Regards,

bert

-- 
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH,RFC] explicit connection confirmation
2002-11-07 9:32 [PATCH,RFC] explicit connection confirmation Lennert Buytenhek
2002-11-07 11:27 ` bert hubert
@ 2003-08-14 13:11 ` Lennert Buytenhek
2003-08-25 11:09 ` Harald Welte
1 sibling, 1 reply; 14+ messages in thread
From: Lennert Buytenhek @ 2003-08-14 13:11 UTC (permalink / raw)
To: netdev

Hi,

Below is the original email I sent to netdev about nine months ago
announcing selective connection acceptance support for TCP sockets.

I have forward-ported the 2.4.18 patch to 2.6.0-test2, included
below. No functional changes have been made.

Could someone have a look at it?

cheers,
Lennert

On Thu, Nov 07, 2002 at 04:32:08AM -0500, buytenh wrote:
> (please CC on replies, I am not on this list)
>
> Hi,
>
> This patch gives userland the ability to decide whether to react
> to an incoming TCP SYN with a SYN-ACK or a RST. It was hacked
> up after Linux Kongress 2001 and has been sitting on my patch
> pile since April this year or something.
>
> The basic idea is this:
> - Put the listening TCP socket in TCP_CONFIRM_CONNECT mode.
> - Sockets returned from accept() on this socket after this will be
>   sockets in the SYN_RECV state instead of the ESTABLISHED state
>   (unless syncookies had to be used). By writing to the socket,
>   you cause a SYN-ACK to be sent, and by immediately closing the
>   socket you cause a RST to be sent.
>
> There are two issues left, AFAICS:
> - SYN_RECV sockets currently don't time out for some reason
> - it deadlocks instantly on SMP
>
> It's against 2.4.18. Could someone have a look at it please? I
> unfortunately haven't had any time at all lately, so I would be
> really happy if someone else could take this over. (Well, I can
> dream, can't I?)
>
>
> cheers,
> Lennert
>
--- linux-2.6.0-test2/include/linux/tcp.h.orig 2003-08-14 14:19:20.886285797 +0200
+++ linux-2.6.0-test2/include/linux/tcp.h 2003-08-14 13:44:42.000000000 +0200
@@ -127,6 +127,7 @@
#define TCP_WINDOW_CLAMP 10 /* Bound advertised window */
#define TCP_INFO 11 /* Information about this connection. */
#define TCP_QUICKACK 12 /* Block/reenable quick acks */
+#define TCP_CONFIRM_CONNECT 13 /* Let user control connection acceptance */
#define TCPI_OPT_TIMESTAMPS 1
#define TCPI_OPT_SACK 2
@@ -257,6 +258,7 @@
__u8 reordering; /* Packet reordering metric. */
__u8 queue_shrunk; /* Write queue has been shrunk recently.*/
__u8 defer_accept; /* User waits for some data after accept() */
+ __u8 confirm_connect;/* User wants control over conn. acceptance */
/* RTT measurement */
__u8 backoff; /* backoff */
@@ -364,6 +366,11 @@
struct open_request *accept_queue;
struct open_request *accept_queue_tail;
+ /* Our corresponding open_request if this socket is unconfirmed
+ * (i.e. if we haven't sent SYN-ACK or RST yet)
+ */
+ struct open_request *unconfirmed_openreq;
+
int write_pending; /* A write to socket waits to start. */
unsigned int keepalive_time; /* time before keep alive takes place */
--- linux-2.6.0-test2/include/net/tcp.h.orig 2003-08-14 14:19:20.888285455 +0200
+++ linux-2.6.0-test2/include/net/tcp.h 2003-08-14 13:42:42.000000000 +0200
@@ -591,7 +591,8 @@
sack_ok : 1,
wscale_ok : 1,
ecn_ok : 1,
- acked : 1;
+ acked : 1,
+ unconfirmed : 1;
/* The following two fields can be easily recomputed I think -AK */
__u32 window_clamp; /* window clamp at creation time */
__u32 rcv_wnd; /* rcv_wnd offered first time */
@@ -619,6 +620,17 @@
tcp_openreq_fastfree(req);
}
+static inline int tcp_is_unconfirmed(struct tcp_opt *tp)
+{
+ struct open_request *req;
+
+ req = tp->unconfirmed_openreq;
+ if (req != NULL && req->unconfirmed)
+ return 1;
+
+ return 0;
+}
+
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
#define TCP_INET_FAMILY(fam) ((fam) == AF_INET)
#else
@@ -1762,6 +1774,7 @@
req->acked = 0;
req->ecn_ok = 0;
req->rmt_port = skb->h.th->source;
+ req->unconfirmed = 0;
}
#define TCP_MEM_QUANTUM ((int)PAGE_SIZE)
--- linux-2.6.0-test2/net/ipv4/af_inet.c.orig 2003-08-14 14:19:20.890285113 +0200
+++ linux-2.6.0-test2/net/ipv4/af_inet.c 2003-08-14 13:47:14.000000000 +0200
@@ -685,8 +685,8 @@
lock_sock(sk2);
- BUG_TRAP((1 << sk2->sk_state) &
- (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT | TCPF_CLOSE));
+ BUG_TRAP((1 << sk2->sk_state) & (TCPF_SYN_RECV | TCPF_ESTABLISHED |
+ TCPF_CLOSE_WAIT | TCPF_CLOSE));
sock_graft(sk2, newsock);
--- linux-2.6.0-test2/net/ipv4/tcp.c.orig 2003-08-14 14:19:20.891284941 +0200
+++ linux-2.6.0-test2/net/ipv4/tcp.c 2003-08-14 14:16:08.697201584 +0200
@@ -206,6 +206,7 @@
* lingertime == 0 (RFC 793 ABORT Call)
* Hirokazu Takahashi : Use copy_from_user() instead of
* csum_and_copy_from_user() if possible.
+ * Lennert Buytenhek : Explicit connection confirmation
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -374,6 +375,15 @@
return tcp_sk(sk)->accept_queue ? (POLLIN | POLLRDNORM) : 0;
}
+static void tcp_confirm(struct sock *sk)
+{
+ struct tcp_opt *tp = tcp_sk(sk);
+ struct open_request *req = tp->unconfirmed_openreq;
+
+ req->unconfirmed = 0;
+ req->class->rtx_syn_ack(sk, req, NULL);
+}
+
/*
* Wait for a TCP event.
*
@@ -662,6 +672,9 @@
struct task_struct *tsk = current;
DEFINE_WAIT(wait);
+ if (tcp_is_unconfirmed(tp))
+ tcp_confirm(sk);
+
while ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
if (sk->sk_err)
return sock_error(sk);
@@ -1939,7 +1952,7 @@
void tcp_close(struct sock *sk, long timeout)
{
struct sk_buff *skb;
- int data_was_unread = 0;
+ int should_send_rst = 0;
lock_sock(sk);
sk->sk_shutdown = SHUTDOWN_MASK;
@@ -1960,12 +1973,19 @@
while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq - skb->h.th->fin;
- data_was_unread += len;
+ should_send_rst += len;
__kfree_skb(skb);
}
tcp_mem_reclaim(sk);
+ if (tcp_sk(sk)->unconfirmed_openreq != NULL) {
+ if (tcp_is_unconfirmed(tcp_sk(sk)))
+ should_send_rst = 1;
+ tcp_openreq_free(tcp_sk(sk)->unconfirmed_openreq);
+ tcp_sk(sk)->unconfirmed_openreq = NULL;
+ }
+
/* As outlined in draft-ietf-tcpimpl-prob-03.txt, section
* 3.10, we send a RST here because data was lost. To
* witness the awful effects of the old behavior of always
@@ -1975,7 +1995,7 @@
* the FTP client, wheee... Note: timeout is always zero
* in such a case.
*/
- if (data_was_unread) {
+ if (should_send_rst) {
/* Unread data was tossed, zap the connection. */
NET_INC_STATS_USER(TCPAbortOnClose);
tcp_set_state(sk, TCP_CLOSE);
@@ -2145,6 +2165,11 @@
if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
inet_reset_saddr(sk);
+ if (tp->unconfirmed_openreq) {
+ tcp_openreq_free(tp->unconfirmed_openreq);
+ tp->unconfirmed_openreq = NULL;
+ }
+
sk->sk_shutdown = 0;
sock_reset_flag(sk, SOCK_DONE);
tp->srtt = 0;
@@ -2258,8 +2283,10 @@
newsk = req->sk;
tcp_acceptq_removed(sk);
- tcp_openreq_fastfree(req);
- BUG_TRAP(newsk->sk_state != TCP_SYN_RECV);
+ if (tcp_sk(newsk)->unconfirmed_openreq == NULL)
+ tcp_openreq_fastfree(req);
+ BUG_TRAP(tcp_sk(newsk)->unconfirmed_openreq ||
+ newsk->sk_state != TCP_SYN_RECV);
release_sock(sk);
return newsk;
@@ -2428,6 +2455,10 @@
}
break;
+ case TCP_CONFIRM_CONNECT:
+ tp->confirm_connect = !!val;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -2553,6 +2584,9 @@
case TCP_QUICKACK:
val = !tp->ack.pingpong;
break;
+ case TCP_CONFIRM_CONNECT:
+ val = tp->confirm_connect || tcp_is_unconfirmed(tp);
+ break;
default:
return -ENOPROTOOPT;
};
--- linux-2.6.0-test2/net/ipv4/tcp_input.c.orig 2003-08-14 14:19:20.894284428 +0200
+++ linux-2.6.0-test2/net/ipv4/tcp_input.c 2003-08-14 13:42:42.000000000 +0200
@@ -3938,6 +3938,11 @@
switch(sk->sk_state) {
case TCP_SYN_RECV:
if (acceptable) {
+ if (tp->unconfirmed_openreq != NULL) {
+ tcp_openreq_free(tp->unconfirmed_openreq);
+ tp->unconfirmed_openreq = NULL;
+ }
+
tp->copied_seq = tp->rcv_nxt;
mb();
tcp_set_state(sk, TCP_ESTABLISHED);
--- linux-2.6.0-test2/net/ipv4/tcp_ipv4.c.orig 2003-08-14 14:19:20.895284256 +0200
+++ linux-2.6.0-test2/net/ipv4/tcp_ipv4.c 2003-08-14 14:34:31.383363445 +0200
@@ -1403,12 +1403,14 @@
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
+ struct tcp_opt *master_tp = tcp_sk(sk);
struct tcp_opt tp;
struct open_request *req;
__u32 saddr = skb->nh.iph->saddr;
__u32 daddr = skb->nh.iph->daddr;
__u32 isn = TCP_SKB_CB(skb)->when;
struct dst_entry *dst = NULL;
+ int dont_confirm = 0;
#ifdef CONFIG_SYN_COOKIES
int want_cookie = 0;
#else
@@ -1445,6 +1447,9 @@
if (!req)
goto drop;
+ if (!want_cookie && master_tp->confirm_connect)
+ dont_confirm = 1;
+
tcp_clear_options(&tp);
tp.mss_clamp = 536;
tp.user_mss = tcp_sk(sk)->user_mss;
@@ -1533,11 +1538,31 @@
}
req->snt_isn = isn;
- if (tcp_v4_send_synack(sk, req, dst))
+ if (!dont_confirm && tcp_v4_send_synack(sk, req, dst))
goto drop_and_free;
if (want_cookie) {
tcp_openreq_free(req);
+ } else if (dont_confirm) {
+ struct sock *child;
+ __u8 rcv_wscale;
+
+ req->window_clamp = dst ? dst_metric(dst, RTAX_WINDOW) : 0;
+ tcp_select_initial_window(tcp_full_space(sk), req->mss,
+ &req->rcv_wnd, &req->window_clamp,
+ 0, &rcv_wscale);
+ req->rcv_wscale = rcv_wscale;
+
+ child = tcp_v4_syn_recv_sock(sk, skb, req, NULL);
+ if (child != NULL) {
+ req->unconfirmed = 1;
+ tcp_sk(child)->unconfirmed_openreq = req;
+ tcp_acceptq_queue(sk, req, child);
+ sk->sk_data_ready(sk, 0);
+ sock_put(child);
+ } else {
+ tcp_openreq_free(req);
+ }
} else {
tcp_v4_synq_add(sk, req);
}
--- linux-2.6.0-test2/net/ipv4/tcp_minisocks.c.orig 2003-08-14 14:19:20.897283914 +0200
+++ linux-2.6.0-test2/net/ipv4/tcp_minisocks.c 2003-08-14 13:42:42.000000000 +0200
@@ -732,6 +732,7 @@
tcp_init_wl(newtp, req->snt_isn, req->rcv_isn);
newtp->retransmits = 0;
+ newtp->confirm_connect = 0;
newtp->backoff = 0;
newtp->srtt = 0;
newtp->mdev = TCP_TIMEOUT_INIT;
@@ -884,7 +885,8 @@
* Enforce "SYN-ACK" according to figure 8, figure 6
* of RFC793, fixed by RFC1122.
*/
- req->class->rtx_syn_ack(sk, req, NULL);
+ if (!req->unconfirmed)
+ req->class->rtx_syn_ack(sk, req, NULL);
return NULL;
}
@@ -955,7 +957,7 @@
if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
req->rcv_isn+1, req->rcv_isn+1+req->rcv_wnd)) {
/* Out of window: send ACK and drop.
*/ - if (!(flg & TCP_FLAG_RST)) + if (!req->unconfirmed && !(flg & TCP_FLAG_RST)) req->class->send_ack(skb, req); if (paws_reject) NET_INC_STATS_BH(PAWSEstabRejected); @@ -991,6 +993,12 @@ return NULL; } + /* @@@ If we are in SYN_RECV and haven't confirmed/rejected + * the connection yet, this ACK is acking a never-sent packet. + */ + if (tcp_is_unconfirmed(tp)) + return NULL; + /* OK, ACK is valid, create big socket and * feed this segment to it. It will repeat all * the tests. THIS SEGMENT MUST MOVE SOCKET TO --- linux-2.6.0-test2/net/ipv4/tcp_timer.c.orig 2003-08-14 14:19:20.899283572 +0200 +++ linux-2.6.0-test2/net/ipv4/tcp_timer.c 2003-08-14 13:42:42.000000000 +0200 @@ -519,7 +519,8 @@ if (time_after_eq(now, req->expires)) { if ((req->retrans < thresh || (req->acked && req->retrans < max_retries)) - && !req->class->rtx_syn_ack(sk, req, NULL)) { + && (req->unconfirmed || + !req->class->rtx_syn_ack(sk, req, NULL))) { unsigned long timeo; if (req->retrans++ == 0) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH,RFC] explicit connection confirmation
2003-08-14 13:11 ` Lennert Buytenhek
@ 2003-08-25 11:09 ` Harald Welte
0 siblings, 0 replies; 14+ messages in thread
From: Harald Welte @ 2003-08-25 11:09 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 931 bytes --]
On Thu, Aug 14, 2003 at 09:11:56AM -0400, Lennert Buytenhek wrote:
> Hi,
>
> Below is the original email I sent to netdev about nine months ago
> announcing selective connection acceptance support for TCP sockets.
> I have forward-ported the 2.4.18 patch to 2.6.0-test2, included below.
> No functional changes have been made.
That is great! Thanks for not forgetting about this issue. I think this
feature would really be great to develop _really_ transparent proxies.
> Could someone have a look at it?
I'm not a core networking guy, but it looks fine to me.
Dave, Andi, Alexey, James: Would this be a candidate for kernel
inclusion at some point?
> Lennert
--
- Harald Welte <laforge@gnumonks.org> http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
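[Editor's sketch, not part of the thread] The accept-then-decide flow that the original announcement describes, and that a transparent proxy like the one Harald mentions would rely on, might look roughly like this from userland. This is a minimal sketch under stated assumptions: it presumes a kernel carrying Lennert's patch, it takes the option number 13 from the 2.4.18 version of the patch, and the helper names enable_confirm_mode(), peer_is_allowed() and confirm_or_reject() are hypothetical, as is the loopback-only policy.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Value taken from the 2.4.18 version of the patch. ASSUMPTION: the running
 * kernel carries the patch. Mainline kernels do not have this option and use
 * the number 13 for something unrelated, so enable_confirm_mode() must not
 * be called there. */
#ifndef TCP_CONFIRM_CONNECT
#define TCP_CONFIRM_CONNECT 13
#endif

/* Hypothetical helper: put a listening socket into confirm mode.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_confirm_mode(int listen_fd)
{
    int on = 1;

    return setsockopt(listen_fd, IPPROTO_TCP, TCP_CONFIRM_CONNECT,
                      &on, sizeof(on));
}

/* Hypothetical policy: allow only peers in 127.0.0.0/8. */
int peer_is_allowed(const struct sockaddr_in *peer)
{
    return (ntohl(peer->sin_addr.s_addr) >> 24) == 127;
}

/* Act on the decision for a socket returned by accept().
 * With the patch applied, the first write makes the kernel send the
 * SYN-ACK, and closing without writing makes it send a RST.
 * Returns fd if the connection was confirmed, -1 if it was rejected
 * or the write failed (the socket is closed in both -1 cases). */
int confirm_or_reject(int fd, int allow, const char *greeting)
{
    size_t len;

    assert(greeting != NULL);
    if (!allow) {
        close(fd);
        return -1;
    }
    len = strlen(greeting);
    if (write(fd, greeting, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a patched kernel the intended use would be to call enable_confirm_mode() once on the listening socket, then for each accept() pass the peer address through peer_is_allowed() and hand the result to confirm_or_reject(). As the announcement notes, a socket accepted while syncookies were in use is already ESTABLISHED, so the decision comes too late to influence the handshake in that case.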
end of thread, other threads:[~2003-08-25 11:09 UTC | newest]
Thread overview: 14+ messages
2002-11-07 9:32 [PATCH,RFC] explicit connection confirmation Lennert Buytenhek
2002-11-07 11:27 ` bert hubert
2002-11-07 12:09 ` Lennert Buytenhek
2002-11-07 13:36 ` jamal
2002-11-07 15:27 ` Lennert Buytenhek
2002-11-08 11:22 ` jamal
2002-11-08 11:52 ` bert hubert
2002-11-08 11:56 ` Marc Boucher
2002-11-08 18:28 ` Lennert Buytenhek
2002-11-07 13:49 ` bert hubert
2002-11-07 14:30 ` Lennert Buytenhek
2002-11-07 16:24 ` bert hubert
2003-08-14 13:11 ` Lennert Buytenhek
2003-08-25 11:09 ` Harald Welte