[PATCH] ipv4 tcp autobind problem

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ipv4 tcp autobind problem
@ 2003-09-29 13:05 Kovacs Krisztian
  2003-09-30  5:22 ` David S. Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Kovacs Krisztian @ 2003-09-29 13:05 UTC (permalink / raw)
  To: netdev, linux-net

[-- Attachment #1: Type: text/plain, Size: 1258 bytes --]

   Hi,

   While testing the tproxy (transparent proxying) patch for linux-2.4
(http://www.balabit.com/downloads/tproxy/linux-2.4), Stas Grabois has
found a quite strange aspect of Linux 2.4 TCP. Imagine the following
scenario: you create a new socket (AF_INET, SOCK_STREAM), bind it to local
port 0, and try to connect() to a closed port. Of course, the peer sends
back an RST, indicating no one is listening on that port. However, if your
application does not care about the return value of connect(), and calls
send() on the not connected socket, inet_autobind() is called and a new
local port is allocated for the socket. So, besides returning an error,
there is also a side effect of the send(). The same thing happens with an
established TCP session if the peer sends an RST between two send() calls,
the second call will autobind the socket, and then return error.

   Is this behaviour intentional? Isn't rebinding a TCP socket to a new
local port a bug? I mean, possibly inet_sendmsg() should check if the
socket is SOCK_STREAM before calling inet_autobind() if sk->num is zero. 
The attached patch adds this check to inet_sendmsg(). We've been using it 
for a while, and it looks it did not break anything.

-- 
   Regards,
     Krisztian KOVACS

[-- Attachment #2: inet_tcp_autobind_fix.diff --]
[-- Type: text/plain, Size: 400 bytes --]

--- linux-2.4.22/net/ipv4/af_inet.c.orig	Thu Sep 18 10:02:49 2003
+++ linux-2.4.22/net/ipv4/af_inet.c	Thu Sep 18 10:03:56 2003
@@ -751,7 +751,7 @@
 	struct sock *sk = sock->sk;

 	/* We may need to bind the socket. */
-	if (sk->num==0 && inet_autobind(sk) != 0)
+	if (sk->num==0 && sock->type != SOCK_STREAM && inet_autobind(sk) != 0)
 		return -EAGAIN;

 	return sk->prot->sendmsg(sk, msg, size);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipv4 tcp autobind problem
  2003-09-29 13:05 Kovacs Krisztian
@ 2003-09-30  5:22 ` David S. Miller
  2003-09-30  7:14   ` Kovacs Krisztian
  0 siblings, 1 reply; 8+ messages in thread
From: David S. Miller @ 2003-09-30  5:22 UTC (permalink / raw)
  To: Kovacs Krisztian; +Cc: netdev, linux-net

Once connect() has been called on a socket, you may not ever again
perform any action that would try to connect that socket.

Said another way, a socket that has failed to connect() is a socket
that you may not use in any usable way ever again except to close
that file descriptor.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipv4 tcp autobind problem
  2003-09-30  5:22 ` David S. Miller
@ 2003-09-30  7:14   ` Kovacs Krisztian
  0 siblings, 0 replies; 8+ messages in thread
From: Kovacs Krisztian @ 2003-09-30  7:14 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-net

   Hi,

David S. Miller wrote:
> Once connect() has been called on a socket, you may not ever again
> perform any action that would try to connect that socket.
> 
> Said another way, a socket that has failed to connect() is a socket
> that you may not use in any usable way ever again except to close
> that file descriptor.

   Yes, I know that. Exactly this is why I felt it a problem that sk->num 
gets a new value if you call send() after an unsuccessful connect(). (I 
know that in theory one should not call send() in this case, but one _can_ 
call it in reality.) And, the problem occurs not only when send() is 
called after an unsuccessful connect(), but also when an RST is received 
between two send() calls, which is perfectly legal, and cannot be viewed 
as buggy user-space software.

   I don't really understand why the inet_sendmsg() calls inet_autobind() 
for SOCK_STREAM sockets: for these kind of sockets, one must call 
connect() anyway, before doing any other kind of operations. And, the side 
effect of the code

         /* We may need to bind the socket. */
         if (!inet_sk(sk)->num && inet_autobind(sk))
                 return -EAGAIN;

in inet_sendmsg() is that when an RST is received, sk->num is set to zero, 
and when the next inet_sendmsg() call occurs, the socket is rebound to a 
new port _before_ returning an error.

   In our transparent proxying patch, this causes problems, because it 
assumes that after a socket is bound (by connect(), for example), the 
local port (sk->sport) remains unchanged until the socket is closed. 
However, this is not true because of the mentioned side effect of 
inet_sendmsg(). This is why I proposed that inet_sendmsg() should call 
inet_autobind() only if it's not a SOCK_STREAM socket.

-- 
   Regards,
     Krisztian KOVACS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fw: Re: [PATCH] ipv4 tcp autobind problem
       [not found] <20030930024320.5aca8ddc.davem@redhat.com>
@ 2003-09-30 12:37 ` kuznet
  2003-09-30 12:37   ` David S. Miller
  2003-09-30 14:19   ` Kovacs Krisztian
  0 siblings, 2 replies; 8+ messages in thread
From: kuznet @ 2003-09-30 12:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmorris, hidden, netdev, linux-net

Hello!

> in inet_sendmsg() is that when an RST is received, sk->num is set to zero, 

Yes, I remember this. This funny thing was added to avoid using reserved
ports obtained from accept() to do connect(). Before that sockets were never
unbound after they bound once exactly to avoid weirdness of the kind
descibed in your mail, but this happened to be insecure.

>From this mail I still do not see why autobinding of void socket is so
bad thing, that it requires marginal fixing at the place which is already
marginal. What is the real problem? So, bad sendmsg() selects some port as
a side effect. It makes it on udp and tcp. What is the deal? If it is disaster
for tcp, why it is not bad for udp?

> local port (sk->sport) remains unchanged until the socket is closed.

Socket is _closed_. Local port is reset only after socket is closed,
unless PORT_USERLOCK is set. And sk->sport remains unchanged even
after socket is closed, btw, so... 

I do recognize that current behaviour is weird, but I still want to know
how this marginal weirdness escaped to be seen in reality.

Alexey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fw: Re: [PATCH] ipv4 tcp autobind problem
  2003-09-30 12:37 ` Fw: Re: [PATCH] ipv4 tcp autobind problem kuznet
@ 2003-09-30 12:37   ` David S. Miller
  2003-09-30 14:19   ` Kovacs Krisztian
  1 sibling, 0 replies; 8+ messages in thread
From: David S. Miller @ 2003-09-30 12:37 UTC (permalink / raw)
  To: kuznet; +Cc: jmorris, hidden, netdev, linux-net

On Tue, 30 Sep 2003 16:37:48 +0400 (MSD)
kuznet@ms2.inr.ac.ru wrote:

> From this mail I still do not see why autobinding of void socket is so
> bad thing, that it requires marginal fixing at the place which is already
> marginal. What is the real problem?

I think it is some detail about their transparent proxy kernel
hacks, remember that awful stuff? :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fw: Re: [PATCH] ipv4 tcp autobind problem
  2003-09-30 12:37 ` Fw: Re: [PATCH] ipv4 tcp autobind problem kuznet
  2003-09-30 12:37   ` David S. Miller
@ 2003-09-30 14:19   ` Kovacs Krisztian
  2003-10-07 11:56     ` kuznet
  1 sibling, 1 reply; 8+ messages in thread
From: Kovacs Krisztian @ 2003-09-30 14:19 UTC (permalink / raw)
  To: kuznet; +Cc: David S. Miller, jmorris, netdev, linux-net, Balazs Scheidler

   Hi,

kuznet@ms2.inr.ac.ru wrote:
>>in inet_sendmsg() is that when an RST is received, sk->num is set to zero, 
> 
> Yes, I remember this. This funny thing was added to avoid using reserved
> ports obtained from accept() to do connect(). Before that sockets were never
> unbound after they bound once exactly to avoid weirdness of the kind
> descibed in your mail, but this happened to be insecure.
> 
> From this mail I still do not see why autobinding of void socket is so
> bad thing, that it requires marginal fixing at the place which is already
> marginal. What is the real problem? So, bad sendmsg() selects some port as
> a side effect. It makes it on udp and tcp. What is the deal? If it is disaster
> for tcp, why it is not bad for udp?
> 
>>local port (sk->sport) remains unchanged until the socket is closed.
> 
> Socket is _closed_. Local port is reset only after socket is closed,
> unless PORT_USERLOCK is set. And sk->sport remains unchanged even
> after socket is closed, btw, so... 

   No, sk->sport does not remain unchanged. Imagine the following 
situation: the TCP stack receives an RST, tcp_reset() gets called. It 
calls tcp_done() -> tcp_set_state(TCP_CLOSE) -> tcp_put_port() -> 
__tcp_put_port(). __tcp_put_port() sets sk->num to zero. So, when you call 
send() from userspace, the mentioned part of inet_sendmsg() calls 
inet_autobind(), which changes sk->sport, too...

> I do recognize that current behaviour is weird, but I still want to know
> how this marginal weirdness escaped to be seen in reality.

   Yes, it is our "transparent proxy kernel hack" ((C) davem :) that have 
problem with this behaviour. Transparent proxying works the following way:

- You have to assign a foreign address to a bound socket using a specific 
setsockopt call. This creates a new entry in the tproxy local IP hash 
table, and the hash key is based on the local ip:port.
- tproxy registers its Netfilter hooks, and if a new connection comes in, 
it looks up its hash tables, to see if it is a transparent proxied 
connection (it uses Netfilter's connection tracking subsystem). If it has 
to be proxyed, the appropriate NAT mappings are applied to the connection.
- When the socket is closed (inet_release() is called, tproxy has its 
"callback" function here), the corresponding entries are deleted from 
tproxy's hash tables, again, based on the local ip:port pair.

   So, the weirdness causes the following problem: since sk->sport changes 
after the corresponding entry was inserted into the hash table, it 
changes, so the entry cannot be deleted when userspace calls close() on 
the socket.

   Eventually, it should be enough if inet_sendmsg() did not call 
inet_autobind() if the socket is in an errorneous state, but I had no idea 
how this could be implemented. (sk->err gets zeroed by sock_error(), so it 
cannot be checked twice.)

   In the case of UDP, you cannot get an RST-like thing, so AFAIK there is 
no such situation in which sk->num of a bound socket could be reset to 
zero. This is why inet_sendmsg()'s mentioned part did not cause problems 
for us.

-- 
   Regards,
     Krisztian KOVACS

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fw: Re: [PATCH] ipv4 tcp autobind problem
  2003-09-30 14:19   ` Kovacs Krisztian
@ 2003-10-07 11:56     ` kuznet
  2003-10-07 12:35       ` Kovacs Krisztian
  0 siblings, 1 reply; 8+ messages in thread
From: kuznet @ 2003-10-07 11:56 UTC (permalink / raw)
  To: Kovacs Krisztian
  Cc: David S. Miller, jmorris, netdev, linux-net, Balazs Scheidler

Hello!

> - When the socket is closed (inet_release() is called, tproxy has its 
> "callback" function here), the corresponding entries are deleted from 
> tproxy's hash tables, again, based on the local ip:port pair.

The mistake is here. inet_release() has nothing to do to connection
state. Socket is not closed at this point, it goes to FIN-WAIT*, LAST-ACK or
something like this and can exist for long time after this and proxying
is to be continued all this time.

Right place to finish tracking is when socket is removed from TCP
hash tables, and to start tracking is when the socket is inserted
to TCP hash tables. BTW you would not see the problem with binding
if it was made right.

Alexey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ipv4 tcp autobind problem
  2003-10-07 11:56     ` kuznet
@ 2003-10-07 12:35       ` Kovacs Krisztian
  0 siblings, 0 replies; 8+ messages in thread
From: Kovacs Krisztian @ 2003-10-07 12:35 UTC (permalink / raw)
  To: kuznet; +Cc: David S. Miller, jmorris, netdev, linux-net, Balazs Scheidler

   Hi,

kuznet@ms2.inr.ac.ru wrote:
> Right place to finish tracking is when socket is removed from TCP
> hash tables, and to start tracking is when the socket is inserted
> to TCP hash tables. BTW you would not see the problem with binding
> if it was made right.

   Thanks for the reply. In the meantime, although because of slightly 
different reasons, we also came to the same conclusion. Right now I'm 
testing a version in which our "unassign" hook is removed from 
inet_release(), and added to udp_v4_unhash() and tcp_put_port().
However, I'm not quite sure there won't be locking problems when calling 
our hook from these places.

-- 
   Regards,
     Krisztian KOVACS

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-10-07 12:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20030930024320.5aca8ddc.davem@redhat.com>
2003-09-30 12:37 ` Fw: Re: [PATCH] ipv4 tcp autobind problem kuznet
2003-09-30 12:37   ` David S. Miller
2003-09-30 14:19   ` Kovacs Krisztian
2003-10-07 11:56     ` kuznet
2003-10-07 12:35       ` Kovacs Krisztian
2003-09-29 13:05 Kovacs Krisztian
2003-09-30  5:22 ` David S. Miller
2003-09-30  7:14   ` Kovacs Krisztian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).