[PATCH] ipv4: synchronize bind() with RTM

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
@ 2010-10-21 10:12 Timo Teräs
  2010-10-21 10:25 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 10:12 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teräs

Otherwise we have race condition to user land:
 1. process A changes IP address
 2. kernel sends RTM_NEWADDR
 3. process B gets notification
 4. process B tries to bind() to new IP but that fails with
EADDRNOTAVAIL because FIB is not yet updated and inet_addr_type() in
inet_bind() does not recognize the IP as local
 5. kernel calls inetaddr_chain notifiers which updates FIB

IPv6 side seems to handle the notifications properly: bind()
immediately after RTM_NEWADDR succeeds as expected. This is because
ipv6_chk_addr() uses inet6_addr_lst which is updated before address
notification.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
---
 net/ipv4/af_inet.c  |    9 +++++++++
 net/ipv6/af_inet6.c |    4 +++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6a1100c..21200e4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -466,6 +466,15 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	if (addr_len < sizeof(struct sockaddr_in))
 		goto out;
 
+	/* Acquire rtnl_lock to synchronize with possible simultaneous
+	 * IP-address changes. This is needed because when RTM_NEWADDR
+	 * is sent the new IP is not yet in FIB, but alas inet_addr_type
+	 * checks the address type using FIB. Acquiring rtnl lock once
+	 * makse sure that any address for which RTM_NEWADDR was sent
+	 * earlier exists also in FIB. */
+	rtnl_lock();
+	rtnl_unlock();
+
 	chk_addr_ret = inet_addr_type(sock_net(sk), addr->sin_addr.s_addr);
 
 	/* Not specified by any standard per-se, however it breaks too
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 56b9bf2..6fc37f4 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -300,7 +300,9 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 		}
 
-		/* Reproduce AF_INET checks to make the bindings consitant */
+		/* Reproduce AF_INET checks to make the bindings consistent */
+		rtnl_lock();
+		rtnl_unlock();
 		v4addr = addr->sin6_addr.s6_addr32[3];
 		chk_addr_ret = inet_addr_type(net, v4addr);
 		if (!sysctl_ip_nonlocal_bind &&
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:12 [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications Timo Teräs
@ 2010-10-21 10:25 ` Eric Dumazet
  2010-10-21 10:41   ` Timo Teräs
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-10-21 10:25 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev

Le jeudi 21 octobre 2010 à 13:12 +0300, Timo Teräs a écrit :
> Otherwise we have race condition to user land:
>  1. process A changes IP address
>  2. kernel sends RTM_NEWADDR
>  3. process B gets notification
>  4. process B tries to bind() to new IP but that fails with
> EADDRNOTAVAIL because FIB is not yet updated and inet_addr_type() in
> inet_bind() does not recognize the IP as local
>  5. kernel calls inetaddr_chain notifiers which updates FIB
> 
> IPv6 side seems to handle the notifications properly: bind()
> immediately after RTM_NEWADDR succeeds as expected. This is because
> ipv6_chk_addr() uses inet6_addr_lst which is updated before address
> notification.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> ---
>  net/ipv4/af_inet.c  |    9 +++++++++
>  net/ipv6/af_inet6.c |    4 +++-
>  2 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 6a1100c..21200e4 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -466,6 +466,15 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  	if (addr_len < sizeof(struct sockaddr_in))
>  		goto out;
>  
> +	/* Acquire rtnl_lock to synchronize with possible simultaneous
> +	 * IP-address changes. This is needed because when RTM_NEWADDR
> +	 * is sent the new IP is not yet in FIB, but alas inet_addr_type
> +	 * checks the address type using FIB. Acquiring rtnl lock once
> +	 * makse sure that any address for which RTM_NEWADDR was sent
> +	 * earlier exists also in FIB. */
> +	rtnl_lock();
> +	rtnl_unlock();

You must be kidding ?

Really, this is a hot path...




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:25 ` Eric Dumazet
@ 2010-10-21 10:41   ` Timo Teräs
  2010-10-21 10:50     ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 10:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 10/21/2010 01:25 PM, Eric Dumazet wrote:
> Le jeudi 21 octobre 2010 à 13:12 +0300, Timo Teräs a écrit :
>> Otherwise we have race condition to user land:
>>  1. process A changes IP address
>>  2. kernel sends RTM_NEWADDR
>>  3. process B gets notification
>>  4. process B tries to bind() to new IP but that fails with
>> EADDRNOTAVAIL because FIB is not yet updated and inet_addr_type() in
>> inet_bind() does not recognize the IP as local
>>  5. kernel calls inetaddr_chain notifiers which updates FIB
>>
>> IPv6 side seems to handle the notifications properly: bind()
>> immediately after RTM_NEWADDR succeeds as expected. This is because
>> ipv6_chk_addr() uses inet6_addr_lst which is updated before address
>> notification.
>>
>> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
>> ---
>>  net/ipv4/af_inet.c  |    9 +++++++++
>>  net/ipv6/af_inet6.c |    4 +++-
>>  2 files changed, 12 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 6a1100c..21200e4 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -466,6 +466,15 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>>  	if (addr_len < sizeof(struct sockaddr_in))
>>  		goto out;
>>  
>> +	/* Acquire rtnl_lock to synchronize with possible simultaneous
>> +	 * IP-address changes. This is needed because when RTM_NEWADDR
>> +	 * is sent the new IP is not yet in FIB, but alas inet_addr_type
>> +	 * checks the address type using FIB. Acquiring rtnl lock once
>> +	 * makse sure that any address for which RTM_NEWADDR was sent
>> +	 * earlier exists also in FIB. */
>> +	rtnl_lock();
>> +	rtnl_unlock();
> 
> You must be kidding ?
> 
> Really, this is a hot path...

Is inet_bind() called from non-userland context? If yes, then this is a
bad idea. Otherwise I don't think it's that hot path...

The other idea of doing notifier calls before RTM_NEWADDR sending is
worse because it changes ordering of userland visible netlink notifications.

This looked like the easiest way out. If this is unacceptable, I guess
we are left with changing inet_addr_type() to not use FIB.

Or is there better ideas?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:41   ` Timo Teräs
@ 2010-10-21 10:50     ` David Miller
  2010-10-21 10:58       ` Timo Teräs
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2010-10-21 10:50 UTC (permalink / raw)
  To: timo.teras; +Cc: eric.dumazet, netdev

From: Timo Teräs <timo.teras@iki.fi>
Date: Thu, 21 Oct 2010 13:41:37 +0300

> Is inet_bind() called from non-userland context? If yes, then this is a
> bad idea. Otherwise I don't think it's that hot path...

It is.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:50     ` David Miller
@ 2010-10-21 10:58       ` Timo Teräs
  2010-10-21 11:03         ` David Miller
  2010-10-21 11:12         ` [PATCH] " Eric Dumazet
  0 siblings, 2 replies; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 10:58 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

On 10/21/2010 01:50 PM, David Miller wrote:
> From: Timo Teräs <timo.teras@iki.fi>
> Date: Thu, 21 Oct 2010 13:41:37 +0300
> 
>> Is inet_bind() called from non-userland context? If yes, then this is a
>> bad idea. Otherwise I don't think it's that hot path...
> 
> It is.

Yet, almost immediately after that there is lock_sock() which can also
sleep. How does that work then?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:58       ` Timo Teräs
@ 2010-10-21 11:03         ` David Miller
  2010-10-21 11:29           ` Timo Teräs
  2010-10-21 11:12         ` [PATCH] " Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2010-10-21 11:03 UTC (permalink / raw)
  To: timo.teras; +Cc: eric.dumazet, netdev

From: Timo Teräs <timo.teras@iki.fi>
Date: Thu, 21 Oct 2010 13:58:08 +0300

> On 10/21/2010 01:50 PM, David Miller wrote:
>> From: Timo Teräs <timo.teras@iki.fi>
>> Date: Thu, 21 Oct 2010 13:41:37 +0300
>> 
>>> Is inet_bind() called from non-userland context? If yes, then this is a
>>> bad idea. Otherwise I don't think it's that hot path...
>> 
>> It is.
> 
> Yet, almost immediately after that there is lock_sock() which can also
> sleep. How does that work then?

RTNL interlocks globally with a heavy primitive, a mutex, lock_sock()
grabs a spinlcok which is local to the socket's context.

So if we have 100,000 sockets binding we'll suck with you're change
whereas the lock_sock() does not cause that problem.

Is this so difficult for you to comprehend?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 10:58       ` Timo Teräs
  2010-10-21 11:03         ` David Miller
@ 2010-10-21 11:12         ` Eric Dumazet
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-10-21 11:12 UTC (permalink / raw)
  To: Timo Teräs; +Cc: David Miller, netdev

Le jeudi 21 octobre 2010 à 13:58 +0300, Timo Teräs a écrit :
> On 10/21/2010 01:50 PM, David Miller wrote:
> > From: Timo Teräs <timo.teras@iki.fi>
> > Date: Thu, 21 Oct 2010 13:41:37 +0300
> > 
> >> Is inet_bind() called from non-userland context? If yes, then this is a
> >> bad idea. Otherwise I don't think it's that hot path...
> > 
> > It is.
> 
> Yet, almost immediately after that there is lock_sock() which can also
> sleep. How does that work then?
> 

I am not sure I understand your question. Maybe my answer wont be clear.

rtnl_lock() can take ages on some setups, because its using one single
and shared mutex. Its use should be restricted to administrative
purposes.

bind() is a pretty hot path on many workloads, this is hardly what we
call an administrative task.

lock_sock() uses a per socket lock, and it is a _bit_ more scalable, you
can have 4096 cpus all using bind()/recv()/send()/... at the same time,
it just works.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 11:03         ` David Miller
@ 2010-10-21 11:29           ` Timo Teräs
  2010-10-21 11:34             ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 11:29 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

On 10/21/2010 02:03 PM, David Miller wrote:
> From: Timo Teräs <timo.teras@iki.fi>
> Date: Thu, 21 Oct 2010 13:58:08 +0300
> 
>> On 10/21/2010 01:50 PM, David Miller wrote:
>>> From: Timo Teräs <timo.teras@iki.fi>
>>> Date: Thu, 21 Oct 2010 13:41:37 +0300
>>>
>>>> Is inet_bind() called from non-userland context? If yes, then this is a
>>>> bad idea. Otherwise I don't think it's that hot path...
>>>
>>> It is.
>>
>> Yet, almost immediately after that there is lock_sock() which can also
>> sleep. How does that work then?
> 
> RTNL interlocks globally with a heavy primitive, a mutex, lock_sock()
> grabs a spinlcok which is local to the socket's context.
> 
> So if we have 100,000 sockets binding we'll suck with you're change
> whereas the lock_sock() does not cause that problem.
> 
> Is this so difficult for you to comprehend?

I was confused with Dave's original reply "It is." as referring to that
inet_bind() can get called from non-userland context. But apparently you
just meant that "It is (bad idea regardless)."

I thought the problem was possible sleeping, and not contention. Which
became very obvious from Eric's example. I didn't realize that many do
bind()/recv()/send() as general workload.

Sorry for not seeing the obvious.

This is the third time asking, what would be a good way to fix the
problem described in the original commit log?

Changing RTM_NEWADDR after FIB update would break Netlink event
ordering. And this breaks performance. I can't really use RTN_LOCAL
RTM_NEWROUTE events since (at least IPv6 side) has incorrect ifindex.

Should inet_addr_type() be rewritten to not use FIB lookups?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 11:29           ` Timo Teräs
@ 2010-10-21 11:34             ` David Miller
  2010-10-21 11:57               ` Timo Teräs
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2010-10-21 11:34 UTC (permalink / raw)
  To: timo.teras; +Cc: eric.dumazet, netdev

From: Timo Teräs <timo.teras@iki.fi>
Date: Thu, 21 Oct 2010 14:29:22 +0300

> This is the third time asking, what would be a good way to fix the
> problem described in the original commit log?

I kept your report in my inbox backlog and intended to think about
it as time permitted.

As the merge window has just openned up, for me time will not be
"permitted" for some time.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 11:34             ` David Miller
@ 2010-10-21 11:57               ` Timo Teräs
  2010-10-21 13:06                 ` [PATCH v2] " Timo Teräs
  0 siblings, 1 reply; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 11:57 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

On 10/21/2010 02:34 PM, David Miller wrote:
> From: Timo Teräs <timo.teras@iki.fi>
> Date: Thu, 21 Oct 2010 14:29:22 +0300
> 
>> This is the third time asking, what would be a good way to fix the
>> problem described in the original commit log?
> 
> I kept your report in my inbox backlog and intended to think about
> it as time permitted.
> 
> As the merge window has just openned up, for me time will not be
> "permitted" for some time.

Fair enough. Just getting multiple "does not work" without single
mention of "will think about this later" felt frustrating.

I have one more alternative: Would it be acceptable if we did
rtnl_lock()/rtnl_unlock() only if inet_addr_type() earlier said the
address is unacceptable for binding. And then retry inet_addr_type(). Or
do you think that the error case EADDRNOTAVAIL is a hot path too?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 11:57               ` Timo Teräs
@ 2010-10-21 13:06                 ` Timo Teräs
  2010-10-21 14:10                   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 13:06 UTC (permalink / raw)
  To: David Miller, eric.dumazet, netdev; +Cc: Timo Teräs

Otherwise we have race condition to user land:
 1. process A: changes IP address
 2. process A: kernel sends RTM_NEWADDR (and schedules out)
 3. process B: gets notification
 4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
      because FIB is not yet updated and inet_addr_type() in inet_bind()
      does not recognize the IP as local
 5. process A: calls inetaddr_chain notifiers which updates FIB

Fix the error path to synchronize with configuration changes and retry
the address type check.

IPv6 side seems to handle the notifications properly: bind() immediately
after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
uses inet6_addr_lst which is updated before address notification.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
---
Since there was no reply to my question if this is ok, I interpreted it
as "maybe". So here's the code for review. Hopefully this helps determining
if this is an acceptable fix.

 net/ipv4/af_inet.c  |   18 ++++++++++++++++--
 net/ipv6/af_inet6.c |   13 ++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6a1100c..013ab11 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -481,8 +481,22 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
 	    chk_addr_ret != RTN_LOCAL &&
 	    chk_addr_ret != RTN_MULTICAST &&
-	    chk_addr_ret != RTN_BROADCAST)
-		goto out;
+	    chk_addr_ret != RTN_BROADCAST) {
+		/* inet_addr_type() does a FIB lookup to check the
+		 * address type. Since FIB is updated after sending
+		 * RTM_NEWADDR notification, an application may end up
+		 * doing bind() before the FIB is updated. To avoid
+		 * returning a false negative, wait for possible ongoing
+		 * address changes to finish by acquiring rtnl lock and
+		 * retry the address type lookup. */
+		rtnl_lock();
+		rtnl_unlock();
+		chk_addr_ret = inet_addr_type(sock_net(sk), addr->sin_addr.s_addr);
+		if (chk_addr_ret != RTN_LOCAL &&
+		    chk_addr_ret != RTN_MULTICAST &&
+		    chk_addr_ret != RTN_BROADCAST)
+			goto out;
+	}
 
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 56b9bf2..b1a83e1 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -300,7 +300,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 		}
 
-		/* Reproduce AF_INET checks to make the bindings consitant */
+		/* Reproduce AF_INET checks to make the bindings consistent */
 		v4addr = addr->sin6_addr.s6_addr32[3];
 		chk_addr_ret = inet_addr_type(net, v4addr);
 		if (!sysctl_ip_nonlocal_bind &&
@@ -309,8 +309,15 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		    chk_addr_ret != RTN_LOCAL &&
 		    chk_addr_ret != RTN_MULTICAST &&
 		    chk_addr_ret != RTN_BROADCAST) {
-			err = -EADDRNOTAVAIL;
-			goto out;
+			rtnl_lock();
+			rtnl_unlock();
+			chk_addr_ret = inet_addr_type(net, v4addr);
+			if (chk_addr_ret != RTN_LOCAL &&
+			    chk_addr_ret != RTN_MULTICAST &&
+			    chk_addr_ret != RTN_BROADCAST) {
+				err = -EADDRNOTAVAIL;
+				goto out;
+			}
 		}
 	} else {
 		if (addr_type != IPV6_ADDR_ANY) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 13:06                 ` [PATCH v2] " Timo Teräs
@ 2010-10-21 14:10                   ` Eric Dumazet
  2010-10-21 19:01                     ` Timo Teräs
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-10-21 14:10 UTC (permalink / raw)
  To: Timo Teräs; +Cc: David Miller, netdev

Le jeudi 21 octobre 2010 à 16:06 +0300, Timo Teräs a écrit :
> Otherwise we have race condition to user land:
>  1. process A: changes IP address
>  2. process A: kernel sends RTM_NEWADDR (and schedules out)
>  3. process B: gets notification
>  4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
>       because FIB is not yet updated and inet_addr_type() in inet_bind()
>       does not recognize the IP as local
>  5. process A: calls inetaddr_chain notifiers which updates FIB
> 
> Fix the error path to synchronize with configuration changes and retry
> the address type check.
> 
> IPv6 side seems to handle the notifications properly: bind() immediately
> after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
> uses inet6_addr_lst which is updated before address notification.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> ---
> Since there was no reply to my question if this is ok, I interpreted it
> as "maybe". So here's the code for review. Hopefully this helps determining
> if this is an acceptable fix.
> 

Just say : no

Really Timo, this problem must get another fix.

I understand you need an urgent fix, you can use your patch in the
meantime, of course ;)




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] ipv4: synchronize bind() with RTM_NEWADDR notifications
  2010-10-21 14:10                   ` Eric Dumazet
@ 2010-10-21 19:01                     ` Timo Teräs
  0 siblings, 0 replies; 13+ messages in thread
From: Timo Teräs @ 2010-10-21 19:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 10/21/2010 05:10 PM, Eric Dumazet wrote:
> Le jeudi 21 octobre 2010 à 16:06 +0300, Timo Teräs a écrit :
>> Otherwise we have race condition to user land:
>>  1. process A: changes IP address
>>  2. process A: kernel sends RTM_NEWADDR (and schedules out)
>>  3. process B: gets notification
>>  4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
>>       because FIB is not yet updated and inet_addr_type() in inet_bind()
>>       does not recognize the IP as local
>>  5. process A: calls inetaddr_chain notifiers which updates FIB
>>
>> Fix the error path to synchronize with configuration changes and retry
>> the address type check.
>>
>> IPv6 side seems to handle the notifications properly: bind() immediately
>> after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
>> uses inet6_addr_lst which is updated before address notification.
>>
>> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
>> ---
>> Since there was no reply to my question if this is ok, I interpreted it
>> as "maybe". So here's the code for review. Hopefully this helps determining
>> if this is an acceptable fix.
> 
> Just say : no
> 
> Really Timo, this problem must get another fix.
> 
> I understand you need an urgent fix, you can use your patch in the
> meantime, of course ;)

Fine. I figured you might feel this way. My final idea to fix this, is
to modify inet_addr_type() do the address type check using the ifa
lists. This probably involves making ifa lists rcu (did not take close
look on how ifa lists work in current ipv4 code). This is basically how
ipv6 side works. I'm not too sure how to go on with this, so I'll wait
up until someone more qualified gets the time to look at this.

I think for my immediate needs I might be able to get away with using
the kludge patch, enabling non-local binding or fixing my userland code
to listen RTM_NEWROUTE/RTN_LOCAL events (though, IPv6 link-local
addresses are special and would need RTM_NEWADDR handling still to get
the real device's ifindex).

Btw. why do IPv6 RTN_LOCAL routes have loopback interface as dst instead
of the real device? IPv4 RTN_LOCAL routes seem to have the real device
so this looks inconsistent at first sight. I guess IPv6 requires this
for a bunch of other things. This way it's just not really possible to
get link-local IPv6 routes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-10-21 19:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-21 10:12 [PATCH] ipv4: synchronize bind() with RTM_NEWADDR notifications Timo Teräs
2010-10-21 10:25 ` Eric Dumazet
2010-10-21 10:41   ` Timo Teräs
2010-10-21 10:50     ` David Miller
2010-10-21 10:58       ` Timo Teräs
2010-10-21 11:03         ` David Miller
2010-10-21 11:29           ` Timo Teräs
2010-10-21 11:34             ` David Miller
2010-10-21 11:57               ` Timo Teräs
2010-10-21 13:06                 ` [PATCH v2] " Timo Teräs
2010-10-21 14:10                   ` Eric Dumazet
2010-10-21 19:01                     ` Timo Teräs
2010-10-21 11:12         ` [PATCH] " Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).