[PATCH] ipvs: Add missing locking during connection table hashing and unhashing

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ipvs: Add missing locking during connection table hashing and unhashing
@ 2010-05-20 20:55 Sven Wegener
  2010-05-24 23:56 ` Simon Horman
  0 siblings, 1 reply; 4+ messages in thread
From: Sven Wegener @ 2010-05-20 20:55 UTC (permalink / raw)
  To: Simon Horman, Julian Anastasov, Wensong Zhang; +Cc: netdev, lvs-devel

The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.

Here is what happens in pretty verbose form:

CPU 0					CPU 1
------------				------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.

					IRQ assignment is changed to this CPU,
					but the expire timer stays scheduled on
					the other CPU.

					New connection from same ip:port comes
					in right before the timer expires, we
					find the inactive connection in our
					connection table and get a reference to
					it. We proper lock the connection in
					tcp_state_transition() and read the
					connection flags in set_tcp_state().

ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!

					While still holding proper locks we
					write the connection flags in
					set_tcp_state() and this sets the hashed
					flag again.

ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.

					We drop the reference we held on the
					connection and schedule the expire timer
					for timeouting the connection on this
					CPU. Further packets won't be able to
					find this connection in our connection
					table.

					ip_vs_conn_expire() gets called again,
					we think it's already hashed, but the
					list_head is dangling and while removing
					the connection from our connection table
					we write to the memory location where
					this list_head points to.

The result will probably be a kernel oops at some other point in time.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
---
 net/netfilter/ipvs/ip_vs_conn.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d8f7e8e..ff04e9e 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -162,6 +162,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
 		list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
@@ -174,6 +175,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 		ret = 0;
 	}

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;
@@ -193,6 +195,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (cp->flags & IP_VS_CONN_F_HASHED) {
 		list_del(&cp->c_list);
@@ -202,6 +205,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	} else
 		ret = 0;

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ipvs: Add missing locking during connection table hashing and unhashing
  2010-05-20 20:55 [PATCH] ipvs: Add missing locking during connection table hashing and unhashing Sven Wegener
@ 2010-05-24 23:56 ` Simon Horman
  2010-06-08  6:29   ` Sven Wegener
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Horman @ 2010-05-24 23:56 UTC (permalink / raw)
  To: Sven Wegener; +Cc: Julian Anastasov, Wensong Zhang, netdev, lvs-devel

On Thu, May 20, 2010 at 10:55:32PM +0200, Sven Wegener wrote:
> The code that hashes and unhashes connections from the connection table
> is missing locking of the connection being modified, which opens up a
> race condition and results in memory corruption when this race condition
> is hit.
> 
> Here is what happens in pretty verbose form:
> 
> CPU 0					CPU 1
> ------------				------------
> An active connection is terminated and
> we schedule ip_vs_conn_expire() on this
> CPU to expire this connection.
> 
> 					IRQ assignment is changed to this CPU,
> 					but the expire timer stays scheduled on
> 					the other CPU.
> 
> 					New connection from same ip:port comes
> 					in right before the timer expires, we
> 					find the inactive connection in our
> 					connection table and get a reference to
> 					it. We proper lock the connection in
> 					tcp_state_transition() and read the
> 					connection flags in set_tcp_state().
> 
> ip_vs_conn_expire() gets called, we
> unhash the connection from our
> connection table and remove the hashed
> flag in ip_vs_conn_unhash(), without
> proper locking!
> 
> 					While still holding proper locks we
> 					write the connection flags in
> 					set_tcp_state() and this sets the hashed
> 					flag again.
> 
> ip_vs_conn_expire() fails to expire the
> connection, because the other CPU has
> incremented the reference count. We try
> to re-insert the connection into our
> connection table, but this fails in
> ip_vs_conn_hash(), because the hashed
> flag has been set by the other CPU. We
> re-schedule execution of
> ip_vs_conn_expire(). Now this connection
> has the hashed flag set, but isn't
> actually hashed in our connection table
> and has a dangling list_head.
> 
> 					We drop the reference we held on the
> 					connection and schedule the expire timer
> 					for timeouting the connection on this
> 					CPU. Further packets won't be able to
> 					find this connection in our connection
> 					table.
> 
> 					ip_vs_conn_expire() gets called again,
> 					we think it's already hashed, but the
> 					list_head is dangling and while removing
> 					the connection from our connection table
> 					we write to the memory location where
> 					this list_head points to.
> 
> The result will probably be a kernel oops at some other point in time.

Nice analysis.

> Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
> Cc: stable@kernel.org

Acked-by: Simon Horman <horms@verge.net.au>

> ---
>  net/netfilter/ipvs/ip_vs_conn.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> This race condition is pretty subtle, but it can be triggered remotely.
> It needs the IRQ assignment change or another circumstance where packets
> coming from the same ip:port for the same service are being processed on
> different CPUs. And it involves hitting the exact time at which
> ip_vs_conn_expire() gets called. It can be avoided by making sure that
> all packets from one connection are always processed on the same CPU and
> can be made harder to exploit by changing the connection timeouts to
> some custom values.
> 
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index d8f7e8e..ff04e9e 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -162,6 +162,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
>  	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);
>  
>  	ct_write_lock(hash);
> +	spin_lock(&cp->lock);
>  
>  	if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
>  		list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
> @@ -174,6 +175,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
>  		ret = 0;
>  	}
>  
> +	spin_unlock(&cp->lock);
>  	ct_write_unlock(hash);
>  
>  	return ret;
> @@ -193,6 +195,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
>  	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);
>  
>  	ct_write_lock(hash);
> +	spin_lock(&cp->lock);
>  
>  	if (cp->flags & IP_VS_CONN_F_HASHED) {
>  		list_del(&cp->c_list);
> @@ -202,6 +205,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
>  	} else
>  		ret = 0;
>  
> +	spin_unlock(&cp->lock);
>  	ct_write_unlock(hash);
>  
>  	return ret;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] ipvs: Add missing locking during connection table hashing and unhashing
  2010-05-24 23:56 ` Simon Horman
@ 2010-06-08  6:29   ` Sven Wegener
  2010-06-09 14:12     ` Patrick McHardy
  0 siblings, 1 reply; 4+ messages in thread
From: Sven Wegener @ 2010-06-08  6:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Julian Anastasov, Simon Horman, Wensong Zhang, netdev, lvs-devel

The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.

Here is what happens in pretty verbose form:

CPU 0					CPU 1
------------				------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.

					IRQ assignment is changed to this CPU,
					but the expire timer stays scheduled on
					the other CPU.

					New connection from same ip:port comes
					in right before the timer expires, we
					find the inactive connection in our
					connection table and get a reference to
					it. We proper lock the connection in
					tcp_state_transition() and read the
					connection flags in set_tcp_state().

ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!

					While still holding proper locks we
					write the connection flags in
					set_tcp_state() and this sets the hashed
					flag again.

ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.

					We drop the reference we held on the
					connection and schedule the expire timer
					for timeouting the connection on this
					CPU. Further packets won't be able to
					find this connection in our connection
					table.

					ip_vs_conn_expire() gets called again,
					we think it's already hashed, but the
					list_head is dangling and while removing
					the connection from our connection table
					we write to the memory location where
					this list_head points to.

The result will probably be a kernel oops at some other point in time.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
Acked-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_conn.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d8f7e8e..ff04e9e 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -162,6 +162,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
 		list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
@@ -174,6 +175,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 		ret = 0;
 	}

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;
@@ -193,6 +195,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	hash = ip_vs_conn_hashkey(cp->af, cp->protocol, &cp->caddr, cp->cport);

 	ct_write_lock(hash);
+	spin_lock(&cp->lock);

 	if (cp->flags & IP_VS_CONN_F_HASHED) {
 		list_del(&cp->c_list);
@@ -202,6 +205,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	} else
 		ret = 0;

+	spin_unlock(&cp->lock);
 	ct_write_unlock(hash);

 	return ret;

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ipvs: Add missing locking during connection table hashing and unhashing
  2010-06-08  6:29   ` Sven Wegener
@ 2010-06-09 14:12     ` Patrick McHardy
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick McHardy @ 2010-06-09 14:12 UTC (permalink / raw)
  To: Sven Wegener
  Cc: Julian Anastasov, Simon Horman, Wensong Zhang, netdev, lvs-devel

Sven Wegener wrote:
> The code that hashes and unhashes connections from the connection table
> is missing locking of the connection being modified, which opens up a
> race condition and results in memory corruption when this race condition
> is hit.
>
> Here is what happens in pretty verbose form:
>
> CPU 0					CPU 1
> ------------				------------
> An active connection is terminated and
> we schedule ip_vs_conn_expire() on this
> CPU to expire this connection.
>
> 					IRQ assignment is changed to this CPU,
> 					but the expire timer stays scheduled on
> 					the other CPU.
>
> 					New connection from same ip:port comes
> 					in right before the timer expires, we
> 					find the inactive connection in our
> 					connection table and get a reference to
> 					it. We proper lock the connection in
> 					tcp_state_transition() and read the
> 					connection flags in set_tcp_state().
>
> ip_vs_conn_expire() gets called, we
> unhash the connection from our
> connection table and remove the hashed
> flag in ip_vs_conn_unhash(), without
> proper locking!
>
> 					While still holding proper locks we
> 					write the connection flags in
> 					set_tcp_state() and this sets the hashed
> 					flag again.
>
> ip_vs_conn_expire() fails to expire the
> connection, because the other CPU has
> incremented the reference count. We try
> to re-insert the connection into our
> connection table, but this fails in
> ip_vs_conn_hash(), because the hashed
> flag has been set by the other CPU. We
> re-schedule execution of
> ip_vs_conn_expire(). Now this connection
> has the hashed flag set, but isn't
> actually hashed in our connection table
> and has a dangling list_head.
>
> 					We drop the reference we held on the
> 					connection and schedule the expire timer
> 					for timeouting the connection on this
> 					CPU. Further packets won't be able to
> 					find this connection in our connection
> 					table.
>
> 					ip_vs_conn_expire() gets called again,
> 					we think it's already hashed, but the
> 					list_head is dangling and while removing
> 					the connection from our connection table
> 					we write to the memory location where
> 					this list_head points to.
>
> The result will probably be a kernel oops at some other point in time.
>
> Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
> Cc: stable@kernel.org
> Acked-by: Simon Horman <horms@verge.net.au>
> ---
>  net/netfilter/ipvs/ip_vs_conn.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> This race condition is pretty subtle, but it can be triggered remotely.
> It needs the IRQ assignment change or another circumstance where packets
> coming from the same ip:port for the same service are being processed on
> different CPUs. And it involves hitting the exact time at which
> ip_vs_conn_expire() gets called. It can be avoided by making sure that
> all packets from one connection are always processed on the same CPU and
> can be made harder to exploit by changing the connection timeouts to
> some custom values.
>
>   

Applied, thanks.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-06-09 14:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-20 20:55 [PATCH] ipvs: Add missing locking during connection table hashing and unhashing Sven Wegener
2010-05-24 23:56 ` Simon Horman
2010-06-08  6:29   ` Sven Wegener
2010-06-09 14:12     ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).