[PATCH 3/3][CONNTRACK] Fix race condition in early drop


* [PATCH 3/3][CONNTRACK] Fix race condition in early drop
@ 2006-08-21  8:47 Pablo Neira Ayuso
  2006-08-22  4:35 ` Yasuyuki KOZAKAI
       [not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
  0 siblings, 2 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-21  8:47 UTC (permalink / raw)
  To: Netfilter Development Mailinglist; +Cc: Harald Welte, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

[CONNTRACK] Fix race condition in early drop

On SMP environments, the maximum number of conntracks can be exceeded
under heavy load because of an existing race condition.

        CPU A                   CPU B
     atomic_read()               ...
     early_drop()                ...
        ...                  atomic_read()
   allocate conntrack      allocate conntrack
     atomic_inc()             atomic_inc()

This patch uses an optimistic approach to solve the concurrency problem.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
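
For readers unfamiliar with the primitive: the kernel's
atomic_add_unless(v, a, u) adds a to v unless v already equals u, and
reports whether the add happened. Below is a minimal userspace sketch of
that behaviour using C11 atomics. The names add_unless() and
reserve_slot() are illustrative assumptions, not kernel code. Note that
the comparison is for equality with the limit, not "greater than or
equal".

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of atomic_add_unless(v, a, u): add a to *v unless
 * *v == u. Returns true if the add was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical helper: reserve one slot in a table limited to max. */
static bool reserve_slot(atomic_int *count, int max)
{
	return add_unless(count, 1, max);
}
```

Because the check and the increment happen in one atomic step, two CPUs
cannot both observe "one slot left" and both take it, which is the
window shown in the diagram above.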

-- 
The dawn of the fourth age of Linux firewalling is coming; a time of 
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris

[-- Attachment #2: 09race.patch --]
[-- Type: text/plain, Size: 4111 bytes --]

[CONNTRACK] Fix race condition in early drop

On SMP environments, the maximum number of conntracks can be exceeded
under heavy load because of an existing race condition.

       CPU A                   CPU B
    atomic_read()               ...
    early_drop()                ...
       ...                  atomic_read()
  allocate conntrack      allocate conntrack
    atomic_inc()             atomic_inc()

This patch uses an optimistic approach to solve the concurrency problem.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Index: net-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c	2006-08-17 15:50:33.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_core.c	2006-08-17 17:52:27.000000000 +0200
@@ -642,21 +642,32 @@ struct ip_conntrack *ip_conntrack_alloc(
 	}
 
 	if (ip_conntrack_max
-	    && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) {
+	    && !atomic_add_unless(&ip_conntrack_count, 1, ip_conntrack_max)) {
 		unsigned int hash = hash_conntrack(orig);
 		/* Try dropping from this hash chain. */
-		if (!early_drop(&ip_conntrack_hash[hash])) {
-			if (net_ratelimit())
-				printk(KERN_WARNING
-				       "ip_conntrack: table full, dropping"
-				       " packet.\n");
-			return ERR_PTR(-ENOMEM);
-		}
+		do {
+			if (!early_drop(&ip_conntrack_hash[hash])) {
+				if (net_ratelimit())
+					printk(KERN_WARNING
+					       "ip_conntrack: table full, "
+					       "dropping packet.\n");
+				return ERR_PTR(-ENOMEM);
+			}
+			/*
+			 * On SMP environments, if the table is full and we
+			 * early drop a conntrack to make some place for this
+			 * new one then we have to ensure that no other
+			 * conntrack slips through.
+			 */
+		} while (!atomic_add_unless(&ip_conntrack_count, 
+					    1, 
+					    ip_conntrack_max));
 	}
 
 	conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
 	if (!conntrack) {
 		DEBUGP("Can't allocate conntrack.\n");
+		atomic_dec(&ip_conntrack_count);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -670,8 +681,6 @@ struct ip_conntrack *ip_conntrack_alloc(
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&ip_conntrack_count);
-
 	return conntrack;
 }
 
Index: net-2.6/net/netfilter/nf_conntrack_core.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_core.c	2006-08-18 19:23:19.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_core.c	2006-08-18 20:20:08.000000000 +0200
@@ -868,16 +868,26 @@ __nf_conntrack_alloc(const struct nf_con
 	}
 
 	if (nf_conntrack_max
-	    && atomic_read(&nf_conntrack_count) >= nf_conntrack_max) {
+	    && !atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
 		unsigned int hash = hash_conntrack(orig);
 		/* Try dropping from this hash chain. */
-		if (!early_drop(&nf_conntrack_hash[hash])) {
-			if (net_ratelimit())
-				printk(KERN_WARNING
-				       "nf_conntrack: table full, dropping"
-				       " packet.\n");
-			return ERR_PTR(-ENOMEM);
-		}
+		do {
+			if (!early_drop(&nf_conntrack_hash[hash])) {
+				if (net_ratelimit())
+					printk(KERN_WARNING
+					       "nf_conntrack: table full, "
+					       "dropping packet.\n");
+				return ERR_PTR(-ENOMEM);
+			}
+			/*
+			 * On SMP environments, if the table is full and we
+			 * early drop a conntrack to make some place for this
+			 * new one then we have to ensure that no other
+			 * conntrack slips through.
+			 */
+		} while (!atomic_add_unless(&nf_conntrack_count, 
+					    1, 
+					    nf_conntrack_max));
 	}
 
 	/*  find features needed by this conntrack. */
@@ -923,9 +933,12 @@ __nf_conntrack_alloc(const struct nf_con
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&nf_conntrack_count);
+	read_unlock_bh(&nf_ct_cache_lock);
+	return conntrack;
+
 out:
 	read_unlock_bh(&nf_ct_cache_lock);
+	atomic_dec(&nf_conntrack_count);
 	return conntrack;
 }
 


* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
  2006-08-21  8:47 [PATCH 3/3][CONNTRACK] Fix race condition in early drop Pablo Neira Ayuso
@ 2006-08-22  4:35 ` Yasuyuki KOZAKAI
       [not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
  1 sibling, 0 replies; 8+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-08-22  4:35 UTC (permalink / raw)
  To: pablo; +Cc: laforge, netfilter-devel, kaber


Hi, Pablo,

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 21 Aug 2006 10:47:49 +0200

> [CONNTRACK] Fix race condition in early drop
> 
> On SMP environments, the maximum number of conntracks can be exceeded
> under heavy load because of an existing race condition.
> 
>        CPU A                   CPU B
>     atomic_read()               ...
>     early_drop()                ...
>        ...                  atomic_read()
>   allocate conntrack      allocate conntrack
>     atomic_inc()             atomic_inc()
> 
> This patch uses an optimistic approach to solve the concurrency problem.

Good catch!



> Index: net-2.6/net/netfilter/nf_conntrack_core.c
> ===================================================================
> --- net-2.6.orig/net/netfilter/nf_conntrack_core.c	2006-08-18 19:23:19.000000000 +0200
> +++ net-2.6/net/netfilter/nf_conntrack_core.c	2006-08-18 20:20:08.000000000 +0200
> @@ -868,16 +868,26 @@ __nf_conntrack_alloc(const struct nf_con
>  	}
>  
>  	if (nf_conntrack_max
> -	    && atomic_read(&nf_conntrack_count) >= nf_conntrack_max) {
> +	    && !atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
>  		unsigned int hash = hash_conntrack(orig);
>  		/* Try dropping from this hash chain. */
> -		if (!early_drop(&nf_conntrack_hash[hash])) {
> -			if (net_ratelimit())
> -				printk(KERN_WARNING
> -				       "nf_conntrack: table full, dropping"
> -				       " packet.\n");
> -			return ERR_PTR(-ENOMEM);
> -		}
> +		do {
> +			if (!early_drop(&nf_conntrack_hash[hash])) {
> +				if (net_ratelimit())
> +					printk(KERN_WARNING
> +					       "nf_conntrack: table full, "
> +					       "dropping packet.\n");
> +				return ERR_PTR(-ENOMEM);
> +			}
> +			/*
> +			 * On SMP environments, if the table is full and we
> +			 * early drop a conntrack to make some place for this
> +			 * new one then we have to ensure that no other
> +			 * conntrack slips through.
> +			 */
> +		} while (!atomic_add_unless(&nf_conntrack_count, 
> +					    1, 
> +					    nf_conntrack_max));
>  	}

I think there is an unfair case like the following.

       CPU A                   CPU B
    atomic_add_unless() == 0
    early_drop()                  ...
       ...                     atomic_add_unless() == 1
    atomic_add_unless() == 0
    early_drop()

The right to allocate conntrack is stolen by CPU B in this case.
And there is no assurance that CPU A can exit this loop in a short time.

How about incrementing {ip,nf}_conntrack_count at first ?

    1. atomic_add()
    2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
       then early_drop()
    3. if early_drop() failed, atomic_dec()
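
The three steps above can be sketched in userspace C11 as follows. This
is only a model under stated assumptions: struct table, try_early_drop()
and claim_slot() are hypothetical names, the eviction is assumed to free
its slot immediately, and the evictable field is a plain int, so the
model is meant to be driven from a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct table {
	atomic_int count;	/* models {ip,nf}_conntrack_count */
	int max;		/* models {ip,nf}_conntrack_max */
	int evictable;		/* entries an early drop could remove */
};

/* Hypothetical stand-in for early_drop(). */
static bool try_early_drop(struct table *t)
{
	if (t->evictable == 0)
		return false;
	t->evictable--;
	/* Assumption: the evicted entry releases its slot at once. */
	atomic_fetch_sub(&t->count, 1);
	return true;
}

static bool claim_slot(struct table *t)
{
	/* Step 1: take the slot unconditionally. */
	int now = atomic_fetch_add(&t->count, 1) + 1;

	/* Step 2: only when over the limit ('>' and not '>='), evict. */
	if (now > t->max && !try_early_drop(t)) {
		/* Step 3: eviction failed, so give the slot back. */
		atomic_fetch_sub(&t->count, 1);
		return false;
	}
	return true;
}
```

There is no retry loop here: each caller either keeps the slot it took
in step 1 or returns it in step 3.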

-- Yasuyuki Kozakai


* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
       [not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
@ 2006-08-22 13:46   ` Pablo Neira Ayuso
  2006-08-22 14:39     ` Pablo Neira Ayuso
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-22 13:46 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, kaber

Hi Yasuyuki,

Yasuyuki KOZAKAI wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Mon, 21 Aug 2006 10:47:49 +0200
> 
>>[CONNTRACK] Fix race condition in early drop
>>
>>On SMP environments, the maximum number of conntracks can be exceeded
>>under heavy load because of an existing race condition.
>>
>>       CPU A                   CPU B
>>    atomic_read()               ...
>>    early_drop()                ...
>>       ...                  atomic_read()
>>  allocate conntrack      allocate conntrack
>>    atomic_inc()             atomic_inc()
>>
[snip]
> 
> I think there is an unfair case like the following.
> 
>        CPU A                   CPU B
>     atomic_add_unless() == 0
>     early_drop()                  ...
>        ...                     atomic_add_unless() == 1
>     atomic_add_unless() == 0
>     early_drop()
> 
> The right to allocate conntrack is stolen by CPU B in this case.

Yes, but we're under stress so I'm not sure if fairness is important here.

> And there is no assurance that CPU A can exit this loop in a short time.

You are right, this seems important. Instead of looping, we can just give
up if we lose the race.
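
A minimal userspace sketch of the "give up instead of looping" idea,
using C11 atomics. Everything here is an assumption for illustration:
reserve_or_give_up() is a hypothetical name, and the two drop callbacks
merely simulate an early drop that frees a slot and one whose freed slot
is immediately taken by another CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of atomic_add_unless(): add a to *v unless *v == u. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Simulated early drop that frees one slot. */
static bool drop_frees_slot(atomic_int *count)
{
	atomic_fetch_sub(count, 1);
	return true;
}

/* Simulated early drop whose freed slot is taken by a competing CPU. */
static bool drop_then_lose_race(atomic_int *count)
{
	atomic_fetch_sub(count, 1);
	atomic_fetch_add(count, 1);
	return true;
}

static bool reserve_or_give_up(atomic_int *count, int max,
			       bool (*drop_one)(atomic_int *))
{
	if (add_unless(count, 1, max))
		return true;	/* there was room */
	if (!drop_one(count))
		return false;	/* nothing to evict: table full */
	/* Exactly one retry; losing the race means the packet is dropped. */
	return add_unless(count, 1, max);
}
```

The work per packet is bounded: at most one eviction and two attempts to
reserve, at the cost of occasionally dropping a packet after having
evicted an entry.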

> How about incrementing {ip,nf}_conntrack_count at first ?
> 
>     1. atomic_add()
>     2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>        then early_drop()
>     3. if early_drop() failed, atomic_dec()

I thought about this possibility but then we can't guarantee the fixed 
maximum number of conntracks in the system.

Any comments?



* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
  2006-08-22 13:46   ` Pablo Neira Ayuso
@ 2006-08-22 14:39     ` Pablo Neira Ayuso
       [not found]       ` <200608230228.k7N2SDTf000802@toshiba.co.jp>
  2006-08-23  2:28     ` Yasuyuki KOZAKAI
  2006-08-24 11:47     ` Jarek Poplawski
  2 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-22 14:39 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, kaber

Pablo Neira Ayuso wrote:
>> How about incrementing {ip,nf}_conntrack_count at first ?
>>
>>     1. atomic_add()
>>     2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>>        then early_drop()
>>     3. if early_drop() failed, atomic_dec()
> 
> 
> I thought about this possibility but then we can't guarantee the fixed 
> maximum number of conntracks in the system.

Hm, actually this is wrong, we can guarantee the maximum number but 
aren't we somehow fooling the counter? I mean, the counter can reach 
values higher than conntrack_max during a short period.
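
To make the overshoot concrete, here is a small userspace sketch with
C11 atomics. The names claim_first() and release() are hypothetical. It
only models the counter arithmetic of "increment first", not the
eviction itself.

```c
#include <stdatomic.h>

/* Increment first; returns the value this caller observes. */
static int claim_first(atomic_int *count)
{
	return atomic_fetch_add(count, 1) + 1;
}

/* Undo a claim, or account for one evicted entry. */
static void release(atomic_int *count)
{
	atomic_fetch_sub(count, 1);
}
```

With the counter at a limit of 4, two callers that both run
claim_first() before either evicts anything observe 5 and 6. Once each
of them has evicted an entry or backed out with release(), the counter
is back at 4. In this model the excess is bounded by the number of
allocations in flight at the same moment.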



* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
  2006-08-22 13:46   ` Pablo Neira Ayuso
  2006-08-22 14:39     ` Pablo Neira Ayuso
@ 2006-08-23  2:28     ` Yasuyuki KOZAKAI
  2006-08-24 11:47     ` Jarek Poplawski
  2 siblings, 0 replies; 8+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-08-23  2:28 UTC (permalink / raw)
  To: pablo; +Cc: laforge, netfilter-devel, kaber, yasuyuki.kozakai


Hi,

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 22 Aug 2006 16:39:23 +0200

> Pablo Neira Ayuso wrote:
> >> How about incrementing {ip,nf}_conntrack_count at first ?
> >>
> >>     1. atomic_add()
> >>     2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
> >>        then early_drop()
> >>     3. if early_drop() failed, atomic_dec()
> > 
> > 
> > I thought about this possibility but then we can't guarantee the fixed 
> > maximum number of conntracks in the system.
> 
> Hm, actually this is wrong, we can guarantee the maximum number but 
> aren't we somehow fooling the counter? I mean, the counter can reach 
> values higher than conntrack_max during a short period.

Good point. I don't mind fooling the counter for this short period,
but someone else might. Then,

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 22 Aug 2006 15:46:50 +0200

> > And there is no assurance that CPU A can exit this loop in a short time.
> 
> You are right, this seems important. Instead of looping, we can just give
> up if we lose the race.

Now I think this is better.

-- Yasuyuki Kozakai


* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
       [not found]       ` <200608230228.k7N2SDTf000802@toshiba.co.jp>
@ 2006-08-23  4:38         ` Patrick McHardy
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick McHardy @ 2006-08-23  4:38 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, pablo

Yasuyuki KOZAKAI wrote:
>>Pablo Neira Ayuso wrote:
>>
>>>>How about incrementing {ip,nf}_conntrack_count at first ?
>>>>
>>>>    1. atomic_add()
>>>>    2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>>>>       then early_drop()
>>>>    3. if early_drop() failed, atomic_dec()
>>>
>>>
>>>I thought about this possibility but then we can't guarantee the fixed 
>>>maximum number of conntracks in the system.
>>
>>Hm, actually this is wrong, we can guarantee the maximum number but 
>>aren't we somehow fooling the counter? I mean, the counter can reach 
>>values higher than conntrack_max during a short period.
> 
> 
> good point. I don't mind fooling the counter in this short period,


Me neither. We can already be off by more than one since early_drop
just removes a conntrack from the hash tables, but it is not necessarily
destroyed immediately (at which point the counter is decremented).
This is a reason why we can't loop while waiting for the counter to
decrement.
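
A rough userspace model of this point, using C11 atomics. It assumes a
reference-counted entry whose slot is released only when the last
reference is dropped. struct entry, entry_put() and model_early_drop()
are hypothetical names, not the kernel's.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct entry {
	atomic_int refs;	/* held by the hash table and by users */
	bool hashed;		/* still reachable through the table */
};

static atomic_int entry_count;	/* models {ip,nf}_conntrack_count */

static void entry_put(struct entry *e)
{
	/* The count only drops when the final reference goes away. */
	if (atomic_fetch_sub(&e->refs, 1) == 1)
		atomic_fetch_sub(&entry_count, 1);
}

/* Unhash the entry and drop the table's reference, nothing more. */
static bool model_early_drop(struct entry *e)
{
	if (!e->hashed)
		return false;
	e->hashed = false;
	entry_put(e);
	return true;
}
```

If another user still holds a reference, model_early_drop() succeeds
while entry_count stays unchanged until that user calls entry_put(). A
caller spinning until the counter falls could therefore wait for an
unbounded time.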


* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
  2006-08-22 13:46   ` Pablo Neira Ayuso
  2006-08-22 14:39     ` Pablo Neira Ayuso
  2006-08-23  2:28     ` Yasuyuki KOZAKAI
@ 2006-08-24 11:47     ` Jarek Poplawski
  2006-08-24 13:02       ` Jarek Poplawski
  2 siblings, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2006-08-24 11:47 UTC (permalink / raw)
  To: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1752 bytes --]

On 22-08-2006 15:46, Pablo Neira Ayuso wrote:
> Hi Yasuyuki,
> 
> Yasuyuki KOZAKAI wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Mon, 21 Aug 2006 10:47:49 +0200
>>
>>> [CONNTRACK] Fix race condition in early drop
>>>
>>> On SMP environments, the maximum number of conntracks can be exceeded
>>> under heavy load because of an existing race condition.
>>>
>>>       CPU A                   CPU B
>>>    atomic_read()               ...
>>>    early_drop()                ...
>>>       ...                  atomic_read()
>>>  allocate conntrack      allocate conntrack
>>>    atomic_inc()             atomic_inc()
>>>
> [snip]
>>
>> I think there is an unfair case like the following.
>>
>>        CPU A                   CPU B
>>     atomic_add_unless() == 0
>>     early_drop()                  ...
>>        ...                     atomic_add_unless() == 1
>>     atomic_add_unless() == 0
>>     early_drop()
>>
>> The right to allocate conntrack is stolen by CPU B in this case.
> 
> Yes, but we're under stress so I'm not sure if fairness is important here.
> 
>> And there is no assurance that CPU A can exit this loop in a short time.
> 
> You are right, this seems important. Instead of looping, we can just give
> up if we lose the race.
> 
>> How about incrementing {ip,nf}_conntrack_count at first ?
>>
>>     1. atomic_add()
>>     2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>>        then early_drop()
>>     3. if early_drop() failed, atomic_dec()
> 
> I thought about this possibility but then we can't guarantee the fixed 
> maximum number of conntracks in the system.
> 
> Any comments?

Sorry, maybe I'm too fresh, but if you say "any"...
Maybe something simpler? I attach a proposal.

Jarek P.




[-- Attachment #2: nf_conntrack_core-2.6.18-rc4.diff --]
[-- Type: text/plain, Size: 1013 bytes --]

--- linux-2.6.18-rc4/net/netfilter/nf_conntrack_core.c-	2006-08-22 07:55:25.000000000 +0200
+++ linux-2.6.18-rc4/net/netfilter//nf_conntrack_core.c	2006-08-24 13:34:43.000000000 +0200
@@ -871,6 +871,7 @@
 		unsigned int hash = hash_conntrack(orig);
 		/* Try dropping from this hash chain. */
 		if (!early_drop(&nf_conntrack_hash[hash])) {
+			atomic_dec(&nf_conntrack_count);
 			if (net_ratelimit())
 				printk(KERN_WARNING
 				       "nf_conntrack: table full, dropping"
@@ -905,6 +906,12 @@
 		goto out;
 	}
 
+	if (!atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
+		kmem_cache_free(nf_ct_cache[features].cachep, conntrack);
+		conntrack = NULL;
+		goto out;
+	}
+
 	memset(conntrack, 0, nf_ct_cache[features].size);
 	conntrack->features = features;
 	if (helper) {
@@ -922,7 +929,6 @@
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&nf_conntrack_count);
 out:
 	read_unlock_bh(&nf_ct_cache_lock);
 	return conntrack;


* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
  2006-08-24 11:47     ` Jarek Poplawski
@ 2006-08-24 13:02       ` Jarek Poplawski
  0 siblings, 0 replies; 8+ messages in thread
From: Jarek Poplawski @ 2006-08-24 13:02 UTC (permalink / raw)
  To: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 125 bytes --]

On 24-08-2006 13:47, Jarek Poplawski wrote:
...

Sorry again, I'm definitely too fresh. It should be even shorter:

Jarek P.

[-- Attachment #2: nf_conntrack_core-2.6.18-rc4.diff --]
[-- Type: text/plain, Size: 720 bytes --]

--- linux-2.6.18-rc4/net/netfilter/nf_conntrack_core.c-	2006-08-22 07:55:25.000000000 +0200
+++ linux-2.6.18-rc4/net/netfilter//nf_conntrack_core.c	2006-08-24 13:34:43.000000000 +0200
@@ -905,6 +906,12 @@
 		goto out;
 	}
 
+	if (!atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
+		kmem_cache_free(nf_ct_cache[features].cachep, conntrack);
+		conntrack = NULL;
+		goto out;
+	}
+
 	memset(conntrack, 0, nf_ct_cache[features].size);
 	conntrack->features = features;
 	if (helper) {
@@ -922,7 +929,6 @@
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
-	atomic_inc(&nf_conntrack_count);
 out:
 	read_unlock_bh(&nf_ct_cache_lock);
 	return conntrack;

