* [PATCH 3/3][CONNTRACK] Fix race condition in early drop
@ 2006-08-21 8:47 Pablo Neira Ayuso
2006-08-22 4:35 ` Yasuyuki KOZAKAI
[not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
0 siblings, 2 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-21 8:47 UTC (permalink / raw)
To: Netfilter Development Mailinglist; +Cc: Harald Welte, Patrick McHardy
[-- Attachment #1: Type: text/plain, Size: 705 bytes --]
[CONNTRACK] Fix race condition in early drop
On SMP environments the maximum number of conntracks can be exceeded
under heavy stress situations due to an existing race condition.
        CPU A                   CPU B
        atomic_read()           ...
        early_drop()            ...
        ...                     atomic_read()
        allocate conntrack      allocate conntrack
        atomic_inc()            atomic_inc()
This patch uses an optimistic approach to solve the concurrency problem.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
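
For readers who want to experiment with the idea outside the kernel, here is a minimal userspace sketch of what atomic_add_unless() does. It assumes C11 <stdatomic.h> instead of the kernel's atomic_t API, and add_unless() and reserve_slot() are illustrative names, not kernel functions. The point is that the check against the limit and the increment happen as one atomic step, so two CPUs cannot both pass the check for the last free slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of the kernel's atomic_add_unless(v, a, u):
 * add a to *v unless *v == u; return true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical helper: reserve one conntrack slot while below max. */
static bool reserve_slot(atomic_int *count, int max)
{
	/* The kernel treats a max of 0 as "no limit"; this sketch does not. */
	assert(max > 0);
	return add_unless(count, 1, max);
}
```

With a limit of 2, the first two calls to reserve_slot() succeed and the third fails, leaving the counter at 2.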
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
[-- Attachment #2: 09race.patch --]
[-- Type: text/plain, Size: 4111 bytes --]
Index: net-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c 2006-08-17 15:50:33.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_core.c 2006-08-17 17:52:27.000000000 +0200
@@ -642,21 +642,32 @@ struct ip_conntrack *ip_conntrack_alloc(
}
if (ip_conntrack_max
- && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) {
+ && !atomic_add_unless(&ip_conntrack_count, 1, ip_conntrack_max)) {
unsigned int hash = hash_conntrack(orig);
/* Try dropping from this hash chain. */
- if (!early_drop(&ip_conntrack_hash[hash])) {
- if (net_ratelimit())
- printk(KERN_WARNING
- "ip_conntrack: table full, dropping"
- " packet.\n");
- return ERR_PTR(-ENOMEM);
- }
+ do {
+ if (!early_drop(&ip_conntrack_hash[hash])) {
+ if (net_ratelimit())
+ printk(KERN_WARNING
+ "ip_conntrack: table full, "
+ "dropping packet.\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ /*
+ * On SMP environments, if the table is full and we
+ * early drop a conntrack to make some place for this
+ * new one then we have to ensure that no other
+ * conntrack slips through.
+ */
+ } while (!atomic_add_unless(&ip_conntrack_count,
+ 1,
+ ip_conntrack_max));
}
conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
if (!conntrack) {
DEBUGP("Can't allocate conntrack.\n");
+ atomic_dec(&ip_conntrack_count);
return ERR_PTR(-ENOMEM);
}
@@ -670,8 +681,6 @@ struct ip_conntrack *ip_conntrack_alloc(
conntrack->timeout.data = (unsigned long)conntrack;
conntrack->timeout.function = death_by_timeout;
- atomic_inc(&ip_conntrack_count);
-
return conntrack;
}
Index: net-2.6/net/netfilter/nf_conntrack_core.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_core.c 2006-08-18 19:23:19.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_core.c 2006-08-18 20:20:08.000000000 +0200
@@ -868,16 +868,26 @@ __nf_conntrack_alloc(const struct nf_con
}
if (nf_conntrack_max
- && atomic_read(&nf_conntrack_count) >= nf_conntrack_max) {
+ && !atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
unsigned int hash = hash_conntrack(orig);
/* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
- if (net_ratelimit())
- printk(KERN_WARNING
- "nf_conntrack: table full, dropping"
- " packet.\n");
- return ERR_PTR(-ENOMEM);
- }
+ do {
+ if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (net_ratelimit())
+ printk(KERN_WARNING
+ "nf_conntrack: table full, "
+ "dropping packet.\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ /*
+ * On SMP environments, if the table is full and we
+ * early drop a conntrack to make some place for this
+ * new one then we have to ensure that no other
+ * conntrack slips through.
+ */
+ } while (!atomic_add_unless(&nf_conntrack_count,
+ 1,
+ nf_conntrack_max));
}
/* find features needed by this conntrack. */
@@ -923,9 +933,12 @@ __nf_conntrack_alloc(const struct nf_con
conntrack->timeout.data = (unsigned long)conntrack;
conntrack->timeout.function = death_by_timeout;
- atomic_inc(&nf_conntrack_count);
+ read_unlock_bh(&nf_ct_cache_lock);
+ return conntrack;
+
out:
read_unlock_bh(&nf_ct_cache_lock);
+ atomic_dec(&nf_conntrack_count);
return conntrack;
}
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
2006-08-21 8:47 [PATCH 3/3][CONNTRACK] Fix race condition in early drop Pablo Neira Ayuso
@ 2006-08-22 4:35 ` Yasuyuki KOZAKAI
[not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
1 sibling, 0 replies; 8+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-08-22 4:35 UTC (permalink / raw)
To: pablo; +Cc: laforge, netfilter-devel, kaber
Hi, Pablo,
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 21 Aug 2006 10:47:49 +0200
> [CONNTRACK] Fix race condition in early drop
>
> On SMP environments the maximum number of conntracks can be overpassed
> under heavy stress situations due to an existing race condition.
>
> CPU A CPU B
> atomic_read() ...
> early_drop() ...
> ... atomic_read()
> allocate conntrack allocate conntrack
> atomic_inc() atomic_inc()
>
> This patch uses an optimistic approach to solve the concurrency problem.
Good catch!
> Index: net-2.6/net/netfilter/nf_conntrack_core.c
> ===================================================================
> --- net-2.6.orig/net/netfilter/nf_conntrack_core.c 2006-08-18 19:23:19.000000000 +0200
> +++ net-2.6/net/netfilter/nf_conntrack_core.c 2006-08-18 20:20:08.000000000 +0200
> @@ -868,16 +868,26 @@ __nf_conntrack_alloc(const struct nf_con
> }
>
> if (nf_conntrack_max
> - && atomic_read(&nf_conntrack_count) >= nf_conntrack_max) {
> + && !atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
> unsigned int hash = hash_conntrack(orig);
> /* Try dropping from this hash chain. */
> - if (!early_drop(&nf_conntrack_hash[hash])) {
> - if (net_ratelimit())
> - printk(KERN_WARNING
> - "nf_conntrack: table full, dropping"
> - " packet.\n");
> - return ERR_PTR(-ENOMEM);
> - }
> + do {
> + if (!early_drop(&nf_conntrack_hash[hash])) {
> + if (net_ratelimit())
> + printk(KERN_WARNING
> + "nf_conntrack: table full, "
> + "dropping packet.\n");
> + return ERR_PTR(-ENOMEM);
> + }
> + /*
> + * On SMP environments, if the table is full and we
> + * early drop a conntrack to make some place for this
> + * new one then we have to ensure that no other
> + * conntrack slips through.
> + */
> + } while (!atomic_add_unless(&nf_conntrack_count,
> + 1,
> + nf_conntrack_max));
> }
I think there is an unfair case like the following.
        CPU A                           CPU B
        atomic_add_unless() == 0
        early_drop()                    ...
        ...                             atomic_add_unless() == 1
        atomic_add_unless() == 0
        early_drop()
The right to allocate a conntrack is stolen by CPU B in this case.
And there is no assurance that CPU A can exit this loop in a short time.
How about incrementing {ip,nf}_conntrack_count first?
1. atomic_add()
2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
then early_drop()
3. if early_drop() failed, atomic_dec()
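
The three steps above can be sketched in userspace as follows. This is only a model under stated assumptions: it uses C11 <stdatomic.h> rather than the kernel's atomic_t, and reserve_increment_first(), drop_ok() and drop_fail() are hypothetical names standing in for the allocation path and for early_drop().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for early_drop(): true if an entry was evicted. */
typedef bool (*drop_fn)(void);

static bool reserve_increment_first(atomic_int *count, int max,
				    drop_fn early_drop)
{
	int now;

	assert(max > 0);

	/* Step 1: take a slot unconditionally (fetch_add returns the old value). */
	now = atomic_fetch_add(count, 1) + 1;

	/* Step 2: only a caller that pushed the counter past max must evict. */
	if (now > max && !early_drop()) {
		/* Step 3: eviction failed, so give the slot back. */
		atomic_fetch_sub(count, 1);
		return false;
	}
	return true;
}

/* Illustrative eviction outcomes, used only to exercise the sketch. */
static bool drop_ok(void)   { return true; }
static bool drop_fail(void) { return false; }
```

Note that after a successful eviction the counter in this model stays above max until the evicted entry is actually destroyed and its slot is released elsewhere; there is no loop, so each caller finishes in a bounded number of steps.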
-- Yasuyuki Kozakai
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
[not found] ` <200608220435.k7M4ZSLf001686@toshiba.co.jp>
@ 2006-08-22 13:46 ` Pablo Neira Ayuso
2006-08-22 14:39 ` Pablo Neira Ayuso
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-22 13:46 UTC (permalink / raw)
To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, kaber
Hi Yasuyuki,
Yasuyuki KOZAKAI wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Mon, 21 Aug 2006 10:47:49 +0200
>
>>[CONNTRACK] Fix race condition in early drop
>>
>>On SMP environments the maximum number of conntracks can be overpassed
>>under heavy stress situations due to an existing race condition.
>>
>> CPU A CPU B
>> atomic_read() ...
>> early_drop() ...
>> ... atomic_read()
>> allocate conntrack allocate conntrack
>> atomic_inc() atomic_inc()
>>
[snip]
>
> I think there is unfair case like following.
>
> CPU A CPU B
> atomic_add_unless() == 0
> early_drop() ...
> ... atomic_add_unless() == 1
> atomic_add_unless() == 0
> early_drop()
>
> The right to allocate conntrack is stolen by CPU B in this case.
Yes, but we're under stress so I'm not sure if fairness is important here.
> And there is no assurance that CPU A can exits this loop in short time.
You are right, this seems important. Instead of looping we can just give
up if we lose the race.
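
A rough userspace sketch of the "give up instead of looping" variant, assuming C11 <stdatomic.h>; struct table, reserve_or_give_up() and the two evict callbacks are hypothetical and only model the behaviour discussed here. The second callback simulates another CPU taking the freed slot before we retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct table {
	atomic_int count;
	int max;
};

/* Model of atomic_add_unless(): add a unless *v == u. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u)
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	return false;
}

typedef bool (*evict_fn)(struct table *t);

static bool reserve_or_give_up(struct table *t, evict_fn early_drop)
{
	assert(t->max > 0);

	if (add_unless(&t->count, 1, t->max))
		return true;
	if (!early_drop(t))
		return false;	/* table full and nothing to evict */

	/* Exactly one retry: if another CPU took the slot, give up. */
	return add_unless(&t->count, 1, t->max);
}

/* Eviction that frees one slot immediately. */
static bool evict_one(struct table *t)
{
	atomic_fetch_sub(&t->count, 1);
	return true;
}

/* Eviction after which a simulated second CPU grabs the freed slot. */
static bool evict_then_lose_race(struct table *t)
{
	atomic_fetch_sub(&t->count, 1);
	atomic_fetch_add(&t->count, 1);
	return true;
}
```

In the losing case the caller returns failure after a single retry, so the time spent in the allocation path stays bounded even under contention.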
> How about incrementing {ip,nf}_conntrack_count at first ?
>
> 1. atomic_add()
> 2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
> then early_drop()
> 3. if early_drop() failed, atomic_dec()
I thought about this possibility but then we can't guarantee the fixed
maximum number of conntracks in the system.
Any comments?
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
2006-08-22 13:46 ` Pablo Neira Ayuso
@ 2006-08-22 14:39 ` Pablo Neira Ayuso
[not found] ` <200608230228.k7N2SDTf000802@toshiba.co.jp>
2006-08-23 2:28 ` Yasuyuki KOZAKAI
2006-08-24 11:47 ` Jarek Poplawski
2 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2006-08-22 14:39 UTC (permalink / raw)
To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, kaber
Pablo Neira Ayuso wrote:
>> How about incrementing {ip,nf}_conntrack_count at first ?
>>
>> 1. atomic_add()
>> 2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>> then early_drop()
>> 3. if early_drop() failed, atomic_dec()
>
>
> I thought about this possibility but then we can't guarantee the fixed
> maximum number of conntracks in the system.
Hm, actually this is wrong, we can guarantee the maximum number but
aren't we somehow fooling the counter? I mean, the counter can reach
values higher than conntrack_max during a short period.
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
2006-08-22 13:46 ` Pablo Neira Ayuso
2006-08-22 14:39 ` Pablo Neira Ayuso
@ 2006-08-23 2:28 ` Yasuyuki KOZAKAI
2006-08-24 11:47 ` Jarek Poplawski
2 siblings, 0 replies; 8+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-08-23 2:28 UTC (permalink / raw)
To: pablo; +Cc: laforge, netfilter-devel, kaber, yasuyuki.kozakai
Hi,
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 22 Aug 2006 16:39:23 +0200
> Pablo Neira Ayuso wrote:
> >> How about incrementing {ip,nf}_conntrack_count at first ?
> >>
> >> 1. atomic_add()
> >> 2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
> >> then early_drop()
> >> 3. if early_drop() failed, atomic_dec()
> >
> >
> > I thought about this possibility but then we can't guarantee the fixed
> > maximum number of conntracks in the system.
>
> Hm, actually this is wrong, we can guarantee the maximum number but
> aren't we somehow fooling the counter? I mean, the counter can reach
> values higher than conntrack_max during a short period.
Good point. I don't mind fooling the counter for this short period,
but someone else might. Then,
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 22 Aug 2006 15:46:50 +0200
> > And there is no assurance that CPU A can exits this loop in short time.
>
> You are right, this seems important. Instead of looping we can just give
> up if we lose race.
Now I think this is better.
-- Yasuyuki Kozakai
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
[not found] ` <200608230228.k7N2SDTf000802@toshiba.co.jp>
@ 2006-08-23 4:38 ` Patrick McHardy
0 siblings, 0 replies; 8+ messages in thread
From: Patrick McHardy @ 2006-08-23 4:38 UTC (permalink / raw)
To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, pablo
Yasuyuki KOZAKAI wrote:
>>Pablo Neira Ayuso wrote:
>>
>>>>How about incrementing {ip,nf}_conntrack_count at first ?
>>>>
>>>> 1. atomic_add()
>>>> 2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>>>> then early_drop()
>>>> 3. if early_drop() failed, atomic_dec()
>>>
>>>
>>>I thought about this possibility but then we can't guarantee the fixed
>>>maximum number of conntracks in the system.
>>
>>Hm, actually this is wrong, we can guarantee the maximum number but
>>aren't we somehow fooling the counter? I mean, the counter can reach
>>values higher than conntrack_max during a short period.
>
>
> good point. I don't mind fooling the counter in this short period,
Me neither. We can already be off by more than one since early_drop
just removes a conntrack from the hash tables, but it is not necessarily
destroyed immediately (at which point the counter is decremented).
This is a reason why we can't loop while waiting for the counter to
decrement.
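
To illustrate the point, here is a small single-threaded model, assuming a simple reference count per entry; struct entry, entry_put() and model_early_drop() are hypothetical names and the real conntrack lifetime rules are more involved. Eviction only unhashes the entry and drops the table's reference, while the counter goes down when the last reference is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct table {
	atomic_int count;
};

struct entry {
	int refs;	/* plain int: this model is single-threaded */
	bool hashed;
};

/* Drop one reference; the entry is "destroyed" when the last one goes. */
static void entry_put(struct table *t, struct entry *e)
{
	assert(e->refs > 0);
	if (--e->refs == 0)
		atomic_fetch_sub(&t->count, 1);	/* counter decremented here */
}

/* Model of early_drop(): unhash and drop the hash table's reference. */
static bool model_early_drop(struct table *t, struct entry *e)
{
	if (!e->hashed)
		return false;
	e->hashed = false;
	entry_put(t, e);
	return true;
}
```

If the entry is still referenced elsewhere (say, by a packet in flight), model_early_drop() succeeds but the counter is unchanged until that last reference is dropped, which is why a loop waiting for the counter to fall could spin for an unbounded time.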
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
2006-08-22 13:46 ` Pablo Neira Ayuso
2006-08-22 14:39 ` Pablo Neira Ayuso
2006-08-23 2:28 ` Yasuyuki KOZAKAI
@ 2006-08-24 11:47 ` Jarek Poplawski
2006-08-24 13:02 ` Jarek Poplawski
2 siblings, 1 reply; 8+ messages in thread
From: Jarek Poplawski @ 2006-08-24 11:47 UTC (permalink / raw)
To: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1752 bytes --]
On 22-08-2006 15:46, Pablo Neira Ayuso wrote:
> Hi Yasuyuki,
>
> Yasuyuki KOZAKAI wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Mon, 21 Aug 2006 10:47:49 +0200
>>
>>> [CONNTRACK] Fix race condition in early drop
>>>
>>> On SMP environments the maximum number of conntracks can be overpassed
>>> under heavy stress situations due to an existing race condition.
>>>
>>> CPU A CPU B
>>> atomic_read() ...
>>> early_drop() ...
>>> ... atomic_read()
>>> allocate conntrack allocate conntrack
>>> atomic_inc() atomic_inc()
>>>
> [snip]
>>
>> I think there is unfair case like following.
>>
>> CPU A CPU B
>> atomic_add_unless() == 0
>> early_drop() ...
>> ... atomic_add_unless() == 1
>> atomic_add_unless() == 0
>> early_drop()
>>
>> The right to allocate conntrack is stolen by CPU B in this case.
>
> Yes, but we're under stress so I'm not sure if fairness is important here.
>
>> And there is no assurance that CPU A can exits this loop in short time.
>
> You are right, this seems important. Instead of looping we can just give
> up if we lose race.
>
>> How about incrementing {ip,nf}_conntrack_count at first ?
>>
>> 1. atomic_add()
>> 2. if {ip,nf}_conntrack_count > {ip,nf}_conntrack_max (not '>=' )
>> then early_drop()
>> 3. if early_drop() failed, atomic_dec()
>
> I thought about this possibility but then we can't guarantee the fixed
> maximum number of conntracks in the system.
>
> Any comments?
Sorry, maybe I'm too fresh, but if you say "any"...
Maybe something simpler? I attach a proposal.
Jarek P.
[-- Attachment #2: nf_conntrack_core-2.6.18-rc4.diff --]
[-- Type: text/plain, Size: 1013 bytes --]
--- linux-2.6.18-rc4/net/netfilter/nf_conntrack_core.c- 2006-08-22 07:55:25.000000000 +0200
+++ linux-2.6.18-rc4/net/netfilter//nf_conntrack_core.c 2006-08-24 13:34:43.000000000 +0200
@@ -871,6 +871,7 @@
unsigned int hash = hash_conntrack(orig);
/* Try dropping from this hash chain. */
if (!early_drop(&nf_conntrack_hash[hash])) {
+ atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
"nf_conntrack: table full, dropping"
@@ -905,6 +906,12 @@
goto out;
}
+ if (!atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
+ kmem_cache_free(nf_ct_cache[features].cachep, conntrack);
+ conntrack = NULL;
+ goto out;
+ }
+
memset(conntrack, 0, nf_ct_cache[features].size);
conntrack->features = features;
if (helper) {
@@ -922,7 +929,6 @@
conntrack->timeout.data = (unsigned long)conntrack;
conntrack->timeout.function = death_by_timeout;
- atomic_inc(&nf_conntrack_count);
out:
read_unlock_bh(&nf_ct_cache_lock);
return conntrack;
* Re: [PATCH 3/3][CONNTRACK] Fix race condition in early drop
2006-08-24 11:47 ` Jarek Poplawski
@ 2006-08-24 13:02 ` Jarek Poplawski
0 siblings, 0 replies; 8+ messages in thread
From: Jarek Poplawski @ 2006-08-24 13:02 UTC (permalink / raw)
To: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 125 bytes --]
On 24-08-2006 13:47, Jarek Poplawski wrote:
...
Sorry again, I'm definitely too fresh. It should be even shorter:
Jarek P.
[-- Attachment #2: nf_conntrack_core-2.6.18-rc4.diff --]
[-- Type: text/plain, Size: 720 bytes --]
--- linux-2.6.18-rc4/net/netfilter/nf_conntrack_core.c- 2006-08-22 07:55:25.000000000 +0200
+++ linux-2.6.18-rc4/net/netfilter//nf_conntrack_core.c 2006-08-24 13:34:43.000000000 +0200
@@ -905,6 +906,12 @@
goto out;
}
+ if (!atomic_add_unless(&nf_conntrack_count, 1, nf_conntrack_max)) {
+ kmem_cache_free(nf_ct_cache[features].cachep, conntrack);
+ conntrack = NULL;
+ goto out;
+ }
+
memset(conntrack, 0, nf_ct_cache[features].size);
conntrack->features = features;
if (helper) {
@@ -922,7 +929,6 @@
conntrack->timeout.data = (unsigned long)conntrack;
conntrack->timeout.function = death_by_timeout;
- atomic_inc(&nf_conntrack_count);
out:
read_unlock_bh(&nf_ct_cache_lock);
return conntrack;
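
The ordering used in the diff above (allocate the object first, then reserve a slot, and free the object if the table is full) can be sketched in userspace like this. It is a model only: malloc()/free() stand in for the slab cache, struct conn and conn_alloc() are hypothetical names, and the earlier early_drop() path of the real function is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Model of atomic_add_unless(): add a unless *v == u. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u)
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	return false;
}

/* Hypothetical stand-in for struct nf_conn. */
struct conn {
	int id;
};

static struct conn *conn_alloc(atomic_int *count, int max)
{
	struct conn *c;

	assert(max > 0);

	c = malloc(sizeof(*c));
	if (!c)
		return NULL;

	/* Reserve the slot only after the allocation has succeeded. */
	if (!add_unless(count, 1, max)) {
		free(c);	/* table full: undo the allocation */
		return NULL;
	}

	c->id = 0;
	return c;
}
```

Because the counter is touched last, the allocation-failure path needs no compensating decrement; the cost is one wasted allocation whenever the table turns out to be full.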