* RE: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
       [not found] <7a6872ce-8015-4397-bbe9-22108c65b7ec@k2.cloud>
@ 2025-12-08 12:27 ` Odintsov Vladislav
  2025-12-09  7:57   ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Odintsov Vladislav @ 2025-12-08 12:27 UTC (permalink / raw)
  To: netfilter@vger.kernel.org, ovs-dev@openvswitch.org
  Cc: Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra

On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote:
> Hi!
> I was testing conntrack limiting using Open vSwitch and noticed the
> following issue: under certain limits, a CPU soft lockup occurred.
>
> [  491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs-dpctl:19437]
>
> This occurs under high packet rates while retrieving the configured
> limits through ovs-dpctl ct-get-limits.
>
> In the trace, I can see that the lockup occurred while attempting to
> acquire a spinlock.
>
> [  491.683056] <IRQ>
> [  491.683059] _raw_spin_lock_bh+0x29/0x30
> [  491.683064] count_tree+0x19b/0x1f0 [nf_conncount]
> [  491.683069] ovs_ct_commit+0x196/0x490 [openvswitch]
>
> Prior to this in the trace, a task from userspace (ovs-dpctl) was
> being processed:
>
> [  491.683236] </IRQ>
> [  491.683237] <TASK>
> [  491.683238] asm_common_interrupt+0x22/0x40
> [  491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount]
>
> Inside the nf_conncount_gc_list function, a lock is taken at
> nf_conncount.c:335, spin_trylock_bh(&list->list_lock). After this, the
> not-so-fast __nf_conncount_gc_list function is executed. If, at this
> moment, a packet interrupt arrives on the same cpu core (and
> spin_trylock_bh doesn't disable interrupts on that core), then the
> scenario I encountered occurs: the first lock remains held, while the
> packet interrupt also attempts to acquire it at nf_conncount.c:502,
> spin_lock_bh(&rbconn->list.list_lock), while committing to conntrack.
> This attempt fails, leading to a soft lockup.
> Hence my question: shouldn't we avoid calling nf_conncount_gc_list
> when querying limits without an skb (as OVS does in
> openvswitch/conntrack.c:1773)? The limit retrieval operation should be
> read-only with respect to the conntrack state and not involve
> potential modification.
>
> Like this:
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -495,7 +495,6 @@ count_tree(struct net *net,
>  	int ret;
>  
>  	if (!skb) {
> -		nf_conncount_gc_list(net, &rbconn->list);
>  		return rbconn->list.count;
>  	}
>
> Thanks!
>
> --
> regards,
> Alexandra.

+ ovs-dev ML

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-08 12:27 ` netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch Odintsov Vladislav @ 2025-12-09 7:57 ` Fernando Fernandez Mancera 2025-12-09 8:42 ` Pablo Neira Ayuso 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-09 7:57 UTC (permalink / raw) To: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org Cc: Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On 12/8/25 1:27 PM, Odintsov Vladislav wrote: > On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >> Hi! >> I was testing conntrack limiting using Open vSwitch and noticed the >> following issue: under certain limits, a CPU lock occurred. >> >> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >> dpctl:19437] >> >> This occurs during a high packet frequency when trying to get the set >> limits through ovs-dpctl ct-get-limits. >> >> In the trace, I can see that the lock occurred on attempts to acquire a >> spinlock. >> >> [ 491.683056] <IRQ> >> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >> >> Prior to this, in the trace, there was processing of a task from >> userspace (ovs-dpctl) >> >> [ 491.683236] </IRQ> >> [ 491.683237] <TASK> >> [ 491.683238] asm_common_interrupt+0x22/0x40 >> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >> >> Inside the nf_conncount_gc_list function, a lock is taken on >> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >> not-so-fast __nf_conncount_gc_list function is executed. 
>> If, at this moment, a packet interrupt arrives on the same cpu core
>> (and spin_trylock_bh doesn't disable interrupts on that core), then
>> the scenario I encountered occurs: the first lock remains held, while
>> the packet interrupt also attempts to acquire it at
>> nf_conncount.c:502, spin_lock_bh(&rbconn->list.list_lock), while
>> committing to conntrack. This attempt fails, leading to a soft lockup.

Yes, that makes sense. That nf_conncount_gc_list() was added there to
cover a different scenario, which might also be affected by this soft
lockup under the same conditions.

>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list
>> when querying limits without an skb (as OVS does in
>> openvswitch/conntrack.c:1773)? The limit retrieval operation should
>> be read-only with respect to the conntrack state and not involve
>> potential modification.
>>
>> Like this:
>> --- a/net/netfilter/nf_conncount.c
>> +++ b/net/netfilter/nf_conncount.c
>> @@ -495,7 +495,6 @@ count_tree(struct net *net,
>>  	int ret;
>>  
>>  	if (!skb) {
>> -		nf_conncount_gc_list(net, &rbconn->list);
>>  		return rbconn->list.count;
>>  	}

Let me think about this; I would like to provide a solution that is
suitable for OVS + xt/nft_connlimit, because this change would break
some xt_connlimit use-cases. Also, without this nf_conncount_gc_list(),
the connection count wouldn't be accurate: if some connections have
already closed, the count would still include them.

Thanks for reporting it, I will let you know my findings on how to
solve this :-)

Thanks,
Fernando.

>> Thanks!
>>
>> --
>> regards,
>> Alexandra.
>>
>
> + ovs-dev ML

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 7:57 ` Fernando Fernandez Mancera @ 2025-12-09 8:42 ` Pablo Neira Ayuso 2025-12-09 8:49 ` Rukomoinikova Aleksandra 2025-12-10 8:03 ` Fernando Fernandez Mancera 0 siblings, 2 replies; 10+ messages in thread From: Pablo Neira Ayuso @ 2025-12-09 8:42 UTC (permalink / raw) To: Fernando Fernandez Mancera Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: > On 12/8/25 1:27 PM, Odintsov Vladislav wrote: > > On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: > > > Hi! > > > I was testing conntrack limiting using Open vSwitch and noticed the > > > following issue: under certain limits, a CPU lock occurred. > > > > > > [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- > > > dpctl:19437] > > > > > > This occurs during a high packet frequency when trying to get the set > > > limits through ovs-dpctl ct-get-limits. > > > > > > In the trace, I can see that the lock occurred on attempts to acquire a > > > spinlock. > > > > > > [ 491.683056] <IRQ> > > > [ 491.683059] _raw_spin_lock_bh+0x29/0x30 > > > [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] > > > [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] > > > > > > Prior to this, in the trace, there was processing of a task from > > > userspace (ovs-dpctl) > > > > > > [ 491.683236] </IRQ> > > > [ 491.683237] <TASK> > > > [ 491.683238] asm_common_interrupt+0x22/0x40 > > > [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] > > > > > > Inside the nf_conncount_gc_list function, a lock is taken on > > > nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the > > > not-so-fast __nf_conncount_gc_list function is executed. 
If, at this > > > moment, a packet interrupt arrives on the same сpu core (and > > > spin_trylock_bh doesn't disable interrupts on that core), then scenario > > > I encountered occurs: the first lock remains held, while the packet > > > interrupt also attempts to acquire it at > > > nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while > > > committing to conntrack. This attempt fails, leading to a soft lockup. > > > > > Yes that makes sense. That nf_conncount_gc_list() was added there to cover a > different scenario which might be also affected by this soft lockup under > the same conditions. See below, a quick browsing tells me OVS forgot to disable BH to perform this GC. > > > Hence my question: shouldn't we avoid calling nf_conncount_gc_list when > > > querying limits without an skb (as OVS does in openvswitch/ > > > conntrack.c:1773)? The limit retrieval operation should be read-only > > > regarding the contract state, not involve potential modification. > > > > > > Like this: > > > --- a/net/netfilter/nf_conncount.c > > > +++ b/net/netfilter/nf_conncount.c > > > @@ -495,7 +495,6 @@ count_tree(struct net *net, > > > int ret; > > > > > > if (!skb) { > > > - nf_conncount_gc_list(net, &rbconn->list); > > > return rbconn->list.count; > > > } > > > > > Let me think on something, I would like to provide a solution that is > suitable for OVS + xt/nft_connlimit. Because this change would break some > xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the > connection count wouldn't be accurate.. if some connections closed already > the count number would still consider them.. 
Side note: this particular line only affects OVS, which is the only
caller passing NULL as skb:

net/netfilter/xt_connlimit.c:	connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key);
net/openvswitch/conntrack.c:	connections = nf_conncount_count_skb(net, skb, info->family,
net/openvswitch/conntrack.c:	zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data,

Another relevant aspect: nf_conncount_gc_list() was called _without_
disabling BH (before Fernando's recent changes).

You fixed it here, Fernando:

commit c0362b5748282e22fa1592a8d3474f726ad964c2
Author: Fernando Fernandez Mancera <fmancera@suse.de>
Date:   Fri Nov 21 01:14:31 2025 +0100

    netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH

I think it is only a matter of backporting it to -stable.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 8:42 ` Pablo Neira Ayuso @ 2025-12-09 8:49 ` Rukomoinikova Aleksandra 2025-12-10 8:03 ` Fernando Fernandez Mancera 1 sibling, 0 replies; 10+ messages in thread From: Rukomoinikova Aleksandra @ 2025-12-09 8:49 UTC (permalink / raw) To: Pablo Neira Ayuso, Fernando Fernandez Mancera Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra Hi! Yes, this really does fix the situation. I realized it only after I sent this letter) Thanks a lot Pablo and Fernando! On 09.12.2025 11:42, Pablo Neira Ayuso wrote: > On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: >> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>> Hi! >>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>> following issue: under certain limits, a CPU lock occurred. >>>> >>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >>>> dpctl:19437] >>>> >>>> This occurs during a high packet frequency when trying to get the set >>>> limits through ovs-dpctl ct-get-limits. >>>> >>>> In the trace, I can see that the lock occurred on attempts to acquire a >>>> spinlock. >>>> >>>> [ 491.683056] <IRQ> >>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>> >>>> Prior to this, in the trace, there was processing of a task from >>>> userspace (ovs-dpctl) >>>> >>>> [ 491.683236] </IRQ> >>>> [ 491.683237] <TASK> >>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >>>> >>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. 
After this, the >>>> not-so-fast __nf_conncount_gc_list function is executed. If, at this >>>> moment, a packet interrupt arrives on the same сpu core (and >>>> spin_trylock_bh doesn't disable interrupts on that core), then scenario >>>> I encountered occurs: the first lock remains held, while the packet >>>> interrupt also attempts to acquire it at >>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>> committing to conntrack. This attempt fails, leading to a soft lockup. >>>> >> Yes that makes sense. That nf_conncount_gc_list() was added there to cover a >> different scenario which might be also affected by this soft lockup under >> the same conditions. > See below, a quick browsing tells me OVS forgot to disable BH to > perform this GC. > >>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list when >>>> querying limits without an skb (as OVS does in openvswitch/ >>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>> regarding the contract state, not involve potential modification. >>>> >>>> Like this: >>>> --- a/net/netfilter/nf_conncount.c >>>> +++ b/net/netfilter/nf_conncount.c >>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>> int ret; >>>> >>>> if (!skb) { >>>> - nf_conncount_gc_list(net, &rbconn->list); >>>> return rbconn->list.count; >>>> } >>>> >> Let me think on something, I would like to provide a solution that is >> suitable for OVS + xt/nft_connlimit. Because this change would break some >> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >> connection count wouldn't be accurate.. if some connections closed already >> the count number would still consider them.. 
> Side note, this particular line only affects OVS, which is the only > caller passing NULL as skb: > > net/netfilter/xt_connlimit.c: connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); > net/openvswitch/conntrack.c: connections = nf_conncount_count_skb(net, skb, info->family, > net/openvswitch/conntrack.c: zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data, > > Another relevant aspect: nf_conncount_gc_list() is called _without_ > disabling BH (before recent Fernando's changes). > > You fix it here, Fernando: > > commit c0362b5748282e22fa1592a8d3474f726ad964c2 > Author: Fernando Fernandez Mancera <fmancera@suse.de> > Date: Fri Nov 21 01:14:31 2025 +0100 > > netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH > > I think it is only a matter of backporting it to -stable. -- regards, Alexandra. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 8:42 ` Pablo Neira Ayuso 2025-12-09 8:49 ` Rukomoinikova Aleksandra @ 2025-12-10 8:03 ` Fernando Fernandez Mancera 2025-12-12 21:27 ` Rukomoinikova Aleksandra 1 sibling, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-10 8:03 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On 12/9/25 9:42 AM, Pablo Neira Ayuso wrote: > On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: >> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>> Hi! >>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>> following issue: under certain limits, a CPU lock occurred. >>>> >>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >>>> dpctl:19437] >>>> >>>> This occurs during a high packet frequency when trying to get the set >>>> limits through ovs-dpctl ct-get-limits. >>>> >>>> In the trace, I can see that the lock occurred on attempts to acquire a >>>> spinlock. >>>> >>>> [ 491.683056] <IRQ> >>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>> >>>> Prior to this, in the trace, there was processing of a task from >>>> userspace (ovs-dpctl) >>>> >>>> [ 491.683236] </IRQ> >>>> [ 491.683237] <TASK> >>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >>>> >>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >>>> not-so-fast __nf_conncount_gc_list function is executed. 
If, at this >>>> moment, a packet interrupt arrives on the same сpu core (and >>>> spin_trylock_bh doesn't disable interrupts on that core), then scenario >>>> I encountered occurs: the first lock remains held, while the packet >>>> interrupt also attempts to acquire it at >>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>> committing to conntrack. This attempt fails, leading to a soft lockup. >>>> >> >> Yes that makes sense. That nf_conncount_gc_list() was added there to cover a >> different scenario which might be also affected by this soft lockup under >> the same conditions. > > See below, a quick browsing tells me OVS forgot to disable BH to > perform this GC. > >>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list when >>>> querying limits without an skb (as OVS does in openvswitch/ >>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>> regarding the contract state, not involve potential modification. >>>> >>>> Like this: >>>> --- a/net/netfilter/nf_conncount.c >>>> +++ b/net/netfilter/nf_conncount.c >>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>> int ret; >>>> >>>> if (!skb) { >>>> - nf_conncount_gc_list(net, &rbconn->list); >>>> return rbconn->list.count; >>>> } >>>> >> >> Let me think on something, I would like to provide a solution that is >> suitable for OVS + xt/nft_connlimit. Because this change would break some >> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >> connection count wouldn't be accurate.. if some connections closed already >> the count number would still consider them.. 
> > Side note, this particular line only affects OVS, which is the only > caller passing NULL as skb: > > net/netfilter/xt_connlimit.c: connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); > net/openvswitch/conntrack.c: connections = nf_conncount_count_skb(net, skb, info->family, > net/openvswitch/conntrack.c: zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data, > > Another relevant aspect: nf_conncount_gc_list() is called _without_ > disabling BH (before recent Fernando's changes). > > You fix it here, Fernando: > > commit c0362b5748282e22fa1592a8d3474f726ad964c2 > Author: Fernando Fernandez Mancera <fmancera@suse.de> > Date: Fri Nov 21 01:14:31 2025 +0100 > > netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH > > I think it is only a matter of backporting it to -stable. That is right, thanks Pablo. Just a note, that commit doesn't have a fixes tag because I just did it to simplify its use so it won't be picked automatically.. should we send a request to stable mailing list? Thanks, Fernando. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-10  8:03       ` Fernando Fernandez Mancera
@ 2025-12-12 21:27         ` Rukomoinikova Aleksandra
  2025-12-15 11:00           ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-12 21:27 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, Pablo Neira Ayuso
  Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton,
	Rukomoinikova Aleksandra

Hi one more time! I found another issue; I'll describe it below.

In my opinion, it became relevant after merging [1]. I saw that a fix
for that commit was merged last week, but it doesn't fix the case I'll
describe below.

I create limits via Open vSwitch and run a TCP flood this way:
hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c 100
(Here, -i u5 is important, meaning timers < jiffies; the issue is less
likely to reproduce otherwise.)

And I set the following limit on the zone where these connections
arrive: ovs-dpctl ct-set-limits zone=9,limit=100

I start the traffic and see it on the interface: TCP SYN -> TCP SYN+ACK
-> TCP RST. Zone 9 overflows, but the connections immediately become
CLOSED. I run hping3 again: hping3 -S -I host -p 10880 -i u5
10.255.41.101 -c 100. This time I don't see anything on the interface,
and I see messages in dmesg from openvswitch saying the number of
connections exceeds the limit. At this moment, if I call ovs-dpctl
ct-get-limits, traffic immediately starts flowing again, because the
count is reset to zero.

I think the problem is as follows: before commit [1], we called
__nf_conncount_gc_list for every connection, and this function iterates
over all connections and cleans up those already closed.
What we have now is that when trying to add a connection in
__nf_conncount_add, we don't continue the iteration further while
handling errors and immediately exit the function with zero, which
represents the current connection count. We clean connections in the
list only until we find the connection we want to commit now, meaning
the connection count becomes outdated.

Furthermore, we then go on to check found connections for
already_closed and increment the collect variable, which also doesn't
allow connections to be fully cleaned up; we clean up a maximum of 8
(CONNCOUNT_GC_MAX_NODES) entries per call to __nf_conncount_add.

Also, with such a TCP attack - when connections transition to CLOSED
immediately - __nf_conncount_gc_list won't be called at all, because we
are constantly calling __nf_conncount_add and updating the last_gc
status. That's why ovs-dpctl ct-get-limits helps; it simply calls
__nf_conncount_gc_list and cleans up all closed connections.

I propose the following behavior, which is similar to what we had
before [2]:

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 19039a0802b8..e5224785f01e 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
 	struct nf_conn *found_ct;
 	unsigned int collect = 0;
 	bool refcounted = false;
+	bool need_add = false;
 
 	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &refcounted))
 		return -ENOENT;
@@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
 		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
 		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
 		    nf_ct_zone_id(zone, zone->dir))
-			goto out_put; /* already exists */
+			/* already exists */
+			need_add = false;
 	} else {
 		collect++;
 	}
@@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
 		 * Attempt to avoid a re-add in this case.
*/ nf_ct_put(found_ct); - goto out_put; + need_add = false; } else if (already_closed(found_ct)) { /* * we do not care about connections which are @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net, */ nf_ct_put(found_ct); conn_free(list, conn); - collect++; continue; } nf_ct_put(found_ct); } + if (!need_add) { + goto out_put; + } + add_new_node: [1] netfilter: nf_conncount: reduce unnecessary GC commit https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee [2] netfilter: nf_conncount: merge lookup and add functions commit. https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d [3] netfilter: nft_connlimit: update the count if add was skipped https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e Do you think I missed any cases, and how will this affect the function's performance? Thanks) On 10.12.2025 11:03, Fernando Fernandez Mancera wrote: > > On 12/9/25 9:42 AM, Pablo Neira Ayuso wrote: >> On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera >> wrote: >>> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>>> Hi! >>>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>>> following issue: under certain limits, a CPU lock occurred. >>>>> >>>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! >>>>> [ovs- >>>>> dpctl:19437] >>>>> >>>>> This occurs during a high packet frequency when trying to get the set >>>>> limits through ovs-dpctl ct-get-limits. >>>>> >>>>> In the trace, I can see that the lock occurred on attempts to >>>>> acquire a >>>>> spinlock. 
>>>>> >>>>> [ 491.683056] <IRQ> >>>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>>> >>>>> Prior to this, in the trace, there was processing of a task from >>>>> userspace (ovs-dpctl) >>>>> >>>>> [ 491.683236] </IRQ> >>>>> [ 491.683237] <TASK> >>>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 >>>>> [nf_conncount] >>>>> >>>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >>>>> not-so-fast __nf_conncount_gc_list function is executed. If, at this >>>>> moment, a packet interrupt arrives on the same сpu core (and >>>>> spin_trylock_bh doesn't disable interrupts on that core), then >>>>> scenario >>>>> I encountered occurs: the first lock remains held, while the packet >>>>> interrupt also attempts to acquire it at >>>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>>> committing to conntrack. This attempt fails, leading to a soft >>>>> lockup. >>>>> >>> >>> Yes that makes sense. That nf_conncount_gc_list() was added there to >>> cover a >>> different scenario which might be also affected by this soft lockup >>> under >>> the same conditions. >> >> See below, a quick browsing tells me OVS forgot to disable BH to >> perform this GC. >> >>>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list >>>>> when >>>>> querying limits without an skb (as OVS does in openvswitch/ >>>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>>> regarding the contract state, not involve potential modification. 
>>>>> >>>>> Like this: >>>>> --- a/net/netfilter/nf_conncount.c >>>>> +++ b/net/netfilter/nf_conncount.c >>>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>>> int ret; >>>>> >>>>> if (!skb) { >>>>> - nf_conncount_gc_list(net, &rbconn->list); >>>>> return rbconn->list.count; >>>>> } >>>>> >>> >>> Let me think on something, I would like to provide a solution that is >>> suitable for OVS + xt/nft_connlimit. Because this change would break >>> some >>> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >>> connection count wouldn't be accurate.. if some connections closed >>> already >>> the count number would still consider them.. >> >> Side note, this particular line only affects OVS, which is the only >> caller passing NULL as skb: >> >> net/netfilter/xt_connlimit.c: connections = >> nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); >> net/openvswitch/conntrack.c: connections = >> nf_conncount_count_skb(net, skb, info->family, >> net/openvswitch/conntrack.c: zone_limit.count = >> nf_conncount_count_skb(net, NULL, 0, data, >> >> Another relevant aspect: nf_conncount_gc_list() is called _without_ >> disabling BH (before recent Fernando's changes). >> >> You fix it here, Fernando: >> >> commit c0362b5748282e22fa1592a8d3474f726ad964c2 >> Author: Fernando Fernandez Mancera <fmancera@suse.de> >> Date: Fri Nov 21 01:14:31 2025 +0100 >> >> netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH >> >> I think it is only a matter of backporting it to -stable. > > That is right, thanks Pablo. Just a note, that commit doesn't have a > fixes tag because I just did it to simplify its use so it won't be > picked automatically.. should we send a request to stable mailing list? > > Thanks, > Fernando. -- regards, Alexandra. ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-12 21:27 ` Rukomoinikova Aleksandra @ 2025-12-15 11:00 ` Fernando Fernandez Mancera 2025-12-15 11:07 ` Rukomoinikova Aleksandra 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-15 11:00 UTC (permalink / raw) To: Rukomoinikova Aleksandra, Pablo Neira Ayuso Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote: > Hi one more time! I found another issue. I'll describe it below. > > In my opinion, it's relevant after merging [1] I saw that a fix for > this commit was merged last week, but but it doesn't fix case I'll > describe below. > Just to be sure, you have tested this with the latest mainline kernel, right? Because as you mentioned we merged several fixes addressing outdated counts. > I create limits via Open vSwitch and run a TCP flood this way: hping3 -S > -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important, > meaning with timers < jiffies; it's more likely the issue won't > reproduce otherwise) > And I set the following limit on the zone where these connections > arrive: ovs-dpctl ct-set-limits zone=9,limit=100 > > I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK > -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED. > I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c > 100 - I don't see anything on the interface and I see messages in dmesg > from openvswitch saying the number of connections exceeds the limit. At > this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately > start flowing again because the limit will be reset to zero. > To me it seems that openvswitch should perform a GC somewhere similar to what we did on nft_connlimit/xt_connlimit. 
> I think the problem is as follows: before commit [1], we called
> __nf_conncount_gc_list for every connection, and this function iterates
> over all connections and cleans up those already closed.
>
> What we have now is that when trying to add a connection in
> __nf_conncount_add, if we don't find it, then while handling errors, we
> don't continue the iteration further and immediately exit the function
> with zero, which represents the current connection count - we will clean
> connections in the list only until we find the connection we want to
> commit now - meaning the connection count will become outdated.
>
> Furthermore, we then go on to check connections found already_closed and
> increment the collect variable, which also doesn't allow connections to
> be fully cleaned up; we will clean up a maximum of 8
> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>
> Also, with such a TCP attack - when connections transition to CLOSED
> immediately - __nf_conncount_gc_list won't be called at all, because we
> will constantly be calling __nf_conncount_add and updating the last_gc
> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
> __nf_conncount_gc_list and cleans up all closed connections.
>
> I propose the following behavior, which will be similar to what we had
> before [2]
>
> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
> index 19039a0802b8..e5224785f01e 100644
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>  	struct nf_conn *found_ct;
>  	unsigned int collect = 0;
>  	bool refcounted = false;
> +	bool need_add = false;
>
>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>  				      &refcounted))
>  		return -ENOENT;
> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>  		    nf_ct_zone_id(zone, zone->dir))
> -			goto out_put; /* already exists */
> +			/* already exists */
> +			need_add = false;

I don't see the point to continue here. If we reached this, it means the
ct is already tracked. Sure, the count is not being updated but does it
matter? This connection is already tracked.

>  	} else {
>  		collect++;
>  	}
> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>  		 * Attempt to avoid a re-add in this case.
>  		 */
>  		nf_ct_put(found_ct);
> -		goto out_put;
> +		need_add = false;
>  	} else if (already_closed(found_ct)) {
>  		/*
>  		 * we do not care about connections which are
> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>  		 */
>  		nf_ct_put(found_ct);
>  		conn_free(list, conn);
> -		collect++;
>  		continue;
>  	}

This worries me a bit, it would mean that for every legit add operation
the function will go through ALL the connections tracked, which might be
a really huge number. This would impact performance.

IMO, this is the only relevant line of the patch for your use case,
probably. I do not think the others have any impact. I am wondering if
this can be fixed by handling it from the openvswitch side by calling gc
when needed.

I am going to try to reproduce this locally on a machine I have. Let's
see what I can get.

Thanks,
Fernando.

>
>  		nf_ct_put(found_ct);
>  	}
>
> +	if (!need_add) {
> +		goto out_put;
> +	}
> +
>  add_new_node:
>
> [1] netfilter: nf_conncount: reduce unnecessary GC commit
> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
> [2] netfilter: nf_conncount: merge lookup and add functions commit.
> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
> [3] netfilter: nft_connlimit: update the count if add was skipped
> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>
> Do you think I missed any cases, and how will this affect the function's
> performance? Thanks)

^ permalink raw reply	[flat|nested] 10+ messages in thread
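As a sanity check of the arithmetic discussed above, here is a minimal userspace model (an illustration only, not kernel code — the struct and function names are invented for this sketch). It assumes the behaviour described in the thread: per jiffy only the first add() gets to reclaim, it reclaims at most 8 (CONNCOUNT_GC_MAX_NODES) closed entries, and every new connection closes immediately. Under those assumptions, a burst of more than 8 packets per jiffy grows the count past any limit even though every connection is closed, until a full GC (the ct-get-limits effect) drains the list:

```c
#include <assert.h>

#define GC_MAX_NODES 8	/* mirrors CONNCOUNT_GC_MAX_NODES */

/* Hypothetical toy model of one per-zone conncount list. */
struct list_model {
	int count;	/* entries currently on the list */
	int closed;	/* entries that are already CLOSED */
};

/* One simulated jiffy of the flood: 'pkts' add() calls arrive, but the
 * once-per-jiffy optimization means only the first call reclaims, and it
 * reclaims at most GC_MAX_NODES closed entries. */
void flood_jiffy(struct list_model *l, int pkts)
{
	int reclaim = l->closed < GC_MAX_NODES ? l->closed : GC_MAX_NODES;

	l->count -= reclaim;
	l->closed -= reclaim;
	l->count += pkts;	/* every new entry is CLOSED at once */
	l->closed += pkts;	/* (SYN -> SYN+ACK -> RST) */
}

/* A full walk, as nf_conncount_gc_list() does when ct-get-limits runs. */
void full_gc(struct list_model *l)
{
	l->count -= l->closed;
	l->closed = 0;
}
```

With 20 packets per jiffy the model gains a net 12 entries per jiffy after the first, so a limit of 100 is crossed within a few jiffies of purely closed connections.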
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-15 11:00           ` Fernando Fernandez Mancera
@ 2025-12-15 11:07             ` Rukomoinikova Aleksandra
  2025-12-15 15:38               ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-15 11:07 UTC (permalink / raw)
To: Fernando Fernandez Mancera, Rukomoinikova Aleksandra, Pablo Neira Ayuso
Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton

On 15.12.2025 14:00, Fernando Fernandez Mancera wrote:
> On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote:
>> Hi one more time! I found another issue. I'll describe it below.
>>
>> In my opinion, it's relevant after merging [1]. I saw that a fix for
>> this commit was merged last week, but it doesn't fix the case I'll
>> describe below.
>>
>
> Just to be sure, you have tested this with the latest mainline kernel,
> right? Because as you mentioned we merged several fixes addressing
> outdated counts.

Hi! Yes.

>
>> I create limits via Open vSwitch and run a TCP flood this way: hping3 -S
>> -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important,
>> meaning with timers < jiffies; it's more likely the issue won't
>> reproduce otherwise)
>> And I set the following limit on the zone where these connections
>> arrive: ovs-dpctl ct-set-limits zone=9,limit=100
>>
>> I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK
>> -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED.
>> I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c
>> 100 - I don't see anything on the interface and I see messages in dmesg
>> from openvswitch saying the number of connections exceeds the limit. At
>> this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately
>> start flowing again because the count will be reset to zero.
>>
>
> To me it seems that openvswitch should perform a GC somewhere similar to
> what we did on nft_connlimit/xt_connlimit.

hm, Thanks! I'll try to think about implementing this.

>
>> I think the problem is as follows: before commit [1], we called
>> __nf_conncount_gc_list for every connection, and this function iterates
>> over all connections and cleans up those already closed.
>>
>> What we have now is that when trying to add a connection in
>> __nf_conncount_add, if we don't find it, then while handling errors, we
>> don't continue the iteration further and immediately exit the function
>> with zero, which represents the current connection count - we will clean
>> connections in the list only until we find the connection we want to
>> commit now - meaning the connection count will become outdated.
>>
>> Furthermore, we then go on to check connections found already_closed and
>> increment the collect variable, which also doesn't allow connections to
>> be fully cleaned up; we will clean up a maximum of 8
>> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>>
>> Also, with such a TCP attack - when connections transition to CLOSED
>> immediately - __nf_conncount_gc_list won't be called at all, because we
>> will constantly be calling __nf_conncount_add and updating the last_gc
>> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
>> __nf_conncount_gc_list and cleans up all closed connections.
>>
>> I propose the following behavior, which will be similar to what we had
>> before [2]
>>
>> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
>> index 19039a0802b8..e5224785f01e 100644
>> --- a/net/netfilter/nf_conncount.c
>> +++ b/net/netfilter/nf_conncount.c
>> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>>  	struct nf_conn *found_ct;
>>  	unsigned int collect = 0;
>>  	bool refcounted = false;
>> +	bool need_add = false;
>>
>>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>>  				      &refcounted))
>>  		return -ENOENT;
>> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>>  		    nf_ct_zone_id(zone, zone->dir))
>> -			goto out_put; /* already exists */
>> +			/* already exists */
>> +			need_add = false;
>
> I don't see the point to continue here. If we reached this, it means the
> ct is already tracked. Sure, the count is not being updated but does it
> matter? This connection is already tracked.
>
>>  	} else {
>>  		collect++;
>>  	}
>> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>>  		 * Attempt to avoid a re-add in this case.
>>  		 */
>>  		nf_ct_put(found_ct);
>> -		goto out_put;
>> +		need_add = false;
>>  	} else if (already_closed(found_ct)) {
>>  		/*
>>  		 * we do not care about connections which are
>> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>>  		 */
>>  		nf_ct_put(found_ct);
>>  		conn_free(list, conn);
>> -		collect++;
>>  		continue;
>>  	}
>
> This worries me a bit, it would mean that for every legit add operation
> the function will go through ALL the connections tracked, which might be
> a really huge number. This would impact performance.
>
> IMO, this is the only relevant line of the patch for your use case,
> probably. I do not think the others have any impact. I am wondering if
> this can be fixed by handling it from the openvswitch side by calling gc
> when needed.
>
> I am going to try to reproduce this locally on a machine I have. Let's
> see what I can get.

Thanks!

>
> Thanks,
> Fernando.
>
>>
>>  		nf_ct_put(found_ct);
>>  	}
>>
>> +	if (!need_add) {
>> +		goto out_put;
>> +	}
>> +
>>  add_new_node:
>>
>> [1] netfilter: nf_conncount: reduce unnecessary GC commit
>> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
>> [2] netfilter: nf_conncount: merge lookup and add functions commit.
>> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
>> [3] netfilter: nft_connlimit: update the count if add was skipped
>> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>>
>> Do you think I missed any cases, and how will this affect the function's
>> performance? Thanks)
>>

--
regards,
Alexandra.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-15 11:07             ` Rukomoinikova Aleksandra
@ 2025-12-15 15:38               ` Fernando Fernandez Mancera
  0 siblings, 0 replies; 10+ messages in thread
From: Fernando Fernandez Mancera @ 2025-12-15 15:38 UTC (permalink / raw)
To: Rukomoinikova Aleksandra, Pablo Neira Ayuso
Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton

On 12/15/25 12:07 PM, Rukomoinikova Aleksandra wrote:
> On 15.12.2025 14:00, Fernando Fernandez Mancera wrote:
>> On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote:
>>> Hi one more time! I found another issue. I'll describe it below.
>>>
>>> In my opinion, it's relevant after merging [1]. I saw that a fix for
>>> this commit was merged last week, but it doesn't fix the case I'll
>>> describe below.
>>>
>>
>> Just to be sure, you have tested this with the latest mainline kernel,
>> right? Because as you mentioned we merged several fixes addressing
>> outdated counts.
> Hi! Yes.
>>
>>> I create limits via Open vSwitch and run a TCP flood this way: hping3 -S
>>> -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important,
>>> meaning with timers < jiffies; it's more likely the issue won't
>>> reproduce otherwise)
>>> And I set the following limit on the zone where these connections
>>> arrive: ovs-dpctl ct-set-limits zone=9,limit=100
>>>
>>> I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK
>>> -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED.
>>> I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c
>>> 100 - I don't see anything on the interface and I see messages in dmesg
>>> from openvswitch saying the number of connections exceeds the limit. At
>>> this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately
>>> start flowing again because the count will be reset to zero.
>>>
>>
>> To me it seems that openvswitch should perform a GC somewhere similar to
>> what we did on nft_connlimit/xt_connlimit.
> hm, Thanks! I'll try to think about implementing this.

I investigated this and managed to reproduce it. hping3 with the option
you mentioned sends a burst of packets that makes openvswitch process
more than 8 packets per jiffy. That is relevant because of 2 different
things:

1. Currently __nf_conncount_add() stops cleaning up connections after 8
connections.
2. The optimization introduced to only perform one GC per jiffy.

Therefore, if someone manages to pass traffic fast enough, that number
can grow quite fast. I would like to propose increasing that number to
64, although it might make sense to match the net.core.netdev_budget
default value, which is currently 300. I have tested both values with
OVS and they solve the problem.

Any opinion? What do you think?

Also, just as a note: this issue was introduced by d265929930e2
("netfilter: nf_conncount: reduce unnecessary GC"), I bisected it.

>>
>>> I think the problem is as follows: before commit [1], we called
>>> __nf_conncount_gc_list for every connection, and this function iterates
>>> over all connections and cleans up those already closed.
>>>
>>> What we have now is that when trying to add a connection in
>>> __nf_conncount_add, if we don't find it, then while handling errors, we
>>> don't continue the iteration further and immediately exit the function
>>> with zero, which represents the current connection count - we will clean
>>> connections in the list only until we find the connection we want to
>>> commit now - meaning the connection count will become outdated.
>>>
>>> Furthermore, we then go on to check connections found already_closed and
>>> increment the collect variable, which also doesn't allow connections to
>>> be fully cleaned up; we will clean up a maximum of 8
>>> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>>>
>>> Also, with such a TCP attack - when connections transition to CLOSED
>>> immediately - __nf_conncount_gc_list won't be called at all, because we
>>> will constantly be calling __nf_conncount_add and updating the last_gc
>>> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
>>> __nf_conncount_gc_list and cleans up all closed connections.
>>>
>>> I propose the following behavior, which will be similar to what we had
>>> before [2]
>>>
>>> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
>>> index 19039a0802b8..e5224785f01e 100644
>>> --- a/net/netfilter/nf_conncount.c
>>> +++ b/net/netfilter/nf_conncount.c
>>> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>>>  	struct nf_conn *found_ct;
>>>  	unsigned int collect = 0;
>>>  	bool refcounted = false;
>>> +	bool need_add = false;
>>>
>>>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>>>  				      &refcounted))
>>>  		return -ENOENT;
>>> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>>>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>>>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>>>  		    nf_ct_zone_id(zone, zone->dir))
>>> -			goto out_put; /* already exists */
>>> +			/* already exists */
>>> +			need_add = false;
>>
>> I don't see the point to continue here. If we reached this, it means the
>> ct is already tracked. Sure, the count is not being updated but does it
>> matter? This connection is already tracked.
>>
>>>  	} else {
>>>  		collect++;
>>>  	}
>>> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>>>  		 * Attempt to avoid a re-add in this case.
>>>  		 */
>>>  		nf_ct_put(found_ct);
>>> -		goto out_put;
>>> +		need_add = false;
>>>  	} else if (already_closed(found_ct)) {
>>>  		/*
>>>  		 * we do not care about connections which are
>>> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>>>  		 */
>>>  		nf_ct_put(found_ct);
>>>  		conn_free(list, conn);
>>> -		collect++;
>>>  		continue;
>>>  	}
>>
>> This worries me a bit, it would mean that for every legit add operation
>> the function will go through ALL the connections tracked, which might be
>> a really huge number. This would impact performance.
>>
>> IMO, this is the only relevant line of the patch for your use case,
>> probably. I do not think the others have any impact. I am wondering if
>> this can be fixed by handling it from the openvswitch side by calling gc
>> when needed.
>>
>> I am going to try to reproduce this locally on a machine I have. Let's
>> see what I can get.
> Thanks!
>>
>> Thanks,
>> Fernando.
>>
>>>
>>>  		nf_ct_put(found_ct);
>>>  	}
>>>
>>> +	if (!need_add) {
>>> +		goto out_put;
>>> +	}
>>> +
>>>  add_new_node:
>>>
>>> [1] netfilter: nf_conncount: reduce unnecessary GC commit
>>> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
>>> [2] netfilter: nf_conncount: merge lookup and add functions commit.
>>> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
>>> [3] netfilter: nft_connlimit: update the count if add was skipped
>>> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>>>
>>> Do you think I missed any cases, and how will this affect the function's
>>> performance? Thanks)
>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread
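The effect of raising the per-call reclaim budget, as proposed in the message above, can be illustrated with a small userspace sketch (an illustration only, not kernel code; the function name is invented here). It keeps the assumptions stated in the thread — one reclaim opportunity per jiffy, every new connection closing immediately — and compares budgets: with a budget of 8 a 20-packet-per-jiffy burst grows the list without bound, while budgets of 64 or 300 keep it bounded at the burst size.

```c
/* List size after 'jiffies' jiffies of a flood, assuming at most
 * 'budget' already-closed entries can be reclaimed per jiffy and
 * 'pkts' new, instantly-closed connections arrive per jiffy. */
int flood_count(int pkts, int budget, int jiffies)
{
	int count = 0;
	int i, reclaim;

	for (i = 0; i < jiffies; i++) {
		/* every tracked entry is already closed in this model */
		reclaim = count < budget ? count : budget;
		count -= reclaim;
		count += pkts;
	}
	return count;
}
```

In this model any budget at least as large as the per-jiffy burst fully drains the closed entries before the next burst lands, which matches the reported observation that both 64 and 300 resolve the problem for this traffic pattern.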
* netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
@ 2025-12-08 12:31 Rukomoinikova Aleksandra
0 siblings, 0 replies; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-08 12:31 UTC (permalink / raw)
To: netfilter@vger.kernel.org
Cc: Odintsov Vladislav, Kovalev Evgeniy, Vazhnetsov Anton,
ovs-dev@openvswitch.org
Hi!
I was testing conntrack limiting using Open vSwitch and noticed the
following issue: under certain limits, a CPU soft lockup occurred.
[ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
[ovs-dpctl:19437]
This occurs under a high packet rate when trying to read the configured
limits through ovs-dpctl ct-get-limits.

In the trace, I can see that the lockup occurred on an attempt to acquire
a spinlock.
[ 491.683056] <IRQ>
[ 491.683059] _raw_spin_lock_bh+0x29/0x30
[ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount]
[ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch]
Earlier in the trace, a task from userspace (ovs-dpctl) was being
processed:
[ 491.683236] </IRQ>
[ 491.683237] <TASK>
[ 491.683238] asm_common_interrupt+0x22/0x40
[ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount]
Inside the nf_conncount_gc_list function, a lock is taken at
nf_conncount.c:335 via spin_trylock_bh(&list->list_lock). After this, the
not-so-fast __nf_conncount_gc_list function is executed. If, at this
moment, a packet interrupt arrives on the same CPU core (and
spin_trylock_bh doesn't disable interrupts on that core), then the
scenario I encountered occurs: the first lock remains held, while the
packet interrupt also attempts to acquire it at nf_conncount.c:502 via
spin_lock_bh(&rbconn->list.list_lock) while committing to conntrack.
That attempt fails, leading to a soft lockup.
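A userspace analogy may make the contention pattern clearer. The sketch below is an illustration only — it uses pthread mutexes instead of kernel spinlocks, carries none of the BH/IRQ semantics, and all function names are invented. It shows the one property that matters: once the GC path has taken the lock via a trylock and keeps it held for the whole walk, any second acquirer of the same lock can only wait until the walk finishes.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for nf_conncount_gc_list(): take the lock with a trylock and
 * keep it held for the duration of the (potentially long) list walk. */
int gc_path_acquires(void)
{
	return pthread_mutex_trylock(&list_lock) == 0;
}

/* Stand-in for the packet path's spin_lock_bh() in the commit path: here
 * we only probe the lock instead of spinning, to show it stays
 * unavailable for as long as the GC walk holds it.  Returns 1 when a
 * real spin_lock would be stuck spinning. */
int packet_path_would_spin(void)
{
	int ret = pthread_mutex_trylock(&list_lock);

	if (ret == 0) {			/* lock was free: no contention */
		pthread_mutex_unlock(&list_lock);
		return 0;
	}
	return ret == EBUSY;		/* lock held elsewhere */
}

/* GC walk finished: lock released, the packet path can proceed again. */
void gc_path_releases(void)
{
	pthread_mutex_unlock(&list_lock);
}
```

In the kernel the spinner cannot "give up" the way trylock does here; it busy-waits on the same CPU the holder needs, which is what the watchdog eventually reports as a soft lockup.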
Hence my question: shouldn't we avoid calling nf_conncount_gc_list when
querying limits without an skb (as OVS does in
openvswitch/conntrack.c:1773)? The limit retrieval operation should be
read-only with respect to the conntrack state and not involve potential
modification.
Like this:
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -495,7 +495,6 @@ count_tree(struct net *net,
 	int ret;
 
 	if (!skb) {
-		nf_conncount_gc_list(net, &rbconn->list);
 		return rbconn->list.count;
 	}
Thanks!
--
regards,
Alexandra.
^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-12-15 15:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <7a6872ce-8015-4397-bbe9-22108c65b7ec@k2.cloud>
2025-12-08 12:27 ` netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch Odintsov Vladislav
2025-12-09 7:57 ` Fernando Fernandez Mancera
2025-12-09 8:42 ` Pablo Neira Ayuso
2025-12-09 8:49 ` Rukomoinikova Aleksandra
2025-12-10 8:03 ` Fernando Fernandez Mancera
2025-12-12 21:27 ` Rukomoinikova Aleksandra
2025-12-15 11:00 ` Fernando Fernandez Mancera
2025-12-15 11:07 ` Rukomoinikova Aleksandra
2025-12-15 15:38 ` Fernando Fernandez Mancera
2025-12-08 12:31 Rukomoinikova Aleksandra