* RE: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
       [not found] <7a6872ce-8015-4397-bbe9-22108c65b7ec@k2.cloud>
@ 2025-12-08 12:27 ` Odintsov Vladislav
  2025-12-09  7:57   ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Odintsov Vladislav @ 2025-12-08 12:27 UTC (permalink / raw)
  To: netfilter@vger.kernel.org, ovs-dev@openvswitch.org
  Cc: Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra

On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote:
> Hi!
> I was testing conntrack limiting using Open vSwitch and noticed the
> following issue: under certain limits, a CPU soft lockup occurred.
>
> [  491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs-dpctl:19437]
>
> This occurs under high packet rates while retrieving the configured
> limits through ovs-dpctl ct-get-limits.
>
> In the trace, I can see that the lockup occurred while attempting to
> acquire a spinlock.
>
> [  491.683056] <IRQ>
> [  491.683059] _raw_spin_lock_bh+0x29/0x30
> [  491.683064] count_tree+0x19b/0x1f0 [nf_conncount]
> [  491.683069] ovs_ct_commit+0x196/0x490 [openvswitch]
>
> Prior to this in the trace, a task from userspace (ovs-dpctl) was
> being processed:
>
> [  491.683236] </IRQ>
> [  491.683237] <TASK>
> [  491.683238] asm_common_interrupt+0x22/0x40
> [  491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount]
>
> Inside the nf_conncount_gc_list function, a lock is taken at
> nf_conncount.c:335, spin_trylock_bh(&list->list_lock). After this, the
> not-so-fast __nf_conncount_gc_list function is executed. If, at this
> moment, a packet interrupt arrives on the same cpu core (and
> spin_trylock_bh doesn't disable interrupts on that core), then the
> scenario I encountered occurs: the first lock remains held, while the
> packet interrupt also attempts to acquire it at nf_conncount.c:502,
> spin_lock_bh(&rbconn->list.list_lock), while committing to conntrack.
> This attempt fails, leading to a soft lockup.
> Hence my question: shouldn't we avoid calling nf_conncount_gc_list
> when querying limits without an skb (as OVS does in
> openvswitch/conntrack.c:1773)? The limit retrieval operation should be
> read-only with respect to the conntrack state and not involve
> potential modification.
>
> Like this:
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -495,7 +495,6 @@ count_tree(struct net *net,
>  	int ret;
>  
>  	if (!skb) {
> -		nf_conncount_gc_list(net, &rbconn->list);
>  		return rbconn->list.count;
>  	}
>
> Thanks!
>
> --
> regards,
> Alexandra.

+ ovs-dev ML

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-08 12:27 ` netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch Odintsov Vladislav @ 2025-12-09 7:57 ` Fernando Fernandez Mancera 2025-12-09 8:42 ` Pablo Neira Ayuso 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-09 7:57 UTC (permalink / raw) To: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org Cc: Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On 12/8/25 1:27 PM, Odintsov Vladislav wrote: > On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >> Hi! >> I was testing conntrack limiting using Open vSwitch and noticed the >> following issue: under certain limits, a CPU lock occurred. >> >> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >> dpctl:19437] >> >> This occurs during a high packet frequency when trying to get the set >> limits through ovs-dpctl ct-get-limits. >> >> In the trace, I can see that the lock occurred on attempts to acquire a >> spinlock. >> >> [ 491.683056] <IRQ> >> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >> >> Prior to this, in the trace, there was processing of a task from >> userspace (ovs-dpctl) >> >> [ 491.683236] </IRQ> >> [ 491.683237] <TASK> >> [ 491.683238] asm_common_interrupt+0x22/0x40 >> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >> >> Inside the nf_conncount_gc_list function, a lock is taken on >> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >> not-so-fast __nf_conncount_gc_list function is executed. 
>> If, at this moment, a packet interrupt arrives on the same cpu core
>> (and spin_trylock_bh doesn't disable interrupts on that core), then
>> the scenario I encountered occurs: the first lock remains held, while
>> the packet interrupt also attempts to acquire it at
>> nf_conncount.c:502, spin_lock_bh(&rbconn->list.list_lock), while
>> committing to conntrack. This attempt fails, leading to a soft lockup.

Yes, that makes sense. That nf_conncount_gc_list() was added there to
cover a different scenario, which might also be affected by this soft
lockup under the same conditions.

>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list
>> when querying limits without an skb (as OVS does in
>> openvswitch/conntrack.c:1773)? The limit retrieval operation should
>> be read-only with respect to the conntrack state and not involve
>> potential modification.
>>
>> Like this:
>> --- a/net/netfilter/nf_conncount.c
>> +++ b/net/netfilter/nf_conncount.c
>> @@ -495,7 +495,6 @@ count_tree(struct net *net,
>>  	int ret;
>>  
>>  	if (!skb) {
>> -		nf_conncount_gc_list(net, &rbconn->list);
>>  		return rbconn->list.count;
>>  	}

Let me think about this; I would like to provide a solution that is
suitable for OVS + xt/nft_connlimit, because this change would break
some xt_connlimit use-cases. Also, without this nf_conncount_gc_list(),
the connection count wouldn't be accurate: if some connections have
already closed, the count would still include them.

Thanks for reporting it, I will let you know my findings on how to
solve this :-)

Thanks,
Fernando.

>> Thanks!
>>
>> --
>> regards,
>> Alexandra.
>>
>
> + ovs-dev ML

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 7:57 ` Fernando Fernandez Mancera @ 2025-12-09 8:42 ` Pablo Neira Ayuso 2025-12-09 8:49 ` Rukomoinikova Aleksandra 2025-12-10 8:03 ` Fernando Fernandez Mancera 0 siblings, 2 replies; 10+ messages in thread From: Pablo Neira Ayuso @ 2025-12-09 8:42 UTC (permalink / raw) To: Fernando Fernandez Mancera Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: > On 12/8/25 1:27 PM, Odintsov Vladislav wrote: > > On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: > > > Hi! > > > I was testing conntrack limiting using Open vSwitch and noticed the > > > following issue: under certain limits, a CPU lock occurred. > > > > > > [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- > > > dpctl:19437] > > > > > > This occurs during a high packet frequency when trying to get the set > > > limits through ovs-dpctl ct-get-limits. > > > > > > In the trace, I can see that the lock occurred on attempts to acquire a > > > spinlock. > > > > > > [ 491.683056] <IRQ> > > > [ 491.683059] _raw_spin_lock_bh+0x29/0x30 > > > [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] > > > [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] > > > > > > Prior to this, in the trace, there was processing of a task from > > > userspace (ovs-dpctl) > > > > > > [ 491.683236] </IRQ> > > > [ 491.683237] <TASK> > > > [ 491.683238] asm_common_interrupt+0x22/0x40 > > > [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] > > > > > > Inside the nf_conncount_gc_list function, a lock is taken on > > > nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the > > > not-so-fast __nf_conncount_gc_list function is executed. 
If, at this > > > moment, a packet interrupt arrives on the same сpu core (and > > > spin_trylock_bh doesn't disable interrupts on that core), then scenario > > > I encountered occurs: the first lock remains held, while the packet > > > interrupt also attempts to acquire it at > > > nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while > > > committing to conntrack. This attempt fails, leading to a soft lockup. > > > > > Yes that makes sense. That nf_conncount_gc_list() was added there to cover a > different scenario which might be also affected by this soft lockup under > the same conditions. See below, a quick browsing tells me OVS forgot to disable BH to perform this GC. > > > Hence my question: shouldn't we avoid calling nf_conncount_gc_list when > > > querying limits without an skb (as OVS does in openvswitch/ > > > conntrack.c:1773)? The limit retrieval operation should be read-only > > > regarding the contract state, not involve potential modification. > > > > > > Like this: > > > --- a/net/netfilter/nf_conncount.c > > > +++ b/net/netfilter/nf_conncount.c > > > @@ -495,7 +495,6 @@ count_tree(struct net *net, > > > int ret; > > > > > > if (!skb) { > > > - nf_conncount_gc_list(net, &rbconn->list); > > > return rbconn->list.count; > > > } > > > > > Let me think on something, I would like to provide a solution that is > suitable for OVS + xt/nft_connlimit. Because this change would break some > xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the > connection count wouldn't be accurate.. if some connections closed already > the count number would still consider them.. 
Side note: this particular line only affects OVS, which is the only
caller passing NULL as skb:

net/netfilter/xt_connlimit.c:	connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key);
net/openvswitch/conntrack.c:	connections = nf_conncount_count_skb(net, skb, info->family,
net/openvswitch/conntrack.c:	zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data,

Another relevant aspect: nf_conncount_gc_list() was called _without_
disabling BH (before Fernando's recent changes).

You fixed it here, Fernando:

commit c0362b5748282e22fa1592a8d3474f726ad964c2
Author: Fernando Fernandez Mancera <fmancera@suse.de>
Date:   Fri Nov 21 01:14:31 2025 +0100

    netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH

I think it is only a matter of backporting it to -stable.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 8:42 ` Pablo Neira Ayuso @ 2025-12-09 8:49 ` Rukomoinikova Aleksandra 2025-12-10 8:03 ` Fernando Fernandez Mancera 1 sibling, 0 replies; 10+ messages in thread From: Rukomoinikova Aleksandra @ 2025-12-09 8:49 UTC (permalink / raw) To: Pablo Neira Ayuso, Fernando Fernandez Mancera Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra Hi! Yes, this really does fix the situation. I realized it only after I sent this letter) Thanks a lot Pablo and Fernando! On 09.12.2025 11:42, Pablo Neira Ayuso wrote: > On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: >> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>> Hi! >>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>> following issue: under certain limits, a CPU lock occurred. >>>> >>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >>>> dpctl:19437] >>>> >>>> This occurs during a high packet frequency when trying to get the set >>>> limits through ovs-dpctl ct-get-limits. >>>> >>>> In the trace, I can see that the lock occurred on attempts to acquire a >>>> spinlock. >>>> >>>> [ 491.683056] <IRQ> >>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>> >>>> Prior to this, in the trace, there was processing of a task from >>>> userspace (ovs-dpctl) >>>> >>>> [ 491.683236] </IRQ> >>>> [ 491.683237] <TASK> >>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >>>> >>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. 
After this, the >>>> not-so-fast __nf_conncount_gc_list function is executed. If, at this >>>> moment, a packet interrupt arrives on the same сpu core (and >>>> spin_trylock_bh doesn't disable interrupts on that core), then scenario >>>> I encountered occurs: the first lock remains held, while the packet >>>> interrupt also attempts to acquire it at >>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>> committing to conntrack. This attempt fails, leading to a soft lockup. >>>> >> Yes that makes sense. That nf_conncount_gc_list() was added there to cover a >> different scenario which might be also affected by this soft lockup under >> the same conditions. > See below, a quick browsing tells me OVS forgot to disable BH to > perform this GC. > >>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list when >>>> querying limits without an skb (as OVS does in openvswitch/ >>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>> regarding the contract state, not involve potential modification. >>>> >>>> Like this: >>>> --- a/net/netfilter/nf_conncount.c >>>> +++ b/net/netfilter/nf_conncount.c >>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>> int ret; >>>> >>>> if (!skb) { >>>> - nf_conncount_gc_list(net, &rbconn->list); >>>> return rbconn->list.count; >>>> } >>>> >> Let me think on something, I would like to provide a solution that is >> suitable for OVS + xt/nft_connlimit. Because this change would break some >> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >> connection count wouldn't be accurate.. if some connections closed already >> the count number would still consider them.. 
> Side note, this particular line only affects OVS, which is the only > caller passing NULL as skb: > > net/netfilter/xt_connlimit.c: connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); > net/openvswitch/conntrack.c: connections = nf_conncount_count_skb(net, skb, info->family, > net/openvswitch/conntrack.c: zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data, > > Another relevant aspect: nf_conncount_gc_list() is called _without_ > disabling BH (before recent Fernando's changes). > > You fix it here, Fernando: > > commit c0362b5748282e22fa1592a8d3474f726ad964c2 > Author: Fernando Fernandez Mancera <fmancera@suse.de> > Date: Fri Nov 21 01:14:31 2025 +0100 > > netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH > > I think it is only a matter of backporting it to -stable. -- regards, Alexandra. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-09 8:42 ` Pablo Neira Ayuso 2025-12-09 8:49 ` Rukomoinikova Aleksandra @ 2025-12-10 8:03 ` Fernando Fernandez Mancera 2025-12-12 21:27 ` Rukomoinikova Aleksandra 1 sibling, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-10 8:03 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton, Rukomoinikova Aleksandra On 12/9/25 9:42 AM, Pablo Neira Ayuso wrote: > On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera wrote: >> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>> Hi! >>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>> following issue: under certain limits, a CPU lock occurred. >>>> >>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [ovs- >>>> dpctl:19437] >>>> >>>> This occurs during a high packet frequency when trying to get the set >>>> limits through ovs-dpctl ct-get-limits. >>>> >>>> In the trace, I can see that the lock occurred on attempts to acquire a >>>> spinlock. >>>> >>>> [ 491.683056] <IRQ> >>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>> >>>> Prior to this, in the trace, there was processing of a task from >>>> userspace (ovs-dpctl) >>>> >>>> [ 491.683236] </IRQ> >>>> [ 491.683237] <TASK> >>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount] >>>> >>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >>>> not-so-fast __nf_conncount_gc_list function is executed. 
If, at this >>>> moment, a packet interrupt arrives on the same сpu core (and >>>> spin_trylock_bh doesn't disable interrupts on that core), then scenario >>>> I encountered occurs: the first lock remains held, while the packet >>>> interrupt also attempts to acquire it at >>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>> committing to conntrack. This attempt fails, leading to a soft lockup. >>>> >> >> Yes that makes sense. That nf_conncount_gc_list() was added there to cover a >> different scenario which might be also affected by this soft lockup under >> the same conditions. > > See below, a quick browsing tells me OVS forgot to disable BH to > perform this GC. > >>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list when >>>> querying limits without an skb (as OVS does in openvswitch/ >>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>> regarding the contract state, not involve potential modification. >>>> >>>> Like this: >>>> --- a/net/netfilter/nf_conncount.c >>>> +++ b/net/netfilter/nf_conncount.c >>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>> int ret; >>>> >>>> if (!skb) { >>>> - nf_conncount_gc_list(net, &rbconn->list); >>>> return rbconn->list.count; >>>> } >>>> >> >> Let me think on something, I would like to provide a solution that is >> suitable for OVS + xt/nft_connlimit. Because this change would break some >> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >> connection count wouldn't be accurate.. if some connections closed already >> the count number would still consider them.. 
> > Side note, this particular line only affects OVS, which is the only > caller passing NULL as skb: > > net/netfilter/xt_connlimit.c: connections = nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); > net/openvswitch/conntrack.c: connections = nf_conncount_count_skb(net, skb, info->family, > net/openvswitch/conntrack.c: zone_limit.count = nf_conncount_count_skb(net, NULL, 0, data, > > Another relevant aspect: nf_conncount_gc_list() is called _without_ > disabling BH (before recent Fernando's changes). > > You fix it here, Fernando: > > commit c0362b5748282e22fa1592a8d3474f726ad964c2 > Author: Fernando Fernandez Mancera <fmancera@suse.de> > Date: Fri Nov 21 01:14:31 2025 +0100 > > netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH > > I think it is only a matter of backporting it to -stable. That is right, thanks Pablo. Just a note, that commit doesn't have a fixes tag because I just did it to simplify its use so it won't be picked automatically.. should we send a request to stable mailing list? Thanks, Fernando. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-10  8:03       ` Fernando Fernandez Mancera
@ 2025-12-12 21:27         ` Rukomoinikova Aleksandra
  2025-12-15 11:00           ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-12 21:27 UTC (permalink / raw)
  To: Fernando Fernandez Mancera, Pablo Neira Ayuso
  Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton,
	Rukomoinikova Aleksandra

Hi one more time! I found another issue; I'll describe it below.

In my opinion, it became relevant after merging [1]. I saw that a fix
for that commit was merged last week, but it doesn't fix the case I'll
describe below.

I create limits via Open vSwitch and run a TCP flood this way:
hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c 100
(Here, -i u5 is important, meaning timers < jiffies; the issue is less
likely to reproduce otherwise.)

And I set the following limit on the zone where these connections
arrive: ovs-dpctl ct-set-limits zone=9,limit=100

I start the traffic and see it on the interface: TCP SYN -> TCP SYN+ACK
-> TCP RST. Zone 9 overflows, but the connections immediately become
CLOSED. I run hping3 again: hping3 -S -I host -p 10880 -i u5
10.255.41.101 -c 100. This time I don't see anything on the interface,
and I see messages in dmesg from openvswitch saying the number of
connections exceeds the limit. At this moment, if I call ovs-dpctl
ct-get-limits, traffic immediately starts flowing again, because the
count is reset to zero.

I think the problem is as follows: before commit [1], we called
__nf_conncount_gc_list for every connection, and this function iterates
over all connections and cleans up those already closed.
What we have now is that when trying to add a connection in
__nf_conncount_add, we don't continue the iteration further while
handling errors and immediately exit the function with zero, which
represents the current connection count. We clean connections in the
list only until we find the connection we want to commit now, meaning
the connection count becomes outdated.

Furthermore, we then go on to check found connections for
already_closed and increment the collect variable, which also doesn't
allow connections to be fully cleaned up; we clean up a maximum of 8
(CONNCOUNT_GC_MAX_NODES) entries per call to __nf_conncount_add.

Also, with such a TCP attack - when connections transition to CLOSED
immediately - __nf_conncount_gc_list won't be called at all, because we
are constantly calling __nf_conncount_add and updating the last_gc
status. That's why ovs-dpctl ct-get-limits helps; it simply calls
__nf_conncount_gc_list and cleans up all closed connections.

I propose the following behavior, which is similar to what we had
before [2]:

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 19039a0802b8..e5224785f01e 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
 	struct nf_conn *found_ct;
 	unsigned int collect = 0;
 	bool refcounted = false;
+	bool need_add = false;
 
 	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &refcounted))
 		return -ENOENT;
@@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
 		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
 		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
 		    nf_ct_zone_id(zone, zone->dir))
-			goto out_put; /* already exists */
+			/* already exists */
+			need_add = false;
 	} else {
 		collect++;
 	}
@@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
 		 * Attempt to avoid a re-add in this case.
*/ nf_ct_put(found_ct); - goto out_put; + need_add = false; } else if (already_closed(found_ct)) { /* * we do not care about connections which are @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net, */ nf_ct_put(found_ct); conn_free(list, conn); - collect++; continue; } nf_ct_put(found_ct); } + if (!need_add) { + goto out_put; + } + add_new_node: [1] netfilter: nf_conncount: reduce unnecessary GC commit https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee [2] netfilter: nf_conncount: merge lookup and add functions commit. https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d [3] netfilter: nft_connlimit: update the count if add was skipped https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e Do you think I missed any cases, and how will this affect the function's performance? Thanks) On 10.12.2025 11:03, Fernando Fernandez Mancera wrote: > > On 12/9/25 9:42 AM, Pablo Neira Ayuso wrote: >> On Tue, Dec 09, 2025 at 08:57:59AM +0100, Fernando Fernandez Mancera >> wrote: >>> On 12/8/25 1:27 PM, Odintsov Vladislav wrote: >>>> On 08.12.2025 15:06, Rukomoinikova Aleksandra wrote: >>>>> Hi! >>>>> I was testing conntrack limiting using Open vSwitch and noticed the >>>>> following issue: under certain limits, a CPU lock occurred. >>>>> >>>>> [ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! >>>>> [ovs- >>>>> dpctl:19437] >>>>> >>>>> This occurs during a high packet frequency when trying to get the set >>>>> limits through ovs-dpctl ct-get-limits. >>>>> >>>>> In the trace, I can see that the lock occurred on attempts to >>>>> acquire a >>>>> spinlock. 
>>>>> >>>>> [ 491.683056] <IRQ> >>>>> [ 491.683059] _raw_spin_lock_bh+0x29/0x30 >>>>> [ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount] >>>>> [ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch] >>>>> >>>>> Prior to this, in the trace, there was processing of a task from >>>>> userspace (ovs-dpctl) >>>>> >>>>> [ 491.683236] </IRQ> >>>>> [ 491.683237] <TASK> >>>>> [ 491.683238] asm_common_interrupt+0x22/0x40 >>>>> [ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 >>>>> [nf_conncount] >>>>> >>>>> Inside the nf_conncount_gc_list function, a lock is taken on >>>>> nf_conncount.c:spin_trylock_bh(&list->list_lock):335. After this, the >>>>> not-so-fast __nf_conncount_gc_list function is executed. If, at this >>>>> moment, a packet interrupt arrives on the same сpu core (and >>>>> spin_trylock_bh doesn't disable interrupts on that core), then >>>>> scenario >>>>> I encountered occurs: the first lock remains held, while the packet >>>>> interrupt also attempts to acquire it at >>>>> nf_conncount.c:spin_lock_bh(&rbconn->list.list_lock):502 while >>>>> committing to conntrack. This attempt fails, leading to a soft >>>>> lockup. >>>>> >>> >>> Yes that makes sense. That nf_conncount_gc_list() was added there to >>> cover a >>> different scenario which might be also affected by this soft lockup >>> under >>> the same conditions. >> >> See below, a quick browsing tells me OVS forgot to disable BH to >> perform this GC. >> >>>>> Hence my question: shouldn't we avoid calling nf_conncount_gc_list >>>>> when >>>>> querying limits without an skb (as OVS does in openvswitch/ >>>>> conntrack.c:1773)? The limit retrieval operation should be read-only >>>>> regarding the contract state, not involve potential modification. 
>>>>> >>>>> Like this: >>>>> --- a/net/netfilter/nf_conncount.c >>>>> +++ b/net/netfilter/nf_conncount.c >>>>> @@ -495,7 +495,6 @@ count_tree(struct net *net, >>>>> int ret; >>>>> >>>>> if (!skb) { >>>>> - nf_conncount_gc_list(net, &rbconn->list); >>>>> return rbconn->list.count; >>>>> } >>>>> >>> >>> Let me think on something, I would like to provide a solution that is >>> suitable for OVS + xt/nft_connlimit. Because this change would break >>> some >>> xt_connlimit use-cases. Also without this nf_conncount_gc_list(), the >>> connection count wouldn't be accurate.. if some connections closed >>> already >>> the count number would still consider them.. >> >> Side note, this particular line only affects OVS, which is the only >> caller passing NULL as skb: >> >> net/netfilter/xt_connlimit.c: connections = >> nf_conncount_count_skb(net, skb, xt_family(par), info->data, key); >> net/openvswitch/conntrack.c: connections = >> nf_conncount_count_skb(net, skb, info->family, >> net/openvswitch/conntrack.c: zone_limit.count = >> nf_conncount_count_skb(net, NULL, 0, data, >> >> Another relevant aspect: nf_conncount_gc_list() is called _without_ >> disabling BH (before recent Fernando's changes). >> >> You fix it here, Fernando: >> >> commit c0362b5748282e22fa1592a8d3474f726ad964c2 >> Author: Fernando Fernandez Mancera <fmancera@suse.de> >> Date: Fri Nov 21 01:14:31 2025 +0100 >> >> netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH >> >> I think it is only a matter of backporting it to -stable. > > That is right, thanks Pablo. Just a note, that commit doesn't have a > fixes tag because I just did it to simplify its use so it won't be > picked automatically.. should we send a request to stable mailing list? > > Thanks, > Fernando. -- regards, Alexandra. ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch. 2025-12-12 21:27 ` Rukomoinikova Aleksandra @ 2025-12-15 11:00 ` Fernando Fernandez Mancera 2025-12-15 11:07 ` Rukomoinikova Aleksandra 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-12-15 11:00 UTC (permalink / raw) To: Rukomoinikova Aleksandra, Pablo Neira Ayuso Cc: Odintsov Vladislav, netfilter@vger.kernel.org, ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote: > Hi one more time! I found another issue. I'll describe it below. > > In my opinion, it's relevant after merging [1] I saw that a fix for > this commit was merged last week, but but it doesn't fix case I'll > describe below. > Just to be sure, you have tested this with the latest mainline kernel, right? Because as you mentioned we merged several fixes addressing outdated counts. > I create limits via Open vSwitch and run a TCP flood this way: hping3 -S > -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important, > meaning with timers < jiffies; it's more likely the issue won't > reproduce otherwise) > And I set the following limit on the zone where these connections > arrive: ovs-dpctl ct-set-limits zone=9,limit=100 > > I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK > -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED. > I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c > 100 - I don't see anything on the interface and I see messages in dmesg > from openvswitch saying the number of connections exceeds the limit. At > this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately > start flowing again because the limit will be reset to zero. > To me it seems that openvswitch should perform a GC somewhere similar to what we did on nft_connlimit/xt_connlimit. 
> I think the problem is as follows: before commit [1], we called
> __nf_conncount_gc_list for every connection, and this function iterates
> over all connections and cleans up those already closed.
>
> What we have now is that when trying to add a connection in
> __nf_conncount_add, if we don't find it, then while handling errors, we
> don't continue the iteration further and immediately exit the function
> with zero, which represents the current connection count - we will clean
> connections in the list only until we find the connection we want to
> commit now - meaning the connection count will become outdated.
>
> Furthermore, we then go on to check connections found already_closed and
> increment the collect variable, which also doesn't allow connections to
> be fully cleaned up; we will clean up a maximum of 8
> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>
> Also, with such a TCP attack - when connections transition to CLOSED
> immediately - __nf_conncount_gc_list won't be called at all, because we
> will constantly be calling __nf_conncount_add and updating the last_gc
> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
> __nf_conncount_gc_list and cleans up all closed connections.
>
> I propose the following behavior, which will be similar to what we had
> before [2]
>
> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
> index 19039a0802b8..e5224785f01e 100644
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>  	struct nf_conn *found_ct;
>  	unsigned int collect = 0;
>  	bool refcounted = false;
> +	bool need_add = false;
>
>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>  				      &refcounted))
>  		return -ENOENT;
> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>  		    nf_ct_zone_id(zone, zone->dir))
> -			goto out_put; /* already exists */
> +			/* already exists */
> +			need_add = false;

I don't see the point to continue here. If we reached this, it means the
ct is already tracked. Sure, the count is not being updated but does it
matter? This connection is already tracked.

>  	} else {
>  		collect++;
>  	}
> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>  		 * Attempt to avoid a re-add in this case.
>  		 */
>  		nf_ct_put(found_ct);
> -		goto out_put;
> +		need_add = false;
>  	} else if (already_closed(found_ct)) {
>  		/*
>  		 * we do not care about connections which are
> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>  		 */
>  		nf_ct_put(found_ct);
>  		conn_free(list, conn);
> -		collect++;
>  		continue;
>  	}

This worries me a bit, it would mean that for every legit add operation
the function will go through ALL the connections tracked, which might be
a really huge number. This would impact performance.

IMO, this is the only relevant line of the patch for your use case,
probably. I do not think the others have any impact. I am wondering if
this can be fixed by handling it from the openvswitch side by calling gc
when needed.

I am going to try to reproduce this locally on a machine I have. Let's
see what I can get.

Thanks,
Fernando.

>
>  		nf_ct_put(found_ct);
>  	}
>
> +	if (!need_add) {
> +		goto out_put;
> +	}
> +
>  add_new_node:
>
> [1] netfilter: nf_conncount: reduce unnecessary GC commit
> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
> [2] netfilter: nf_conncount: merge lookup and add functions commit.
> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
> [3] netfilter: nft_connlimit: update the count if add was skipped
> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>
> Do you think I missed any cases, and how will this affect the function's
> performance? Thanks)

^ permalink raw reply	[flat|nested] 10+ messages in thread
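As a sanity check of the arithmetic discussed above, here is a minimal userspace model (an illustration only, not kernel code — the struct and function names are invented for this sketch). It assumes the behaviour described in the thread: per jiffy only the first add() gets to reclaim, it reclaims at most 8 (CONNCOUNT_GC_MAX_NODES) closed entries, and every new connection closes immediately. Under those assumptions, a burst of more than 8 packets per jiffy grows the count past any limit even though every connection is closed, until a full GC (the ct-get-limits effect) drains the list:

```c
#include <assert.h>

#define GC_MAX_NODES 8	/* mirrors CONNCOUNT_GC_MAX_NODES */

/* Hypothetical toy model of one per-zone conncount list. */
struct list_model {
	int count;	/* entries currently on the list */
	int closed;	/* entries that are already CLOSED */
};

/* One simulated jiffy of the flood: 'pkts' add() calls arrive, but the
 * once-per-jiffy optimization means only the first call reclaims, and it
 * reclaims at most GC_MAX_NODES closed entries. */
void flood_jiffy(struct list_model *l, int pkts)
{
	int reclaim = l->closed < GC_MAX_NODES ? l->closed : GC_MAX_NODES;

	l->count -= reclaim;
	l->closed -= reclaim;
	l->count += pkts;	/* every new entry is CLOSED at once */
	l->closed += pkts;	/* (SYN -> SYN+ACK -> RST) */
}

/* A full walk, as nf_conncount_gc_list() does when ct-get-limits runs. */
void full_gc(struct list_model *l)
{
	l->count -= l->closed;
	l->closed = 0;
}
```

With 20 packets per jiffy the model gains a net 12 entries per jiffy after the first, so a limit of 100 is crossed within a few jiffies of purely closed connections.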
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-15 11:00           ` Fernando Fernandez Mancera
@ 2025-12-15 11:07             ` Rukomoinikova Aleksandra
  2025-12-15 15:38               ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-15 11:07 UTC (permalink / raw)
To: Fernando Fernandez Mancera, Rukomoinikova Aleksandra, Pablo Neira Ayuso
Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton

On 15.12.2025 14:00, Fernando Fernandez Mancera wrote:
> On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote:
>> Hi one more time! I found another issue. I'll describe it below.
>>
>> In my opinion, it's relevant after merging [1]. I saw that a fix for
>> this commit was merged last week, but it doesn't fix the case I'll
>> describe below.
>>
>
> Just to be sure, you have tested this with the latest mainline kernel,
> right? Because as you mentioned we merged several fixes addressing
> outdated counts.

Hi! Yes.

>
>> I create limits via Open vSwitch and run a TCP flood this way: hping3 -S
>> -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important,
>> meaning with timers < jiffies; it's more likely the issue won't
>> reproduce otherwise)
>> And I set the following limit on the zone where these connections
>> arrive: ovs-dpctl ct-set-limits zone=9,limit=100
>>
>> I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK
>> -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED.
>> I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c
>> 100 - I don't see anything on the interface and I see messages in dmesg
>> from openvswitch saying the number of connections exceeds the limit. At
>> this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately
>> start flowing again because the count will be reset to zero.
>>
>
> To me it seems that openvswitch should perform a GC somewhere similar to
> what we did on nft_connlimit/xt_connlimit.

hm, Thanks! I'll try to think about implementing this.

>
>> I think the problem is as follows: before commit [1], we called
>> __nf_conncount_gc_list for every connection, and this function iterates
>> over all connections and cleans up those already closed.
>>
>> What we have now is that when trying to add a connection in
>> __nf_conncount_add, if we don't find it, then while handling errors, we
>> don't continue the iteration further and immediately exit the function
>> with zero, which represents the current connection count - we will clean
>> connections in the list only until we find the connection we want to
>> commit now - meaning the connection count will become outdated.
>>
>> Furthermore, we then go on to check connections found already_closed and
>> increment the collect variable, which also doesn't allow connections to
>> be fully cleaned up; we will clean up a maximum of 8
>> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>>
>> Also, with such a TCP attack - when connections transition to CLOSED
>> immediately - __nf_conncount_gc_list won't be called at all, because we
>> will constantly be calling __nf_conncount_add and updating the last_gc
>> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
>> __nf_conncount_gc_list and cleans up all closed connections.
>>
>> I propose the following behavior, which will be similar to what we had
>> before [2]
>>
>> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
>> index 19039a0802b8..e5224785f01e 100644
>> --- a/net/netfilter/nf_conncount.c
>> +++ b/net/netfilter/nf_conncount.c
>> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>>  	struct nf_conn *found_ct;
>>  	unsigned int collect = 0;
>>  	bool refcounted = false;
>> +	bool need_add = false;
>>
>>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>>  				      &refcounted))
>>  		return -ENOENT;
>> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>>  		    nf_ct_zone_id(zone, zone->dir))
>> -			goto out_put; /* already exists */
>> +			/* already exists */
>> +			need_add = false;
>
> I don't see the point to continue here. If we reached this, it means the
> ct is already tracked. Sure, the count is not being updated but does it
> matter? This connection is already tracked.
>
>>  	} else {
>>  		collect++;
>>  	}
>> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>>  		 * Attempt to avoid a re-add in this case.
>>  		 */
>>  		nf_ct_put(found_ct);
>> -		goto out_put;
>> +		need_add = false;
>>  	} else if (already_closed(found_ct)) {
>>  		/*
>>  		 * we do not care about connections which are
>> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>>  		 */
>>  		nf_ct_put(found_ct);
>>  		conn_free(list, conn);
>> -		collect++;
>>  		continue;
>>  	}
>
> This worries me a bit, it would mean that for every legit add operation
> the function will go through ALL the connections tracked, which might be
> a really huge number. This would impact performance.
>
> IMO, this is the only relevant line of the patch for your use case,
> probably. I do not think the others have any impact. I am wondering if
> this can be fixed by handling it from the openvswitch side by calling gc
> when needed.
>
> I am going to try to reproduce this locally on a machine I have. Let's
> see what I can get.

Thanks!

>
> Thanks,
> Fernando.
>
>>
>>  		nf_ct_put(found_ct);
>>  	}
>>
>> +	if (!need_add) {
>> +		goto out_put;
>> +	}
>> +
>>  add_new_node:
>>
>> [1] netfilter: nf_conncount: reduce unnecessary GC commit
>> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
>> [2] netfilter: nf_conncount: merge lookup and add functions commit.
>> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
>> [3] netfilter: nft_connlimit: update the count if add was skipped
>> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>>
>> Do you think I missed any cases, and how will this affect the function's
>> performance? Thanks)
>>

--
regards,
Alexandra.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
  2025-12-15 11:07             ` Rukomoinikova Aleksandra
@ 2025-12-15 15:38               ` Fernando Fernandez Mancera
  0 siblings, 0 replies; 10+ messages in thread
From: Fernando Fernandez Mancera @ 2025-12-15 15:38 UTC (permalink / raw)
To: Rukomoinikova Aleksandra, Pablo Neira Ayuso
Cc: Odintsov Vladislav, netfilter@vger.kernel.org,
	ovs-dev@openvswitch.org, Kovalev Evgeniy, Vazhnetsov Anton

On 12/15/25 12:07 PM, Rukomoinikova Aleksandra wrote:
> On 15.12.2025 14:00, Fernando Fernandez Mancera wrote:
>> On 12/12/25 10:27 PM, Rukomoinikova Aleksandra wrote:
>>> Hi one more time! I found another issue. I'll describe it below.
>>>
>>> In my opinion, it's relevant after merging [1]. I saw that a fix for
>>> this commit was merged last week, but it doesn't fix the case I'll
>>> describe below.
>>>
>>
>> Just to be sure, you have tested this with the latest mainline kernel,
>> right? Because as you mentioned we merged several fixes addressing
>> outdated counts.
> Hi! Yes.
>>
>>> I create limits via Open vSwitch and run a TCP flood this way: hping3 -S
>>> -I host -p 10880 -i u5 10.255.41.101 -c 100 (Here, -i u5 is important,
>>> meaning with timers < jiffies; it's more likely the issue won't
>>> reproduce otherwise)
>>> And I set the following limit on the zone where these connections
>>> arrive: ovs-dpctl ct-set-limits zone=9,limit=100
>>>
>>> I start traffic, I see traffic on the interface: TCP SYN -> TCP SYN+ACK
>>> -> TCP RST. Zone 9 overflows, but connections immediately become CLOSED.
>>> I run hping3 again: hping3 -S -I host -p 10880 -i u5 10.255.41.101 -c
>>> 100 - I don't see anything on the interface and I see messages in dmesg
>>> from openvswitch saying the number of connections exceeds the limit. At
>>> this moment, if I call ovs-dpctl ct-get-limits, traffic will immediately
>>> start flowing again because the count will be reset to zero.
>>>
>>
>> To me it seems that openvswitch should perform a GC somewhere similar to
>> what we did on nft_connlimit/xt_connlimit.
> hm, Thanks! I'll try to think about implementing this.

I investigated this and managed to reproduce it. hping3 with the option
you mentioned sends a burst of packets that makes openvswitch process
more than 8 packets per jiffy. That is relevant because of 2 different
things:

1. Currently __nf_conncount_add() stops cleaning up connections after 8
connections.
2. The optimization introduced to only perform one GC per jiffy.

Therefore, if someone manages to pass traffic fast enough, that number
can grow quite fast. I would like to propose increasing that number to
64, although it might make sense to match the net.core.netdev_budget
default value, which is currently 300. I have tested both values with
OVS and they solve the problem.

Any opinion? What do you think?

Also, just as a note: this issue was introduced by d265929930e2
("netfilter: nf_conncount: reduce unnecessary GC"), I bisected it.

>>
>>> I think the problem is as follows: before commit [1], we called
>>> __nf_conncount_gc_list for every connection, and this function iterates
>>> over all connections and cleans up those already closed.
>>>
>>> What we have now is that when trying to add a connection in
>>> __nf_conncount_add, if we don't find it, then while handling errors, we
>>> don't continue the iteration further and immediately exit the function
>>> with zero, which represents the current connection count - we will clean
>>> connections in the list only until we find the connection we want to
>>> commit now - meaning the connection count will become outdated.
>>>
>>> Furthermore, we then go on to check connections found already_closed and
>>> increment the collect variable, which also doesn't allow connections to
>>> be fully cleaned up; we will clean up a maximum of 8
>>> (CONNCOUNT_GC_MAX_NODES) entries per one call to __nf_conncount_add.
>>>
>>> Also, with such a TCP attack - when connections transition to CLOSED
>>> immediately - __nf_conncount_gc_list won't be called at all, because we
>>> will constantly be calling __nf_conncount_add and updating the last_gc
>>> status. That's why ovs-dpctl ct-get-limits helps; it simply calls
>>> __nf_conncount_gc_list and cleans up all closed connections.
>>>
>>> I propose the following behavior, which will be similar to what we had
>>> before [2]
>>>
>>> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
>>> index 19039a0802b8..e5224785f01e 100644
>>> --- a/net/netfilter/nf_conncount.c
>>> +++ b/net/netfilter/nf_conncount.c
>>> @@ -171,6 +171,7 @@ static int __nf_conncount_add(struct net *net,
>>>  	struct nf_conn *found_ct;
>>>  	unsigned int collect = 0;
>>>  	bool refcounted = false;
>>> +	bool need_add = false;
>>>
>>>  	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone,
>>>  				      &refcounted))
>>>  		return -ENOENT;
>>> @@ -196,7 +197,8 @@ static int __nf_conncount_add(struct net *net,
>>>  		if (nf_ct_tuple_equal(&conn->tuple, &tuple) &&
>>>  		    nf_ct_zone_id(&conn->zone, conn->zone.dir) ==
>>>  		    nf_ct_zone_id(zone, zone->dir))
>>> -			goto out_put; /* already exists */
>>> +			/* already exists */
>>> +			need_add = false;
>>
>> I don't see the point to continue here. If we reached this, it means the
>> ct is already tracked. Sure, the count is not being updated but does it
>> matter? This connection is already tracked.
>>
>>>  	} else {
>>>  		collect++;
>>>  	}
>>> @@ -214,7 +216,7 @@ static int __nf_conncount_add(struct net *net,
>>>  		 * Attempt to avoid a re-add in this case.
>>>  		 */
>>>  		nf_ct_put(found_ct);
>>> -		goto out_put;
>>> +		need_add = false;
>>>  	} else if (already_closed(found_ct)) {
>>>  		/*
>>>  		 * we do not care about connections which are
>>> @@ -222,13 +224,16 @@ static int __nf_conncount_add(struct net *net,
>>>  		 */
>>>  		nf_ct_put(found_ct);
>>>  		conn_free(list, conn);
>>> -		collect++;
>>>  		continue;
>>>  	}
>>
>> This worries me a bit, it would mean that for every legit add operation
>> the function will go through ALL the connections tracked, which might be
>> a really huge number. This would impact performance.
>>
>> IMO, this is the only relevant line of the patch for your use case,
>> probably. I do not think the others have any impact. I am wondering if
>> this can be fixed by handling it from the openvswitch side by calling gc
>> when needed.
>>
>> I am going to try to reproduce this locally on a machine I have. Let's
>> see what I can get.
> Thanks!
>>
>> Thanks,
>> Fernando.
>>
>>>
>>>  		nf_ct_put(found_ct);
>>>  	}
>>>
>>> +	if (!need_add) {
>>> +		goto out_put;
>>> +	}
>>> +
>>>  add_new_node:
>>>
>>> [1] netfilter: nf_conncount: reduce unnecessary GC commit
>>> https://github.com/torvalds/linux/commit/d265929930e2ffafc744c0ae05fb70acd53be1ee
>>> [2] netfilter: nf_conncount: merge lookup and add functions commit.
>>> https://github.com/torvalds/linux/commit/df4a902509766897f7371fdfa4c3bf8bc321b55d
>>> [3] netfilter: nft_connlimit: update the count if add was skipped
>>> https://github.com/torvalds/linux/commit/69894e5b4c5e28cda5f32af33d4a92b7a4b93b0e
>>>
>>> Do you think I missed any cases, and how will this affect the function's
>>> performance? Thanks)
>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread
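The effect of raising the per-call reclaim budget, as proposed in the message above, can be illustrated with a small userspace sketch (an illustration only, not kernel code; the function name is invented here). It keeps the assumptions stated in the thread — one reclaim opportunity per jiffy, every new connection closing immediately — and compares budgets: with a budget of 8 a 20-packet-per-jiffy burst grows the list without bound, while budgets of 64 or 300 keep it bounded at the burst size.

```c
/* List size after 'jiffies' jiffies of a flood, assuming at most
 * 'budget' already-closed entries can be reclaimed per jiffy and
 * 'pkts' new, instantly-closed connections arrive per jiffy. */
int flood_count(int pkts, int budget, int jiffies)
{
	int count = 0;
	int i, reclaim;

	for (i = 0; i < jiffies; i++) {
		/* every tracked entry is already closed in this model */
		reclaim = count < budget ? count : budget;
		count -= reclaim;
		count += pkts;
	}
	return count;
}
```

In this model any budget at least as large as the per-jiffy burst fully drains the closed entries before the next burst lands, which matches the reported observation that both 64 and 300 resolve the problem for this traffic pattern.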
* netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch.
@ 2025-12-08 12:31 Rukomoinikova Aleksandra
0 siblings, 0 replies; 10+ messages in thread
From: Rukomoinikova Aleksandra @ 2025-12-08 12:31 UTC (permalink / raw)
To: netfilter@vger.kernel.org
Cc: Odintsov Vladislav, Kovalev Evgeniy, Vazhnetsov Anton,
ovs-dev@openvswitch.org
Hi!
I was testing conntrack limiting using Open vSwitch and noticed the
following issue: under certain limits, a CPU soft lockup occurred.
[ 491.682936] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
[ovs-dpctl:19437]
This occurs under a high packet rate when trying to read the configured
limits through ovs-dpctl ct-get-limits.

In the trace, I can see that the lockup occurred on an attempt to acquire
a spinlock.
[ 491.683056] <IRQ>
[ 491.683059] _raw_spin_lock_bh+0x29/0x30
[ 491.683064] count_tree+0x19b/0x1f0 [nf_conncount]
[ 491.683069] ovs_ct_commit+0x196/0x490 [openvswitch]
Earlier in the trace, a task from userspace (ovs-dpctl) was being
processed:
[ 491.683236] </IRQ>
[ 491.683237] <TASK>
[ 491.683238] asm_common_interrupt+0x22/0x40
[ 491.683240] RIP: 0010:nf_conncount_gc_list+0x18a/0x200 [nf_conncount]
Inside the nf_conncount_gc_list function, a lock is taken at
nf_conncount.c:335 via spin_trylock_bh(&list->list_lock). After this, the
not-so-fast __nf_conncount_gc_list function is executed. If, at this
moment, a packet interrupt arrives on the same CPU core (and
spin_trylock_bh doesn't disable interrupts on that core), then the
scenario I encountered occurs: the first lock remains held, while the
packet interrupt also attempts to acquire it at nf_conncount.c:502 via
spin_lock_bh(&rbconn->list.list_lock) while committing to conntrack.
That attempt fails, leading to a soft lockup.
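A userspace analogy may make the contention pattern clearer. The sketch below is an illustration only — it uses pthread mutexes instead of kernel spinlocks, carries none of the BH/IRQ semantics, and all function names are invented. It shows the one property that matters: once the GC path has taken the lock via a trylock and keeps it held for the whole walk, any second acquirer of the same lock can only wait until the walk finishes.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for nf_conncount_gc_list(): take the lock with a trylock and
 * keep it held for the duration of the (potentially long) list walk. */
int gc_path_acquires(void)
{
	return pthread_mutex_trylock(&list_lock) == 0;
}

/* Stand-in for the packet path's spin_lock_bh() in the commit path: here
 * we only probe the lock instead of spinning, to show it stays
 * unavailable for as long as the GC walk holds it.  Returns 1 when a
 * real spin_lock would be stuck spinning. */
int packet_path_would_spin(void)
{
	int ret = pthread_mutex_trylock(&list_lock);

	if (ret == 0) {			/* lock was free: no contention */
		pthread_mutex_unlock(&list_lock);
		return 0;
	}
	return ret == EBUSY;		/* lock held elsewhere */
}

/* GC walk finished: lock released, the packet path can proceed again. */
void gc_path_releases(void)
{
	pthread_mutex_unlock(&list_lock);
}
```

In the kernel the spinner cannot "give up" the way trylock does here; it busy-waits on the same CPU the holder needs, which is what the watchdog eventually reports as a soft lockup.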
Hence my question: shouldn't we avoid calling nf_conncount_gc_list when
querying limits without an skb (as OVS does in
openvswitch/conntrack.c:1773)? The limit retrieval operation should be
read-only with respect to the conntrack state and not involve potential
modification.
Like this:
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -495,7 +495,6 @@ count_tree(struct net *net,
 	int ret;
 
 	if (!skb) {
-		nf_conncount_gc_list(net, &rbconn->list);
 		return rbconn->list.count;
 	}
Thanks!
--
regards,
Alexandra.
^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-12-15 15:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <7a6872ce-8015-4397-bbe9-22108c65b7ec@k2.cloud>
2025-12-08 12:27 ` netfilter: nf_conncount: cpu soft lockup using limiting with Open vSwitch Odintsov Vladislav
2025-12-09 7:57 ` Fernando Fernandez Mancera
2025-12-09 8:42 ` Pablo Neira Ayuso
2025-12-09 8:49 ` Rukomoinikova Aleksandra
2025-12-10 8:03 ` Fernando Fernandez Mancera
2025-12-12 21:27 ` Rukomoinikova Aleksandra
2025-12-15 11:00 ` Fernando Fernandez Mancera
2025-12-15 11:07 ` Rukomoinikova Aleksandra
2025-12-15 15:38 ` Fernando Fernandez Mancera
2025-12-08 12:31 Rukomoinikova Aleksandra