* [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count
@ 2025-10-23 23:20 Fernando Fernandez Mancera
2025-10-23 23:32 ` Fernando Fernandez Mancera
2025-10-24 11:31 ` Florian Westphal
0 siblings, 2 replies; 10+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-23 23:20 UTC (permalink / raw)
To: netfilter-devel; +Cc: coreteam, louis.t42, Fernando Fernandez Mancera
nft_connlimit_eval() reads priv->list->count to check if the connection
limit has been exceeded. This value can be cached by the CPU while it
can be decremented by a different CPU when a connection is closed. This
causes a data race as the value cached might be outdated.
When a new connection is established and evaluated by the connlimit
expression, priv->list->count is incremented by nf_conncount_add(),
triggering the CPU's cache coherency protocol and therefore refreshing
the cached value before updating it.
Solve this situation by reading the value using READ_ONCE().
Fixes: df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions")
Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
net/netfilter/nft_connlimit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
index 92b984fa8175..fc35a11cdca2 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -48,7 +48,7 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
return;
}
- count = priv->list->count;
+ count = READ_ONCE(priv->list->count);
if ((count > priv->limit) ^ priv->invert) {
regs->verdict.code = NFT_BREAK;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-23 23:20 [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count Fernando Fernandez Mancera @ 2025-10-23 23:32 ` Fernando Fernandez Mancera 2025-10-24 11:02 ` Florian Westphal 2025-10-24 11:31 ` Florian Westphal 1 sibling, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-10-23 23:32 UTC (permalink / raw) To: netfilter-devel; +Cc: coreteam, louis.t42 On 10/24/25 1:20 AM, Fernando Fernandez Mancera wrote: > nft_connlimit_eval() reads priv->list->count to check if the connection > limit has been exceeded. This value can be cached by the CPU while it > can be decremented by a different CPU when a connection is closed. This > causes a data race as the value cached might be outdated. > > When a new connection is established and evaluated by the connlimit > expression, priv->list->count is incremented by nf_conncount_add(), > triggering the CPU's cache coherency protocol and therefore refreshing > the cached value before updating it. > > Solve this situation by reading the value using READ_ONCE(). > > Fixes: df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") > Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/ > Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> > --- While at this, I have found another problem with connlimit although with this fix, it is partially mitigated. Since d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC"), if __nf_conncount_add() is called more than once during the same jiffy, the function won't check if the connection is already tracked and will be added right away incrementing the count. This can cause a situation where the count is greater than it should and can cause nft_connlimit to match wrongly for a few jiffies. I am open to suggestions on how to fix this.. as currently I don't have a different one other than reverting the commit.. Thanks, Fernando. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-23 23:32 ` Fernando Fernandez Mancera @ 2025-10-24 11:02 ` Florian Westphal 2025-10-24 11:14 ` Fernando Fernandez Mancera 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2025-10-24 11:02 UTC (permalink / raw) To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, louis.t42 Fernando Fernandez Mancera <fmancera@suse.de> wrote: > On 10/24/25 1:20 AM, Fernando Fernandez Mancera wrote: > > nft_connlimit_eval() reads priv->list->count to check if the connection > > limit has been exceeded. This value can be cached by the CPU while it > > can be decremented by a different CPU when a connection is closed. This > > causes a data race as the value cached might be outdated. > > > > When a new connection is established and evaluated by the connlimit > > expression, priv->list->count is incremented by nf_conncount_add(), > > triggering the CPU's cache coherency protocol and therefore refreshing > > the cached value before updating it. > > > > Solve this situation by reading the value using READ_ONCE(). > > > > Fixes: df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") > > Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/ > > Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> > > --- > > While at this, I have found another problem with connlimit although with > this fix, it is partially mitigated. Since d265929930e2 ("netfilter: > nf_conncount: reduce unnecessary GC"), if __nf_conncount_add() is called > more than once during the same jiffy, the function won't check if the > connection is already tracked and will be added right away incrementing > the count. This can cause a situation where the count is greater than it > should and can cause nft_connlimit to match wrongly for a few jiffies. > > I am open to suggestions on how to fix this.. as currently I don't have > a different one other than reverting the commit.. People are not supposed to use it in this way. This is very expensive, there is a reason why iptables-extensions examples all use iptables --syn. This needs a documentation fix. Or, we could revert df4a90250976 and then only _add for ctinfo NEW. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 11:02 ` Florian Westphal @ 2025-10-24 11:14 ` Fernando Fernandez Mancera 2025-10-24 11:33 ` Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-10-24 11:14 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, coreteam, louis.t42 On 10/24/25 1:02 PM, Florian Westphal wrote: > Fernando Fernandez Mancera <fmancera@suse.de> wrote: >> On 10/24/25 1:20 AM, Fernando Fernandez Mancera wrote: >>> nft_connlimit_eval() reads priv->list->count to check if the connection >>> limit has been exceeded. This value can be cached by the CPU while it >>> can be decremented by a different CPU when a connection is closed. This >>> causes a data race as the value cached might be outdated. >>> >>> When a new connection is established and evaluated by the connlimit >>> expression, priv->list->count is incremented by nf_conncount_add(), >>> triggering the CPU's cache coherency protocol and therefore refreshing >>> the cached value before updating it. >>> >>> Solve this situation by reading the value using READ_ONCE(). >>> >>> Fixes: df4a90250976 ("netfilter: nf_conncount: merge lookup and add functions") >>> Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/ >>> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> >>> --- >> >> While at this, I have found another problem with connlimit although with >> this fix, it is partially mitigated. Since d265929930e2 ("netfilter: >> nf_conncount: reduce unnecessary GC"), if __nf_conncount_add() is called >> more than once during the same jiffy, the function won't check if the >> connection is already tracked and will be added right away incrementing >> the count. This can cause a situation where the count is greater than it >> should and can cause nft_connlimit to match wrongly for a few jiffies. >> >> I am open to suggestions on how to fix this.. as currently I don't have >> a different one other than reverting the commit.. > > People are not supposed to use it in this way. > > This is very expensive, there is a reason why iptables-extensions > examples all use iptables --syn. > If this is only supposed to be checked for SYN packets, that is, a new connection is started.. a simple fix could be to check whether this is a SYN packet or not, break if it isn't. Similar to what is done on synproxy eval function. What do you think? > This needs a documentation fix. > > Or, we could revert df4a90250976 and then only _add for ctinfo NEW. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 11:14 ` Fernando Fernandez Mancera @ 2025-10-24 11:33 ` Florian Westphal 0 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2025-10-24 11:33 UTC (permalink / raw) To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, louis.t42 Fernando Fernandez Mancera <fmancera@suse.de> wrote: > If this is only supposed to be checked for SYN packets, that is, a new > connection is started.. a simple fix could be to check whether this is a > SYN packet or not, break if it isn't. Similar to what is done on > synproxy eval function. What do you think? It should work for any protocol, not just SYN. But we can't change it now to just check NEW, we would have to provide same result as without it (just without the potential double-add). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-23 23:20 [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count Fernando Fernandez Mancera 2025-10-23 23:32 ` Fernando Fernandez Mancera @ 2025-10-24 11:31 ` Florian Westphal 2025-10-24 11:55 ` Fernando Fernandez Mancera 1 sibling, 1 reply; 10+ messages in thread From: Florian Westphal @ 2025-10-24 11:31 UTC (permalink / raw) To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, louis.t42 Fernando Fernandez Mancera <fmancera@suse.de> wrote: > nft_connlimit_eval() reads priv->list->count to check if the connection > limit has been exceeded. This value can be cached by the CPU while it > can be decremented by a different CPU when a connection is closed. This > causes a data race as the value cached might be outdated. > > When a new connection is established and evaluated by the connlimit > expression, priv->list->count is incremented by nf_conncount_add(), > triggering the CPU's cache coherency protocol and therefore refreshing > the cached value before updating it. > > Solve this situation by reading the value using READ_ONCE(). Hmm, I am not sure about this. Patch looks correct (we read without holding a lock), but I don't see how compiler would emit different code here. This patch makes no difference on my end, same code is emitted. Can you show code before and after this patch on your side? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 11:31 ` Florian Westphal @ 2025-10-24 11:55 ` Fernando Fernandez Mancera 2025-10-24 12:49 ` Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-10-24 11:55 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, coreteam, louis.t42 On 10/24/25 1:31 PM, Florian Westphal wrote: > Fernando Fernandez Mancera <fmancera@suse.de> wrote: >> nft_connlimit_eval() reads priv->list->count to check if the connection >> limit has been exceeded. This value can be cached by the CPU while it >> can be decremented by a different CPU when a connection is closed. This >> causes a data race as the value cached might be outdated. >> >> When a new connection is established and evaluated by the connlimit >> expression, priv->list->count is incremented by nf_conncount_add(), >> triggering the CPU's cache coherency protocol and therefore refreshing >> the cached value before updating it. >> >> Solve this situation by reading the value using READ_ONCE(). > > Hmm, I am not sure about this. > > Patch looks correct (we read without holding a lock), > but I don't see how compiler would emit different code here. > > This patch makes no difference on my end, same code is emitted. > > Can you show code before and after this patch on your side? I did `make net/netfilter/nft_connlimit.s` for before and after, then I diffed both files with diff -u: I see a difference on how count is being handled. @@ -984,19 +984,19 @@ # net/netfilter/nft_connlimit.c:46: if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) { testl %eax, %eax # _30 jne .L65 #, -# net/netfilter/nft_connlimit.c:51: count = priv->list->count; - movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, MEM[(struct nft_connlimit *)expr_2(D) + 8B].list +# net/netfilter/nft_connlimit.c:51: count = READ_ONCE(priv->list->count); + movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, _31 + movl 88(%rax), %eax # MEM[(const volatile unsigned int *)_31 + 88B], _32 # net/netfilter/nft_connlimit.c:53: if ((count > priv->limit) ^ priv->invert) { - movl 88(%rax), %eax # _31->count, tmp155 - cmpl %eax, 16(%rbx) # tmp155, MEM[(struct nft_connlimit *)expr_2(D) + 8B].limit + cmpl %eax, 16(%rbx) # _32, MEM[(struct nft_connlimit *)expr_2(D) + 8B].limit setb %al #, _34 # net/netfilter/nft_connlimit.c:53: if ((count > priv->limit) ^ priv->invert) { cmpb 20(%rbx), %al # MEM[(struct nft_connlimit *)expr_2(D) + 8B].invert, _34 jne .L69 #, .L61: ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 11:55 ` Fernando Fernandez Mancera @ 2025-10-24 12:49 ` Florian Westphal 2025-10-24 13:04 ` Fernando Fernandez Mancera 2025-10-24 15:47 ` Fernando Fernandez Mancera 0 siblings, 2 replies; 10+ messages in thread From: Florian Westphal @ 2025-10-24 12:49 UTC (permalink / raw) To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, louis.t42 Fernando Fernandez Mancera <fmancera@suse.de> wrote: > On 10/24/25 1:31 PM, Florian Westphal wrote: > > Fernando Fernandez Mancera <fmancera@suse.de> wrote: > >> nft_connlimit_eval() reads priv->list->count to check if the connection > >> limit has been exceeded. This value can be cached by the CPU while it > >> can be decremented by a different CPU when a connection is closed. This > >> causes a data race as the value cached might be outdated. > >> > >> When a new connection is established and evaluated by the connlimit > >> expression, priv->list->count is incremented by nf_conncount_add(), > >> triggering the CPU's cache coherency protocol and therefore refreshing > >> the cached value before updating it. > >> > >> Solve this situation by reading the value using READ_ONCE(). > > > > Hmm, I am not sure about this. > > > > Patch looks correct (we read without holding a lock), > > but I don't see how compiler would emit different code here. > > > > This patch makes no difference on my end, same code is emitted. > > > > Can you show code before and after this patch on your side? > > I did `make net/netfilter/nft_connlimit.s` for before and after, then I > diffed both files with diff -u: > > I see a difference on how count is being handled. I'd apply this patch for correctness reasons. But I see no difference: > @@ -984,19 +984,19 @@ > # net/netfilter/nft_connlimit.c:46: if > (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) { > testl %eax, %eax # _30 > jne .L65 #, > -# net/netfilter/nft_connlimit.c:51: count = priv->list->count; > - movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, > MEM[(struct nft_connlimit *)expr_2(D) + 8B].list > +# net/netfilter/nft_connlimit.c:51: count = READ_ONCE(priv->list->count); > + movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, _31 > + movl 88(%rax), %eax # MEM[(const volatile unsigned int *)_31 + 88B], _32 old: movq 8(%rbx), %rax movl 88(%rax), %eax cmpl %eax, 16(%rbx) new: movq 8(%rbx), %rax movl 88(%rax), %eax cmpl %eax, 16(%rbx) same instruction sequence, only comments differ. Doesn't invalidate the patch however, we do read while not holding any locks so this READ_ONCE annotation is needed. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 12:49 ` Florian Westphal @ 2025-10-24 13:04 ` Fernando Fernandez Mancera 2025-10-24 15:47 ` Fernando Fernandez Mancera 1 sibling, 0 replies; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-10-24 13:04 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, coreteam, louis.t42 On 10/24/25 2:49 PM, Florian Westphal wrote: > Fernando Fernandez Mancera <fmancera@suse.de> wrote: >> On 10/24/25 1:31 PM, Florian Westphal wrote: >>> Fernando Fernandez Mancera <fmancera@suse.de> wrote: >>>> nft_connlimit_eval() reads priv->list->count to check if the connection >>>> limit has been exceeded. This value can be cached by the CPU while it >>>> can be decremented by a different CPU when a connection is closed. This >>>> causes a data race as the value cached might be outdated. >>>> >>>> When a new connection is established and evaluated by the connlimit >>>> expression, priv->list->count is incremented by nf_conncount_add(), >>>> triggering the CPU's cache coherency protocol and therefore refreshing >>>> the cached value before updating it. >>>> >>>> Solve this situation by reading the value using READ_ONCE(). >>> >>> Hmm, I am not sure about this. >>> >>> Patch looks correct (we read without holding a lock), >>> but I don't see how compiler would emit different code here. >>> >>> This patch makes no difference on my end, same code is emitted. >>> >>> Can you show code before and after this patch on your side? >> >> I did `make net/netfilter/nft_connlimit.s` for before and after, then I >> diffed both files with diff -u: >> >> I see a difference on how count is being handled. > > I'd apply this patch for correctness reasons. But I see no difference: > >> @@ -984,19 +984,19 @@ >> # net/netfilter/nft_connlimit.c:46: if >> (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) { >> testl %eax, %eax # _30 >> jne .L65 #, >> -# net/netfilter/nft_connlimit.c:51: count = priv->list->count; >> - movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, >> MEM[(struct nft_connlimit *)expr_2(D) + 8B].list >> +# net/netfilter/nft_connlimit.c:51: count = READ_ONCE(priv->list->count); >> + movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, _31 >> + movl 88(%rax), %eax # MEM[(const volatile unsigned int *)_31 + 88B], _32 > > old: > movq 8(%rbx), %rax > movl 88(%rax), %eax > cmpl %eax, 16(%rbx) > > new: > movq 8(%rbx), %rax > movl 88(%rax), %eax > cmpl %eax, 16(%rbx) > > same instruction sequence, only comments differ. > Doesn't invalidate the patch however, we do read while not holding > any locks so this READ_ONCE annotation is needed. I think the difference is on the `volatile` read, which if I am not wrong enforces to read from memory instead of using the cached value.. First time handling this type of situations but that is also the change in behavior I noticed.. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count 2025-10-24 12:49 ` Florian Westphal 2025-10-24 13:04 ` Fernando Fernandez Mancera @ 2025-10-24 15:47 ` Fernando Fernandez Mancera 1 sibling, 0 replies; 10+ messages in thread From: Fernando Fernandez Mancera @ 2025-10-24 15:47 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, coreteam, louis.t42 On 10/24/25 2:49 PM, Florian Westphal wrote: > Fernando Fernandez Mancera <fmancera@suse.de> wrote: >> On 10/24/25 1:31 PM, Florian Westphal wrote: >>> Fernando Fernandez Mancera <fmancera@suse.de> wrote: >>>> nft_connlimit_eval() reads priv->list->count to check if the connection >>>> limit has been exceeded. This value can be cached by the CPU while it >>>> can be decremented by a different CPU when a connection is closed. This >>>> causes a data race as the value cached might be outdated. >>>> >>>> When a new connection is established and evaluated by the connlimit >>>> expression, priv->list->count is incremented by nf_conncount_add(), >>>> triggering the CPU's cache coherency protocol and therefore refreshing >>>> the cached value before updating it. >>>> >>>> Solve this situation by reading the value using READ_ONCE(). >>> >>> Hmm, I am not sure about this. >>> >>> Patch looks correct (we read without holding a lock), >>> but I don't see how compiler would emit different code here. >>> >>> This patch makes no difference on my end, same code is emitted. >>> >>> Can you show code before and after this patch on your side? >> >> I did `make net/netfilter/nft_connlimit.s` for before and after, then I >> diffed both files with diff -u: >> >> I see a difference on how count is being handled. > > I'd apply this patch for correctness reasons. But I see no difference: > >> @@ -984,19 +984,19 @@ >> # net/netfilter/nft_connlimit.c:46: if >> (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) { >> testl %eax, %eax # _30 >> jne .L65 #, >> -# net/netfilter/nft_connlimit.c:51: count = priv->list->count; >> - movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, >> MEM[(struct nft_connlimit *)expr_2(D) + 8B].list >> +# net/netfilter/nft_connlimit.c:51: count = READ_ONCE(priv->list->count); >> + movq 8(%rbx), %rax # MEM[(struct nft_connlimit *)expr_2(D) + 8B].list, _31 >> + movl 88(%rax), %eax # MEM[(const volatile unsigned int *)_31 + 88B], _32 > > old: > movq 8(%rbx), %rax > movl 88(%rax), %eax > cmpl %eax, 16(%rbx) > > new: > movq 8(%rbx), %rax > movl 88(%rax), %eax > cmpl %eax, 16(%rbx) > > same instruction sequence, only comments differ. > Doesn't invalidate the patch however, we do read while not holding > any locks so this READ_ONCE annotation is needed. > Yes, you are right. Let me send a V2 clarifying the commit message. I am sending some other patches as this requires other fixes.. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-24 15:47 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-23 23:20 [PATCH nf] netfilter: nft_connlimit: fix stale read of connection count Fernando Fernandez Mancera 2025-10-23 23:32 ` Fernando Fernandez Mancera 2025-10-24 11:02 ` Florian Westphal 2025-10-24 11:14 ` Fernando Fernandez Mancera 2025-10-24 11:33 ` Florian Westphal 2025-10-24 11:31 ` Florian Westphal 2025-10-24 11:55 ` Fernando Fernandez Mancera 2025-10-24 12:49 ` Florian Westphal 2025-10-24 13:04 ` Fernando Fernandez Mancera 2025-10-24 15:47 ` Fernando Fernandez Mancera
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).