* [PATCH] icmp: fix icmp_ndo_send address translation for reply direction
@ 2025-08-25 20:17 Fabian Bläse
2025-08-25 20:38 ` [PATCH v2] " Fabian Bläse
0 siblings, 1 reply; 12+ messages in thread
From: Fabian Bläse @ 2025-08-25 20:17 UTC (permalink / raw)
To: netdev; +Cc: netfilter-devel, Fabian Bläse, Jason A. Donenfeld
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by NAT or MASQUERADE.
However, the implementation only considered the IP_CT_DIR_ORIGINAL case
and incorrectly applies the same logic to packets in reply direction.
Therefore, an SNAT rule in the original direction causes icmp_ndo_send to
translate the source IP of reply-direction packets, even though no
translation is required. The source address is translated to the sender
address of the original direction, because the original tuple's source
address is used.
On the other hand, icmp_ndo_send incorrectly misses translating the
source address of packets in reply-direction, leading to incorrect rate
limiting. The generated ICMP error is translated by netfilter at a later
stage, therefore the ICMP error is sent correctly.
Fix this by translating the address based on the connection direction:
- CT_DIR_ORIGINAL: Use the original tuple's source address
(unchanged from current behavior)
- CT_DIR_REPLY: Use the reply tuple's source address
(fixing the incorrect translation)
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
---
net/ipv4/icmp.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 2ffe73ea644f..a4fb0bc7c4cf 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -803,7 +803,13 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
__be32 orig_ip;
ct = nf_ct_get(skb_in, &ctinfo);
- if (!ct || !(ct->status & IPS_SRC_NAT)) {
+ if (!ct) {
+ __icmp_send(skb_in, type, code, info, &opts);
+ return;
+ }
+
+ if ( !(ct->status & IPS_SRC_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL)
+ && !(ct->status & IPS_DST_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)) {
__icmp_send(skb_in, type, code, info, &opts);
return;
}
@@ -818,7 +824,11 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
goto out;
orig_ip = ip_hdr(skb_in)->saddr;
- ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
+ ip_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+ } else {
+ ip_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip;
+ }
__icmp_send(skb_in, type, code, info, &opts);
ip_hdr(skb_in)->saddr = orig_ip;
out:
--
2.50.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-25 20:17 [PATCH] icmp: fix icmp_ndo_send address translation for reply direction Fabian Bläse
@ 2025-08-25 20:38 ` Fabian Bläse
2025-08-27 9:05 ` Florian Westphal
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
0 siblings, 2 replies; 12+ messages in thread
From: Fabian Bläse @ 2025-08-25 20:38 UTC (permalink / raw)
To: netdev; +Cc: netfilter-devel, Fabian Bläse, Jason A. Donenfeld
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by NAT or MASQUERADE.
However, the implementation only considered the IP_CT_DIR_ORIGINAL case
and incorrectly applies the same logic to packets in reply direction.
Therefore, an SNAT rule in the original direction causes icmp_ndo_send to
translate the source IP of reply-direction packets, even though no
translation is required. The source address is translated to the sender
address of the original direction, because the original tuple's source
address is used.
On the other hand, icmp_ndo_send incorrectly misses translating the
source address of packets in reply-direction, leading to incorrect rate
limiting. The generated ICMP error is translated by netfilter at a later
stage, therefore the ICMP error is sent correctly.
Fix this by translating the address based on the connection direction:
- CT_DIR_ORIGINAL: Use the original tuple's source address
(unchanged from current behavior)
- CT_DIR_REPLY: Use the reply tuple's source address
(fixing the incorrect translation)
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- Implement fix for ICMPv6 as well
---
net/ipv4/icmp.c | 14 ++++++++++++--
net/ipv6/ip6_icmp.c | 14 ++++++++++++--
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 2ffe73ea644f..a4fb0bc7c4cf 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -803,7 +803,13 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
__be32 orig_ip;
ct = nf_ct_get(skb_in, &ctinfo);
- if (!ct || !(ct->status & IPS_SRC_NAT)) {
+ if (!ct) {
+ __icmp_send(skb_in, type, code, info, &opts);
+ return;
+ }
+
+ if ( !(ct->status & IPS_SRC_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL)
+ && !(ct->status & IPS_DST_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)) {
__icmp_send(skb_in, type, code, info, &opts);
return;
}
@@ -818,7 +824,11 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
goto out;
orig_ip = ip_hdr(skb_in)->saddr;
- ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
+ ip_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+ } else {
+ ip_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip;
+ }
__icmp_send(skb_in, type, code, info, &opts);
ip_hdr(skb_in)->saddr = orig_ip;
out:
diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
index 9e3574880cb0..c6078694311c 100644
--- a/net/ipv6/ip6_icmp.c
+++ b/net/ipv6/ip6_icmp.c
@@ -58,7 +58,13 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
struct nf_conn *ct;
ct = nf_ct_get(skb_in, &ctinfo);
- if (!ct || !(ct->status & IPS_SRC_NAT)) {
+ if (!ct) {
+ __icmpv6_send(skb_in, type, code, info, &parm);
+ return;
+ }
+
+ if ( !(ct->status & IPS_SRC_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL)
+ && !(ct->status & IPS_DST_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)) {
__icmpv6_send(skb_in, type, code, info, &parm);
return;
}
@@ -73,7 +79,11 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
goto out;
orig_ip = ipv6_hdr(skb_in)->saddr;
- ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+ if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
+ ipv6_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.in6;
+ } else {
+ ipv6_hdr(skb_in)->saddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.in6;
+ }
__icmpv6_send(skb_in, type, code, info, &parm);
ipv6_hdr(skb_in)->saddr = orig_ip;
out:
--
2.50.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-25 20:38 ` [PATCH v2] " Fabian Bläse
@ 2025-08-27 9:05 ` Florian Westphal
2025-08-27 17:12 ` Fabian Bläse
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
1 sibling, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2025-08-27 9:05 UTC (permalink / raw)
To: Fabian Bläse; +Cc: netdev, netfilter-devel, Jason A. Donenfeld
Fabian Bläse <fabian@blaese.de> wrote:
> The icmp_ndo_send function was originally introduced to ensure proper
> rate limiting when icmp_send is called by a network device driver,
> where the packet's source address may have already been transformed
> by NAT or MASQUERADE.
>
> However, the implementation only considered the IP_CT_DIR_ORIGINAL case
> and incorrectly applies the same logic to packets in reply direction.
>
> Therefore, an SNAT rule in the original direction causes icmp_ndo_send to
> translate the source IP of reply-direction packets, even though no
> translation is required. The source address is translated to the sender
> address of the original direction, because the original tuple's source
> address is used.
>
> On the other hand, icmp_ndo_send incorrectly misses translating the
> source address of packets in reply-direction, leading to incorrect rate
> limiting. The generated ICMP error is translated by netfilter at a later
> stage, therefore the ICMP error is sent correctly.
>
> Fix this by translating the address based on the connection direction:
> - CT_DIR_ORIGINAL: Use the original tuple's source address
> (unchanged from current behavior)
> - CT_DIR_REPLY: Use the reply tuple's source address
> (fixing the incorrect translation)
> ct = nf_ct_get(skb_in, &ctinfo);
> - if (!ct || !(ct->status & IPS_SRC_NAT)) {
> + if (!ct) {
> + __icmp_send(skb_in, type, code, info, &opts);
> + return;
> + }
I don't understand this part. You talk about snat, yet you remove
the check for its absence here. Why?
If the connection isn't subject to snat, why to we need to mangle the
source address in the first place?
If you are worried about "dnat to", then please update the commit
message, which only mentions masquerade/snat.
> + if ( !(ct->status & IPS_SRC_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL)
> + && !(ct->status & IPS_DST_NAT && CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)) {
> __icmp_send(skb_in, type, code, info, &opts);
> return;
> }
Don't understand this either. Why these checks?
AFAICS you can keep the original check in place, and then:
replace this
> orig_ip = ip_hdr(skb_in)->saddr;
> - ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
... with ...
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
ip_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.ip;
... at least, thats what I gather from your commit message.
For original direction, there is no change compared to existing code (dir == 0).
For reply direction, saddr is replaced with the source of the reply tuple.
Without dnat, the reply tuple saddr == original tuple daddr.
With dnat, its the dnat targets' address (i.e., the real destination
the client is talking to).
Did I get anything wrong?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-27 9:05 ` Florian Westphal
@ 2025-08-27 17:12 ` Fabian Bläse
2025-08-27 17:25 ` Florian Westphal
0 siblings, 1 reply; 12+ messages in thread
From: Fabian Bläse @ 2025-08-27 17:12 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, netfilter-devel, Jason A. Donenfeld
On 27.08.25 11:05, Florian Westphal wrote:
> If the connection isn't subject to snat, why to we need to mangle the
> source address in the first place?
It is not limited to SNAT/MASQUERADE.
DNAT also affects which source address should be used, depending on the packet
direction.
With DNAT, the *destination* of the original direction is changed.
In the reply direction, this becomes the *source* address.
So reply packets of a DNAT connection are effectively subject to source address
translation. If icmp_ndo_send doesn’t account for this, rate limiting breaks,
which is exactly the problem this function was meant to solve.
> Don't understand this either. Why these checks?
> AFAICS you can keep the original check in place, and then:
>
> replace this
>> orig_ip = ip_hdr(skb_in)->saddr;
>> - ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
>
> ... with ...
You are right: the code can be simplified. I'm not sure show this slipped through.
I will send an updated patch with this change — thanks for the suggestion.
However, the old check (IPS_SRC_NAT only) cannot be kept, because:
- Reply packets of a DNAT connection also need handling.
- Reply packets of a pure SNAT connection don’t need it, but replacing the
address is a no-op in that case (tuple == skb address).
To avoid unnecessary translations, I suggested the direction-specific checks.
Another option is to simplify them to:
if (!(ct->status & IPS_NAT_MASK)) { … }
This ensures we only ever touch connections with NAT, while keeping the code
straightforward.
> Without dnat, the reply tuple saddr == original tuple daddr.
>
> With dnat, its the dnat targets' address (i.e., the real destination
> the client is talking to).
Yes, exactly.
> If you are worried about "dnat to", then please update the commit
> message, which only mentions masquerade/snat.
Correct — the change not only fixes SNAT-in-reply handling, but also adds
proper handling for DNAT in the reply direction, which was missing entirely.
I will update the commit message to reflect this.
Best regards,
Fabian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-27 17:12 ` Fabian Bläse
@ 2025-08-27 17:25 ` Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-08-27 17:25 UTC (permalink / raw)
To: Fabian Bläse; +Cc: netdev, netfilter-devel, Jason A. Donenfeld
Fabian Bläse <fabian@blaese.de> wrote:
> To avoid unnecessary translations, I suggested the direction-specific checks.
> Another option is to simplify them to:
>
> if (!(ct->status & IPS_NAT_MASK)) { … }
Yes, you can update the test from
if (!ct || !(ct->status & IPS_SRC_NAT)) {
to
if (!ct || !(ct->status & IPS_NAT_MASK)) {
Not related to your change:
I suspect there is a very small risk that kcsan could report a data
race here, given ct->status can be modified on other CPU.
But maybe, while at it, replace this with
READ_ONCE(ct->status) & ...
> Correct — the change not only fixes SNAT-in-reply handling, but also adds
> proper handling for DNAT in the reply direction, which was missing entirely.
> I will update the commit message to reflect this.
Thanks!
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-25 20:38 ` [PATCH v2] " Fabian Bläse
2025-08-27 9:05 ` Florian Westphal
@ 2025-08-28 9:14 ` Fabian Bläse
2025-08-28 12:00 ` Pablo Neira Ayuso
` (2 more replies)
1 sibling, 3 replies; 12+ messages in thread
From: Fabian Bläse @ 2025-08-28 9:14 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Fabian Bläse, Jason A. Donenfeld,
Florian Westphal
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by SNAT.
However, the original implementation only considers the
IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
source address with that of the original-direction tuple. This causes
two problems:
1. For SNAT:
Reply-direction packets were incorrectly translated using the source
address of the CT original direction, even though no translation is
required.
2. For DNAT:
Reply-direction packets were not handled at all. In DNAT, the original
direction's destination is translated. Therefore, in the reply
direction the source address must be set to the reply-direction
source, so rate limiting works as intended.
Fix this by using the connection direction to select the correct tuple
for source address translation, and adjust the pre-checks to handle
reply-direction packets in case of DNAT.
Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
possible KCSAN reports about concurrent updates to `ct->status`.
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
---
Changes v1->v2:
- Implement fix for ICMPv6 as well
Changes v2->v3:
- Collapse conditional tuple selection into a single direction lookup [Florian]
- Always apply source address translation if IPS_NAT_MASK is set [Florian]
- Wrap ct->status in READ_ONCE()
- Add a clearer explanation of the behaviour change for DNAT
---
net/ipv4/icmp.c | 6 ++++--
net/ipv6/ip6_icmp.c | 6 ++++--
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 2ffe73ea644f..c48c572f024d 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -799,11 +799,12 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
struct sk_buff *cloned_skb = NULL;
struct ip_options opts = { 0 };
enum ip_conntrack_info ctinfo;
+ enum ip_conntrack_dir dir;
struct nf_conn *ct;
__be32 orig_ip;
ct = nf_ct_get(skb_in, &ctinfo);
- if (!ct || !(ct->status & IPS_SRC_NAT)) {
+ if (!ct || !(READ_ONCE(ct->status) & IPS_NAT_MASK)) {
__icmp_send(skb_in, type, code, info, &opts);
return;
}
@@ -818,7 +819,8 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
goto out;
orig_ip = ip_hdr(skb_in)->saddr;
- ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+ dir = CTINFO2DIR(ctinfo);
+ ip_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.ip;
__icmp_send(skb_in, type, code, info, &opts);
ip_hdr(skb_in)->saddr = orig_ip;
out:
diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
index 9e3574880cb0..233914b63bdb 100644
--- a/net/ipv6/ip6_icmp.c
+++ b/net/ipv6/ip6_icmp.c
@@ -54,11 +54,12 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
struct inet6_skb_parm parm = { 0 };
struct sk_buff *cloned_skb = NULL;
enum ip_conntrack_info ctinfo;
+ enum ip_conntrack_dir dir;
struct in6_addr orig_ip;
struct nf_conn *ct;
ct = nf_ct_get(skb_in, &ctinfo);
- if (!ct || !(ct->status & IPS_SRC_NAT)) {
+ if (!ct || !(READ_ONCE(ct->status) & IPS_NAT_MASK)) {
__icmpv6_send(skb_in, type, code, info, &parm);
return;
}
@@ -73,7 +74,8 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
goto out;
orig_ip = ipv6_hdr(skb_in)->saddr;
- ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+ dir = CTINFO2DIR(ctinfo);
+ ipv6_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.in6;
__icmpv6_send(skb_in, type, code, info, &parm);
ipv6_hdr(skb_in)->saddr = orig_ip;
out:
--
2.51.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
@ 2025-08-28 12:00 ` Pablo Neira Ayuso
2025-08-28 12:15 ` Florian Westphal
2025-08-28 12:48 ` Florian Westphal
2025-09-01 20:20 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2025-08-28 12:00 UTC (permalink / raw)
To: Fabian Bläse
Cc: netdev, netfilter-devel, Jason A. Donenfeld, Florian Westphal
On Thu, Aug 28, 2025 at 11:14:35AM +0200, Fabian Bläse wrote:
> The icmp_ndo_send function was originally introduced to ensure proper
> rate limiting when icmp_send is called by a network device driver,
> where the packet's source address may have already been transformed
> by SNAT.
>
> However, the original implementation only considers the
> IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
> source address with that of the original-direction tuple. This causes
> two problems:
>
> 1. For SNAT:
> Reply-direction packets were incorrectly translated using the source
> address of the CT original direction, even though no translation is
> required.
>
> 2. For DNAT:
> Reply-direction packets were not handled at all. In DNAT, the original
> direction's destination is translated. Therefore, in the reply
> direction the source address must be set to the reply-direction
> source, so rate limiting works as intended.
>
> Fix this by using the connection direction to select the correct tuple
> for source address translation, and adjust the pre-checks to handle
> reply-direction packets in case of DNAT.
>
> Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
> possible KCSAN reports about concurrent updates to `ct->status`.
I think such concurrent update cannot not happen, NAT bits are only
set for the first packet of a connection, which sets up the nat
configuration, so READ_ONCE() can go away.
Florian?
> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
>
> Signed-off-by: Fabian Bläse <fabian@blaese.de>
> Cc: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Florian Westphal <fw@strlen.de>
> ---
> Changes v1->v2:
> - Implement fix for ICMPv6 as well
>
> Changes v2->v3:
> - Collapse conditional tuple selection into a single direction lookup [Florian]
> - Always apply source address translation if IPS_NAT_MASK is set [Florian]
> - Wrap ct->status in READ_ONCE()
> - Add a clearer explanation of the behaviour change for DNAT
> ---
> net/ipv4/icmp.c | 6 ++++--
> net/ipv6/ip6_icmp.c | 6 ++++--
> 2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 2ffe73ea644f..c48c572f024d 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -799,11 +799,12 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> struct sk_buff *cloned_skb = NULL;
> struct ip_options opts = { 0 };
> enum ip_conntrack_info ctinfo;
> + enum ip_conntrack_dir dir;
> struct nf_conn *ct;
> __be32 orig_ip;
>
> ct = nf_ct_get(skb_in, &ctinfo);
> - if (!ct || !(ct->status & IPS_SRC_NAT)) {
> + if (!ct || !(READ_ONCE(ct->status) & IPS_NAT_MASK)) {
> __icmp_send(skb_in, type, code, info, &opts);
> return;
> }
> @@ -818,7 +819,8 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> goto out;
>
> orig_ip = ip_hdr(skb_in)->saddr;
> - ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
> + dir = CTINFO2DIR(ctinfo);
> + ip_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.ip;
> __icmp_send(skb_in, type, code, info, &opts);
> ip_hdr(skb_in)->saddr = orig_ip;
> out:
> diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
> index 9e3574880cb0..233914b63bdb 100644
> --- a/net/ipv6/ip6_icmp.c
> +++ b/net/ipv6/ip6_icmp.c
> @@ -54,11 +54,12 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
> struct inet6_skb_parm parm = { 0 };
> struct sk_buff *cloned_skb = NULL;
> enum ip_conntrack_info ctinfo;
> + enum ip_conntrack_dir dir;
> struct in6_addr orig_ip;
> struct nf_conn *ct;
>
> ct = nf_ct_get(skb_in, &ctinfo);
> - if (!ct || !(ct->status & IPS_SRC_NAT)) {
> + if (!ct || !(READ_ONCE(ct->status) & IPS_NAT_MASK)) {
> __icmpv6_send(skb_in, type, code, info, &parm);
> return;
> }
> @@ -73,7 +74,8 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
> goto out;
>
> orig_ip = ipv6_hdr(skb_in)->saddr;
> - ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
> + dir = CTINFO2DIR(ctinfo);
> + ipv6_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.in6;
> __icmpv6_send(skb_in, type, code, info, &parm);
> ipv6_hdr(skb_in)->saddr = orig_ip;
> out:
> --
> 2.51.0
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 12:00 ` Pablo Neira Ayuso
@ 2025-08-28 12:15 ` Florian Westphal
2025-08-28 12:33 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2025-08-28 12:15 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Fabian Bläse, netdev, netfilter-devel, Jason A. Donenfeld
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Aug 28, 2025 at 11:14:35AM +0200, Fabian Bläse wrote:
> > The icmp_ndo_send function was originally introduced to ensure proper
> > rate limiting when icmp_send is called by a network device driver,
> > where the packet's source address may have already been transformed
> > by SNAT.
> >
> > However, the original implementation only considers the
> > IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
> > source address with that of the original-direction tuple. This causes
> > two problems:
> >
> > 1. For SNAT:
> > Reply-direction packets were incorrectly translated using the source
> > address of the CT original direction, even though no translation is
> > required.
> >
> > 2. For DNAT:
> > Reply-direction packets were not handled at all. In DNAT, the original
> > direction's destination is translated. Therefore, in the reply
> > direction the source address must be set to the reply-direction
> > source, so rate limiting works as intended.
> >
> > Fix this by using the connection direction to select the correct tuple
> > for source address translation, and adjust the pre-checks to handle
> > reply-direction packets in case of DNAT.
> >
> > Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
> > possible KCSAN reports about concurrent updates to `ct->status`.
>
> I think such concurrent update cannot not happen, NAT bits are only
> set for the first packet of a connection, which sets up the nat
> configuration, so READ_ONCE() can go away.
Yes, the NAT bits stay in place but not other flags in ->status, e.g.
DYING, ASSURED, etc.
So I believe its needed, concurrent update of ->status is possible and
KCSAN would warn. Other spots either use READ_ONCE or use test_bit().
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 12:15 ` Florian Westphal
@ 2025-08-28 12:33 ` Pablo Neira Ayuso
2025-08-28 12:48 ` Florian Westphal
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2025-08-28 12:33 UTC (permalink / raw)
To: Florian Westphal
Cc: Fabian Bläse, netdev, netfilter-devel, Jason A. Donenfeld
On Thu, Aug 28, 2025 at 02:15:53PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Thu, Aug 28, 2025 at 11:14:35AM +0200, Fabian Bläse wrote:
> > > The icmp_ndo_send function was originally introduced to ensure proper
> > > rate limiting when icmp_send is called by a network device driver,
> > > where the packet's source address may have already been transformed
> > > by SNAT.
> > >
> > > However, the original implementation only considers the
> > > IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
> > > source address with that of the original-direction tuple. This causes
> > > two problems:
> > >
> > > 1. For SNAT:
> > > Reply-direction packets were incorrectly translated using the source
> > > address of the CT original direction, even though no translation is
> > > required.
> > >
> > > 2. For DNAT:
> > > Reply-direction packets were not handled at all. In DNAT, the original
> > > direction's destination is translated. Therefore, in the reply
> > > direction the source address must be set to the reply-direction
> > > source, so rate limiting works as intended.
> > >
> > > Fix this by using the connection direction to select the correct tuple
> > > for source address translation, and adjust the pre-checks to handle
> > > reply-direction packets in case of DNAT.
> > >
> > > Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
> > > possible KCSAN reports about concurrent updates to `ct->status`.
> >
> > I think such concurrent update cannot not happen, NAT bits are only
> > set for the first packet of a connection, which sets up the nat
> > configuration, so READ_ONCE() can go away.
>
> Yes, the NAT bits stay in place but not other flags in ->status, e.g.
> DYING, ASSURED, etc.
>
> So I believe its needed, concurrent update of ->status is possible and
> KCSAN would warn. Other spots either use READ_ONCE or use test_bit().
There are a more checks for ct->status & NAT_MASK in the tree that I
can see, if you are correct, then maybe a new helper function to check
for NAT_MASK is needed.
Anyway, as for this patch, READ_ONCE should not harm.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 12:33 ` Pablo Neira Ayuso
@ 2025-08-28 12:48 ` Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-08-28 12:48 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Fabian Bläse, netdev, netfilter-devel, Jason A. Donenfeld
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > So I believe its needed, concurrent update of ->status is possible and
> > KCSAN would warn. Other spots either use READ_ONCE or use test_bit().
>
> There are a more checks for ct->status & NAT_MASK in the tree that I
> can see, if you are correct, then maybe a new helper function to check
> for NAT_MASK is needed.
I think it would make sense to add a helper, yes.
Independent of this patch of course.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
2025-08-28 12:00 ` Pablo Neira Ayuso
@ 2025-08-28 12:48 ` Florian Westphal
2025-09-01 20:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-08-28 12:48 UTC (permalink / raw)
To: Fabian Bläse; +Cc: netdev, netfilter-devel, Jason A. Donenfeld
Fabian Bläse <fabian@blaese.de> wrote:
> The icmp_ndo_send function was originally introduced to ensure proper
> rate limiting when icmp_send is called by a network device driver,
> where the packet's source address may have already been transformed
> by SNAT.
>
> However, the original implementation only considers the
> IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
> source address with that of the original-direction tuple. This causes
> two problems:
>
> 1. For SNAT:
> Reply-direction packets were incorrectly translated using the source
> address of the CT original direction, even though no translation is
> required.
>
> 2. For DNAT:
> Reply-direction packets were not handled at all. In DNAT, the original
> direction's destination is translated. Therefore, in the reply
> direction the source address must be set to the reply-direction
> source, so rate limiting works as intended.
>
> Fix this by using the connection direction to select the correct tuple
> for source address translation, and adjust the pre-checks to handle
> reply-direction packets in case of DNAT.
>
> Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
> possible KCSAN reports about concurrent updates to `ct->status`.
Reviewed-by: Florian Westphal <fw@strlen.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] icmp: fix icmp_ndo_send address translation for reply direction
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
2025-08-28 12:00 ` Pablo Neira Ayuso
2025-08-28 12:48 ` Florian Westphal
@ 2025-09-01 20:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-01 20:20 UTC (permalink / raw)
To: =?utf-8?q?Fabian_Bl=C3=A4se_=3Cfabian=40blaese=2Ede=3E?=
Cc: netdev, netfilter-devel, Jason, fw
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 28 Aug 2025 11:14:35 +0200 you wrote:
> The icmp_ndo_send function was originally introduced to ensure proper
> rate limiting when icmp_send is called by a network device driver,
> where the packet's source address may have already been transformed
> by SNAT.
>
> However, the original implementation only considers the
> IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
> source address with that of the original-direction tuple. This causes
> two problems:
>
> [...]
Here is the summary with links:
- [v3] icmp: fix icmp_ndo_send address translation for reply direction
https://git.kernel.org/netdev/net/c/c6dd1aa2cbb7
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-09-01 20:20 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-25 20:17 [PATCH] icmp: fix icmp_ndo_send address translation for reply direction Fabian Bläse
2025-08-25 20:38 ` [PATCH v2] " Fabian Bläse
2025-08-27 9:05 ` Florian Westphal
2025-08-27 17:12 ` Fabian Bläse
2025-08-27 17:25 ` Florian Westphal
2025-08-28 9:14 ` [PATCH v3] " Fabian Bläse
2025-08-28 12:00 ` Pablo Neira Ayuso
2025-08-28 12:15 ` Florian Westphal
2025-08-28 12:33 ` Pablo Neira Ayuso
2025-08-28 12:48 ` Florian Westphal
2025-08-28 12:48 ` Florian Westphal
2025-09-01 20:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).