From: "Mika Penttilä" <mika.penttila@nextfour.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>, netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org
Subject: Re: [PATCH net 3/9] netfilter: nf_nat: undo erroneous tcp edemux lookup
Date: Sat, 6 Mar 2021 16:49:19 +0200
Message-ID: <b0d7a77b-a33d-150c-65e2-6caebcec772f@nextfour.com>
In-Reply-To: <20210306121223.28711-4-pablo@netfilter.org>
On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@strlen.de>
>
> Under extremely rare conditions TCP early demux will retrieve the wrong
> socket.
>
> 1. local machine establishes a connection to a remote server, S, on port
> p.
>
> This gives:
> laddr:lport -> S:p
> ... both in tcp and conntrack.
>
> 2. local machine establishes a connection to host H, on port p2.
> 2a. TCP stack chooses the same laddr:lport, so we have
> laddr:lport -> H:p2 from TCP point of view.
> 2b. There is a destination NAT rewrite in place, translating
> H:p2 to S:p. This results in the following conntrack entries:
>
> I) laddr:lport -> S:p (origin) S:p -> laddr:lport (reply)
> II) laddr:lport -> H:p2 (origin) S:p -> laddr:lport2 (reply)
>
> NAT engine has rewritten laddr:lport to laddr:lport2 to map
> the reply packet to the correct origin.
Could you elaborate on where and how the Linux NAT engine does the
laddr:lport to laddr:lport2 rewrite? There's only DST NAT, and there
should be a conflict (for the reply) in tuple establishment afaik...
>
> When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
> will undo the SNAT transformation, rewriting the IP header to
> S:p -> laddr:lport
>
> This causes TCP early demux to associate the skb with the TCP socket
> of the first connection.
>
> The INPUT hook will then reverse the DNAT transformation, rewriting
> the IP header to H:p2 -> laddr:lport.
>
> Because packet ends up with the wrong socket, the new connection
> never completes: originator stays in SYN_SENT and conntrack entry
> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
> until it gives up.
>
> To resolve this, orphan the skb after the input rewrite:
> Because the source IP address changed, the socket must be incorrect.
> We can't move the DNAT undo to prerouting due to backwards
> compatibility; doing so would make iptables/nftables rules no longer
> match the way they did.
>
> After the orphan, the packet will be handed to the next protocol layer
> (tcp, udp, ...), which will repeat the socket lookup just as if
> early demux were disabled.
>
> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
> index e87b6bd6b3cd..4731d21fc3ad 100644
> --- a/net/netfilter/nf_nat_proto.c
> +++ b/net/netfilter/nf_nat_proto.c
> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>  }
>
>  static unsigned int
> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
> -	       const struct nf_hook_state *state)
> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
> +			const struct nf_hook_state *state)
>  {
>  	unsigned int ret;
>  	__be32 daddr = ip_hdr(skb)->daddr;
> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>  	return ret;
>  }
>
> +static unsigned int
> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
> +		     const struct nf_hook_state *state)
> +{
> +	__be32 saddr = ip_hdr(skb)->saddr;
> +	struct sock *sk = skb->sk;
> +	unsigned int ret;
> +
> +	ret = nf_nat_ipv4_fn(priv, skb, state);
> +
> +	if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
> +	    !inet_sk_transparent(sk))
> +		skb_orphan(skb); /* TCP edemux obtained wrong socket */
> +
> +	return ret;
> +}
> +
>  static unsigned int
>  nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
>  		const struct nf_hook_state *state)
> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
>  static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>  	/* Before packet filtering, change destination */
>  	{
> -		.hook		= nf_nat_ipv4_in,
> +		.hook		= nf_nat_ipv4_pre_routing,
>  		.pf		= NFPROTO_IPV4,
>  		.hooknum	= NF_INET_PRE_ROUTING,
>  		.priority	= NF_IP_PRI_NAT_DST,
> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>  	},
>  	/* After packet filtering, change source */
>  	{
> -		.hook		= nf_nat_ipv4_fn,
> +		.hook		= nf_nat_ipv4_local_in,
>  		.pf		= NFPROTO_IPV4,
>  		.hooknum	= NF_INET_LOCAL_IN,
>  		.priority	= NF_IP_PRI_NAT_SRC,
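
For readers following the quoted hunk, the check in nf_nat_ipv4_local_in()
can be modelled in plain userspace C. This is only a rough sketch under
stated assumptions, not kernel code: every model_* name, the two addresses,
and the idea of representing skb_orphan() by clearing a pointer are made up
for illustration. It shows just the decision itself: drop the early-demux
socket when the NAT step accepted the packet, a socket was attached, the
source address changed, and the socket is not transparent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_sock {
	bool transparent;          /* models inet_sk_transparent(sk) */
};

struct model_skb {
	uint32_t saddr;            /* source address in the IP header */
	struct model_sock *sk;     /* socket attached by early demux, or NULL */
};

enum model_verdict { MODEL_DROP, MODEL_ACCEPT };

/* Models the NAT step at LOCAL_IN: the DNAT undo rewrites saddr. */
static enum model_verdict model_nat_fn(struct model_skb *skb, uint32_t new_saddr)
{
	skb->saddr = new_saddr;
	return MODEL_ACCEPT;
}

/* Same decision as the quoted nf_nat_ipv4_local_in(), on the model types. */
static enum model_verdict model_local_in(struct model_skb *skb, uint32_t new_saddr)
{
	uint32_t saddr = skb->saddr;
	struct model_sock *sk = skb->sk;
	enum model_verdict ret = model_nat_fn(skb, new_saddr);

	if (ret == MODEL_ACCEPT && sk && saddr != skb->saddr && !sk->transparent)
		skb->sk = NULL;    /* models skb_orphan(): forget the early-demux socket */

	return ret;
}

/* Runs the SYN/ACK case from the commit message; true if skb keeps its socket. */
static bool model_run_scenario(bool transparent)
{
	const uint32_t S = 0xC0000201u;  /* 192.0.2.1, made-up server address */
	const uint32_t H = 0xC6336401u;  /* 198.51.100.1, made-up DNAT'ed address */
	struct model_sock first_conn = { .transparent = transparent };
	/* After PREROUTING the header reads S:p -> laddr:lport, so early
	 * demux has attached the socket of the first connection. */
	struct model_skb skb = { .saddr = S, .sk = &first_conn };

	enum model_verdict ret = model_local_in(&skb, H);
	assert(ret == MODEL_ACCEPT);
	return skb.sk != NULL;
}
```

In this model, model_run_scenario(false) returns false: the source address
changed from S to H, so the socket is dropped and the protocol layer would
have to repeat the lookup. model_run_scenario(true) returns true, since
transparent sockets are left alone.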