From: Florian Westphal <fw@strlen.de>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: pablo@netfilter.org, kadlec@netfilter.org,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: Re: [BUG REPORT] netfilter: DNS/SNAT Issue in Kubernetes Environment
Date: Wed, 28 May 2025 13:22:54 +0200
Message-ID: <aDbyDiOBa3_MwsE4@strlen.de>
In-Reply-To: <CALOAHbBj9_TBOQUEX-4CFK_AHp0v6mRETfCw6uWQ0zYB1sBczQ@mail.gmail.com>

Yafang Shao <laoar.shao@gmail.com> wrote:
> Our kernel is 6.1.y (also reproduced on 6.14)
> 
> Host Network Configuration:
> --------------------------------------
> 
> We run a DNS proxy on our Kubernetes servers with the following iptables rules:
> 
> -A PREROUTING -d 169.254.1.2/32 -j DNS-DNAT
> -A DNS-DNAT -d 169.254.1.2/32 -i eth0 -j RETURN
> -A DNS-DNAT -d 169.254.1.2/32 -i eth1 -j RETURN
> -A DNS-DNAT -d 169.254.1.2/32 -i bond0 -j RETURN
> -A DNS-DNAT -j DNAT --to-destination 127.0.0.1
> -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
> -A POSTROUTING -j KUBE-POSTROUTING
> -A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE
> 
> Container Network Configuration:
> --------------------------------------------
> Containers use 169.254.1.2 as their DNS resolver:
> 
> $ cat /etc/resolv.conf
> nameserver 169.254.1.2
> 
> Issue Description
> ------------------------
> 
> When performing DNS lookups from a container, the query fails with an
> unexpected source port:
> 
> $ dig +short @169.254.1.2 A www.google.com
> ;; reply from unexpected source: 169.254.1.2#123, expected 169.254.1.2#53
> 
> The tcpdump is as follows,
> 
> 16:47:23.441705 veth9cffd2a4 P   IP 10.242.249.78.37562 >
> 169.254.1.2.53: 298+ [1au] A? www.google.com. (55)
> 16:47:23.441705 bridge0 In  IP 10.242.249.78.37562 > 127.0.0.1.53:
> 298+ [1au] A? www.google.com. (55)
> 16:47:23.441856 bridge0 Out IP 169.254.1.2.53 > 10.242.249.78.37562:
> 298 1/0/1 A 142.250.71.228 (59)
> 16:47:23.441863 bond0 Out IP 169.254.1.2.53 > 10.242.249.78.37562: 298
> 1/0/1 A 142.250.71.228 (59)
> 16:47:23.441867 eth1  Out IP 169.254.1.2.53 > 10.242.249.78.37562: 298
> 1/0/1 A 142.250.71.228 (59)
> 16:47:23.441885 eth1  P   IP 169.254.1.2.53 > 10.242.249.78.37562: 298
> 1/0/1 A 142.250.71.228 (59)
> 16:47:23.441885 bond0 P   IP 169.254.1.2.53 > 10.242.249.78.37562: 298
> 1/0/1 A 142.250.71.228 (59)
> 16:47:23.441916 veth9cffd2a4 Out IP 169.254.1.2.124 >
> 10.242.249.78.37562: UDP, length 59
> 
> The DNS response source port is unexpectedly changed from 53 to 124, so
> the application never receives the response.
> 
> We suspected the issue might be related to commit d8f84a9bc7c4
> ("netfilter: nf_nat: don't try nat source port reallocation for
> reverse dir clash"). After applying this commit, the port remapping no
> longer occurs, but the DNS response is still dropped.

That's suspicious, I don't see how this is related.  d8f84a9bc7c4
deals with independent action, i.e.
 A sends to B and B sends to A, but *at the same time*.

With a request-response protocol like DNS this should obviously never
happen -- B can't reply before A's request has passed through the stack.
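
To make the "reverse dir clash" case concrete, here is a toy userspace
sketch. It is not kernel code: struct toy_tuple and the toy_* helpers are
made up for illustration, and real conntrack tuples also carry protocol,
zone and more. It only shows that two flows opened at the same time in
opposite directions collide because one flow's original tuple equals the
other's reply tuple, while two requests in the same direction do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy tuple for illustration only; not struct nf_conntrack_tuple. */
struct toy_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Reply-direction tuple of a flow without NAT: swap source and destination. */
struct toy_tuple toy_invert(struct toy_tuple t)
{
	struct toy_tuple r = { t.daddr, t.saddr, t.dport, t.sport };
	return r;
}

bool toy_equal(struct toy_tuple a, struct toy_tuple b)
{
	return a.saddr == b.saddr && a.daddr == b.daddr &&
	       a.sport == b.sport && a.dport == b.dport;
}

/* Two new flows clash in reverse direction when the original tuple of
 * one is the reply tuple of the other. */
bool toy_reverse_clash(struct toy_tuple x, struct toy_tuple y)
{
	return toy_equal(x, toy_invert(y));
}

void toy_clash_demo(void)
{
	/* A (addr 1, port 1000) and B (addr 2, port 2000) send to each
	 * other at the same time. */
	struct toy_tuple a_to_b = { 1, 2, 1000, 2000 };
	struct toy_tuple b_to_a = { 2, 1, 2000, 1000 };
	assert(toy_reverse_clash(a_to_b, b_to_a));

	/* Two DNS requests from the same client to the same server go in
	 * the same direction, so this check does not fire. */
	struct toy_tuple q1 = { 1, 2, 37562, 53 };
	struct toy_tuple q2 = { 1, 2, 37563, 53 };
	assert(!toy_reverse_clash(q1, q2));
}
```

Calling toy_clash_demo() from a main() runs both checks.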

> The response is now correctly sent to port 53, but it is dropped in
> __nf_conntrack_confirm().
> 
> We bypassed the issue by modifying __nf_conntrack_confirm()  to skip
> the conflicting conntrack entry check:
> 
> diff --git a/net/netfilter/nf_conntrack_core.c
> b/net/netfilter/nf_conntrack_core.c
> index 7bee5bd22be2..3481e9d333b0 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1245,9 +1245,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
> 
>         chainlen = 0;
>         hlist_nulls_for_each_entry(h, n,
> &nf_conntrack_hash[reply_hash], hnnode) {
> -               if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> -                                   zone, net))
> -                       goto out;
> +               //if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> +               //                  zone, net))
> +               //      goto out;
>                 if (chainlen++ > max_chainlen) {
>  chaintoolong:
>                         NF_CT_STAT_INC(net, chaintoolong);

I don't understand this bit either.  For A/AAAA requests racing in the
same direction, the nf_ct_resolve_clash() machinery should have handled
this situation.

And I don't see how you can encounter a DNS reply before at least one
request has been committed to the table -- i.e., the conntrack being
confirmed here should not exist -- the packet should have been picked up
as a reply packet.
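
The expectation can be sketched in userspace as follows. This is a toy
model, not the conntrack implementation: struct toy_conn, toy_classify()
and the TOY_* names are invented here, and the addresses and ports are
taken from the tcpdump in the report. It assumes the entry created by the
request stores the DNAT-ed address in its reply tuple, so that the answer
from 127.0.0.1:53 matches the existing entry instead of creating a new one.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy types for illustration only; not the kernel's conntrack structs. */
struct toy_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct toy_conn {
	struct toy_tuple orig;   /* tuple of the request as first seen */
	struct toy_tuple reply;  /* expected tuple of the answer, after NAT */
};

enum toy_dir { TOY_NEW = -1, TOY_ORIGINAL = 0, TOY_REPLY = 1 };

int toy_tuple_equal(struct toy_tuple a, struct toy_tuple b)
{
	return a.saddr == b.saddr && a.daddr == b.daddr &&
	       a.sport == b.sport && a.dport == b.dport;
}

/* Classify a packet against one known connection. */
enum toy_dir toy_classify(const struct toy_conn *ct, struct toy_tuple pkt)
{
	if (toy_tuple_equal(pkt, ct->orig))
		return TOY_ORIGINAL;
	if (toy_tuple_equal(pkt, ct->reply))
		return TOY_REPLY;
	return TOY_NEW;  /* no match: would be set up as a new connection */
}

void toy_dnat_demo(void)
{
	uint32_t pod = TOY_IP(10, 242, 249, 78);
	uint32_t vip = TOY_IP(169, 254, 1, 2);
	uint32_t lo  = TOY_IP(127, 0, 0, 1);

	/* Request pod:37562 -> vip:53, DNAT-ed to 127.0.0.1:53. */
	struct toy_conn ct = {
		.orig  = { pod, vip, 37562, 53 },
		.reply = { lo, pod, 53, 37562 },
	};

	/* The proxy's answer matches the reply tuple of the existing entry. */
	struct toy_tuple answer = { lo, pod, 53, 37562 };
	assert(toy_classify(&ct, answer) == TOY_REPLY);

	/* A packet on an unrelated port pair matches neither tuple. */
	struct toy_tuple other = { lo, pod, 53, 40000 };
	assert(toy_classify(&ct, other) == TOY_NEW);
}
```

In this model a reply only ends up as TOY_NEW if it matches neither stored
tuple. The sketch does not say whether that is what happens in the
reported setup.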
