From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
Florian Westphal <fw@strlen.de>,
"David S. Miller" <davem@davemloft.net>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] netfilter: conntrack: fix clash resolution in nat
Date: Thu, 29 Jun 2017 18:45:34 +0200 [thread overview]
Message-ID: <20170629164534.GA7271@salvia> (raw)
In-Reply-To: <1497427883-3771-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
Hi,
On Wed, Jun 14, 2017 at 04:11:23PM +0800, Haishuang Yan wrote:
> In our openstack environment, slow dns lookup for hostname when
> parallel dns requests for IPv4 and IPv6 addresses from VM, the
> second IPv6 request(AAAA record) is dropped on its way in compute
> node.
>
> We found many similar related links:
> https://bbs.archlinux.org/viewtopic.php?id=75770
> http://www.spinics.net/lists/netfilter-devel/msg15860.html
> https://www.spinics.net/lists/netfilter-devel/msg37565.html
>
> After the commit 71d8c47fc653 ("netfilter: conntrack: introduce clash
> resolution on insertion race") can fix packet drop, the second IPv6
> request packet would be forwarded by compute node, but the udp source
> port is bogus, start from 1024:
> 14:52:10.868485 IP 10.136.5.132.55785 > 10.136.5.1.domain: 3957+ A?
> www.baidu.com. (31)
> 14:52:10.868544 IP 10.136.5.132.1024 > 10.136.5.1.domain: 13826+ AAAA?
> www.baidu.com. (31)
>
> And after the commit 590b52e10d41 ("netfilter: conntrack: skip clash
> resolution if nat is in place") , it exclude nat situation, but the
> packet drops issue come back again.
>
> This patch revert the last commit. If l4proto allow clash but the tuple is
> used by the conntrack that wins race, the original tuple can be reused.
>
> Fixes: 590b52e10d41 ("netfilter: conntrack: skip clash resolution if 25
> nat is in place")
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> ---
> net/netfilter/nf_conntrack_core.c | 1 -
> net/netfilter/nf_nat_core.c | 4 +++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index e847dba..7e16518 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -699,7 +699,6 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
>
> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
> if (l4proto->allow_clash &&
> - ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
> !nf_ct_is_dying(ct) &&
> atomic_inc_not_zero(&ct->ct_general.use)) {
> enum ip_conntrack_info oldinfo;
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index 6c72922..9edfca2 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -328,6 +328,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> const struct nf_conntrack_zone *zone;
> const struct nf_nat_l3proto *l3proto;
> const struct nf_nat_l4proto *l4proto;
> + const struct nf_conntrack_l4proto *ct_l4proto;
> struct net *net = nf_ct_net(ct);
>
> zone = nf_ct_zone(ct);
> @@ -336,6 +337,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> l3proto = __nf_nat_l3proto_find(orig_tuple->src.l3num);
> l4proto = __nf_nat_l4proto_find(orig_tuple->src.l3num,
> orig_tuple->dst.protonum);
> + ct_l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
>
> /* 1) If this srcip/proto/src-proto-part is currently mapped,
> * and that same mapping gives a unique tuple within the given
> @@ -349,7 +351,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
> /* try the original tuple first */
> if (in_range(l3proto, l4proto, orig_tuple, range)) {
> - if (!nf_nat_used_tuple(orig_tuple, ct)) {
> + if (!nf_nat_used_tuple(orig_tuple, ct) || ct_l4proto->allow_clash) {
Hm, this is defeating clash resolution logic entirely. I mean, with
this this would not call nf_nat_l4proto_unique_tuple() so we get a
unique tuple.
My concern with this is that, although this is fixing the issue for
you, I suspect this is breaking SNAT/masquerade in case we get two
clients in the LAN if both flow have a tuple that matches this:
*, srcport, dst-IP, dst-port
The problem that we have with this is a race condition, the problem is
that the second packet doesn't find a confirmed conntrack in the
hashes, so it gets its own one and NAT makes sure such tuple is
unique.
So far, the only way I see to fix this is to postpone packet mangling
that NAT performs after the clash resolution code in
nf_conntrack_confirm(). But that changes the behaviour in a way that
would break matching rules that assume the existing behaviour, that
is, nf_nat_setup() mangles the packet.
> *tuple = *orig_tuple;
> goto out;
> }
> --
> 1.8.3.1
>
>
>
prev parent reply other threads:[~2017-06-29 16:45 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-14 8:11 [PATCH] netfilter: conntrack: fix clash resolution in nat Haishuang Yan
2017-06-29 16:45 ` Pablo Neira Ayuso [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170629164534.GA7271@salvia \
--to=pablo@netfilter.org \
--cc=coreteam@netfilter.org \
--cc=davem@davemloft.net \
--cc=fw@strlen.de \
--cc=kadlec@blackhole.kfki.hu \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=yanhaishuang@cmss.chinamobile.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.