From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: kadlec@netfilter.org, fw@strlen.de,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
jiejiang@chromium.org, jasongustaman@chromium.org,
garrick@chromium.org
Subject: Re: [PATCH] netfilter: conntrack: fix wrong ct->timeout value
Date: Fri, 14 Apr 2023 10:12:14 +0200 [thread overview]
Message-ID: <ZDkK3ktVcBaykTVT@calendula> (raw)
In-Reply-To: <ZDjN4gyv0x1aewgm@google.com>
On Fri, Apr 14, 2023 at 11:52:02AM +0800, Tzung-Bi Shih wrote:
> On Thu, Apr 13, 2023 at 11:04:24AM +0200, Pablo Neira Ayuso wrote:
> > On Thu, Apr 13, 2023 at 11:23:11AM +0800, Tzung-Bi Shih wrote:
> > > nf_ct_expires() got called by userspace program. And the return
> > > value (which means the remaining timeout) will be the parameter for
> > > the next ctnetlink_change_timeout().
> >
> > Unconfirmed conntrack is owned by the packet that refers to it, it is
> > not yet in the hashes. I don't see how concurrent access to the
> > timeout might occur.
> >
> > Or are you referring to a different scenario that triggers the partial
> > state?
>
> I think it isn't a concurrent access. We observed that userspace program
> reads the remaining timeout and sets it back for unconfirmed conntrack.
>
> Conceptually, it looks like:
> timeout = nf_ct_expires(...); (1)
> ctnetlink_change_timeout(...timeout);
How could this possibly happen?
nf_ct_expires() is called from:
- ctnetlink_dump_timeout(), this is netlink dump path, since:
commit 1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912
Author: Florian Westphal <fw@strlen.de>
Date: Mon Apr 11 13:01:18 2022 +0200
netfilter: conntrack: remove the percpu dying list
it is not possible to do any listing of unconfirmed, and that is fine.
As I said, nfnetlink_queue operates with an unconfirmed conntrack with
owns exclusively, which is not in the hashes.
- nf_ct_expires() calls from xt and nft matches are irrelevant:
net/netfilter/nft_ct.c: *dest = jiffies_to_msecs(nf_ct_expires(ct));
net/netfilter/xt_conntrack.c: unsigned long expires = nf_ct_expires(ct) / HZ;
- there is the garbage collector, but that only works with conntrack
entries in the hashes:
net/netfilter/nf_conntrack_core.c: expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
- there is also /proc interface, but that only works with conntrack
entries in the hashes:
net/netfilter/nf_conntrack_standalone.c: seq_printf(s, "%ld ", nf_ct_expires(ct) / HZ);
> At (1), `nfct_time_stamp` is wrongly involved in the calculation[4] because
> the conntrack is unconfirmed. The `ct->timeout` is an internal but not a
> timestamp.
I can see there are two possible states for ct->timeout, thanks for
explaining this again.
> As a result, ctnetlink_change_timeout() sets the wrong value to `ct->timeout`.
>
> [4]: https://elixir.bootlin.com/linux/v6.3-rc6/source/include/net/netfilter/nf_conntrack.h#L296
>
> > > As you can see in [4], if this happens on an unconfirmed conntrack, the
> > > `nfct_time_stamp` would be wrongly invoved in the calculation again.
> > > That's why we take care of all `ct->timeout` accesses in v1.
>
> Again, that's why v1 separates all `ct->timeout` accesses.
>
> If the conntrack is confirmed:
> - `ct->timeout` is a timestamp.
> - `nfct_time_stamp` should be involved when getting/setting `ct->timeout`.
>
> If the conntrack is unconfirmed:
> - `ct->timeout` is an interval.
> - `nfct_time_stamp` shouldn't be involved.
>
> Only separate `ct->timeout` access in __nf_ct_set_timeout() is obviously
> insufficient. I would suggest either take care of all `ct->timeout`
> accesses as v1 or at least change both __nf_ct_set_timeout() and
> nf_ct_expires().
it does not even make sense to use WRITE_ONCE from
__nf_conntrack_confirm(), see:
https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_core.c#L1260
because the unconfirmed conntrack object is owned exclusively by the packet.
> In the latter case, it looks fine in our environment. However, I'm not
> sure if any hidden cases we haven't seen.
Maybe you have an old kernel?
next prev parent reply other threads:[~2023-04-14 8:12 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-10 6:09 [PATCH] netfilter: conntrack: fix wrong ct->timeout value Tzung-Bi Shih
2023-04-10 8:33 ` Pablo Neira Ayuso
2023-04-10 9:31 ` Pablo Neira Ayuso
2023-04-10 9:59 ` Tzung-Bi Shih
2023-04-12 22:56 ` Pablo Neira Ayuso
2023-04-13 3:23 ` Tzung-Bi Shih
2023-04-13 9:04 ` Pablo Neira Ayuso
2023-04-14 3:52 ` Tzung-Bi Shih
2023-04-14 8:12 ` Pablo Neira Ayuso [this message]
2023-04-17 3:41 ` Tzung-Bi Shih
2023-04-18 8:17 ` Pablo Neira Ayuso
2023-04-19 5:20 ` Tzung-Bi Shih
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZDkK3ktVcBaykTVT@calendula \
--to=pablo@netfilter.org \
--cc=coreteam@netfilter.org \
--cc=fw@strlen.de \
--cc=garrick@chromium.org \
--cc=jasongustaman@chromium.org \
--cc=jiejiang@chromium.org \
--cc=kadlec@netfilter.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=tzungbi@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.