All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: kadlec@netfilter.org, fw@strlen.de,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	jiejiang@chromium.org, jasongustaman@chromium.org,
	garrick@chromium.org
Subject: Re: [PATCH] netfilter: conntrack: fix wrong ct->timeout value
Date: Fri, 14 Apr 2023 10:12:14 +0200	[thread overview]
Message-ID: <ZDkK3ktVcBaykTVT@calendula> (raw)
In-Reply-To: <ZDjN4gyv0x1aewgm@google.com>

On Fri, Apr 14, 2023 at 11:52:02AM +0800, Tzung-Bi Shih wrote:
> On Thu, Apr 13, 2023 at 11:04:24AM +0200, Pablo Neira Ayuso wrote:
> > On Thu, Apr 13, 2023 at 11:23:11AM +0800, Tzung-Bi Shih wrote:
> > > nf_ct_expires() got called by userspace program.  And the return
> > > value (which means the remaining timeout) will be the parameter for
> > > the next ctnetlink_change_timeout().
> > 
> > Unconfirmed conntrack is owned by the packet that refers to it, it is
> > not yet in the hashes. I don't see how concurrent access to the
> > timeout might occur.
> > 
> > Or are you referring to a different scenario that triggers the partial
> > state?
> 
> I think it isn't a concurrent access.  We observed that userspace program
> reads the remaining timeout and sets it back for unconfirmed conntrack.
> 
> Conceptually, it looks like:
> timeout = nf_ct_expires(...);  (1)
> ctnetlink_change_timeout(...timeout);

How could this possibly happen?

nf_ct_expires() is called from:

- ctnetlink_dump_timeout(), this is netlink dump path, since:

commit 1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912
Author: Florian Westphal <fw@strlen.de>
Date:   Mon Apr 11 13:01:18 2022 +0200

    netfilter: conntrack: remove the percpu dying list

it is not possible to do any listing of unconfirmed, and that is fine.

As I said, nfnetlink_queue operates with an unconfirmed conntrack with
owns exclusively, which is not in the hashes.

- nf_ct_expires() calls from xt and nft matches are irrelevant:

net/netfilter/nft_ct.c:         *dest = jiffies_to_msecs(nf_ct_expires(ct));
net/netfilter/xt_conntrack.c:           unsigned long expires = nf_ct_expires(ct) / HZ;

- there is the garbage collector, but that only works with conntrack
  entries in the hashes:

net/netfilter/nf_conntrack_core.c:                      expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);

- there is also /proc interface, but that only works with conntrack
  entries in the hashes:

net/netfilter/nf_conntrack_standalone.c:                seq_printf(s, "%ld ", nf_ct_expires(ct)  / HZ);

> At (1), `nfct_time_stamp` is wrongly involved in the calculation[4] because
> the conntrack is unconfirmed.  The `ct->timeout` is an internal but not a
> timestamp.

I can see there are two possible states for ct->timeout, thanks for
explaining this again.

> As a result, ctnetlink_change_timeout() sets the wrong value to `ct->timeout`.
> 
> [4]: https://elixir.bootlin.com/linux/v6.3-rc6/source/include/net/netfilter/nf_conntrack.h#L296
> 
> > > As you can see in [4], if this happens on an unconfirmed conntrack, the
> > > `nfct_time_stamp` would be wrongly invoved in the calculation again.
> > > That's why we take care of all `ct->timeout` accesses in v1.
> 
> Again, that's why v1 separates all `ct->timeout` accesses.
> 
> If the conntrack is confirmed:
> - `ct->timeout` is a timestamp.
> - `nfct_time_stamp` should be involved when getting/setting `ct->timeout`.
> 
> If the conntrack is unconfirmed:
> - `ct->timeout` is an interval.
> - `nfct_time_stamp` shouldn't be involved.
> 
> Only separate `ct->timeout` access in __nf_ct_set_timeout() is obviously
> insufficient.  I would suggest either take care of all `ct->timeout`
> accesses as v1 or at least change both __nf_ct_set_timeout() and
> nf_ct_expires().

it does not even make sense to use WRITE_ONCE from
__nf_conntrack_confirm(), see:

https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_core.c#L1260

because the unconfirmed conntrack object is owned exclusively by the packet.

> In the latter case, it looks fine in our environment.  However, I'm not
> sure if any hidden cases we haven't seen.

Maybe you have an old kernel?

  reply	other threads:[~2023-04-14  8:12 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-10  6:09 [PATCH] netfilter: conntrack: fix wrong ct->timeout value Tzung-Bi Shih
2023-04-10  8:33 ` Pablo Neira Ayuso
2023-04-10  9:31   ` Pablo Neira Ayuso
2023-04-10  9:59     ` Tzung-Bi Shih
2023-04-12 22:56       ` Pablo Neira Ayuso
2023-04-13  3:23         ` Tzung-Bi Shih
2023-04-13  9:04           ` Pablo Neira Ayuso
2023-04-14  3:52             ` Tzung-Bi Shih
2023-04-14  8:12               ` Pablo Neira Ayuso [this message]
2023-04-17  3:41                 ` Tzung-Bi Shih
2023-04-18  8:17                   ` Pablo Neira Ayuso
2023-04-19  5:20                     ` Tzung-Bi Shih

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDkK3ktVcBaykTVT@calendula \
    --to=pablo@netfilter.org \
    --cc=coreteam@netfilter.org \
    --cc=fw@strlen.de \
    --cc=garrick@chromium.org \
    --cc=jasongustaman@chromium.org \
    --cc=jiejiang@chromium.org \
    --cc=kadlec@netfilter.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=tzungbi@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.