All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Sven Auhagen <sven.auhagen@voleatech.de>
Cc: Oz Shlomo <ozsh@nvidia.com>, Felix Fietkau <nbd@nbd.name>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	Florian Westphal <fw@strlen.de>, Paul Blakey <paulb@nvidia.com>
Subject: Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
Date: Mon, 16 May 2022 14:43:06 +0200	[thread overview]
Message-ID: <YoJG2j0w551KM17k@salvia> (raw)
In-Reply-To: <20220516122300.6gwrlmun4w3ynz7s@SvensMacbookPro.hq.voleatech.com>

On Mon, May 16, 2022 at 02:23:00PM +0200, Sven Auhagen wrote:
> On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> > On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > set the flow table teardown flag. The packet path continues to set lower
> > > > timeout value as per the new TCP state but the offload flag remains set.
> > > >
> > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > the internal offload timeout (one day).
> > > >
> > > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > > connections.
> > > >
> > > > On the nftables side we only need to allow established TCP connections to
> > > > create a flow offload entry. Since we can not guaruantee that
> > > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > > and only fixes up established TCP connections.
> > > [...]
> > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > >  			tmp = nf_ct_tuplehash_to_ctrack(h);
> > > >  
> > > >  			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > -				nf_ct_offload_timeout(tmp);
> > > 
> > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > path that triggers the race, ie. nf_ct_is_expired()
> > > 
> > > The flowtable ct fixup races with conntrack gc collector.
> > > 
> > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > the closing packets.
> > > 
> > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > in a TCP state that represent closure?
> > > 
> > >   		if (unlikely(!tcph || tcph->fin || tcph->rst))
> > >   			goto out;
> > > 
> > > this is already the intention in the existing code.
> > 
> > I'm attaching an incomplete sketch patch. My goal is to avoid the
> > extra IPS_ bit.
> 
> You might create a race with ct gc that will remove the ct
> if it is in close or end of close and before flow offload teardown is running
> so flow offload teardown might access memory that was freed.

flow object holds a reference to the ct object until it is released,
no use-after-free can happen.

> It is not a very likely scenario but never the less it might happen now
> since the IPS_OFFLOAD_BIT is not set and the state might just time out.
> 
> If someone sets a very small TCP CLOSE timeout it gets more likely.
> 
> So Oz and myself were debatting about three possible cases/problems:
> 
> 1. ct gc sets timeout even though the state is in CLOSE/FIN because the
> IPS_OFFLOAD is still set but the flow is in teardown
> 2. ct gc removes the ct because the IPS_OFFLOAD is not set and
> the CLOSE timeout is reached before the flow offload del

OK.

> 3. tcp ct is always set to ESTABLISHED with a very long timeout
> in flow offload teardown/delete even though the state is already
> CLOSED.
>
> Also as a remark we can not assume that the FIN or RST packet is hitting
> flow table teardown as the packet might get bumped to the slow path in
> nftables.

I assume this remark is related to 3.?

if IPS_OFFLOAD is unset, then conntrack would update the state
according to this FIN or RST.

Thanks for the summary.

  reply	other threads:[~2022-05-16 12:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 18:28 [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout Oz Shlomo
2022-05-16 10:56 ` Pablo Neira Ayuso
2022-05-16 11:18   ` Sven Auhagen
2022-05-16 11:37     ` Pablo Neira Ayuso
2022-05-16 12:06       ` Pablo Neira Ayuso
2022-05-16 12:17         ` Sven Auhagen
2022-05-16 17:54           ` Pablo Neira Ayuso
2022-05-16 12:13   ` Pablo Neira Ayuso
2022-05-16 12:23     ` Sven Auhagen
2022-05-16 12:43       ` Pablo Neira Ayuso [this message]
2022-05-16 13:02         ` Sven Auhagen
2022-05-16 17:50           ` Pablo Neira Ayuso
2022-05-16 18:23             ` Sven Auhagen
2022-05-17  8:32               ` Pablo Neira Ayuso
2022-05-17  8:36                 ` Sven Auhagen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YoJG2j0w551KM17k@salvia \
    --to=pablo@netfilter.org \
    --cc=fw@strlen.de \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=ozsh@nvidia.com \
    --cc=paulb@nvidia.com \
    --cc=sven.auhagen@voleatech.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.