netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Oz Shlomo <ozsh@nvidia.com>
Cc: Sven Auhagen <sven.auhagen@voleatech.de>,
	Felix Fietkau <nbd@nbd.name>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	Florian Westphal <fw@strlen.de>, Paul Blakey <paulb@nvidia.com>
Subject: Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
Date: Mon, 16 May 2022 12:56:38 +0200	[thread overview]
Message-ID: <YoIt5rHw4Xwl1zgY@salvia> (raw)
In-Reply-To: <20220512182803.6353-1-ozsh@nvidia.com>

On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> Connections leaving the established state (due to RST / FIN TCP packets)
> set the flow table teardown flag. The packet path continues to set lower
> timeout value as per the new TCP state but the offload flag remains set.
>
> Hence, the conntrack garbage collector may race to undo the timeout
> adjustment of the packet path, leaving the conntrack entry in place with
> the internal offload timeout (one day).
>
> Avoid ct gc timeout overwrite by flagging teared down flowtable
> connections.
>
> On the nftables side we only need to allow established TCP connections to
> create a flow offload entry. Since we can not guaruantee that
> flow_offload_teardown is called by a TCP FIN packet we also need to make
> sure that flow_offload_fixup_ct is also called in flow_offload_del
> and only fixes up established TCP connections.
[...]
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 0164e5f522e8..324fdb62c08b 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
>  			tmp = nf_ct_tuplehash_to_ctrack(h);
>  
>  			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> -				nf_ct_offload_timeout(tmp);

Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
path that triggers the race, ie. nf_ct_is_expired()

The flowtable ct fixup races with conntrack gc collector.

Clearing IPS_OFFLOAD might result in offloading the entry again for
the closing packets.

Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
in a TCP state that represent closure?

  		if (unlikely(!tcph || tcph->fin || tcph->rst))
  			goto out;

this is already the intention in the existing code.

If this does work, could you keep IPS_OFFLOAD_TEARDOWN_BIT internal,
ie. no in uapi? Define it at include/net/netfilter/nf_conntrack.h and
add a comment regarding this to avoid an overlap in the future.

> +				if (!test_bit(IPS_OFFLOAD_TEARDOWN_BIT, &tmp->status))
> +					nf_ct_offload_timeout(tmp);
>  				continue;
>  			}
>  
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 3db256da919b..aaed1a244013 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -177,14 +177,8 @@ int flow_offload_route_init(struct flow_offload *flow,
>  }
>  EXPORT_SYMBOL_GPL(flow_offload_route_init);
>  
> -static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> -{
> -	tcp->state = TCP_CONNTRACK_ESTABLISHED;
> -	tcp->seen[0].td_maxwin = 0;
> -	tcp->seen[1].td_maxwin = 0;
> -}
>  
> -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> +static void flow_offload_fixup_ct(struct nf_conn *ct)
>  {
>  	struct net *net = nf_ct_net(ct);
>  	int l4num = nf_ct_protonum(ct);
> @@ -192,8 +186,12 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>  
>  	if (l4num == IPPROTO_TCP) {
>  		struct nf_tcp_net *tn = nf_tcp_pernet(net);
> +		struct ip_ct_tcp *tcp = &ct->proto.tcp;
> +
> +		tcp->seen[0].td_maxwin = 0;
> +		tcp->seen[1].td_maxwin = 0;
>  
> -		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		timeout = tn->timeouts[ct->proto.tcp.state];
>  		timeout -= tn->offload_timeout;
>  	} else if (l4num == IPPROTO_UDP) {
>  		struct nf_udp_net *tn = nf_udp_pernet(net);
> @@ -211,18 +209,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>  		WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
>  }
>  
> -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> -{
> -	if (nf_ct_protonum(ct) == IPPROTO_TCP)
> -		flow_offload_fixup_tcp(&ct->proto.tcp);
> -}
> -
> -static void flow_offload_fixup_ct(struct nf_conn *ct)
> -{
> -	flow_offload_fixup_ct_state(ct);
> -	flow_offload_fixup_ct_timeout(ct);
> -}
> -
>  static void flow_offload_route_release(struct flow_offload *flow)
>  {
>  	nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> @@ -353,6 +339,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
>  static void flow_offload_del(struct nf_flowtable *flow_table,
>  			     struct flow_offload *flow)
>  {
> +	struct nf_conn *ct = flow->ct;
> +
> +	set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> +
>  	rhashtable_remove_fast(&flow_table->rhashtable,
>  			       &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
>  			       nf_flow_offload_rhash_params);
> @@ -360,12 +350,11 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
>  			       &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
>  			       nf_flow_offload_rhash_params);
>  
> -	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> -
>  	if (nf_flow_has_expired(flow))
> -		flow_offload_fixup_ct(flow->ct);
> -	else
> -		flow_offload_fixup_ct_timeout(flow->ct);
> +		flow_offload_fixup_ct(ct);

Very unlikely, but race might still happen between fixup and
clear IPS_OFFLOAD_BIT with gc below?

Without checking from the packet path, the conntrack gc might race to
refresh the timeout, I don't see a 100% race free solution.

Probably update the nf_ct_offload_timeout to a shorter value than a
day would mitigate this issue too.

> +	clear_bit(IPS_OFFLOAD_BIT, &ct->status);
> +	clear_bit(IPS_OFFLOAD_TEARDOWN_BIT, &ct->status);
>  
>  	flow_offload_free(flow);
>  }
> @@ -373,8 +362,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
>  void flow_offload_teardown(struct flow_offload *flow)
>  {
>  	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> +	set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
>  
> -	flow_offload_fixup_ct_state(flow->ct);
> +	flow_offload_fixup_ct(flow->ct);
>  }
>  EXPORT_SYMBOL_GPL(flow_offload_teardown);
>  
> diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> index 900d48c810a1..9cc3ea08eb3a 100644
> --- a/net/netfilter/nft_flow_offload.c
> +++ b/net/netfilter/nft_flow_offload.c
> @@ -295,6 +295,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
>  					  sizeof(_tcph), &_tcph);
>  		if (unlikely(!tcph || tcph->fin || tcph->rst))
>  			goto out;
> +		if (unlikely(!nf_conntrack_tcp_established(ct)))
> +			goto out;

This chunk is not required, from ruleset users can do

        ... ct status assured ...

instead.

>  		break;
>  	case IPPROTO_UDP:
>  		break;
> -- 
> 1.8.3.1
> 

  reply	other threads:[~2022-05-16 10:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 18:28 [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout Oz Shlomo
2022-05-16 10:56 ` Pablo Neira Ayuso [this message]
2022-05-16 11:18   ` Sven Auhagen
2022-05-16 11:37     ` Pablo Neira Ayuso
2022-05-16 12:06       ` Pablo Neira Ayuso
2022-05-16 12:17         ` Sven Auhagen
2022-05-16 17:54           ` Pablo Neira Ayuso
2022-05-16 12:13   ` Pablo Neira Ayuso
2022-05-16 12:23     ` Sven Auhagen
2022-05-16 12:43       ` Pablo Neira Ayuso
2022-05-16 13:02         ` Sven Auhagen
2022-05-16 17:50           ` Pablo Neira Ayuso
2022-05-16 18:23             ` Sven Auhagen
2022-05-17  8:32               ` Pablo Neira Ayuso
2022-05-17  8:36                 ` Sven Auhagen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YoIt5rHw4Xwl1zgY@salvia \
    --to=pablo@netfilter.org \
    --cc=fw@strlen.de \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=ozsh@nvidia.com \
    --cc=paulb@nvidia.com \
    --cc=sven.auhagen@voleatech.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).