netdev.vger.kernel.org archive mirror
From: Florian Westphal <fw@strlen.de>
To: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>,
	Phil Sutter <phil@nwl.cc>,
	netdev@vger.kernel.org, Jozsef Kadlecsik <kadlec@netfilter.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	linux-kernel@vger.kernel.org
Subject: Re: Soft lock-ups caused by iptables
Date: Thu, 20 Nov 2025 21:46:28 +0100
Message-ID: <aR9-JDXdelaf0tGU@strlen.de>
In-Reply-To: <20251120203836.GA31922@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> wrote:
> On Thu, Nov 20, 2025 at 10:34:46AM +0100, Florian Westphal wrote:
> > netfilter: nf_tables: avoid chain re-validation if possible
> > 
> > Consider:
> > 
> >       input -> j2 -> j3
> >       input -> j2 -> j3
> >       input -> j1 -> j2 -> j3
> > 
> > Then the second rule does not need to revalidate j2, and, by extension j3.
> > 
> > We need to validate it only for rule 3.
> > 
> > This is needed because chain loop detection also ensures we do not
> > exceed the jump stack: Just because we know that j2 is cycle free, its
> > last jump might now exceed the allowed stack.  We also need to update
> > the new largest call depth for all the reachable nodes.
> > 
> > diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> > --- a/include/net/netfilter/nf_tables.h
> > +++ b/include/net/netfilter/nf_tables.h
> > @@ -1109,6 +1109,7 @@ struct nft_rule_blob {
> >   *	@udlen: user data length
> >   *	@udata: user data in the chain
> >   *	@blob_next: rule blob pointer to the next in the chain
> > + *	@depth: chain was validated for call level <= depth
> >   */
> >  struct nft_chain {
> >  	struct nft_rule_blob		__rcu *blob_gen_0;
> > @@ -1128,9 +1129,10 @@ struct nft_chain {
> >  
> >  	/* Only used during control plane commit phase: */
> >  	struct nft_rule_blob		*blob_next;
> > +	u8				depth;
> >  };
> >  
> > -int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain);
> > +int nft_chain_validate(const struct nft_ctx *ctx, struct nft_chain *chain);
> >  int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
> >  			 const struct nft_set_iter *iter,
> >  			 struct nft_elem_priv *elem_priv);
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> > @@ -4088,15 +4088,26 @@ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *r
> >   * and set lookups until either the jump limit is hit or all reachable
> >   * chains have been validated.
> >   */
> > -int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
> > +int nft_chain_validate(const struct nft_ctx *ctx, struct nft_chain *chain)
> >  {
> >  	struct nft_expr *expr, *last;
> >  	struct nft_rule *rule;
> >  	int err;
> >  
> > +	BUILD_BUG_ON(NFT_JUMP_STACK_SIZE > 255);
> >  	if (ctx->level == NFT_JUMP_STACK_SIZE)
> >  		return -EMLINK;
> >  
> > +	/* jumps to base chains are not allowed, this is already
> > +	 * validated by nft_verdict_init().
> > +	 *
> > +	 * Chain must be re-validated if we are entering for first
> > +	 * time or if the current jumpstack usage is higher than on
> > +	 * previous check.
> > +	 */
> > +	if (ctx->level && chain->depth >= ctx->level)
> > +		return 0;
> > +
> >  	list_for_each_entry(rule, &chain->rules, list) {
> >  		if (fatal_signal_pending(current))
> >  			return -EINTR;
> > @@ -4117,6 +4128,10 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain)
> >  		}
> >  	}
> >  
> > +	/* Chain needs no re-validation if called again
> > +	 * from a path that doesn't exceed level.
> > +	 */
> > +	chain->depth = ctx->level;
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(nft_chain_validate);
> > @@ -4128,7 +4143,7 @@ static int nft_table_validate(struct net *net, const struct nft_table *table)
> >  		.net	= net,
> >  		.family	= table->family,
> >  	};
> > -	int err;
> > +	int err = 0;
> >  
> >  	list_for_each_entry(chain, &table->chains, list) {
> >  		if (!nft_is_base_chain(chain))
> > @@ -4137,12 +4152,16 @@ static int nft_table_validate(struct net *net, const struct nft_table *table)
> >  		ctx.chain = chain;
> >  		err = nft_chain_validate(&ctx, chain);
> >  		if (err < 0)
> > -			return err;
> > +			goto err;
> >  
> >  		cond_resched();
> >  	}
> >  
> > -	return 0;
> > +err:
> > +	list_for_each_entry(chain, &table->chains, list)
> > +		chain->depth = 0;
> > +
> > +	return err;
> >  }
> >  
> >  int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
> 
> FWIW This patch seems to resolve the issue, assuming you intended to
> include the following:

Thanks for testing.  I will try to make this work universally next week
(this needs more work to keep a bitmask of base hook types for
which we already validated this).  We likely also need to improve the
existing test coverage: the above patch should fail the tests we have.
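
For readers following along, the depth check in the quoted patch can be
tried out on the example from the commit message with a small userspace
model.  This is only a sketch, not kernel code: toy_chain, toy_validate(),
toy_reset() and run_example() are made-up names, JUMP_STACK_SIZE is a
stand-in for NFT_JUMP_STACK_SIZE, and a chain is reduced to its list of
jump targets (no rules, expressions or sets).

```c
#include <errno.h>
#include <stddef.h>

#define JUMP_STACK_SIZE 16	/* stand-in for NFT_JUMP_STACK_SIZE */
#define MAX_JUMPS 4

/* Toy chain: only the jump targets are modelled. */
struct toy_chain {
	struct toy_chain *jumps[MAX_JUMPS];	/* unused slots are NULL */
	unsigned char depth;	/* validated for call level <= depth */
	unsigned int walks;	/* how often the chain body was walked */
};

/* Ruleset from the commit message:
 *   input -> j2 -> j3
 *   input -> j2 -> j3
 *   input -> j1 -> j2 -> j3
 */
static struct toy_chain j3;
static struct toy_chain j2 = { .jumps = { &j3 } };
static struct toy_chain j1 = { .jumps = { &j2 } };
static struct toy_chain input = { .jumps = { &j2, &j2, &j1 } };

int toy_validate(struct toy_chain *chain, unsigned int level)
{
	size_t i;
	int err;

	if (level == JUMP_STACK_SIZE)
		return -EMLINK;

	/* Already validated at this call level or a deeper one: skip. */
	if (level && chain->depth >= level)
		return 0;

	chain->walks++;
	for (i = 0; i < MAX_JUMPS && chain->jumps[i]; i++) {
		err = toy_validate(chain->jumps[i], level + 1);
		if (err < 0)
			return err;
	}

	/* Remember the deepest level this chain was validated for. */
	chain->depth = level;
	return 0;
}

/* Mirrors the reset loop at the end of nft_table_validate(). */
void toy_reset(void)
{
	struct toy_chain *all[] = { &input, &j1, &j2, &j3 };
	size_t i;

	for (i = 0; i < sizeof(all) / sizeof(all[0]); i++)
		all[i]->depth = 0;
}

/* Validate starting from the base chain at call level 0. */
int run_example(void)
{
	return toy_validate(&input, 0);
}
```

In this model j2 and j3 are each walked twice instead of three times: the
second jump to j2 is skipped, and the jump via j1 forces a re-walk because
it arrives one level deeper.  It does not model the per-hook-type state
mentioned above.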
