Re: [PATCH nf v2 1/1] netfilter: nf_flow_table: separate tunnel route state from direct xmit

Linux Netfilter development
 help / color / mirror / Atom feed

From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Ren Wei <n05ec@lzu.edu.cn>
Cc: netfilter-devel@vger.kernel.org, pablo@netfilter.org,
	fw@strlen.de, phil@nwl.cc, yuantan098@gmail.com,
	yifanwucs@gmail.com, tomapufckgml@gmail.com, bird@lzu.edu.cn,
	chzhengyang2023@lzu.edu.cn
Subject: Re: [PATCH nf v2 1/1] netfilter: nf_flow_table: separate tunnel route state from direct xmit
Date: Thu, 18 Jun 2026 12:30:58 +0200	[thread overview]
Message-ID: <ajPI4gA8VORUknoA@lore-desk> (raw)
In-Reply-To: <7016923271a6bb3e26f9a21757922d3c5b1a7487.1781683535.git.chzhengyang2023@lzu.edu.cn>

[-- Attachment #1: Type: text/plain, Size: 8005 bytes --]

On Jun 18, Ren Wei wrote:
> From: Zhengyang Chen<chzhengyang2023@lzu.edu.cn>
> 
> When a flow tuple carries tunnel metadata and uses
> FLOW_OFFLOAD_XMIT_DIRECT, the transmit path may still need route state
> for tunnel push. However, the current tuple layout stores direct xmit
> L2 state and route state in overlapping runtime storage.
> 
> As a result, a tuple may keep tun_num set while the tunnel push path
> later reads tuple->dst_cache, even though a direct xmit tuple only has
> out.ifidx/out.h_source/out.h_dest stored in that area. This leads to
> invalid dst usage and can trigger a crash in the tunnel transmit path.
> 
> Fix this by moving dst_cache and dst_cookie out of the runtime union so
> that they can be shared by neighbour, xfrm, and direct tunnel flows.
> For FLOW_OFFLOAD_XMIT_DIRECT tuples carrying tunnel metadata, preserve
> route state in these shared fields and release it through the common
> dst release path.
> 
> Keep dst validation on the forwarding tuple before the packet is
> modified, and validate the tunnel consumer tuple from the same early
> control point. This preserves protection for current NEIGH/XFRM users
> of tuplehash->tuple.dst_cache while avoiding the late-check fallback
> after decap, NAT, and TTL updates.
> 
> Hardware offload rule construction still assumes that direct xmit flows
> do not carry tunnel route state, so reject that combination there for
> now to avoid undefined offload behaviour.
> 
> Fixes: d30301ba4b07 ("netfilter: flowtable: Add IPIP tx sw acceleration")
> Cc: stable@vger.kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Zhengyang Chen <chzhengyang2023@lzu.edu.cn>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
> changes in v2:
>   - Move dst_cache and dst_cookie out of the runtime union instead of
>     introducing dedicated tunnel dst fields
>   - Reuse the shared dst_cache/dst_cookie storage for DIRECT tunnel
>     flows
>   - Simplify dst release through the common dst_cache path
>   - Update Fixes: to d30301ba4b07 ("netfilter: flowtable: Add IPIP tx sw
>     acceleration")
>   - v1 Link: https://lore.kernel.org/all/3947a39286d335b6136bbee26f8bf44b23471c69.1780580352.git.chzhengyang2023@lzu.edu.cn/
> 
>  include/net/netfilter/nf_flow_table.h |  4 ++--
>  net/netfilter/nf_flow_table_core.c    | 12 ++++++++----
>  net/netfilter/nf_flow_table_ip.c      | 19 +++++++++++++++++++
>  net/netfilter/nf_flow_table_offload.c |  3 +++
>  4 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 7b23b245a5a8..369f6a717811 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -155,11 +155,11 @@ struct flow_offload_tuple {
>  					tun_num:2,
>  					in_vlan_ingress:2;
>  	u16				mtu;
> +	struct dst_entry		*dst_cache;
> +	u32				dst_cookie;
>  	union {
>  		struct {
> -			struct dst_entry *dst_cache;
>  			u32		ifidx;
> -			u32		dst_cookie;
>  		};
>  		struct {
>  			u32		ifidx;
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 785d8c244a77..252b081319a7 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -127,12 +127,18 @@ static int flow_offload_fill_route(struct flow_offload *flow,
>  
>  	switch (route->tuple[dir].xmit_type) {
>  	case FLOW_OFFLOAD_XMIT_DIRECT:
> +		if (flow_tuple->tun_num) {
> +			flow_tuple->dst_cache = dst;
> +			flow_tuple->dst_cookie =
> +				flow_offload_dst_cookie(flow_tuple);

Hi Ren Wei,

If I read the code correctly flow_tuple->tun and flow_tuple->tun_num are set
according to route->tuple[].in.tun and route->tuple[].in.num_tuns. Moreover,
route->tuple[].in.tun/route->tuple[].in.num_tuns are set just in
nft_dev_forward_path() that is executed only for FLOW_OFFLOAD_XMIT_NEIGH
entries. Am I missing something here?
Moreover, can you please add a selftest for this case in
tools/testing/selftests/net/netfilter/nft_flowtable.sh? Thanks.

Regards,
Lorenzo

> +		}
>  		memcpy(flow_tuple->out.h_dest, route->tuple[dir].out.h_dest,
>  		       ETH_ALEN);
>  		memcpy(flow_tuple->out.h_source, route->tuple[dir].out.h_source,
>  		       ETH_ALEN);
>  		flow_tuple->out.ifidx = route->tuple[dir].out.ifindex;
> -		dst_release(dst);
> +		if (!flow_tuple->tun_num)
> +			dst_release(dst);
>  		break;
>  	case FLOW_OFFLOAD_XMIT_XFRM:
>  	case FLOW_OFFLOAD_XMIT_NEIGH:
> @@ -152,9 +158,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
>  static void nft_flow_dst_release(struct flow_offload *flow,
>  				 enum flow_offload_tuple_dir dir)
>  {
> -	if (flow->tuplehash[dir].tuple.xmit_type == FLOW_OFFLOAD_XMIT_NEIGH ||
> -	    flow->tuplehash[dir].tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)
> -		dst_release(flow->tuplehash[dir].tuple.dst_cache);
> +	dst_release(flow->tuplehash[dir].tuple.dst_cache);
>  }
>  
>  void flow_offload_route_init(struct flow_offload *flow,
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 9c05a50d6013..b125868ab1fb 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -299,6 +299,11 @@ static bool nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
>  
>  static inline bool nf_flow_dst_check(struct flow_offload_tuple *tuple)
>  {
> +	if (tuple->tun_num &&
> +	    tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT &&
> +	    !dst_check(tuple->dst_cache, tuple->dst_cookie))
> +		return false;
> +
>  	if (tuple->xmit_type != FLOW_OFFLOAD_XMIT_NEIGH &&
>  	    tuple->xmit_type != FLOW_OFFLOAD_XMIT_XFRM)
>  		return true;
> @@ -482,6 +487,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
>  				   struct flow_offload_tuple_rhash *tuplehash,
>  				   struct sk_buff *skb)
>  {
> +	struct flow_offload_tuple *other_tuple;
>  	enum flow_offload_tuple_dir dir;
>  	struct flow_offload *flow;
>  	unsigned int thoff, mtu;
> @@ -507,6 +513,12 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
>  		return 0;
>  	}
>  
> +	other_tuple = &flow->tuplehash[!dir].tuple;
> +	if (other_tuple->tun_num && !nf_flow_dst_check(other_tuple)) {
> +		flow_offload_teardown(flow);
> +		return 0;
> +	}
> +
>  	if (skb_ensure_writable(skb, thoff + ctx->hdrsize))
>  		return -1;
>  
> @@ -1091,6 +1103,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
>  					struct flow_offload_tuple_rhash *tuplehash,
>  					struct sk_buff *skb, int encap_limit)
>  {
> +	struct flow_offload_tuple *other_tuple;
>  	enum flow_offload_tuple_dir dir;
>  	struct flow_offload *flow;
>  	unsigned int thoff, mtu;
> @@ -1119,6 +1132,12 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
>  		return 0;
>  	}
>  
> +	other_tuple = &flow->tuplehash[!dir].tuple;
> +	if (other_tuple->tun_num && !nf_flow_dst_check(other_tuple)) {
> +		flow_offload_teardown(flow);
> +		return 0;
> +	}
> +
>  	if (skb_ensure_writable(skb, thoff + ctx->hdrsize))
>  		return -1;
>  
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index 002ec15d988b..e3ace6435074 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -820,6 +820,9 @@ nf_flow_offload_rule_alloc(struct net *net,
>  
>  	tuple = &flow->tuplehash[dir].tuple;
>  	other_tuple = &flow->tuplehash[!dir].tuple;
> +	if (other_tuple->tun_num &&
> +	    other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT)
> +		goto err_flow_match;
>  	if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_NEIGH)
>  		other_dst = other_tuple->dst_cache;
>  
> -- 
> 2.43.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

     prev parent reply	other threads:[~2026-06-18 10:31 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17 17:06 [PATCH nf v2 1/1] netfilter: nf_flow_table: separate tunnel route state from direct xmit Ren Wei
2026-06-18 10:30 ` Lorenzo Bianconi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajPI4gA8VORUknoA@lore-desk \
    --to=lorenzo@kernel.org \
    --cc=bird@lzu.edu.cn \
    --cc=chzhengyang2023@lzu.edu.cn \
    --cc=fw@strlen.de \
    --cc=n05ec@lzu.edu.cn \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=phil@nwl.cc \
    --cc=tomapufckgml@gmail.com \
    --cc=yifanwucs@gmail.com \
    --cc=yuantan098@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox