All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@nvidia.com>
To: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Bart De Schuymer <bdschuym@pandora.be>,
	Patrick McHardy <kaber@trash.net>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org, bridge@lists.linux.dev
Subject: Re: [PATCH net] net: neigh: Reallocate headroom if necessary in neigh_hh_bridge()
Date: Sun, 10 May 2026 14:14:05 +0300	[thread overview]
Message-ID: <20260510111405.GA78831@shredder> (raw)
In-Reply-To: <20260508-nf-neigh_hh_bridge-fix-v1-1-a1464468d92e@kernel.org>

On Fri, May 08, 2026 at 01:25:14PM +0200, Lorenzo Bianconi wrote:
> neigh_hh_bridge() assumes the skb always has sufficient headroom to copy
> the aligned  L2 header. This assumption can trigger the crash reported
> below using the following netfilter setup:
> 
> $modprobe br_netfilter
> $sysctl -w net.bridge.bridge-nf-call-iptables=1
> 
> $root@OpenWrt:~# nft list ruleset
> table ip nat {
>         chain prerouting {
>                 type nat hook prerouting priority dstnat; policy accept;
>                 ip daddr 192.168.83.123 dnat to 192.168.83.120
>         }
> }
> 
> - iperf3 client (192.168.83.119) --> bridge (192.168.83.118) --> iperf3 server (192.168.83.120)
> 
> the iperf3 client is sending packet for 192.168.83.123 to the bridge device.
> 
> [ 1579.036575] Unable to handle kernel write to read-only memory at virtual address ffffff8004d76ffe
> [ 1579.045482] Mem abort info:
> [ 1579.048273]   ESR = 0x000000009600004f
> [ 1579.052024]   EC = 0x25: DABT (current EL), IL = 32 bits
> [ 1579.057363]   SET = 0, FnV = 0
> [ 1579.060417]   EA = 0, S1PTW = 0
> [ 1579.063550]   FSC = 0x0f: level 3 permission fault
> [ 1579.068345] Data abort info:
> [ 1579.071224]   ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
> [ 1579.076720]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> [ 1579.081770]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 1579.087092] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080dc4000
> [ 1579.093794] [ffffff8004d76ffe] pgd=180000009ffff003, p4d=180000009ffff003, pud=180000009ffff003, pmd=180000009ffe3003, pte=0060000084d76787
> [ 1579.106343] Internal error: Oops: 000000009600004f [#1] SMP
> [ 1579.193824] CPU: 0 UID: 0 PID: 235 Comm: napi/qdma_eth-3 Tainted: G           O       6.12.57 #0

AFAICT this driver does not reserve any headroom in skbs that it's
injecting to the Rx path. Is there a reason for that?

> [ 1579.202614] Tainted: [O]=OOT_MODULE
> [ 1579.206102] Hardware name: Airoha AN7581 Evaluation Board (DT)
> [ 1579.211929] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1579.218889] pc : br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter]
> [ 1579.225859] lr : br_nf_pre_routing_finish_bridge+0x18c/0xcc8 [br_netfilter]
> [ 1579.232822] sp : ffffffc0817cba20
> [ 1579.236128] x29: ffffffc0817cba20 x28: 0000000000000000 x27: ffffff8002b89000
> [ 1579.243273] x26: ffffff8004d7700e x25: 0000000000000008 x24: 0000000000000000
> [ 1579.250416] x23: ffffffc08179d4c0 x22: 0000000000000000 x21: ffffffc08179d4c0
> [ 1579.257561] x20: ffffff8004d9b800 x19: ffffff8015010000 x18: 0000000000000014
> [ 1579.264704] x17: ffffffbf9e930000 x16: ffffffc0817c8000 x15: 0000000000000070
> [ 1579.271848] x14: 0000000000000080 x13: 0000000000000001 x12: 0000000000000000
> [ 1579.278993] x11: ffffffc0798caae0 x10: ffffff8014db6fd8 x9 : 0000000000000000
> [ 1579.286136] x8 : 0000000000000003 x7 : ffffffc08171f628 x6 : 000000001a3b83d3
> [ 1579.293281] x5 : 0000000000000000 x4 : 1beb76f22fee0000 x3 : ffffff8004d7700e
> [ 1579.300425] x2 : 0000000000000000 x1 : ffffff8004d9b8bc x0 : ffffff80026ed000
> [ 1579.307570] Call trace:
> [ 1579.310018]  br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter]
> [ 1579.316632]  br_nf_hook_thresh+0xd4/0x14bc [br_netfilter]
> [ 1579.322032]  br_nf_hook_thresh+0x250/0x14bc [br_netfilter]
> [ 1579.327517]  br_nf_hook_thresh+0x76c/0x14bc [br_netfilter]
> [ 1579.333003]  br_handle_frame+0x180/0x480
> [ 1579.336935]  __netif_receive_skb_core.constprop.0+0x540/0xf40
> [ 1579.342682]  __netif_receive_skb_one_core+0x28/0x50
> [ 1579.347561]  process_backlog+0x98/0x1e0
> [ 1579.351398]  __napi_poll+0x34/0x1c4
> [ 1579.354887]  net_rx_action+0x178/0x330
> [ 1579.358638]  handle_softirqs+0x108/0x2d4
> [ 1579.362560]  __do_softirq+0x10/0x18
> [ 1579.366051]  ____do_softirq+0xc/0x20
> [ 1579.369627]  call_on_irq_stack+0x30/0x4c
> [ 1579.373550]  do_softirq_own_stack+0x18/0x20
> [ 1579.377734]  do_softirq+0x4c/0x60
> [ 1579.381050]  __local_bh_enable_ip+0x88/0x98
> [ 1579.385234]  napi_threaded_poll_loop+0x188/0x21c
> [ 1579.389853]  napi_threaded_poll+0x70/0x80
> [ 1579.393863]  kthread+0xd8/0xdc
> [ 1579.396918]  ret_from_fork+0x10/0x20
> [ 1579.400499] Code: 88dffc22 3707ffc2 f9406663 f9406684 (f81f0064)
> [ 1579.406589] ---[ end trace 0000000000000000 ]---
> [ 1579.411209] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [ 1579.418083] SMP: stopping secondary CPUs
> [ 1579.422012] Kernel Offset: disabled
> 
> Fix the issue reallocating the skb headroom if necessary in neigh_hh_bridge routine.
> 
> Fixes: e179e6322ac33 ("netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/neighbour.h         | 15 +++++++++++----
>  net/bridge/br_netfilter_hooks.c |  5 ++++-
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 2dfee6d4258a..4e1222968753 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -487,16 +487,23 @@ static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
>  }
>  
>  #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
> -static inline int neigh_hh_bridge(struct hh_cache *hh, struct sk_buff *skb)
> +static inline struct sk_buff *
> +neigh_hh_bridge(struct hh_cache *hh, struct sk_buff *skb)
>  {
> -	unsigned int seq, hh_alen;
> +	unsigned int seq, hh_alen = HH_DATA_ALIGN(ETH_HLEN);
> +
> +	if (unlikely(skb_headroom(skb) < hh_alen)) {
> +		skb = skb_expand_head(skb, hh_alen);
> +		if (!skb)
> +			return NULL;
> +	}

The comment from Sashiko looks relevant:

Does this adequately protect against writing to shared or cloned SKBs?

If a cloned SKB already has sufficient headroom, this check evaluates to
false, and the code proceeds to overwrite the MAC header via memcpy().
Modifying a cloned SKB without unsharing it could corrupt the data for
other users of the buffer, or still trigger the read-only memory panic
this patch aims to fix.

Should this use skb_cow_head() or explicitly check skb_shared() and
skb_cloned() before modifying the buffer data?

>  
>  	do {
>  		seq = read_seqbegin(&hh->hh_lock);
> -		hh_alen = HH_DATA_ALIGN(ETH_HLEN);
>  		memcpy(skb->data - hh_alen, hh->hh_data, ETH_ALEN + hh_alen - ETH_HLEN);
>  	} while (read_seqretry(&hh->hh_lock, seq));
> -	return 0;
> +
> +	return skb;
>  }
>  #endif
>  
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index 0ab1c94db4b9..6b59d7eb7906 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
> @@ -297,7 +297,10 @@ int br_nf_pre_routing_finish_bridge(struct net *net, struct sock *sk, struct sk_
>  				goto free_skb;
>  			}
>  
> -			neigh_hh_bridge(&neigh->hh, skb);
> +			skb = neigh_hh_bridge(&neigh->hh, skb);
> +			if (!skb)
> +				return -ENOMEM;
> +

Also from Sashiko:

Does returning early here leak the neighbour reference?

Earlier in br_nf_pre_routing_finish_bridge(), a reference to neigh is
obtained via dst_neigh_lookup_skb(dst, skb). By returning -ENOMEM here,
we bypass the neigh_release(neigh) call at the end of the if (neigh) block.

Could this cause the neighbour reference count to leak, eventually preventing
the network device from being unregistered?

>  			skb->dev = br_indev;
>  
>  			ret = br_handle_frame_finish(net, sk, skb);
> 
> ---
> base-commit: fcee7d82f27d6a8b1ddc5bbefda59b4e441e9bc0
> change-id: 20260508-nf-neigh_hh_bridge-fix-9ab775ee23c6
> 
> Best regards,
> -- 
> Lorenzo Bianconi <lorenzo@kernel.org>
> 

  reply	other threads:[~2026-05-10 11:14 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08 11:25 [PATCH net] net: neigh: Reallocate headroom if necessary in neigh_hh_bridge() Lorenzo Bianconi
2026-05-10 11:14 ` Ido Schimmel [this message]
2026-05-11 15:52   ` Lorenzo Bianconi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260510111405.GA78831@shredder \
    --to=idosch@nvidia.com \
    --cc=bdschuym@pandora.be \
    --cc=bridge@lists.linux.dev \
    --cc=coreteam@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=horms@kernel.org \
    --cc=kaber@trash.net \
    --cc=kuba@kernel.org \
    --cc=lorenzo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=phil@nwl.cc \
    --cc=razor@blackwall.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.