* [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
@ 2025-05-15  8:48 Ido Schimmel
  2025-05-15 15:36 ` Nikolay Aleksandrov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ido Schimmel @ 2025-05-15  8:48 UTC
  To: netdev, bridge
  Cc: davem, kuba, pabeni, edumazet, razor, venkat.x.venkatsubra, horms,
	pablo, fw, Ido Schimmel

When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).

Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.

Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).

However, after the above-mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).

Unfortunately, the last operation is a no-op given the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).

Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.

Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_nf_core.c | 7 ++-----
 net/bridge/br_private.h | 1 +
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 98aea5485aae..a8c67035e23c 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -65,17 +65,14 @@ static struct dst_ops fake_dst_ops = {
  * ipt_REJECT needs it.  Future netfilter modules might
  * require us to fill additional fields.
  */
-static const u32 br_dst_default_metrics[RTAX_MAX] = {
-	[RTAX_MTU - 1] = 1500,
-};
-
 void br_netfilter_rtable_init(struct net_bridge *br)
 {
 	struct rtable *rt = &br->fake_rtable;
 
 	rcuref_init(&rt->dst.__rcuref, 1);
 	rt->dst.dev = br->dev;
-	dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
+	dst_init_metrics(&rt->dst, br->metrics, false);
+	dst_metric_set(&rt->dst, RTAX_MTU, br->dev->mtu);
 	rt->dst.flags	= DST_NOXFRM | DST_FAKE_RTABLE;
 	rt->dst.ops = &fake_dst_ops;
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d5b3c5936a79..4715a8d6dc32 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -505,6 +505,7 @@ struct net_bridge {
 		struct rtable		fake_rtable;
 		struct rt6_info		fake_rt6_info;
 	};
+	u32				metrics[RTAX_MAX];
 #endif
 	u16				group_fwd_mask;
 	u16				group_fwd_mask_required;
-- 
2.49.0
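
A minimal userspace sketch of the behaviour described in the commit message above; it is not kernel code. It assumes a simplified model in which a write to read-only metrics is silently ignored, which is the effect the commit message describes for the fake rtable entry. All model_* names and the MODEL_RTAX_MAX array size are hypothetical and exist only for this illustration; the real dst_init_metrics() and dst_metric_set() helpers are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Metric values live at index (metric - 1), as in the removed
 * "[RTAX_MTU - 1] = 1500" initializer. The array size is a placeholder
 * for this sketch, not taken from kernel headers. */
#define MODEL_RTAX_MTU 2
#define MODEL_RTAX_MAX 17

/* Hypothetical stand-in for the part of a dst entry that matters here. */
struct model_dst {
	uint32_t *metrics;  /* shared defaults or per-bridge storage */
	bool read_only;     /* true models the pre-patch fake rtable */
};

static inline void model_init_metrics(struct model_dst *dst,
				      uint32_t *metrics, bool read_only)
{
	dst->metrics = metrics;
	dst->read_only = read_only;
}

/* Writes to read-only metrics are dropped, so an MTU change is lost. */
static inline void model_metric_set(struct model_dst *dst, int metric,
				    uint32_t val)
{
	assert(metric >= 1 && metric <= MODEL_RTAX_MAX);
	if (dst->read_only)
		return;
	dst->metrics[metric - 1] = val;
}

static inline uint32_t model_metric_get(const struct model_dst *dst,
					int metric)
{
	assert(metric >= 1 && metric <= MODEL_RTAX_MAX);
	return dst->metrics[metric - 1];
}

/* Refragmentation succeeds only if the largest original fragment
 * fits within the MTU reported for the destination. */
static inline bool model_refrag_ok(uint32_t largest_frag, uint32_t mtu)
{
	return largest_frag <= mtu;
}
```

In this model, a dst initialised with shared read-only defaults of 1500 still reports 1500 after an attempt to set the MTU to 9000, so an 8000-byte fragment fails the check. A dst initialised with writable per-bridge storage reports 9000 after the same call, and the same fragment passes.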



* Re: [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
From: Nikolay Aleksandrov @ 2025-05-15 15:36 UTC
  To: Ido Schimmel, netdev, bridge
  Cc: davem, kuba, pabeni, edumazet, venkat.x.venkatsubra, horms, pablo,
	fw

On 5/15/25 10:48, Ido Schimmel wrote:
> [...]

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>



* Re: [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
From: patchwork-bot+netdevbpf @ 2025-05-16 23:10 UTC
  To: Ido Schimmel
  Cc: netdev, bridge, davem, kuba, pabeni, edumazet, razor,
	venkat.x.venkatsubra, horms, pablo, fw

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 15 May 2025 11:48:48 +0300 you wrote:
> When netfilter defrag hooks are loaded (due to the presence of conntrack
> rules, for example), fragmented packets entering the bridge will be
> defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
> ipv4_conntrack_defrag()).
> 
> Later on, in the bridge's post-routing hook, the defragged packet will
> be fragmented again. If the size of the largest fragment is larger than
> what the kernel has determined as the destination MTU (using
> ip_skb_dst_mtu()), the defragged packet will be dropped.
> 
> [...]

Here is the summary with links:
  - [net] bridge: netfilter: Fix forwarding of fragmented packets
    https://git.kernel.org/netdev/net/c/91b6dbced0ef

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
From: Pablo Neira Ayuso @ 2025-05-16 23:55 UTC
  To: Ido Schimmel
  Cc: netdev, bridge, davem, kuba, pabeni, edumazet, razor,
	venkat.x.venkatsubra, horms, fw

On Thu, May 15, 2025 at 11:48:48AM +0300, Ido Schimmel wrote:
[...]
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index d5b3c5936a79..4715a8d6dc32 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -505,6 +505,7 @@ struct net_bridge {
>  		struct rtable		fake_rtable;
>  		struct rt6_info		fake_rt6_info;
>  	};

This is missing #ifdef to restrict it to bridge netfilter.

> +	u32				metrics[RTAX_MAX];
>  #endif
>  	u16				group_fwd_mask;
>  	u16				group_fwd_mask_required;
> -- 
> 2.49.0
> 


* Re: [PATCH net] bridge: netfilter: Fix forwarding of fragmented packets
From: Ido Schimmel @ 2025-05-18  6:47 UTC
  To: Pablo Neira Ayuso
  Cc: netdev, bridge, davem, kuba, pabeni, edumazet, razor,
	venkat.x.venkatsubra, horms, fw

On Sat, May 17, 2025 at 01:55:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, May 15, 2025 at 11:48:48AM +0300, Ido Schimmel wrote:
> [...]
> > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> > index d5b3c5936a79..4715a8d6dc32 100644
> > --- a/net/bridge/br_private.h
> > +++ b/net/bridge/br_private.h
> > @@ -505,6 +505,7 @@ struct net_bridge {
> >  		struct rtable		fake_rtable;
> >  		struct rt6_info		fake_rt6_info;
> >  	};
> 
> This is missing #ifdef to restrict it to bridge netfilter.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/bridge/br_private.h#n503

It's already restricted to bridge netfilter...

