* [PATCH v3] net: Add locking to protect skb->dev access in ip_output
@ 2025-07-30 10:51 Sharath Chandra Vurukala
  2025-07-30 13:01 ` Eric Dumazet
  2025-08-02  0:30 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 4+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-30 10:51 UTC
  To: davem, dsahern, edumazet, kuba, pabeni, netdev
  Cc: quic_kapandey, quic_subashab

In ip_output(), skb->dev is updated from skb_dst(skb)->dev. This
pointer can become invalid when the interface is unregistered and freed.

Introduce a new skb_dst_dev_rcu() helper, to be used instead of
skb_dst_dev() inside an RCU read-side critical section in ip_output().
This ensures that all skbs associated with the device being
unregistered are transmitted before the device is freed.

Given that ip_output() is called within an rcu_read_lock()
critical section or from a bottom-half context, it is safe to introduce
an RCU read-side critical section within it.

Multiple panics with different call stacks were observed when UL
traffic ran concurrently with device deregistration. One sample
call trace is pasted below for reference.

[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8

Changes in v3:
- Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn.
- Dropped legacy lines mistakenly pulled in from an outdated branch.

Changes in v2:
- Addressed review comments from Eric Dumazet
- Used READ_ONCE() to prevent potential load/store tearing
- Added skb_dst_dev_rcu() and used it together with rcu_read_lock() in ip_output()

Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
---
 include/net/dst.h    | 12 ++++++++++++
 net/ipv4/ip_output.c | 15 ++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 00467c1b5093..bab01363bb97 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
 	return READ_ONCE(dst->dev);
 }
 
+static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
+{
+	/* In the future, use rcu_dereference(dst->dev) */
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return READ_ONCE(dst->dev);
+}
+
 static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
 {
 	return dst_dev(skb_dst(skb));
 }
 
+static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
+{
+	return dst_dev_rcu(skb_dst(skb));
+}
+
 static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
 {
 	return dev_net(skb_dst_dev(skb));
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10a1d182fd84..84e7f8a2f50f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -425,15 +425,20 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
+	struct net_device *dev, *indev = skb->dev;
+	int ret_val;
 
+	rcu_read_lock();
+	dev = skb_dst_dev_rcu(skb);
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
 
-	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
-			    net, sk, skb, indev, dev,
-			    ip_finish_output,
-			    !(IPCB(skb)->flags & IPSKB_REROUTED));
+	ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
+				net, sk, skb, indev, dev,
+				ip_finish_output,
+				!(IPCB(skb)->flags & IPSKB_REROUTED));
+	rcu_read_unlock();
+	return ret_val;
 }
 EXPORT_SYMBOL(ip_output);
 

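The accessor pattern in the diff above can be modeled in userspace. The sketch below is only an illustration under stated assumptions: the RCU primitives are replaced by a toy nesting counter, WARN_ON_ONCE() by a plain counter, READ_ONCE() by a volatile load, and model_ip_output(), read_dev_once() and record_depth() are hypothetical names that do not exist in the kernel. It shows two things the commit message relies on: the device pointer is read only inside a read-side critical section, and such a section nests safely when the caller already holds one.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types used by the patch. */
struct net_device { const char *name; };
struct dst_entry  { struct net_device *dev; };
struct sk_buff    { struct dst_entry *dst; struct net_device *dev; };

/* Toy RCU model (assumption): a nesting counter, not real RCU. */
static int rcu_depth;
static int warn_count;      /* how often the "lock not held" check fired */
static int observed_depth;  /* nesting depth seen by the output hook */

static void rcu_read_lock(void)      { rcu_depth++; }
static void rcu_read_unlock(void)    { assert(rcu_depth > 0); rcu_depth--; }
static int  rcu_read_lock_held(void) { return rcu_depth > 0; }

/* Stand-in for READ_ONCE(dst->dev): one volatile load of the pointer. */
static struct net_device *read_dev_once(struct net_device *const *p)
{
	return *(struct net_device *const volatile *)p;
}

/* Mirrors dst_dev_rcu(): complain if called outside a read-side section. */
static struct net_device *dst_dev_rcu(const struct dst_entry *dst)
{
	if (!rcu_read_lock_held())
		warn_count++;	/* stands in for WARN_ON_ONCE() */
	return read_dev_once(&dst->dev);
}

/* Hypothetical hook standing in for the NF_HOOK_COND()/ip_finish_output path. */
static int record_depth(struct sk_buff *skb)
{
	(void)skb;
	observed_depth = rcu_depth;
	return 0;
}

/* Mirrors the shape of the patched ip_output(): lock, read, use, unlock. */
static int model_ip_output(struct sk_buff *skb, int (*finish)(struct sk_buff *))
{
	struct net_device *dev;
	int ret_val;

	rcu_read_lock();
	dev = dst_dev_rcu(skb->dst);
	skb->dev = dev;
	ret_val = finish(skb);	/* dev is only used inside the section */
	rcu_read_unlock();
	return ret_val;
}
```

Calling model_ip_output() with record_depth as the hook, once on its own and once from inside an outer rcu_read_lock()/rcu_read_unlock() pair, exercises both the plain and the nested case. Keeping the result in ret_val, as the patch does, gives the function a single exit path so the unlock always runs.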

* Re: [PATCH v3] net: Add locking to protect skb->dev access in ip_output
  2025-07-30 10:51 [PATCH v3] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
@ 2025-07-30 13:01 ` Eric Dumazet
  2025-07-30 13:06   ` Eric Dumazet
  2025-08-02  0:30 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2025-07-30 13:01 UTC
  To: Sharath Chandra Vurukala
  Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
	quic_subashab

On Wed, Jul 30, 2025 at 3:51 AM Sharath Chandra Vurukala
<quic_sharathv@quicinc.com> wrote:
> [...]
> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [PATCH v3] net: Add locking to protect skb->dev access in ip_output
  2025-07-30 13:01 ` Eric Dumazet
@ 2025-07-30 13:06   ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2025-07-30 13:06 UTC
  To: Sharath Chandra Vurukala
  Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
	quic_subashab

On Wed, Jul 30, 2025 at 6:01 AM Eric Dumazet <edumazet@google.com> wrote:
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Note: I have more patches coming, fixing IPv6 of course.


* Re: [PATCH v3] net: Add locking to protect skb->dev access in ip_output
  2025-07-30 10:51 [PATCH v3] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
  2025-07-30 13:01 ` Eric Dumazet
@ 2025-08-02  0:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-02  0:30 UTC
  To: Sharath Chandra Vurukala
  Cc: davem, dsahern, edumazet, kuba, pabeni, netdev, quic_kapandey,
	quic_subashab

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 30 Jul 2025 16:21:18 +0530 you wrote:
> In ip_output(), skb->dev is updated from skb_dst(skb)->dev. This
> pointer can become invalid when the interface is unregistered and freed.
>
> Introduce a new skb_dst_dev_rcu() helper, to be used instead of
> skb_dst_dev() inside an RCU read-side critical section in ip_output().
> This ensures that all skbs associated with the device being
> unregistered are transmitted before the device is freed.
> 
> [...]

Here is the summary with links:
  - [v3] net: Add locking to protect skb->dev access in ip_output
    https://git.kernel.org/netdev/net/c/1dbf1d590d10

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
