* [PATCH] net: Add locking to protect skb->dev access in ip_output
@ 2025-07-23 8:22 Sharath Chandra Vurukala
2025-07-23 15:08 ` Eric Dumazet
2025-07-24 0:46 ` kernel test robot
0 siblings, 2 replies; 6+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-23 8:22 UTC (permalink / raw)
To: davem, dsahern, edumazet, kuba, pabeni, netdev
Cc: quic_kapandey, quic_subashab
In ip_output() skb->dev is updated from the skb_dst(skb)->dev
this can become invalid when the interface is unregistered and freed,
Added rcu locks to ip_output().This will ensure that all the skb's
associated with the dev being deregistered will be transnmitted
out first, before freeing the dev.
Multiple panic call stacks were observed when UL traffic was run
in concurrency with device deregistration from different functions,
pasting one sample for reference.
[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
---
net/ipv4/ip_output.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10a1d182fd84..95c5e9cfc971 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
+ struct net_device *dev, *indev = skb->dev;
+ IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
+
+ rcu_read_lock();
+
+ dev = skb_dst(skb)->dev;
skb->dev = dev;
skb->protocol = htons(ETH_P_IP);
- return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
- net, sk, skb, indev, dev,
- ip_finish_output,
- !(IPCB(skb)->flags & IPSKB_REROUTED));
+ ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
+ net, sk, skb, indev, dev,
+ ip_finish_output,
+ !(IPCB(skb)->flags & IPSKB_REROUTED));
+ rcu_read_unlock();
+ return ret_val;
}
EXPORT_SYMBOL(ip_output);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
2025-07-23 8:22 [PATCH] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
@ 2025-07-23 15:08 ` Eric Dumazet
2025-07-24 6:15 ` Sharath Chandra Vurukala
2025-07-24 0:46 ` kernel test robot
1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2025-07-23 15:08 UTC (permalink / raw)
To: Sharath Chandra Vurukala
Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
quic_subashab
On Wed, Jul 23, 2025 at 1:22 AM Sharath Chandra Vurukala
<quic_sharathv@quicinc.com> wrote:
>
> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
> this can become invalid when the interface is unregistered and freed,
>
> Added rcu locks to ip_output().This will ensure that all the skb's
> associated with the dev being deregistered will be transnmitted
> out first, before freeing the dev.
>
> Multiple panic call stacks were observed when UL traffic was run
> in concurrency with device deregistration from different functions,
> pasting one sample for reference.
>
> [496733.627565][T13385] Call trace:
> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
> [496733.627605][T13385] ip_output+0x100/0x1a0
> [496733.627613][T13385] ip_send_skb+0x68/0x100
> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
> [496733.627653][T13385] invoke_syscall+0x58/0x11c
> [496733.627662][T13385] el0_svc_common+0x88/0xf4
> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
> [496733.627676][T13385] el0_svc+0x2c/0xa4
> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
>
> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
> ---
> net/ipv4/ip_output.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 10a1d182fd84..95c5e9cfc971 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>
> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> {
> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> + struct net_device *dev, *indev = skb->dev;
>
> + IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
> +
> + rcu_read_lock();
> +
> + dev = skb_dst(skb)->dev;
Arg... Please do not remove skb_dst_dev(skb), and instead expand it.
I recently started to work on this class of problems.
commit a74fc62eec155ca5a6da8ff3856f3dc87fe24558
Author: Eric Dumazet <edumazet@google.com>
Date: Mon Jun 30 12:19:31 2025 +0000
ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]
Use the new helpers as a first step to deal with
potential dst->dev races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adding RCU is not good enough, we still need the READ_ONCE() to
prevent potential load/store tearing.
I was planning to add skb_dst_dev_rcu() helper and start replacing
skb_dst_dev() where needed.
diff --git a/include/net/dst.h b/include/net/dst.h
index 00467c1b509389a8e37d6e3d0912374a0ff12c4a..692ebb1b3f421210dbb58990b77a200b9189d0f7
100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const
struct dst_entry *dst)
return READ_ONCE(dst->dev);
}
+static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
+{
+ /* In the future, use rcu_dereference(dst->dev) */
+ WARN_ON(!rcu_read_lock_held());
+ return READ_ONCE(dst->dev);
+}
+
static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
{
return dst_dev(skb_dst(skb));
}
+static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
+{
+ return dst_dev_rcu(skb_dst(skb));
+}
+
static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
{
return dev_net(skb_dst_dev(skb));
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10a1d182fd848f0d2348f65fde269383f9c07baa..37b982dd53f69247634c67c493c44fa482100dee
100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -425,15 +425,20 @@ int ip_mc_output(struct net *net, struct sock
*sk, struct sk_buff *skb)
int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
+ struct net_device *dev, *indev = skb->dev;
+ int res;
+ rcu_read_lock();
+ dev = skb_dst_dev_rcu(skb);
skb->dev = dev;
skb->protocol = htons(ETH_P_IP);
- return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
+ res = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
net, sk, skb, indev, dev,
ip_finish_output,
!(IPCB(skb)->flags & IPSKB_REROUTED));
+ rcu_read_unlock();
+ return res;
}
EXPORT_SYMBOL(ip_output);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
2025-07-23 8:22 [PATCH] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
2025-07-23 15:08 ` Eric Dumazet
@ 2025-07-24 0:46 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2025-07-24 0:46 UTC (permalink / raw)
To: Sharath Chandra Vurukala, davem, dsahern, edumazet, kuba, pabeni,
netdev
Cc: oe-kbuild-all, quic_kapandey, quic_subashab
Hi Sharath,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
[also build test ERROR on next-20250723]
[cannot apply to net/main linus/master horms-ipvs/master v6.16-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Sharath-Chandra-Vurukala/net-Add-locking-to-protect-skb-dev-access-in-ip_output/20250723-162406
base: net-next/main
patch link: https://lore.kernel.org/r/20250723082201.GA14090%40hu-sharathv-hyd.qualcomm.com
patch subject: [PATCH] net: Add locking to protect skb->dev access in ip_output
config: i386-randconfig-003-20250724 (https://download.01.org/0day-ci/archive/20250724/202507240835.DPHAwhlJ-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250724/202507240835.DPHAwhlJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507240835.DPHAwhlJ-lkp@intel.com/
All errors (new ones prefixed by >>):
net/ipv4/ip_output.c: In function 'ip_output':
>> net/ipv4/ip_output.c:438:9: error: 'ret_val' undeclared (first use in this function); did you mean 'pte_val'?
438 | ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
| ^~~~~~~
| pte_val
net/ipv4/ip_output.c:438:9: note: each undeclared identifier is reported only once for each function it appears in
net/ipv4/ip_output.c:444:1: warning: control reaches end of non-void function [-Wreturn-type]
444 | }
| ^
vim +438 net/ipv4/ip_output.c
425
426 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
427 {
428 struct net_device *dev, *indev = skb->dev;
429
430 IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
431
432 rcu_read_lock();
433
434 dev = skb_dst(skb)->dev;
435 skb->dev = dev;
436 skb->protocol = htons(ETH_P_IP);
437
> 438 ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
439 net, sk, skb, indev, dev,
440 ip_finish_output,
441 !(IPCB(skb)->flags & IPSKB_REROUTED));
442 rcu_read_unlock();
443 return ret_val;
444 }
445 EXPORT_SYMBOL(ip_output);
446
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
2025-07-23 15:08 ` Eric Dumazet
@ 2025-07-24 6:15 ` Sharath Chandra Vurukala
2025-07-24 8:03 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-24 6:15 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
quic_subashab
On 7/23/2025 8:38 PM, Eric Dumazet wrote:
> On Wed, Jul 23, 2025 at 1:22 AM Sharath Chandra Vurukala
> <quic_sharathv@quicinc.com> wrote:
>>
>> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
>> this can become invalid when the interface is unregistered and freed,
>>
>> Added rcu locks to ip_output().This will ensure that all the skb's
>> associated with the dev being deregistered will be transnmitted
>> out first, before freeing the dev.
>>
>> Multiple panic call stacks were observed when UL traffic was run
>> in concurrency with device deregistration from different functions,
>> pasting one sample for reference.
>>
>> [496733.627565][T13385] Call trace:
>> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
>> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
>> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
>> [496733.627605][T13385] ip_output+0x100/0x1a0
>> [496733.627613][T13385] ip_send_skb+0x68/0x100
>> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
>> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
>> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
>> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
>> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
>> [496733.627653][T13385] invoke_syscall+0x58/0x11c
>> [496733.627662][T13385] el0_svc_common+0x88/0xf4
>> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
>> [496733.627676][T13385] el0_svc+0x2c/0xa4
>> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
>> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
>>
>> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
>> ---
>> net/ipv4/ip_output.c | 17 ++++++++++++-----
>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 10a1d182fd84..95c5e9cfc971 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>
>> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>> {
>> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
>> + struct net_device *dev, *indev = skb->dev;
>>
>> + IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
>> +
>> + rcu_read_lock();
>> +
>> + dev = skb_dst(skb)->dev;
>
> Arg... Please do not remove skb_dst_dev(skb), and instead expand it.
>
> I recently started to work on this class of problems.
>
> commit a74fc62eec155ca5a6da8ff3856f3dc87fe24558
> Author: Eric Dumazet <edumazet@google.com>
> Date: Mon Jun 30 12:19:31 2025 +0000
>
> ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]
>
> Use the new helpers as a first step to deal with
> potential dst->dev races.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
> Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
>
> Adding RCU is not good enough, we still need the READ_ONCE() to
> prevent potential load/store tearing.
>
> I was planning to add skb_dst_dev_rcu() helper and start replacing
> skb_dst_dev() where needed.
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 00467c1b509389a8e37d6e3d0912374a0ff12c4a..692ebb1b3f421210dbb58990b77a200b9189d0f7
> 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const
> struct dst_entry *dst)
> return READ_ONCE(dst->dev);
> }
>
> +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
> +{
> + /* In the future, use rcu_dereference(dst->dev) */
> + WARN_ON(!rcu_read_lock_held());
> + return READ_ONCE(dst->dev);
> +}
> +
> static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
> {
> return dst_dev(skb_dst(skb));
> }
>
> +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
> +{
> + return dst_dev_rcu(skb_dst(skb));
> +}
> +
> static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
> {
> return dev_net(skb_dst_dev(skb));
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 10a1d182fd848f0d2348f65fde269383f9c07baa..37b982dd53f69247634c67c493c44fa482100dee
> 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -425,15 +425,20 @@ int ip_mc_output(struct net *net, struct sock
> *sk, struct sk_buff *skb)
>
> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> {
> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> + struct net_device *dev, *indev = skb->dev;
> + int res;
>
> + rcu_read_lock();
> + dev = skb_dst_dev_rcu(skb);
> skb->dev = dev;
> skb->protocol = htons(ETH_P_IP);
>
> - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> + res = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> net, sk, skb, indev, dev,
> ip_finish_output,
> !(IPCB(skb)->flags & IPSKB_REROUTED));
> + rcu_read_unlock();
> + return res;
> }
> EXPORT_SYMBOL(ip_output);
Thanks Eric for the review, as this work is already underway on your end, I’ll pause and wait for your changes to become available.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
2025-07-24 6:15 ` Sharath Chandra Vurukala
@ 2025-07-24 8:03 ` Eric Dumazet
2025-07-29 11:45 ` Sharath Chandra Vurukala
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2025-07-24 8:03 UTC (permalink / raw)
To: Sharath Chandra Vurukala
Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
quic_subashab
On Wed, Jul 23, 2025 at 11:16 PM Sharath Chandra Vurukala
<quic_sharathv@quicinc.com> wrote:
>
>
> Thanks Eric for the review, as this work is already underway on your end, I’ll pause and wait for your changes to become available.
Hi Sharath
I think you definitely can send a patch, I was not trying to say you
could not do it.
Just pointing out what the plan was :)
Feel free to use my suggested patch, test it and send a V2
Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
2025-07-24 8:03 ` Eric Dumazet
@ 2025-07-29 11:45 ` Sharath Chandra Vurukala
0 siblings, 0 replies; 6+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-29 11:45 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, dsahern, kuba, pabeni, netdev, quic_kapandey,
quic_subashab
On 7/24/2025 1:33 PM, Eric Dumazet wrote:
> On Wed, Jul 23, 2025 at 11:16 PM Sharath Chandra Vurukala
> <quic_sharathv@quicinc.com> wrote:
>>
>>
>> Thanks Eric for the review, as this work is already underway on your end, I’ll pause and wait for your changes to become available.
>
> Hi Sharath
>
> I think you definitely can send a patch, I was not trying to say you
> could not do it.
>
> Just pointing out what the plan was :)
>
> Feel free to use my suggested patch, test it and send a V2
>
> Thanks.
Thanks Eric, have sent a v2 patch with your suggested changes.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-07-29 11:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23 8:22 [PATCH] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
2025-07-23 15:08 ` Eric Dumazet
2025-07-24 6:15 ` Sharath Chandra Vurukala
2025-07-24 8:03 ` Eric Dumazet
2025-07-29 11:45 ` Sharath Chandra Vurukala
2025-07-24 0:46 ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).