* [PATCH v2] net: Add locking to protect skb->dev access in ip_output
@ 2025-07-29 11:42 Sharath Chandra Vurukala
2025-07-29 14:04 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-29 11:42 UTC (permalink / raw)
To: davem, dsahern, edumazet, kuba, pabeni, netdev
Cc: quic_kapandey, quic_subashab
In ip_output() skb->dev is updated from the skb_dst(skb)->dev
this can become invalid when the interface is unregistered and freed,
Introduced new skb_dst_dev_rcu() function to be used instead of
skb_dst_dev() within rcu_locks in outout.This will ensure that
all the skb's associated with the dev being deregistered will
be transnmitted out first, before freeing the dev.
Multiple panic call stacks were observed when UL traffic was run
in concurrency with device deregistration from different functions,
pasting one sample for reference.
[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
Changes in v2:
- Addressed review comments from Eric Dumazet
- Used READ_ONCE() to prevent potential load/store tearing
- Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
---
include/net/dst.h | 12 ++++++++++++
net/ipv4/ip_output.c | 17 ++++++++++++-----
2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index 00467c1b5093..692ebb1b3f42 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
return READ_ONCE(dst->dev);
}
+static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
+{
+ /* In the future, use rcu_dereference(dst->dev) */
+ WARN_ON(!rcu_read_lock_held());
+ return READ_ONCE(dst->dev);
+}
+
static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
{
return dst_dev(skb_dst(skb));
}
+static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
+{
+ return dst_dev_rcu(skb_dst(skb));
+}
+
static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
{
return dev_net(skb_dst_dev(skb));
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10a1d182fd84..d70d79b71897 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
+ struct net_device *dev, *indev = skb->dev;
+ int ret_val;
+ IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
+
+ rcu_read_lock();
+ dev = skb_dst_dev_rcu(skb);
skb->dev = dev;
skb->protocol = htons(ETH_P_IP);
- return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
- net, sk, skb, indev, dev,
- ip_finish_output,
- !(IPCB(skb)->flags & IPSKB_REROUTED));
+ ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
+ net, sk, skb, indev, dev,
+ ip_finish_output,
+ !(IPCB(skb)->flags & IPSKB_REROUTED));
+ rcu_read_unlock();
+ return ret_val;
}
EXPORT_SYMBOL(ip_output);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 11:42 [PATCH v2] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
@ 2025-07-29 14:04 ` Willem de Bruijn
2025-07-29 15:44 ` Sharath Chandra Vurukala
0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2025-07-29 14:04 UTC (permalink / raw)
To: Sharath Chandra Vurukala, davem, dsahern, edumazet, kuba, pabeni,
netdev
Cc: quic_kapandey, quic_subashab
Sharath Chandra Vurukala wrote:
> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
> this can become invalid when the interface is unregistered and freed,
>
> Introduced new skb_dst_dev_rcu() function to be used instead of
> skb_dst_dev() within rcu_locks in outout.This will ensure that
> all the skb's associated with the dev being deregistered will
> be transnmitted out first, before freeing the dev.
>
> Multiple panic call stacks were observed when UL traffic was run
> in concurrency with device deregistration from different functions,
> pasting one sample for reference.
>
> [496733.627565][T13385] Call trace:
> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
> [496733.627605][T13385] ip_output+0x100/0x1a0
> [496733.627613][T13385] ip_send_skb+0x68/0x100
> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
> [496733.627653][T13385] invoke_syscall+0x58/0x11c
> [496733.627662][T13385] el0_svc_common+0x88/0xf4
> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
> [496733.627676][T13385] el0_svc+0x2c/0xa4
> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
>
> Changes in v2:
> - Addressed review comments from Eric Dumazet
> - Used READ_ONCE() to prevent potential load/store tearing
> - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
>
> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
> ---
> include/net/dst.h | 12 ++++++++++++
> net/ipv4/ip_output.c | 17 ++++++++++++-----
> 2 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 00467c1b5093..692ebb1b3f42 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
> return READ_ONCE(dst->dev);
> }
>
> +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
> +{
> + /* In the future, use rcu_dereference(dst->dev) */
> + WARN_ON(!rcu_read_lock_held());
WARN_ON_ONCE or even DEBUG_NET_WARN_ON_ONCE
> + return READ_ONCE(dst->dev);
> +}
> +
> static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
> {
> return dst_dev(skb_dst(skb));
> }
>
> +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
> +{
> + return dst_dev_rcu(skb_dst(skb));
> +}
> +
> static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
> {
> return dev_net(skb_dst_dev(skb));
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 10a1d182fd84..d70d79b71897 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>
> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> {
> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> + struct net_device *dev, *indev = skb->dev;
> + int ret_val;
>
> + IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
Why introduce this?
> +
> + rcu_read_lock();
> + dev = skb_dst_dev_rcu(skb);
> skb->dev = dev;
> skb->protocol = htons(ETH_P_IP);
>
> - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> - net, sk, skb, indev, dev,
> - ip_finish_output,
> - !(IPCB(skb)->flags & IPSKB_REROUTED));
> + ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> + net, sk, skb, indev, dev,
> + ip_finish_output,
> + !(IPCB(skb)->flags & IPSKB_REROUTED));
> + rcu_read_unlock();
> + return ret_val;
> }
> EXPORT_SYMBOL(ip_output);
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 14:04 ` Willem de Bruijn
@ 2025-07-29 15:44 ` Sharath Chandra Vurukala
2025-07-29 16:11 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Sharath Chandra Vurukala @ 2025-07-29 15:44 UTC (permalink / raw)
To: Willem de Bruijn, davem, dsahern, edumazet, kuba, pabeni, netdev
Cc: quic_kapandey, quic_subashab
On 7/29/2025 7:34 PM, Willem de Bruijn wrote:
> Sharath Chandra Vurukala wrote:
>> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
>> this can become invalid when the interface is unregistered and freed,
>>
>> Introduced new skb_dst_dev_rcu() function to be used instead of
>> skb_dst_dev() within rcu_locks in outout.This will ensure that
>> all the skb's associated with the dev being deregistered will
>> be transnmitted out first, before freeing the dev.
>>
>> Multiple panic call stacks were observed when UL traffic was run
>> in concurrency with device deregistration from different functions,
>> pasting one sample for reference.
>>
>> [496733.627565][T13385] Call trace:
>> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
>> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
>> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
>> [496733.627605][T13385] ip_output+0x100/0x1a0
>> [496733.627613][T13385] ip_send_skb+0x68/0x100
>> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
>> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
>> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
>> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
>> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
>> [496733.627653][T13385] invoke_syscall+0x58/0x11c
>> [496733.627662][T13385] el0_svc_common+0x88/0xf4
>> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
>> [496733.627676][T13385] el0_svc+0x2c/0xa4
>> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
>> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
>>
>> Changes in v2:
>> - Addressed review comments from Eric Dumazet
>> - Used READ_ONCE() to prevent potential load/store tearing
>> - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
>>
>> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
>> ---
>> include/net/dst.h | 12 ++++++++++++
>> net/ipv4/ip_output.c | 17 ++++++++++++-----
>> 2 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 00467c1b5093..692ebb1b3f42 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
>> return READ_ONCE(dst->dev);
>> }
>>
>> +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
>> +{
>> + /* In the future, use rcu_dereference(dst->dev) */
>> + WARN_ON(!rcu_read_lock_held());
>
> WARN_ON_ONCE or even DEBUG_NET_WARN_ON_ONCE
>
That makes sense — I will revise the code to use WARN_ON_ONCE() accordingly.>> + return READ_ONCE(dst->dev);
>> +}
>> +
>> static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
>> {
>> return dst_dev(skb_dst(skb));
>> }
>>
>> +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
>> +{
>> + return dst_dev_rcu(skb_dst(skb));
>> +}
>> +
>> static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
>> {
>> return dev_net(skb_dst_dev(skb));
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 10a1d182fd84..d70d79b71897 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>
>> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>> {
>> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
>> + struct net_device *dev, *indev = skb->dev;
>> + int ret_val;
>>
>> + IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
>
> Why introduce this?
>
Apologies for the oversight. The branch I am currently working on is quite outdated, and this line originates from that earlier version.
This line appears to have been unintentionally included during the preparation of the patch for submission to net-next. Will correct this.>> +
>> + rcu_read_lock();
>> + dev = skb_dst_dev_rcu(skb);
>> skb->dev = dev;
>> skb->protocol = htons(ETH_P_IP);
>>
>> - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
>> - net, sk, skb, indev, dev,
>> - ip_finish_output,
>> - !(IPCB(skb)->flags & IPSKB_REROUTED));
>> + ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
>> + net, sk, skb, indev, dev,
>> + ip_finish_output,
>> + !(IPCB(skb)->flags & IPSKB_REROUTED));
>> + rcu_read_unlock();
>> + return ret_val;
>> }
>> EXPORT_SYMBOL(ip_output);
>>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 15:44 ` Sharath Chandra Vurukala
@ 2025-07-29 16:11 ` Willem de Bruijn
2025-07-29 16:14 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2025-07-29 16:11 UTC (permalink / raw)
To: Sharath Chandra Vurukala, Willem de Bruijn, davem, dsahern,
edumazet, kuba, pabeni, netdev
Cc: quic_kapandey, quic_subashab
Sharath Chandra Vurukala wrote:
>
>
> On 7/29/2025 7:34 PM, Willem de Bruijn wrote:
> > Sharath Chandra Vurukala wrote:
> >> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
> >> this can become invalid when the interface is unregistered and freed,
> >>
> >> Introduced new skb_dst_dev_rcu() function to be used instead of
> >> skb_dst_dev() within rcu_locks in outout.This will ensure that
> >> all the skb's associated with the dev being deregistered will
> >> be transnmitted out first, before freeing the dev.
> >>
> >> Multiple panic call stacks were observed when UL traffic was run
> >> in concurrency with device deregistration from different functions,
> >> pasting one sample for reference.
> >>
> >> [496733.627565][T13385] Call trace:
> >> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
> >> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
> >> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
> >> [496733.627605][T13385] ip_output+0x100/0x1a0
> >> [496733.627613][T13385] ip_send_skb+0x68/0x100
> >> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
> >> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
> >> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
> >> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
> >> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
> >> [496733.627653][T13385] invoke_syscall+0x58/0x11c
> >> [496733.627662][T13385] el0_svc_common+0x88/0xf4
> >> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
> >> [496733.627676][T13385] el0_svc+0x2c/0xa4
> >> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
> >> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
> >>
> >> Changes in v2:
> >> - Addressed review comments from Eric Dumazet
> >> - Used READ_ONCE() to prevent potential load/store tearing
> >> - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
> >>
> >> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
> >> ---
> >> include/net/dst.h | 12 ++++++++++++
> >> net/ipv4/ip_output.c | 17 ++++++++++++-----
> >> 2 files changed, 24 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/net/dst.h b/include/net/dst.h
> >> index 00467c1b5093..692ebb1b3f42 100644
> >> --- a/include/net/dst.h
> >> +++ b/include/net/dst.h
> >> @@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
> >> return READ_ONCE(dst->dev);
> >> }
> >>
> >> +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
> >> +{
> >> + /* In the future, use rcu_dereference(dst->dev) */
> >> + WARN_ON(!rcu_read_lock_held());
> >
> > WARN_ON_ONCE or even DEBUG_NET_WARN_ON_ONCE
> >
> That makes sense — I will revise the code to use WARN_ON_ONCE() accordingly.>> + return READ_ONCE(dst->dev);
> >> +}
> >> +
> >> static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
> >> {
> >> return dst_dev(skb_dst(skb));
> >> }
> >>
> >> +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
> >> +{
> >> + return dst_dev_rcu(skb_dst(skb));
> >> +}
> >> +
> >> static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
> >> {
> >> return dev_net(skb_dst_dev(skb));
> >> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> >> index 10a1d182fd84..d70d79b71897 100644
> >> --- a/net/ipv4/ip_output.c
> >> +++ b/net/ipv4/ip_output.c
> >> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>
> >> int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >> {
> >> - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> >> + struct net_device *dev, *indev = skb->dev;
> >> + int ret_val;
> >>
> >> + IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
> >
> > Why introduce this?
> >
> Apologies for the oversight. The branch I am currently working on is quite outdated, and this line originates from that earlier version.
> This line appears to have been unintentionally included during the preparation of the patch for submission to net-next. Will correct this.>> +
Ack thanks.
> >> + rcu_read_lock();
How do we know that all paths taken from here are safe to be run
inside an rcu readside critical section btw?
> >> + dev = skb_dst_dev_rcu(skb);
> >> skb->dev = dev;
> >> skb->protocol = htons(ETH_P_IP);
> >>
> >> - return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> >> - net, sk, skb, indev, dev,
> >> - ip_finish_output,
> >> - !(IPCB(skb)->flags & IPSKB_REROUTED));
> >> + ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> >> + net, sk, skb, indev, dev,
> >> + ip_finish_output,
> >> + !(IPCB(skb)->flags & IPSKB_REROUTED));
> >> + rcu_read_unlock();
> >> + return ret_val;
> >> }
> >> EXPORT_SYMBOL(ip_output);
> >>
> >
> >
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 16:11 ` Willem de Bruijn
@ 2025-07-29 16:14 ` Eric Dumazet
2025-07-29 16:25 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-07-29 16:14 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Sharath Chandra Vurukala, davem, dsahern, kuba, pabeni, netdev,
quic_kapandey, quic_subashab
On Tue, Jul 29, 2025 at 9:11 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Sharath Chandra Vurukala wrote:
>
> > >> + rcu_read_lock();
>
> How do we know that all paths taken from here are safe to be run
> inside an rcu readside critical section btw?
This is totally safe ;)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 16:14 ` Eric Dumazet
@ 2025-07-29 16:25 ` Willem de Bruijn
2025-07-29 16:43 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2025-07-29 16:25 UTC (permalink / raw)
To: Eric Dumazet, Willem de Bruijn
Cc: Sharath Chandra Vurukala, davem, dsahern, kuba, pabeni, netdev,
quic_kapandey, quic_subashab
Eric Dumazet wrote:
> On Tue, Jul 29, 2025 at 9:11 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Sharath Chandra Vurukala wrote:
> >
> > > >> + rcu_read_lock();
> >
> > How do we know that all paths taken from here are safe to be run
> > inside an rcu readside critical section btw?
>
> This is totally safe ;)
I trust that it is. It's just not immediately obvious to me why.
__dev_queue_xmit_nit calls rcu_read_lock_bh, so the safety of anything
downstream is clear.
But do all protocol stacks do this?
I see that TCP does, through __ip_queue_xmit. So that means all
code downstream of that, including all the modular netfilter code
already has to be safe indeed. That should suffice.
I started by looking at the UDP path and see no equivalent
rcu_read_lock call in that path however.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 16:25 ` Willem de Bruijn
@ 2025-07-29 16:43 ` Eric Dumazet
2025-07-29 17:12 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-07-29 16:43 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Sharath Chandra Vurukala, davem, dsahern, kuba, pabeni, netdev,
quic_kapandey, quic_subashab
On Tue, Jul 29, 2025 at 9:25 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Eric Dumazet wrote:
> > On Tue, Jul 29, 2025 at 9:11 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Sharath Chandra Vurukala wrote:
> > >
> > > > >> + rcu_read_lock();
> > >
> > > How do we know that all paths taken from here are safe to be run
> > > inside an rcu readside critical section btw?
> >
> > This is totally safe ;)
>
> I trust that it is. It's just not immediately obvious to me why.
>
> __dev_queue_xmit_nit calls rcu_read_lock_bh, so the safety of anything
> downstream is clear.
>
> But do all protocol stacks do this?
>
> I see that TCP does, through __ip_queue_xmit. So that means all
> code downstream of that, including all the modular netfilter code
> already has to be safe indeed. That should suffice.
>
> I started by looking at the UDP path and see no equivalent
> rcu_read_lock call in that path however.
ip_output() can already be called from sections rcu_read_lock() protected,
or from BH context.
The caller's context does not matter. I am unsure what you were
looking at in the UDP stack ?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
2025-07-29 16:43 ` Eric Dumazet
@ 2025-07-29 17:12 ` Willem de Bruijn
0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2025-07-29 17:12 UTC (permalink / raw)
To: Eric Dumazet, Willem de Bruijn
Cc: Sharath Chandra Vurukala, davem, dsahern, kuba, pabeni, netdev,
quic_kapandey, quic_subashab
Eric Dumazet wrote:
> On Tue, Jul 29, 2025 at 9:25 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Eric Dumazet wrote:
> > > On Tue, Jul 29, 2025 at 9:11 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Sharath Chandra Vurukala wrote:
> > > >
> > > > > >> + rcu_read_lock();
> > > >
> > > > How do we know that all paths taken from here are safe to be run
> > > > inside an rcu readside critical section btw?
> > >
> > > This is totally safe ;)
> >
> > I trust that it is. It's just not immediately obvious to me why.
> >
> > __dev_queue_xmit_nit calls rcu_read_lock_bh, so the safety of anything
> > downstream is clear.
> >
> > But do all protocol stacks do this?
> >
> > I see that TCP does, through __ip_queue_xmit. So that means all
> > code downstream of that, including all the modular netfilter code
> > already has to be safe indeed. That should suffice.
> >
> > I started by looking at the UDP path and see no equivalent
> > rcu_read_lock call in that path however.
>
> ip_output() can already be called from sections rcu_read_lock() protected,
> or from BH context.
>
> The caller's context does not matter. I am unsure what you were
> looking at in the UDP stack ?
I was just looking indeed for a caller that held rcu_read_lock already.
I saw the __ip_queue_xmit caller only after I asked the question. But
indeed calling from bottom half is another obvious answer.
Might be informative to add a comment to the commit message to that
effect.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-29 17:12 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29 11:42 [PATCH v2] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
2025-07-29 14:04 ` Willem de Bruijn
2025-07-29 15:44 ` Sharath Chandra Vurukala
2025-07-29 16:11 ` Willem de Bruijn
2025-07-29 16:14 ` Eric Dumazet
2025-07-29 16:25 ` Willem de Bruijn
2025-07-29 16:43 ` Eric Dumazet
2025-07-29 17:12 ` Willem de Bruijn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).