netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 davem@davemloft.net,  dsahern@kernel.org,  edumazet@google.com,
	 kuba@kernel.org,  pabeni@redhat.com,  netdev@vger.kernel.org
Cc: quic_kapandey@quicinc.com,  quic_subashab@quicinc.com
Subject: Re: [PATCH v2] net: Add locking to protect skb->dev access in ip_output
Date: Tue, 29 Jul 2025 12:11:45 -0400	[thread overview]
Message-ID: <6888f2c11bd24_16648b29465@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <b6beefcf-7525-4c70-9883-4ab8c8ba38ed@quicinc.com>

Sharath Chandra Vurukala wrote:
> 
> 
> On 7/29/2025 7:34 PM, Willem de Bruijn wrote:
> > Sharath Chandra Vurukala wrote:
> >> In ip_output() skb->dev is updated from the skb_dst(skb)->dev
> >> this can become invalid when the interface is unregistered and freed,
> >>
> >> Introduced new skb_dst_dev_rcu() function to be used instead of
> >> skb_dst_dev() within rcu_locks in outout.This will ensure that
> >> all the skb's associated with the dev being deregistered will
> >> be transnmitted out first, before freeing the dev.
> >>
> >> Multiple panic call stacks were observed when UL traffic was run
> >> in concurrency with device deregistration from different functions,
> >> pasting one sample for reference.
> >>
> >> [496733.627565][T13385] Call trace:
> >> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
> >> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
> >> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
> >> [496733.627605][T13385] ip_output+0x100/0x1a0
> >> [496733.627613][T13385] ip_send_skb+0x68/0x100
> >> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
> >> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
> >> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
> >> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
> >> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
> >> [496733.627653][T13385] invoke_syscall+0x58/0x11c
> >> [496733.627662][T13385] el0_svc_common+0x88/0xf4
> >> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
> >> [496733.627676][T13385] el0_svc+0x2c/0xa4
> >> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
> >> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
> >>
> >> Changes in v2:
> >> - Addressed review comments from Eric Dumazet
> >> - Used READ_ONCE() to prevent potential load/store tearing
> >> - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
> >>
> >> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
> >> ---
> >>  include/net/dst.h    | 12 ++++++++++++
> >>  net/ipv4/ip_output.c | 17 ++++++++++++-----
> >>  2 files changed, 24 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/net/dst.h b/include/net/dst.h
> >> index 00467c1b5093..692ebb1b3f42 100644
> >> --- a/include/net/dst.h
> >> +++ b/include/net/dst.h
> >> @@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
> >>  	return READ_ONCE(dst->dev);
> >>  }
> >>  
> >> +static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
> >> +{
> >> +	/* In the future, use rcu_dereference(dst->dev) */
> >> +	WARN_ON(!rcu_read_lock_held());
> > 
> > WARN_ON_ONCE or even DEBUG_NET_WARN_ON_ONCE
> > 
> That makes sense — I will revise the code to use WARN_ON_ONCE() accordingly.>> +	return READ_ONCE(dst->dev);
> >> +}
> >> +
> >>  static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
> >>  {
> >>  	return dst_dev(skb_dst(skb));
> >>  }
> >>  
> >> +static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
> >> +{
> >> +	return dst_dev_rcu(skb_dst(skb));
> >> +}
> >> +
> >>  static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
> >>  {
> >>  	return dev_net(skb_dst_dev(skb));
> >> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> >> index 10a1d182fd84..d70d79b71897 100644
> >> --- a/net/ipv4/ip_output.c
> >> +++ b/net/ipv4/ip_output.c
> >> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  
> >>  int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  {
> >> -	struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> >> +	struct net_device *dev, *indev = skb->dev;
> >> +	int ret_val;
> >>  
> >> +	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
> > 
> > Why introduce this?
> > 
> Apologies for the oversight. The branch I am currently working on is quite outdated, and this line originates from that earlier version.
> This line appears to have been unintentionally included during the preparation of the patch for submission to net-next. Will correct this.>> +

Ack thanks.

> >> +	rcu_read_lock();

How do we know that all paths taken from here are safe to be run
inside an rcu readside critical section btw?

> >> +	dev = skb_dst_dev_rcu(skb);
> >>  	skb->dev = dev;
> >>  	skb->protocol = htons(ETH_P_IP);
> >>  
> >> -	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> >> -			    net, sk, skb, indev, dev,
> >> -			    ip_finish_output,
> >> -			    !(IPCB(skb)->flags & IPSKB_REROUTED));
> >> +	ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> >> +			       net, sk, skb, indev, dev,
> >> +				ip_finish_output,
> >> +				!(IPCB(skb)->flags & IPSKB_REROUTED));
> >> +	rcu_read_unlock();
> >> +	return ret_val;
> >>  }
> >>  EXPORT_SYMBOL(ip_output);
> >>  
> > 
> > 
> 



  reply	other threads:[~2025-07-29 16:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29 11:42 [PATCH v2] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
2025-07-29 14:04 ` Willem de Bruijn
2025-07-29 15:44   ` Sharath Chandra Vurukala
2025-07-29 16:11     ` Willem de Bruijn [this message]
2025-07-29 16:14       ` Eric Dumazet
2025-07-29 16:25         ` Willem de Bruijn
2025-07-29 16:43           ` Eric Dumazet
2025-07-29 17:12             ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6888f2c11bd24_16648b29465@willemb.c.googlers.com.notmuch \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=quic_kapandey@quicinc.com \
    --cc=quic_sharathv@quicinc.com \
    --cc=quic_subashab@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).