netdev.vger.kernel.org archive mirror
From: Eric Dumazet <edumazet@google.com>
To: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
Cc: davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
	pabeni@redhat.com, netdev@vger.kernel.org,
	quic_kapandey@quicinc.com, quic_subashab@quicinc.com
Subject: Re: [PATCH] net: Add locking to protect skb->dev access in ip_output
Date: Wed, 23 Jul 2025 08:08:58 -0700
Message-ID: <CANn89iLx29ovUNTp9DjzzeeAOZfKvsokztp_rj6qo1+aSjvrgw@mail.gmail.com>
In-Reply-To: <20250723082201.GA14090@hu-sharathv-hyd.qualcomm.com>

On Wed, Jul 23, 2025 at 1:22 AM Sharath Chandra Vurukala
<quic_sharathv@quicinc.com> wrote:
>
> In ip_output(), skb->dev is updated from skb_dst(skb)->dev, which
> can become invalid when the interface is unregistered and freed.
>
> Add RCU locking to ip_output(). This ensures that all the skbs
> associated with the dev being deregistered are transmitted
> out first, before the dev is freed.
>
> Multiple panic call stacks were observed in different functions when
> UL traffic was run concurrently with device deregistration; one
> sample is pasted for reference.
>
> [496733.627565][T13385] Call trace:
> [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
> [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
> [496733.627595][T13385] ip_finish_output+0xa4/0xf4
> [496733.627605][T13385] ip_output+0x100/0x1a0
> [496733.627613][T13385] ip_send_skb+0x68/0x100
> [496733.627618][T13385] udp_send_skb+0x1c4/0x384
> [496733.627625][T13385] udp_sendmsg+0x7b0/0x898
> [496733.627631][T13385] inet_sendmsg+0x5c/0x7c
> [496733.627639][T13385] __sys_sendto+0x174/0x1e4
> [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
> [496733.627653][T13385] invoke_syscall+0x58/0x11c
> [496733.627662][T13385] el0_svc_common+0x88/0xf4
> [496733.627669][T13385] do_el0_svc+0x2c/0xb0
> [496733.627676][T13385] el0_svc+0x2c/0xa4
> [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
> [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
>
> Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
> ---
>  net/ipv4/ip_output.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 10a1d182fd84..95c5e9cfc971 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -425,15 +425,22 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>
>  int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
> -       struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
> +       struct net_device *dev, *indev = skb->dev;
>
> +       IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
> +
> +       rcu_read_lock();
> +
> +       dev = skb_dst(skb)->dev;

Arg... Please do not remove skb_dst_dev(skb), and instead expand it.

I recently started to work on this class of problems.

commit a74fc62eec155ca5a6da8ff3856f3dc87fe24558
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jun 30 12:19:31 2025 +0000

    ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]

    Use the new helpers as a first step to deal with
    potential dst->dev races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
    Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Adding RCU is not good enough on its own: we still need READ_ONCE() to
prevent potential load/store tearing.
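
To make the READ_ONCE() point concrete, here is a minimal userspace
sketch of the idea. It is not kernel code: TOY_READ_ONCE, struct toy_dev,
struct toy_dst and toy_dst_ifindex() are hypothetical names invented for
this illustration, and the macro is a simplified stand-in that relies on
the GNU C __typeof__ extension. The point is only that the pointer is
loaded once through a volatile access and every later use goes through
that single snapshot.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for READ_ONCE() (assumption: GNU C __typeof__).
 * The volatile-qualified access asks the compiler for exactly one load,
 * so it cannot split the read or silently re-read the field later.
 */
#define TOY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature versions of net_device and dst_entry. */
struct toy_dev { int ifindex; };
struct toy_dst { struct toy_dev *dev; };

/* Snapshot dst->dev once, then test and dereference the same value. */
static int toy_dst_ifindex(const struct toy_dst *dst)
{
	struct toy_dev *dev;

	assert(dst != NULL);
	dev = TOY_READ_ONCE(dst->dev);
	return dev ? dev->ifindex : -1;
}
```

Without the single snapshot, a compiler would be free to load dst->dev
once for the NULL test and again for the dereference, and the two loads
could observe different values if a writer changes the field in between.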

I was planning to add a skb_dst_dev_rcu() helper and start replacing
skb_dst_dev() with it where needed.

diff --git a/include/net/dst.h b/include/net/dst.h
index 00467c1b509389a8e37d6e3d0912374a0ff12c4a..692ebb1b3f421210dbb58990b77a200b9189d0f7 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -568,11 +568,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst)
        return READ_ONCE(dst->dev);
 }

+static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst)
+{
+       /* In the future, use rcu_dereference(dst->dev) */
+       WARN_ON(!rcu_read_lock_held());
+       return READ_ONCE(dst->dev);
+}
+
 static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
 {
        return dst_dev(skb_dst(skb));
 }

+static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb)
+{
+       return dst_dev_rcu(skb_dst(skb));
+}
+
 static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
 {
        return dev_net(skb_dst_dev(skb));
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10a1d182fd848f0d2348f65fde269383f9c07baa..37b982dd53f69247634c67c493c44fa482100dee 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -425,15 +425,20 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)

 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-       struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
+       struct net_device *dev, *indev = skb->dev;
+       int res;

+       rcu_read_lock();
+       dev = skb_dst_dev_rcu(skb);
        skb->dev = dev;
        skb->protocol = htons(ETH_P_IP);

-       return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
+       res = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
                            net, sk, skb, indev, dev,
                            ip_finish_output,
                            !(IPCB(skb)->flags & IPSKB_REROUTED));
+       rcu_read_unlock();
+       return res;
 }
 EXPORT_SYMBOL(ip_output);
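
The control flow of the ip_output() change can be modelled in plain
userspace C, as a rough sketch only. Every model_* name below is
hypothetical, the "lock" is a single-threaded nesting counter that gives
no real RCU protection, and model_hook_fn merely stands in for
NF_HOOK_COND() plus ip_finish_output(). What it shows is the shape of
the patch: the device pointer is fetched inside the read-side section,
the hook runs inside it, and the result is saved in res so the section
can be closed before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature versions of the kernel structures. */
struct model_dev { int ifindex; };
struct model_dst { struct model_dev *dev; };
struct model_skb { struct model_dst *dst; struct model_dev *dev; };

/* Nesting depth of the modelled read-side section (single-threaded). */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Modelled skb_dst_dev_rcu(): callers must be inside the read section. */
static struct model_dev *model_skb_dst_dev_rcu(const struct model_skb *skb)
{
	assert(model_rcu_depth > 0);
	/* Single volatile load, in the spirit of READ_ONCE(dst->dev). */
	return *(struct model_dev *const volatile *)&skb->dst->dev;
}

/* Stand-in for the netfilter hook and ip_finish_output(). */
typedef int (*model_hook_fn)(struct model_skb *skb, struct model_dev *out);

static int model_ip_output(struct model_skb *skb, model_hook_fn hook)
{
	struct model_dev *dev;
	int res;

	model_rcu_read_lock();
	dev = model_skb_dst_dev_rcu(skb);
	skb->dev = dev;
	res = hook(skb, dev);	/* dev is only used inside the section */
	model_rcu_read_unlock();
	return res;
}

/* Example hook: report the egress interface index. */
static int model_hook_ifindex(struct model_skb *skb, struct model_dev *out)
{
	(void)skb;
	return out->ifindex;
}
```

Returning the hook's value directly would leave the function with the
read-side section still open, which is why the patch introduces the
local res variable.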

Thread overview: 6+ messages
2025-07-23  8:22 [PATCH] net: Add locking to protect skb->dev access in ip_output Sharath Chandra Vurukala
2025-07-23 15:08 ` Eric Dumazet [this message]
2025-07-24  6:15   ` Sharath Chandra Vurukala
2025-07-24  8:03     ` Eric Dumazet
2025-07-29 11:45       ` Sharath Chandra Vurukala
2025-07-24  0:46 ` kernel test robot
