From: Leon Romanovsky <leon@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Steffen Klassert <steffen.klassert@secunet.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Ilan Tayari <ilant@mellanox.com>,
Guy Shapiro <guysh@mellanox.com>,
Yossi Kuperman <yossiku@mellanox.com>,
Network Development <netdev@vger.kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>
Subject: Re: [PATCH net] xfrm: always flush state and policy upon NETDEV_DOWN/NETDEV_UNREGISTER events
Date: Wed, 28 Jan 2026 12:24:04 +0200 [thread overview]
Message-ID: <20260128102404.GC12149@unreal> (raw)
In-Reply-To: <1de734e2-1da6-4b5c-8e03-66af7f88d074@I-love.SAKURA.ne.jp>
On Wed, Jan 28, 2026 at 12:32:08AM +0900, Tetsuo Handa wrote:
> syzbot is reporting that "struct xfrm_state" refcount is leaking.
>
> unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2
> ref_tracker: netdev@ffff888052f24618 has 1/1 users at
> __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline]
> netdev_tracker_alloc include/linux/netdevice.h:4412 [inline]
> xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316
> xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline]
> xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022
> xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507
> netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550
> xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529
> netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
> netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
> netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894
> sock_sendmsg_nosec net/socket.c:727 [inline]
> __sock_sendmsg net/socket.c:742 [inline]
> ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592
> ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646
> __sys_sendmsg+0x16d/0x220 net/socket.c:2678
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> This is because currently xfrm_dev_down() (which is called upon NETDEV_DOWN
> event and NETDEV_UNREGISTER event) can release the reference to
> "struct net_device" taken by xfrm_dev_state_add() only if
> the NETIF_F_HW_ESP bit is set, but the NETIF_F_HW_ESP bit can be cleared
> by "ethtool -K $dev esp-hw-offload off" command.
> In other words, we cannot guess whether xfrm_dev_state_add() has taken a
> reference to "struct net_device" based on whether the NETIF_F_HW_ESP bit
> is currently set.
>
> For recording why this patch does not re-introduce xfrm_dev_unregister(),
> my guessed history about this module's NETDEV_UNREGISTER handler is shown
> below.
>
> Commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
> introduced xfrm_dev_state_add(). That commit called xfrm_dev_state_add()
> from xfrm_state_construct(), and introduced the NETDEV_UNREGISTER case
> to xfrm_dev_event(). But that commit implemented xfrm_dev_unregister() as
> a no-op, and implemented xfrm_dev_down() to call xfrm_dev_state_flush()
> only if (dev->features & NETIF_F_HW_ESP) != 0. I guess that that commit
> expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event
> fires, and also assumed that xfrm_dev_state_add() is called only if
> (dev->features & NETIF_F_HW_ESP) != 0.
>
> Commit ec30d78c14a8 ("xfrm: add xdst pcpu cache") added
> xfrm_policy_cache_flush() call to xfrm_dev_unregister(), but
> commit e4db5b61c572 ("xfrm: policy: remove pcpu policy cache") removed
> xfrm_policy_cache_flush() call from xfrm_dev_unregister() and also
> removed the NETDEV_UNREGISTER case from xfrm_dev_event() because
> xfrm_dev_unregister() again became no-op.
>
> Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device")
> re-introduced the NETDEV_UNREGISTER case to xfrm_dev_event(), but that
> commit for unknown reason chose to share xfrm_dev_down() between the
> NETDEV_DOWN case and the NETDEV_UNREGISTER case. Therefore, I assumed
> that doing the same behavior for both cases is desirable. If something
> is wrong with this choice, please re-introduce xfrm_dev_unregister().
>
> Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
> Previous discussion is https://lkml.kernel.org/r/924f9cf5-599a-48f0-b1e3-94cd971965b0@I-love.SAKURA.ne.jp
>
> net/xfrm/xfrm_device.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
> index 52ae0e034d29..26e62b6a9db5 100644
> --- a/net/xfrm/xfrm_device.c
> +++ b/net/xfrm/xfrm_device.c
> @@ -536,10 +536,8 @@ static int xfrm_api_check(struct net_device *dev)
>
> static int xfrm_dev_down(struct net_device *dev)
> {
> - if (dev->features & NETIF_F_HW_ESP) {
> - xfrm_dev_state_flush(dev_net(dev), dev, true);
> - xfrm_dev_policy_flush(dev_net(dev), dev, true);
> - }
> + xfrm_dev_state_flush(dev_net(dev), dev, true);
> + xfrm_dev_policy_flush(dev_net(dev), dev, true);
I think this can work, but IMHO the more robust approach is to ensure that all
states and policies are removed when the NETIF_F_HW_ESP feature bit is cleared.
Would it be possible to handle this in NETDEV_FEAT_CHANGE?
Thanks.
>
> return NOTIFY_DONE;
> }
> --
> 2.47.3
>
next prev parent reply other threads:[~2026-01-28 10:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-27 15:32 [PATCH net] xfrm: always flush state and policy upon NETDEV_DOWN/NETDEV_UNREGISTER events Tetsuo Handa
2026-01-28 10:24 ` Leon Romanovsky [this message]
2026-01-28 10:44 ` Tetsuo Handa
2026-01-28 12:35 ` Leon Romanovsky
2026-01-29 8:06 ` Tetsuo Handa
2026-01-29 9:09 ` Leon Romanovsky
2026-01-29 10:16 ` Tetsuo Handa
2026-01-29 10:32 ` Tetsuo Handa
2026-01-29 16:05 ` Sabrina Dubroca
2026-02-01 13:12 ` Leon Romanovsky
2026-02-01 14:17 ` Tetsuo Handa
2026-01-29 15:59 ` Sabrina Dubroca
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260128102404.GC12149@unreal \
--to=leon@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=guysh@mellanox.com \
--cc=herbert@gondor.apana.org.au \
--cc=horms@kernel.org \
--cc=ilant@mellanox.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=sd@queasysnail.net \
--cc=steffen.klassert@secunet.com \
--cc=yossiku@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.