From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Daniel Golle <daniel@makrotopia.org>
Cc: Andrew Lunn <andrew@lunn.ch>,
Heiner Kallweit <hkallweit1@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] net: phy: skip LED triggers on PHYs on SFP modules
Date: Fri, 15 Dec 2023 16:45:01 +0000 [thread overview]
Message-ID: <ZXyCjfW79jPqd1G6@shell.armlinux.org.uk> (raw)
In-Reply-To: <102a9dce38bdf00215735d04cd4704458273ad9c.1702339354.git.daniel@makrotopia.org>
On Tue, Dec 12, 2023 at 12:05:35AM +0000, Daniel Golle wrote:
> -> #3 (&sfp->sm_mutex){+.+.}-{3:3}:
> __mutex_lock+0x88/0x7a0
> mutex_lock_nested+0x20/0x28
> cleanup_module+0x2ae0/0x3120 [sfp]
> sfp_register_bus+0x5c/0x9c
> sfp_register_socket+0x48/0xd4
> cleanup_module+0x271c/0x3120 [sfp]
> platform_probe+0x64/0xb8
> really_probe+0x17c/0x3c0
> __driver_probe_device+0x78/0x164
> driver_probe_device+0x3c/0xd4
> __driver_attach+0xec/0x1f0
> bus_for_each_dev+0x60/0xa0
> driver_attach+0x20/0x28
> bus_add_driver+0x108/0x208
> driver_register+0x5c/0x118
> __platform_driver_register+0x24/0x2c
> init_module+0x28/0xa7c [sfp]
> do_one_initcall+0x70/0x2ec
> do_init_module+0x54/0x1e4
> load_module+0x1b78/0x1c8c
> __do_sys_init_module+0x1bc/0x2cc
> __arm64_sys_init_module+0x18/0x20
> invoke_syscall.constprop.0+0x4c/0xdc
> do_el0_svc+0x3c/0xbc
> el0_svc+0x34/0x80
> el0t_64_sync_handler+0xf8/0x124
> el0t_64_sync+0x150/0x154
I suspect these backtraces aren't all that reliable, and look like they
are generated by walking through the stack and logging anything that
seems to be pointing into the text segment, which is rubbish for ARM32
and probably ARM64 as well.
We can see that this backtrace is a pile of lies because there is _no_
_way_ that sfp's cleanup_module() could ever be called while its
init_module() function is running.
In any case, I think this path is irrelevant.
> -> #2 (rtnl_mutex){+.+.}-{3:3}:
> __mutex_lock+0x88/0x7a0
> mutex_lock_nested+0x20/0x28
> rtnl_lock+0x18/0x20
> set_device_name+0x30/0x130
> netdev_trig_activate+0x13c/0x1ac
> led_trigger_set+0x118/0x234
> led_trigger_write+0x104/0x17c
> sysfs_kf_bin_write+0x64/0x80
> kernfs_fop_write_iter+0x128/0x1b4
> vfs_write+0x178/0x2a4
> ksys_write+0x58/0xd4
> __arm64_sys_write+0x18/0x20
> invoke_syscall.constprop.0+0x4c/0xdc
> do_el0_svc+0x3c/0xbc
> el0_svc+0x34/0x80
> el0t_64_sync_handler+0xf8/0x124
> el0t_64_sync+0x150/0x154
This is one of the paths that matters. A userspace write is occuring
to the netdev trigger module. This path takes the following locks
(most recent first):
rtnl_lock()
trigger_lock (write)
triggers_list_lock (read)
> -> #0 (triggers_list_lock){++++}-{3:3}:
> __lock_acquire+0x12a0/0x2014
> lock_acquire+0x100/0x2ac
> down_write+0x4c/0x13c
> led_trigger_register+0x4c/0x1a8
> phy_led_triggers_register+0x9c/0x214
> phy_attach_direct+0x154/0x36c
> phylink_attach_phy+0x30/0x60
> phylink_sfp_connect_phy+0x140/0x510
> sfp_add_phy+0x34/0x50
> init_module+0x15c/0xa7c [sfp]
> cleanup_module+0x1d94/0x3120 [sfp]
> cleanup_module+0x2bb4/0x3120 [sfp]
> process_one_work+0x1f8/0x4ec
> worker_thread+0x1e8/0x3d8
> kthread+0x104/0x110
> ret_from_fork+0x10/0x20
This path I suspect (but hard to see, we've got that cleanup_module
and init_module crud there again which is utter trash, sfp_add_phy
is not called from any of the functions previously listed)...
Manually going through the code instead, the locking order will be:
triggers_list_lock (write)
sm_mutex
rtnl
I'm not sure that the lockdep report is accurate, as it seems to be
blaming the deadlock via three locks (triggers_list_lock --> rtnl_mutex
--> &sfp->sm_mutex) but I can't find a path where sm_mutex would be
involved (except the immediate above.)
It looks to me like the problem is in part caused by calling
phy_led_triggers_register() while holding the rtnl lock. Holding the
rtnl lock is fundamental to being able to safely remove and add PHY
devices to netdevs while they are up and running.
The other part that causes the problem is a write to a netdev trigger
that causes it to activate takes the rtnl and triggers_list_lock in
the opposite order.
I don't currently see a solution to this.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
next prev parent reply other threads:[~2023-12-15 16:45 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-12 0:05 [PATCH net] net: phy: skip LED triggers on PHYs on SFP modules Daniel Golle
2023-12-12 14:35 ` Maxime Chevallier
2023-12-13 9:08 ` Andrew Lunn
2023-12-13 10:06 ` Maxime Chevallier
2023-12-13 10:47 ` Andrew Lunn
2023-12-13 15:35 ` Russell King (Oracle)
2023-12-13 15:27 ` Russell King (Oracle)
2023-12-13 17:12 ` Maxime Chevallier
2023-12-13 19:01 ` Daniel Golle
2023-12-13 20:23 ` Russell King (Oracle)
2023-12-14 9:48 ` Paolo Abeni
2023-12-14 16:52 ` Russell King (Oracle)
2023-12-15 9:46 ` Andrew Lunn
2023-12-15 9:59 ` Maxime Chevallier
2023-12-15 15:39 ` Maxime Chevallier
2023-12-15 2:31 ` Jakub Kicinski
2023-12-15 2:54 ` Daniel Golle
2023-12-15 9:53 ` Andrew Lunn
2023-12-15 16:45 ` Russell King (Oracle) [this message]
2023-12-16 2:00 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXyCjfW79jPqd1G6@shell.armlinux.org.uk \
--to=linux@armlinux.org.uk \
--cc=andrew@lunn.ch \
--cc=daniel@makrotopia.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).