* [linus:master] [net]  03abf2a7c6: WARNING:suspicious_RCU_usage
@ 2025-02-05  6:08 kernel test robot
  2025-02-05 17:19 ` Russell King (Oracle)
  0 siblings, 1 reply; 3+ messages in thread
From: kernel test robot @ 2025-02-05  6:08 UTC
  To: Russell King
  Cc: oe-lkp, lkp, linux-kernel, Jakub Kicinski, Jacob Keller, netdev,
	oliver.sang



Hello,

kernel test robot noticed "WARNING:suspicious_RCU_usage" on:

commit: 03abf2a7c65451e663b078b0ed1bfa648cd9380f ("net: phylink: add EEE management")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      5c8c229261f14159b54b9a32f12e5fa89d88b905]
[test failed on linux-next/master 40b8e93e17bff4a4e0cc129e04f9fdf5daa5397e]

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 300s
	group: group-02
	nr_groups: 5



config: i386-randconfig-051-20250203
compiler: gcc-12
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------------------------------------+------------+------------+
|                                                                           | a17ceec62f | 03abf2a7c6 |
+---------------------------------------------------------------------------+------------+------------+
| WARNING:suspicious_RCU_usage                                              | 0          | 36         |
| drivers/net/phy/phy_device.c:#suspicious_rcu_dereference_protected()usage | 0          | 36         |
+---------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502051331.7587ac82-lkp@intel.com


[   19.591040][   T22] WARNING: suspicious RCU usage
[   19.592068][   T22] 6.13.0-rc7-01139-g03abf2a7c654 #1 Tainted: G S              T
[   19.593703][   T22] -----------------------------
[   19.594724][   T22] drivers/net/phy/phy_device.c:2004 suspicious rcu_dereference_protected() usage!
[   19.596546][   T22]
[   19.596546][   T22] other info that might help us debug this:
[   19.596546][   T22]
[   19.598680][   T22]
[   19.598680][   T22] rcu_scheduler_active = 2, debug_locks = 1
[   19.600338][   T22] 4 locks held by kworker/u4:1/22:
[ 19.601463][ T22] #0: c7d1e6b0 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3211) 
[ 19.603512][ T22] #1: c512bf30 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3212) 
[ 19.605344][ T22] #2: c8d1da78 (&dev->mutex){....}-{4:4}, at: device_lock (include/linux/device.h:1015) 
[ 19.610278][ T22] #3: c3fba014 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch (net/dsa/dsa.c:1499 net/dsa/dsa.c:1539) 
[   19.612136][   T22]
[   19.612136][   T22] stack backtrace:
[   19.613434][   T22] CPU: 0 UID: 0 PID: 22 Comm: kworker/u4:1 Tainted: G S              T  6.13.0-rc7-01139-g03abf2a7c654 #1 0503d02651d90c323d4064ac27ee9898a6e76f3e
[   19.616145][   T22] Tainted: [S]=CPU_OUT_OF_SPEC, [T]=RANDSTRUCT
[   19.617319][   T22] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   19.619218][   T22] Workqueue: events_unbound deferred_probe_work_func
[   19.620457][   T22] Call Trace:
[ 19.621167][ T22] dump_stack_lvl (lib/dump_stack.c:123) 
[ 19.622090][ T22] dump_stack (lib/dump_stack.c:130) 
[ 19.622924][ T22] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6846) 
[ 19.623932][ T22] phy_detach (drivers/net/phy/phy_device.c:2004 (discriminator 9)) 
[ 19.624755][ T22] phylink_connect_phy (drivers/net/phy/phylink.c:2327) 
[ 19.625475][ T22] dsa_user_create (net/dsa/user.c:2620 net/dsa/user.c:2655 net/dsa/user.c:2790) 
[ 19.626083][ T22] dsa_port_setup (net/dsa/dsa.c:519) 
[ 19.626631][ T22] dsa_tree_setup (net/dsa/dsa.c:759 net/dsa/dsa.c:888) 
[ 19.627196][ T22] ? dsa_switch_parse_ports (net/dsa/dsa.c:1440) 
[ 19.627844][ T22] dsa_register_switch (net/dsa/dsa.c:1525 net/dsa/dsa.c:1539) 
[ 19.628455][ T22] ? dev_get_by_name (net/core/dev.c:881) 
[ 19.629080][ T22] dsa_loop_drv_probe (drivers/net/dsa/dsa_loop.c:343) 
[ 19.629973][ T22] mdio_probe (drivers/net/phy/mdio_device.c:165) 
[ 19.630762][ T22] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658) 
[ 19.631583][ T22] __driver_probe_device (drivers/base/dd.c:800) 
[ 19.632519][ T22] driver_probe_device (drivers/base/dd.c:830) 
[ 19.633287][ T22] __device_attach_driver (drivers/base/dd.c:958) 
[ 19.633903][ T22] bus_for_each_drv (drivers/base/bus.c:459) 
[ 19.634466][ T22] __device_attach (drivers/base/dd.c:1032) 
[ 19.635018][ T22] ? driver_probe_device (drivers/base/dd.c:922) 
[ 19.635627][ T22] device_initial_probe (drivers/base/dd.c:1080) 
[ 19.636222][ T22] bus_probe_device (drivers/base/bus.c:536) 
[ 19.636783][ T22] deferred_probe_work_func (drivers/base/dd.c:124) 
[ 19.637569][ T22] process_one_work (include/trace/events/workqueue.h:110 include/trace/events/workqueue.h:110 kernel/workqueue.c:3241) 
[ 19.638328][ T22] ? __list_add (include/linux/list.h:150) 
[ 19.639013][ T22] process_scheduled_works (kernel/workqueue.c:3317) 
[ 19.639878][ T22] worker_thread (include/linux/list.h:373 kernel/workqueue.c:946 kernel/workqueue.c:3399) 
[ 19.640635][ T22] kthread (kernel/kthread.c:391) 
[ 19.641303][ T22] ? rescuer_thread (kernel/workqueue.c:3344) 
[ 19.642153][ T22] ? list_del_init (include/linux/posix-timers.h:225) 
[ 19.643006][ T22] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 19.643818][ T22] ? list_del_init (include/linux/posix-timers.h:225) 
[ 19.644555][ T22] ret_from_fork_asm (arch/x86/entry/entry_32.S:737) 
[ 19.645201][ T22] entry_INT80_32 (arch/x86/entry/entry_32.S:945) 
[   19.646149][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): failed to connect to PHY: -EPERM
[   19.647542][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 0
[   19.649283][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
[   19.650853][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): failed to connect to PHY: -EPERM
[   19.652238][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 1
[   19.653856][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
[   19.655392][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): failed to connect to PHY: -EPERM
[   19.656689][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 2
[   19.658308][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
[   19.659841][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): failed to connect to PHY: -EPERM
[   19.661168][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 3
[   19.663018][   T22] DSA: tree 0 setup
[   19.663591][   T22] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250205/202502051331.7587ac82-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [net]  03abf2a7c6: WARNING:suspicious_RCU_usage
  2025-02-05  6:08 [linus:master] [net] 03abf2a7c6: WARNING:suspicious_RCU_usage kernel test robot
@ 2025-02-05 17:19 ` Russell King (Oracle)
  2025-03-27 13:09   ` Kory Maincent
  0 siblings, 1 reply; 3+ messages in thread
From: Russell King (Oracle) @ 2025-02-05 17:19 UTC
  To: kernel test robot
  Cc: Andrew Lunn, oe-lkp, lkp, linux-kernel, Jakub Kicinski,
	Jacob Keller, netdev, Kory Maincent

On Wed, Feb 05, 2025 at 02:08:04PM +0800, kernel test robot wrote:
> kernel test robot noticed "WARNING:suspicious_RCU_usage" on:
> 
> commit: 03abf2a7c65451e663b078b0ed1bfa648cd9380f ("net: phylink: add EEE management")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

I think there are multiple issues here that need addressing:

1) calling phy_detach() in a context where phy_attach() is allowed
   causes this warning, which seems absurd (being able to attach but
   not detach on error is a problem.)

This is the root cause of the issue, and others have run into this same
problem. There's already been an effort to address this:
   https://lore.kernel.org/r/20250117141446.1076951-1-kory.maincent@bootlin.com
   https://lore.kernel.org/r/20250117173645.1107460-1-kory.maincent@bootlin.com
   https://lore.kernel.org/r/20250120141926.1290763-1-kory.maincent@bootlin.com
and I think the conclusion is that the RTNL had to be held while calling
phy_detach().

2) phy_modify_mmd() returning -EPERM. Having traced through the code,
   this comes from my swphy.c which returns -1 (eww). However, as this
   code was extracted from fixed_phy.c, and the emulation is provided
   for userspace, this is part of the uAPI of the kernel and can't be
   changed.

3) the blamed commit introduces a call to phy_modify_mmd() to set the
   clock-stop bit, which ought not be done unless phylink managed EEE
   is being used.

(2) and (3) together is what ends up causing:

> [   19.646149][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): failed to connect to PHY: -EPERM
> [   19.647542][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 0
> [   19.649283][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
> [   19.650853][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): failed to connect to PHY: -EPERM
> [   19.652238][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 1
> [   19.653856][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
> [   19.655392][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): failed to connect to PHY: -EPERM
> [   19.656689][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 2
> [   19.658308][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
> [   19.659841][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): failed to connect to PHY: -EPERM
> [   19.661168][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 3
> [   19.663018][   T22] DSA: tree 0 setup
> [   19.663591][   T22] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f

which then causes phy_detach() to be called, which then triggers the
"suspicious RCU" warning.

This has merely revealed a problem in the error handling since Kory's
commit on the 12th December, and actually has nothing to do with the
blamed commit, other than it revealing the latent problem.
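
To make the chain above concrete, here is a rough userspace model in C. It is
not kernel code: every name (model_swphy_read, model_modify, model_detach,
model_connect, model_rtnl_held) is hypothetical and only stands in for the
roles described in this mail. It shows a bare -1 from an emulated register
read surfacing as -EPERM, and the error path then detaching without the lock
held, which is where the simulated warning fires.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: every name here is hypothetical, not a kernel symbol. */
static bool model_rtnl_held;	/* stands in for "the caller holds the RTNL" */
static int model_rcu_warnings;	/* counts simulated "suspicious RCU usage" reports */

/* Emulated register access that fails with a bare -1, as described for swphy.c. */
static int model_swphy_read(int reg)
{
	(void)reg;
	return -1;	/* same numeric value as -EPERM, since EPERM is 1 on Linux */
}

/* Read-modify-write helper: a failed read is propagated unchanged. */
static int model_modify(int reg)
{
	int val = model_swphy_read(reg);

	return val < 0 ? val : 0;
}

/* Detach path whose pointer access is annotated as requiring the lock. */
static void model_detach(void)
{
	if (!model_rtnl_held) {
		model_rcu_warnings++;
		fprintf(stderr, "model: suspicious protected-dereference usage\n");
	}
}

/* Connect: on failure, the error path undoes the attach by detaching. */
static int model_connect(void)
{
	int ret = model_modify(0);

	if (ret < 0) {
		model_detach();
		return ret;
	}
	return 0;
}

/* Usage: a caller without the lock gets both the -EPERM and the warning. */
static void model_demo(void)
{
	model_rtnl_held = false;
	model_rcu_warnings = 0;
	assert(model_connect() == -EPERM);
	assert(model_rcu_warnings == 1);
}
```

In this model, setting model_rtnl_held before calling model_connect() still
returns the error but no longer counts a warning, which mirrors the point that
the warning comes from the detach path's locking expectation rather than from
the failing register access itself.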

The "hold RTNL" solution isn't trivial to implement here - phylink's
PHY connection functions can be called with RTNL already held, so it
isn't a simple case of throwing locking at phylink (which will cause
a deadlock) - it needs every phylink user to be audited and individual
patches to take the RTNL in the driver generated as necessary. I'm not
sure when I'll be able to do that. It's also a locking change for this
API - going from not needing the RTNL to requiring it.

This is probably going to result in more kernel warnings being
generated when I throw in ASSERT_RTNL() into phylink paths that could
call phy_detach(). Sounds joyful.
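
The locking trade-off described above can be sketched with a small userspace
model in C. This is an illustration under assumptions, not phylink code:
struct model_lock, connect_locking_inside() and connect_caller_locks() are
hypothetical names, and the lock is a plain flag standing in for a
non-recursive mutex such as the RTNL. It contrasts taking the lock inside the
connect path (which would self-deadlock when the caller already holds it) with
requiring the caller to hold it and only checking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock; not a kernel type. */
struct model_lock {
	bool held;
};

static int model_lock_warnings;	/* counts simulated "lock not held" warnings */

/* Returns false where a real non-recursive mutex would deadlock. */
static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	l->held = false;
}

/* Modelled on the idea of an assertion that warns rather than aborts. */
static void model_assert_held(const struct model_lock *l)
{
	if (!l->held)
		model_lock_warnings++;
}

/* Option A: the connect path takes the lock itself. */
static int connect_locking_inside(struct model_lock *l)
{
	if (!model_lock_acquire(l))
		return -EDEADLK;	/* caller already holds it: deadlock in reality */
	/* ... attach, and detach again on error ... */
	model_lock_release(l);
	return 0;
}

/* Option B: the caller must hold the lock; the connect path only checks. */
static int connect_caller_locks(const struct model_lock *l)
{
	model_assert_held(l);
	/* ... attach, and detach again on error ... */
	return 0;
}

/* Usage: each option misbehaves for exactly one kind of caller. */
static void model_lock_demo(void)
{
	struct model_lock lock = { .held = false };

	model_lock_warnings = 0;
	assert(connect_locking_inside(&lock) == 0);	/* unlocked caller: fine */
	assert(connect_caller_locks(&lock) == 0);
	assert(model_lock_warnings == 1);		/* unlocked caller: warns */

	lock.held = true;				/* caller now holds the lock */
	assert(connect_locking_inside(&lock) == -EDEADLK);
	assert(connect_caller_locks(&lock) == 0);
	assert(model_lock_warnings == 1);		/* no new warning */
}
```

Option A breaks callers that already hold the lock, while Option B only works
once every caller that does not hold it has been found and changed, which is
the per-user audit this mail refers to.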

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [linus:master] [net]  03abf2a7c6: WARNING:suspicious_RCU_usage
  2025-02-05 17:19 ` Russell King (Oracle)
@ 2025-03-27 13:09   ` Kory Maincent
  0 siblings, 0 replies; 3+ messages in thread
From: Kory Maincent @ 2025-03-27 13:09 UTC
  To: Russell King (Oracle)
  Cc: kernel test robot, Andrew Lunn, oe-lkp, lkp, linux-kernel,
	Jakub Kicinski, Jacob Keller, netdev

Hello Russell,

On Wed, 5 Feb 2025 17:19:13 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, Feb 05, 2025 at 02:08:04PM +0800, kernel test robot wrote:
> > kernel test robot noticed "WARNING:suspicious_RCU_usage" on:
> > 
> > commit: 03abf2a7c65451e663b078b0ed1bfa648cd9380f ("net: phylink: add EEE management")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> I think there are multiple issues here that need addressing:
> 
> 1) calling phy_detach() in a context where phy_attach() is allowed
>    causes this warning, which seems absurd (being able to attach but
>    not detach on error is a problem.)
> 
> This is the root cause of the issue, and others have run into this same
> problem. There's already been an effort to address this:
>    https://lore.kernel.org/r/20250117141446.1076951-1-kory.maincent@bootlin.com
>    https://lore.kernel.org/r/20250117173645.1107460-1-kory.maincent@bootlin.com
>    https://lore.kernel.org/r/20250120141926.1290763-1-kory.maincent@bootlin.com
> and I think the conclusion is that the RTNL had to be held while calling
> phy_detach().
> 
> 2) phy_modify_mmd() returning -EPERM. Having traced through the code,
>    this comes from my swphy.c which returns -1 (eww). However, as this
>    code was extracted from fixed_phy.c, and the emulation is provided
>    for userspace, this is part of the uAPI of the kernel and can't be
>    changed.
> 
> 3) the blamed commit introduces a call to phy_modify_mmd() to set the
>    clock-stop bit, which ought not be done unless phylink managed EEE
>    is being used.
> 
> (2) and (3) together is what ends up causing:
> 
> > > [   19.646149][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): failed to connect to PHY: -EPERM
> > > [   19.647542][   T22] dsa-loop fixed-0:1f lan1 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 0
> > > [   19.649283][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
> > > [   19.650853][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): failed to connect to PHY: -EPERM
> > > [   19.652238][   T22] dsa-loop fixed-0:1f lan2 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 1
> > > [   19.653856][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
> > > [   19.655392][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): failed to connect to PHY: -EPERM
> > > [   19.656689][   T22] dsa-loop fixed-0:1f lan3 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 2
> > > [   19.658308][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
> > > [   19.659841][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): failed to connect to PHY: -EPERM
> > > [   19.661168][   T22] dsa-loop fixed-0:1f lan4 (uninitialized): error -1 setting up PHY for tree 0, switch 0, port 3
> > > [   19.663018][   T22] DSA: tree 0 setup
> > > [   19.663591][   T22] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f
> 
> which then causes phy_detach() to be called, which then triggers the
> "suspicious RCU" warning.
> 
> This has merely revealed a problem in the error handling since Kory's
> commit on the 12th December, and actually has nothing to do with the
> blamed commit, other than it revealing the latent problem.
> 
> The "hold RTNL" solution isn't trivial to implement here - phylink's
> PHY connection functions can be called with RTNL already held, so it
> isn't a simple case of throwing locking at phylink (which will cause
> a deadlock) - it needs every phylink user to be audited and individual
> patches to take the RTNL in the driver generated as necessary. I'm not
> sure when I'll be able to do that. It's also a locking change for this
> API - going from not needing the RTNL to requiring it.
> 
> This is probably going to result in more kernel warnings being
> generated when I throw in ASSERT_RTNL() into phylink paths that could
> call phy_detach(). Sounds joyful.

It is indeed painful! I have begun to take a look at it:
https://termbin.com/d9tq

I don't know if there is a better way to do this ...

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
