public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* w1: 3.14-rc7 - possible recursive locking detected
@ 2014-03-17 21:38 Belisko Marek
  2014-03-23 21:50 ` David Fries
  0 siblings, 1 reply; 9+ messages in thread
From: Belisko Marek @ 2014-03-17 21:38 UTC (permalink / raw)
  To: LKML, zbr; +Cc: Dr. H. Nikolaus Schaller

Hi,

booting latest 3.14-rc7 on gta04 board gives following warning:

[    3.101409] Driver for 1-wire Dallas network protocol.
[    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
Interrupt mode
[    3.330810]
[    3.332397] =============================================
[    3.338073] [ INFO: possible recursive locking detected ]
[    3.343749] 3.14.0-rc7-gta04 #520 Not tainted
[    3.348327] ---------------------------------------------
[    3.353973] w1_bus_master1/805 is trying to acquire lock:
[    3.359649]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[    3.370300]
[    3.370300] but task is already holding lock:
[    3.376434]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[    3.387084]
[    3.387084] other info that might help us debug this:
[    3.393920]  Possible unsafe locking scenario:
[    3.393920]
[    3.400146]        CPU0
[    3.402709]        ----
[    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
[    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
[    3.415374]
[    3.415374]  *** DEADLOCK ***
[    3.415374]
[    3.421569]  May be due to missing lock nesting notation
[    3.421569]
[    3.428710] 2 locks held by w1_bus_master1/805:
[    3.433471]  #0:  (&dev->mutex#2){+.+.+.}, at: [<c03bc5d4>]
w1_attach_slave_device+0xb8/0x1c4
[    3.442504]  #1:  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[    3.453582]
[    3.453582] stack backtrace:
[    3.458190] CPU: 0 PID: 805 Comm: w1_bus_master1 Not tainted
3.14.0-rc7-gta04 #520
[    3.466156] [<c0014418>] (unwind_backtrace) from [<c0011788>]
(show_stack+0x10/0x14)
[    3.474304] [<c0011788>] (show_stack) from [<c04ee204>]
(dump_stack+0x68/0x84)
[    3.481933] [<c04ee204>] (dump_stack) from [<c007616c>]
(print_deadlock_bug+0xc0/0xf0)
[    3.490264] [<c007616c>] (print_deadlock_bug) from [<c0077a34>]
(validate_chain.isra.27+0x4e0/0x538)
[    3.499847] [<c0077a34>] (validate_chain.isra.27) from [<c0078864>]
(__lock_acquire+0x72c/0x838)
[    3.509094] [<c0078864>] (__lock_acquire) from [<c0078f84>]
(lock_acquire+0xf4/0x118)
[    3.517333] [<c0078f84>] (lock_acquire) from [<c04f69b0>]
(down_read+0x28/0x38)
[    3.525024] [<c04f69b0>] (down_read) from [<c005977c>]
(__blocking_notifier_call_chain+0x28/0x58)
[    3.534362] [<c005977c>] (__blocking_notifier_call_chain) from
[<c00597c0>] (blocking_notifier_call_chain+0x14/0x18)
[    3.545440] [<c00597c0>] (blocking_notifier_call_chain) from
[<c0306940>] (device_add+0x1f4/0x368)
[    3.554870] [<c0306940>] (device_add) from [<c030a158>]
(platform_device_add+0x138/0x1c8)
[    3.563476] [<c030a158>] (platform_device_add) from [<c03bf31c>]
(w1_bq27000_add_slave+0x44/0x74)
[    3.572784] [<c03bf31c>] (w1_bq27000_add_slave) from [<c03bca78>]
(w1_bus_notify.part.0+0x48/0xc8)
[    3.582214] [<c03bca78>] (w1_bus_notify.part.0) from [<c04f996c>]
(notifier_call_chain+0x38/0x68)
[    3.591522] [<c04f996c>] (notifier_call_chain) from [<c0059798>]
(__blocking_notifier_call_chain+0x44/0x58)
[    3.601776] [<c0059798>] (__blocking_notifier_call_chain) from
[<c00597c0>] (blocking_notifier_call_chain+0x14/0x18)
[    3.612823] [<c00597c0>] (blocking_notifier_call_chain) from
[<c0306940>] (device_add+0x1f4/0x368)
[    3.622253] [<c0306940>] (device_add) from [<c03bc4a0>]
(__w1_attach_slave_device+0x9c/0x118)
[    3.631225] [<c03bc4a0>] (__w1_attach_slave_device) from
[<c03bc644>] (w1_attach_slave_device+0x128/0x1c4)
[    3.641357] [<c03bc644>] (w1_attach_slave_device) from [<c03bc75c>]
(w1_slave_found+0x7c/0x98)
[    3.650421] [<c03bc75c>] (w1_slave_found) from [<c03bea10>]
(omap_w1_search_bus+0x54/0x5c)
[    3.659118] [<c03bea10>] (omap_w1_search_bus) from [<c03be950>]
(w1_search_devices+0x3c/0x48)
[    3.668090] [<c03be950>] (w1_search_devices) from [<c03bd210>]
(w1_search_process_cb+0x4c/0xc4)
[    3.677215] [<c03bd210>] (w1_search_process_cb) from [<c03bd2e0>]
(w1_process+0x58/0x10c)
[    3.685821] [<c03bd2e0>] (w1_process) from [<c0055850>] (kthread+0xc8/0xdc)
[    3.693145] [<c0055850>] (kthread) from [<c000dc68>]
(ret_from_fork+0x14/0x2c)
[    3.751068] omap_wdt: OMAP Watchdog Timer Rev 0x31: initial timeout 60 sec

AFAIK we didn't see that on (at least 3/14-rc2).

BR,

marek

-- 
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
  2014-03-17 21:38 w1: 3.14-rc7 - possible recursive locking detected Belisko Marek
@ 2014-03-23 21:50 ` David Fries
  2014-03-24  5:55   ` Dr. H. Nikolaus Schaller
       [not found]   ` <698551395662480@web15j.yandex.ru>
  0 siblings, 2 replies; 9+ messages in thread
From: David Fries @ 2014-03-23 21:50 UTC (permalink / raw)
  To: Belisko Marek; +Cc: LKML, zbr, Dr. H. Nikolaus Schaller

On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
> Hi,
> 
> booting latest 3.14-rc7 on gta04 board gives following warning:
> 
> [    3.101409] Driver for 1-wire Dallas network protocol.
> [    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> [    3.400146]        CPU0
> [    3.402709]        ----
> [    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
> [    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
> [    3.415374]
> [    3.415374]  *** DEADLOCK ***
> 
> AFAIK we didn't see that on (at least 3/14-rc2).

The last drivers/w1 change went in before v3.14-rc1, so if it is
something in the w1 code, it has either been there or something else
in the rest of the kernel changed to expose it.  I'm using the ds2490
USB w1 dongle, I didn't see that problem when enabling lock debugging
on 3.14.0-rc5+ which includes some changes I'm working on.

https://github.com/dfries/linux.git w1_rework

-- 
David Fries <david@fries.net>    PGP pub CB1EE8F0
http://fries.net/~david/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
  2014-03-23 21:50 ` David Fries
@ 2014-03-24  5:55   ` Dr. H. Nikolaus Schaller
       [not found]   ` <698551395662480@web15j.yandex.ru>
  1 sibling, 0 replies; 9+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2014-03-24  5:55 UTC (permalink / raw)
  To: David Fries; +Cc: Belisko Marek, LKML, zbr

Hi,

Am 23.03.2014 um 22:50 schrieb David Fries:

> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
>> Hi,
>> 
>> booting latest 3.14-rc7 on gta04 board gives following warning:
>> 
>> [    3.101409] Driver for 1-wire Dallas network protocol.
>> [    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>> [    3.400146]        CPU0
>> [    3.402709]        ----
>> [    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
>> [    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
>> [    3.415374]
>> [    3.415374]  *** DEADLOCK ***
>> 
>> AFAIK we didn't see that on (at least 3/14-rc2).
> 
> The last drivers/w1 change went in before v3.14-rc1, so if it is
> something in the w1 code, it has either been there or something else
> in the rest of the kernel changed to expose it.  I'm using the ds2490
> USB w1 dongle, I didn't see that problem when enabling lock debugging
> on 3.14.0-rc5+ which includes some changes I'm working on.
> 
> https://github.com/dfries/linux.git w1_rework

Further analysis shows that It only occurs when using the device tree.

So it may have to do something with nested driver probing/locking running
differently between DT and board file.

BR,
Nikolaus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
       [not found]   ` <698551395662480@web15j.yandex.ru>
@ 2014-04-08 19:47     ` Belisko Marek
  2014-04-13 22:24       ` David Fries
  0 siblings, 1 reply; 9+ messages in thread
From: Belisko Marek @ 2014-04-08 19:47 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: David Fries, LKML, Dr. H. Nikolaus Schaller

Hi,

On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> Hi everyone
>
> 24.03.2014, 01:50, "David Fries" <David@Fries.net>:
>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
>>>  booting latest 3.14-rc7 on gta04 board gives following warning:
>>>
>>>  [    3.101409] Driver for 1-wire Dallas network protocol.
>>>  [    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>>>  [    3.400146]        CPU0
>>>  [    3.402709]        ----
>>>  [    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
>>>  [    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
>>>  [    3.415374]
>>>  [    3.415374]  *** DEADLOCK ***
>>>
>>>  AFAIK we didn't see that on (at least 3/14-rc2).
>>
>> The last drivers/w1 change went in before v3.14-rc1, so if it is
>> something in the w1 code, it has either been there or something else
>> in the rest of the kernel changed to expose it.  I'm using the ds2490
>> USB w1 dongle, I didn't see that problem when enabling lock debugging
>> on 3.14.0-rc5+ which includes some changes I'm working on.
>>
>> https://github.com/dfries/linux.git w1_rework
>
> Can it be a bug in omap w1 driver?
We test with actual Linus master and it's even worse (it hangs completely).
Trace look little bit different and it only happens when booting via
device tree (not with board file).

[    2.807434] Driver for 1-wire Dallas network protocol.
[    2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
Interrupt mode
[    3.034271] call_modprobe: w1-family-0x1
[    3.051818]
[    3.053405] =============================================
[    3.059051] [ INFO: possible recursive locking detected ]
[    3.064727] 3.14.0-gta04 #556 Not tainted
[    3.068939] ---------------------------------------------
[    3.074615] w1_bus_master1/795 is trying to acquire lock:
[    3.080261]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
[    3.090911]
[    3.090911] but task is already holding lock:
[    3.097045]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
[    3.107666]
[    3.107666] other info that might help us debug this:
[    3.114501]  Possible unsafe locking scenario:
[    3.114501]
[    3.120727]        CPU0
[    3.123291]        ----
[    3.125854]   lock(&(&priv->bus_notifier)->rwsem);
[    3.130889]   lock(&(&priv->bus_notifier)->rwsem);
[    3.135925]
[    3.135925]  *** DEADLOCK ***
[    3.135925]
[    3.142150]  May be due to missing lock nesting notation
[    3.142150]
[    3.149291] 2 locks held by w1_bus_master1/795:
[    3.154052]  #0:  (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>]
w1_attach_slave_device+0xc4/0x1c8
[    3.163055]  #1:  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
[    3.174163]
[    3.174163] stack backtrace:
[    3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
3.14.0-gta04 #556
[    3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>]
(show_stack+0x10/0x14)
[    3.194488] [<c0010f68>] (show_stack) from [<c04f0900>]
(dump_stack+0x6c/0x88)
[    3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>]
(print_deadlock_bug+0xc0/0xf0)
[    3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>]
(validate_chain.isra.29+0x4dc/0x534)
[    3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>]
(__lock_acquire+0x728/0x834)
[    3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>]
(lock_acquire+0xf4/0x118)
[    3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>]
(down_read+0x24/0x34)
[    3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>]
(__blocking_notifier_call_chain+0x28/0x58)
[    3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from
[<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
[    3.265563] [<c0058af4>] (blocking_notifier_call_chain) from
[<c0301db8>] (device_add+0x1f4/0x368)
[    3.274993] [<c0301db8>] (device_add) from [<c03055cc>]
(platform_device_add+0x138/0x1c8)
[    3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>]
(w1_bq27000_add_slave+0x44/0x74)
[    3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>]
(w1_bus_notify.part.0+0x48/0xc8)
[    3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>]
(notifier_call_chain+0x38/0x68)
[    3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>]
(__blocking_notifier_call_chain+0x44/0x58)
[    3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from
[<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
[    3.332946] [<c0058af4>] (blocking_notifier_call_chain) from
[<c0301db8>] (device_add+0x1f4/0x368)
[    3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>]
(__w1_attach_slave_device+0x9c/0x134)
[    3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from
[<c03ba238>] (w1_attach_slave_device+0x134/0x1c8)
[    3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>]
(w1_slave_found+0x7c/0x98)
[    3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>]
(omap_w1_search_bus+0x54/0x5c)
[    3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>]
(w1_search_devices+0x3c/0x48)
[    3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>]
(w1_search_process_cb+0x64/0x108)
[    3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>]
(w1_process+0x68/0x14c)
[    3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0)
[    3.413299] [<c0054bdc>] (kthread) from [<c000dca8>]
(ret_from_fork+0x14/0x2c)
[   24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19)
[   24.463134] INFO: Stall ended before state dump start
[   87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19)
[   87.513122] INFO: Stall ended before state dump start
[  150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19)
[  150.563232] INFO: Stall ended before state dump start
[  213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19)
[  213.613220] INFO: Stall ended before state dump start
[  276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19)
[  276.663238] INFO: Stall ended before state dump start

BR,

marek

-- 
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
  2014-04-08 19:47     ` Belisko Marek
@ 2014-04-13 22:24       ` David Fries
  2014-04-14 13:10         ` Dr. H. Nikolaus Schaller
                           ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Fries @ 2014-04-13 22:24 UTC (permalink / raw)
  To: Belisko Marek
  Cc: Evgeniy Polyakov, LKML, Dr. H. Nikolaus Schaller,
	Greg Kroah-Hartman

Belisko Marek,
Here is a possible solution, could you give it a try and report back?

Greg Kroah-Hartman,
Evgeniy asked me to look into this report.  I don't have the
reporter's hardware configuration, but I wouldn't think that would be
needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
loading the slave device and manually adding a slave device with that
family id.  Even then I didn't reproduce the reported recursive
locking error.  I saw unrelated locking reports, but not this one.  I
wrote up the included patch, but that undoes the notify changes that
you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
that commit.  Specifically these two lines,

        err = device_register(&sl->dev);
	...
+	dev_set_uevent_suppress(&sl->dev, false);
+	kobject_uevent(&sl->dev.kobj, KOBJ_ADD);

Wouldn't the default be to not suppress?  Nothing in the W1 system
enables suppressing so is that even needed?  (I'm fine with saying
it's a good idea).
device_register at some point must call device_add and device_add
calls kobject_uevent(&dev->kobj, KOBJ_ADD);
so doesn't the KOBJ_ADD send the add a second time?  As in it
shouldn't be needed?
Can the suppress be called before device_register to avoid the
automatic notify, then after it returns setup the slave device as this
patch does to avoid this problem report, and then call the KOBJ_ADD to
make everything happy?

I'm based on commit ce7613db2d8d4d5af2587a.

On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
> Hi,
> 
> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> > Hi everyone
> >
> > 24.03.2014, 01:50, "David Fries" <David@Fries.net>:
> >> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
> >>>  booting latest 3.14-rc7 on gta04 board gives following warning:
> >>>
> >>>  [    3.101409] Driver for 1-wire Dallas network protocol.
> >>>  [    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> >>>  [    3.400146]        CPU0
> >>>  [    3.402709]        ----
> >>>  [    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
> >>>  [    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
> >>>  [    3.415374]
> >>>  [    3.415374]  *** DEADLOCK ***
> >>>
> >>>  AFAIK we didn't see that on (at least 3/14-rc2).
> >>
> >> The last drivers/w1 change went in before v3.14-rc1, so if it is
> >> something in the w1 code, it has either been there or something else
> >> in the rest of the kernel changed to expose it.  I'm using the ds2490
> >> USB w1 dongle, I didn't see that problem when enabling lock debugging
> >> on 3.14.0-rc5+ which includes some changes I'm working on.
> >>
> >> https://github.com/dfries/linux.git w1_rework
> >
> > Can it be a bug in omap w1 driver?
> We test with actual Linus master and it's even worse (it hangs completely).
> Trace look little bit different and it only happens when booting via
> device tree (not with board file).
> 
> [    2.807434] Driver for 1-wire Dallas network protocol.
> [    2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> Interrupt mode
> [    3.034271] call_modprobe: w1-family-0x1
> [    3.051818]
> [    3.053405] =============================================
> [    3.059051] [ INFO: possible recursive locking detected ]
> [    3.064727] 3.14.0-gta04 #556 Not tainted
> [    3.068939] ---------------------------------------------
> [    3.074615] w1_bus_master1/795 is trying to acquire lock:
> [    3.080261]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
> [    3.090911]
> [    3.090911] but task is already holding lock:
> [    3.097045]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
> [    3.107666]
> [    3.107666] other info that might help us debug this:
> [    3.114501]  Possible unsafe locking scenario:
> [    3.114501]
> [    3.120727]        CPU0
> [    3.123291]        ----
> [    3.125854]   lock(&(&priv->bus_notifier)->rwsem);
> [    3.130889]   lock(&(&priv->bus_notifier)->rwsem);
> [    3.135925]
> [    3.135925]  *** DEADLOCK ***
> [    3.135925]
> [    3.142150]  May be due to missing lock nesting notation
> [    3.142150]
> [    3.149291] 2 locks held by w1_bus_master1/795:
> [    3.154052]  #0:  (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>]
> w1_attach_slave_device+0xc4/0x1c8
> [    3.163055]  #1:  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
> [    3.174163]
> [    3.174163] stack backtrace:
> [    3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
> 3.14.0-gta04 #556
> [    3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>]
> (show_stack+0x10/0x14)
> [    3.194488] [<c0010f68>] (show_stack) from [<c04f0900>]
> (dump_stack+0x6c/0x88)
> [    3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>]
> (print_deadlock_bug+0xc0/0xf0)
> [    3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>]
> (validate_chain.isra.29+0x4dc/0x534)
> [    3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>]
> (__lock_acquire+0x728/0x834)
> [    3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>]
> (lock_acquire+0xf4/0x118)
> [    3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>]
> (down_read+0x24/0x34)
> [    3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>]
> (__blocking_notifier_call_chain+0x28/0x58)
> [    3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from
> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
> [    3.265563] [<c0058af4>] (blocking_notifier_call_chain) from
> [<c0301db8>] (device_add+0x1f4/0x368)
> [    3.274993] [<c0301db8>] (device_add) from [<c03055cc>]
> (platform_device_add+0x138/0x1c8)
> [    3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>]
> (w1_bq27000_add_slave+0x44/0x74)
> [    3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>]
> (w1_bus_notify.part.0+0x48/0xc8)
> [    3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>]
> (notifier_call_chain+0x38/0x68)
> [    3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>]
> (__blocking_notifier_call_chain+0x44/0x58)
> [    3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from
> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
> [    3.332946] [<c0058af4>] (blocking_notifier_call_chain) from
> [<c0301db8>] (device_add+0x1f4/0x368)
> [    3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>]
> (__w1_attach_slave_device+0x9c/0x134)
> [    3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from
> [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8)
> [    3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>]
> (w1_slave_found+0x7c/0x98)
> [    3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>]
> (omap_w1_search_bus+0x54/0x5c)
> [    3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>]
> (w1_search_devices+0x3c/0x48)
> [    3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>]
> (w1_search_process_cb+0x64/0x108)
> [    3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>]
> (w1_process+0x68/0x14c)
> [    3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0)
> [    3.413299] [<c0054bdc>] (kthread) from [<c000dca8>]
> (ret_from_fork+0x14/0x2c)
> [   24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {}
> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19)
> [   24.463134] INFO: Stall ended before state dump start
> [   87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {}
> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19)
> [   87.513122] INFO: Stall ended before state dump start
> [  150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {}
> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19)
> [  150.563232] INFO: Stall ended before state dump start
> [  213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {}
> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19)
> [  213.613220] INFO: Stall ended before state dump start
> [  276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {}
> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19)
> [  276.663238] INFO: Stall ended before state dump start
> 
> BR,
> 
> marek
> 
> -- 
> as simple and primitive as possible
> -------------------------------------------------
> Marek Belisko - OPEN-NANDRA
> Freelance Developer
> 
> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
> Tel: +421 915 052 184
> skype: marekwhite
> twitter: #opennandra
> web: http://open-nandra.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries <david@fries.net>    PGP pub CB1EE8F0
http://fries.net/~david/


commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a
Author: David Fries <David@Fries.net>
Date:   Sun Apr 13 15:36:45 2014 -0500

    w1: avoid recursive device_add
    
    __w1_attach_slave_device calls device_add which calls w1_bus_notify
    which calls for the w1_bq27000 slave driver, which calls
    platform_device_add and device_add and deadlocks on getting
    &(&priv->bus_notifier)->rwsem as it is still held in the previous
    device_add.  This avoids the problem by processing the family
    add/remove outside of the slave device_add call.  This does mean that
    the slave device isn't setup when the notification that a slave is
    available goes out.
    
    Reported-by: Belisko Marek <marek.belisko@gmail.com>

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index b96f61b..dde09c1 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -614,27 +614,11 @@ end:
 	return err;
 }
 
-/*
- * Handle sysfs file creation and removal here, before userspace is told that
- * the device is added / removed from the system
- */
-static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
-			 void *data)
+static int w1_family_notify(unsigned long action, struct w1_slave *sl)
 {
-	struct device *dev = data;
-	struct w1_slave *sl;
 	struct w1_family_ops *fops;
 	int err;
 
-	/*
-	 * Only care about slave devices at the moment.  Yes, we should use a
-	 * separate "type" for this, but for now, look at the release function
-	 * to know which type it is...
-	 */
-	if (dev->release != w1_slave_release)
-		return 0;
-
-	sl = dev_to_w1_slave(dev);
 	fops = sl->family->fops;
 
 	if (!fops)
@@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
 	return 0;
 }
 
-static struct notifier_block w1_bus_nb = {
-	.notifier_call = w1_bus_notify,
-};
-
 static int __w1_attach_slave_device(struct w1_slave *sl)
 {
 	int err;
@@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
 			dev_name(&sl->dev), err);
 		return err;
 	}
+	w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
 
 
 	dev_set_uevent_suppress(&sl->dev, false);
@@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl)
 		msg.type = W1_SLAVE_REMOVE;
 		w1_netlink_send(sl->master, &msg);
 
+		w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
 		device_unregister(&sl->dev);
 		#ifdef DEBUG
 		memset(sl, 0, sizeof(*sl));
@@ -1186,10 +1168,6 @@ static int __init w1_init(void)
 		goto err_out_exit_init;
 	}
 
-	retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
-	if (retval)
-		goto err_out_bus_unregister;
-
 	retval = driver_register(&w1_master_driver);
 	if (retval) {
 		printk(KERN_ERR

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
  2014-04-13 22:24       ` David Fries
@ 2014-04-14 13:10         ` Dr. H. Nikolaus Schaller
  2014-04-14 15:04         ` Greg Kroah-Hartman
  2014-04-16  6:21         ` [PATCH] w1: avoid recursive device_add David Fries
  2 siblings, 0 replies; 9+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2014-04-14 13:10 UTC (permalink / raw)
  To: David Fries; +Cc: Belisko Marek, Evgeniy Polyakov, LKML, Greg Kroah-Hartman

Hi David,
I have tested and it appears to suppress the locking issue on our
GTA04 board.

BR,
Nikolaus

Am 14.04.2014 um 00:24 schrieb David Fries:

> Belisko Marek,
> Here is a possible solution, could you give it a try and report back?
> 
> Greg Kroah-Hartman,
> Evgeniy asked me to look into this report.  I don't have the
> reporter's hardware configuration, but I wouldn't think that would be
> needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
> loading the slave device and manually adding a slave device with that
> family id.  Even then I didn't reproduce the reported recursive
> locking error.  I saw unrelated locking reports, but not this one.  I
> wrote up the included patch, but that undoes the notify changes that
> you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
> that commit.  Specifically these two lines,
> 
>        err = device_register(&sl->dev);
> 	...
> +	dev_set_uevent_suppress(&sl->dev, false);
> +	kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
> 
> Wouldn't the default be to not suppress?  Nothing in the W1 system
> enables suppressing so is that even needed?  (I'm fine with saying
> it's a good idea).
> device_register at some point must call device_add and device_add
> calls kobject_uevent(&dev->kobj, KOBJ_ADD);
> so doesn't the KOBJ_ADD send the add a second time?  As in it
> shouldn't be needed?
> Can the suppress be called before device_register to avoid the
> automatic notify, then after it returns setup the slave device as this
> patch does to avoid this problem report, and then call the KOBJ_ADD to
> make everything happy?
> 
> I'm based on commit ce7613db2d8d4d5af2587a.
> 
> On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
>> Hi,
>> 
>> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
>>> Hi everyone
>>> 
>>> 24.03.2014, 01:50, "David Fries" <David@Fries.net>:
>>>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
>>>>> booting latest 3.14-rc7 on gta04 board gives following warning:
>>>>> 
>>>>> [    3.101409] Driver for 1-wire Dallas network protocol.
>>>>> [    3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>>>>> [    3.400146]        CPU0
>>>>> [    3.402709]        ----
>>>>> [    3.405273]   lock(&(&priv->bus_notifier)->rwsem);
>>>>> [    3.410308]   lock(&(&priv->bus_notifier)->rwsem);
>>>>> [    3.415374]
>>>>> [    3.415374]  *** DEADLOCK ***
>>>>> 
>>>>> AFAIK we didn't see that on (at least 3/14-rc2).
>>>> 
>>>> The last drivers/w1 change went in before v3.14-rc1, so if it is
>>>> something in the w1 code, it has either been there or something else
>>>> in the rest of the kernel changed to expose it.  I'm using the ds2490
>>>> USB w1 dongle, I didn't see that problem when enabling lock debugging
>>>> on 3.14.0-rc5+ which includes some changes I'm working on.
>>>> 
>>>> https://github.com/dfries/linux.git w1_rework
>>> 
>>> Can it be a bug in omap w1 driver?
>> We test with actual Linus master and it's even worse (it hangs completely).
>> Trace look little bit different and it only happens when booting via
>> device tree (not with board file).
>> 
>> [    2.807434] Driver for 1-wire Dallas network protocol.
>> [    2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>> Interrupt mode
>> [    3.034271] call_modprobe: w1-family-0x1
>> [    3.051818]
>> [    3.053405] =============================================
>> [    3.059051] [ INFO: possible recursive locking detected ]
>> [    3.064727] 3.14.0-gta04 #556 Not tainted
>> [    3.068939] ---------------------------------------------
>> [    3.074615] w1_bus_master1/795 is trying to acquire lock:
>> [    3.080261]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [    3.090911]
>> [    3.090911] but task is already holding lock:
>> [    3.097045]  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [    3.107666]
>> [    3.107666] other info that might help us debug this:
>> [    3.114501]  Possible unsafe locking scenario:
>> [    3.114501]
>> [    3.120727]        CPU0
>> [    3.123291]        ----
>> [    3.125854]   lock(&(&priv->bus_notifier)->rwsem);
>> [    3.130889]   lock(&(&priv->bus_notifier)->rwsem);
>> [    3.135925]
>> [    3.135925]  *** DEADLOCK ***
>> [    3.135925]
>> [    3.142150]  May be due to missing lock nesting notation
>> [    3.142150]
>> [    3.149291] 2 locks held by w1_bus_master1/795:
>> [    3.154052]  #0:  (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>]
>> w1_attach_slave_device+0xc4/0x1c8
>> [    3.163055]  #1:  (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [    3.174163]
>> [    3.174163] stack backtrace:
>> [    3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
>> 3.14.0-gta04 #556
>> [    3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>]
>> (show_stack+0x10/0x14)
>> [    3.194488] [<c0010f68>] (show_stack) from [<c04f0900>]
>> (dump_stack+0x6c/0x88)
>> [    3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>]
>> (print_deadlock_bug+0xc0/0xf0)
>> [    3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>]
>> (validate_chain.isra.29+0x4dc/0x534)
>> [    3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>]
>> (__lock_acquire+0x728/0x834)
>> [    3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>]
>> (lock_acquire+0xf4/0x118)
>> [    3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>]
>> (down_read+0x24/0x34)
>> [    3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>]
>> (__blocking_notifier_call_chain+0x28/0x58)
>> [    3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [    3.265563] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [    3.274993] [<c0301db8>] (device_add) from [<c03055cc>]
>> (platform_device_add+0x138/0x1c8)
>> [    3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>]
>> (w1_bq27000_add_slave+0x44/0x74)
>> [    3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>]
>> (w1_bus_notify.part.0+0x48/0xc8)
>> [    3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>]
>> (notifier_call_chain+0x38/0x68)
>> [    3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>]
>> (__blocking_notifier_call_chain+0x44/0x58)
>> [    3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [    3.332946] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [    3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>]
>> (__w1_attach_slave_device+0x9c/0x134)
>> [    3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from
>> [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8)
>> [    3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>]
>> (w1_slave_found+0x7c/0x98)
>> [    3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>]
>> (omap_w1_search_bus+0x54/0x5c)
>> [    3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>]
>> (w1_search_devices+0x3c/0x48)
>> [    3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>]
>> (w1_search_process_cb+0x64/0x108)
>> [    3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>]
>> (w1_process+0x68/0x14c)
>> [    3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0)
>> [    3.413299] [<c0054bdc>] (kthread) from [<c000dca8>]
>> (ret_from_fork+0x14/0x2c)
>> [   24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19)
>> [   24.463134] INFO: Stall ended before state dump start
>> [   87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19)
>> [   87.513122] INFO: Stall ended before state dump start
>> [  150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19)
>> [  150.563232] INFO: Stall ended before state dump start
>> [  213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19)
>> [  213.613220] INFO: Stall ended before state dump start
>> [  276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19)
>> [  276.663238] INFO: Stall ended before state dump start
>> 
>> BR,
>> 
>> marek
>> 
>> -- 
>> as simple and primitive as possible
>> -------------------------------------------------
>> Marek Belisko - OPEN-NANDRA
>> Freelance Developer
>> 
>> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
>> Tel: +421 915 052 184
>> skype: marekwhite
>> twitter: #opennandra
>> web: http://open-nandra.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> David Fries <david@fries.net>    PGP pub CB1EE8F0
> http://fries.net/~david/
> 
> 
> commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a
> Author: David Fries <David@Fries.net>
> Date:   Sun Apr 13 15:36:45 2014 -0500
> 
>    w1: avoid recursive device_add
> 
>    __w1_attach_slave_device calls device_add which calls w1_bus_notify
>    which calls for the w1_bq27000 slave driver, which calls
>    platform_device_add and device_add and deadlocks on getting
>    &(&priv->bus_notifier)->rwsem as it is still held in the previous
>    device_add.  This avoids the problem by processing the family
>    add/remove outside of the slave device_add call.  This does mean that
>    the slave device isn't setup when the notification that a slave is
>    available goes out.
> 
>    Reported-by: Belisko Marek <marek.belisko@gmail.com>
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index b96f61b..dde09c1 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -614,27 +614,11 @@ end:
> 	return err;
> }
> 
> -/*
> - * Handle sysfs file creation and removal here, before userspace is told that
> - * the device is added / removed from the system
> - */
> -static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> -			 void *data)
> +static int w1_family_notify(unsigned long action, struct w1_slave *sl)
> {
> -	struct device *dev = data;
> -	struct w1_slave *sl;
> 	struct w1_family_ops *fops;
> 	int err;
> 
> -	/*
> -	 * Only care about slave devices at the moment.  Yes, we should use a
> -	 * separate "type" for this, but for now, look at the release function
> -	 * to know which type it is...
> -	 */
> -	if (dev->release != w1_slave_release)
> -		return 0;
> -
> -	sl = dev_to_w1_slave(dev);
> 	fops = sl->family->fops;
> 
> 	if (!fops)
> @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> 	return 0;
> }
> 
> -static struct notifier_block w1_bus_nb = {
> -	.notifier_call = w1_bus_notify,
> -};
> -
> static int __w1_attach_slave_device(struct w1_slave *sl)
> {
> 	int err;
> @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
> 			dev_name(&sl->dev), err);
> 		return err;
> 	}
> +	w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
> 
> 
> 	dev_set_uevent_suppress(&sl->dev, false);
> @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl)
> 		msg.type = W1_SLAVE_REMOVE;
> 		w1_netlink_send(sl->master, &msg);
> 
> +		w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
> 		device_unregister(&sl->dev);
> 		#ifdef DEBUG
> 		memset(sl, 0, sizeof(*sl));
> @@ -1186,10 +1168,6 @@ static int __init w1_init(void)
> 		goto err_out_exit_init;
> 	}
> 
> -	retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
> -	if (retval)
> -		goto err_out_bus_unregister;
> -
> 	retval = driver_register(&w1_master_driver);
> 	if (retval) {
> 		printk(KERN_ERR


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: w1: 3.14-rc7 - possible recursive locking detected
  2014-04-13 22:24       ` David Fries
  2014-04-14 13:10         ` Dr. H. Nikolaus Schaller
@ 2014-04-14 15:04         ` Greg Kroah-Hartman
  2014-04-16  6:21         ` [PATCH] w1: avoid recursive device_add David Fries
  2 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2014-04-14 15:04 UTC (permalink / raw)
  To: David Fries
  Cc: Belisko Marek, Evgeniy Polyakov, LKML, Dr. H. Nikolaus Schaller

On Sun, Apr 13, 2014 at 05:24:01PM -0500, David Fries wrote:
> Belisko Marek,
> Here is a possible solution, could you give it a try and report back?
> 
> Greg Kroah-Hartman,
> Evgeniy asked me to look into this report.  I don't have the
> reporter's hardware configuration, but I wouldn't think that would be
> needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
> loading the slave device and manually adding a slave device with that
> family id.  Even then I didn't reproduce the reported recursive
> locking error.  I saw unrelated locking reports, but not this one.  I
> wrote up the included patch, but that undoes the notify changes that
> you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
> that commit.  Specifically these two lines,
> 
>         err = device_register(&sl->dev);
> 	...
> +	dev_set_uevent_suppress(&sl->dev, false);
> +	kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
> 
> Wouldn't the default be to not suppress?  Nothing in the W1 system
> enables suppressing so is that even needed?  (I'm fine with saying
> it's a good idea).
> device_register at some point must call device_add and device_add
> calls kobject_uevent(&dev->kobj, KOBJ_ADD);
> so doesn't the KOBJ_ADD send the add a second time?  As in it
> shouldn't be needed?
> Can the suppress be called before device_register to avoid the
> automatic notify, then after it returns setup the slave device as this
> patch does to avoid this problem report, and then call the KOBJ_ADD to
> make everything happy?

I really have no idea, if your fix resolves an issue, that's great, I'll
be glad to take it.  I have no w1 devices to test any of this, and don't
even remember writing that kernel patch :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] w1: avoid recursive device_add
  2014-04-13 22:24       ` David Fries
  2014-04-14 13:10         ` Dr. H. Nikolaus Schaller
  2014-04-14 15:04         ` Greg Kroah-Hartman
@ 2014-04-16  6:21         ` David Fries
  2014-04-16 20:46           ` Dr. H. Nikolaus Schaller
  2 siblings, 1 reply; 9+ messages in thread
From: David Fries @ 2014-04-16  6:21 UTC (permalink / raw)
  To: Evgeniy Polyakov, Greg Kroah-Hartman
  Cc: Belisko Marek, LKML, Dr. H. Nikolaus Schaller

__w1_attach_slave_device calls device_add which calls w1_bus_notify
which calls the w1_bq27000 slave driver, which calls
platform_device_add and device_add and deadlocks on getting
&(&priv->bus_notifier)->rwsem as it is still held in the previous
device_add.  This avoids the problem by processing the family
add/remove outside of the slave device_add call.

Commit 47eba33a0997fc7362a introduced this deadlock and added
a KOBJ_ADD, as the add was already reported in device_register two add
events were being sent.  This change suppresses the device_register
add so that any slave device sysfs entries are setup before the add
goes out.

Belisko Marek reported this change fixed the deadlock he was seeing on
ARM device tree, while testing on my x86-64 system never saw the
deadlock.

Reported-by: Belisko Marek <marek.belisko@gmail.com>
Signed-off-by: David Fries <David@Fries.net>
---
Evgeniy any comments?  The patch looks good to me.
Dr. H. Nikolaus Schaller, Can you give this patch another test?  I
don't expect any problems, but it is now only sending one add instead
of two which is a change.
I verified two KOBJ_ADD notifications were going out prior to this
change, and only one after.  There should only be one, also verified
that the w1_bq27000 add notification went out before the slave device
id, showing the suppression was working.  Used
udevadm monitor --kernel --property
for notification monitoring.

 drivers/w1/w1.c |   32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index b96f61b..ff52618 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -614,27 +614,11 @@ end:
 	return err;
 }
 
-/*
- * Handle sysfs file creation and removal here, before userspace is told that
- * the device is added / removed from the system
- */
-static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
-			 void *data)
+static int w1_family_notify(unsigned long action, struct w1_slave *sl)
 {
-	struct device *dev = data;
-	struct w1_slave *sl;
 	struct w1_family_ops *fops;
 	int err;
 
-	/*
-	 * Only care about slave devices at the moment.  Yes, we should use a
-	 * separate "type" for this, but for now, look at the release function
-	 * to know which type it is...
-	 */
-	if (dev->release != w1_slave_release)
-		return 0;
-
-	sl = dev_to_w1_slave(dev);
 	fops = sl->family->fops;
 
 	if (!fops)
@@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
 	return 0;
 }
 
-static struct notifier_block w1_bus_nb = {
-	.notifier_call = w1_bus_notify,
-};
-
 static int __w1_attach_slave_device(struct w1_slave *sl)
 {
 	int err;
@@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
 	dev_dbg(&sl->dev, "%s: registering %s as %p.\n", __func__,
 		dev_name(&sl->dev), sl);
 
+	/* suppress for w1_family_notify before sending KOBJ_ADD */
+	dev_set_uevent_suppress(&sl->dev, true);
+
 	err = device_register(&sl->dev);
 	if (err < 0) {
 		dev_err(&sl->dev,
@@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
 			dev_name(&sl->dev), err);
 		return err;
 	}
-
+	w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
 
 	dev_set_uevent_suppress(&sl->dev, false);
 	kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
@@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl)
 		msg.type = W1_SLAVE_REMOVE;
 		w1_netlink_send(sl->master, &msg);
 
+		w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
 		device_unregister(&sl->dev);
 		#ifdef DEBUG
 		memset(sl, 0, sizeof(*sl));
@@ -1186,10 +1170,6 @@ static int __init w1_init(void)
 		goto err_out_exit_init;
 	}
 
-	retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
-	if (retval)
-		goto err_out_bus_unregister;
-
 	retval = driver_register(&w1_master_driver);
 	if (retval) {
 		printk(KERN_ERR
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] w1: avoid recursive device_add
  2014-04-16  6:21         ` [PATCH] w1: avoid recursive device_add David Fries
@ 2014-04-16 20:46           ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 9+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2014-04-16 20:46 UTC (permalink / raw)
  To: David Fries; +Cc: Evgeniy Polyakov, Greg Kroah-Hartman, Belisko Marek, LKML

Hi David,

Am 16.04.2014 um 08:21 schrieb David Fries:

> __w1_attach_slave_device calls device_add which calls w1_bus_notify
> which calls the w1_bq27000 slave driver, which calls
> platform_device_add and device_add and deadlocks on getting
> &(&priv->bus_notifier)->rwsem as it is still held in the previous
> device_add.  This avoids the problem by processing the family
> add/remove outside of the slave device_add call.
> 
> Commit 47eba33a0997fc7362a introduced this deadlock and added
> a KOBJ_ADD, as the add was already reported in device_register two add
> events were being sent.  This change suppresses the device_register
> add so that any slave device sysfs entries are setup before the add
> goes out.
> 
> Belisko Marek reported this change fixed the deadlock he was seeing on
> ARM device tree, while testing on my x86-64 system never saw the
> deadlock.
> 
> Reported-by: Belisko Marek <marek.belisko@gmail.com>
> Signed-off-by: David Fries <David@Fries.net>
> ---
> Evgeniy any comments?  The patch looks good to me.
> Dr. H. Nikolaus Schaller, Can you give this patch another test?  I
> don't expect any problems, but it is now only sending one add instead
> of two which is a change.

It merges fine and we don't see the problematic deadlock.

BR,
Nikolaus

> I verified two KOBJ_ADD notifications were going out prior to this
> change, and only one after.  There should only be one, also verified
> that the w1_bq27000 add notification went out before the slave device
> id, showing the suppression was working.  Used
> udevadm monitor --kernel --property
> for notification monitoring.
> 
> drivers/w1/w1.c |   32 ++++++--------------------------
> 1 file changed, 6 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index b96f61b..ff52618 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -614,27 +614,11 @@ end:
> 	return err;
> }
> 
> -/*
> - * Handle sysfs file creation and removal here, before userspace is told that
> - * the device is added / removed from the system
> - */
> -static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> -			 void *data)
> +static int w1_family_notify(unsigned long action, struct w1_slave *sl)
> {
> -	struct device *dev = data;
> -	struct w1_slave *sl;
> 	struct w1_family_ops *fops;
> 	int err;
> 
> -	/*
> -	 * Only care about slave devices at the moment.  Yes, we should use a
> -	 * separate "type" for this, but for now, look at the release function
> -	 * to know which type it is...
> -	 */
> -	if (dev->release != w1_slave_release)
> -		return 0;
> -
> -	sl = dev_to_w1_slave(dev);
> 	fops = sl->family->fops;
> 
> 	if (!fops)
> @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> 	return 0;
> }
> 
> -static struct notifier_block w1_bus_nb = {
> -	.notifier_call = w1_bus_notify,
> -};
> -
> static int __w1_attach_slave_device(struct w1_slave *sl)
> {
> 	int err;
> @@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
> 	dev_dbg(&sl->dev, "%s: registering %s as %p.\n", __func__,
> 		dev_name(&sl->dev), sl);
> 
> +	/* suppress for w1_family_notify before sending KOBJ_ADD */
> +	dev_set_uevent_suppress(&sl->dev, true);
> +
> 	err = device_register(&sl->dev);
> 	if (err < 0) {
> 		dev_err(&sl->dev,
> @@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
> 			dev_name(&sl->dev), err);
> 		return err;
> 	}
> -
> +	w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
> 
> 	dev_set_uevent_suppress(&sl->dev, false);
> 	kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
> @@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl)
> 		msg.type = W1_SLAVE_REMOVE;
> 		w1_netlink_send(sl->master, &msg);
> 
> +		w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
> 		device_unregister(&sl->dev);
> 		#ifdef DEBUG
> 		memset(sl, 0, sizeof(*sl));
> @@ -1186,10 +1170,6 @@ static int __init w1_init(void)
> 		goto err_out_exit_init;
> 	}
> 
> -	retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
> -	if (retval)
> -		goto err_out_bus_unregister;
> -
> 	retval = driver_register(&w1_master_driver);
> 	if (retval) {
> 		printk(KERN_ERR
> -- 
> 1.7.10.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-04-16 20:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-17 21:38 w1: 3.14-rc7 - possible recursive locking detected Belisko Marek
2014-03-23 21:50 ` David Fries
2014-03-24  5:55   ` Dr. H. Nikolaus Schaller
     [not found]   ` <698551395662480@web15j.yandex.ru>
2014-04-08 19:47     ` Belisko Marek
2014-04-13 22:24       ` David Fries
2014-04-14 13:10         ` Dr. H. Nikolaus Schaller
2014-04-14 15:04         ` Greg Kroah-Hartman
2014-04-16  6:21         ` [PATCH] w1: avoid recursive device_add David Fries
2014-04-16 20:46           ` Dr. H. Nikolaus Schaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox