* w1: 3.14-rc7 - possible recursive locking detected
@ 2014-03-17 21:38 Belisko Marek
2014-03-23 21:50 ` David Fries
0 siblings, 1 reply; 9+ messages in thread
From: Belisko Marek @ 2014-03-17 21:38 UTC (permalink / raw)
To: LKML, zbr; +Cc: Dr. H. Nikolaus Schaller
Hi,
booting latest 3.14-rc7 on gta04 board gives following warning:
[ 3.101409] Driver for 1-wire Dallas network protocol.
[ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
Interrupt mode
[ 3.330810]
[ 3.332397] =============================================
[ 3.338073] [ INFO: possible recursive locking detected ]
[ 3.343749] 3.14.0-rc7-gta04 #520 Not tainted
[ 3.348327] ---------------------------------------------
[ 3.353973] w1_bus_master1/805 is trying to acquire lock:
[ 3.359649] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[ 3.370300]
[ 3.370300] but task is already holding lock:
[ 3.376434] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[ 3.387084]
[ 3.387084] other info that might help us debug this:
[ 3.393920] Possible unsafe locking scenario:
[ 3.393920]
[ 3.400146] CPU0
[ 3.402709] ----
[ 3.405273] lock(&(&priv->bus_notifier)->rwsem);
[ 3.410308] lock(&(&priv->bus_notifier)->rwsem);
[ 3.415374]
[ 3.415374] *** DEADLOCK ***
[ 3.415374]
[ 3.421569] May be due to missing lock nesting notation
[ 3.421569]
[ 3.428710] 2 locks held by w1_bus_master1/805:
[ 3.433471] #0: (&dev->mutex#2){+.+.+.}, at: [<c03bc5d4>]
w1_attach_slave_device+0xb8/0x1c4
[ 3.442504] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
[<c005977c>] __blocking_notifier_call_chain+0x28/0x58
[ 3.453582]
[ 3.453582] stack backtrace:
[ 3.458190] CPU: 0 PID: 805 Comm: w1_bus_master1 Not tainted
3.14.0-rc7-gta04 #520
[ 3.466156] [<c0014418>] (unwind_backtrace) from [<c0011788>]
(show_stack+0x10/0x14)
[ 3.474304] [<c0011788>] (show_stack) from [<c04ee204>]
(dump_stack+0x68/0x84)
[ 3.481933] [<c04ee204>] (dump_stack) from [<c007616c>]
(print_deadlock_bug+0xc0/0xf0)
[ 3.490264] [<c007616c>] (print_deadlock_bug) from [<c0077a34>]
(validate_chain.isra.27+0x4e0/0x538)
[ 3.499847] [<c0077a34>] (validate_chain.isra.27) from [<c0078864>]
(__lock_acquire+0x72c/0x838)
[ 3.509094] [<c0078864>] (__lock_acquire) from [<c0078f84>]
(lock_acquire+0xf4/0x118)
[ 3.517333] [<c0078f84>] (lock_acquire) from [<c04f69b0>]
(down_read+0x28/0x38)
[ 3.525024] [<c04f69b0>] (down_read) from [<c005977c>]
(__blocking_notifier_call_chain+0x28/0x58)
[ 3.534362] [<c005977c>] (__blocking_notifier_call_chain) from
[<c00597c0>] (blocking_notifier_call_chain+0x14/0x18)
[ 3.545440] [<c00597c0>] (blocking_notifier_call_chain) from
[<c0306940>] (device_add+0x1f4/0x368)
[ 3.554870] [<c0306940>] (device_add) from [<c030a158>]
(platform_device_add+0x138/0x1c8)
[ 3.563476] [<c030a158>] (platform_device_add) from [<c03bf31c>]
(w1_bq27000_add_slave+0x44/0x74)
[ 3.572784] [<c03bf31c>] (w1_bq27000_add_slave) from [<c03bca78>]
(w1_bus_notify.part.0+0x48/0xc8)
[ 3.582214] [<c03bca78>] (w1_bus_notify.part.0) from [<c04f996c>]
(notifier_call_chain+0x38/0x68)
[ 3.591522] [<c04f996c>] (notifier_call_chain) from [<c0059798>]
(__blocking_notifier_call_chain+0x44/0x58)
[ 3.601776] [<c0059798>] (__blocking_notifier_call_chain) from
[<c00597c0>] (blocking_notifier_call_chain+0x14/0x18)
[ 3.612823] [<c00597c0>] (blocking_notifier_call_chain) from
[<c0306940>] (device_add+0x1f4/0x368)
[ 3.622253] [<c0306940>] (device_add) from [<c03bc4a0>]
(__w1_attach_slave_device+0x9c/0x118)
[ 3.631225] [<c03bc4a0>] (__w1_attach_slave_device) from
[<c03bc644>] (w1_attach_slave_device+0x128/0x1c4)
[ 3.641357] [<c03bc644>] (w1_attach_slave_device) from [<c03bc75c>]
(w1_slave_found+0x7c/0x98)
[ 3.650421] [<c03bc75c>] (w1_slave_found) from [<c03bea10>]
(omap_w1_search_bus+0x54/0x5c)
[ 3.659118] [<c03bea10>] (omap_w1_search_bus) from [<c03be950>]
(w1_search_devices+0x3c/0x48)
[ 3.668090] [<c03be950>] (w1_search_devices) from [<c03bd210>]
(w1_search_process_cb+0x4c/0xc4)
[ 3.677215] [<c03bd210>] (w1_search_process_cb) from [<c03bd2e0>]
(w1_process+0x58/0x10c)
[ 3.685821] [<c03bd2e0>] (w1_process) from [<c0055850>] (kthread+0xc8/0xdc)
[ 3.693145] [<c0055850>] (kthread) from [<c000dc68>]
(ret_from_fork+0x14/0x2c)
[ 3.751068] omap_wdt: OMAP Watchdog Timer Rev 0x31: initial timeout 60 sec
AFAIK we didn't see that on (at least 3/14-rc2).
BR,
marek
--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer
Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: w1: 3.14-rc7 - possible recursive locking detected 2014-03-17 21:38 w1: 3.14-rc7 - possible recursive locking detected Belisko Marek @ 2014-03-23 21:50 ` David Fries 2014-03-24 5:55 ` Dr. H. Nikolaus Schaller [not found] ` <698551395662480@web15j.yandex.ru> 0 siblings, 2 replies; 9+ messages in thread From: David Fries @ 2014-03-23 21:50 UTC (permalink / raw) To: Belisko Marek; +Cc: LKML, zbr, Dr. H. Nikolaus Schaller On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: > Hi, > > booting latest 3.14-rc7 on gta04 board gives following warning: > > [ 3.101409] Driver for 1-wire Dallas network protocol. > [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > [ 3.400146] CPU0 > [ 3.402709] ---- > [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); > [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); > [ 3.415374] > [ 3.415374] *** DEADLOCK *** > > AFAIK we didn't see that on (at least 3/14-rc2). The last drivers/w1 change went in before v3.14-rc1, so if it is something in the w1 code, it has either been there or something else in the rest of the kernel changed to expose it. I'm using the ds2490 USB w1 dongle, I didn't see that problem when enabling lock debugging on 3.14.0-rc5+ which includes some changes I'm working on. https://github.com/dfries/linux.git w1_rework -- David Fries <david@fries.net> PGP pub CB1EE8F0 http://fries.net/~david/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: w1: 3.14-rc7 - possible recursive locking detected 2014-03-23 21:50 ` David Fries @ 2014-03-24 5:55 ` Dr. H. Nikolaus Schaller [not found] ` <698551395662480@web15j.yandex.ru> 1 sibling, 0 replies; 9+ messages in thread From: Dr. H. Nikolaus Schaller @ 2014-03-24 5:55 UTC (permalink / raw) To: David Fries; +Cc: Belisko Marek, LKML, zbr Hi, Am 23.03.2014 um 22:50 schrieb David Fries: > On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: >> Hi, >> >> booting latest 3.14-rc7 on gta04 board gives following warning: >> >> [ 3.101409] Driver for 1-wire Dallas network protocol. >> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >> [ 3.400146] CPU0 >> [ 3.402709] ---- >> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.415374] >> [ 3.415374] *** DEADLOCK *** >> >> AFAIK we didn't see that on (at least 3/14-rc2). > > The last drivers/w1 change went in before v3.14-rc1, so if it is > something in the w1 code, it has either been there or something else > in the rest of the kernel changed to expose it. I'm using the ds2490 > USB w1 dongle, I didn't see that problem when enabling lock debugging > on 3.14.0-rc5+ which includes some changes I'm working on. > > https://github.com/dfries/linux.git w1_rework Further analysis shows that It only occurs when using the device tree. So it may have to do something with nested driver probing/locking running differently between DT and board file. BR, Nikolaus ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <698551395662480@web15j.yandex.ru>]
* Re: w1: 3.14-rc7 - possible recursive locking detected [not found] ` <698551395662480@web15j.yandex.ru> @ 2014-04-08 19:47 ` Belisko Marek 2014-04-13 22:24 ` David Fries 0 siblings, 1 reply; 9+ messages in thread From: Belisko Marek @ 2014-04-08 19:47 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Fries, LKML, Dr. H. Nikolaus Schaller Hi, On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote: > Hi everyone > > 24.03.2014, 01:50, "David Fries" <David@Fries.net>: >> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: >>> booting latest 3.14-rc7 on gta04 board gives following warning: >>> >>> [ 3.101409] Driver for 1-wire Dallas network protocol. >>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >>> [ 3.400146] CPU0 >>> [ 3.402709] ---- >>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); >>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); >>> [ 3.415374] >>> [ 3.415374] *** DEADLOCK *** >>> >>> AFAIK we didn't see that on (at least 3/14-rc2). >> >> The last drivers/w1 change went in before v3.14-rc1, so if it is >> something in the w1 code, it has either been there or something else >> in the rest of the kernel changed to expose it. I'm using the ds2490 >> USB w1 dongle, I didn't see that problem when enabling lock debugging >> on 3.14.0-rc5+ which includes some changes I'm working on. >> >> https://github.com/dfries/linux.git w1_rework > > Can it be a bug in omap w1 driver? We test with actual Linus master and it's even worse (it hangs completely). Trace look little bit different and it only happens when booting via device tree (not with board file). [ 2.807434] Driver for 1-wire Dallas network protocol. [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in Interrupt mode [ 3.034271] call_modprobe: w1-family-0x1 [ 3.051818] [ 3.053405] ============================================= [ 3.059051] [ INFO: possible recursive locking detected ] [ 3.064727] 3.14.0-gta04 #556 Not tainted [ 3.068939] --------------------------------------------- [ 3.074615] w1_bus_master1/795 is trying to acquire lock: [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 [ 3.090911] [ 3.090911] but task is already holding lock: [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 [ 3.107666] [ 3.107666] other info that might help us debug this: [ 3.114501] Possible unsafe locking scenario: [ 3.114501] [ 3.120727] CPU0 [ 3.123291] ---- [ 3.125854] lock(&(&priv->bus_notifier)->rwsem); [ 3.130889] lock(&(&priv->bus_notifier)->rwsem); [ 3.135925] [ 3.135925] *** DEADLOCK *** [ 3.135925] [ 3.142150] May be due to missing lock nesting notation [ 3.142150] [ 3.149291] 2 locks held by w1_bus_master1/795: [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>] w1_attach_slave_device+0xc4/0x1c8 [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 [ 3.174163] [ 3.174163] stack backtrace: [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted 3.14.0-gta04 #556 [ 3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>] (show_stack+0x10/0x14) [ 3.194488] [<c0010f68>] (show_stack) from [<c04f0900>] (dump_stack+0x6c/0x88) [ 3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>] (print_deadlock_bug+0xc0/0xf0) [ 3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>] (validate_chain.isra.29+0x4dc/0x534) [ 3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>] (__lock_acquire+0x728/0x834) [ 3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>] (lock_acquire+0xf4/0x118) [ 3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>] (down_read+0x24/0x34) [ 3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>] (__blocking_notifier_call_chain+0x28/0x58) [ 3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) [ 3.265563] [<c0058af4>] (blocking_notifier_call_chain) from [<c0301db8>] (device_add+0x1f4/0x368) [ 3.274993] [<c0301db8>] (device_add) from [<c03055cc>] (platform_device_add+0x138/0x1c8) [ 3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>] (w1_bq27000_add_slave+0x44/0x74) [ 3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>] (w1_bus_notify.part.0+0x48/0xc8) [ 3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>] (notifier_call_chain+0x38/0x68) [ 3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>] (__blocking_notifier_call_chain+0x44/0x58) [ 3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) [ 3.332946] [<c0058af4>] (blocking_notifier_call_chain) from [<c0301db8>] (device_add+0x1f4/0x368) [ 3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>] (__w1_attach_slave_device+0x9c/0x134) [ 3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8) [ 3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>] (w1_slave_found+0x7c/0x98) [ 3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>] (omap_w1_search_bus+0x54/0x5c) [ 3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>] (w1_search_devices+0x3c/0x48) [ 3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>] (w1_search_process_cb+0x64/0x108) [ 3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>] (w1_process+0x68/0x14c) [ 3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0) [ 3.413299] [<c0054bdc>] (kthread) from [<c000dca8>] (ret_from_fork+0x14/0x2c) [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19) [ 24.463134] INFO: Stall ended before state dump start [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19) [ 87.513122] INFO: Stall ended before state dump start [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19) [ 150.563232] INFO: Stall ended before state dump start [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19) [ 213.613220] INFO: Stall ended before state dump start [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19) [ 276.663238] INFO: Stall ended before state dump start BR, marek -- as simple and primitive as possible ------------------------------------------------- Marek Belisko - OPEN-NANDRA Freelance Developer Ruska Nova Ves 219 | Presov, 08005 Slovak Republic Tel: +421 915 052 184 skype: marekwhite twitter: #opennandra web: http://open-nandra.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: w1: 3.14-rc7 - possible recursive locking detected 2014-04-08 19:47 ` Belisko Marek @ 2014-04-13 22:24 ` David Fries 2014-04-14 13:10 ` Dr. H. Nikolaus Schaller ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: David Fries @ 2014-04-13 22:24 UTC (permalink / raw) To: Belisko Marek Cc: Evgeniy Polyakov, LKML, Dr. H. Nikolaus Schaller, Greg Kroah-Hartman Belisko Marek, Here is a possible solution, could you give it a try and report back? Greg Kroah-Hartman, Evgeniy asked me to look into this report. I don't have the reporter's hardware configuration, but I wouldn't think that would be needed, just some w1 bus master (even W1_MASTER_GPIO might work), then loading the slave device and manually adding a slave device with that family id. Even then I didn't reproduce the reported recursive locking error. I saw unrelated locking reports, but not this one. I wrote up the included patch, but that undoes the notify changes that you added earlier in commit 47eba33a0997fc7, and I wanted to ask about that commit. Specifically these two lines, err = device_register(&sl->dev); ... + dev_set_uevent_suppress(&sl->dev, false); + kobject_uevent(&sl->dev.kobj, KOBJ_ADD); Wouldn't the default be to not suppress? Nothing in the W1 system enables suppressing so is that even needed? (I'm fine with saying it's a good idea). device_register at some point must call device_add and device_add calls kobject_uevent(&dev->kobj, KOBJ_ADD); so doesn't the KOBJ_ADD send the add a second time? As in it shouldn't be needed? Can the suppress be called before device_register to avoid the automatic notify, then after it returns setup the slave device as this patch does to avoid this problem report, and then call the KOBJ_ADD to make everything happy? I'm based on commit ce7613db2d8d4d5af2587a. On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote: > Hi, > > On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote: > > Hi everyone > > > > 24.03.2014, 01:50, "David Fries" <David@Fries.net>: > >> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: > >>> booting latest 3.14-rc7 on gta04 board gives following warning: > >>> > >>> [ 3.101409] Driver for 1-wire Dallas network protocol. > >>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > >>> [ 3.400146] CPU0 > >>> [ 3.402709] ---- > >>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); > >>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); > >>> [ 3.415374] > >>> [ 3.415374] *** DEADLOCK *** > >>> > >>> AFAIK we didn't see that on (at least 3/14-rc2). > >> > >> The last drivers/w1 change went in before v3.14-rc1, so if it is > >> something in the w1 code, it has either been there or something else > >> in the rest of the kernel changed to expose it. I'm using the ds2490 > >> USB w1 dongle, I didn't see that problem when enabling lock debugging > >> on 3.14.0-rc5+ which includes some changes I'm working on. > >> > >> https://github.com/dfries/linux.git w1_rework > > > > Can it be a bug in omap w1 driver? > We test with actual Linus master and it's even worse (it hangs completely). > Trace look little bit different and it only happens when booting via > device tree (not with board file). > > [ 2.807434] Driver for 1-wire Dallas network protocol. > [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > Interrupt mode > [ 3.034271] call_modprobe: w1-family-0x1 > [ 3.051818] > [ 3.053405] ============================================= > [ 3.059051] [ INFO: possible recursive locking detected ] > [ 3.064727] 3.14.0-gta04 #556 Not tainted > [ 3.068939] --------------------------------------------- > [ 3.074615] w1_bus_master1/795 is trying to acquire lock: > [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: > [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 > [ 3.090911] > [ 3.090911] but task is already holding lock: > [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: > [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 > [ 3.107666] > [ 3.107666] other info that might help us debug this: > [ 3.114501] Possible unsafe locking scenario: > [ 3.114501] > [ 3.120727] CPU0 > [ 3.123291] ---- > [ 3.125854] lock(&(&priv->bus_notifier)->rwsem); > [ 3.130889] lock(&(&priv->bus_notifier)->rwsem); > [ 3.135925] > [ 3.135925] *** DEADLOCK *** > [ 3.135925] > [ 3.142150] May be due to missing lock nesting notation > [ 3.142150] > [ 3.149291] 2 locks held by w1_bus_master1/795: > [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>] > w1_attach_slave_device+0xc4/0x1c8 > [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: > [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 > [ 3.174163] > [ 3.174163] stack backtrace: > [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted > 3.14.0-gta04 #556 > [ 3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>] > (show_stack+0x10/0x14) > [ 3.194488] [<c0010f68>] (show_stack) from [<c04f0900>] > (dump_stack+0x6c/0x88) > [ 3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>] > (print_deadlock_bug+0xc0/0xf0) > [ 3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>] > (validate_chain.isra.29+0x4dc/0x534) > [ 3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>] > (__lock_acquire+0x728/0x834) > [ 3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>] > (lock_acquire+0xf4/0x118) > [ 3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>] > (down_read+0x24/0x34) > [ 3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>] > (__blocking_notifier_call_chain+0x28/0x58) > [ 3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from > [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) > [ 3.265563] [<c0058af4>] (blocking_notifier_call_chain) from > [<c0301db8>] (device_add+0x1f4/0x368) > [ 3.274993] [<c0301db8>] (device_add) from [<c03055cc>] > (platform_device_add+0x138/0x1c8) > [ 3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>] > (w1_bq27000_add_slave+0x44/0x74) > [ 3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>] > (w1_bus_notify.part.0+0x48/0xc8) > [ 3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>] > (notifier_call_chain+0x38/0x68) > [ 3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>] > (__blocking_notifier_call_chain+0x44/0x58) > [ 3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from > [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) > [ 3.332946] [<c0058af4>] (blocking_notifier_call_chain) from > [<c0301db8>] (device_add+0x1f4/0x368) > [ 3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>] > (__w1_attach_slave_device+0x9c/0x134) > [ 3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from > [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8) > [ 3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>] > (w1_slave_found+0x7c/0x98) > [ 3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>] > (omap_w1_search_bus+0x54/0x5c) > [ 3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>] > (w1_search_devices+0x3c/0x48) > [ 3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>] > (w1_search_process_cb+0x64/0x108) > [ 3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>] > (w1_process+0x68/0x14c) > [ 3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0) > [ 3.413299] [<c0054bdc>] (kthread) from [<c000dca8>] > (ret_from_fork+0x14/0x2c) > [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {} > (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19) > [ 24.463134] INFO: Stall ended before state dump start > [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {} > (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19) > [ 87.513122] INFO: Stall ended before state dump start > [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {} > (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19) > [ 150.563232] INFO: Stall ended before state dump start > [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {} > (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19) > [ 213.613220] INFO: Stall ended before state dump start > [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {} > (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19) > [ 276.663238] INFO: Stall ended before state dump start > > BR, > > marek > > -- > as simple and primitive as possible > ------------------------------------------------- > Marek Belisko - OPEN-NANDRA > Freelance Developer > > Ruska Nova Ves 219 | Presov, 08005 Slovak Republic > Tel: +421 915 052 184 > skype: marekwhite > twitter: #opennandra > web: http://open-nandra.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- David Fries <david@fries.net> PGP pub CB1EE8F0 http://fries.net/~david/ commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a Author: David Fries <David@Fries.net> Date: Sun Apr 13 15:36:45 2014 -0500 w1: avoid recursive device_add __w1_attach_slave_device calls device_add which calls w1_bus_notify which calls for the w1_bq27000 slave driver, which calls platform_device_add and device_add and deadlocks on getting &(&priv->bus_notifier)->rwsem as it is still held in the previous device_add. This avoids the problem by processing the family add/remove outside of the slave device_add call. This does mean that the slave device isn't setup when the notification that a slave is available goes out. Reported-by: Belisko Marek <marek.belisko@gmail.com> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index b96f61b..dde09c1 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -614,27 +614,11 @@ end: return err; } -/* - * Handle sysfs file creation and removal here, before userspace is told that - * the device is added / removed from the system - */ -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, - void *data) +static int w1_family_notify(unsigned long action, struct w1_slave *sl) { - struct device *dev = data; - struct w1_slave *sl; struct w1_family_ops *fops; int err; - /* - * Only care about slave devices at the moment. Yes, we should use a - * separate "type" for this, but for now, look at the release function - * to know which type it is... - */ - if (dev->release != w1_slave_release) - return 0; - - sl = dev_to_w1_slave(dev); fops = sl->family->fops; if (!fops) @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, return 0; } -static struct notifier_block w1_bus_nb = { - .notifier_call = w1_bus_notify, -}; - static int __w1_attach_slave_device(struct w1_slave *sl) { int err; @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_name(&sl->dev), err); return err; } + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); dev_set_uevent_suppress(&sl->dev, false); @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl) msg.type = W1_SLAVE_REMOVE; w1_netlink_send(sl->master, &msg); + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); device_unregister(&sl->dev); #ifdef DEBUG memset(sl, 0, sizeof(*sl)); @@ -1186,10 +1168,6 @@ static int __init w1_init(void) goto err_out_exit_init; } - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb); - if (retval) - goto err_out_bus_unregister; - retval = driver_register(&w1_master_driver); if (retval) { printk(KERN_ERR ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: w1: 3.14-rc7 - possible recursive locking detected 2014-04-13 22:24 ` David Fries @ 2014-04-14 13:10 ` Dr. H. Nikolaus Schaller 2014-04-14 15:04 ` Greg Kroah-Hartman 2014-04-16 6:21 ` [PATCH] w1: avoid recursive device_add David Fries 2 siblings, 0 replies; 9+ messages in thread From: Dr. H. Nikolaus Schaller @ 2014-04-14 13:10 UTC (permalink / raw) To: David Fries; +Cc: Belisko Marek, Evgeniy Polyakov, LKML, Greg Kroah-Hartman Hi David, I have tested and it appears to suppress the locking issue on our GTA04 board. BR, Nikolaus Am 14.04.2014 um 00:24 schrieb David Fries: > Belisko Marek, > Here is a possible solution, could you give it a try and report back? > > Greg Kroah-Hartman, > Evgeniy asked me to look into this report. I don't have the > reporter's hardware configuration, but I wouldn't think that would be > needed, just some w1 bus master (even W1_MASTER_GPIO might work), then > loading the slave device and manually adding a slave device with that > family id. Even then I didn't reproduce the reported recursive > locking error. I saw unrelated locking reports, but not this one. I > wrote up the included patch, but that undoes the notify changes that > you added earlier in commit 47eba33a0997fc7, and I wanted to ask about > that commit. Specifically these two lines, > > err = device_register(&sl->dev); > ... > + dev_set_uevent_suppress(&sl->dev, false); > + kobject_uevent(&sl->dev.kobj, KOBJ_ADD); > > Wouldn't the default be to not suppress? Nothing in the W1 system > enables suppressing so is that even needed? (I'm fine with saying > it's a good idea). > device_register at some point must call device_add and device_add > calls kobject_uevent(&dev->kobj, KOBJ_ADD); > so doesn't the KOBJ_ADD send the add a second time? As in it > shouldn't be needed? > Can the suppress be called before device_register to avoid the > automatic notify, then after it returns setup the slave device as this > patch does to avoid this problem report, and then call the KOBJ_ADD to > make everything happy? > > I'm based on commit ce7613db2d8d4d5af2587a. > > On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote: >> Hi, >> >> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote: >>> Hi everyone >>> >>> 24.03.2014, 01:50, "David Fries" <David@Fries.net>: >>>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: >>>>> booting latest 3.14-rc7 on gta04 board gives following warning: >>>>> >>>>> [ 3.101409] Driver for 1-wire Dallas network protocol. >>>>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >>>>> [ 3.400146] CPU0 >>>>> [ 3.402709] ---- >>>>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem); >>>>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem); >>>>> [ 3.415374] >>>>> [ 3.415374] *** DEADLOCK *** >>>>> >>>>> AFAIK we didn't see that on (at least 3/14-rc2). >>>> >>>> The last drivers/w1 change went in before v3.14-rc1, so if it is >>>> something in the w1 code, it has either been there or something else >>>> in the rest of the kernel changed to expose it. I'm using the ds2490 >>>> USB w1 dongle, I didn't see that problem when enabling lock debugging >>>> on 3.14.0-rc5+ which includes some changes I'm working on. >>>> >>>> https://github.com/dfries/linux.git w1_rework >>> >>> Can it be a bug in omap w1 driver? >> We test with actual Linus master and it's even worse (it hangs completely). >> Trace look little bit different and it only happens when booting via >> device tree (not with board file). >> >> [ 2.807434] Driver for 1-wire Dallas network protocol. >> [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in >> Interrupt mode >> [ 3.034271] call_modprobe: w1-family-0x1 >> [ 3.051818] >> [ 3.053405] ============================================= >> [ 3.059051] [ INFO: possible recursive locking detected ] >> [ 3.064727] 3.14.0-gta04 #556 Not tainted >> [ 3.068939] --------------------------------------------- >> [ 3.074615] w1_bus_master1/795 is trying to acquire lock: >> [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.090911] >> [ 3.090911] but task is already holding lock: >> [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.107666] >> [ 3.107666] other info that might help us debug this: >> [ 3.114501] Possible unsafe locking scenario: >> [ 3.114501] >> [ 3.120727] CPU0 >> [ 3.123291] ---- >> [ 3.125854] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.130889] lock(&(&priv->bus_notifier)->rwsem); >> [ 3.135925] >> [ 3.135925] *** DEADLOCK *** >> [ 3.135925] >> [ 3.142150] May be due to missing lock nesting notation >> [ 3.142150] >> [ 3.149291] 2 locks held by w1_bus_master1/795: >> [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>] >> w1_attach_slave_device+0xc4/0x1c8 >> [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: >> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58 >> [ 3.174163] >> [ 3.174163] stack backtrace: >> [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted >> 3.14.0-gta04 #556 >> [ 3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>] >> (show_stack+0x10/0x14) >> [ 3.194488] [<c0010f68>] (show_stack) from [<c04f0900>] >> (dump_stack+0x6c/0x88) >> [ 3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>] >> (print_deadlock_bug+0xc0/0xf0) >> [ 3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>] >> (validate_chain.isra.29+0x4dc/0x534) >> [ 3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>] >> (__lock_acquire+0x728/0x834) >> [ 3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>] >> (lock_acquire+0xf4/0x118) >> [ 3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>] >> (down_read+0x24/0x34) >> [ 3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>] >> (__blocking_notifier_call_chain+0x28/0x58) >> [ 3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from >> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) >> [ 3.265563] [<c0058af4>] (blocking_notifier_call_chain) from >> [<c0301db8>] (device_add+0x1f4/0x368) >> [ 3.274993] [<c0301db8>] (device_add) from [<c03055cc>] >> (platform_device_add+0x138/0x1c8) >> [ 3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>] >> (w1_bq27000_add_slave+0x44/0x74) >> [ 3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>] >> (w1_bus_notify.part.0+0x48/0xc8) >> [ 3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>] >> (notifier_call_chain+0x38/0x68) >> [ 3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>] >> (__blocking_notifier_call_chain+0x44/0x58) >> [ 3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from >> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18) >> [ 3.332946] [<c0058af4>] (blocking_notifier_call_chain) from >> [<c0301db8>] (device_add+0x1f4/0x368) >> [ 3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>] >> (__w1_attach_slave_device+0x9c/0x134) >> [ 3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from >> [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8) >> [ 3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>] >> (w1_slave_found+0x7c/0x98) >> [ 3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>] >> (omap_w1_search_bus+0x54/0x5c) >> [ 3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>] >> (w1_search_devices+0x3c/0x48) >> [ 3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>] >> (w1_search_process_cb+0x64/0x108) >> [ 3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>] >> (w1_process+0x68/0x14c) >> [ 3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0) >> [ 3.413299] [<c0054bdc>] (kthread) from [<c000dca8>] >> (ret_from_fork+0x14/0x2c) >> [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19) >> [ 24.463134] INFO: Stall ended before state dump start >> [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19) >> [ 87.513122] INFO: Stall ended before state dump start >> [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19) >> [ 150.563232] INFO: Stall ended before state dump start >> [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19) >> [ 213.613220] INFO: Stall ended before state dump start >> [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {} >> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19) >> [ 276.663238] INFO: Stall ended before state dump start >> >> BR, >> >> marek >> >> -- >> as simple and primitive as possible >> ------------------------------------------------- >> Marek Belisko - OPEN-NANDRA >> Freelance Developer >> >> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic >> Tel: +421 915 052 184 >> skype: marekwhite >> twitter: #opennandra >> web: http://open-nandra.com >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > -- > David Fries <david@fries.net> PGP pub CB1EE8F0 > http://fries.net/~david/ > > > commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a > Author: David Fries <David@Fries.net> > Date: Sun Apr 13 15:36:45 2014 -0500 > > w1: avoid recursive device_add > > __w1_attach_slave_device calls device_add which calls w1_bus_notify > which calls for the w1_bq27000 slave driver, which calls > platform_device_add and device_add and deadlocks on getting > &(&priv->bus_notifier)->rwsem as it is still held in the previous > device_add. This avoids the problem by processing the family > add/remove outside of the slave device_add call. This does mean that > the slave device isn't setup when the notification that a slave is > available goes out. > > Reported-by: Belisko Marek <marek.belisko@gmail.com> > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index b96f61b..dde09c1 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -614,27 +614,11 @@ end: > return err; > } > > -/* > - * Handle sysfs file creation and removal here, before userspace is told that > - * the device is added / removed from the system > - */ > -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > - void *data) > +static int w1_family_notify(unsigned long action, struct w1_slave *sl) > { > - struct device *dev = data; > - struct w1_slave *sl; > struct w1_family_ops *fops; > int err; > > - /* > - * Only care about slave devices at the moment. Yes, we should use a > - * separate "type" for this, but for now, look at the release function > - * to know which type it is... > - */ > - if (dev->release != w1_slave_release) > - return 0; > - > - sl = dev_to_w1_slave(dev); > fops = sl->family->fops; > > if (!fops) > @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > return 0; > } > > -static struct notifier_block w1_bus_nb = { > - .notifier_call = w1_bus_notify, > -}; > - > static int __w1_attach_slave_device(struct w1_slave *sl) > { > int err; > @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) > dev_name(&sl->dev), err); > return err; > } > + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); > > > dev_set_uevent_suppress(&sl->dev, false); > @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl) > msg.type = W1_SLAVE_REMOVE; > w1_netlink_send(sl->master, &msg); > > + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); > device_unregister(&sl->dev); > #ifdef DEBUG > memset(sl, 0, sizeof(*sl)); > @@ -1186,10 +1168,6 @@ static int __init w1_init(void) > goto err_out_exit_init; > } > > - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb); > - if (retval) > - goto err_out_bus_unregister; > - > retval = driver_register(&w1_master_driver); > if (retval) { > printk(KERN_ERR ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: w1: 3.14-rc7 - possible recursive locking detected 2014-04-13 22:24 ` David Fries 2014-04-14 13:10 ` Dr. H. Nikolaus Schaller @ 2014-04-14 15:04 ` Greg Kroah-Hartman 2014-04-16 6:21 ` [PATCH] w1: avoid recursive device_add David Fries 2 siblings, 0 replies; 9+ messages in thread From: Greg Kroah-Hartman @ 2014-04-14 15:04 UTC (permalink / raw) To: David Fries Cc: Belisko Marek, Evgeniy Polyakov, LKML, Dr. H. Nikolaus Schaller On Sun, Apr 13, 2014 at 05:24:01PM -0500, David Fries wrote: > Belisko Marek, > Here is a possible solution, could you give it a try and report back? > > Greg Kroah-Hartman, > Evgeniy asked me to look into this report. I don't have the > reporter's hardware configuration, but I wouldn't think that would be > needed, just some w1 bus master (even W1_MASTER_GPIO might work), then > loading the slave device and manually adding a slave device with that > family id. Even then I didn't reproduce the reported recursive > locking error. I saw unrelated locking reports, but not this one. I > wrote up the included patch, but that undoes the notify changes that > you added earlier in commit 47eba33a0997fc7, and I wanted to ask about > that commit. Specifically these two lines, > > err = device_register(&sl->dev); > ... > + dev_set_uevent_suppress(&sl->dev, false); > + kobject_uevent(&sl->dev.kobj, KOBJ_ADD); > > Wouldn't the default be to not suppress? Nothing in the W1 system > enables suppressing so is that even needed? (I'm fine with saying > it's a good idea). > device_register at some point must call device_add and device_add > calls kobject_uevent(&dev->kobj, KOBJ_ADD); > so doesn't the KOBJ_ADD send the add a second time? As in it > shouldn't be needed? > Can the suppress be called before device_register to avoid the > automatic notify, then after it returns setup the slave device as this > patch does to avoid this problem report, and then call the KOBJ_ADD to > make everything happy? I really have no idea, if your fix resolves an issue, that's great, I'll be glad to take it. I have no w1 devices to test any of this, and don't even remember writing that kernel patch :) thanks, greg k-h ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] w1: avoid recursive device_add 2014-04-13 22:24 ` David Fries 2014-04-14 13:10 ` Dr. H. Nikolaus Schaller 2014-04-14 15:04 ` Greg Kroah-Hartman @ 2014-04-16 6:21 ` David Fries 2014-04-16 20:46 ` Dr. H. Nikolaus Schaller 2 siblings, 1 reply; 9+ messages in thread From: David Fries @ 2014-04-16 6:21 UTC (permalink / raw) To: Evgeniy Polyakov, Greg Kroah-Hartman Cc: Belisko Marek, LKML, Dr. H. Nikolaus Schaller __w1_attach_slave_device calls device_add which calls w1_bus_notify which calls the w1_bq27000 slave driver, which calls platform_device_add and device_add and deadlocks on getting &(&priv->bus_notifier)->rwsem as it is still held in the previous device_add. This avoids the problem by processing the family add/remove outside of the slave device_add call. Commit 47eba33a0997fc7362a introduced this deadlock and added a KOBJ_ADD, as the add was already reported in device_register two add events were being sent. This change suppresses the device_register add so that any slave device sysfs entries are setup before the add goes out. Belisko Marek reported this change fixed the deadlock he was seeing on ARM device tree, while testing on my x86-64 system never saw the deadlock. Reported-by: Belisko Marek <marek.belisko@gmail.com> Signed-off-by: David Fries <David@Fries.net> --- Evgeniy any comments? The patch looks good to me. Dr. H. Nikolaus Schaller, Can you give this patch another test? I don't expect any problems, but it is now only sending one add instead of two which is a change. I verified two KOBJ_ADD notifications were going out prior to this change, and only one after. There should only be one, also verified that the w1_bq27000 add notification went out before the slave device id, showing the suppression was working. Used udevadm monitor --kernel --property for notification monitoring. drivers/w1/w1.c | 32 ++++++-------------------------- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index b96f61b..ff52618 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -614,27 +614,11 @@ end: return err; } -/* - * Handle sysfs file creation and removal here, before userspace is told that - * the device is added / removed from the system - */ -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, - void *data) +static int w1_family_notify(unsigned long action, struct w1_slave *sl) { - struct device *dev = data; - struct w1_slave *sl; struct w1_family_ops *fops; int err; - /* - * Only care about slave devices at the moment. Yes, we should use a - * separate "type" for this, but for now, look at the release function - * to know which type it is... - */ - if (dev->release != w1_slave_release) - return 0; - - sl = dev_to_w1_slave(dev); fops = sl->family->fops; if (!fops) @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, return 0; } -static struct notifier_block w1_bus_nb = { - .notifier_call = w1_bus_notify, -}; - static int __w1_attach_slave_device(struct w1_slave *sl) { int err; @@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_dbg(&sl->dev, "%s: registering %s as %p.\n", __func__, dev_name(&sl->dev), sl); + /* suppress for w1_family_notify before sending KOBJ_ADD */ + dev_set_uevent_suppress(&sl->dev, true); + err = device_register(&sl->dev); if (err < 0) { dev_err(&sl->dev, @@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_name(&sl->dev), err); return err; } - + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); dev_set_uevent_suppress(&sl->dev, false); kobject_uevent(&sl->dev.kobj, KOBJ_ADD); @@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl) msg.type = W1_SLAVE_REMOVE; w1_netlink_send(sl->master, &msg); + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); device_unregister(&sl->dev); #ifdef DEBUG memset(sl, 0, sizeof(*sl)); @@ -1186,10 +1170,6 @@ static int __init w1_init(void) goto err_out_exit_init; } - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb); - if (retval) - goto err_out_bus_unregister; - retval = driver_register(&w1_master_driver); if (retval) { printk(KERN_ERR -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] w1: avoid recursive device_add 2014-04-16 6:21 ` [PATCH] w1: avoid recursive device_add David Fries @ 2014-04-16 20:46 ` Dr. H. Nikolaus Schaller 0 siblings, 0 replies; 9+ messages in thread From: Dr. H. Nikolaus Schaller @ 2014-04-16 20:46 UTC (permalink / raw) To: David Fries; +Cc: Evgeniy Polyakov, Greg Kroah-Hartman, Belisko Marek, LKML Hi David, Am 16.04.2014 um 08:21 schrieb David Fries: > __w1_attach_slave_device calls device_add which calls w1_bus_notify > which calls the w1_bq27000 slave driver, which calls > platform_device_add and device_add and deadlocks on getting > &(&priv->bus_notifier)->rwsem as it is still held in the previous > device_add. This avoids the problem by processing the family > add/remove outside of the slave device_add call. > > Commit 47eba33a0997fc7362a introduced this deadlock and added > a KOBJ_ADD, as the add was already reported in device_register two add > events were being sent. This change suppresses the device_register > add so that any slave device sysfs entries are setup before the add > goes out. > > Belisko Marek reported this change fixed the deadlock he was seeing on > ARM device tree, while testing on my x86-64 system never saw the > deadlock. > > Reported-by: Belisko Marek <marek.belisko@gmail.com> > Signed-off-by: David Fries <David@Fries.net> > --- > Evgeniy any comments? The patch looks good to me. > Dr. H. Nikolaus Schaller, Can you give this patch another test? I > don't expect any problems, but it is now only sending one add instead > of two which is a change. It merges fine and we don't see the problematic deadlock. BR, Nikolaus > I verified two KOBJ_ADD notifications were going out prior to this > change, and only one after. There should only be one, also verified > that the w1_bq27000 add notification went out before the slave device > id, showing the suppression was working. Used > udevadm monitor --kernel --property > for notification monitoring. > > drivers/w1/w1.c | 32 ++++++-------------------------- > 1 file changed, 6 insertions(+), 26 deletions(-) > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index b96f61b..ff52618 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -614,27 +614,11 @@ end: > return err; > } > > -/* > - * Handle sysfs file creation and removal here, before userspace is told that > - * the device is added / removed from the system > - */ > -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > - void *data) > +static int w1_family_notify(unsigned long action, struct w1_slave *sl) > { > - struct device *dev = data; > - struct w1_slave *sl; > struct w1_family_ops *fops; > int err; > > - /* > - * Only care about slave devices at the moment. Yes, we should use a > - * separate "type" for this, but for now, look at the release function > - * to know which type it is... > - */ > - if (dev->release != w1_slave_release) > - return 0; > - > - sl = dev_to_w1_slave(dev); > fops = sl->family->fops; > > if (!fops) > @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, > return 0; > } > > -static struct notifier_block w1_bus_nb = { > - .notifier_call = w1_bus_notify, > -}; > - > static int __w1_attach_slave_device(struct w1_slave *sl) > { > int err; > @@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl) > dev_dbg(&sl->dev, "%s: registering %s as %p.\n", __func__, > dev_name(&sl->dev), sl); > > + /* suppress for w1_family_notify before sending KOBJ_ADD */ > + dev_set_uevent_suppress(&sl->dev, true); > + > err = device_register(&sl->dev); > if (err < 0) { > dev_err(&sl->dev, > @@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) > dev_name(&sl->dev), err); > return err; > } > - > + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); > > dev_set_uevent_suppress(&sl->dev, false); > kobject_uevent(&sl->dev.kobj, KOBJ_ADD); > @@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl) > msg.type = W1_SLAVE_REMOVE; > w1_netlink_send(sl->master, &msg); > > + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); > device_unregister(&sl->dev); > #ifdef DEBUG > memset(sl, 0, sizeof(*sl)); > @@ -1186,10 +1170,6 @@ static int __init w1_init(void) > goto err_out_exit_init; > } > > - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb); > - if (retval) > - goto err_out_bus_unregister; > - > retval = driver_register(&w1_master_driver); > if (retval) { > printk(KERN_ERR > -- > 1.7.10.4 ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-04-16 20:46 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-17 21:38 w1: 3.14-rc7 - possible recursive locking detected Belisko Marek
2014-03-23 21:50 ` David Fries
2014-03-24 5:55 ` Dr. H. Nikolaus Schaller
[not found] ` <698551395662480@web15j.yandex.ru>
2014-04-08 19:47 ` Belisko Marek
2014-04-13 22:24 ` David Fries
2014-04-14 13:10 ` Dr. H. Nikolaus Schaller
2014-04-14 15:04 ` Greg Kroah-Hartman
2014-04-16 6:21 ` [PATCH] w1: avoid recursive device_add David Fries
2014-04-16 20:46 ` Dr. H. Nikolaus Schaller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox