From mboxrd@z Thu Jan 1 00:00:00 1970 From: Denys Fedoryshchenko Subject: Re: [PATCH] net/802/mrp: fix lockdep splat Date: Mon, 13 May 2013 15:53:30 +0300 Message-ID: <16b0a5ee4602cb9005dd67ea7ad0d4e8@visp.net.lb> References: <6901a7785249816fe13a98e3c0b35127@visp.net.lb> <1368447851.13473.119.camel@edumazet-glaptop> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Cong Wang , David Ward , hayeswang , Francois Romieu , netdev@vger.kernel.org To: Eric Dumazet Return-path: Received: from hosting.visp.net.lb ([194.146.153.11]:52386 "EHLO hosting.visp.net.lb" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751502Ab3EMMxd (ORCPT ); Mon, 13 May 2013 08:53:33 -0400 In-Reply-To: <1368447851.13473.119.camel@edumazet-glaptop> Sender: netdev-owner@vger.kernel.org List-ID: Hi Thank you for superfast answer :) It is changed a bit, but not disappeared. Relevant part from new dmesg: [ 3.936191] ================================= [ 3.936279] [ INFO: inconsistent lock state ] [ 3.936367] 3.9.2-build-0064 #6 Not tainted [ 3.936453] --------------------------------- [ 3.936540] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 3.936630] rmmod/1843 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3.936718] (&(&app->lock)->rlock#2){+.?...}, at: [] mrp_uninit_applicant+0x69/0xba [mrp] [ 3.936967] {IN-SOFTIRQ-W} state was registered at: [ 3.937055] [] __lock_acquire+0x284/0xd52 [ 3.937058] [] lock_acquire+0x71/0x85 [ 3.937058] [] _raw_spin_lock_bh+0x38/0x45 [ 3.937058] [] mrp_join_timer+0x11/0x3d [mrp] [ 3.937058] [] call_timer_fn+0x69/0xdc [ 3.937058] [] run_timer_softirq+0x171/0x1ab [ 3.937058] [] __do_softirq+0x98/0x153 [ 3.937058] [] irq_exit+0x47/0x82 [ 3.937058] [] smp_apic_timer_interrupt+0x64/0x71 [ 3.937058] [] apic_timer_interrupt+0x32/0x38 [ 3.937058] [] mutex_lock_nested+0x20/0x22 [ 3.937058] [] unlink_file_vma+0x25/0x75 [ 3.937058] [] free_pgtables+0x55/0x99 [ 3.937058] [] exit_mmap+0x83/0xd5 [ 3.937058] [] mmput+0x4c/0xc4 [ 3.937058] [] do_exit+0x2ac/0x730 [ 3.937058] [] do_group_exit+0x60/0x83 [ 3.937058] [] sys_exit_group+0x13/0x17 [ 3.937058] [] syscall_call+0x7/0xb [ 3.937058] irq event stamp: 4683 [ 3.937058] hardirqs last enabled at (4683): [] _raw_spin_unlock_irqrestore+0x36/0x3c [ 3.937058] hardirqs last disabled at (4682): [] _raw_spin_lock_irqsave+0x16/0x50 [ 3.937058] softirqs last enabled at (4646): [] neigh_ifdown+0x95/0xb5 [ 3.937058] softirqs last disabled at (4644): [] _raw_write_lock_bh+0xe/0x45 [ 3.937058] [ 3.937058] other info that might help us debug this: [ 3.937058] Possible unsafe locking scenario: [ 3.937058] [ 3.937058] CPU0 [ 3.937058] ---- [ 3.937058] lock(&(&app->lock)->rlock#2); [ 3.937058] [ 3.937058] lock(&(&app->lock)->rlock#2); [ 3.937058] [ 3.937058] *** DEADLOCK *** [ 3.937058] [ 3.937058] 3 locks held by rmmod/1843: [ 3.937058] #0: (&__lockdep_no_validate__){......}, at: [] device_lock+0xd/0xf [ 3.937058] #1: (&__lockdep_no_validate__){......}, at: [] device_lock+0xd/0xf [ 3.937058] #2: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0xf/0x11 [ 3.937058] [ 3.937058] stack backtrace: [ 3.937058] Pid: 1843, comm: rmmod Not tainted 3.9.2-build-0064 #6 [ 3.937058] Call Trace: [ 3.937058] [] valid_state+0x1f6/0x201 [ 3.937058] [] mark_lock+0xd1/0x1bb [ 3.937058] [] ? check_usage_forwards+0x77/0x77 [ 3.937058] [] __lock_acquire+0x2f8/0xd52 [ 3.937058] [] ? valid_state+0x168/0x201 [ 3.937058] [] ? mark_lock+0x26/0x1bb [ 3.937058] [] ? mark_held_locks+0x5d/0x7b [ 3.937058] [] ? _raw_spin_unlock_irqrestore+0x36/0x3c [ 3.937058] [] lock_acquire+0x71/0x85 [ 3.937058] [] ? mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [] _raw_spin_lock+0x33/0x40 [ 3.937058] [] ? mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [] mrp_uninit_applicant+0x69/0xba [mrp] [ 3.937058] [] ? kfree_call_rcu+0x17/0x19 [ 3.937058] [] vlan_mvrp_uninit_applicant+0xd/0xf [8021q] [ 3.937058] [] unregister_vlan_dev+0xb8/0xe2 [8021q] [ 3.937058] [] ? packet_mm_close+0x1f/0x1f [ 3.937058] [] vlan_device_event+0x307/0x35a [8021q] [ 3.937058] [] ? rcu_read_unlock+0x17/0x19 [ 3.937058] [] notifier_call_chain+0x26/0x48 [ 3.937058] [] raw_notifier_call_chain+0x1a/0x1c [ 3.937058] [] call_netdevice_notifiers+0x44/0x4b [ 3.937058] [] rollback_registered_many+0xe2/0x185 [ 3.937058] [] rollback_registered+0x1f/0x2c [ 3.937058] [] unregister_netdevice_queue+0x5d/0x87 [ 3.937058] [] ? mutex_lock_nested+0x20/0x22 [ 3.937058] [] ? rtnl_lock+0xf/0x11 [ 3.937058] [] unregister_netdev+0x18/0x1f [ 3.937058] [] rtl_remove_one+0x7f/0xf9 [r8169] [ 3.937058] [] ? __pm_runtime_resume+0x48/0x50 [ 3.937058] [] pci_device_remove+0x27/0x76 [ 3.937058] [] __device_release_driver+0x66/0xaa [ 3.937058] [] driver_detach+0x62/0x83 [ 3.937058] [] bus_remove_driver+0x69/0x88 [ 3.937058] [] driver_unregister+0x53/0x5a [ 3.937058] [] ? trace_hardirqs_on_caller+0x10e/0x13f [ 3.937058] [] pci_unregister_driver+0x10/0x5a [ 3.937058] [] rtl8169_pci_driver_exit+0xd/0xf [r8169] [ 3.937058] [] sys_delete_module+0x175/0x1c1 [ 3.937058] [] ? restore_all+0xf/0xf [ 3.937058] [] syscall_call+0x7/0xb On 2013-05-13 15:24, Eric Dumazet wrote: > From: Eric Dumazet > > On Mon, 2013-05-13 at 14:59 +0300, Denys Fedoryshchenko wrote: > Hi > > Noticed on 32-builds of my systems that r8169 is not working anymore at > all. > Problem started to occur somewhere after 3.6 kernels, but it is > difficult to try other kernels on this semi-embedded system, and i am > not completely sure about that. > Once i used another OS build, but 64-bit build with same latest kernel, > and it worked. > It spits netdev watchdog timeouts, and constantly shows interface > coming > up. > > Here is max debug output(dmesg) i got, with some inline comments: > > > Here i am trying to remove module > > [ 19.735147] ================================= > [ 19.735235] [ INFO: inconsistent lock state ] > [ 19.735324] 3.9.2-build-0063 #4 Not tainted > [ 19.735412] --------------------------------- > [ 19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > [ 19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes: > [ 19.735682] (&(&app->lock)->rlock#2){+.?...}, at: [] > mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.735932] {IN-SOFTIRQ-W} state was registered at: > [ 19.736002] [] __lock_acquire+0x284/0xd52 > [ 19.736002] [] lock_acquire+0x71/0x85 > [ 19.736002] [] _raw_spin_lock+0x33/0x40 > [ 19.736002] [] mrp_join_timer+0x11/0x3d [mrp] > [ 19.736002] [] call_timer_fn+0x69/0xdc > [ 19.736002] [] run_timer_softirq+0x171/0x1ab > [ 19.736002] [] __do_softirq+0x98/0x153 > [ 19.736002] [] irq_exit+0x47/0x82 > [ 19.736002] [] smp_apic_timer_interrupt+0x64/0x71 > [ 19.736002] [] apic_timer_interrupt+0x32/0x38 > [ 19.736002] [] cpu_idle+0x55/0x6f > [ 19.736002] [] rest_init+0xa1/0xa7 > [ 19.736002] [] start_kernel+0x300/0x307 > [ 19.736002] [] i386_start_kernel+0x79/0x7d > [ 19.736002] irq event stamp: 4379 > [ 19.736002] hardirqs last enabled at (4379): [] > _raw_spin_unlock_irqrestore+0x36/0x3c > [ 19.736002] hardirqs last disabled at (4378): [] > _raw_spin_lock_irqsave+0x16/0x50 > [ 19.736002] softirqs last enabled at (4342): [] > neigh_ifdown+0x95/0xb5 > [ 19.736002] softirqs last disabled at (4340): [] > _raw_write_lock_bh+0xe/0x45 > [ 19.736002] > [ 19.736002] other info that might help us debug this: > [ 19.736002] Possible unsafe locking scenario: > [ 19.736002] > [ 19.736002] CPU0 > [ 19.736002] ---- > [ 19.736002] lock(&(&app->lock)->rlock#2); > [ 19.736002] > [ 19.736002] lock(&(&app->lock)->rlock#2); > [ 19.736002] > [ 19.736002] *** DEADLOCK *** > [ 19.736002] > [ 19.736002] 3 locks held by rmmod/1840: > [ 19.736002] #0: (&__lockdep_no_validate__){......}, at: > [] device_lock+0xd/0xf > [ 19.736002] #1: (&__lockdep_no_validate__){......}, at: > [] device_lock+0xd/0xf > [ 19.736002] #2: (rtnl_mutex){+.+.+.}, at: [] > rtnl_lock+0xf/0x11 > [ 19.736002] > [ 19.736002] stack backtrace: > [ 19.736002] Pid: 1840, comm: rmmod Not tainted 3.9.2-build-0063 #4 > [ 19.736002] Call Trace: > [ 19.736002] [] valid_state+0x1f6/0x201 > [ 19.736002] [] mark_lock+0xd1/0x1bb > [ 19.736002] [] ? check_usage_forwards+0x77/0x77 > [ 19.736002] [] __lock_acquire+0x2f8/0xd52 > [ 19.736002] [] ? valid_state+0x168/0x201 > [ 19.736002] [] ? mark_lock+0x26/0x1bb > [ 19.736002] [] ? mark_held_locks+0x5d/0x7b > [ 19.736002] [] ? _raw_spin_unlock_irqrestore+0x36/0x3c > [ 19.736002] [] lock_acquire+0x71/0x85 > [ 19.736002] [] ? mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [] _raw_spin_lock+0x33/0x40 > [ 19.736002] [] ? mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [] mrp_uninit_applicant+0x69/0xba [mrp] > [ 19.736002] [] ? kfree_call_rcu+0x17/0x19 > [ 19.736002] [] vlan_mvrp_uninit_applicant+0xd/0xf [8021q] > [ 19.736002] [] unregister_vlan_dev+0xb8/0xe2 [8021q] > [ 19.736002] [] ? packet_mm_close+0x1f/0x1f > [ 19.736002] [] vlan_device_event+0x307/0x35a [8021q] > [ 19.736002] [] ? rcu_read_unlock+0x17/0x19 > [ 19.736002] [] notifier_call_chain+0x26/0x48 > [ 19.736002] [] raw_notifier_call_chain+0x1a/0x1c > [ 19.736002] [] call_netdevice_notifiers+0x44/0x4b > [ 19.736002] [] rollback_registered_many+0xe2/0x185 > [ 19.736002] [] rollback_registered+0x1f/0x2c > [ 19.736002] [] unregister_netdevice_queue+0x5d/0x87 > [ 19.736002] [] ? mutex_lock_nested+0x20/0x22 > [ 19.736002] [] ? rtnl_lock+0xf/0x11 > [ 19.736002] [] unregister_netdev+0x18/0x1f > [ 19.736002] [] rtl_remove_one+0x7f/0xf9 [r8169] > [ 19.736002] [] ? __pm_runtime_resume+0x48/0x50 > [ 19.736002] [] pci_device_remove+0x27/0x76 > [ 19.736002] [] __device_release_driver+0x66/0xaa > [ 19.736002] [] driver_detach+0x62/0x83 > [ 19.736002] [] bus_remove_driver+0x69/0x88 > [ 19.736002] [] driver_unregister+0x53/0x5a > [ 19.736002] [] ? trace_hardirqs_on_caller+0x10e/0x13f > [ 19.736002] [] pci_unregister_driver+0x10/0x5a > [ 19.736002] [] rtl8169_pci_driver_exit+0xd/0xf [r8169] > [ 19.736002] [] sys_delete_module+0x175/0x1c1 > [ 19.736002] [] ? restore_all+0xf/0xf > [ 19.736002] [] syscall_call+0x7/0xb > > Thanks for the report. Could you please try the following fix for the > module unload problem. > > [PATCH] net/802/mrp: fix lockdep splat > > commit fb745e9a037895 ("net/802/mrp: fix possible race condition when > calling mrp_pdu_queue()") introduced a lockdep splat. > > [ 19.735147] ================================= > [ 19.735235] [ INFO: inconsistent lock state ] > [ 19.735324] 3.9.2-build-0063 #4 Not tainted > [ 19.735412] --------------------------------- > [ 19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > [ 19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes: > [ 19.735682] (&(&app->lock)->rlock#2){+.?...}, at: [] > mrp_uninit_applicant+0x69/0xba [mrp] > > app->lock is normally taken under softirq context, so disable BH to > avoid the splat. > > Reported-by: Denys Fedoryshchenko > Signed-off-by: Eric Dumazet > Cc: David Ward > Cc: Cong Wang > --- > net/802/mrp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/802/mrp.c b/net/802/mrp.c > index e085bcc..1eb05d8 100644 > --- a/net/802/mrp.c > +++ b/net/802/mrp.c > @@ -871,10 +871,10 @@ void mrp_uninit_applicant(struct net_device > *dev, struct mrp_application *appl) > */ > del_timer_sync(&app->join_timer); > > - spin_lock(&app->lock); > + spin_lock_bh(&app->lock); > mrp_mad_event(app, MRP_EVENT_TX); > mrp_pdu_queue(app); > - spin_unlock(&app->lock); > + spin_unlock_bh(&app->lock); > > mrp_queue_xmit(app); --- Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.