* circular locking, mirred, 2.6.24.2
@ 2008-02-24 22:20 Denys Fedoryshchenko
2008-02-25 9:56 ` Jarek Poplawski
0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-24 22:20 UTC (permalink / raw)
To: netdev
2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i
understand, they are in 2.6.25 git and it is fixes).
I will send also to private mails QoS rules i am using.
[ 118.840072] =======================================================
[ 118.840158] [ INFO: possible circular locking dependency detected ]
[ 118.840203] 2.6.24.2-build-0022 #7
[ 118.840243] -------------------------------------------------------
[ 118.840288] swapper/0 is trying to acquire lock:
[ 118.840329] (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit
+0x177/0x302
[ 118.840490]
[ 118.840490] but task is already holding lock:
[ 118.840567] (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred+0x20/0x180
[act_mirred]
[ 118.840727]
[ 118.840727] which lock already depends on the new lock.
[ 118.840728]
[ 118.840842]
[ 118.840842] the existing dependency chain (in reverse order) is:
[ 118.840921]
[ 118.840921] -> #2 (&p->tcfc_lock){-+..}:
[ 118.841075] [<c0143624>] __lock_acquire+0xa30/0xc19
[ 118.841324] [<c0143887>] lock_acquire+0x7a/0x94
[ 118.841572] [<c02d93f5>] _spin_lock+0x2e/0x58
[ 118.841820] [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred]
[ 118.842068] [<c0297ec4>] tcf_action_exec+0x44/0x77
[ 118.842344] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[ 118.842595] [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[ 118.842845] [<c0295ec3>] tc_classify+0x1a/0x80
[ 118.843092] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[ 118.843343] [<c0287139>] netif_receive_skb+0x296/0x44c
[ 118.843592] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[ 118.843843] [<c02894bc>] net_rx_action+0xbf/0x201
[ 118.844091] [<c012ac15>] __do_softirq+0x6f/0xe9
[ 118.844343] [<c01078af>] do_softirq+0x61/0xc8
[ 118.844591] [<ffffffff>] 0xffffffff
[ 118.844840]
[ 118.844840] -> #1 (&dev->ingress_lock){-+..}:
[ 118.844993] [<c0143624>] __lock_acquire+0xa30/0xc19
[ 118.845242] [<c0143887>] lock_acquire+0x7a/0x94
[ 118.845489] [<c02d93f5>] _spin_lock+0x2e/0x58
[ 118.845737] [<c0295387>] qdisc_lock_tree+0x1e/0x21
[ 118.845984] [<c02953b6>] dev_init_scheduler+0xb/0x53
[ 118.846235] [<c0288483>] register_netdevice+0x2a3/0x2fd
[ 118.846483] [<c028850f>] register_netdev+0x32/0x3f
[ 118.846730] [<c03ea8ee>] loopback_net_init+0x39/0x6c
[ 118.846980] [<c02858ef>] register_pernet_operations+0x13/0x15
[ 118.847230] [<c0285958>] register_pernet_device+0x1f/0x4c
[ 118.847478] [<c03ea8b3>] loopback_init+0xd/0xf
[ 118.847725] [<c03d64df>] kernel_init+0x155/0x2c6
[ 118.847973] [<c0105bab>] kernel_thread_helper+0x7/0x10
[ 118.848225] [<ffffffff>] 0xffffffff
[ 118.848472]
[ 118.848472] -> #0 (&dev->queue_lock){-+..}:
[ 118.848626] [<c0143514>] __lock_acquire+0x920/0xc19
[ 118.848874] [<c0143887>] lock_acquire+0x7a/0x94
[ 118.849122] [<c02d93f5>] _spin_lock+0x2e/0x58
[ 118.849370] [<c0289c11>] dev_queue_xmit+0x177/0x302
[ 118.849617] [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[ 118.849866] [<c0297ec4>] tcf_action_exec+0x44/0x77
[ 118.850114] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[ 118.850366] [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[ 118.850614] [<c0295ec3>] tc_classify+0x1a/0x80
[ 118.850861] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[ 118.851111] [<c0287139>] netif_receive_skb+0x296/0x44c
[ 118.851360] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[ 118.851612] [<c02894bc>] net_rx_action+0xbf/0x201
[ 118.851859] [<c012ac15>] __do_softirq+0x6f/0xe9
[ 118.852106] [<c01078af>] do_softirq+0x61/0xc8
[ 118.852355] [<ffffffff>] 0xffffffff
[ 118.852602]
[ 118.852602] other info that might help us debug this:
[ 118.852603]
[ 118.852716] 5 locks held by swapper/0:
[ 118.852756] #0: (rcu_read_lock){..--}, at: [<c028944d>] net_rx_action
+0x50/0x201
[ 118.852940] #1: (rcu_read_lock){..--}, at: [<c0286f99>] netif_receive_skb
+0xf6/0x44c
[ 118.853123] #2: (&dev->ingress_lock){-+..}, at: [<c0287125>]
netif_receive_skb+0x282/0x44c
[ 118.853309] #3: (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred
+0x20/0x180 [act_mirred]
[ 118.853493] #4: (rcu_read_lock){..--}, at: [<c0289bb7>] dev_queue_xmit
+0x11d/0x302
[ 118.853677]
[ 118.853677] stack backtrace:
[ 118.853753] Pid: 0, comm: swapper Not tainted 2.6.24.2-build-0022 #7
[ 118.853796] [<c0105f2a>] show_trace_log_lvl+0x1a/0x2f
[ 118.853865] [<c0106843>] show_trace+0x12/0x14
[ 118.853932] [<c010713a>] dump_stack+0x6c/0x72
[ 118.853999] [<c01416f4>] print_circular_bug_tail+0x5f/0x68
[ 118.854068] [<c0143514>] __lock_acquire+0x920/0xc19
[ 118.854135] [<c0143887>] lock_acquire+0x7a/0x94
[ 118.854205] [<c02d93f5>] _spin_lock+0x2e/0x58
[ 118.854272] [<c0289c11>] dev_queue_xmit+0x177/0x302
[ 118.854340] [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[ 118.854409] [<c0297ec4>] tcf_action_exec+0x44/0x77
[ 118.854477] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[ 118.854547] [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[ 118.854615] [<c0295ec3>] tc_classify+0x1a/0x80
[ 118.854682] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[ 118.854752] [<c0287139>] netif_receive_skb+0x296/0x44c
[ 118.854820] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[ 118.854890] [<c02894bc>] net_rx_action+0xbf/0x201
[ 118.854958] [<c012ac15>] __do_softirq+0x6f/0xe9
[ 118.855025] [<c01078af>] do_softirq+0x61/0xc8
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
^ permalink raw reply [flat|nested] 47+ messages in thread* Re: circular locking, mirred, 2.6.24.2 2008-02-24 22:20 circular locking, mirred, 2.6.24.2 Denys Fedoryshchenko @ 2008-02-25 9:56 ` Jarek Poplawski 2008-02-25 10:48 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-02-25 9:56 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev On 24-02-2008 23:20, Denys Fedoryshchenko wrote: > 2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i > understand, they are in 2.6.25 git and it is fixes). > > I will send also to private mails QoS rules i am using. > > [ 118.840072] ======================================================= > [ 118.840158] [ INFO: possible circular locking dependency detected ] > [ 118.840203] 2.6.24.2-build-0022 #7 > [ 118.840243] ------------------------------------------------------- > [ 118.840288] swapper/0 is trying to acquire lock: > [ 118.840329] (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit > +0x177/0x302 > [ 118.840490] > [ 118.840490] but task is already holding lock: > [ 118.840567] (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred+0x20/0x180 > [act_mirred] > [ 118.840727] > [ 118.840727] which lock already depends on the new lock. > [ 118.840728] > [ 118.840842] > [ 118.840842] the existing dependency chain (in reverse order) is: > [ 118.840921] > [ 118.840921] -> #2 (&p->tcfc_lock){-+..}: > [ 118.841075] [<c0143624>] __lock_acquire+0xa30/0xc19 > [ 118.841324] [<c0143887>] lock_acquire+0x7a/0x94 > [ 118.841572] [<c02d93f5>] _spin_lock+0x2e/0x58 > [ 118.841820] [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred] > [ 118.842068] [<c0297ec4>] tcf_action_exec+0x44/0x77 > [ 118.842344] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32] > [ 118.842595] [<c0295ce2>] tc_classify_compat+0x2f/0x5e > [ 118.842845] [<c0295ec3>] tc_classify+0x1a/0x80 > [ 118.843092] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress] > [ 118.843343] [<c0287139>] netif_receive_skb+0x296/0x44c > [ 118.843592] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100] > [ 118.843843] [<c02894bc>] net_rx_action+0xbf/0x201 > [ 118.844091] [<c012ac15>] __do_softirq+0x6f/0xe9 > [ 118.844343] [<c01078af>] do_softirq+0x61/0xc8 > [ 118.844591] [<ffffffff>] 0xffffffff > [ 118.844840] > [ 118.844840] -> #1 (&dev->ingress_lock){-+..}: > [ 118.844993] [<c0143624>] __lock_acquire+0xa30/0xc19 > [ 118.845242] [<c0143887>] lock_acquire+0x7a/0x94 > [ 118.845489] [<c02d93f5>] _spin_lock+0x2e/0x58 > [ 118.845737] [<c0295387>] qdisc_lock_tree+0x1e/0x21 > [ 118.845984] [<c02953b6>] dev_init_scheduler+0xb/0x53 > [ 118.846235] [<c0288483>] register_netdevice+0x2a3/0x2fd > [ 118.846483] [<c028850f>] register_netdev+0x32/0x3f > [ 118.846730] [<c03ea8ee>] loopback_net_init+0x39/0x6c > [ 118.846980] [<c02858ef>] register_pernet_operations+0x13/0x15 > [ 118.847230] [<c0285958>] register_pernet_device+0x1f/0x4c > [ 118.847478] [<c03ea8b3>] loopback_init+0xd/0xf > [ 118.847725] [<c03d64df>] kernel_init+0x155/0x2c6 This looks strange: are you sure your tc scripts aren't started too early? (Or maybe there are some problems during booting?) Regards, Jarek P. > [ 118.847973] [<c0105bab>] kernel_thread_helper+0x7/0x10 > [ 118.848225] [<ffffffff>] 0xffffffff > [ 118.848472] > [ 118.848472] -> #0 (&dev->queue_lock){-+..}: > [ 118.848626] [<c0143514>] __lock_acquire+0x920/0xc19 > [ 118.848874] [<c0143887>] lock_acquire+0x7a/0x94 > [ 118.849122] [<c02d93f5>] _spin_lock+0x2e/0x58 > [ 118.849370] [<c0289c11>] dev_queue_xmit+0x177/0x302 > [ 118.849617] [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred] > [ 118.849866] [<c0297ec4>] tcf_action_exec+0x44/0x77 > [ 118.850114] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32] > [ 118.850366] [<c0295ce2>] tc_classify_compat+0x2f/0x5e > [ 118.850614] [<c0295ec3>] tc_classify+0x1a/0x80 > [ 118.850861] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress] > [ 118.851111] [<c0287139>] netif_receive_skb+0x296/0x44c > [ 118.851360] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100] > [ 118.851612] [<c02894bc>] net_rx_action+0xbf/0x201 > [ 118.851859] [<c012ac15>] __do_softirq+0x6f/0xe9 > [ 118.852106] [<c01078af>] do_softirq+0x61/0xc8 > [ 118.852355] [<ffffffff>] 0xffffffff ... ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-02-25 9:56 ` Jarek Poplawski @ 2008-02-25 10:48 ` Denys Fedoryshchenko 2008-02-25 11:39 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-02-25 10:48 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev What does it mean early? I have custom boot scripts, it is also custom system based on busybox. There is a chance that i forgot to bring ifb0 up, but thats it. I think such warning must not appear on any actions in userspace. On Mon, 25 Feb 2008 09:56:46 +0000, Jarek Poplawski wrote > On 24-02-2008 23:20, Denys Fedoryshchenko wrote: > > 2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i > > understand, they are in 2.6.25 git and it is fixes). > > > > I will send also to private mails QoS rules i am using. > > > > [ 118.840072] ======================================================= > > [ 118.840158] [ INFO: possible circular locking dependency detected ] > > [ 118.840203] 2.6.24.2-build-0022 #7 > > [ 118.840243] ------------------------------------------------------- > > [ 118.840288] swapper/0 is trying to acquire lock: > > [ 118.840329] (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit > > +0x177/0x302 > > [ 118.840490] > > [ 118.840490] but task is already holding lock: > > [ 118.840567] (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred +0x20/0x180 > > [act_mirred] > > [ 118.840727] > > [ 118.840727] which lock already depends on the new lock. > > [ 118.840728] > > [ 118.840842] > > [ 118.840842] the existing dependency chain (in reverse order) is: > > [ 118.840921] > > [ 118.840921] -> #2 (&p->tcfc_lock){-+..}: > > [ 118.841075] [<c0143624>] __lock_acquire+0xa30/0xc19 > > [ 118.841324] [<c0143887>] lock_acquire+0x7a/0x94 > > [ 118.841572] [<c02d93f5>] _spin_lock+0x2e/0x58 > > [ 118.841820] [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred] > > [ 118.842068] [<c0297ec4>] tcf_action_exec+0x44/0x77 > > [ 118.842344] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32] > > [ 118.842595] [<c0295ce2>] tc_classify_compat+0x2f/0x5e > > [ 118.842845] [<c0295ec3>] tc_classify+0x1a/0x80 > > [ 118.843092] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress] > > [ 118.843343] [<c0287139>] netif_receive_skb+0x296/0x44c > > [ 118.843592] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100] > > [ 118.843843] [<c02894bc>] net_rx_action+0xbf/0x201 > > [ 118.844091] [<c012ac15>] __do_softirq+0x6f/0xe9 > > [ 118.844343] [<c01078af>] do_softirq+0x61/0xc8 > > [ 118.844591] [<ffffffff>] 0xffffffff > > [ 118.844840] > > [ 118.844840] -> #1 (&dev->ingress_lock){-+..}: > > [ 118.844993] [<c0143624>] __lock_acquire+0xa30/0xc19 > > [ 118.845242] [<c0143887>] lock_acquire+0x7a/0x94 > > [ 118.845489] [<c02d93f5>] _spin_lock+0x2e/0x58 > > [ 118.845737] [<c0295387>] qdisc_lock_tree+0x1e/0x21 > > [ 118.845984] [<c02953b6>] dev_init_scheduler+0xb/0x53 > > [ 118.846235] [<c0288483>] register_netdevice+0x2a3/0x2fd > > [ 118.846483] [<c028850f>] register_netdev+0x32/0x3f > > [ 118.846730] [<c03ea8ee>] loopback_net_init+0x39/0x6c > > [ 118.846980] [<c02858ef>] register_pernet_operations+0x13/0x15 > > [ 118.847230] [<c0285958>] register_pernet_device+0x1f/0x4c > > [ 118.847478] [<c03ea8b3>] loopback_init+0xd/0xf > > [ 118.847725] [<c03d64df>] kernel_init+0x155/0x2c6 > > This looks strange: are you sure your tc scripts aren't started too > early? (Or maybe there are some problems during booting?) > > Regards, > Jarek P. > > > [ 118.847973] [<c0105bab>] kernel_thread_helper+0x7/0x10 > > [ 118.848225] [<ffffffff>] 0xffffffff > > [ 118.848472] > > [ 118.848472] -> #0 (&dev->queue_lock){-+..}: > > [ 118.848626] [<c0143514>] __lock_acquire+0x920/0xc19 > > [ 118.848874] [<c0143887>] lock_acquire+0x7a/0x94 > > [ 118.849122] [<c02d93f5>] _spin_lock+0x2e/0x58 > > [ 118.849370] [<c0289c11>] dev_queue_xmit+0x177/0x302 > > [ 118.849617] [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred] > > [ 118.849866] [<c0297ec4>] tcf_action_exec+0x44/0x77 > > [ 118.850114] [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32] > > [ 118.850366] [<c0295ce2>] tc_classify_compat+0x2f/0x5e > > [ 118.850614] [<c0295ec3>] tc_classify+0x1a/0x80 > > [ 118.850861] [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress] > > [ 118.851111] [<c0287139>] netif_receive_skb+0x296/0x44c > > [ 118.851360] [<f88d1a4e>] e100_poll+0x14b/0x26a [e100] > > [ 118.851612] [<c02894bc>] net_rx_action+0xbf/0x201 > > [ 118.851859] [<c012ac15>] __do_softirq+0x6f/0xe9 > > [ 118.852106] [<c01078af>] do_softirq+0x61/0xc8 > > [ 118.852355] [<ffffffff>] 0xffffffff > .... > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-02-25 10:48 ` Denys Fedoryshchenko @ 2008-02-25 11:39 ` Jarek Poplawski 2008-03-05 10:45 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-02-25 11:39 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote: > What does it mean early? > I have custom boot scripts, it is also custom system based on busybox. There > is a chance that i forgot to bring ifb0 up, but thats it. > I think such warning must not appear on any actions in userspace. It's not about ifb0: this report shows loopback_init after some action on eth, so eth was probably up before lo. And of course you are right: this warning shouldn't be there. But, since this report looks very strange, I wonder if there could be something else that mislead lockdep. Could you try to reproduce this with 2.6.24.2 without these additional patches? Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-02-25 11:39 ` Jarek Poplawski @ 2008-03-05 10:45 ` Denys Fedoryshchenko 2008-03-05 13:54 ` [BUG] Probably lockdep bug " Jarek Poplawski 2008-03-06 13:40 ` Jarek Poplawski 0 siblings, 2 replies; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-05 10:45 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got similar message. The strange thing, message appeared not immediately after launching script, but after few seconds. Scripts is the same. I have same message on another script, used for ppp shaper. [ 10.536424] ======================================================= [ 10.536424] [ INFO: possible circular locking dependency detected ] [ 10.536424] 2.6.25-rc3-devel #3 [ 10.536424] ------------------------------------------------------- [ 10.536424] swapper/0 is trying to acquire lock: [ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536424] [ 10.536424] but task is already holding lock: [ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [ 10.536424] which lock already depends on the new lock. [ 10.536424] [ 10.536424] [ 10.536424] the existing dependency chain (in reverse order) is: [ 10.536424] [ 10.536424] -> #2 (&p->tcfc_lock){-+..}: [ 10.536424] [<c013efb6>] __lock_acquire+0x963/0xb18 [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [<c010b2ce>] save_stack_address+0x0/0x28 [ 10.536424] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] [ 10.536424] [<c02a7018>] tcf_action_exec+0x44/0x77 [ 10.536424] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] [ 10.536424] [<c013f123>] __lock_acquire+0xad0/0xb18 [ 10.536424] [<c02a5006>] tc_classify_compat+0x2f/0x5e [ 10.536424] [<c02a5d1c>] tc_classify+0x17/0x78 [ 10.536424] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] [ 10.536424] [<c0297009>] netif_receive_skb+0x263/0x3e6 [ 10.536424] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] [ 10.536424] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] [ 10.536424] [<c029926f>] net_rx_action+0xb3/0x19e [ 10.536424] [<c01276f7>] __do_softirq+0x6f/0xe9 [ 10.536424] [<c0106aa7>] do_softirq+0x5e/0xa8 [ 10.536424] [<ffffffff>] 0xffffffff [ 10.536424] [ 10.536424] -> #1 (&dev->ingress_lock){-+..}: [ 10.536424] [<c013efb6>] __lock_acquire+0x963/0xb18 [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 [ 10.536424] [<c02a4747>] dev_init_scheduler+0xb/0x4d [ 10.536424] [<c0298299>] register_netdevice+0x288/0x2e2 [ 10.536424] [<c0298325>] register_netdev+0x32/0x3f [ 10.536424] [<c042a867>] loopback_net_init+0x34/0x63 [ 10.536424] [<c0295b3f>] register_pernet_operations+0x13/0x15 [ 10.536424] [<c0295ba8>] register_pernet_device+0x1f/0x4c [ 10.536424] [<c042a831>] loopback_init+0xd/0xf [ 10.536424] [<c0415478>] kernel_init+0x132/0x27c [ 10.536424] [<c0415346>] kernel_init+0x0/0x27c [ 10.536424] [<c0415346>] kernel_init+0x0/0x27c [ 10.536424] [<c01056bf>] kernel_thread_helper+0x7/0x10 [ 10.536424] [<ffffffff>] 0xffffffff [ 10.536424] [ 10.536424] -> #0 (&dev->queue_lock){-+..}: [ 10.536424] [<c013d169>] print_circular_bug_tail+0x2e/0x66 [ 10.536424] [<c013eedd>] __lock_acquire+0x88a/0xb18 [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 [ 10.536424] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 [ 10.536424] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536425] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536425] [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred] [ 10.536425] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] [ 10.536425] [<c02a7018>] tcf_action_exec+0x44/0x77 [ 10.536425] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] [ 10.536425] [<c013f123>] __lock_acquire+0xad0/0xb18 [ 10.536425] [<c02a5006>] tc_classify_compat+0x2f/0x5e [ 10.536425] [<c02a5d1c>] tc_classify+0x17/0x78 [ 10.536425] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] [ 10.536425] [<c0297009>] netif_receive_skb+0x263/0x3e6 [ 10.536425] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] [ 10.536425] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] [ 10.536425] [<c029926f>] net_rx_action+0xb3/0x19e [ 10.536425] [<c01276f7>] __do_softirq+0x6f/0xe9 [ 10.536425] [<c0106aa7>] do_softirq+0x5e/0xa8 [ 10.536425] [<ffffffff>] 0xffffffff [ 10.536425] [ 10.536425] other info that might help us debug this: [ 10.536425] [ 10.536425] 5 locks held by swapper/0: [ 10.536425] #0: (rcu_read_lock){..--}, at: [<c029920c>] net_rx_action+0x50/0x19e [ 10.536425] #1: (rcu_read_lock){..--}, at: [<c0296e96>] netif_receive_skb+0xf0/0x3e6 [ 10.536425] #2: (&dev->ingress_lock){-+..}, at: [<c0296ff5>] netif_receive_skb+0x24f/0x3e6 [ 10.536425] #3: (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/ 0x178 [act_mirred] [ 10.536425] #4: (rcu_read_lock){..--}, at: [<c0299afb>] dev_queue_xmit+0x126/0x2f3 [ 10.536425] [ 10.536425] stack backtrace: [ 10.536425] Pid: 0, comm: swapper Not tainted 2.6.25-rc3-devel #3 [ 10.536425] [<c013d196>] print_circular_bug_tail+0x5b/0x66 [ 10.536425] [<c013eedd>] __lock_acquire+0x88a/0xb18 [ 10.536425] [<c013f1d7>] lock_acquire+0x6c/0x89 [ 10.536425] [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3 [ 10.536425] [<c02f6927>] _spin_lock+0x1c/0x49 [ 10.536425] [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3 [ 10.536425] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536425] [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred] [ 10.536425] [<f8a67134>] ? tcf_mirred+0x0/0x178 [act_mirred] [ 10.536425] [<c02a7018>] tcf_action_exec+0x44/0x77 [ 10.536425] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] [ 10.536425] [<c013f123>] ? __lock_acquire+0xad0/0xb18 [ 10.536425] [<c02a5006>] tc_classify_compat+0x2f/0x5e [ 10.536425] [<c02a5d1c>] tc_classify+0x17/0x78 [ 10.536425] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] [ 10.536425] [<c0297009>] netif_receive_skb+0x263/0x3e6 [ 10.536425] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] [ 10.536425] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] [ 10.536425] [<c029926f>] net_rx_action+0xb3/0x19e [ 10.536425] [<c01276f7>] __do_softirq+0x6f/0xe9 [ 10.536425] [<c0106aa7>] do_softirq+0x5e/0xa8 [ 10.536425] [<c014d8c9>] ? handle_edge_irq+0x0/0x10a [ 10.536425] [<c0127655>] irq_exit+0x44/0x77 [ 10.536425] [<c0106b91>] do_IRQ+0xa0/0xb7 [ 10.536425] [<c0105446>] common_interrupt+0x2e/0x34 [ 10.536425] [<c01031b9>] ? mwait_idle_with_hints+0x39/0x3d [ 10.536425] [<c0103364>] ? mwait_idle+0x0/0x14 [ 10.536425] [<c0103376>] mwait_idle+0x12/0x14 [ 10.536425] [<c0103696>] cpu_idle+0xb5/0xd5 [ 10.536425] [<c02e5b79>] rest_init+0x49/0x4b On Mon, 25 Feb 2008 11:39:30 +0000, Jarek Poplawski wrote > On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote: > > What does it mean early? > > I have custom boot scripts, it is also custom system based on busybox. There > > is a chance that i forgot to bring ifb0 up, but thats it. > > I think such warning must not appear on any actions in userspace. > > It's not about ifb0: this report shows loopback_init after some > action on eth, so eth was probably up before lo. And of course you > are right: this warning shouldn't be there. But, since this report > looks very strange, I wonder if there could be something else that mislead > lockdep. Could you try to reproduce this with 2.6.24.2 without these > additional patches? > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* [BUG] Probably lockdep bug Re: circular locking, mirred, 2.6.24.2 2008-03-05 10:45 ` Denys Fedoryshchenko @ 2008-03-05 13:54 ` Jarek Poplawski 2008-03-06 9:41 ` Jarek Poplawski 2008-03-06 13:40 ` Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-05 13:54 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev, Peter Zijlstra, Ingo Molnar, linux-kernel Hi, dev->queue_lock is taken in a scenario like below: always after dev->ingress_lock and p->tcfc_lock, so just like on this last backtrace with info about held locks. But this report shows that lockdep for some reason forgot the history before dev->queue_lock, and recorded it again. It seems, even if there is something wrong with init lockdep shouldn't report it like this. I CC this report to lockdep maintainers and linux-kernel. Regards, Jarek P. Original lockdep report for 2.6.24.2: http://permalink.gmane.org/gmane.linux.network/86896 On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote: > I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got > similar message. The strange thing, message appeared not immediately after > launching script, but after few seconds. > > Scripts is the same. I have same message on another script, used for ppp > shaper. > > [ 10.536424] ======================================================= > [ 10.536424] [ INFO: possible circular locking dependency detected ] > [ 10.536424] 2.6.25-rc3-devel #3 > [ 10.536424] ------------------------------------------------------- > [ 10.536424] swapper/0 is trying to acquire lock: > [ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] > dev_queue_xmit+0x175/0x2f3 > [ 10.536424] > [ 10.536424] but task is already holding lock: > [ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 > [act_mirred] > [ 10.536424] > [ 10.536424] which lock already depends on the new lock. > [ 10.536424] > [ 10.536424] > [ 10.536424] the existing dependency chain (in reverse order) is: > [ 10.536424] > [ 10.536424] -> #2 (&p->tcfc_lock){-+..}: > [ 10.536424] [<c013efb6>] __lock_acquire+0x963/0xb18 > [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] > [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 > [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] > [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 > [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] > [ 10.536424] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] > [ 10.536424] [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] > [ 10.536424] [<c010b2ce>] save_stack_address+0x0/0x28 > [ 10.536424] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] > [ 10.536424] [<c02a7018>] tcf_action_exec+0x44/0x77 > [ 10.536424] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] > [ 10.536424] [<c013f123>] __lock_acquire+0xad0/0xb18 > [ 10.536424] [<c02a5006>] tc_classify_compat+0x2f/0x5e > [ 10.536424] [<c02a5d1c>] tc_classify+0x17/0x78 > [ 10.536424] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] > [ 10.536424] [<c0297009>] netif_receive_skb+0x263/0x3e6 > [ 10.536424] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] > [ 10.536424] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] > [ 10.536424] [<c029926f>] net_rx_action+0xb3/0x19e > [ 10.536424] [<c01276f7>] __do_softirq+0x6f/0xe9 > [ 10.536424] [<c0106aa7>] do_softirq+0x5e/0xa8 > [ 10.536424] [<ffffffff>] 0xffffffff > [ 10.536424] > [ 10.536424] -> #1 (&dev->ingress_lock){-+..}: > [ 10.536424] [<c013efb6>] __lock_acquire+0x963/0xb18 > [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 > [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 > [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 > [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 > [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 > [ 10.536424] [<c02a4718>] qdisc_lock_tree+0x1e/0x21 > [ 10.536424] [<c02a4747>] dev_init_scheduler+0xb/0x4d > [ 10.536424] [<c0298299>] register_netdevice+0x288/0x2e2 > [ 10.536424] [<c0298325>] register_netdev+0x32/0x3f > [ 10.536424] [<c042a867>] loopback_net_init+0x34/0x63 > [ 10.536424] [<c0295b3f>] register_pernet_operations+0x13/0x15 > [ 10.536424] [<c0295ba8>] register_pernet_device+0x1f/0x4c > [ 10.536424] [<c042a831>] loopback_init+0xd/0xf > [ 10.536424] [<c0415478>] kernel_init+0x132/0x27c > [ 10.536424] [<c0415346>] kernel_init+0x0/0x27c > [ 10.536424] [<c0415346>] kernel_init+0x0/0x27c > [ 10.536424] [<c01056bf>] kernel_thread_helper+0x7/0x10 > [ 10.536424] [<ffffffff>] 0xffffffff > [ 10.536424] > [ 10.536424] -> #0 (&dev->queue_lock){-+..}: > [ 10.536424] [<c013d169>] print_circular_bug_tail+0x2e/0x66 > [ 10.536424] [<c013eedd>] __lock_acquire+0x88a/0xb18 > [ 10.536424] [<c013f1d7>] lock_acquire+0x6c/0x89 > [ 10.536424] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 > [ 10.536424] [<c02f6927>] _spin_lock+0x1c/0x49 > [ 10.536424] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 > [ 10.536425] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 > [ 10.536425] [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred] > [ 10.536425] [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred] > [ 10.536425] [<c02a7018>] tcf_action_exec+0x44/0x77 > [ 10.536425] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] > [ 10.536425] [<c013f123>] __lock_acquire+0xad0/0xb18 > [ 10.536425] [<c02a5006>] tc_classify_compat+0x2f/0x5e > [ 10.536425] [<c02a5d1c>] tc_classify+0x17/0x78 > [ 10.536425] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] > [ 10.536425] [<c0297009>] netif_receive_skb+0x263/0x3e6 > [ 10.536425] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] > [ 10.536425] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] > [ 10.536425] [<c029926f>] net_rx_action+0xb3/0x19e > [ 10.536425] [<c01276f7>] __do_softirq+0x6f/0xe9 > [ 10.536425] [<c0106aa7>] do_softirq+0x5e/0xa8 > [ 10.536425] [<ffffffff>] 0xffffffff > [ 10.536425] > [ 10.536425] other info that might help us debug this: > [ 10.536425] > [ 10.536425] 5 locks held by swapper/0: > [ 10.536425] #0: (rcu_read_lock){..--}, at: [<c029920c>] > net_rx_action+0x50/0x19e > [ 10.536425] #1: (rcu_read_lock){..--}, at: [<c0296e96>] > netif_receive_skb+0xf0/0x3e6 > [ 10.536425] #2: (&dev->ingress_lock){-+..}, at: [<c0296ff5>] > netif_receive_skb+0x24f/0x3e6 > [ 10.536425] #3: (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/ > 0x178 [act_mirred] > [ 10.536425] #4: (rcu_read_lock){..--}, at: [<c0299afb>] > dev_queue_xmit+0x126/0x2f3 > [ 10.536425] > [ 10.536425] stack backtrace: > [ 10.536425] Pid: 0, comm: swapper Not tainted 2.6.25-rc3-devel #3 > [ 10.536425] [<c013d196>] print_circular_bug_tail+0x5b/0x66 > [ 10.536425] [<c013eedd>] __lock_acquire+0x88a/0xb18 > [ 10.536425] [<c013f1d7>] lock_acquire+0x6c/0x89 > [ 10.536425] [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3 > [ 10.536425] [<c02f6927>] _spin_lock+0x1c/0x49 > [ 10.536425] [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3 > [ 10.536425] [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 > [ 10.536425] [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred] > [ 10.536425] [<f8a67134>] ? tcf_mirred+0x0/0x178 [act_mirred] > [ 10.536425] [<c02a7018>] tcf_action_exec+0x44/0x77 > [ 10.536425] [<f89927b0>] u32_classify+0x119/0x24e [cls_u32] > [ 10.536425] [<c013f123>] ? __lock_acquire+0xad0/0xb18 > [ 10.536425] [<c02a5006>] tc_classify_compat+0x2f/0x5e > [ 10.536425] [<c02a5d1c>] tc_classify+0x17/0x78 > [ 10.536425] [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress] > [ 10.536425] [<c0297009>] netif_receive_skb+0x263/0x3e6 > [ 10.536425] [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000] > [ 10.536425] [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000] > [ 10.536425] [<c029926f>] net_rx_action+0xb3/0x19e > [ 10.536425] [<c01276f7>] __do_softirq+0x6f/0xe9 > [ 10.536425] [<c0106aa7>] do_softirq+0x5e/0xa8 > [ 10.536425] [<c014d8c9>] ? handle_edge_irq+0x0/0x10a > [ 10.536425] [<c0127655>] irq_exit+0x44/0x77 > [ 10.536425] [<c0106b91>] do_IRQ+0xa0/0xb7 > [ 10.536425] [<c0105446>] common_interrupt+0x2e/0x34 > [ 10.536425] [<c01031b9>] ? mwait_idle_with_hints+0x39/0x3d > [ 10.536425] [<c0103364>] ? mwait_idle+0x0/0x14 > [ 10.536425] [<c0103376>] mwait_idle+0x12/0x14 > [ 10.536425] [<c0103696>] cpu_idle+0xb5/0xd5 > [ 10.536425] [<c02e5b79>] rest_init+0x49/0x4b > > > > > On Mon, 25 Feb 2008 11:39:30 +0000, Jarek Poplawski wrote > > On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote: > > > What does it mean early? > > > I have custom boot scripts, it is also custom system based on busybox. > There > > > is a chance that i forgot to bring ifb0 up, but thats it. > > > I think such warning must not appear on any actions in userspace. > > > > It's not about ifb0: this report shows loopback_init after some > > action on eth, so eth was probably up before lo. And of course you > > are right: this warning shouldn't be there. But, since this report > > looks very strange, I wonder if there could be something else that mislead > > lockdep. Could you try to reproduce this with 2.6.24.2 without these > > additional patches? > > > > Jarek P. > > -- > > To unsubscribe from this list: send the line "unsubscribe netdev" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > Denys Fedoryshchenko > Technical Manager > Virtual ISP S.A.L. > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [BUG] Probably lockdep bug Re: circular locking, mirred, 2.6.24.2 2008-03-05 13:54 ` [BUG] Probably lockdep bug " Jarek Poplawski @ 2008-03-06 9:41 ` Jarek Poplawski 0 siblings, 0 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 9:41 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev, Peter Zijlstra, Ingo Molnar, linux-kernel On Wed, Mar 05, 2008 at 01:54:48PM +0000, Jarek Poplawski wrote: > Hi, > > dev->queue_lock is taken in a scenario like below: always after > dev->ingress_lock and p->tcfc_lock, so just like on this last > backtrace with info about held locks. But this report shows that > lockdep for some reason forgot the history before dev->queue_lock, > and recorded it again. It seems, even if there is something wrong > with init lockdep shouldn't report it like this. ...Hmmm... On the other hand, despite misleading dependency chain on this report, lockdep seems to be right: dev->queue_lock and dev->ingress_lock are really taken in a different order from qdisc_lock_tree() and while using act_mirred! Now I wonder why this warning is so rare? So, let's give a break to lockdep maintainers and linux-kernel, and try to figure it out more in netdev... Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-05 10:45 ` Denys Fedoryshchenko 2008-03-05 13:54 ` [BUG] Probably lockdep bug " Jarek Poplawski @ 2008-03-06 13:40 ` Jarek Poplawski 2008-03-06 13:57 ` Denys Fedoryshchenko 2008-03-06 13:59 ` jamal 1 sibling, 2 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 13:40 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: netdev, jamal On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote: > I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got > similar message. The strange thing, message appeared not immediately after > launching script, but after few seconds. > > Scripts is the same. I have same message on another script, used for ppp > shaper. > > [ 10.536424] ======================================================= > [ 10.536424] [ INFO: possible circular locking dependency detected ] > [ 10.536424] 2.6.25-rc3-devel #3 > [ 10.536424] ------------------------------------------------------- > [ 10.536424] swapper/0 is trying to acquire lock: > [ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] > dev_queue_xmit+0x175/0x2f3 > [ 10.536424] > [ 10.536424] but task is already holding lock: > [ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 > [act_mirred] > [ 10.536424] > [ 10.536424] which lock already depends on the new lock. ... Hi, I'm not sure this lockdep report is because of this, but there is really a problem with lock order while using sch_ingress with act_mirred: dev->queue_lock is taken after dev->ingress_lock, so reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem when one of the devices is ifb yet. Regards, Jarek P. Here is a patch for testing: --- drivers/net/ifb.c | 19 +++++++++++++++++++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 15949d3..2bc71df 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -227,6 +227,22 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = { module_param(numifbs, int, 0); MODULE_PARM_DESC(numifbs, "Number of ifb devices"); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +/* + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, + * so let's tell lockdep it's different from dev->queue_lock + */ +static struct lock_class_key ifb_queue_lock_key; +static inline void ifb_set_lock_class(spinlock_t *lock) +{ + lockdep_set_class(lock, &ifb_queue_lock_key); +} +#else +static inline void ifb_set_lock_class(spinlock_t *lock) +{ +} +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ + static int __init ifb_init_one(int index) { struct net_device *dev_ifb; @@ -246,6 +262,9 @@ static int __init ifb_init_one(int index) err = register_netdevice(dev_ifb); if (err < 0) goto err; + + ifb_set_lock_class(&dev_ifb->queue_lock); + return 0; err: ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 13:40 ` Jarek Poplawski @ 2008-03-06 13:57 ` Denys Fedoryshchenko 2008-03-06 14:27 ` jamal 2008-03-06 13:59 ` jamal 1 sibling, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-06 13:57 UTC (permalink / raw) To: Jarek Poplawski; +Cc: netdev, jamal I am able to reproduce this warning over this relatively simple shell script on my Gentoo PC (2.6.25-rc3). http://www.nuclearcat.com/files/bug_feb.txt Probably it will help to debug issue for more experienced developers. Note: it appears not immediately, second time i tested, it's appeared after while, but in matter of seconds. Note - it can stop traffic on PC completely. It is also seems crashed my desktop PC, i am not able to execute "tc qdisc del dev eth0 root". The system hang completely. I had few similar issues on my PPPoE servers (with different scripts for shapers), that system hang, and even "reboot -f" doesn't work sometimes. On Thu, 6 Mar 2008 13:40:15 +0000, Jarek Poplawski wrote > On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote: > > I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got > > similar message. The strange thing, message appeared not immediately after > > launching script, but after few seconds. > > > > Scripts is the same. I have same message on another script, used for ppp > > shaper. > > > > [ 10.536424] ======================================================= > > [ 10.536424] [ INFO: possible circular locking dependency detected ] > > [ 10.536424] 2.6.25-rc3-devel #3 > > [ 10.536424] ------------------------------------------------------- > > [ 10.536424] swapper/0 is trying to acquire lock: > > [ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] > > dev_queue_xmit+0x175/0x2f3 > > [ 10.536424] > > [ 10.536424] but task is already holding lock: > > [ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/ 0x178 > > [act_mirred] > > [ 10.536424] > > [ 10.536424] which lock already depends on the new lock. > .... > > Hi, > > I'm not sure this lockdep report is because of this, but there is > really a problem with lock order while using sch_ingress with > act_mirred: dev->queue_lock is taken after dev->ingress_lock, so > reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem > when one of the devices is ifb yet. > > Regards, > Jarek P. > > Here is a patch for testing: > > --- > > drivers/net/ifb.c | 19 +++++++++++++++++++ > 1 files changed, 19 insertions(+), 0 deletions(-) > > diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c > index 15949d3..2bc71df 100644 > --- a/drivers/net/ifb.c > +++ b/drivers/net/ifb.c > @@ -227,6 +227,22 @@ static struct rtnl_link_ops ifb_link_ops > __read_mostly = { module_param(numifbs, int, 0); > MODULE_PARM_DESC(numifbs, "Number of ifb devices"); > > +#ifdef CONFIG_DEBUG_LOCK_ALLOC > +/* > + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, > + * so let's tell lockdep it's different from dev->queue_lock > + */ > +static struct lock_class_key ifb_queue_lock_key; > +static inline void ifb_set_lock_class(spinlock_t *lock) > +{ > + lockdep_set_class(lock, &ifb_queue_lock_key); > +} > +#else > +static inline void ifb_set_lock_class(spinlock_t *lock) > +{ > +} > +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ > + > static int __init ifb_init_one(int index) > { > struct net_device *dev_ifb; > @@ -246,6 +262,9 @@ static int __init ifb_init_one(int index) > err = register_netdevice(dev_ifb); > if (err < 0) > goto err; > + > + ifb_set_lock_class(&dev_ifb->queue_lock); > + > return 0; > > err: -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 13:57 ` Denys Fedoryshchenko @ 2008-03-06 14:27 ` jamal 2008-03-06 15:50 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: jamal @ 2008-03-06 14:27 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote: > I am able to reproduce this warning over this relatively simple shell script > on my Gentoo PC (2.6.25-rc3). > http://www.nuclearcat.com/files/bug_feb.txt > That script looks pretty sane to me - nothing super-exciting. I suspect you eventually want them all to look like ifb1 on the egress. Do you see the same issue without the ifb1 speacial case? > Probably it will help to debug issue for more experienced developers. Note: > it appears not immediately, second time i tested, it's appeared after while, > but in matter of seconds. I wonder is there some latency from the moment you insmod ifb to the moment the tc rules take effect? Will it still happen if you dont have modules? Also note, that lock dependency is a bit strange, Jarek correct me if i am wrong; it seems to say: a packet received on ingress of some e1000 (ethx) gets acted on by mirred which ends grabbing lock of an ifb device - this part should be fine and no need for the alarm. The alarm seems to be a result of a loopback device that is being registered in between the two activities. i.e there are three devices affected with entirely different locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong? > Note - it can stop traffic on PC completely. It is also seems crashed my > desktop PC, i am not able to execute "tc qdisc del dev eth0 root". > The system hang completely. I had few similar issues on my PPPoE servers > (with different scripts for shapers), that system hang, and even "reboot -f" > doesn't work sometimes. This sounds like a different issue from above - when did this start to happen? Is it at the same time as above warnings showing up? cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 14:27 ` jamal @ 2008-03-06 15:50 ` Denys Fedoryshchenko 2008-03-06 20:25 ` Jarek Poplawski 2008-03-06 20:44 ` jamal 0 siblings, 2 replies; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-06 15:50 UTC (permalink / raw) To: hadi; +Cc: Jarek Poplawski, netdev On Thu, 06 Mar 2008 09:27:14 -0500, jamal wrote > On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote: > > I am able to reproduce this warning over this relatively simple shell script > > on my Gentoo PC (2.6.25-rc3). > > http://www.nuclearcat.com/files/bug_feb.txt > > > > That script looks pretty sane to me - nothing super-exciting. I suspect > you eventually want them all to look like ifb1 on the egress. > Do you see the same issue without the ifb1 speacial case? Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 also. modprobe ifb ifconfig ifb0 up TC=/sbin/tc $TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null $TC qdisc add dev eth0 ingress ${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \ match u32 0 0 flowid 1:1 \ action mirred egress redirect dev ifb0 > > > Probably it will help to debug issue for more experienced developers. Note: > > it appears not immediately, second time i tested, it's appeared after while, > > but in matter of seconds. > > I wonder is there some latency from the moment you insmod ifb to the > moment the tc rules take effect? Will it still happen if you dont > have modules? Also note, that lock dependency is a bit strange, > Jarek correct me if i am wrong; it seems to say: a packet received > on ingress of some e1000 (ethx) gets acted on by mirred which ends > grabbing lock of an ifb device - this part should be fine and no > need for the alarm. The alarm seems to be a result of a loopback > device that is being registered in between the two activities. > i.e there are three devices affected with entirely different > locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong? No idea, i have strange lockup's on my systems where i have ifb, and that make me worry. And i feel it is directly related with my love to use ifb devices and way how i am using them. > > > Note - it can stop traffic on PC completely. It is also seems crashed my > > desktop PC, i am not able to execute "tc qdisc del dev eth0 root". > > The system hang completely. I had few similar issues on my PPPoE servers > > (with different scripts for shapers), that system hang, and even "reboot - f" > > doesn't work sometimes. > > This sounds like a different issue from above - when did this start > to happen? Is it at the same time as above warnings showing up? Yes, it is different issue seems, it is rare to lockup system , and i will dig more, to understand how it is happening. > > cheers, > jamal > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 15:50 ` Denys Fedoryshchenko @ 2008-03-06 20:25 ` Jarek Poplawski 2008-03-06 20:56 ` jamal 2008-03-06 20:44 ` jamal 1 sibling, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 20:25 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: hadi, netdev On Thu, Mar 06, 2008 at 05:50:59PM +0200, Denys Fedoryshchenko wrote: > On Thu, 06 Mar 2008 09:27:14 -0500, jamal wrote > > On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote: > > > I am able to reproduce this warning over this relatively simple shell > script > > > on my Gentoo PC (2.6.25-rc3). > > > http://www.nuclearcat.com/files/bug_feb.txt > > > > > > > That script looks pretty sane to me - nothing super-exciting. I suspect > > you eventually want them all to look like ifb1 on the egress. > > Do you see the same issue without the ifb1 speacial case? > Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 > also. > > modprobe ifb > ifconfig ifb0 up > TC=/sbin/tc > $TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null > $TC qdisc add dev eth0 ingress > ${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \ > match u32 0 0 flowid 1:1 \ > action mirred egress redirect dev ifb0 > It's really strange: I can't reproduce this, and if it were so easy we would get really a lot of similar reports. It looks like you have something special. This lockdep report with this kind of problem usually looks different too. The good side is it's easy to reproduce. So, could you try the patch below? (It's only supposed to fix the lockdep warning, not lockups). > > > > > Probably it will help to debug issue for more experienced developers. > Note: > > > it appears not immediately, second time i tested, it's appeared after > while, > > > but in matter of seconds. > > > > I wonder is there some latency from the moment you insmod ifb to the > > moment the tc rules take effect? Will it still happen if you dont > > have modules? Also note, that lock dependency is a bit strange, > > Jarek correct me if i am wrong; it seems to say: a packet received > > on ingress of some e1000 (ethx) gets acted on by mirred which ends > > grabbing lock of an ifb device - this part should be fine and no > > need for the alarm. The alarm seems to be a result of a loopback > > device that is being registered in between the two activities. > > i.e there are three devices affected with entirely different > > locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong? Yes, the details look wrong, but IMHO there are reasons to report possible problems yet. > No idea, i have strange lockup's on my systems where i have ifb, and that > make me worry. And i feel it is directly related with my love to use ifb > devices and way how i am using them. Did you try to change the order of rules in this first script: ifb rules before eth? > > > Note - it can stop traffic on PC completely. It is also seems crashed my > > > desktop PC, i am not able to execute "tc qdisc del dev eth0 root". > > > The system hang completely. I had few similar issues on my PPPoE servers > > > (with different scripts for shapers), that system hang, and even "reboot - > f" > > > doesn't work sometimes. > > > > This sounds like a different issue from above - when did this start > > to happen? Is it at the same time as above warnings showing up? > > Yes, it is different issue seems, it is rare to lockup system , and i will > dig more, to understand how it is happening. If you are willing for debugging I would be interested with results or sending some patches. Cheers, Jarek P. (testing patch take 2) --- drivers/net/ifb.c | 24 ++++++++++++++++++++++++ 1 files changed, 24 insertions(+), 0 deletions(-) diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 15949d3..c553b62 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = { module_param(numifbs, int, 0); MODULE_PARM_DESC(numifbs, "Number of ifb devices"); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +/* + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, + * reversely to e.g. qdisc_lock_tree(). It should be safe until + * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock. + * But lockdep should know that ifb has different locks from dev. + */ +static struct lock_class_key ifb_queue_lock_key; +static struct lock_class_key ifb_ingress_lock_key; + +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ + lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key); + lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key); +} +#else +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ +} +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ + static int __init ifb_init_one(int index) { struct net_device *dev_ifb; @@ -246,6 +267,9 @@ static int __init ifb_init_one(int index) err = register_netdevice(dev_ifb); if (err < 0) goto err; + + ifb_set_lock_classes(dev_ifb); + return 0; err: ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 20:25 ` Jarek Poplawski @ 2008-03-06 20:56 ` jamal 2008-03-06 22:12 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: jamal @ 2008-03-06 20:56 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote: > It's really strange: I can't reproduce this, I couldnt either - are you using 2.6.25-rc4? > and if it were so easy we > would get really a lot of similar reports. It looks like you have > something special. This lockdep report with this kind of problem > usually looks different too. The good side is it's easy to reproduce. > So, could you try the patch below? (It's only supposed to fix the lockdep > warning, not lockups). This is more out of ignorance: Why is ifb needing the extra teaching for lockdep? It is a netdevice - shouldnt the two global lockdeps you described earlier not be sufficient? cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 20:56 ` jamal @ 2008-03-06 22:12 ` Jarek Poplawski 2008-03-06 23:43 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 22:12 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Thu, Mar 06, 2008 at 03:56:40PM -0500, jamal wrote: > On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote: > > > It's really strange: I can't reproduce this, > > I couldnt either - are you using 2.6.25-rc4? No, David's net-2.6 tree so 2.6.25-rc3 plus something... > > > and if it were so easy we > > would get really a lot of similar reports. It looks like you have > > something special. This lockdep report with this kind of problem > > usually looks different too. The good side is it's easy to reproduce. > > So, could you try the patch below? (It's only supposed to fix the lockdep > > warning, not lockups). > > This is more out of ignorance: Why is ifb needing the extra teaching for > lockdep? It is a netdevice - shouldnt the two global lockdeps you > described earlier not be sufficient? As I've written in the previous message, currently lockdep tracks dev->queue_lock and dev->ingress_lock as only two locks used by all net devices (unless they were annotated individually). So, it's like A and B lock, and it's really not right to them AB in one place, and BA in another. In reality each net_device's locks are independent, so ifb has C and D. And it's not AB vs. BA, but: AB (eth/lo->queue_lock, eth/lo->ingress_lock), CD (the same for ifb) and BC (eth/lo->ingress_lock, ifb->queue_lock) - all legal combinations, and no inversion. Cheers, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 22:12 ` Jarek Poplawski @ 2008-03-06 23:43 ` Denys Fedoryshchenko 2008-03-07 0:09 ` jamal 2008-03-07 9:31 ` Jarek Poplawski 0 siblings, 2 replies; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-06 23:43 UTC (permalink / raw) To: jamal, Jarek Poplawski; +Cc: netdev About reproducing, I think .config matter Mine is at http://www.nuclearcat.com/files/config.txt About hardware, maybe it is important - it is Intel dual core machine (32- bit). I test few more times, i can reproduce it for sure. It gives lockdep on first incoming packet. More info about hardware: CPU model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Other machines where i get this lockdep also Core Duo CPU's. I test now on old 2xXeon,P4 with hyperthreading - also triggered. Shortest script i use: modprobe ifb ifconfig ifb0 up /sbin/tc qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null /sbin/tc qdisc add dev eth0 ingress /sbin/tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \ match u32 0 0 flowid 1:1 \ action mirred egress redirect dev ifb0 On Thu, 6 Mar 2008 23:12:53 +0100, Jarek Poplawski wrote > On Thu, Mar 06, 2008 at 03:56:40PM -0500, jamal wrote: > > On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote: > > > > > It's really strange: I can't reproduce this, > > > > I couldnt either - are you using 2.6.25-rc4? > > No, David's net-2.6 tree so 2.6.25-rc3 plus something... > > > > > > and if it were so easy we > > > would get really a lot of similar reports. It looks like you have > > > something special. This lockdep report with this kind of problem > > > usually looks different too. The good side is it's easy to reproduce. > > > So, could you try the patch below? (It's only supposed to fix the lockdep > > > warning, not lockups). > > > > This is more out of ignorance: Why is ifb needing the extra teaching for > > lockdep? It is a netdevice - shouldnt the two global lockdeps you > > described earlier not be sufficient? > > As I've written in the previous message, currently lockdep tracks > dev->queue_lock and dev->ingress_lock as only two locks used by all > net devices (unless they were annotated individually). So, it's like > A and B lock, and it's really not right to them AB in one place, and > BA in another. In reality each net_device's locks are independent, > so ifb has C and D. And it's not AB vs. BA, but: AB (eth/lo- > >queue_lock, eth/lo->ingress_lock), CD (the same for ifb) and BC > (eth/lo->ingress_lock, ifb->queue_lock) - all legal combinations, > and no inversion. > > Cheers, > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 23:43 ` Denys Fedoryshchenko @ 2008-03-07 0:09 ` jamal 2008-03-07 0:15 ` Denys Fedoryshchenko 2008-03-07 9:31 ` Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: jamal @ 2008-03-07 0:09 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev Denys, On Fri, 2008-07-03 at 01:43 +0200, Denys Fedoryshchenko wrote: > About reproducing, I think .config matter > Mine is at http://www.nuclearcat.com/files/config.txt > Ok, that + SMP is what i am lacking. Given i am in travel mode i will not be able to try it out on SMP for a few days but i will try to build the kernel with your config. My initial test with latest git also seems to fail to reproduce it. In any case, i think the lockdep alarm is false positive. One thing i couldnt figure from your emails is whether you saw the traces on the latest kernels or it has always been there. cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 0:09 ` jamal @ 2008-03-07 0:15 ` Denys Fedoryshchenko 2008-03-07 0:25 ` jamal 0 siblings, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-07 0:15 UTC (permalink / raw) To: hadi; +Cc: Jarek Poplawski, netdev I just start to using lockdep on servers with shaper over ifb from 2.6.24. So it is triggering from 2.6.24 till latest 2.6.25-rc4. If you want - i can build some specific old version of kernel with lockdep. On Thu, 06 Mar 2008 19:09:59 -0500, jamal wrote > Denys, > > On Fri, 2008-07-03 at 01:43 +0200, Denys Fedoryshchenko wrote: > > About reproducing, I think .config matter > > Mine is at http://www.nuclearcat.com/files/config.txt > > > > Ok, that + SMP is what i am lacking. Given i am in travel mode i will > not be able to try it out on SMP for a few days but i will try to build > the kernel with your config. My initial test with latest git also seems > to fail to reproduce it. > > In any case, i think the lockdep alarm is false positive. > > One thing i couldnt figure from your emails is whether you saw the > traces on the latest kernels or it has always been there. > > cheers, > jamal > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 0:15 ` Denys Fedoryshchenko @ 2008-03-07 0:25 ` jamal 0 siblings, 0 replies; 47+ messages in thread From: jamal @ 2008-03-07 0:25 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev On Fri, 2008-07-03 at 02:15 +0200, Denys Fedoryshchenko wrote: > I just start to using lockdep on servers with shaper over ifb from 2.6.24. So > it is triggering from 2.6.24 till latest 2.6.25-rc4. If you want - i can > build some specific old version of kernel with lockdep. I think i should be able to try with your .config first with the latest git. I should be able to at least see the traces. I am just going to compile some things into the kernel, example: --- CONFIG_IFB=m CONFIG_DUMMY=m CONFIG_BONDING=m CONFIG_MACVLAN=m CONFIG_EQUALIZER=m CONFIG_TUN=m ---- Can you try that as well? Then if it doesnt happen with me, we'll narrow it to need for smp. cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 23:43 ` Denys Fedoryshchenko 2008-03-07 0:09 ` jamal @ 2008-03-07 9:31 ` Jarek Poplawski 2008-03-07 10:19 ` Denys Fedoryshchenko 1 sibling, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-07 9:31 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: jamal, netdev On Fri, Mar 07, 2008 at 01:43:33AM +0200, Denys Fedoryshchenko wrote: > About reproducing, I think .config matter > Mine is at http://www.nuclearcat.com/files/config.txt I see: CONFIG_4KSTACKS=y but CONFIG_DEBUG_STACKOVERFLOW is not set. This doesn't look safe to me... And can trigger really strange things. Probably dmesg could be interesting too. And since it all takes place and can change in time, I wonder if it isn't better to open a report in bugzilla for this. Of course, you should remember to mask any confidential information. I'm also not sure this: http://www.nuclearcat.com/files/bug_feb.txt is a full script (no more ifbs?) or an example. Doesn't "magic sysrq" work during such lockups? Denys, there were some reports on similar problems with ifb on SMP, but this was hard to trigger and debugging stopped for some reason. It seems there were some timer OOPSes, but this could be because wrong locking too. This could be even because some other apps can't handle their lost net traffic. Without some good log traces this could be impossible to tell. And such debugging shouldn't be done at production of course: it's really hard to foresee any locking changes. So, until you are willing to respond to our proposals and try the patches I can see no problem with assisting this problem. BTW, probably it's easier to stick to one kernel version, so maybe 2.6.24.3 isn't a bad choice if 2.6.25-rc4 didn't help at all. Regards, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 9:31 ` Jarek Poplawski @ 2008-03-07 10:19 ` Denys Fedoryshchenko 2008-03-07 10:48 ` Jarek Poplawski 2008-03-07 14:58 ` jamal 0 siblings, 2 replies; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-07 10:19 UTC (permalink / raw) To: Jarek Poplawski; +Cc: jamal, netdev Ok, i will proceed and will try all required options after few hours (need to go another city, work). I will try to set max as possible debug options. On bug_feb.txt it is triggering ONLY lockdep warning, but full network lockup occur very rarely. But even it happens with me 2 times, i am not sure it is same problem. Let's stick maybe to this one (lockdep warning) which is easy to reproduce. I will open also bugzilla report. Regarding testing with compiled in modules, i did CONFIG_IFB=y CONFIG_DUMMY=y CONFIG_BONDING=y CONFIG_MACVLAN=y CONFIG_EQUALIZER=y CONFIG_TUN=y It is still triggering lockdep. Here is full dmesg: http://www.nuclearcat.com/files/dmesg_1234.txt On Fri, 7 Mar 2008 09:31:50 +0000, Jarek Poplawski wrote > On Fri, Mar 07, 2008 at 01:43:33AM +0200, Denys Fedoryshchenko wrote: > > About reproducing, I think .config matter > > Mine is at http://www.nuclearcat.com/files/config.txt > > I see: CONFIG_4KSTACKS=y but CONFIG_DEBUG_STACKOVERFLOW is not set. > This doesn't look safe to me... And can trigger really strange things. > > Probably dmesg could be interesting too. And since it all takes place > and can change in time, I wonder if it isn't better to open a report > in bugzilla for this. Of course, you should remember to mask any > confidential information. > > I'm also not sure this: > http://www.nuclearcat.com/files/bug_feb.txt > is a full script (no more ifbs?) or an example. Doesn't "magic sysrq" > work during such lockups? > > Denys, there were some reports on similar problems with ifb on SMP, > but this was hard to trigger and debugging stopped for some reason. > It seems there were some timer OOPSes, but this could be because > wrong locking too. This could be even because some other apps can't > handle their lost net traffic. Without some good log traces this > could be impossible to tell. And such debugging shouldn't be done at > production of course: it's really hard to foresee any locking > changes. So, until you are willing to respond to our proposals and > try the patches I can see no problem with assisting this problem. > > BTW, probably it's easier to stick to one kernel version, so maybe > 2.6.24.3 isn't a bad choice if 2.6.25-rc4 didn't help at all. > > Regards, > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 10:19 ` Denys Fedoryshchenko @ 2008-03-07 10:48 ` Jarek Poplawski 2008-03-07 14:58 ` jamal 1 sibling, 0 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-07 10:48 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: jamal, netdev On Fri, Mar 07, 2008 at 12:19:03PM +0200, Denys Fedoryshchenko wrote: > Ok, i will proceed and will try all required options after few hours (need to > go another city, work). > I will try to set max as possible debug options. Max could be too much, but e.g. this is sometimes helpful too: CONFIG_DEBUG_LIST is not set BTW, you didn't write yet if with min/no debugging, no 4KSTACKS and lockdep off there is any change in these lockups? And no need to hurry... > On bug_feb.txt it is triggering ONLY lockdep warning, but full network lockup > occur very rarely. But even it happens with me 2 times, i am not sure it is > same problem. Let's stick maybe to this one (lockdep warning) which is easy > to reproduce. > I will open also bugzilla report. Send Re: here with a bugzilla number. > > Regarding testing with compiled in modules, i did > CONFIG_IFB=y > CONFIG_DUMMY=y > CONFIG_BONDING=y > CONFIG_MACVLAN=y > CONFIG_EQUALIZER=y > CONFIG_TUN=y > > It is still triggering lockdep. > Here is full dmesg: > http://www.nuclearcat.com/files/dmesg_1234.txt OK, but it seems still without my last patch to ifb? Thanks, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 10:19 ` Denys Fedoryshchenko 2008-03-07 10:48 ` Jarek Poplawski @ 2008-03-07 14:58 ` jamal 1 sibling, 0 replies; 47+ messages in thread From: jamal @ 2008-03-07 14:58 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev On Fri, 2008-07-03 at 12:19 +0200, Denys Fedoryshchenko wrote: > Ok, i will proceed and will try all required options after few hours (need to > go another city, work). > I will try to set max as possible debug options. This is embarassing on my part - all this time i claimed i couldnt reproduce it was because i didnt have DEPLOCK debuging on. I apologize. I am only have time to rebuild the kernel but not test it. In any case, Denys even if i do reproduce it this is a false positive i.e it is just noise in my opinion and you can safely ignore it (refer to my emails to Jarek). Perhaps we should invest the time trying to reproduce the other issues you have instead. cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 15:50 ` Denys Fedoryshchenko 2008-03-06 20:25 ` Jarek Poplawski @ 2008-03-06 20:44 ` jamal 1 sibling, 0 replies; 47+ messages in thread From: jamal @ 2008-03-06 20:44 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev On Thu, 2008-06-03 at 17:50 +0200, Denys Fedoryshchenko wrote: > Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 > also. > > modprobe ifb > ifconfig ifb0 up > TC=/sbin/tc > $TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null > $TC qdisc add dev eth0 ingress > ${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \ > match u32 0 0 flowid 1:1 \ > action mirred egress redirect dev ifb0 > I have tried this on my laptop - 2.6.24 and it doesnt happen. Indicates whatever issues that are occuring must be post 2.6.24 since you are running something newer. > > locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong? > No idea, i have strange lockup's on my systems where i have ifb, and that > make me worry. And i feel it is directly related with my love to use ifb > devices and way how i am using them. I will like to help resolve it - if i can reproduce it i can fix it. I love the fact you love to use ifb. I am going to try later on 2.6.25-rc4 that you are running. > > This sounds like a different issue from above - when did this start > > to happen? Is it at the same time as above warnings showing up? > > Yes, it is different issue seems, it is rare to lockup system , and i will > dig more, to understand how it is happening. Are you running std kernels or do you have some other patches? cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 13:40 ` Jarek Poplawski 2008-03-06 13:57 ` Denys Fedoryshchenko @ 2008-03-06 13:59 ` jamal 2008-03-06 17:56 ` Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: jamal @ 2008-03-06 13:59 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev On Thu, 2008-06-03 at 13:40 +0000, Jarek Poplawski wrote: > I'm not sure this lockdep report is because of this, but there is > really a problem with lock order while using sch_ingress with > act_mirred: dev->queue_lock is taken after dev->ingress_lock, so > reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem > when one of the devices is ifb yet. Are there more details? Ingress of which netdevice is redirecting to egress of which netdevice? Sorry, I dont understand much about the internals of lockdep so i dont know what you are teaching it in the patch below... cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 13:59 ` jamal @ 2008-03-06 17:56 ` Jarek Poplawski 2008-03-06 20:48 ` jamal 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 17:56 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Thu, Mar 06, 2008 at 08:59:55AM -0500, jamal wrote: > On Thu, 2008-06-03 at 13:40 +0000, Jarek Poplawski wrote: > > > I'm not sure this lockdep report is because of this, but there is > > really a problem with lock order while using sch_ingress with > > act_mirred: dev->queue_lock is taken after dev->ingress_lock, so > > reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem > > when one of the devices is ifb yet. > > Are there more details? Ingress of which netdevice is redirecting to > egress of which netdevice? > Sorry, I dont understand much about the internals of lockdep so i dont > know what you are teaching it in the patch below... Every netdevice after register_netdevice() has its queue_lock and ingress_lock initialized with the same static lock_class_keys, so unless changed later, these locks are treated by lockdep as 2 global locks. So, taking such locks with different order should be reported. This really happens in act_mirred, and I don't know yet, why this wasn't reported earlier. Of course, if there are two different devices this could be safe, but not always (e.g. thread1 has dev_eth0->ingress_lock and wants dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants dev_eth0->ingress_lock). With ifb this shouldn't be the case, but anyway we have to tell lockdep that ifb uses a different pair of locks. My patch teaches lockdep about queue_lock, but after rethinking it seems it's not enough, and I'll have to update this patch with ingress_lock too. Denys, I'll have to read your script yet, so you can wait with this patching (unless you've started already - anyway this patch shouldn't be harmful). Thanks, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 17:56 ` Jarek Poplawski @ 2008-03-06 20:48 ` jamal 2008-03-06 21:40 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: jamal @ 2008-03-06 20:48 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev On Thu, 2008-06-03 at 18:56 +0100, Jarek Poplawski wrote: > Every netdevice after register_netdevice() has its queue_lock and > ingress_lock initialized with the same static lock_class_keys, so unless > changed later, these locks are treated by lockdep as 2 global locks. > So, taking such locks with different order should be reported. ok. > This really > happens in act_mirred, and I don't know yet, why this wasn't reported > earlier. Look closely at those traces again ;-> there are *three* different netdevices involved, one (loopback) seems to be _totaly_ unrelated. Tracing of those locks just seems confused - perhaps the pernet stuff is confusing loopback? > Of course, if there are two different devices this could be safe, but > not always (e.g. thread1 has dev_eth0->ingress_lock and wants > dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants > dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants > dev_eth0->ingress_lock). With ifb this shouldn't be the case, but > anyway we have to tell lockdep that ifb uses a different pair of locks. thread3 can not happen because we dont allow it (the other 2 are not contentious). There are cases where redirecting will cause you problems (example if you redirected to yourself and cause an infinite redirect) which are listed in iproute2/doc. Denys script is fine afaics. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 20:48 ` jamal @ 2008-03-06 21:40 ` Jarek Poplawski 2008-03-06 23:40 ` jamal 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-06 21:40 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Thu, Mar 06, 2008 at 03:48:42PM -0500, jamal wrote: > On Thu, 2008-06-03 at 18:56 +0100, Jarek Poplawski wrote: > > > Every netdevice after register_netdevice() has its queue_lock and > > ingress_lock initialized with the same static lock_class_keys, so unless > > changed later, these locks are treated by lockdep as 2 global locks. > > So, taking such locks with different order should be reported. > > ok. > > > This really > > happens in act_mirred, and I don't know yet, why this wasn't reported > > earlier. > > Look closely at those traces again ;-> there are *three* different > netdevices involved, one (loopback) seems to be _totaly_ unrelated. > Tracing of those locks just seems confused - perhaps the pernet stuff is > confusing loopback? But currently lockdep doesn't know dev->queue_lock could mean eth or lo. It sees one class of devices using one lock. We can let it know (e.g. dev->_xmit_lock is different according to dev->type), but it wasn't necessary. I hope it will suffice here if lockdep knows more about ifb, but similar problem could theoretically happen with other devs. > > Of course, if there are two different devices this could be safe, but > > not always (e.g. thread1 has dev_eth0->ingress_lock and wants > > dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants > > dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants Sorry, should be: dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants > > dev_eth0->ingress_lock). With ifb this shouldn't be the case, but > > anyway we have to tell lockdep that ifb uses a different pair of locks. > > thread3 can not happen because we dont allow it (the other 2 are not > contentious). Could you explain why? It's a qdisc_lock_tree case and probably not only this. > There are cases where redirecting will cause you problems (example if > you redirected to yourself and cause an infinite redirect) which are > listed in iproute2/doc. Denys script is fine afaics. Yes, but it seems such redirection between two eths like above mentioned is legal? Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 21:40 ` Jarek Poplawski @ 2008-03-06 23:40 ` jamal 2008-03-07 7:51 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: jamal @ 2008-03-06 23:40 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev Jarek, On Thu, 2008-06-03 at 22:40 +0100, Jarek Poplawski wrote: > But currently lockdep doesn't know dev->queue_lock could mean eth or lo. > It sees one class of devices using one lock. yikes ;-> I missed the implication when you mentioned it first - thanks for the enlightnment. IMO, it needs to be per netdevice not global; so i contend this is the problem i.e those warnings are false. > We can let it know (e.g. > dev->_xmit_lock is different according to dev->type), but it wasn't > necessary. I hope it will suffice here if lockdep knows more about ifb, > but similar problem could theoretically happen with other devs. I think the condition needs to be met for all netdevices, not just ifb. Instead of annotating each netdevice type separately, could you not use a different annotation in the generic netdev registration code of the form dev_%s->ingress/queue_lock ? (where %s is dev->name) > Sorry, should be: > dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants indeed - i understood when you said that; it doesnt invalidate what i said. > > thread3 can not happen because we dont allow it (the other 2 are not > > contentious). > > Could you explain why? Because we never redirect to ingress (it is in the todo as mentioned as described at the top of the source file). You can redirect from ingress or egress but only to egress (to sockets and ingress are going to happen at some future point). > Yes, but it seems such redirection between two eths like above mentioned > is legal? Indeed it is. I think it is clear to me now the issues seem to be the namespace of what lockdep - it should be per-device and NOT global. What do you think? cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-06 23:40 ` jamal @ 2008-03-07 7:51 ` Jarek Poplawski 2008-03-07 8:32 ` Jarek Poplawski 2008-03-07 13:53 ` jamal 0 siblings, 2 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-07 7:51 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Thu, Mar 06, 2008 at 06:40:39PM -0500, jamal wrote: > Jarek, > > On Thu, 2008-06-03 at 22:40 +0100, Jarek Poplawski wrote: > > > But currently lockdep doesn't know dev->queue_lock could mean eth or lo. > > It sees one class of devices using one lock. > > yikes ;-> I missed the implication when you mentioned it first - thanks > for the enlightnment. IMO, it needs to be per netdevice not global; so i > contend this is the problem i.e those warnings are false. > > > We can let it know (e.g. > > dev->_xmit_lock is different according to dev->type), but it wasn't > > necessary. I hope it will suffice here if lockdep knows more about ifb, > > but similar problem could theoretically happen with other devs. > > I think the condition needs to be met for all netdevices, not just ifb. > Instead of annotating each netdevice type separately, could you not use > a different annotation in the generic netdev registration code of the > form dev_%s->ingress/queue_lock ? (where %s is dev->name) At first this lockdep method looks like wrong (too general). But it works. With every lock tracked separately there would be huge overhead (especially with some structures created in thousands of instances). And with this general approach we can see general problems: e.g. wrong locking order even reported on two different devices like here, and impossible in reality, can help to analyze if the same could happen in another, maybe even very hard to reproduce, variant, and to fix this before ever happened. So, current way is to start from very general tracking, and if it doesn't work (I mean false reports, not lockups), we make it more specific. With dev->_xmit_lock the type level seems to be enough for some time (so lockdep doesn't even know eth0 and eth1 created as ARPHRD_ETHER use 2 different _xmit_locks). Such specific information is usually necessary only with nesting. But here, with sch_ingress + act_mirred it's not about nesting: it's really about wrong order, which could be safe only because ifb doesn't currently work with a full egress + ingress functionality. Making this all more specific for lockdep won't help with real lockups then, but will make reports more exact and less reproducible. > > Sorry, should be: > > dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants > > indeed - i understood when you said that; it doesnt invalidate what i > said. > > > > thread3 can not happen because we dont allow it (the other 2 are not > > > contentious). > > > > Could you explain why? > > Because we never redirect to ingress (it is in the todo as mentioned as > described at the top of the source file). You can redirect from ingress > or egress but only to egress (to sockets and ingress are going to happen > at some future point). > > > Yes, but it seems such redirection between two eths like above mentioned > > is legal? > > Indeed it is. I think it is clear to me now the issues seem to be the > namespace of what lockdep - it should be per-device and NOT global. > What do you think? I've to find first what really bothers lockdep here, and why this queue_lock vs. ingress_lock order isn't reported "by default". But if this really is like it looks now, then it seems before doing this ingress "future point" some change in locking could be necessary. (Maybe even sooner if lockdep finds something real after this current patch to ifb.) Cheers, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 7:51 ` Jarek Poplawski @ 2008-03-07 8:32 ` Jarek Poplawski 2008-03-07 13:53 ` jamal 1 sibling, 0 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-07 8:32 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Fri, Mar 07, 2008 at 07:51:13AM +0000, Jarek Poplawski wrote: ... > I've to find first what really bothers lockdep here, and why this > queue_lock vs. ingress_lock order isn't reported "by default". But if > this really is like it looks now, then it seems before doing this > ingress "future point" some change in locking could be necessary. > (Maybe even sooner if lockdep finds something real after this current > patch to ifb.) Actually, I got it a bit wrong: "the problem" with ifb could probably exist now: if we would redirect e.g. from ifb0's ingress to eth0's egress, while doing the same thing from eth0 to ifb0. But can we get any traffic on ifb0's ingress? And why would we do this after all? Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 7:51 ` Jarek Poplawski 2008-03-07 8:32 ` Jarek Poplawski @ 2008-03-07 13:53 ` jamal 2008-03-08 8:46 ` Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: jamal @ 2008-03-07 13:53 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev On Fri, 2008-07-03 at 07:51 +0000, Jarek Poplawski wrote: > At first this lockdep method looks like wrong (too general). But it > works. With every lock tracked separately there would be huge overhead > (especially with some structures created in thousands of instances). > And with this general approach we can see general problems: e.g. > wrong locking order even reported on two different devices like here, > and impossible in reality, The problem is not so much so that it is an impossible situation, rather it is not a useful detail to report as is. As an example, heres something i consider useful to catch: dev->queuelock for ethx being held recursively In your current setup, there will be many false-positives. i.e it will show something like dev->queuelock for eth0 is being held and flagging that as a problem because dev->queuelock for eth1 is also being held. i.e a lot of noise. > can help to analyze if the same could > happen in another, maybe even very hard to reproduce, variant, and to > fix this before ever happened. It is possible some odd issue will be found by having the lockdeps in their current setup - but if you have to sift through noise in order to find it, then it makes it harder. > So, current way is to start from very general tracking, and if it > doesn't work (I mean false reports, not lockups), we make it more > specific. Is it time to do that now? I know you said it was too much data to carry. However, such data depends on how many netdevices are on the system and lockdep is optional. You could even make NETDEV_LOCKDEP version which is accurate and the other one as being less accurate. > With dev->_xmit_lock the type level seems to be enough > for some time (so lockdep doesn't even know eth0 and eth1 created > as ARPHRD_ETHER use 2 different _xmit_locks). I am curious what issue has been found with this approach. I can see that someone who knows what they are doing could sift through reports and be able to catch something. My thinking is it most reports are going to be noise. People will report them because they are scary looking. > Such specific information is usually necessary only with nesting. It could also apply to a single lock (for example redirecting eth0->eth0 or eth0->manyotherredirectshere->eth0) > But here, with sch_ingress + act_mirred it's not about nesting: it's > really about wrong order, which could be safe only because ifb > doesn't currently work with a full egress + ingress functionality. ifb never ever works with ingress. It is design intent. It is a special netdevice (as the text says) which receives only redirected packets. > I've to find first what really bothers lockdep here, and why this > queue_lock vs. ingress_lock order isn't reported "by default". I dont see the problem in this case, those traces are false-positives - but no harm in investigating further. I still think the only time you can find real issues is when you have per-netdevice lockdep; anything else will only find issues only with the help of an expert. > But if > this really is like it looks now, then it seems before doing this > ingress "future point" some change in locking could be necessary. true. > (Maybe even sooner if lockdep finds something real after this current > patch to ifb.) As it stands right now Jarek, I think you need to teach lockedep more smarter details. cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-07 13:53 ` jamal @ 2008-03-08 8:46 ` Jarek Poplawski 2008-03-08 8:58 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-08 8:46 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Fri, Mar 07, 2008 at 08:53:22AM -0500, jamal wrote: ... > The problem is not so much so that it is an impossible situation, rather > it is not a useful detail to report as is. > As an example, heres something i consider useful to catch: > dev->queuelock for ethx being held recursively > In your current setup, there will be many false-positives. i.e it will > show something like dev->queuelock for eth0 is being held and flagging > that as a problem because dev->queuelock for eth1 is also being held. > i.e a lot of noise. But as you can see... you can't see these reports too often now. Probably there are many people who are skeptical of lockdep like you. But until it's an official tool you've to adjust and not scare users with these reports (often called OOPSes!). The bad thing is that currently any such report turns off lockdep checking for next possible (false!?) problems. > Is it time to do that now? I know you said it was too much data to > carry. However, such data depends on how many netdevices are on the > system and lockdep is optional. You could even make NETDEV_LOCKDEP > version which is accurate and the other one as being less accurate. That's why I'm trying now to find if making this more specific only for ifb could suffice. This reverse locking order is used only on sch_ingress -> act_mirred path, and it's mostly with ifb, which as you said doesn't use this patch with it's ingress (at least currently). > I am curious what issue has been found with this approach. I can see > that someone who knows what they are doing could sift through reports > and be able to catch something. My thinking is it most reports are going > to be noise. People will report them because they are scary looking. I never counted this, but you're mostly right, and most (90%?) of these reports are really false-positives. But, even 10 or 5% correctness is IMHO worth of it. I still think one of the most bothering things in linux for users are misterious lockups, and this could be even more so with these fast multicore boxes now. One of the recent right-positives I can remember can be found on the list in "Fix SMP oops in pppol2tp driver". This was really a complex thing (and there is still something to find). The funniest of all is David diagnosed it fastest probably without studying this report too much, but on the other hand, nobody had spotted this issue before lockdep. Another such a good lockdep's work is checking for flush_workqueue() abusing (recently in "lockdep trace from rc2." netdev thread). > It could also apply to a single lock (for example redirecting > eth0->eth0 or eth0->manyotherredirectshere->eth0) If you mean redirecting from ingress with act_mirred then of course. I wrote earlier about eth0->ingress_redirected->eth1, eth1-> ingress_redirected->eth0 which would be right-positive IMHO. If eth0->eth0 is legal this should be even more dangerous. > ifb never ever works with ingress. It is design intent. It is a special > netdevice (as the text says) which receives only redirected packets. ... > I dont see the problem in this case, those traces are false-positives - > but no harm in investigating further. I still think the only time you > can find real issues is when you have per-netdevice lockdep; anything > else will only find issues only with the help of an expert. I think wrt ifb it's a false-positive too, that's why I din't try to change locking but only add some info for lockdep. BTW, we shouldn't forget such lockdep annotations are always under debugging config options, and not intended to work (or use any resources) in production or home boxes (at least until no need to write to kernel lists). BTW#2, it seems this tcf_lock in act_mirred could need similar lockdep treatment, but let's wait for Denys'es testing. > As it stands right now Jarek, I think you need to teach lockedep more > smarter details. ...But the beast is much smarter than me! Regards, Jarek P. PS: sorry for delay, a little cold or something. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 8:46 ` Jarek Poplawski @ 2008-03-08 8:58 ` Jarek Poplawski 2008-03-08 9:56 ` Denys Fedoryshchenko 2008-03-08 10:16 ` Denys Fedoryshchenko 0 siblings, 2 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-08 8:58 UTC (permalink / raw) To: jamal; +Cc: Denys Fedoryshchenko, netdev On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote: ... > said doesn't use this patch with it's ingress (at least currently). + said doesn't use this path with it's ingress (at least currently). Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 8:58 ` Jarek Poplawski @ 2008-03-08 9:56 ` Denys Fedoryshchenko 2008-03-08 10:16 ` Denys Fedoryshchenko 1 sibling, 0 replies; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-08 9:56 UTC (permalink / raw) To: Jarek Poplawski, jamal; +Cc: netdev I apply proposel patch at http://marc.info/?l=linux- netdev&m=120481096107584&w=2 and not able to trigger lockdep warning anymore. It is expected result? On Sat, 8 Mar 2008 09:58:54 +0100, Jarek Poplawski wrote > On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote: > .... > > said doesn't use this patch with it's ingress (at least currently). > > + said doesn't use this path with it's ingress (at least currently). > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 8:58 ` Jarek Poplawski 2008-03-08 9:56 ` Denys Fedoryshchenko @ 2008-03-08 10:16 ` Denys Fedoryshchenko 2008-03-08 10:43 ` Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-08 10:16 UTC (permalink / raw) To: Jarek Poplawski, jamal; +Cc: netdev Update. I am not able to reproduce on my PC, but still able to reproduce on PPPoE server (but here i cannot take risk to use 2.6.25-rcX kernels). This is PPPoE: [2148614.154684] [2148614.154688] ======================================================= [2148614.154805] [ INFO: possible circular locking dependency detected ] [2148614.154862] 2.6.24.3-build-0023 #9 [2148614.154913] ------------------------------------------------------- [2148614.154969] swapper/0 is trying to acquire lock: [2148614.155023] (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] dev_queue_xmit+0x177/0x302 [2148614.155245] [2148614.155246] but task is already holding lock: [2148614.155346] (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/ 0x180 [act_mirred] [2148614.155569] [2148614.155570] which lock already depends on the new lock. [2148614.155571] [2148614.155715] [2148614.155716] the existing dependency chain (in reverse order) is: [2148614.155817] [2148614.155818] -> #2 (&p->tcfc_lock){-+..}: [2148614.156026] [<c014368c>] __lock_acquire+0xa30/0xc19 [2148614.156371] [<c01438ef>] lock_acquire+0x7a/0x94 [2148614.156714] [<c02d9525>] _spin_lock+0x2e/0x58 [2148614.157062] [<f89e638d>] tcf_ipt+0x4b/0xde [act_ipt] [2148614.157412] [<c0298000>] tcf_action_exec+0x44/0x77 [2148614.157764] [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32] [2148614.158116] [<c0295e1e>] tc_classify_compat+0x2f/0x5e [2148614.158460] [<c0295fff>] tc_classify+0x1a/0x80 [2148614.158802] [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress] [2148614.159152] [<c0287275>] netif_receive_skb+0x296/0x44c [2148614.159496] [<c0289ada>] process_backlog+0x77/0xd4 [2148614.159839] [<c02895f8>] net_rx_action+0xbf/0x201 [2148614.160183] [<c012ac15>] __do_softirq+0x6f/0xe9 [2148614.160530] [<c01078af>] do_softirq+0x61/0xc8 [2148614.160876] [<ffffffff>] 0xffffffff [2148614.161226] [2148614.161227] -> #1 (&dev->ingress_lock){-+..}: [2148614.161435] [<c014368c>] __lock_acquire+0xa30/0xc19 [2148614.161781] [<c01438ef>] lock_acquire+0x7a/0x94 [2148614.162123] [<c02d9525>] _spin_lock+0x2e/0x58 [2148614.162465] [<c02954c3>] qdisc_lock_tree+0x1e/0x21 [2148614.162807] [<c0296ba5>] qdisc_create+0x17e/0x1e3 [2148614.163148] [<c0297368>] tc_modify_qdisc+0x2a3/0x3a3 [2148614.163490] [<c029051d>] rtnetlink_rcv_msg+0x18e/0x1a8 [2148614.163835] [<c029b670>] netlink_rcv_skb+0x30/0x82 [2148614.164181] [<c0290387>] rtnetlink_rcv+0x17/0x1f [2148614.164523] [<c029b444>] netlink_unicast+0x19c/0x205 [2148614.164868] [<c029bc41>] netlink_sendmsg+0x250/0x25c [2148614.165210] [<c027d894>] sock_sendmsg+0xcc/0xe5 [2148614.165554] [<c027da00>] sys_sendmsg+0x153/0x1b1 [2148614.165898] [<c027e860>] sys_socketcall+0x203/0x222 [2148614.166242] [<c0104e9a>] sysenter_past_esp+0x5f/0xa5 [2148614.166589] [<ffffffff>] 0xffffffff [2148614.166935] [2148614.166936] -> #0 (&ifb_queue_lock_key){-+..}: [2148614.167143] [<c014357c>] __lock_acquire+0x920/0xc19 [2148614.167490] [<c01438ef>] lock_acquire+0x7a/0x94 [2148614.167832] [<c02d9525>] _spin_lock+0x2e/0x58 [2148614.168176] [<c0289d4d>] dev_queue_xmit+0x177/0x302 [2148614.168529] [<f8a101a5>] tcf_mirred+0x15f/0x180 [act_mirred] [2148614.168878] [<c0298000>] tcf_action_exec+0x44/0x77 [2148614.169223] [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32] [2148614.169571] [<c0295e1e>] tc_classify_compat+0x2f/0x5e [2148614.169914] [<c0295fff>] tc_classify+0x1a/0x80 [2148614.170257] [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress] [2148614.170605] [<c0287275>] netif_receive_skb+0x296/0x44c [2148614.170950] [<c0289ada>] process_backlog+0x77/0xd4 [2148614.171293] [<c02895f8>] net_rx_action+0xbf/0x201 [2148614.171635] [<c012ac15>] __do_softirq+0x6f/0xe9 [2148614.171978] [<c01078af>] do_softirq+0x61/0xc8 [2148614.172321] [<ffffffff>] 0xffffffff [2148614.172664] [2148614.172665] other info that might help us debug this: [2148614.172666] [2148614.172810] 5 locks held by swapper/0: [2148614.172862] #0: (rcu_read_lock){..--}, at: [<c0289589>] net_rx_action+0x50/0x201 [2148614.173111] #1: (rcu_read_lock){..--}, at: [<c02870d5>] netif_receive_skb+0xf6/0x44c [2148614.173359] #2: (&dev->ingress_lock){-+..}, at: [<c0287261>] netif_receive_skb+0x282/0x44c [2148614.173608] #3: (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/0x180 [act_mirred] [2148614.173859] #4: (rcu_read_lock){..--}, at: [<c0289cf3>] dev_queue_xmit+0x11d/0x302 [2148614.174110] [2148614.174111] stack backtrace: [2148614.174208] Pid: 0, comm: swapper Not tainted 2.6.24.3-build-0023 #9 [2148614.174265] [<c0105f2a>] show_trace_log_lvl+0x1a/0x2f [2148614.174360] [<c0106843>] show_trace+0x12/0x14 [2148614.174452] [<c010713a>] dump_stack+0x6c/0x72 [2148614.174541] [<c014175c>] print_circular_bug_tail+0x5f/0x68 [2148614.174634] [<c014357c>] __lock_acquire+0x920/0xc19 [2148614.174725] [<c01438ef>] lock_acquire+0x7a/0x94 [2148614.174814] [<c02d9525>] _spin_lock+0x2e/0x58 [2148614.174904] [<c0289d4d>] dev_queue_xmit+0x177/0x302 [2148614.174995] [<f8a101a5>] tcf_mirred+0x15f/0x180 [act_mirred] [2148614.175091] [<c0298000>] tcf_action_exec+0x44/0x77 [2148614.175183] [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32] [2148614.175279] [<c0295e1e>] tc_classify_compat+0x2f/0x5e [2148614.175370] [<c0295fff>] tc_classify+0x1a/0x80 [2148614.175460] [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress] [2148614.175554] [<c0287275>] netif_receive_skb+0x296/0x44c [2148614.175646] [<c0289ada>] process_backlog+0x77/0xd4 [2148614.175735] [<c02895f8>] net_rx_action+0xbf/0x201 [2148614.175825] [<c012ac15>] __do_softirq+0x6f/0xe9 [2148614.175915] [<c01078af>] do_softirq+0x61/0xc8 [2148614.176005] ======================= On Sat, 8 Mar 2008 09:58:54 +0100, Jarek Poplawski wrote > On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote: > .... > > said doesn't use this patch with it's ingress (at least currently). > > + said doesn't use this path with it's ingress (at least currently). > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 10:16 ` Denys Fedoryshchenko @ 2008-03-08 10:43 ` Jarek Poplawski 2008-03-08 10:52 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-08 10:43 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: jamal, netdev On Sat, Mar 08, 2008 at 12:16:52PM +0200, Denys Fedoryshchenko wrote: > Update. I am not able to reproduce on my PC, but still able to reproduce on > PPPoE server (but here i cannot take risk to use 2.6.25-rcX kernels). This is > PPPoE: > > [2148614.154684] > [2148614.154688] ======================================================= > [2148614.154805] [ INFO: possible circular locking dependency detected ] > [2148614.154862] 2.6.24.3-build-0023 #9 > [2148614.154913] ------------------------------------------------------- > [2148614.154969] swapper/0 is trying to acquire lock: > [2148614.155023] (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] > dev_queue_xmit+0x177/0x302 > [2148614.155245] > [2148614.155246] but task is already holding lock: > [2148614.155346] (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/ > 0x180 [act_mirred] > [2148614.155569] > [2148614.155570] which lock already depends on the new lock. ...Hmm... the same day I sent to you "take 2" of my patch which was more complete (lockdep_set_class for ingress_lock added). Could you try this? It still could be not enough, and something similar is needed for tcfc_lock in act_mirred, but I'd like to see a report after this patch first. As Jamal wrote this all is a false-positive with ifb, but I think it's needed to please lockdep. Thanks, Jarek P. PS: I resend this "take 2" here: --- drivers/net/ifb.c | 24 ++++++++++++++++++++++++ 1 files changed, 24 insertions(+), 0 deletions(-) diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 15949d3..c553b62 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = { module_param(numifbs, int, 0); MODULE_PARM_DESC(numifbs, "Number of ifb devices"); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +/* + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, + * reversely to e.g. qdisc_lock_tree(). It should be safe until + * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock. + * But lockdep should know that ifb has different locks from dev. + */ +static struct lock_class_key ifb_queue_lock_key; +static struct lock_class_key ifb_ingress_lock_key; + +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ + lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key); + lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key); +} +#else +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ +} +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ + static int __init ifb_init_one(int index) { struct net_device *dev_ifb; @@ -246,6 +267,9 @@ static int __init ifb_init_one(int index) err = register_netdevice(dev_ifb); if (err < 0) goto err; + + ifb_set_lock_classes(dev_ifb); + return 0; err: ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 10:43 ` Jarek Poplawski @ 2008-03-08 10:52 ` Jarek Poplawski 2008-03-08 11:09 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-08 10:52 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: jamal, netdev On Sat, Mar 08, 2008 at 11:43:22AM +0100, Jarek Poplawski wrote: ... > ...Hmm... the same day I sent to you "take 2" of my patch which was more > complete (lockdep_set_class for ingress_lock added). Could you try this? > It still could be not enough, and something similar is needed for > tcfc_lock in act_mirred, but I'd like to see a report after this patch > first. ...Hmm#2... or maybe it doesn't matter? I'll try to send you today this additional act_mirred patch. Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 10:52 ` Jarek Poplawski @ 2008-03-08 11:09 ` Denys Fedoryshchenko 2008-03-08 12:02 ` Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-08 11:09 UTC (permalink / raw) To: Jarek Poplawski; +Cc: jamal, netdev Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 different servers for now. I will test more deeply and on more servers. On Sat, 8 Mar 2008 11:52:50 +0100, Jarek Poplawski wrote > On Sat, Mar 08, 2008 at 11:43:22AM +0100, Jarek Poplawski wrote: > .... > > ...Hmm... the same day I sent to you "take 2" of my patch which was more > > complete (lockdep_set_class for ingress_lock added). Could you try this? > > It still could be not enough, and something similar is needed for > > tcfc_lock in act_mirred, but I'd like to see a report after this patch > > first. > > ....Hmm#2... or maybe it doesn't matter? I'll try to send you today > this additional act_mirred patch. > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 11:09 ` Denys Fedoryshchenko @ 2008-03-08 12:02 ` Jarek Poplawski 2008-03-19 0:46 ` Denys Fedoryshchenko 0 siblings, 1 reply; 47+ messages in thread From: Jarek Poplawski @ 2008-03-08 12:02 UTC (permalink / raw) To: Denys Fedoryshchenko; +Cc: jamal, netdev On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote: > Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 different > servers for now. > I will test more deeply and on more servers. Fine! So, let's wait yet... (No need to hurry.) BTW, I thought about those earlier probably stack overflow problems, and I wonder if using action ipt couldn't be blamed here. Sometimes it seems to be unnecessary, so maybe you could try if re-doing some filter rules can help. Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: circular locking, mirred, 2.6.24.2 2008-03-08 12:02 ` Jarek Poplawski @ 2008-03-19 0:46 ` Denys Fedoryshchenko 2008-03-19 7:34 ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski 0 siblings, 1 reply; 47+ messages in thread From: Denys Fedoryshchenko @ 2008-03-19 0:46 UTC (permalink / raw) To: Jarek Poplawski; +Cc: jamal, netdev No more warnings. Probably it must be applied to 2.6.25 before it is released? On Sat, 8 Mar 2008 13:02:44 +0100, Jarek Poplawski wrote > On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote: > > Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 different > > servers for now. > > I will test more deeply and on more servers. > > Fine! So, let's wait yet... (No need to hurry.) > > BTW, I thought about those earlier probably stack overflow problems, > and I wonder if using action ipt couldn't be blamed here. Sometimes > it seems to be unnecessary, so maybe you could try if re-doing some > filter rules can help. > > Jarek P. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Denys Fedoryshchenko Technical Manager Virtual ISP S.A.L. ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH][NET] ifb: set separate lockdep classes for queue locks 2008-03-19 0:46 ` Denys Fedoryshchenko @ 2008-03-19 7:34 ` Jarek Poplawski 2008-03-19 11:34 ` jamal 2008-03-20 22:37 ` David Miller 0 siblings, 2 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-19 7:34 UTC (permalink / raw) To: Jeff Garzik, David S. Miller; +Cc: Denys Fedoryshchenko, jamal, netdev On Wed, Mar 19, 2008 at 02:46:08AM +0200, Denys Fedoryshchenko wrote: > No more warnings. > Probably it must be applied to 2.6.25 before it is released? > > On Sat, 8 Mar 2008 13:02:44 +0100, Jarek Poplawski wrote > > On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote: > > > Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 > different > > > servers for now. > > > I will test more deeply and on more servers. Thanks for testing Denys, Jarek P. ---------------------> Subject: [NET] ifb: set separate lockdep classes for queue locks > [2148614.154688] ======================================================= > [2148614.154805] [ INFO: possible circular locking dependency detected ] > [2148614.154862] 2.6.24.3-build-0023 #9 > [2148614.154913] ------------------------------------------------------- > [2148614.154969] swapper/0 is trying to acquire lock: > [2148614.155023] (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] > dev_queue_xmit+0x177/0x302 > [2148614.155245] > [2148614.155246] but task is already holding lock: > [2148614.155346] (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/ > 0x180 [act_mirred] > [2148614.155569] > [2148614.155570] which lock already depends on the new lock. lockdep warns of locking order while using ifb with sch_ingress and act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock is at the beginning). This patch is only to tell lockdep that ifb is a different device (e.g. from eth) and has its own pair of queue locks. (This warning is a false-positive in common scenario of using ifb; yet there are possible situations, when this order could be dangerous; lockdep should warn in such a case.) Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Jamal Hadi Salim <hadi@cyberus.ca> --- drivers/net/ifb.c | 24 ++++++++++++++++++++++++ 1 files changed, 24 insertions(+), 0 deletions(-) diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 15949d3..c553b62 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = { module_param(numifbs, int, 0); MODULE_PARM_DESC(numifbs, "Number of ifb devices"); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +/* + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, + * reversely to e.g. qdisc_lock_tree(). It should be safe until + * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock. + * But lockdep should know that ifb has different locks from dev. + */ +static struct lock_class_key ifb_queue_lock_key; +static struct lock_class_key ifb_ingress_lock_key; + +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ + lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key); + lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key); +} +#else +static inline void ifb_set_lock_classes(struct net_device *dev_ifb) +{ +} +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ + static int __init ifb_init_one(int index) { struct net_device *dev_ifb; @@ -246,6 +267,9 @@ static int __init ifb_init_one(int index) err = register_netdevice(dev_ifb); if (err < 0) goto err; + + ifb_set_lock_classes(dev_ifb); + return 0; err: -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks 2008-03-19 7:34 ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski @ 2008-03-19 11:34 ` jamal 2008-03-19 12:20 ` Jarek Poplawski 2008-03-20 22:37 ` David Miller 1 sibling, 1 reply; 47+ messages in thread From: jamal @ 2008-03-19 11:34 UTC (permalink / raw) To: Jarek Poplawski Cc: Jeff Garzik, David S. Miller, Denys Fedoryshchenko, netdev On Wed, 2008-19-03 at 07:34 +0000, Jarek Poplawski wrote: Thanks for logging the false-positive aspect Jarek. > Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > Cc: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> cheers, jamal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks 2008-03-19 11:34 ` jamal @ 2008-03-19 12:20 ` Jarek Poplawski 0 siblings, 0 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-19 12:20 UTC (permalink / raw) To: jamal; +Cc: Jeff Garzik, David S. Miller, Denys Fedoryshchenko, netdev On Wed, Mar 19, 2008 at 07:34:29AM -0400, jamal wrote: ... > Thanks for logging the false-positive aspect Jarek. I guess it's something worth to remember... Cheers, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks 2008-03-19 7:34 ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski 2008-03-19 11:34 ` jamal @ 2008-03-20 22:37 ` David Miller 2008-03-21 0:03 ` [PATCH take2][NET] " Jarek Poplawski 1 sibling, 1 reply; 47+ messages in thread From: David Miller @ 2008-03-20 22:37 UTC (permalink / raw) To: jarkao2; +Cc: jeff, denys, hadi, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Wed, 19 Mar 2008 07:34:41 +0000 > Subject: [NET] ifb: set separate lockdep classes for queue locks > > > [2148614.154688] ======================================================= > > [2148614.154805] [ INFO: possible circular locking dependency detected ] > > [2148614.154862] 2.6.24.3-build-0023 #9 > > [2148614.154913] ------------------------------------------------------- > > [2148614.154969] swapper/0 is trying to acquire lock: > > [2148614.155023] (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] > > dev_queue_xmit+0x177/0x302 > > [2148614.155245] > > [2148614.155246] but task is already holding lock: > > [2148614.155346] (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/ > > 0x180 [act_mirred] > > [2148614.155569] > > [2148614.155570] which lock already depends on the new lock. > > lockdep warns of locking order while using ifb with sch_ingress and > act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock > is at the beginning). This patch is only to tell lockdep that ifb is > a different device (e.g. from eth) and has its own pair of queue > locks. (This warning is a false-positive in common scenario of using > ifb; yet there are possible situations, when this order could be > dangerous; lockdep should warn in such a case.) > > > Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > Cc: Jamal Hadi Salim <hadi@cyberus.ca> Jarek, the code in linux/lockdep.h provides dummy do-nothing versions of the lockdep_*() interfaces, so the spinlock debug ifdeffing you do here is unnecessary. Simply include linux/lockdep.h and perform the actions unconditionally. For example, this is how net/core/sock.c does things. Also, please upgrade Jamal's "CC" to an "Acked-by" :-) ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH take2][NET] ifb: set separate lockdep classes for queue locks 2008-03-20 22:37 ` David Miller @ 2008-03-21 0:03 ` Jarek Poplawski 2008-03-21 0:05 ` David Miller 2008-03-21 0:15 ` Jarek Poplawski 0 siblings, 2 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-21 0:03 UTC (permalink / raw) To: David Miller; +Cc: jeff, denys, hadi, netdev On Thu, Mar 20, 2008 at 03:37:45PM -0700, David Miller wrote: ... > Jarek, the code in linux/lockdep.h provides dummy do-nothing > versions of the lockdep_*() interfaces, so the spinlock > debug ifdeffing you do here is unnecessary. > > Simply include linux/lockdep.h and perform the actions > unconditionally. > > For example, this is how net/core/sock.c does things. > > Also, please upgrade Jamal's "CC" to an "Acked-by" :-) IMHO it wastes a bit of memory when lockdep is off and adds some overhead when lockdep is "partialy" on and doesn't need this, but it's really late... Regards, Jarek P. ---------------------> Subject: [NET] ifb: set separate lockdep classes for queue locks [ 10.536424] ======================================================= [ 10.536424] [ INFO: possible circular locking dependency detected ] [ 10.536424] 2.6.25-rc3-devel #3 [ 10.536424] ------------------------------------------------------- [ 10.536424] swapper/0 is trying to acquire lock: [ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>] dev_queue_xmit+0x175/0x2f3 [ 10.536424] [ 10.536424] but task is already holding lock: [ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred] [ 10.536424] [ 10.536424] which lock already depends on the new lock. lockdep warns of locking order while using ifb with sch_ingress and act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock is at the beginning). This patch is only to tell lockdep that ifb is a different device (e.g. from eth) and has its own pair of queue locks. (This warning is a false-positive in common scenario of using ifb; yet there are possible situations, when this order could be dangerous; lockdep should warn in such a case.) (With suggestions by David S. Miller) Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> --- drivers/net/ifb.c | 15 +++++++++++++++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 15949d3..af233b5 100644 --- a/drivers/net/ifb.c +++ b/drivers/net/ifb.c @@ -35,6 +35,7 @@ #include <linux/moduleparam.h> #include <net/pkt_sched.h> #include <net/net_namespace.h> +#include <linux/lockdep.h> #define TX_TIMEOUT (2*HZ) @@ -227,6 +228,16 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = { module_param(numifbs, int, 0); MODULE_PARM_DESC(numifbs, "Number of ifb devices"); +/* + * dev_ifb->queue_lock is usually taken after dev->ingress_lock, + * reversely to e.g. qdisc_lock_tree(). It should be safe until + * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock. + * But lockdep should know that ifb has different locks from dev. + */ +static struct lock_class_key ifb_queue_lock_key; +static struct lock_class_key ifb_ingress_lock_key; + + static int __init ifb_init_one(int index) { struct net_device *dev_ifb; @@ -246,6 +257,10 @@ static int __init ifb_init_one(int index) err = register_netdevice(dev_ifb); if (err < 0) goto err; + + lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key); + lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key); + return 0; err: ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH take2][NET] ifb: set separate lockdep classes for queue locks 2008-03-21 0:03 ` [PATCH take2][NET] " Jarek Poplawski @ 2008-03-21 0:05 ` David Miller 2008-03-21 0:15 ` Jarek Poplawski 1 sibling, 0 replies; 47+ messages in thread From: David Miller @ 2008-03-21 0:05 UTC (permalink / raw) To: jarkao2; +Cc: jeff, denys, hadi, netdev From: Jarek Poplawski <jarkao2@gmail.com> Date: Fri, 21 Mar 2008 01:03:12 +0100 > IMHO it wastes a bit of memory when lockdep is off and adds some > overhead when lockdep is "partialy" on and doesn't need this, but > it's really late... Understood but I really think this is the right way to implement this. Patch applied, thanks! ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH take2][NET] ifb: set separate lockdep classes for queue locks 2008-03-21 0:03 ` [PATCH take2][NET] " Jarek Poplawski 2008-03-21 0:05 ` David Miller @ 2008-03-21 0:15 ` Jarek Poplawski 1 sibling, 0 replies; 47+ messages in thread From: Jarek Poplawski @ 2008-03-21 0:15 UTC (permalink / raw) To: David Miller; +Cc: jeff, denys, hadi, netdev On Fri, Mar 21, 2008 at 01:03:12AM +0100, Jarek Poplawski wrote: ... > IMHO it wastes a bit of memory when lockdep is off and adds some OOPS! My IMHO is wrong again - no memory is wasted when lockdep is off! Sorry, Jarek P. ^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2008-03-21 0:09 UTC | newest] Thread overview: 47+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-02-24 22:20 circular locking, mirred, 2.6.24.2 Denys Fedoryshchenko 2008-02-25 9:56 ` Jarek Poplawski 2008-02-25 10:48 ` Denys Fedoryshchenko 2008-02-25 11:39 ` Jarek Poplawski 2008-03-05 10:45 ` Denys Fedoryshchenko 2008-03-05 13:54 ` [BUG] Probably lockdep bug " Jarek Poplawski 2008-03-06 9:41 ` Jarek Poplawski 2008-03-06 13:40 ` Jarek Poplawski 2008-03-06 13:57 ` Denys Fedoryshchenko 2008-03-06 14:27 ` jamal 2008-03-06 15:50 ` Denys Fedoryshchenko 2008-03-06 20:25 ` Jarek Poplawski 2008-03-06 20:56 ` jamal 2008-03-06 22:12 ` Jarek Poplawski 2008-03-06 23:43 ` Denys Fedoryshchenko 2008-03-07 0:09 ` jamal 2008-03-07 0:15 ` Denys Fedoryshchenko 2008-03-07 0:25 ` jamal 2008-03-07 9:31 ` Jarek Poplawski 2008-03-07 10:19 ` Denys Fedoryshchenko 2008-03-07 10:48 ` Jarek Poplawski 2008-03-07 14:58 ` jamal 2008-03-06 20:44 ` jamal 2008-03-06 13:59 ` jamal 2008-03-06 17:56 ` Jarek Poplawski 2008-03-06 20:48 ` jamal 2008-03-06 21:40 ` Jarek Poplawski 2008-03-06 23:40 ` jamal 2008-03-07 7:51 ` Jarek Poplawski 2008-03-07 8:32 ` Jarek Poplawski 2008-03-07 13:53 ` jamal 2008-03-08 8:46 ` Jarek Poplawski 2008-03-08 8:58 ` Jarek Poplawski 2008-03-08 9:56 ` Denys Fedoryshchenko 2008-03-08 10:16 ` Denys Fedoryshchenko 2008-03-08 10:43 ` Jarek Poplawski 2008-03-08 10:52 ` Jarek Poplawski 2008-03-08 11:09 ` Denys Fedoryshchenko 2008-03-08 12:02 ` Jarek Poplawski 2008-03-19 0:46 ` Denys Fedoryshchenko 2008-03-19 7:34 ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski 2008-03-19 11:34 ` jamal 2008-03-19 12:20 ` Jarek Poplawski 2008-03-20 22:37 ` David Miller 2008-03-21 0:03 ` [PATCH take2][NET] " Jarek Poplawski 2008-03-21 0:05 ` David Miller 2008-03-21 0:15 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).