circular locking, mirred, 2.6.24.2

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* circular locking, mirred, 2.6.24.2
@ 2008-02-24 22:20 Denys Fedoryshchenko
  2008-02-25  9:56 ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-24 22:20 UTC (permalink / raw)
  To: netdev

2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i 
understand, they are in 2.6.25 git and it is fixes).

I will send also to private mails QoS rules i am using.

[  118.840072] =======================================================
[  118.840158] [ INFO: possible circular locking dependency detected ]
[  118.840203] 2.6.24.2-build-0022 #7
[  118.840243] -------------------------------------------------------
[  118.840288] swapper/0 is trying to acquire lock:
[  118.840329]  (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit
+0x177/0x302
[  118.840490]
[  118.840490] but task is already holding lock:
[  118.840567]  (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred+0x20/0x180 
[act_mirred]
[  118.840727]
[  118.840727] which lock already depends on the new lock.
[  118.840728]
[  118.840842]
[  118.840842] the existing dependency chain (in reverse order) is:
[  118.840921]
[  118.840921] -> #2 (&p->tcfc_lock){-+..}:
[  118.841075]        [<c0143624>] __lock_acquire+0xa30/0xc19
[  118.841324]        [<c0143887>] lock_acquire+0x7a/0x94
[  118.841572]        [<c02d93f5>] _spin_lock+0x2e/0x58
[  118.841820]        [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred]
[  118.842068]        [<c0297ec4>] tcf_action_exec+0x44/0x77
[  118.842344]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[  118.842595]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[  118.842845]        [<c0295ec3>] tc_classify+0x1a/0x80
[  118.843092]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.843343]        [<c0287139>] netif_receive_skb+0x296/0x44c
[  118.843592]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[  118.843843]        [<c02894bc>] net_rx_action+0xbf/0x201
[  118.844091]        [<c012ac15>] __do_softirq+0x6f/0xe9
[  118.844343]        [<c01078af>] do_softirq+0x61/0xc8
[  118.844591]        [<ffffffff>] 0xffffffff
[  118.844840]
[  118.844840] -> #1 (&dev->ingress_lock){-+..}:
[  118.844993]        [<c0143624>] __lock_acquire+0xa30/0xc19
[  118.845242]        [<c0143887>] lock_acquire+0x7a/0x94
[  118.845489]        [<c02d93f5>] _spin_lock+0x2e/0x58
[  118.845737]        [<c0295387>] qdisc_lock_tree+0x1e/0x21
[  118.845984]        [<c02953b6>] dev_init_scheduler+0xb/0x53
[  118.846235]        [<c0288483>] register_netdevice+0x2a3/0x2fd
[  118.846483]        [<c028850f>] register_netdev+0x32/0x3f
[  118.846730]        [<c03ea8ee>] loopback_net_init+0x39/0x6c
[  118.846980]        [<c02858ef>] register_pernet_operations+0x13/0x15
[  118.847230]        [<c0285958>] register_pernet_device+0x1f/0x4c
[  118.847478]        [<c03ea8b3>] loopback_init+0xd/0xf
[  118.847725]        [<c03d64df>] kernel_init+0x155/0x2c6
[  118.847973]        [<c0105bab>] kernel_thread_helper+0x7/0x10
[  118.848225]        [<ffffffff>] 0xffffffff
[  118.848472]
[  118.848472] -> #0 (&dev->queue_lock){-+..}:
[  118.848626]        [<c0143514>] __lock_acquire+0x920/0xc19
[  118.848874]        [<c0143887>] lock_acquire+0x7a/0x94
[  118.849122]        [<c02d93f5>] _spin_lock+0x2e/0x58
[  118.849370]        [<c0289c11>] dev_queue_xmit+0x177/0x302
[  118.849617]        [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.849866]        [<c0297ec4>] tcf_action_exec+0x44/0x77
[  118.850114]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[  118.850366]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[  118.850614]        [<c0295ec3>] tc_classify+0x1a/0x80
[  118.850861]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.851111]        [<c0287139>] netif_receive_skb+0x296/0x44c
[  118.851360]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[  118.851612]        [<c02894bc>] net_rx_action+0xbf/0x201
[  118.851859]        [<c012ac15>] __do_softirq+0x6f/0xe9
[  118.852106]        [<c01078af>] do_softirq+0x61/0xc8
[  118.852355]        [<ffffffff>] 0xffffffff
[  118.852602]
[  118.852602] other info that might help us debug this:
[  118.852603]
[  118.852716] 5 locks held by swapper/0:
[  118.852756]  #0:  (rcu_read_lock){..--}, at: [<c028944d>] net_rx_action
+0x50/0x201
[  118.852940]  #1:  (rcu_read_lock){..--}, at: [<c0286f99>] netif_receive_skb
+0xf6/0x44c
[  118.853123]  #2:  (&dev->ingress_lock){-+..}, at: [<c0287125>] 
netif_receive_skb+0x282/0x44c
[  118.853309]  #3:  (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred
+0x20/0x180 [act_mirred]
[  118.853493]  #4:  (rcu_read_lock){..--}, at: [<c0289bb7>] dev_queue_xmit
+0x11d/0x302
[  118.853677]
[  118.853677] stack backtrace:
[  118.853753] Pid: 0, comm: swapper Not tainted 2.6.24.2-build-0022 #7
[  118.853796]  [<c0105f2a>] show_trace_log_lvl+0x1a/0x2f
[  118.853865]  [<c0106843>] show_trace+0x12/0x14
[  118.853932]  [<c010713a>] dump_stack+0x6c/0x72
[  118.853999]  [<c01416f4>] print_circular_bug_tail+0x5f/0x68
[  118.854068]  [<c0143514>] __lock_acquire+0x920/0xc19
[  118.854135]  [<c0143887>] lock_acquire+0x7a/0x94
[  118.854205]  [<c02d93f5>] _spin_lock+0x2e/0x58
[  118.854272]  [<c0289c11>] dev_queue_xmit+0x177/0x302
[  118.854340]  [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[  118.854409]  [<c0297ec4>] tcf_action_exec+0x44/0x77
[  118.854477]  [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
[  118.854547]  [<c0295ce2>] tc_classify_compat+0x2f/0x5e
[  118.854615]  [<c0295ec3>] tc_classify+0x1a/0x80
[  118.854682]  [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[  118.854752]  [<c0287139>] netif_receive_skb+0x296/0x44c
[  118.854820]  [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
[  118.854890]  [<c02894bc>] net_rx_action+0xbf/0x201
[  118.854958]  [<c012ac15>] __do_softirq+0x6f/0xe9
[  118.855025]  [<c01078af>] do_softirq+0x61/0xc8


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-02-24 22:20 circular locking, mirred, 2.6.24.2 Denys Fedoryshchenko
@ 2008-02-25  9:56 ` Jarek Poplawski
  2008-02-25 10:48   ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-02-25  9:56 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev

On 24-02-2008 23:20, Denys Fedoryshchenko wrote:
> 2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i 
> understand, they are in 2.6.25 git and it is fixes).
> 
> I will send also to private mails QoS rules i am using.
> 
> [  118.840072] =======================================================
> [  118.840158] [ INFO: possible circular locking dependency detected ]
> [  118.840203] 2.6.24.2-build-0022 #7
> [  118.840243] -------------------------------------------------------
> [  118.840288] swapper/0 is trying to acquire lock:
> [  118.840329]  (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit
> +0x177/0x302
> [  118.840490]
> [  118.840490] but task is already holding lock:
> [  118.840567]  (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred+0x20/0x180 
> [act_mirred]
> [  118.840727]
> [  118.840727] which lock already depends on the new lock.
> [  118.840728]
> [  118.840842]
> [  118.840842] the existing dependency chain (in reverse order) is:
> [  118.840921]
> [  118.840921] -> #2 (&p->tcfc_lock){-+..}:
> [  118.841075]        [<c0143624>] __lock_acquire+0xa30/0xc19
> [  118.841324]        [<c0143887>] lock_acquire+0x7a/0x94
> [  118.841572]        [<c02d93f5>] _spin_lock+0x2e/0x58
> [  118.841820]        [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred]
> [  118.842068]        [<c0297ec4>] tcf_action_exec+0x44/0x77
> [  118.842344]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
> [  118.842595]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
> [  118.842845]        [<c0295ec3>] tc_classify+0x1a/0x80
> [  118.843092]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> [  118.843343]        [<c0287139>] netif_receive_skb+0x296/0x44c
> [  118.843592]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
> [  118.843843]        [<c02894bc>] net_rx_action+0xbf/0x201
> [  118.844091]        [<c012ac15>] __do_softirq+0x6f/0xe9
> [  118.844343]        [<c01078af>] do_softirq+0x61/0xc8
> [  118.844591]        [<ffffffff>] 0xffffffff
> [  118.844840]
> [  118.844840] -> #1 (&dev->ingress_lock){-+..}:
> [  118.844993]        [<c0143624>] __lock_acquire+0xa30/0xc19
> [  118.845242]        [<c0143887>] lock_acquire+0x7a/0x94
> [  118.845489]        [<c02d93f5>] _spin_lock+0x2e/0x58
> [  118.845737]        [<c0295387>] qdisc_lock_tree+0x1e/0x21
> [  118.845984]        [<c02953b6>] dev_init_scheduler+0xb/0x53
> [  118.846235]        [<c0288483>] register_netdevice+0x2a3/0x2fd
> [  118.846483]        [<c028850f>] register_netdev+0x32/0x3f
> [  118.846730]        [<c03ea8ee>] loopback_net_init+0x39/0x6c
> [  118.846980]        [<c02858ef>] register_pernet_operations+0x13/0x15
> [  118.847230]        [<c0285958>] register_pernet_device+0x1f/0x4c
> [  118.847478]        [<c03ea8b3>] loopback_init+0xd/0xf
> [  118.847725]        [<c03d64df>] kernel_init+0x155/0x2c6

This looks strange: are you sure your tc scripts aren't started too
early? (Or maybe there are some problems during booting?)

Regards,
Jarek P.

> [  118.847973]        [<c0105bab>] kernel_thread_helper+0x7/0x10
> [  118.848225]        [<ffffffff>] 0xffffffff
> [  118.848472]
> [  118.848472] -> #0 (&dev->queue_lock){-+..}:
> [  118.848626]        [<c0143514>] __lock_acquire+0x920/0xc19
> [  118.848874]        [<c0143887>] lock_acquire+0x7a/0x94
> [  118.849122]        [<c02d93f5>] _spin_lock+0x2e/0x58
> [  118.849370]        [<c0289c11>] dev_queue_xmit+0x177/0x302
> [  118.849617]        [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
> [  118.849866]        [<c0297ec4>] tcf_action_exec+0x44/0x77
> [  118.850114]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
> [  118.850366]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
> [  118.850614]        [<c0295ec3>] tc_classify+0x1a/0x80
> [  118.850861]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> [  118.851111]        [<c0287139>] netif_receive_skb+0x296/0x44c
> [  118.851360]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
> [  118.851612]        [<c02894bc>] net_rx_action+0xbf/0x201
> [  118.851859]        [<c012ac15>] __do_softirq+0x6f/0xe9
> [  118.852106]        [<c01078af>] do_softirq+0x61/0xc8
> [  118.852355]        [<ffffffff>] 0xffffffff
...

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-02-25  9:56 ` Jarek Poplawski
@ 2008-02-25 10:48   ` Denys Fedoryshchenko
  2008-02-25 11:39     ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-02-25 10:48 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

What does it mean early?
I have custom boot scripts, it is also custom system based on busybox. There 
is a chance that i forgot to bring ifb0 up, but thats it.
I think such warning must not appear on any actions in userspace.

On Mon, 25 Feb 2008 09:56:46 +0000, Jarek Poplawski wrote
> On 24-02-2008 23:20, Denys Fedoryshchenko wrote:
> > 2.6.24.2 with applied patches for printk,softlockup, and patch for htb 
(as i 
> > understand, they are in 2.6.25 git and it is fixes).
> > 
> > I will send also to private mails QoS rules i am using.
> > 
> > [  118.840072] =======================================================
> > [  118.840158] [ INFO: possible circular locking dependency detected ]
> > [  118.840203] 2.6.24.2-build-0022 #7
> > [  118.840243] -------------------------------------------------------
> > [  118.840288] swapper/0 is trying to acquire lock:
> > [  118.840329]  (&dev->queue_lock){-+..}, at: [<c0289c11>] dev_queue_xmit
> > +0x177/0x302
> > [  118.840490]
> > [  118.840490] but task is already holding lock:
> > [  118.840567]  (&p->tcfc_lock){-+..}, at: [<f89f0066>] tcf_mirred
+0x20/0x180 
> > [act_mirred]
> > [  118.840727]
> > [  118.840727] which lock already depends on the new lock.
> > [  118.840728]
> > [  118.840842]
> > [  118.840842] the existing dependency chain (in reverse order) is:
> > [  118.840921]
> > [  118.840921] -> #2 (&p->tcfc_lock){-+..}:
> > [  118.841075]        [<c0143624>] __lock_acquire+0xa30/0xc19
> > [  118.841324]        [<c0143887>] lock_acquire+0x7a/0x94
> > [  118.841572]        [<c02d93f5>] _spin_lock+0x2e/0x58
> > [  118.841820]        [<f89f0066>] tcf_mirred+0x20/0x180 [act_mirred]
> > [  118.842068]        [<c0297ec4>] tcf_action_exec+0x44/0x77
> > [  118.842344]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
> > [  118.842595]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
> > [  118.842845]        [<c0295ec3>] tc_classify+0x1a/0x80
> > [  118.843092]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> > [  118.843343]        [<c0287139>] netif_receive_skb+0x296/0x44c
> > [  118.843592]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
> > [  118.843843]        [<c02894bc>] net_rx_action+0xbf/0x201
> > [  118.844091]        [<c012ac15>] __do_softirq+0x6f/0xe9
> > [  118.844343]        [<c01078af>] do_softirq+0x61/0xc8
> > [  118.844591]        [<ffffffff>] 0xffffffff
> > [  118.844840]
> > [  118.844840] -> #1 (&dev->ingress_lock){-+..}:
> > [  118.844993]        [<c0143624>] __lock_acquire+0xa30/0xc19
> > [  118.845242]        [<c0143887>] lock_acquire+0x7a/0x94
> > [  118.845489]        [<c02d93f5>] _spin_lock+0x2e/0x58
> > [  118.845737]        [<c0295387>] qdisc_lock_tree+0x1e/0x21
> > [  118.845984]        [<c02953b6>] dev_init_scheduler+0xb/0x53
> > [  118.846235]        [<c0288483>] register_netdevice+0x2a3/0x2fd
> > [  118.846483]        [<c028850f>] register_netdev+0x32/0x3f
> > [  118.846730]        [<c03ea8ee>] loopback_net_init+0x39/0x6c
> > [  118.846980]        [<c02858ef>] register_pernet_operations+0x13/0x15
> > [  118.847230]        [<c0285958>] register_pernet_device+0x1f/0x4c
> > [  118.847478]        [<c03ea8b3>] loopback_init+0xd/0xf
> > [  118.847725]        [<c03d64df>] kernel_init+0x155/0x2c6
> 
> This looks strange: are you sure your tc scripts aren't started too
> early? (Or maybe there are some problems during booting?)
> 
> Regards,
> Jarek P.
> 
> > [  118.847973]        [<c0105bab>] kernel_thread_helper+0x7/0x10
> > [  118.848225]        [<ffffffff>] 0xffffffff
> > [  118.848472]
> > [  118.848472] -> #0 (&dev->queue_lock){-+..}:
> > [  118.848626]        [<c0143514>] __lock_acquire+0x920/0xc19
> > [  118.848874]        [<c0143887>] lock_acquire+0x7a/0x94
> > [  118.849122]        [<c02d93f5>] _spin_lock+0x2e/0x58
> > [  118.849370]        [<c0289c11>] dev_queue_xmit+0x177/0x302
> > [  118.849617]        [<f89f01a5>] tcf_mirred+0x15f/0x180 [act_mirred]
> > [  118.849866]        [<c0297ec4>] tcf_action_exec+0x44/0x77
> > [  118.850114]        [<f89ba5d2>] u32_classify+0x119/0x24a [cls_u32]
> > [  118.850366]        [<c0295ce2>] tc_classify_compat+0x2f/0x5e
> > [  118.850614]        [<c0295ec3>] tc_classify+0x1a/0x80
> > [  118.850861]        [<f899c118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> > [  118.851111]        [<c0287139>] netif_receive_skb+0x296/0x44c
> > [  118.851360]        [<f88d1a4e>] e100_poll+0x14b/0x26a [e100]
> > [  118.851612]        [<c02894bc>] net_rx_action+0xbf/0x201
> > [  118.851859]        [<c012ac15>] __do_softirq+0x6f/0xe9
> > [  118.852106]        [<c01078af>] do_softirq+0x61/0xc8
> > [  118.852355]        [<ffffffff>] 0xffffffff
> ....
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-02-25 10:48   ` Denys Fedoryshchenko
@ 2008-02-25 11:39     ` Jarek Poplawski
  2008-03-05 10:45       ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-02-25 11:39 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev

On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote:
> What does it mean early?
> I have custom boot scripts, it is also custom system based on busybox. There 
> is a chance that i forgot to bring ifb0 up, but thats it.
> I think such warning must not appear on any actions in userspace.

It's not about ifb0: this report shows loopback_init after some action
on eth, so eth was probably up before lo. And of course you are right:
this warning shouldn't be there. But, since this report looks very
strange, I wonder if there could be something else that mislead
lockdep. Could you try to reproduce this with 2.6.24.2 without these
additional patches?

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-02-25 11:39     ` Jarek Poplawski
@ 2008-03-05 10:45       ` Denys Fedoryshchenko
  2008-03-05 13:54         ` [BUG] Probably lockdep bug " Jarek Poplawski
  2008-03-06 13:40         ` Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-05 10:45 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got 
similar message. The strange thing, message appeared not immediately after 
launching script, but after few seconds.

Scripts is the same. I have same message on another script, used for ppp 
shaper.

[   10.536424] =======================================================
[   10.536424] [ INFO: possible circular locking dependency detected ]
[   10.536424] 2.6.25-rc3-devel #3
[   10.536424] -------------------------------------------------------
[   10.536424] swapper/0 is trying to acquire lock:
[   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>] 
dev_queue_xmit+0x175/0x2f3
[   10.536424]
[   10.536424] but task is already holding lock:
[   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 
[act_mirred]
[   10.536424]
[   10.536424] which lock already depends on the new lock.
[   10.536424]
[   10.536424]
[   10.536424] the existing dependency chain (in reverse order) is:
[   10.536424]
[   10.536424] -> #2 (&p->tcfc_lock){-+..}:
[   10.536424]        [<c013efb6>] __lock_acquire+0x963/0xb18
[   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
[   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
[   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
[   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
[   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
[   10.536424]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
[   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
[   10.536424]        [<c010b2ce>] save_stack_address+0x0/0x28
[   10.536424]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
[   10.536424]        [<c02a7018>] tcf_action_exec+0x44/0x77
[   10.536424]        [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
[   10.536424]        [<c013f123>] __lock_acquire+0xad0/0xb18
[   10.536424]        [<c02a5006>] tc_classify_compat+0x2f/0x5e
[   10.536424]        [<c02a5d1c>] tc_classify+0x17/0x78
[   10.536424]        [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[   10.536424]        [<c0297009>] netif_receive_skb+0x263/0x3e6
[   10.536424]        [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
[   10.536424]        [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
[   10.536424]        [<c029926f>] net_rx_action+0xb3/0x19e
[   10.536424]        [<c01276f7>] __do_softirq+0x6f/0xe9
[   10.536424]        [<c0106aa7>] do_softirq+0x5e/0xa8
[   10.536424]        [<ffffffff>] 0xffffffff
[   10.536424]
[   10.536424] -> #1 (&dev->ingress_lock){-+..}:
[   10.536424]        [<c013efb6>] __lock_acquire+0x963/0xb18
[   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
[   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
[   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
[   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
[   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
[   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
[   10.536424]        [<c02a4747>] dev_init_scheduler+0xb/0x4d
[   10.536424]        [<c0298299>] register_netdevice+0x288/0x2e2
[   10.536424]        [<c0298325>] register_netdev+0x32/0x3f
[   10.536424]        [<c042a867>] loopback_net_init+0x34/0x63
[   10.536424]        [<c0295b3f>] register_pernet_operations+0x13/0x15
[   10.536424]        [<c0295ba8>] register_pernet_device+0x1f/0x4c
[   10.536424]        [<c042a831>] loopback_init+0xd/0xf
[   10.536424]        [<c0415478>] kernel_init+0x132/0x27c
[   10.536424]        [<c0415346>] kernel_init+0x0/0x27c
[   10.536424]        [<c0415346>] kernel_init+0x0/0x27c
[   10.536424]        [<c01056bf>] kernel_thread_helper+0x7/0x10
[   10.536424]        [<ffffffff>] 0xffffffff
[   10.536424]
[   10.536424] -> #0 (&dev->queue_lock){-+..}:
[   10.536424]        [<c013d169>] print_circular_bug_tail+0x2e/0x66
[   10.536424]        [<c013eedd>] __lock_acquire+0x88a/0xb18
[   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
[   10.536424]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
[   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
[   10.536424]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
[   10.536425]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
[   10.536425]        [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred]
[   10.536425]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
[   10.536425]        [<c02a7018>] tcf_action_exec+0x44/0x77
[   10.536425]        [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
[   10.536425]        [<c013f123>] __lock_acquire+0xad0/0xb18
[   10.536425]        [<c02a5006>] tc_classify_compat+0x2f/0x5e
[   10.536425]        [<c02a5d1c>] tc_classify+0x17/0x78
[   10.536425]        [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[   10.536425]        [<c0297009>] netif_receive_skb+0x263/0x3e6
[   10.536425]        [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
[   10.536425]        [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
[   10.536425]        [<c029926f>] net_rx_action+0xb3/0x19e
[   10.536425]        [<c01276f7>] __do_softirq+0x6f/0xe9
[   10.536425]        [<c0106aa7>] do_softirq+0x5e/0xa8
[   10.536425]        [<ffffffff>] 0xffffffff
[   10.536425]
[   10.536425] other info that might help us debug this:
[   10.536425]
[   10.536425] 5 locks held by swapper/0:
[   10.536425]  #0:  (rcu_read_lock){..--}, at: [<c029920c>] 
net_rx_action+0x50/0x19e
[   10.536425]  #1:  (rcu_read_lock){..--}, at: [<c0296e96>] 
netif_receive_skb+0xf0/0x3e6
[   10.536425]  #2:  (&dev->ingress_lock){-+..}, at: [<c0296ff5>] 
netif_receive_skb+0x24f/0x3e6
[   10.536425]  #3:  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/
0x178 [act_mirred]
[   10.536425]  #4:  (rcu_read_lock){..--}, at: [<c0299afb>] 
dev_queue_xmit+0x126/0x2f3
[   10.536425]
[   10.536425] stack backtrace:
[   10.536425] Pid: 0, comm: swapper Not tainted 2.6.25-rc3-devel #3
[   10.536425]  [<c013d196>] print_circular_bug_tail+0x5b/0x66
[   10.536425]  [<c013eedd>] __lock_acquire+0x88a/0xb18
[   10.536425]  [<c013f1d7>] lock_acquire+0x6c/0x89
[   10.536425]  [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3
[   10.536425]  [<c02f6927>] _spin_lock+0x1c/0x49
[   10.536425]  [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3
[   10.536425]  [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
[   10.536425]  [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred]
[   10.536425]  [<f8a67134>] ? tcf_mirred+0x0/0x178 [act_mirred]
[   10.536425]  [<c02a7018>] tcf_action_exec+0x44/0x77
[   10.536425]  [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
[   10.536425]  [<c013f123>] ? __lock_acquire+0xad0/0xb18
[   10.536425]  [<c02a5006>] tc_classify_compat+0x2f/0x5e
[   10.536425]  [<c02a5d1c>] tc_classify+0x17/0x78
[   10.536425]  [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[   10.536425]  [<c0297009>] netif_receive_skb+0x263/0x3e6
[   10.536425]  [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
[   10.536425]  [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
[   10.536425]  [<c029926f>] net_rx_action+0xb3/0x19e
[   10.536425]  [<c01276f7>] __do_softirq+0x6f/0xe9
[   10.536425]  [<c0106aa7>] do_softirq+0x5e/0xa8
[   10.536425]  [<c014d8c9>] ? handle_edge_irq+0x0/0x10a
[   10.536425]  [<c0127655>] irq_exit+0x44/0x77
[   10.536425]  [<c0106b91>] do_IRQ+0xa0/0xb7
[   10.536425]  [<c0105446>] common_interrupt+0x2e/0x34
[   10.536425]  [<c01031b9>] ? mwait_idle_with_hints+0x39/0x3d
[   10.536425]  [<c0103364>] ? mwait_idle+0x0/0x14
[   10.536425]  [<c0103376>] mwait_idle+0x12/0x14
[   10.536425]  [<c0103696>] cpu_idle+0xb5/0xd5
[   10.536425]  [<c02e5b79>] rest_init+0x49/0x4b




On Mon, 25 Feb 2008 11:39:30 +0000, Jarek Poplawski wrote
> On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote:
> > What does it mean early?
> > I have custom boot scripts, it is also custom system based on busybox. 
There 
> > is a chance that i forgot to bring ifb0 up, but thats it.
> > I think such warning must not appear on any actions in userspace.
> 
> It's not about ifb0: this report shows loopback_init after some 
> action on eth, so eth was probably up before lo. And of course you 
> are right: this warning shouldn't be there. But, since this report 
> looks very strange, I wonder if there could be something else that mislead
> lockdep. Could you try to reproduce this with 2.6.24.2 without these
> additional patches?
> 
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [BUG] Probably lockdep bug Re: circular locking, mirred, 2.6.24.2
  2008-03-05 10:45       ` Denys Fedoryshchenko
@ 2008-03-05 13:54         ` Jarek Poplawski
  2008-03-06  9:41           ` Jarek Poplawski
  2008-03-06 13:40         ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-05 13:54 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, Peter Zijlstra, Ingo Molnar, linux-kernel

Hi,

dev->queue_lock is taken in a scenario like below: always after
dev->ingress_lock and p->tcfc_lock, so just like on this last
backtrace with info about held locks. But this report shows that
lockdep for some reason forgot the history before dev->queue_lock,
and recorded it again. It seems, even if there is something wrong
with init lockdep shouldn't report it like this.

I CC this report to lockdep maintainers and linux-kernel.

Regards,
Jarek P.

Original lockdep report for 2.6.24.2:
http://permalink.gmane.org/gmane.linux.network/86896 


On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote:
> I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got 
> similar message. The strange thing, message appeared not immediately after 
> launching script, but after few seconds.
> 
> Scripts is the same. I have same message on another script, used for ppp 
> shaper.
> 
> [   10.536424] =======================================================
> [   10.536424] [ INFO: possible circular locking dependency detected ]
> [   10.536424] 2.6.25-rc3-devel #3
> [   10.536424] -------------------------------------------------------
> [   10.536424] swapper/0 is trying to acquire lock:
> [   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>] 
> dev_queue_xmit+0x175/0x2f3
> [   10.536424]
> [   10.536424] but task is already holding lock:
> [   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 
> [act_mirred]
> [   10.536424]
> [   10.536424] which lock already depends on the new lock.
> [   10.536424]
> [   10.536424]
> [   10.536424] the existing dependency chain (in reverse order) is:
> [   10.536424]
> [   10.536424] -> #2 (&p->tcfc_lock){-+..}:
> [   10.536424]        [<c013efb6>] __lock_acquire+0x963/0xb18
> [   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
> [   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
> [   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
> [   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
> [   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
> [   10.536424]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
> [   10.536424]        [<f8a67154>] tcf_mirred+0x20/0x178 [act_mirred]
> [   10.536424]        [<c010b2ce>] save_stack_address+0x0/0x28
> [   10.536424]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
> [   10.536424]        [<c02a7018>] tcf_action_exec+0x44/0x77
> [   10.536424]        [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
> [   10.536424]        [<c013f123>] __lock_acquire+0xad0/0xb18
> [   10.536424]        [<c02a5006>] tc_classify_compat+0x2f/0x5e
> [   10.536424]        [<c02a5d1c>] tc_classify+0x17/0x78
> [   10.536424]        [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> [   10.536424]        [<c0297009>] netif_receive_skb+0x263/0x3e6
> [   10.536424]        [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
> [   10.536424]        [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
> [   10.536424]        [<c029926f>] net_rx_action+0xb3/0x19e
> [   10.536424]        [<c01276f7>] __do_softirq+0x6f/0xe9
> [   10.536424]        [<c0106aa7>] do_softirq+0x5e/0xa8
> [   10.536424]        [<ffffffff>] 0xffffffff
> [   10.536424]
> [   10.536424] -> #1 (&dev->ingress_lock){-+..}:
> [   10.536424]        [<c013efb6>] __lock_acquire+0x963/0xb18
> [   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
> [   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
> [   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
> [   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
> [   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
> [   10.536424]        [<c02a4718>] qdisc_lock_tree+0x1e/0x21
> [   10.536424]        [<c02a4747>] dev_init_scheduler+0xb/0x4d
> [   10.536424]        [<c0298299>] register_netdevice+0x288/0x2e2
> [   10.536424]        [<c0298325>] register_netdev+0x32/0x3f
> [   10.536424]        [<c042a867>] loopback_net_init+0x34/0x63
> [   10.536424]        [<c0295b3f>] register_pernet_operations+0x13/0x15
> [   10.536424]        [<c0295ba8>] register_pernet_device+0x1f/0x4c
> [   10.536424]        [<c042a831>] loopback_init+0xd/0xf
> [   10.536424]        [<c0415478>] kernel_init+0x132/0x27c
> [   10.536424]        [<c0415346>] kernel_init+0x0/0x27c
> [   10.536424]        [<c0415346>] kernel_init+0x0/0x27c
> [   10.536424]        [<c01056bf>] kernel_thread_helper+0x7/0x10
> [   10.536424]        [<ffffffff>] 0xffffffff
> [   10.536424]
> [   10.536424] -> #0 (&dev->queue_lock){-+..}:
> [   10.536424]        [<c013d169>] print_circular_bug_tail+0x2e/0x66
> [   10.536424]        [<c013eedd>] __lock_acquire+0x88a/0xb18
> [   10.536424]        [<c013f1d7>] lock_acquire+0x6c/0x89
> [   10.536424]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
> [   10.536424]        [<c02f6927>] _spin_lock+0x1c/0x49
> [   10.536424]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
> [   10.536425]        [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
> [   10.536425]        [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred]
> [   10.536425]        [<f8a67134>] tcf_mirred+0x0/0x178 [act_mirred]
> [   10.536425]        [<c02a7018>] tcf_action_exec+0x44/0x77
> [   10.536425]        [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
> [   10.536425]        [<c013f123>] __lock_acquire+0xad0/0xb18
> [   10.536425]        [<c02a5006>] tc_classify_compat+0x2f/0x5e
> [   10.536425]        [<c02a5d1c>] tc_classify+0x17/0x78
> [   10.536425]        [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> [   10.536425]        [<c0297009>] netif_receive_skb+0x263/0x3e6
> [   10.536425]        [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
> [   10.536425]        [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
> [   10.536425]        [<c029926f>] net_rx_action+0xb3/0x19e
> [   10.536425]        [<c01276f7>] __do_softirq+0x6f/0xe9
> [   10.536425]        [<c0106aa7>] do_softirq+0x5e/0xa8
> [   10.536425]        [<ffffffff>] 0xffffffff
> [   10.536425]
> [   10.536425] other info that might help us debug this:
> [   10.536425]
> [   10.536425] 5 locks held by swapper/0:
> [   10.536425]  #0:  (rcu_read_lock){..--}, at: [<c029920c>] 
> net_rx_action+0x50/0x19e
> [   10.536425]  #1:  (rcu_read_lock){..--}, at: [<c0296e96>] 
> netif_receive_skb+0xf0/0x3e6
> [   10.536425]  #2:  (&dev->ingress_lock){-+..}, at: [<c0296ff5>] 
> netif_receive_skb+0x24f/0x3e6
> [   10.536425]  #3:  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/
> 0x178 [act_mirred]
> [   10.536425]  #4:  (rcu_read_lock){..--}, at: [<c0299afb>] 
> dev_queue_xmit+0x126/0x2f3
> [   10.536425]
> [   10.536425] stack backtrace:
> [   10.536425] Pid: 0, comm: swapper Not tainted 2.6.25-rc3-devel #3
> [   10.536425]  [<c013d196>] print_circular_bug_tail+0x5b/0x66
> [   10.536425]  [<c013eedd>] __lock_acquire+0x88a/0xb18
> [   10.536425]  [<c013f1d7>] lock_acquire+0x6c/0x89
> [   10.536425]  [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3
> [   10.536425]  [<c02f6927>] _spin_lock+0x1c/0x49
> [   10.536425]  [<c0299b4a>] ? dev_queue_xmit+0x175/0x2f3
> [   10.536425]  [<c0299b4a>] dev_queue_xmit+0x175/0x2f3
> [   10.536425]  [<f8a6728b>] tcf_mirred+0x157/0x178 [act_mirred]
> [   10.536425]  [<f8a67134>] ? tcf_mirred+0x0/0x178 [act_mirred]
> [   10.536425]  [<c02a7018>] tcf_action_exec+0x44/0x77
> [   10.536425]  [<f89927b0>] u32_classify+0x119/0x24e [cls_u32]
> [   10.536425]  [<c013f123>] ? __lock_acquire+0xad0/0xb18
> [   10.536425]  [<c02a5006>] tc_classify_compat+0x2f/0x5e
> [   10.536425]  [<c02a5d1c>] tc_classify+0x17/0x78
> [   10.536425]  [<f89670a4>] ingress_enqueue+0x1a/0x53 [sch_ingress]
> [   10.536425]  [<c0297009>] netif_receive_skb+0x263/0x3e6
> [   10.536425]  [<f882b4d9>] e1000_clean_rx_irq+0x380/0x448 [e1000]
> [   10.536425]  [<f8828e1e>] e1000_clean+0x62/0x1fd [e1000]
> [   10.536425]  [<c029926f>] net_rx_action+0xb3/0x19e
> [   10.536425]  [<c01276f7>] __do_softirq+0x6f/0xe9
> [   10.536425]  [<c0106aa7>] do_softirq+0x5e/0xa8
> [   10.536425]  [<c014d8c9>] ? handle_edge_irq+0x0/0x10a
> [   10.536425]  [<c0127655>] irq_exit+0x44/0x77
> [   10.536425]  [<c0106b91>] do_IRQ+0xa0/0xb7
> [   10.536425]  [<c0105446>] common_interrupt+0x2e/0x34
> [   10.536425]  [<c01031b9>] ? mwait_idle_with_hints+0x39/0x3d
> [   10.536425]  [<c0103364>] ? mwait_idle+0x0/0x14
> [   10.536425]  [<c0103376>] mwait_idle+0x12/0x14
> [   10.536425]  [<c0103696>] cpu_idle+0xb5/0xd5
> [   10.536425]  [<c02e5b79>] rest_init+0x49/0x4b
> 
> 
> 
> 
> On Mon, 25 Feb 2008 11:39:30 +0000, Jarek Poplawski wrote
> > On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote:
> > > What does it mean early?
> > > I have custom boot scripts, it is also custom system based on busybox. 
> There 
> > > is a chance that i forgot to bring ifb0 up, but thats it.
> > > I think such warning must not appear on any actions in userspace.
> > 
> > It's not about ifb0: this report shows loopback_init after some 
> > action on eth, so eth was probably up before lo. And of course you 
> > are right: this warning shouldn't be there. But, since this report 
> > looks very strange, I wonder if there could be something else that mislead
> > lockdep. Could you try to reproduce this with 2.6.24.2 without these
> > additional patches?
> > 
> > Jarek P.
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> Denys Fedoryshchenko
> Technical Manager
> Virtual ISP S.A.L.
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [BUG] Probably lockdep bug Re: circular locking, mirred, 2.6.24.2
  2008-03-05 13:54         ` [BUG] Probably lockdep bug " Jarek Poplawski
@ 2008-03-06  9:41           ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06  9:41 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, Peter Zijlstra, Ingo Molnar, linux-kernel

On Wed, Mar 05, 2008 at 01:54:48PM +0000, Jarek Poplawski wrote:
> Hi,
> 
> dev->queue_lock is taken in a scenario like below: always after
> dev->ingress_lock and p->tcfc_lock, so just like on this last
> backtrace with info about held locks. But this report shows that
> lockdep for some reason forgot the history before dev->queue_lock,
> and recorded it again. It seems, even if there is something wrong
> with init lockdep shouldn't report it like this.

...Hmmm... On the other hand, despite misleading dependency chain on
this report, lockdep seems to be right: dev->queue_lock and
dev->ingress_lock are really taken in a different order from
qdisc_lock_tree() and while using act_mirred! Now I wonder why this
warning is so rare?

So, let's give a break to lockdep maintainers and linux-kernel, and
try to figure it out more in netdev...

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-05 10:45       ` Denys Fedoryshchenko
  2008-03-05 13:54         ` [BUG] Probably lockdep bug " Jarek Poplawski
@ 2008-03-06 13:40         ` Jarek Poplawski
  2008-03-06 13:57           ` Denys Fedoryshchenko
  2008-03-06 13:59           ` jamal
  1 sibling, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06 13:40 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, jamal

On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote:
> I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got 
> similar message. The strange thing, message appeared not immediately after 
> launching script, but after few seconds.
> 
> Scripts is the same. I have same message on another script, used for ppp 
> shaper.
> 
> [   10.536424] =======================================================
> [   10.536424] [ INFO: possible circular locking dependency detected ]
> [   10.536424] 2.6.25-rc3-devel #3
> [   10.536424] -------------------------------------------------------
> [   10.536424] swapper/0 is trying to acquire lock:
> [   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>] 
> dev_queue_xmit+0x175/0x2f3
> [   10.536424]
> [   10.536424] but task is already holding lock:
> [   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 
> [act_mirred]
> [   10.536424]
> [   10.536424] which lock already depends on the new lock.
...

Hi,

I'm not sure this lockdep report is because of this, but there is
really a problem with lock order while using sch_ingress with
act_mirred: dev->queue_lock is taken after dev->ingress_lock, so
reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem
when one of the devices is ifb yet.

Regards,
Jarek P.

Here is a patch for testing:

---

 drivers/net/ifb.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 15949d3..2bc71df 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -227,6 +227,22 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 module_param(numifbs, int, 0);
 MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
+ * so let's tell lockdep it's different from dev->queue_lock
+ */
+static struct lock_class_key ifb_queue_lock_key;
+static inline void ifb_set_lock_class(spinlock_t *lock)
+{
+	lockdep_set_class(lock, &ifb_queue_lock_key);
+}
+#else
+static inline void ifb_set_lock_class(spinlock_t *lock)
+{
+}
+#endif	/* CONFIG_DEBUG_LOCK_ALLOC */
+
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
@@ -246,6 +262,9 @@ static int __init ifb_init_one(int index)
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
 		goto err;
+
+	ifb_set_lock_class(&dev_ifb->queue_lock);
+
 	return 0;
 
 err:

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 13:40         ` Jarek Poplawski
@ 2008-03-06 13:57           ` Denys Fedoryshchenko
  2008-03-06 14:27             ` jamal
  2008-03-06 13:59           ` jamal
  1 sibling, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-06 13:57 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, jamal

I am able to reproduce this warning over this relatively simple shell script
on my Gentoo PC (2.6.25-rc3).
http://www.nuclearcat.com/files/bug_feb.txt

Probably it will help to debug issue for more experienced developers. Note:
it appears not immediately, second time i tested, it's appeared after while,
but in matter of seconds.

Note - it can stop traffic on PC completely. It is also seems crashed my 
desktop PC, i am not able to execute "tc qdisc del dev eth0 root".
The system hang completely. I had few similar issues on my PPPoE servers 
(with different scripts for shapers), that system hang, and even "reboot -f" 
doesn't work sometimes.


On Thu, 6 Mar 2008 13:40:15 +0000, Jarek Poplawski wrote
> On Wed, Mar 05, 2008 at 12:45:51PM +0200, Denys Fedoryshchenko wrote:
> > I did test on vanilla 2.6.25-rc3, on clean Gentoo distro and got 
> > similar message. The strange thing, message appeared not immediately 
after 
> > launching script, but after few seconds.
> > 
> > Scripts is the same. I have same message on another script, used for ppp 
> > shaper.
> > 
> > [   10.536424] =======================================================
> > [   10.536424] [ INFO: possible circular locking dependency detected ]
> > [   10.536424] 2.6.25-rc3-devel #3
> > [   10.536424] -------------------------------------------------------
> > [   10.536424] swapper/0 is trying to acquire lock:
> > [   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>] 
> > dev_queue_xmit+0x175/0x2f3
> > [   10.536424]
> > [   10.536424] but task is already holding lock:
> > [   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/
0x178 
> > [act_mirred]
> > [   10.536424]
> > [   10.536424] which lock already depends on the new lock.
> ....
> 
> Hi,
> 
> I'm not sure this lockdep report is because of this, but there is
> really a problem with lock order while using sch_ingress with
> act_mirred: dev->queue_lock is taken after dev->ingress_lock, so
> reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem
> when one of the devices is ifb yet.
> 
> Regards,
> Jarek P.
> 
> Here is a patch for testing:
> 
> ---
> 
>  drivers/net/ifb.c |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
> index 15949d3..2bc71df 100644
> --- a/drivers/net/ifb.c
> +++ b/drivers/net/ifb.c
> @@ -227,6 +227,22 @@ static struct rtnl_link_ops ifb_link_ops 
> __read_mostly = { module_param(numifbs, int, 0); 
> MODULE_PARM_DESC(numifbs, "Number of ifb devices");
> 
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +/*
> + * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
> + * so let's tell lockdep it's different from dev->queue_lock
> + */
> +static struct lock_class_key ifb_queue_lock_key;
> +static inline void ifb_set_lock_class(spinlock_t *lock)
> +{
> +	lockdep_set_class(lock, &ifb_queue_lock_key);
> +}
> +#else
> +static inline void ifb_set_lock_class(spinlock_t *lock)
> +{
> +}
> +#endif	/* CONFIG_DEBUG_LOCK_ALLOC */
> +
>  static int __init ifb_init_one(int index)
>  {
>  	struct net_device *dev_ifb;
> @@ -246,6 +262,9 @@ static int __init ifb_init_one(int index)
>  	err = register_netdevice(dev_ifb);
>  	if (err < 0)
>  		goto err;
> +
> +	ifb_set_lock_class(&dev_ifb->queue_lock);
> +
>  	return 0;
> 
>  err:


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 13:57           ` Denys Fedoryshchenko
@ 2008-03-06 14:27             ` jamal
  2008-03-06 15:50               ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-06 14:27 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev

On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote:
> I am able to reproduce this warning over this relatively simple shell script
> on my Gentoo PC (2.6.25-rc3).
> http://www.nuclearcat.com/files/bug_feb.txt
> 

That script looks pretty sane to me - nothing super-exciting. I suspect
you eventually want them all to look like ifb1 on the egress.
Do you see the same issue without the ifb1 speacial case?

> Probably it will help to debug issue for more experienced developers. Note:
> it appears not immediately, second time i tested, it's appeared after while,
> but in matter of seconds.

I wonder is there some latency from the moment you insmod ifb to the
moment the tc rules take effect? Will it still happen if you dont have
modules?
Also note, that lock dependency is a bit strange, Jarek correct me if i
am wrong; it seems to say:
a packet received on ingress of some e1000 (ethx) gets acted on by
mirred which ends grabbing lock of an ifb device - this part should be
fine and no need for the alarm. The alarm seems to be a result of a
loopback device that is being registered in between the two activities.
i.e there are three devices affected with entirely different locks(ethx,
ifbx, and loopback). Smells like lockdep is getting it wrong?

> Note - it can stop traffic on PC completely. It is also seems crashed my 
> desktop PC, i am not able to execute "tc qdisc del dev eth0 root".
> The system hang completely. I had few similar issues on my PPPoE servers 
> (with different scripts for shapers), that system hang, and even "reboot -f" 
> doesn't work sometimes.

This sounds like a different issue from above - when did this start to
happen? Is it at the same time as above warnings showing up?

cheers,
jamal

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 14:27             ` jamal
@ 2008-03-06 15:50               ` Denys Fedoryshchenko
  2008-03-06 20:25                 ` Jarek Poplawski
  2008-03-06 20:44                 ` jamal
  0 siblings, 2 replies; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-06 15:50 UTC (permalink / raw)
  To: hadi; +Cc: Jarek Poplawski, netdev

On Thu, 06 Mar 2008 09:27:14 -0500, jamal wrote
> On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote:
> > I am able to reproduce this warning over this relatively simple shell 
script
> > on my Gentoo PC (2.6.25-rc3).
> > http://www.nuclearcat.com/files/bug_feb.txt
> >
> 
> That script looks pretty sane to me - nothing super-exciting. I suspect
> you eventually want them all to look like ifb1 on the egress.
> Do you see the same issue without the ifb1 speacial case?
Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 
also.

modprobe ifb
ifconfig ifb0 up
TC=/sbin/tc
$TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null
$TC qdisc add dev eth0 ingress
${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
      match u32 0 0 flowid 1:1 \
      action mirred egress redirect dev ifb0

> 
> > Probably it will help to debug issue for more experienced developers. 
Note:
> > it appears not immediately, second time i tested, it's appeared after 
while,
> > but in matter of seconds.
> 
> I wonder is there some latency from the moment you insmod ifb to the
> moment the tc rules take effect? Will it still happen if you dont 
> have modules? Also note, that lock dependency is a bit strange,
>  Jarek correct me if i am wrong; it seems to say: a packet received 
> on ingress of some e1000 (ethx) gets acted on by mirred which ends 
> grabbing lock of an ifb device - this part should be fine and no 
> need for the alarm. The alarm seems to be a result of a loopback 
> device that is being registered in between the two activities.
> i.e there are three devices affected with entirely different 
> locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong?
No idea, i have strange lockup's on my systems where i have ifb, and that 
make me worry. And i feel it is directly related with my love to use ifb 
devices and way how i am using them.

> 
> > Note - it can stop traffic on PC completely. It is also seems crashed my 
> > desktop PC, i am not able to execute "tc qdisc del dev eth0 root".
> > The system hang completely. I had few similar issues on my PPPoE servers 
> > (with different scripts for shapers), that system hang, and even "reboot -
f" 
> > doesn't work sometimes.
> 
> This sounds like a different issue from above - when did this start 
> to happen? Is it at the same time as above warnings showing up?

Yes, it is different issue seems, it is rare to lockup system , and i will 
dig more, to understand how it is happening.

> 
> cheers,
> jamal
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 15:50               ` Denys Fedoryshchenko
@ 2008-03-06 20:25                 ` Jarek Poplawski
  2008-03-06 20:56                   ` jamal
  2008-03-06 20:44                 ` jamal
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06 20:25 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: hadi, netdev

On Thu, Mar 06, 2008 at 05:50:59PM +0200, Denys Fedoryshchenko wrote:
> On Thu, 06 Mar 2008 09:27:14 -0500, jamal wrote
> > On Thu, 2008-06-03 at 15:57 +0200, Denys Fedoryshchenko wrote:
> > > I am able to reproduce this warning over this relatively simple shell 
> script
> > > on my Gentoo PC (2.6.25-rc3).
> > > http://www.nuclearcat.com/files/bug_feb.txt
> > >
> > 
> > That script looks pretty sane to me - nothing super-exciting. I suspect
> > you eventually want them all to look like ifb1 on the egress.
> > Do you see the same issue without the ifb1 speacial case?
> Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 
> also.
> 
> modprobe ifb
> ifconfig ifb0 up
> TC=/sbin/tc
> $TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null
> $TC qdisc add dev eth0 ingress
> ${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
>       match u32 0 0 flowid 1:1 \
>       action mirred egress redirect dev ifb0
> 

It's really strange: I can't reproduce this, and if it were so easy we
would get really a lot of similar reports. It looks like you have
something special. This lockdep report with this kind of problem
usually looks different too. The good side is it's easy to reproduce.
So, could you try the patch below? (It's only supposed to fix the lockdep
warning, not lockups).

> > 
> > > Probably it will help to debug issue for more experienced developers. 
> Note:
> > > it appears not immediately, second time i tested, it's appeared after 
> while,
> > > but in matter of seconds.
> > 
> > I wonder is there some latency from the moment you insmod ifb to the
> > moment the tc rules take effect? Will it still happen if you dont 
> > have modules? Also note, that lock dependency is a bit strange,
> >  Jarek correct me if i am wrong; it seems to say: a packet received 
> > on ingress of some e1000 (ethx) gets acted on by mirred which ends 
> > grabbing lock of an ifb device - this part should be fine and no 
> > need for the alarm. The alarm seems to be a result of a loopback 
> > device that is being registered in between the two activities.
> > i.e there are three devices affected with entirely different 
> > locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong?

Yes, the details look wrong, but IMHO there are reasons to report
possible problems yet.

> No idea, i have strange lockup's on my systems where i have ifb, and that 
> make me worry. And i feel it is directly related with my love to use ifb 
> devices and way how i am using them.

Did you try to change the order of rules in this first script: ifb rules
before eth?

> > > Note - it can stop traffic on PC completely. It is also seems crashed my 
> > > desktop PC, i am not able to execute "tc qdisc del dev eth0 root".
> > > The system hang completely. I had few similar issues on my PPPoE servers 
> > > (with different scripts for shapers), that system hang, and even "reboot -
> f" 
> > > doesn't work sometimes.
> > 
> > This sounds like a different issue from above - when did this start 
> > to happen? Is it at the same time as above warnings showing up?
> 
> Yes, it is different issue seems, it is rare to lockup system , and i will 
> dig more, to understand how it is happening.

If you are willing for debugging I would be interested with results or
sending some patches.

Cheers,
Jarek P.

(testing patch take 2)

---

 drivers/net/ifb.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 15949d3..c553b62 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 module_param(numifbs, int, 0);
 MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
+ * reversely to e.g. qdisc_lock_tree(). It should be safe until
+ * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock.
+ * But lockdep should know that ifb has different locks from dev.
+ */
+static struct lock_class_key ifb_queue_lock_key;
+static struct lock_class_key ifb_ingress_lock_key;
+
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+	lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key);
+	lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key);
+}
+#else
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+}
+#endif	/* CONFIG_DEBUG_LOCK_ALLOC */
+
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
@@ -246,6 +267,9 @@ static int __init ifb_init_one(int index)
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
 		goto err;
+
+	ifb_set_lock_classes(dev_ifb);
+
 	return 0;
 
 err:

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 20:25                 ` Jarek Poplawski
@ 2008-03-06 20:56                   ` jamal
  2008-03-06 22:12                     ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-06 20:56 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev

On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote:

> It's really strange: I can't reproduce this, 

I couldnt either - are you using 2.6.25-rc4?

> and if it were so easy we
> would get really a lot of similar reports. It looks like you have
> something special. This lockdep report with this kind of problem
> usually looks different too. The good side is it's easy to reproduce.
> So, could you try the patch below? (It's only supposed to fix the lockdep
> warning, not lockups).

This is more out of ignorance: Why is ifb needing the extra teaching for
lockdep? It is a netdevice - shouldnt the two global lockdeps you
described earlier not be sufficient?

cheers,
jamal


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 20:56                   ` jamal
@ 2008-03-06 22:12                     ` Jarek Poplawski
  2008-03-06 23:43                       ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06 22:12 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Thu, Mar 06, 2008 at 03:56:40PM -0500, jamal wrote:
> On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote:
> 
> > It's really strange: I can't reproduce this, 
> 
> I couldnt either - are you using 2.6.25-rc4?

No, David's net-2.6 tree so 2.6.25-rc3 plus something...

> 
> > and if it were so easy we
> > would get really a lot of similar reports. It looks like you have
> > something special. This lockdep report with this kind of problem
> > usually looks different too. The good side is it's easy to reproduce.
> > So, could you try the patch below? (It's only supposed to fix the lockdep
> > warning, not lockups).
> 
> This is more out of ignorance: Why is ifb needing the extra teaching for
> lockdep? It is a netdevice - shouldnt the two global lockdeps you
> described earlier not be sufficient?

As I've written in the previous message, currently lockdep tracks
dev->queue_lock and dev->ingress_lock as only two locks used by all
net devices (unless they were annotated individually). So, it's like
A and B lock, and it's really not right to them AB in one place, and
BA in another. In reality each net_device's locks are independent,
so ifb has C and D. And it's not AB vs. BA, but: AB (eth/lo->queue_lock,
eth/lo->ingress_lock), CD (the same for ifb) and BC (eth/lo->ingress_lock,
ifb->queue_lock) - all legal combinations, and no inversion.

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 22:12                     ` Jarek Poplawski
@ 2008-03-06 23:43                       ` Denys Fedoryshchenko
  2008-03-07  0:09                         ` jamal
  2008-03-07  9:31                         ` Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-06 23:43 UTC (permalink / raw)
  To: jamal, Jarek Poplawski; +Cc: netdev

About reproducing, I think .config matter
Mine is at http://www.nuclearcat.com/files/config.txt

About hardware, maybe it is important - it is Intel dual core machine (32-
bit). I test few more times, i can reproduce it for sure. It gives lockdep on 
first incoming packet.

More info about hardware:
CPU model name      : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
Other machines where i get this lockdep also Core Duo CPU's. 
I test now on old 2xXeon,P4 with hyperthreading - also triggered.




Shortest script i use:
modprobe ifb
ifconfig ifb0 up
/sbin/tc qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null
/sbin/tc qdisc add dev eth0 ingress
/sbin/tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
         match u32 0 0 flowid 1:1 \
         action mirred egress redirect dev ifb0




On Thu, 6 Mar 2008 23:12:53 +0100, Jarek Poplawski wrote
> On Thu, Mar 06, 2008 at 03:56:40PM -0500, jamal wrote:
> > On Thu, 2008-06-03 at 21:25 +0100, Jarek Poplawski wrote:
> > 
> > > It's really strange: I can't reproduce this, 
> > 
> > I couldnt either - are you using 2.6.25-rc4?
> 
> No, David's net-2.6 tree so 2.6.25-rc3 plus something...
> 
> > 
> > > and if it were so easy we
> > > would get really a lot of similar reports. It looks like you have
> > > something special. This lockdep report with this kind of problem
> > > usually looks different too. The good side is it's easy to reproduce.
> > > So, could you try the patch below? (It's only supposed to fix the 
lockdep
> > > warning, not lockups).
> > 
> > This is more out of ignorance: Why is ifb needing the extra teaching for
> > lockdep? It is a netdevice - shouldnt the two global lockdeps you
> > described earlier not be sufficient?
> 
> As I've written in the previous message, currently lockdep tracks
> dev->queue_lock and dev->ingress_lock as only two locks used by all
> net devices (unless they were annotated individually). So, it's like
> A and B lock, and it's really not right to them AB in one place, and
> BA in another. In reality each net_device's locks are independent,
> so ifb has C and D. And it's not AB vs. BA, but: AB (eth/lo-
> >queue_lock, eth/lo->ingress_lock), CD (the same for ifb) and BC 
> (eth/lo->ingress_lock, ifb->queue_lock) - all legal combinations,
>  and no inversion.
> 
> Cheers,
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 23:43                       ` Denys Fedoryshchenko
@ 2008-03-07  0:09                         ` jamal
  2008-03-07  0:15                           ` Denys Fedoryshchenko
  2008-03-07  9:31                         ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-07  0:09 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev

Denys,

On Fri, 2008-07-03 at 01:43 +0200, Denys Fedoryshchenko wrote:
> About reproducing, I think .config matter
> Mine is at http://www.nuclearcat.com/files/config.txt
> 

Ok, that + SMP is what i am lacking. Given i am in travel mode i will
not be able to try it out on SMP for a few days but i will try to build
the kernel with your config. My initial test with latest git also seems
to fail to reproduce it.

In any case, i think the lockdep alarm is false positive.

One thing i couldnt figure from your emails is whether you saw the
traces on the latest kernels or it has always been there.

cheers,
jamal

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07  0:09                         ` jamal
@ 2008-03-07  0:15                           ` Denys Fedoryshchenko
  2008-03-07  0:25                             ` jamal
  0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-07  0:15 UTC (permalink / raw)
  To: hadi; +Cc: Jarek Poplawski, netdev

I just start to using lockdep on servers with shaper over ifb from 2.6.24. So 
it is triggering from 2.6.24 till latest 2.6.25-rc4. If you want - i can 
build some specific old version of kernel with lockdep.

On Thu, 06 Mar 2008 19:09:59 -0500, jamal wrote
> Denys,
> 
> On Fri, 2008-07-03 at 01:43 +0200, Denys Fedoryshchenko wrote:
> > About reproducing, I think .config matter
> > Mine is at http://www.nuclearcat.com/files/config.txt
> >
> 
> Ok, that + SMP is what i am lacking. Given i am in travel mode i will
> not be able to try it out on SMP for a few days but i will try to build
> the kernel with your config. My initial test with latest git also seems
> to fail to reproduce it.
> 
> In any case, i think the lockdep alarm is false positive.
> 
> One thing i couldnt figure from your emails is whether you saw the
> traces on the latest kernels or it has always been there.
> 
> cheers,
> jamal
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07  0:15                           ` Denys Fedoryshchenko
@ 2008-03-07  0:25                             ` jamal
  0 siblings, 0 replies; 47+ messages in thread
From: jamal @ 2008-03-07  0:25 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev

On Fri, 2008-07-03 at 02:15 +0200, Denys Fedoryshchenko wrote:
> I just start to using lockdep on servers with shaper over ifb from 2.6.24. So 
> it is triggering from 2.6.24 till latest 2.6.25-rc4. If you want - i can 
> build some specific old version of kernel with lockdep.

I think i should be able to try with your .config first with the latest
git. I should be able to at least see the traces.

I am just going to compile some things into the kernel, example:

---
CONFIG_IFB=m
CONFIG_DUMMY=m
CONFIG_BONDING=m
CONFIG_MACVLAN=m
CONFIG_EQUALIZER=m
CONFIG_TUN=m
----

Can you try that as well? Then if it doesnt happen with me, we'll narrow it to need for
smp.

cheers,
jamal


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 23:43                       ` Denys Fedoryshchenko
  2008-03-07  0:09                         ` jamal
@ 2008-03-07  9:31                         ` Jarek Poplawski
  2008-03-07 10:19                           ` Denys Fedoryshchenko
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-07  9:31 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: jamal, netdev

On Fri, Mar 07, 2008 at 01:43:33AM +0200, Denys Fedoryshchenko wrote:
> About reproducing, I think .config matter
> Mine is at http://www.nuclearcat.com/files/config.txt

I see: CONFIG_4KSTACKS=y  but CONFIG_DEBUG_STACKOVERFLOW is not set.
This doesn't look safe to me... And can trigger really strange things.

Probably dmesg could be interesting too. And since it all takes place
and can change in time, I wonder if it isn't better to open a report
in bugzilla for this. Of course, you should remember to mask any
confidential information.

I'm also not sure this: 
http://www.nuclearcat.com/files/bug_feb.txt
is a full script (no more ifbs?) or an example. Doesn't "magic sysrq"
work during such lockups?

Denys, there were some reports on similar problems with ifb on SMP,
but this was hard to trigger and debugging stopped for some reason.
It seems there were some timer OOPSes, but this could be because
wrong locking too. This could be even because some other apps can't
handle their lost net traffic. Without some good log traces this could
be impossible to tell. And such debugging shouldn't be done at
production of course: it's really hard to foresee any locking changes.
So, until you are willing to respond to our proposals and try the
patches I can see no problem with assisting this problem.

BTW, probably it's easier to stick to one kernel version, so maybe
2.6.24.3 isn't a bad choice if 2.6.25-rc4 didn't help at all.

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07  9:31                         ` Jarek Poplawski
@ 2008-03-07 10:19                           ` Denys Fedoryshchenko
  2008-03-07 10:48                             ` Jarek Poplawski
  2008-03-07 14:58                             ` jamal
  0 siblings, 2 replies; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-07 10:19 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: jamal, netdev

Ok, i will proceed and will try all required options after few hours (need to 
go another city, work).
I will try to set max as possible debug options.

On bug_feb.txt it is triggering ONLY lockdep warning, but full network lockup 
occur very rarely. But even it happens with me 2 times, i am not sure it is 
same problem. Let's stick maybe to this one (lockdep warning) which is easy 
to reproduce.
I will open also bugzilla report.

Regarding testing with compiled in modules, i did
CONFIG_IFB=y 
CONFIG_DUMMY=y
CONFIG_BONDING=y
CONFIG_MACVLAN=y 
CONFIG_EQUALIZER=y
CONFIG_TUN=y

It is still triggering lockdep.
Here is full dmesg:
http://www.nuclearcat.com/files/dmesg_1234.txt



On Fri, 7 Mar 2008 09:31:50 +0000, Jarek Poplawski wrote
> On Fri, Mar 07, 2008 at 01:43:33AM +0200, Denys Fedoryshchenko wrote:
> > About reproducing, I think .config matter
> > Mine is at http://www.nuclearcat.com/files/config.txt
> 
> I see: CONFIG_4KSTACKS=y  but CONFIG_DEBUG_STACKOVERFLOW is not set.
> This doesn't look safe to me... And can trigger really strange things.
> 
> Probably dmesg could be interesting too. And since it all takes place
> and can change in time, I wonder if it isn't better to open a report
> in bugzilla for this. Of course, you should remember to mask any
> confidential information.
> 
> I'm also not sure this: 
> http://www.nuclearcat.com/files/bug_feb.txt
> is a full script (no more ifbs?) or an example. Doesn't "magic sysrq"
> work during such lockups?
> 
> Denys, there were some reports on similar problems with ifb on SMP,
> but this was hard to trigger and debugging stopped for some reason.
> It seems there were some timer OOPSes, but this could be because
> wrong locking too. This could be even because some other apps can't
> handle their lost net traffic. Without some good log traces this 
> could be impossible to tell. And such debugging shouldn't be done at 
> production of course: it's really hard to foresee any locking 
> changes. So, until you are willing to respond to our proposals and 
> try the patches I can see no problem with assisting this problem.
> 
> BTW, probably it's easier to stick to one kernel version, so maybe
> 2.6.24.3 isn't a bad choice if 2.6.25-rc4 didn't help at all.
> 
> Regards,
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07 10:19                           ` Denys Fedoryshchenko
@ 2008-03-07 10:48                             ` Jarek Poplawski
  2008-03-07 14:58                             ` jamal
  1 sibling, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-07 10:48 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: jamal, netdev

On Fri, Mar 07, 2008 at 12:19:03PM +0200, Denys Fedoryshchenko wrote:
> Ok, i will proceed and will try all required options after few hours (need to 
> go another city, work).
> I will try to set max as possible debug options.

Max could be too much, but e.g. this is sometimes helpful too:
CONFIG_DEBUG_LIST is not set

BTW, you didn't write yet if with min/no debugging, no 4KSTACKS and
lockdep off there is any change in these lockups?

And no need to hurry...

> On bug_feb.txt it is triggering ONLY lockdep warning, but full network lockup 
> occur very rarely. But even it happens with me 2 times, i am not sure it is 
> same problem. Let's stick maybe to this one (lockdep warning) which is easy 
> to reproduce.
> I will open also bugzilla report.

Send Re: here with a bugzilla number.

> 
> Regarding testing with compiled in modules, i did
> CONFIG_IFB=y 
> CONFIG_DUMMY=y
> CONFIG_BONDING=y
> CONFIG_MACVLAN=y 
> CONFIG_EQUALIZER=y
> CONFIG_TUN=y
> 
> It is still triggering lockdep.
> Here is full dmesg:
> http://www.nuclearcat.com/files/dmesg_1234.txt

OK, but it seems still without my last patch to ifb?

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07 10:19                           ` Denys Fedoryshchenko
  2008-03-07 10:48                             ` Jarek Poplawski
@ 2008-03-07 14:58                             ` jamal
  1 sibling, 0 replies; 47+ messages in thread
From: jamal @ 2008-03-07 14:58 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev

On Fri, 2008-07-03 at 12:19 +0200, Denys Fedoryshchenko wrote:
> Ok, i will proceed and will try all required options after few hours (need to 
> go another city, work).
> I will try to set max as possible debug options.

This is embarassing on my part - all this time i claimed i couldnt
reproduce it was because i didnt have DEPLOCK debuging on. I apologize.
I am only have time to rebuild the kernel but not test it. In any case,
Denys even if i do reproduce it this is a false positive i.e it is just
noise in my opinion and you can safely ignore it (refer to my emails to
Jarek). 
Perhaps we should invest the time trying to reproduce the other issues
you have instead.

cheers,
jamal

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 15:50               ` Denys Fedoryshchenko
  2008-03-06 20:25                 ` Jarek Poplawski
@ 2008-03-06 20:44                 ` jamal
  1 sibling, 0 replies; 47+ messages in thread
From: jamal @ 2008-03-06 20:44 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Jarek Poplawski, netdev

On Thu, 2008-06-03 at 17:50 +0200, Denys Fedoryshchenko wrote:

> Well, i am able to reproduce in much more trivial script. Tested 2.6.25-rc4 
> also.
> 
> modprobe ifb
> ifconfig ifb0 up
> TC=/sbin/tc
> $TC qdisc del dev eth0 ingress 1>/dev/null 2>/dev/null
> $TC qdisc add dev eth0 ingress
> ${TC} filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
>       match u32 0 0 flowid 1:1 \
>       action mirred egress redirect dev ifb0
> 

I have tried this on my laptop - 2.6.24 and it doesnt happen. Indicates
whatever issues that are occuring must be post 2.6.24 since you are
running something newer. 


> > locks(ethx, ifbx, and loopback). Smells like lockdep is getting it wrong?
> No idea, i have strange lockup's on my systems where i have ifb, and that 
> make me worry. And i feel it is directly related with my love to use ifb 
> devices and way how i am using them.

I will like to help resolve it - if i can reproduce it i can fix it. I
love the fact you love to use ifb. I am going to try later on 2.6.25-rc4
that you are running.

> > This sounds like a different issue from above - when did this start 
> > to happen? Is it at the same time as above warnings showing up?
> 
> Yes, it is different issue seems, it is rare to lockup system , and i will 
> dig more, to understand how it is happening.

Are you running std kernels or do you have some other patches?

cheers,
jamal


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 13:40         ` Jarek Poplawski
  2008-03-06 13:57           ` Denys Fedoryshchenko
@ 2008-03-06 13:59           ` jamal
  2008-03-06 17:56             ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-06 13:59 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev

On Thu, 2008-06-03 at 13:40 +0000, Jarek Poplawski wrote:

> I'm not sure this lockdep report is because of this, but there is
> really a problem with lock order while using sch_ingress with
> act_mirred: dev->queue_lock is taken after dev->ingress_lock, so
> reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem
> when one of the devices is ifb yet.

Are there more details? Ingress of which netdevice is redirecting to
egress of which netdevice?
Sorry, I dont understand much about the internals of lockdep so i dont
know what you are teaching it in the patch below...

cheers,
jamal


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 13:59           ` jamal
@ 2008-03-06 17:56             ` Jarek Poplawski
  2008-03-06 20:48               ` jamal
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06 17:56 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Thu, Mar 06, 2008 at 08:59:55AM -0500, jamal wrote:
> On Thu, 2008-06-03 at 13:40 +0000, Jarek Poplawski wrote:
> 
> > I'm not sure this lockdep report is because of this, but there is
> > really a problem with lock order while using sch_ingress with
> > act_mirred: dev->queue_lock is taken after dev->ingress_lock, so
> > reversely to e.g. qdisc_lock_tree(). This shouldn't be a problem
> > when one of the devices is ifb yet.
> 
> Are there more details? Ingress of which netdevice is redirecting to
> egress of which netdevice?
> Sorry, I dont understand much about the internals of lockdep so i dont
> know what you are teaching it in the patch below...

Every netdevice after register_netdevice() has its queue_lock and
ingress_lock initialized with the same static lock_class_keys, so unless
changed later, these locks are treated by lockdep as 2 global locks. So,
taking such locks with different order should be reported. This really
happens in act_mirred, and I don't know yet, why this wasn't reported
earlier.

Of course, if there are two different devices this could be safe, but
not always (e.g. thread1 has dev_eth0->ingress_lock and wants
dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants
dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants
dev_eth0->ingress_lock). With ifb this shouldn't be the case, but
anyway we have to tell lockdep that ifb uses a different pair of locks.

My patch teaches lockdep about queue_lock, but after rethinking it
seems it's not enough, and I'll have to update this patch with
ingress_lock too.

Denys, I'll have to read your script yet, so you can wait with this
patching (unless you've started already - anyway this patch shouldn't
be harmful).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 17:56             ` Jarek Poplawski
@ 2008-03-06 20:48               ` jamal
  2008-03-06 21:40                 ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-06 20:48 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev

On Thu, 2008-06-03 at 18:56 +0100, Jarek Poplawski wrote:

> Every netdevice after register_netdevice() has its queue_lock and
> ingress_lock initialized with the same static lock_class_keys, so unless
> changed later, these locks are treated by lockdep as 2 global locks.
>  So, taking such locks with different order should be reported. 

ok.

> This really
> happens in act_mirred, and I don't know yet, why this wasn't reported
> earlier.

Look closely at those traces again ;-> there are *three* different
netdevices involved, one (loopback) seems to be _totaly_ unrelated.
Tracing of those locks just seems confused - perhaps the pernet stuff is
confusing loopback?

> Of course, if there are two different devices this could be safe, but
> not always (e.g. thread1 has dev_eth0->ingress_lock and wants
> dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants
> dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants
> dev_eth0->ingress_lock). With ifb this shouldn't be the case, but
> anyway we have to tell lockdep that ifb uses a different pair of locks.

thread3 can not happen because we dont allow it (the other 2 are not
contentious).
There are cases where redirecting will cause you problems (example if
you redirected to yourself and cause an infinite redirect) which are
listed in iproute2/doc. Denys script is fine afaics.






^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 20:48               ` jamal
@ 2008-03-06 21:40                 ` Jarek Poplawski
  2008-03-06 23:40                   ` jamal
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-06 21:40 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Thu, Mar 06, 2008 at 03:48:42PM -0500, jamal wrote:
> On Thu, 2008-06-03 at 18:56 +0100, Jarek Poplawski wrote:
> 
> > Every netdevice after register_netdevice() has its queue_lock and
> > ingress_lock initialized with the same static lock_class_keys, so unless
> > changed later, these locks are treated by lockdep as 2 global locks.
> >  So, taking such locks with different order should be reported. 
> 
> ok.
> 
> > This really
> > happens in act_mirred, and I don't know yet, why this wasn't reported
> > earlier.
> 
> Look closely at those traces again ;-> there are *three* different
> netdevices involved, one (loopback) seems to be _totaly_ unrelated.
> Tracing of those locks just seems confused - perhaps the pernet stuff is
> confusing loopback?

But currently lockdep doesn't know dev->queue_lock could mean eth or lo.
It sees one class of devices using one lock. We can let it know (e.g.
dev->_xmit_lock is different according to dev->type), but it wasn't
necessary. I hope it will suffice here if lockdep knows more about ifb,
but similar problem could theoretically happen with other devs.

> > Of course, if there are two different devices this could be safe, but
> > not always (e.g. thread1 has dev_eth0->ingress_lock and wants
> > dev_eth1->queue_lock, thread2 has dev_eth1->ingress_lock, wants
> > dev_eth0->qdisc_lock, and thread3 has dev_eth0->qdisc_lock and wants

Sorry, should be:
 dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants

> > dev_eth0->ingress_lock). With ifb this shouldn't be the case, but
> > anyway we have to tell lockdep that ifb uses a different pair of locks.
> 
> thread3 can not happen because we dont allow it (the other 2 are not
> contentious).

Could you explain why? It's a qdisc_lock_tree case and probably not only
this.

> There are cases where redirecting will cause you problems (example if
> you redirected to yourself and cause an infinite redirect) which are
> listed in iproute2/doc. Denys script is fine afaics.

Yes, but it seems such redirection between two eths like above mentioned
is legal?

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 21:40                 ` Jarek Poplawski
@ 2008-03-06 23:40                   ` jamal
  2008-03-07  7:51                     ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-06 23:40 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev

Jarek,

On Thu, 2008-06-03 at 22:40 +0100, Jarek Poplawski wrote:

> But currently lockdep doesn't know dev->queue_lock could mean eth or lo.
> It sees one class of devices using one lock. 

yikes ;-> I missed the implication when you mentioned it first - thanks
for the enlightnment. IMO, it needs to be per netdevice not global; so i
contend this is the problem i.e those warnings are false.

> We can let it know (e.g.
> dev->_xmit_lock is different according to dev->type), but it wasn't
> necessary. I hope it will suffice here if lockdep knows more about ifb,
> but similar problem could theoretically happen with other devs.

I think the condition needs to be met for all netdevices, not just ifb.
Instead of annotating each netdevice type separately, could you not use
a different annotation in the generic netdev registration code of the
form dev_%s->ingress/queue_lock ? (where %s is dev->name)

> Sorry, should be:
>  dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants

indeed - i understood when you said that; it doesnt invalidate what i
said.

> > thread3 can not happen because we dont allow it (the other 2 are not
> > contentious).
> 
> Could you explain why? 

Because we never redirect to ingress (it is in the todo as mentioned as
described at the top of the source file). You can redirect from ingress
or egress but only to egress (to sockets and ingress are going to happen
at some future point).

> Yes, but it seems such redirection between two eths like above mentioned
> is legal?

Indeed it is. I think it is clear to me now the issues seem to be the
namespace of what lockdep - it should be per-device and NOT global.
What do you think?

cheers,
jamal

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-06 23:40                   ` jamal
@ 2008-03-07  7:51                     ` Jarek Poplawski
  2008-03-07  8:32                       ` Jarek Poplawski
  2008-03-07 13:53                       ` jamal
  0 siblings, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-07  7:51 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Thu, Mar 06, 2008 at 06:40:39PM -0500, jamal wrote:
> Jarek,
> 
> On Thu, 2008-06-03 at 22:40 +0100, Jarek Poplawski wrote:
> 
> > But currently lockdep doesn't know dev->queue_lock could mean eth or lo.
> > It sees one class of devices using one lock. 
> 
> yikes ;-> I missed the implication when you mentioned it first - thanks
> for the enlightnment. IMO, it needs to be per netdevice not global; so i
> contend this is the problem i.e those warnings are false.
> 
> > We can let it know (e.g.
> > dev->_xmit_lock is different according to dev->type), but it wasn't
> > necessary. I hope it will suffice here if lockdep knows more about ifb,
> > but similar problem could theoretically happen with other devs.
> 
> I think the condition needs to be met for all netdevices, not just ifb.
> Instead of annotating each netdevice type separately, could you not use
> a different annotation in the generic netdev registration code of the
> form dev_%s->ingress/queue_lock ? (where %s is dev->name)

At first this lockdep method looks like wrong (too general). But it
works. With every lock tracked separately there would be huge overhead
(especially with some structures created in thousands of instances).
And with this general approach we can see general problems: e.g.
wrong locking order even reported on two different devices like here,
and impossible in reality, can help to analyze if the same could
happen in another, maybe even very hard to reproduce, variant, and to
fix this before ever happened.

So, current way is to start from very general tracking, and if it
doesn't work (I mean false reports, not lockups), we make it more
specific. With dev->_xmit_lock the type level seems to be enough
for some time (so lockdep doesn't even know eth0 and eth1 created
as ARPHRD_ETHER use 2 different _xmit_locks).

Such specific information is usually necessary only with nesting.
But here, with sch_ingress + act_mirred it's not about nesting: it's
really about wrong order, which could be safe only because ifb
doesn't currently work with a full egress + ingress functionality.

Making this all more specific for lockdep won't help with real
lockups then, but will make reports more exact and less reproducible.

> > Sorry, should be:
> >  dev_eth0->queue_lock, and thread3 has dev_eth0->queue_lock and wants
> 
> indeed - i understood when you said that; it doesnt invalidate what i
> said.
> 
> > > thread3 can not happen because we dont allow it (the other 2 are not
> > > contentious).
> > 
> > Could you explain why? 
> 
> Because we never redirect to ingress (it is in the todo as mentioned as
> described at the top of the source file). You can redirect from ingress
> or egress but only to egress (to sockets and ingress are going to happen
> at some future point).
> 
> > Yes, but it seems such redirection between two eths like above mentioned
> > is legal?
> 
> Indeed it is. I think it is clear to me now the issues seem to be the
> namespace of what lockdep - it should be per-device and NOT global.
> What do you think?

I've to find first what really bothers lockdep here, and why this
queue_lock vs. ingress_lock order isn't reported "by default". But if
this really is like it looks now, then it seems before doing this
ingress "future point" some change in locking could be necessary.
(Maybe even sooner if lockdep finds something real after this current
patch to ifb.)

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07  7:51                     ` Jarek Poplawski
@ 2008-03-07  8:32                       ` Jarek Poplawski
  2008-03-07 13:53                       ` jamal
  1 sibling, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-07  8:32 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Fri, Mar 07, 2008 at 07:51:13AM +0000, Jarek Poplawski wrote:
...
> I've to find first what really bothers lockdep here, and why this
> queue_lock vs. ingress_lock order isn't reported "by default". But if
> this really is like it looks now, then it seems before doing this
> ingress "future point" some change in locking could be necessary.
> (Maybe even sooner if lockdep finds something real after this current
> patch to ifb.)

Actually, I got it a bit wrong: "the problem" with ifb could probably
exist now: if we would redirect e.g. from ifb0's ingress to eth0's
egress, while doing the same thing from eth0 to ifb0. But can we get
any traffic on ifb0's ingress? And why would we do this after all?

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07  7:51                     ` Jarek Poplawski
  2008-03-07  8:32                       ` Jarek Poplawski
@ 2008-03-07 13:53                       ` jamal
  2008-03-08  8:46                         ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-07 13:53 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Denys Fedoryshchenko, netdev

On Fri, 2008-07-03 at 07:51 +0000, Jarek Poplawski wrote:

> At first this lockdep method looks like wrong (too general). But it
> works. With every lock tracked separately there would be huge overhead
> (especially with some structures created in thousands of instances).
> And with this general approach we can see general problems: e.g.
> wrong locking order even reported on two different devices like here,
> and impossible in reality, 

The problem is not so much so that it is an impossible situation, rather
it is not a useful detail to report as is. 
As an example, heres something i consider useful to catch:
 dev->queuelock for ethx being held recursively
In your current setup, there will be many false-positives. i.e it will
show something like dev->queuelock for eth0 is being held and flagging
that as a problem because dev->queuelock for eth1 is also being held.
i.e a lot of noise.

> can help to analyze if the same could
> happen in another, maybe even very hard to reproduce, variant, and to
> fix this before ever happened.

It is possible some odd issue will be found by having the lockdeps in
their current setup - but if you have to sift through noise in order to
find it, then it makes it harder.

> So, current way is to start from very general tracking, and if it
> doesn't work (I mean false reports, not lockups), we make it more
> specific. 

Is it time to do that now? I know you said it was too much data to
carry. However, such data depends on how many netdevices are on the
system and lockdep is optional. You could even make NETDEV_LOCKDEP
version which is accurate and the other one as being less accurate.

> With dev->_xmit_lock the type level seems to be enough
> for some time (so lockdep doesn't even know eth0 and eth1 created
> as ARPHRD_ETHER use 2 different _xmit_locks).

I am curious what issue has been found with this approach. I can see
that someone who knows what they are doing could sift through reports
and be able to catch something. My thinking is it most reports are going
to be noise. People will report them because they are scary looking.

> Such specific information is usually necessary only with nesting.

It could also apply to a single lock (for example redirecting 
eth0->eth0 or eth0->manyotherredirectshere->eth0)

> But here, with sch_ingress + act_mirred it's not about nesting: it's
> really about wrong order, which could be safe only because ifb
> doesn't currently work with a full egress + ingress functionality.

ifb never ever works with ingress. It is design intent. It is a special
netdevice (as the text says) which receives only redirected packets.

> I've to find first what really bothers lockdep here, and why this
> queue_lock vs. ingress_lock order isn't reported "by default". 

I dont see the problem in this case, those traces are false-positives -
but no harm in investigating further. I still think the only time you
can find real issues is when you have per-netdevice lockdep; anything
else will only find issues only with the help of an expert.

> But if
> this really is like it looks now, then it seems before doing this
> ingress "future point" some change in locking could be necessary.

true. 

> (Maybe even sooner if lockdep finds something real after this current
> patch to ifb.)

As it stands right now Jarek, I think you need to teach lockedep more
smarter details.

cheers,
jamal

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-07 13:53                       ` jamal
@ 2008-03-08  8:46                         ` Jarek Poplawski
  2008-03-08  8:58                           ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-08  8:46 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Fri, Mar 07, 2008 at 08:53:22AM -0500, jamal wrote:
...
> The problem is not so much so that it is an impossible situation, rather
> it is not a useful detail to report as is. 
> As an example, heres something i consider useful to catch:
>  dev->queuelock for ethx being held recursively
> In your current setup, there will be many false-positives. i.e it will
> show something like dev->queuelock for eth0 is being held and flagging
> that as a problem because dev->queuelock for eth1 is also being held.
> i.e a lot of noise.

But as you can see... you can't see these reports too often now.
Probably there are many people who are skeptical of lockdep like you.
But until it's an official tool you've to adjust and not scare users
with these reports (often called OOPSes!). The bad thing is that
currently any such report turns off lockdep checking for next possible
(false!?) problems.

> Is it time to do that now? I know you said it was too much data to
> carry. However, such data depends on how many netdevices are on the
> system and lockdep is optional. You could even make NETDEV_LOCKDEP
> version which is accurate and the other one as being less accurate.

That's why I'm trying now to find if making this more specific only
for ifb could suffice. This reverse locking order is used only on
sch_ingress -> act_mirred path, and it's mostly with ifb, which as you
said doesn't use this patch with it's ingress (at least currently).

> I am curious what issue has been found with this approach. I can see
> that someone who knows what they are doing could sift through reports
> and be able to catch something. My thinking is it most reports are going
> to be noise. People will report them because they are scary looking.

I never counted this, but you're mostly right, and most (90%?) of these
reports are really false-positives. But, even 10 or 5% correctness is
IMHO worth of it. I still think one of the most bothering things in
linux for users are misterious lockups, and this could be even more
so with these fast multicore boxes now.

One of the recent right-positives I can remember can be found on the
list in "Fix SMP oops in pppol2tp driver". This was really a complex
thing (and there is still something to find). The funniest of all is
David diagnosed it fastest probably without studying this report too
much, but on the other hand, nobody had spotted this issue before
lockdep. Another such a good lockdep's work is checking for
flush_workqueue() abusing (recently in "lockdep trace from rc2."
netdev thread).

> It could also apply to a single lock (for example redirecting 
> eth0->eth0 or eth0->manyotherredirectshere->eth0)

If you mean redirecting from ingress with act_mirred then of course.
I wrote earlier about eth0->ingress_redirected->eth1, eth1->
ingress_redirected->eth0 which would be right-positive IMHO. If
eth0->eth0 is legal this should be even more dangerous.

> ifb never ever works with ingress. It is design intent. It is a special
> netdevice (as the text says) which receives only redirected packets.
...
> I dont see the problem in this case, those traces are false-positives -
> but no harm in investigating further. I still think the only time you
> can find real issues is when you have per-netdevice lockdep; anything
> else will only find issues only with the help of an expert.

I think wrt ifb it's a false-positive too, that's why I din't try to
change locking but only add some info for lockdep. BTW, we shouldn't
forget such lockdep annotations are always under debugging config
options, and not intended to work (or use any resources) in production
or home boxes (at least until no need to write to kernel lists).
BTW#2, it seems this tcf_lock in act_mirred could need similar lockdep
treatment, but let's wait for Denys'es testing.

> As it stands right now Jarek, I think you need to teach lockedep more
> smarter details.

...But the beast is much smarter than me!

Regards,
Jarek P.

PS: sorry for delay, a little cold or something.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08  8:46                         ` Jarek Poplawski
@ 2008-03-08  8:58                           ` Jarek Poplawski
  2008-03-08  9:56                             ` Denys Fedoryshchenko
  2008-03-08 10:16                             ` Denys Fedoryshchenko
  0 siblings, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-08  8:58 UTC (permalink / raw)
  To: jamal; +Cc: Denys Fedoryshchenko, netdev

On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote:
...
> said doesn't use this patch with it's ingress (at least currently).

+ said doesn't use this path with it's ingress (at least currently).

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08  8:58                           ` Jarek Poplawski
@ 2008-03-08  9:56                             ` Denys Fedoryshchenko
  2008-03-08 10:16                             ` Denys Fedoryshchenko
  1 sibling, 0 replies; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-08  9:56 UTC (permalink / raw)
  To: Jarek Poplawski, jamal; +Cc: netdev

I apply proposel patch at http://marc.info/?l=linux-
netdev&m=120481096107584&w=2

and not able to trigger lockdep warning anymore.
It is expected result?

On Sat, 8 Mar 2008 09:58:54 +0100, Jarek Poplawski wrote
> On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote:
> ....
> > said doesn't use this patch with it's ingress (at least currently).
> 
> + said doesn't use this path with it's ingress (at least currently).
> 
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08  8:58                           ` Jarek Poplawski
  2008-03-08  9:56                             ` Denys Fedoryshchenko
@ 2008-03-08 10:16                             ` Denys Fedoryshchenko
  2008-03-08 10:43                               ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-08 10:16 UTC (permalink / raw)
  To: Jarek Poplawski, jamal; +Cc: netdev

Update. I am not able to reproduce on my PC, but still able to reproduce on 
PPPoE server (but here i cannot take risk to use 2.6.25-rcX kernels). This is 
PPPoE:

[2148614.154684]
[2148614.154688] =======================================================
[2148614.154805] [ INFO: possible circular locking dependency detected ]
[2148614.154862] 2.6.24.3-build-0023 #9
[2148614.154913] -------------------------------------------------------
[2148614.154969] swapper/0 is trying to acquire lock:
[2148614.155023]  (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] 
dev_queue_xmit+0x177/0x302
[2148614.155245]
[2148614.155246] but task is already holding lock:
[2148614.155346]  (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/
0x180 [act_mirred]
[2148614.155569]
[2148614.155570] which lock already depends on the new lock.
[2148614.155571]
[2148614.155715]
[2148614.155716] the existing dependency chain (in reverse order) is:
[2148614.155817]
[2148614.155818] -> #2 (&p->tcfc_lock){-+..}:
[2148614.156026]        [<c014368c>] __lock_acquire+0xa30/0xc19
[2148614.156371]        [<c01438ef>] lock_acquire+0x7a/0x94
[2148614.156714]        [<c02d9525>] _spin_lock+0x2e/0x58
[2148614.157062]        [<f89e638d>] tcf_ipt+0x4b/0xde [act_ipt]
[2148614.157412]        [<c0298000>] tcf_action_exec+0x44/0x77
[2148614.157764]        [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32]
[2148614.158116]        [<c0295e1e>] tc_classify_compat+0x2f/0x5e
[2148614.158460]        [<c0295fff>] tc_classify+0x1a/0x80
[2148614.158802]        [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[2148614.159152]        [<c0287275>] netif_receive_skb+0x296/0x44c
[2148614.159496]        [<c0289ada>] process_backlog+0x77/0xd4
[2148614.159839]        [<c02895f8>] net_rx_action+0xbf/0x201
[2148614.160183]        [<c012ac15>] __do_softirq+0x6f/0xe9
[2148614.160530]        [<c01078af>] do_softirq+0x61/0xc8
[2148614.160876]        [<ffffffff>] 0xffffffff
[2148614.161226]
[2148614.161227] -> #1 (&dev->ingress_lock){-+..}:
[2148614.161435]        [<c014368c>] __lock_acquire+0xa30/0xc19
[2148614.161781]        [<c01438ef>] lock_acquire+0x7a/0x94
[2148614.162123]        [<c02d9525>] _spin_lock+0x2e/0x58
[2148614.162465]        [<c02954c3>] qdisc_lock_tree+0x1e/0x21
[2148614.162807]        [<c0296ba5>] qdisc_create+0x17e/0x1e3
[2148614.163148]        [<c0297368>] tc_modify_qdisc+0x2a3/0x3a3
[2148614.163490]        [<c029051d>] rtnetlink_rcv_msg+0x18e/0x1a8
[2148614.163835]        [<c029b670>] netlink_rcv_skb+0x30/0x82
[2148614.164181]        [<c0290387>] rtnetlink_rcv+0x17/0x1f
[2148614.164523]        [<c029b444>] netlink_unicast+0x19c/0x205
[2148614.164868]        [<c029bc41>] netlink_sendmsg+0x250/0x25c
[2148614.165210]        [<c027d894>] sock_sendmsg+0xcc/0xe5
[2148614.165554]        [<c027da00>] sys_sendmsg+0x153/0x1b1
[2148614.165898]        [<c027e860>] sys_socketcall+0x203/0x222
[2148614.166242]        [<c0104e9a>] sysenter_past_esp+0x5f/0xa5
[2148614.166589]        [<ffffffff>] 0xffffffff
[2148614.166935]
[2148614.166936] -> #0 (&ifb_queue_lock_key){-+..}:
[2148614.167143]        [<c014357c>] __lock_acquire+0x920/0xc19
[2148614.167490]        [<c01438ef>] lock_acquire+0x7a/0x94
[2148614.167832]        [<c02d9525>] _spin_lock+0x2e/0x58
[2148614.168176]        [<c0289d4d>] dev_queue_xmit+0x177/0x302
[2148614.168529]        [<f8a101a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[2148614.168878]        [<c0298000>] tcf_action_exec+0x44/0x77
[2148614.169223]        [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32]
[2148614.169571]        [<c0295e1e>] tc_classify_compat+0x2f/0x5e
[2148614.169914]        [<c0295fff>] tc_classify+0x1a/0x80
[2148614.170257]        [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[2148614.170605]        [<c0287275>] netif_receive_skb+0x296/0x44c
[2148614.170950]        [<c0289ada>] process_backlog+0x77/0xd4
[2148614.171293]        [<c02895f8>] net_rx_action+0xbf/0x201
[2148614.171635]        [<c012ac15>] __do_softirq+0x6f/0xe9
[2148614.171978]        [<c01078af>] do_softirq+0x61/0xc8
[2148614.172321]        [<ffffffff>] 0xffffffff
[2148614.172664]
[2148614.172665] other info that might help us debug this:
[2148614.172666]
[2148614.172810] 5 locks held by swapper/0:
[2148614.172862]  #0:  (rcu_read_lock){..--}, at: [<c0289589>] 
net_rx_action+0x50/0x201
[2148614.173111]  #1:  (rcu_read_lock){..--}, at: [<c02870d5>] 
netif_receive_skb+0xf6/0x44c
[2148614.173359]  #2:  (&dev->ingress_lock){-+..}, at: [<c0287261>] 
netif_receive_skb+0x282/0x44c
[2148614.173608]  #3:  (&p->tcfc_lock){-+..}, at: [<f8a10066>] 
tcf_mirred+0x20/0x180 [act_mirred]
[2148614.173859]  #4:  (rcu_read_lock){..--}, at: [<c0289cf3>] 
dev_queue_xmit+0x11d/0x302
[2148614.174110]
[2148614.174111] stack backtrace:
[2148614.174208] Pid: 0, comm: swapper Not tainted 2.6.24.3-build-0023 #9
[2148614.174265]  [<c0105f2a>] show_trace_log_lvl+0x1a/0x2f
[2148614.174360]  [<c0106843>] show_trace+0x12/0x14
[2148614.174452]  [<c010713a>] dump_stack+0x6c/0x72
[2148614.174541]  [<c014175c>] print_circular_bug_tail+0x5f/0x68
[2148614.174634]  [<c014357c>] __lock_acquire+0x920/0xc19
[2148614.174725]  [<c01438ef>] lock_acquire+0x7a/0x94
[2148614.174814]  [<c02d9525>] _spin_lock+0x2e/0x58
[2148614.174904]  [<c0289d4d>] dev_queue_xmit+0x177/0x302
[2148614.174995]  [<f8a101a5>] tcf_mirred+0x15f/0x180 [act_mirred]
[2148614.175091]  [<c0298000>] tcf_action_exec+0x44/0x77
[2148614.175183]  [<f89da5d2>] u32_classify+0x119/0x24a [cls_u32]
[2148614.175279]  [<c0295e1e>] tc_classify_compat+0x2f/0x5e
[2148614.175370]  [<c0295fff>] tc_classify+0x1a/0x80
[2148614.175460]  [<f89ec118>] ingress_enqueue+0x1a/0x53 [sch_ingress]
[2148614.175554]  [<c0287275>] netif_receive_skb+0x296/0x44c
[2148614.175646]  [<c0289ada>] process_backlog+0x77/0xd4
[2148614.175735]  [<c02895f8>] net_rx_action+0xbf/0x201
[2148614.175825]  [<c012ac15>] __do_softirq+0x6f/0xe9
[2148614.175915]  [<c01078af>] do_softirq+0x61/0xc8
[2148614.176005]  =======================


On Sat, 8 Mar 2008 09:58:54 +0100, Jarek Poplawski wrote
> On Sat, Mar 08, 2008 at 09:46:28AM +0100, Jarek Poplawski wrote:
> ....
> > said doesn't use this patch with it's ingress (at least currently).
> 
> + said doesn't use this path with it's ingress (at least currently).
> 
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08 10:16                             ` Denys Fedoryshchenko
@ 2008-03-08 10:43                               ` Jarek Poplawski
  2008-03-08 10:52                                 ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-08 10:43 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: jamal, netdev

On Sat, Mar 08, 2008 at 12:16:52PM +0200, Denys Fedoryshchenko wrote:
> Update. I am not able to reproduce on my PC, but still able to reproduce on 
> PPPoE server (but here i cannot take risk to use 2.6.25-rcX kernels). This is 
> PPPoE:
> 
> [2148614.154684]
> [2148614.154688] =======================================================
> [2148614.154805] [ INFO: possible circular locking dependency detected ]
> [2148614.154862] 2.6.24.3-build-0023 #9
> [2148614.154913] -------------------------------------------------------
> [2148614.154969] swapper/0 is trying to acquire lock:
> [2148614.155023]  (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] 
> dev_queue_xmit+0x177/0x302
> [2148614.155245]
> [2148614.155246] but task is already holding lock:
> [2148614.155346]  (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/
> 0x180 [act_mirred]
> [2148614.155569]
> [2148614.155570] which lock already depends on the new lock.


...Hmm... the same day I sent to you "take 2" of my patch which was more
complete (lockdep_set_class for ingress_lock added). Could you try this?
It still could be not enough, and something similar is needed for
tcfc_lock in act_mirred, but I'd like to see a report after this patch
first.

As Jamal wrote this all is a false-positive with ifb, but I think it's
needed to please lockdep.

Thanks,
Jarek P.

PS: I resend this "take 2" here:
---

 drivers/net/ifb.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 15949d3..c553b62 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 module_param(numifbs, int, 0);
 MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
+ * reversely to e.g. qdisc_lock_tree(). It should be safe until
+ * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock.
+ * But lockdep should know that ifb has different locks from dev.
+ */
+static struct lock_class_key ifb_queue_lock_key;
+static struct lock_class_key ifb_ingress_lock_key;
+
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+	lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key);
+	lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key);
+}
+#else
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+}
+#endif	/* CONFIG_DEBUG_LOCK_ALLOC */
+
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
@@ -246,6 +267,9 @@ static int __init ifb_init_one(int index)
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
 		goto err;
+
+	ifb_set_lock_classes(dev_ifb);
+
 	return 0;
 
 err:

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08 10:43                               ` Jarek Poplawski
@ 2008-03-08 10:52                                 ` Jarek Poplawski
  2008-03-08 11:09                                   ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-08 10:52 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: jamal, netdev

On Sat, Mar 08, 2008 at 11:43:22AM +0100, Jarek Poplawski wrote:
...
> ...Hmm... the same day I sent to you "take 2" of my patch which was more
> complete (lockdep_set_class for ingress_lock added). Could you try this?
> It still could be not enough, and something similar is needed for
> tcfc_lock in act_mirred, but I'd like to see a report after this patch
> first.

...Hmm#2... or maybe it doesn't matter? I'll try to send you today
this additional act_mirred patch.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08 10:52                                 ` Jarek Poplawski
@ 2008-03-08 11:09                                   ` Denys Fedoryshchenko
  2008-03-08 12:02                                     ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-08 11:09 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: jamal, netdev

Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 different 
servers for now.
I will test more deeply and on more servers.


On Sat, 8 Mar 2008 11:52:50 +0100, Jarek Poplawski wrote
> On Sat, Mar 08, 2008 at 11:43:22AM +0100, Jarek Poplawski wrote:
> ....
> > ...Hmm... the same day I sent to you "take 2" of my patch which was more
> > complete (lockdep_set_class for ingress_lock added). Could you try this?
> > It still could be not enough, and something similar is needed for
> > tcfc_lock in act_mirred, but I'd like to see a report after this patch
> > first.
> 
> ....Hmm#2... or maybe it doesn't matter? I'll try to send you today
> this additional act_mirred patch.
> 
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08 11:09                                   ` Denys Fedoryshchenko
@ 2008-03-08 12:02                                     ` Jarek Poplawski
  2008-03-19  0:46                                       ` Denys Fedoryshchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-08 12:02 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: jamal, netdev

On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote:
> Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 different 
> servers for now.
> I will test more deeply and on more servers.

Fine! So, let's wait yet... (No need to hurry.)

BTW, I thought about those earlier probably stack overflow problems,
and I wonder if using action ipt couldn't be blamed here. Sometimes
it seems to be unnecessary, so maybe you could try if re-doing some
filter rules can help.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: circular locking, mirred, 2.6.24.2
  2008-03-08 12:02                                     ` Jarek Poplawski
@ 2008-03-19  0:46                                       ` Denys Fedoryshchenko
  2008-03-19  7:34                                         ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Denys Fedoryshchenko @ 2008-03-19  0:46 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: jamal, netdev

No more warnings.
Probably it must be applied to 2.6.25 before it is released?

On Sat, 8 Mar 2008 13:02:44 +0100, Jarek Poplawski wrote
> On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote:
> > Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 
different 
> > servers for now.
> > I will test more deeply and on more servers.
> 
> Fine! So, let's wait yet... (No need to hurry.)
> 
> BTW, I thought about those earlier probably stack overflow problems,
> and I wonder if using action ipt couldn't be blamed here. Sometimes
> it seems to be unnecessary, so maybe you could try if re-doing some
> filter rules can help.
> 
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH][NET] ifb: set separate lockdep classes for queue locks
  2008-03-19  0:46                                       ` Denys Fedoryshchenko
@ 2008-03-19  7:34                                         ` Jarek Poplawski
  2008-03-19 11:34                                           ` jamal
  2008-03-20 22:37                                           ` David Miller
  0 siblings, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-19  7:34 UTC (permalink / raw)
  To: Jeff Garzik, David S. Miller; +Cc: Denys Fedoryshchenko, jamal, netdev

On Wed, Mar 19, 2008 at 02:46:08AM +0200, Denys Fedoryshchenko wrote:
> No more warnings.
> Probably it must be applied to 2.6.25 before it is released?
> 
> On Sat, 8 Mar 2008 13:02:44 +0100, Jarek Poplawski wrote
> > On Sat, Mar 08, 2008 at 01:09:10PM +0200, Denys Fedoryshchenko wrote:
> > > Seems "try 2" helped. lockdep is not triggered anymore. I test on 3 
> different 
> > > servers for now.
> > > I will test more deeply and on more servers.

Thanks for testing Denys,
Jarek P.

--------------------->

Subject: [NET] ifb: set separate lockdep classes for queue locks

> [2148614.154688] =======================================================
> [2148614.154805] [ INFO: possible circular locking dependency detected ]
> [2148614.154862] 2.6.24.3-build-0023 #9
> [2148614.154913] -------------------------------------------------------
> [2148614.154969] swapper/0 is trying to acquire lock:
> [2148614.155023]  (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] 
> dev_queue_xmit+0x177/0x302
> [2148614.155245]
> [2148614.155246] but task is already holding lock:
> [2148614.155346]  (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/
> 0x180 [act_mirred]
> [2148614.155569]
> [2148614.155570] which lock already depends on the new lock.

lockdep warns of locking order while using ifb with sch_ingress and
act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
is at the beginning). This patch is only to tell lockdep that ifb is
a different device (e.g. from eth) and has its own pair of queue
locks. (This warning is a false-positive in common scenario of using
ifb; yet there are possible situations, when this order could be
dangerous; lockdep should warn in such a case.)


Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>

---

 drivers/net/ifb.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 15949d3..c553b62 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -227,6 +227,27 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 module_param(numifbs, int, 0);
 MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
+ * reversely to e.g. qdisc_lock_tree(). It should be safe until
+ * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock.
+ * But lockdep should know that ifb has different locks from dev.
+ */
+static struct lock_class_key ifb_queue_lock_key;
+static struct lock_class_key ifb_ingress_lock_key;
+
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+	lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key);
+	lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key);
+}
+#else
+static inline void ifb_set_lock_classes(struct net_device *dev_ifb)
+{
+}
+#endif	/* CONFIG_DEBUG_LOCK_ALLOC */
+
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
@@ -246,6 +267,9 @@ static int __init ifb_init_one(int index)
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
 		goto err;
+
+	ifb_set_lock_classes(dev_ifb);
+
 	return 0;
 
 err:
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks
  2008-03-19  7:34                                         ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski
@ 2008-03-19 11:34                                           ` jamal
  2008-03-19 12:20                                             ` Jarek Poplawski
  2008-03-20 22:37                                           ` David Miller
  1 sibling, 1 reply; 47+ messages in thread
From: jamal @ 2008-03-19 11:34 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Jeff Garzik, David S. Miller, Denys Fedoryshchenko, netdev

On Wed, 2008-19-03 at 07:34 +0000, Jarek Poplawski wrote:

Thanks for logging the false-positive aspect Jarek.

> Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Jamal Hadi Salim <hadi@cyberus.ca>

Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>

cheers,
jamal


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks
  2008-03-19 11:34                                           ` jamal
@ 2008-03-19 12:20                                             ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-19 12:20 UTC (permalink / raw)
  To: jamal; +Cc: Jeff Garzik, David S. Miller, Denys Fedoryshchenko, netdev

On Wed, Mar 19, 2008 at 07:34:29AM -0400, jamal wrote:
...
> Thanks for logging the false-positive aspect Jarek.

I guess it's something worth to remember...

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH][NET] ifb: set separate lockdep classes for queue locks
  2008-03-19  7:34                                         ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski
  2008-03-19 11:34                                           ` jamal
@ 2008-03-20 22:37                                           ` David Miller
  2008-03-21  0:03                                             ` [PATCH take2][NET] " Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: David Miller @ 2008-03-20 22:37 UTC (permalink / raw)
  To: jarkao2; +Cc: jeff, denys, hadi, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 19 Mar 2008 07:34:41 +0000

> Subject: [NET] ifb: set separate lockdep classes for queue locks
> 
> > [2148614.154688] =======================================================
> > [2148614.154805] [ INFO: possible circular locking dependency detected ]
> > [2148614.154862] 2.6.24.3-build-0023 #9
> > [2148614.154913] -------------------------------------------------------
> > [2148614.154969] swapper/0 is trying to acquire lock:
> > [2148614.155023]  (&ifb_queue_lock_key){-+..}, at: [<c0289d4d>] 
> > dev_queue_xmit+0x177/0x302
> > [2148614.155245]
> > [2148614.155246] but task is already holding lock:
> > [2148614.155346]  (&p->tcfc_lock){-+..}, at: [<f8a10066>] tcf_mirred+0x20/
> > 0x180 [act_mirred]
> > [2148614.155569]
> > [2148614.155570] which lock already depends on the new lock.
> 
> lockdep warns of locking order while using ifb with sch_ingress and
> act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
> is at the beginning). This patch is only to tell lockdep that ifb is
> a different device (e.g. from eth) and has its own pair of queue
> locks. (This warning is a false-positive in common scenario of using
> ifb; yet there are possible situations, when this order could be
> dangerous; lockdep should warn in such a case.)
> 
> 
> Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Jamal Hadi Salim <hadi@cyberus.ca>

Jarek, the code in linux/lockdep.h provides dummy do-nothing
versions of the lockdep_*() interfaces, so the spinlock
debug ifdeffing you do here is unnecessary.

Simply include linux/lockdep.h and perform the actions
unconditionally.

For example, this is how net/core/sock.c does things.

Also, please upgrade Jamal's "CC" to an "Acked-by" :-)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH take2][NET] ifb: set separate lockdep classes for queue locks
  2008-03-20 22:37                                           ` David Miller
@ 2008-03-21  0:03                                             ` Jarek Poplawski
  2008-03-21  0:05                                               ` David Miller
  2008-03-21  0:15                                               ` Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-21  0:03 UTC (permalink / raw)
  To: David Miller; +Cc: jeff, denys, hadi, netdev

On Thu, Mar 20, 2008 at 03:37:45PM -0700, David Miller wrote:
...
> Jarek, the code in linux/lockdep.h provides dummy do-nothing
> versions of the lockdep_*() interfaces, so the spinlock
> debug ifdeffing you do here is unnecessary.
> 
> Simply include linux/lockdep.h and perform the actions
> unconditionally.
> 
> For example, this is how net/core/sock.c does things.
> 
> Also, please upgrade Jamal's "CC" to an "Acked-by" :-)

IMHO it wastes a bit of memory when lockdep is off and adds some
overhead when lockdep is "partialy" on and doesn't need this, but
it's really late...

Regards,
Jarek P.

--------------------->

Subject: [NET] ifb: set separate lockdep classes for queue locks

[   10.536424] =======================================================
[   10.536424] [ INFO: possible circular locking dependency detected ]
[   10.536424] 2.6.25-rc3-devel #3
[   10.536424] -------------------------------------------------------
[   10.536424] swapper/0 is trying to acquire lock:
[   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>] 
dev_queue_xmit+0x175/0x2f3
[   10.536424]
[   10.536424] but task is already holding lock:
[   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178 
[act_mirred]
[   10.536424]
[   10.536424] which lock already depends on the new lock.

lockdep warns of locking order while using ifb with sch_ingress and
act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
is at the beginning). This patch is only to tell lockdep that ifb is
a different device (e.g. from eth) and has its own pair of queue
locks. (This warning is a false-positive in common scenario of using
ifb; yet there are possible situations, when this order could be
dangerous; lockdep should warn in such a case.) (With suggestions by
David S. Miller)


Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>

---

 drivers/net/ifb.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 15949d3..af233b5 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -35,6 +35,7 @@
 #include <linux/moduleparam.h>
 #include <net/pkt_sched.h>
 #include <net/net_namespace.h>
+#include <linux/lockdep.h>
 
 #define TX_TIMEOUT  (2*HZ)
 
@@ -227,6 +228,16 @@ static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 module_param(numifbs, int, 0);
 MODULE_PARM_DESC(numifbs, "Number of ifb devices");
 
+/*
+ * dev_ifb->queue_lock is usually taken after dev->ingress_lock,
+ * reversely to e.g. qdisc_lock_tree(). It should be safe until
+ * ifb doesn't take dev->queue_lock with dev_ifb->ingress_lock.
+ * But lockdep should know that ifb has different locks from dev.
+ */
+static struct lock_class_key ifb_queue_lock_key;
+static struct lock_class_key ifb_ingress_lock_key;
+
+
 static int __init ifb_init_one(int index)
 {
 	struct net_device *dev_ifb;
@@ -246,6 +257,10 @@ static int __init ifb_init_one(int index)
 	err = register_netdevice(dev_ifb);
 	if (err < 0)
 		goto err;
+
+	lockdep_set_class(&dev_ifb->queue_lock, &ifb_queue_lock_key);
+	lockdep_set_class(&dev_ifb->ingress_lock, &ifb_ingress_lock_key);
+
 	return 0;
 
 err:

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH take2][NET] ifb: set separate lockdep classes for queue locks
  2008-03-21  0:03                                             ` [PATCH take2][NET] " Jarek Poplawski
@ 2008-03-21  0:05                                               ` David Miller
  2008-03-21  0:15                                               ` Jarek Poplawski
  1 sibling, 0 replies; 47+ messages in thread
From: David Miller @ 2008-03-21  0:05 UTC (permalink / raw)
  To: jarkao2; +Cc: jeff, denys, hadi, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 21 Mar 2008 01:03:12 +0100

> IMHO it wastes a bit of memory when lockdep is off and adds some
> overhead when lockdep is "partialy" on and doesn't need this, but
> it's really late...

Understood but I really think this is the right way to
implement this.

Patch applied, thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH take2][NET] ifb: set separate lockdep classes for queue locks
  2008-03-21  0:03                                             ` [PATCH take2][NET] " Jarek Poplawski
  2008-03-21  0:05                                               ` David Miller
@ 2008-03-21  0:15                                               ` Jarek Poplawski
  1 sibling, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2008-03-21  0:15 UTC (permalink / raw)
  To: David Miller; +Cc: jeff, denys, hadi, netdev

On Fri, Mar 21, 2008 at 01:03:12AM +0100, Jarek Poplawski wrote:
...
> IMHO it wastes a bit of memory when lockdep is off and adds some

OOPS! My IMHO is wrong again - no memory is wasted when lockdep is off!

Sorry,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2008-03-21  0:09 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-24 22:20 circular locking, mirred, 2.6.24.2 Denys Fedoryshchenko
2008-02-25  9:56 ` Jarek Poplawski
2008-02-25 10:48   ` Denys Fedoryshchenko
2008-02-25 11:39     ` Jarek Poplawski
2008-03-05 10:45       ` Denys Fedoryshchenko
2008-03-05 13:54         ` [BUG] Probably lockdep bug " Jarek Poplawski
2008-03-06  9:41           ` Jarek Poplawski
2008-03-06 13:40         ` Jarek Poplawski
2008-03-06 13:57           ` Denys Fedoryshchenko
2008-03-06 14:27             ` jamal
2008-03-06 15:50               ` Denys Fedoryshchenko
2008-03-06 20:25                 ` Jarek Poplawski
2008-03-06 20:56                   ` jamal
2008-03-06 22:12                     ` Jarek Poplawski
2008-03-06 23:43                       ` Denys Fedoryshchenko
2008-03-07  0:09                         ` jamal
2008-03-07  0:15                           ` Denys Fedoryshchenko
2008-03-07  0:25                             ` jamal
2008-03-07  9:31                         ` Jarek Poplawski
2008-03-07 10:19                           ` Denys Fedoryshchenko
2008-03-07 10:48                             ` Jarek Poplawski
2008-03-07 14:58                             ` jamal
2008-03-06 20:44                 ` jamal
2008-03-06 13:59           ` jamal
2008-03-06 17:56             ` Jarek Poplawski
2008-03-06 20:48               ` jamal
2008-03-06 21:40                 ` Jarek Poplawski
2008-03-06 23:40                   ` jamal
2008-03-07  7:51                     ` Jarek Poplawski
2008-03-07  8:32                       ` Jarek Poplawski
2008-03-07 13:53                       ` jamal
2008-03-08  8:46                         ` Jarek Poplawski
2008-03-08  8:58                           ` Jarek Poplawski
2008-03-08  9:56                             ` Denys Fedoryshchenko
2008-03-08 10:16                             ` Denys Fedoryshchenko
2008-03-08 10:43                               ` Jarek Poplawski
2008-03-08 10:52                                 ` Jarek Poplawski
2008-03-08 11:09                                   ` Denys Fedoryshchenko
2008-03-08 12:02                                     ` Jarek Poplawski
2008-03-19  0:46                                       ` Denys Fedoryshchenko
2008-03-19  7:34                                         ` [PATCH][NET] ifb: set separate lockdep classes for queue locks Jarek Poplawski
2008-03-19 11:34                                           ` jamal
2008-03-19 12:20                                             ` Jarek Poplawski
2008-03-20 22:37                                           ` David Miller
2008-03-21  0:03                                             ` [PATCH take2][NET] " Jarek Poplawski
2008-03-21  0:05                                               ` David Miller
2008-03-21  0:15                                               ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).