* circular locking dependency splat due to f9eb8aea2a1e
From: David Ahern @ 2016-06-08 18:17 UTC (permalink / raw)
  To: edumazet, netdev@vger.kernel.org

Getting the below splat with ping in a VRF. git bisect points to:

dsa@kenny:~/kernel-2.git$ git bisect good
f9eb8aea2a1e12fc2f584d1627deeb957435a801 is the first bad commit
commit f9eb8aea2a1e12fc2f584d1627deeb957435a801
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Jun 6 09:37:15 2016 -0700

     net_sched: transform qdisc running bit into a seqcount

     Instead of using a single bit (__QDISC___STATE_RUNNING)
     in sch->__state, use a seqcount.

     This adds lockdep support, but more importantly it will allow us
     to sample qdisc/class statistics without having to grab qdisc root lock.

     Signed-off-by: Eric Dumazet <edumazet@google.com>
     Cc: Cong Wang <xiyou.wangcong@gmail.com>
     Cc: Jamal Hadi Salim <jhs@mojatatu.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>
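
[For context, a rough sketch of what the commit implies -- not the exact
upstream code -- assuming struct Qdisc gains a seqcount_t member named
'running' that replaces the old __QDISC___STATE_RUNNING bit:

#include <linux/seqlock.h>
#include <net/sch_generic.h>	/* struct Qdisc; 'running' member assumed */

static inline bool qdisc_run_begin_sketch(struct Qdisc *qdisc)
{
	/* an odd sequence count means another CPU already runs this qdisc */
	if (raw_read_seqcount(&qdisc->running) & 1)
		return false;
	/* write side: unlike the old bit, this is tracked by lockdep,
	 * which is what makes the report below possible at all
	 */
	write_seqcount_begin(&qdisc->running);
	return true;
}

static inline void qdisc_run_end_sketch(struct Qdisc *qdisc)
{
	write_seqcount_end(&qdisc->running);
}

Readers sampling statistics can then retry on a sequence change instead of
taking the qdisc root lock.]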


[   49.036323] ======================================================
[   49.038300] [ INFO: possible circular locking dependency detected ]
[   49.040313] 4.7.0-rc1+ #252 Not tainted
[   49.041502] -------------------------------------------------------
[   49.043518] ping/2143 is trying to acquire lock:
[   49.044999]  (&(&list->lock)->rlock#3){+.-...}, at: [<ffffffff81407cc5>] __dev_queue_xmit+0x3d3/0x751
[   49.048107]
[   49.048107] but task is already holding lock:
[   49.049844]  (dev->qdisc_running_key ?: &qdisc_running_key){+.-...}, at: [<ffffffff81407d1a>] __dev_queue_xmit+0x428/0x751
[   49.052783]
[   49.052783] which lock already depends on the new lock.
[   49.052783]
[   49.054455]
[   49.054455] the existing dependency chain (in reverse order) is:
[   49.055894]
-> #1 (dev->qdisc_running_key ?: &qdisc_running_key){+.-...}:
[   49.057531]        [<ffffffff810853b7>] __lock_acquire+0x5e4/0x690
[   49.058924]        [<ffffffff81085a32>] lock_acquire+0x140/0x1d8
[   49.060271]        [<ffffffff81401fe6>] write_seqcount_begin+0x21/0x24
[   49.061741]        [<ffffffff81407d1a>] __dev_queue_xmit+0x428/0x751
[   49.062936]        [<ffffffff8140804e>] dev_queue_xmit+0xb/0xd
[   49.064258]        [<ffffffff81411989>] neigh_resolve_output+0x113/0x12e
[   49.065767]        [<ffffffff8149fda9>] ip6_finish_output2+0x39b/0x3ee
[   49.067212]        [<ffffffff814a2472>] ip6_finish_output+0x96/0x9d
[   49.068568]        [<ffffffff814a24f9>] ip6_output+0x80/0xe9
[   49.069789]        [<ffffffff814bf9bc>] dst_output+0x4f/0x56
[   49.070824]        [<ffffffff814c01b6>] mld_sendpack+0x17e/0x1fd
[   49.071964]        [<ffffffff814c17d9>] mld_ifc_timer_expire+0x1ea/0x21e
[   49.073407]        [<ffffffff8109deb9>] call_timer_fn+0x120/0x303
[   49.074551]        [<ffffffff8109e29b>] run_timer_softirq+0x1ae/0x1d8
[   49.075399]        [<ffffffff8105129a>] __do_softirq+0x1c3/0x451
[   49.076217]        [<ffffffff81051744>] irq_exit+0x40/0x94
[   49.076945]        [<ffffffff81031069>] smp_apic_timer_interrupt+0x29/0x34
[   49.077849]        [<ffffffff814f6fac>] apic_timer_interrupt+0x8c/0xa0
[   49.078719]        [<ffffffff8101da2d>] default_idle+0x1f/0x32
[   49.079462]        [<ffffffff8101dfe0>] arch_cpu_idle+0xa/0xc
[   49.080191]        [<ffffffff8107eeca>] default_idle_call+0x1e/0x20
[   49.080989]        [<ffffffff8107f002>] cpu_startup_entry+0x136/0x1ee
[   49.081805]        [<ffffffff814f05b4>] rest_init+0x12b/0x131
[   49.082551]        [<ffffffff81cb8ea2>] start_kernel+0x3da/0x3e7
[   49.083317]        [<ffffffff81cb841b>] x86_64_start_reservations+0x2f/0x31
[   49.084192]        [<ffffffff81cb854a>] x86_64_start_kernel+0x12d/0x13a
[   49.085025]
-> #0 (&(&list->lock)->rlock#3){+.-...}:
[   49.085749]        [<ffffffff8108435a>] validate_chain.isra.37+0x7c8/0xa5b
[   49.086631]        [<ffffffff810853b7>] __lock_acquire+0x5e4/0x690
[   49.087427]        [<ffffffff81085a32>] lock_acquire+0x140/0x1d8
[   49.088194]        [<ffffffff814f5896>] _raw_spin_lock+0x2f/0x65
[   49.088961]        [<ffffffff81407cc5>] __dev_queue_xmit+0x3d3/0x751
[   49.089771]        [<ffffffff8140804e>] dev_queue_xmit+0xb/0xd
[   49.090534]        [<ffffffff8146a633>] arp_xmit+0x32/0x7e
[   49.091238]        [<ffffffff8146a6bb>] arp_send_dst+0x3c/0x42
[   49.091972]        [<ffffffff8146a93c>] arp_solicit+0x27b/0x28e
[   49.092719]        [<ffffffff8140ead8>] neigh_probe+0x4a/0x5e
[   49.093448]        [<ffffffff814100b8>] __neigh_event_send+0x1d0/0x21b
[   49.094305]        [<ffffffff8141012e>] neigh_event_send+0x2b/0x2d
[   49.095080]        [<ffffffff8141188e>] neigh_resolve_output+0x18/0x12e
[   49.095908]        [<ffffffff81441bf2>] ip_finish_output2+0x3c0/0x41c
[   49.096716]        [<ffffffff81442485>] ip_finish_output+0x132/0x13e
[   49.097517]        [<ffffffff814424b2>] NF_HOOK_COND.constprop.43+0x21/0x8a
[   49.098408]        [<ffffffff81443623>] ip_output+0x65/0x6a
[   49.099131]        [<ffffffff814414ed>] dst_output+0x2b/0x30
[   49.099848]        [<ffffffff81442f64>] ip_local_out+0x2a/0x31
[   49.100580]        [<ffffffff81398201>] vrf_xmit+0x149/0x28c
[   49.101294]        [<ffffffff814075f1>] dev_hard_start_xmit+0x154/0x337
[   49.102169]        [<ffffffff8142795e>] sch_direct_xmit+0x98/0x16c
[   49.102948]        [<ffffffff81407d45>] __dev_queue_xmit+0x453/0x751
[   49.103747]        [<ffffffff8140804e>] dev_queue_xmit+0xb/0xd
[   49.104480]        [<ffffffff81411989>] neigh_resolve_output+0x113/0x12e
[   49.105322]        [<ffffffff813976aa>] dst_neigh_output.isra.18+0x13b/0x148
[   49.106204]        [<ffffffff813977bb>] vrf_finish_output+0x104/0x139
[   49.107044]        [<ffffffff81397986>] vrf_output+0x5c/0xc1
[   49.107755]        [<ffffffff814414ed>] dst_output+0x2b/0x30
[   49.108472]        [<ffffffff81442f64>] ip_local_out+0x2a/0x31
[   49.109211]        [<ffffffff81443dd5>] ip_send_skb+0x14/0x38
[   49.109941]        [<ffffffff81443e27>] ip_push_pending_frames+0x2e/0x31
[   49.110792]        [<ffffffff81464a77>] raw_sendmsg+0x788/0x9d2
[   49.111543]        [<ffffffff81471de5>] inet_sendmsg+0x35/0x5c
[   49.112283]        [<ffffffff813ece43>] sock_sendmsg_nosec+0x12/0x1d
[   49.113079]        [<ffffffff813edcf7>] ___sys_sendmsg+0x1b1/0x21f
[   49.113851]        [<ffffffff813ee01c>] __sys_sendmsg+0x40/0x5e
[   49.114609]        [<ffffffff813ee04e>] SyS_sendmsg+0x14/0x16
[   49.115345]        [<ffffffff814f633c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   49.116221]
[   49.116221] other info that might help us debug this:
[   49.116221]
[   49.117198]  Possible unsafe locking scenario:
[   49.117198]
[   49.117926]        CPU0                    CPU1
[   49.118503]        ----                    ----
[   49.119059]   lock(dev->qdisc_running_key ?: &qdisc_running_key);
[   49.119839]                                lock(&(&list->lock)->rlock#3);
[   49.120722]                                lock(dev->qdisc_running_key ?: &qdisc_running_key);
[   49.121800]   lock(&(&list->lock)->rlock#3);
[   49.122396]
[   49.122396]  *** DEADLOCK ***
[   49.122396]
[   49.123134] 6 locks held by ping/2143:
[   49.123604]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff81464a18>] raw_sendmsg+0x729/0x9d2
[   49.124749]  #1:  (rcu_read_lock_bh){......}, at: [<ffffffff81397076>] rcu_lock_acquire+0x0/0x20
[   49.125914]  #2:  (rcu_read_lock_bh){......}, at: [<ffffffff813ff64d>] rcu_lock_acquire+0x0/0x20
[   49.127103]  #3:  (dev->qdisc_running_key ?: &qdisc_running_key){+.-...}, at: [<ffffffff81407d1a>] __dev_queue_xmit+0x428/0x751
[   49.128592]  #4:  (rcu_read_lock_bh){......}, at: [<ffffffff814412dd>] rcu_lock_acquire+0x0/0x20
[   49.129760]  #5:  (rcu_read_lock_bh){......}, at: [<ffffffff813ff64d>] rcu_lock_acquire+0x0/0x20
[   49.130940]


* Re: circular locking dependency splat due to f9eb8aea2a1e
From: Eric Dumazet @ 2016-06-08 20:30 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev@vger.kernel.org

On Wed, Jun 8, 2016 at 11:17 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> Getting the below splat with ping in a VRF. git bisect points to:
>
> dsa@kenny:~/kernel-2.git$ git bisect good
> f9eb8aea2a1e12fc2f584d1627deeb957435a801 is the first bad commit
> commit f9eb8aea2a1e12fc2f584d1627deeb957435a801
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Mon Jun 6 09:37:15 2016 -0700
>
>     net_sched: transform qdisc running bit into a seqcount
>
>     Instead of using a single bit (__QDISC___STATE_RUNNING)
>     in sch->__state, use a seqcount.
>
>     This adds lockdep support, but more importantly it will allow us
>     to sample qdisc/class statistics without having to grab qdisc root lock.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Cong Wang <xiyou.wangcong@gmail.com>
>     Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> [   49.036323] ======================================================
> [   49.038300] [ INFO: possible circular locking dependency detected ]
> [   49.040313] 4.7.0-rc1+ #252 Not tainted
> [   49.041502] -------------------------------------------------------
> [   49.043518] ping/2143 is trying to acquire lock:
> [

It looks VRF was already lacking annotation for busylock then ?

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 1b214ea4619a5838d6478a2835f8e73da85bef78..3c1c414623196f2bd8bf99c1d84563a54b3f6822 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -635,6 +635,9 @@ static void vrf_dev_uninit(struct net_device *dev)
        dev->dstats = NULL;
 }

+static struct lock_class_key vrf_tx_busylock;
+static struct lock_class_key vrf_qdisc_running_key;
+
 static int vrf_dev_init(struct net_device *dev)
 {
        struct net_vrf *vrf = netdev_priv(dev);
@@ -658,6 +661,8 @@ static int vrf_dev_init(struct net_device *dev)
        /* similarly, oper state is irrelevant; set to up to avoid confusion */
        dev->operstate = IF_OPER_UP;

+       dev->qdisc_tx_busylock = &vrf_tx_busylock;
+       dev->qdisc_running_key = &vrf_qdisc_running_key;
        return 0;

 out_rth:
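
[The two static lock_class_key objects above give vrf devices their own
lockdep classes for the tx busylock and the running seqcount, so lockdep can
tell the vrf qdisc apart from the lower device's qdisc when the xmit path
re-enters __dev_queue_xmit() -- the same per-device-type annotation stacked
devices already use for qdisc_tx_busylock.]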


* Re: circular locking dependency splat due to f9eb8aea2a1e
From: David Ahern @ 2016-06-08 20:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org

On 6/8/16 2:30 PM, Eric Dumazet wrote:
>
> It looks VRF was already lacking annotation for busylock then ?

I sent a patch adding IFF_NO_QUEUE flag
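
[A sketch of that approach, assuming it lands in vrf_setup() -- the actual
patch may differ.  IFF_NO_QUEUE makes the device default to the "noqueue"
qdisc, so __dev_queue_xmit() skips the enqueue/qdisc-run path, and the
running seqcount, for packets sent through the vrf device itself:

#include <linux/etherdevice.h>
#include <linux/netdevice.h>

static void vrf_setup(struct net_device *dev)
{
	ether_setup(dev);

	/* ... existing vrf setup elided ... */

	/* no queue, no default qdisc: xmit bypasses qdisc_run_begin() */
	dev->priv_flags |= IFF_NO_QUEUE;
}
]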


* Re: circular locking dependency splat due to f9eb8aea2a1e
From: Eric Dumazet @ 2016-06-08 20:32 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev@vger.kernel.org

On Wed, Jun 8, 2016 at 1:31 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 6/8/16 2:30 PM, Eric Dumazet wrote:
>>
>>
>> It looks VRF was already lacking annotation for busylock then ?
>
>
> I sent a patch adding IFF_NO_QUEUE flag
>

OK, but we still need the fix, in case a qdisc is installed on vrf
(who knows...)

I will submit it formally.


* Re: circular locking dependency splat due to f9eb8aea2a1e
From: David Ahern @ 2016-06-08 20:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org

On 6/8/16 2:32 PM, Eric Dumazet wrote:
> On Wed, Jun 8, 2016 at 1:31 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
>> On 6/8/16 2:30 PM, Eric Dumazet wrote:
>>>
>>>
>>> It looks VRF was already lacking annotation for busylock then ?
>>
>>
>> I sent a patch adding IFF_NO_QUEUE flag
>>
>
> OK, but we still need the fix, in case a qdisc is installed on vrf
> (who knows...)

Sure. What about other virtual devices? e.g., I can generate the same 
splat with ipvlan.
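
[For illustration, the same annotation pattern from Eric's diff above,
applied hypothetically to ipvlan's ndo_init -- not an actual submitted
patch, and the function name is assumed:

static struct lock_class_key ipvlan_tx_busylock;
static struct lock_class_key ipvlan_qdisc_running_key;

static int ipvlan_init(struct net_device *dev)
{
	/* ... existing ipvlan init elided ... */

	/* per-device-type lockdep classes, as in the vrf diff above */
	dev->qdisc_tx_busylock = &ipvlan_tx_busylock;
	dev->qdisc_running_key = &ipvlan_qdisc_running_key;
	return 0;
}
]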


* Re: circular locking dependency splat due to f9eb8aea2a1e
From: Eric Dumazet @ 2016-06-08 21:44 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev@vger.kernel.org

All devices that are missing busylock were already buggy ;)

On Wed, Jun 8, 2016 at 1:59 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 6/8/16 2:32 PM, Eric Dumazet wrote:
>>
>> On Wed, Jun 8, 2016 at 1:31 PM, David Ahern <dsa@cumulusnetworks.com>
>> wrote:
>>>
>>> On 6/8/16 2:30 PM, Eric Dumazet wrote:
>>>>
>>>>
>>>>
>>>> It looks VRF was already lacking annotation for busylock then ?
>>>
>>>
>>>
>>> I sent a patch adding IFF_NO_QUEUE flag
>>>
>>
>> OK, but we still need the fix, in case a qdisc is installed on vrf
>> (who knows...)
>
>
> Sure. What about other virtual devices? e.g., I can generate the same splat
> with ipvlan.

