From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [PATCH net-next 2/4] net: vrf: call netdev_lockdep_set_classes() Date: Wed, 8 Jun 2016 18:43:48 -0700 Message-ID: References: <1465430127-22993-1-git-send-email-edumazet@google.com> <1465430127-22993-2-git-send-email-edumazet@google.com> <31939b76-efa5-f1dd-3a8b-92d9343df7ee@cumulusnetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: "David S . Miller" , netdev , Eric Dumazet To: David Ahern Return-path: Received: from mail-yw0-f171.google.com ([209.85.161.171]:35662 "EHLO mail-yw0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933838AbcFIBnu (ORCPT ); Wed, 8 Jun 2016 21:43:50 -0400 Received: by mail-yw0-f171.google.com with SMTP id z186so6415488ywd.2 for ; Wed, 08 Jun 2016 18:43:50 -0700 (PDT) In-Reply-To: <31939b76-efa5-f1dd-3a8b-92d9343df7ee@cumulusnetworks.com> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, Jun 8, 2016 at 6:26 PM, David Ahern wrote: > On 6/8/16 5:55 PM, Eric Dumazet wrote: >> >> In case a qdisc is used on a vrf device, we need to use different >> lockdep classes to avoid false positives. >> >> Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a >> seqcount") >> Reported-by: David Ahern >> Signed-off-by: Eric Dumazet >> --- >> drivers/net/vrf.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c >> index 1b214ea4619a..ee6bc1c5c1ce 100644 >> --- a/drivers/net/vrf.c >> +++ b/drivers/net/vrf.c >> @@ -657,7 +657,7 @@ static int vrf_dev_init(struct net_device *dev) >> >> /* similarly, oper state is irrelevant; set to up to avoid >> confusion */ >> dev->operstate = IF_OPER_UP; >> - >> + netdev_lockdep_set_classes(dev); >> return 0; >> >> out_rth: >> > > Still see the problem; all 4 patches applied, make clean followed by a build > just to make sure. > > [ 90.956522] > [ 90.956952] ====================================================== > [ 90.958441] [ INFO: possible circular locking dependency detected ] > [ 90.959820] 4.7.0-rc1+ #271 Not tainted > [ 90.960672] ------------------------------------------------------- > [ 90.962051] ping/1585 is trying to acquire lock: > [ 90.962997] (&(&list->lock)->rlock#3){+.-...}, at: [] > __dev_queue_xmit+0x3d3/0x751 > [ 90.965033] > [ 90.965033] but task is already holding lock: > [ 90.966200] (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}, > at: [] __dev_queue_xmit+0x428/0x751 > [ 90.968591] > [ 90.968591] which lock already depends on the new lock. > [ 90.968591] > [ 90.970014] > [ 90.970014] the existing dependency chain (in reverse order) is: > [ 90.971287] > -> #1 (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}: > [ 90.972611] [] __lock_acquire+0x5e4/0x690 > [ 90.973712] [] lock_acquire+0x140/0x1d8 > [ 90.974668] [] write_seqcount_begin+0x21/0x24 > [ 90.975673] [] __dev_queue_xmit+0x428/0x751 > [ 90.976608] [] dev_queue_xmit+0xb/0xd > [ 90.977484] [] neigh_resolve_output+0x113/0x12e > [ 90.978497] [] > dst_neigh_output.isra.20+0x13b/0x148 > [ 90.979548] [] vrf_finish_output+0x104/0x139 > [ 90.980504] [] vrf_output+0x5c/0xc1 > [ 90.981365] [] dst_output+0x2b/0x30 > [ 90.982240] [] ip_local_out+0x2a/0x31 > [ 90.983109] [] ip_send_skb+0x14/0x38 > [ 90.983974] [] ip_push_pending_frames+0x2e/0x31 > [ 90.984887] [] raw_sendmsg+0x788/0x9d2 > [ 90.985658] [] inet_sendmsg+0x35/0x5c > [ 90.986452] [] sock_sendmsg_nosec+0x12/0x1d > [ 90.987275] [] ___sys_sendmsg+0x1b1/0x21f > [ 90.988094] [] __sys_sendmsg+0x40/0x5e > [ 90.988882] [] SyS_sendmsg+0x14/0x16 > [ 90.989647] [] > entry_SYSCALL_64_fastpath+0x1f/0xbd > [ 90.990590] > -> #0 (&(&list->lock)->rlock#3){+.-...}: > [ 90.991368] [] > validate_chain.isra.37+0x7c8/0xa5b > [ 90.992268] [] __lock_acquire+0x5e4/0x690 > [ 90.993114] [] lock_acquire+0x140/0x1d8 > [ 90.993955] [] _raw_spin_lock+0x2f/0x65 > [ 90.994751] [] __dev_queue_xmit+0x3d3/0x751 > [ 90.995574] [] dev_queue_xmit+0xb/0xd > [ 90.996338] [] arp_xmit+0x32/0x7e > [ 90.997062] [] arp_send_dst+0x3c/0x42 > [ 90.997834] [] arp_solicit+0x27b/0x28e > [ 90.998631] [] neigh_probe+0x4a/0x5e > [ 90.999407] [] __neigh_event_send+0x1d0/0x21b > [ 91.000264] [] neigh_event_send+0x2b/0x2d > [ 91.001075] [] neigh_resolve_output+0x18/0x12e > [ 91.001956] [] ip_finish_output2+0x3c0/0x41c > [ 91.002803] [] ip_finish_output+0x132/0x13e > [ 91.003630] [] > NF_HOOK_COND.constprop.43+0x21/0x8a > [ 91.004550] [] ip_output+0x65/0x6a > [ 91.005293] [] dst_output+0x2b/0x30 > [ 91.006066] [] ip_local_out+0x2a/0x31 > [ 91.006821] [] vrf_xmit+0x1f6/0x417 > [ 91.007554] [] dev_hard_start_xmit+0x154/0x337 > [ 91.008410] [] sch_direct_xmit+0x98/0x16c > [ 91.009211] [] __dev_queue_xmit+0x453/0x751 > [ 91.010043] [] dev_queue_xmit+0xb/0xd > [ 91.010806] [] neigh_resolve_output+0x113/0x12e > [ 91.011690] [] > dst_neigh_output.isra.20+0x13b/0x148 > [ 91.012600] [] vrf_finish_output+0x104/0x139 > [ 91.013431] [] vrf_output+0x5c/0xc1 > [ 91.014172] [] dst_output+0x2b/0x30 > [ 91.014906] [] ip_local_out+0x2a/0x31 > [ 91.015659] [] ip_send_skb+0x14/0x38 > [ 91.016398] [] ip_push_pending_frames+0x2e/0x31 > [ 91.017259] [] raw_sendmsg+0x788/0x9d2 > [ 91.018033] [] inet_sendmsg+0x35/0x5c > [ 91.018797] [] sock_sendmsg_nosec+0x12/0x1d > [ 91.019615] [] ___sys_sendmsg+0x1b1/0x21f > [ 91.020417] [] __sys_sendmsg+0x40/0x5e > [ 91.021180] [] SyS_sendmsg+0x14/0x16 > [ 91.021927] [] > entry_SYSCALL_64_fastpath+0x1f/0xbd > [ 91.022820] > [ 91.022820] other info that might help us debug this: > [ 91.022820] > [ 91.023848] Possible unsafe locking scenario: > [ 91.023848] > [ 91.024595] CPU0 CPU1 > [ 91.025173] ---- ---- > [ 91.025751] lock(dev->qdisc_running_key ?: &qdisc_running_key#2); > [ 91.026600] lock(&(&list->lock)->rlock#3); > [ 91.027505] lock(dev->qdisc_running_key ?: &qdisc_running_key#2); > [ 91.028657] lock(&(&list->lock)->rlock#3); > [ 91.029257] > [ 91.029257] *** DEADLOCK *** > [ 91.029257] > [ 91.030008] 6 locks held by ping/1585: > [ 91.030485] #0: (sk_lock-AF_INET){+.+.+.}, at: [] > raw_sendmsg+0x729/0x9d2 > [ 91.031643] #1: (rcu_read_lock_bh){......}, at: [] > rcu_lock_acquire+0x0/0x20 > [ 91.032834] #2: (rcu_read_lock_bh){......}, at: [] > rcu_lock_acquire+0x0/0x20 > [ 91.034040] #3: (dev->qdisc_running_key ?: > &qdisc_running_key#2){+.....}, at: [] > __dev_queue_xmit+0x428/0x751 > [ 91.035622] #4: (rcu_read_lock_bh){......}, at: [] > rcu_lock_acquire+0x0/0x20 > [ 91.036823] #5: (rcu_read_lock_bh){......}, at: [] > rcu_lock_acquire+0x0/0x20 > [ 91.038016] > [ 91.038016] stack backtrace: > [ 91.038574] CPU: 6 PID: 1585 Comm: ping Not tainted 4.7.0-rc1+ #271 > [ 91.039366] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.7.5-20140531_083030-gandalf 04/01/2014 > [ 91.040634] 0000000000000000 ffff8800b84c72d0 ffffffff8127e395 > ffffffff8254b8c0 > [ 91.041632] ffffffff8254b8c0 ffff8800b84c7310 ffffffff81083461 > ffff8800b9b5e240 > [ 91.042645] ffff8800b9b5ea60 0000000000000004 ffff8800b9b5ea98 > ffff8800b9b5e240 > [ 91.043645] Call Trace: > [ 91.043968] [] dump_stack+0x81/0xb6 > [ 91.044620] [] print_circular_bug+0x1f6/0x204 > [ 91.045377] [] validate_chain.isra.37+0x7c8/0xa5b > [ 91.046193] [] ? paravirt_sched_clock+0x9/0xd > [ 91.046951] [] __lock_acquire+0x5e4/0x690 > [ 91.047666] [] ? __lock_acquire+0x5e4/0x690 > [ 91.048404] [] lock_acquire+0x140/0x1d8 > [ 91.049104] [] ? __dev_queue_xmit+0x3d3/0x751 > [ 91.049868] [] _raw_spin_lock+0x2f/0x65 > [ 91.050564] [] ? __dev_queue_xmit+0x3d3/0x751 > [ 91.051322] [] __dev_queue_xmit+0x3d3/0x751 > [ 91.052061] [] ? __alloc_skb+0xae/0x19c > [ 91.052760] [] dev_queue_xmit+0xb/0xd > [ 91.053431] [] arp_xmit+0x32/0x7e > [ 91.054069] [] arp_send_dst+0x3c/0x42 > [ 91.054741] [] arp_solicit+0x27b/0x28e > [ 91.055423] [] neigh_probe+0x4a/0x5e > [ 91.056097] [] __neigh_event_send+0x1d0/0x21b > [ 91.056853] [] neigh_event_send+0x2b/0x2d > [ 91.057569] [] neigh_resolve_output+0x18/0x12e > [ 91.058341] [] ip_finish_output2+0x3c0/0x41c > [ 91.059087] [] ip_finish_output+0x132/0x13e > [ 91.059820] [] NF_HOOK_COND.constprop.43+0x21/0x8a > [ 91.060634] [] ? rcu_read_unlock+0x5d/0x5f > [ 91.061362] [] ? nf_hook_slow+0x94/0x9e > [ 91.062061] [] ip_output+0x65/0x6a > [ 91.062703] [] dst_output+0x2b/0x30 > [ 91.063350] [] ip_local_out+0x2a/0x31 > [ 91.064025] [] vrf_xmit+0x1f6/0x417 > [ 91.064675] [] ? __lock_is_held+0x38/0x50 > [ 91.065393] [] dev_hard_start_xmit+0x154/0x337 > [ 91.066170] [] sch_direct_xmit+0x98/0x16c > [ 91.066916] [] __dev_queue_xmit+0x453/0x751 > [ 91.067663] [] ? eth_header+0x27/0xaf > [ 91.068353] [] dev_queue_xmit+0xb/0xd > [ 91.069031] [] neigh_resolve_output+0x113/0x12e > [ 91.069834] [] dst_neigh_output.isra.20+0x13b/0x148 > [ 91.070655] [] vrf_finish_output+0x104/0x139 > [ 91.071400] [] vrf_output+0x5c/0xc1 > [ 91.072055] [] ? __ip_local_out+0x9e/0xa9 > [ 91.072769] [] dst_output+0x2b/0x30 > [ 91.073419] [] ip_local_out+0x2a/0x31 > [ 91.074093] [] ip_send_skb+0x14/0x38 > [ 91.074756] [] ip_push_pending_frames+0x2e/0x31 > [ 91.075533] [] raw_sendmsg+0x788/0x9d2 > [ 91.076224] [] ? paravirt_sched_clock+0x9/0xd > [ 91.076984] [] ? native_cpu_up+0x214/0x7c1 > [ 91.077710] [] ? lock_release+0x20f/0x4c4 > [ 91.078446] [] inet_sendmsg+0x35/0x5c > [ 91.079135] [] sock_sendmsg_nosec+0x12/0x1d > [ 91.079873] [] ___sys_sendmsg+0x1b1/0x21f > [ 91.080607] [] ? lock_release+0x20f/0x4c4 > [ 91.081333] [] ? __mutex_unlock_slowpath+0x152/0x15f > [ 91.082191] [] ? mutex_unlock+0x9/0xb > [ 91.082872] [] ? lock_release+0x20f/0x4c4 > [ 91.083601] [] ? __fget_light+0x48/0x6f > [ 91.084306] [] __sys_sendmsg+0x40/0x5e > [ 91.084995] [] ? __sys_sendmsg+0x40/0x5e > [ 91.085704] [] SyS_sendmsg+0x14/0x16 > [ 91.086371] [] entry_SYSCALL_64_fastpath+0x1f/0xbd > [ 91.087206] [] ? trace_hardirqs_off_caller+0xbc/0x122 For this one, it looks vrf misses the _xmit_lock lockdep support. We might need to factorize the code found for example in bond_set_lockdep_class_one() Have you run lockdep before ? Strange that these issues were not spotted earlier.