From: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	SWise OGC
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Subject: Re: Bug Report: possible circular locking issue
Date: Tue, 5 Sep 2017 17:10:29 +0530
Message-ID: <20170905114028.GA1959@chelsio.com>
In-Reply-To: <20170831152430.GA15173-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

On Thursday, August 31, 2017 at 20:54:31 +0530, Potnuri Bharat Teja wrote:
> Hi Doug,
> Could you please share the config you have on the Fedora box?
> I tried enabling lock debugging on 4.13-rc7, but I don't see the warning.
Never mind, I now see the issue on my machines.
Thanks,
Bharat.
> Thanks,
> Bharat.
> On Tuesday, August 29, 2017 at 00:42:09 +0530, Doug Ledford wrote:
> > On Mon, 2017-08-28 at 12:38 -0400, Doug Ledford wrote:
> > > Resend from my work email address:
> > > 
> > > 
> > > I ran across this while testing a 4.13-rc7 kernel + the rdma next
> > > code.
> > 
> > This reproduces on a stock 4.13-rc7 kernel.  But across all the machines
> > I've booted it on so far, it only shows up on cxgb4 devices, so I think
> > this is a cxgb4-specific issue.  Steve, can you look into this?
> > 
> > My basic setup is a stock Fedora Rawhide box: I took the Fedora
> > kernel config, copied it into my git checkout of v4.13-rc7, and
> > compiled using that config.  If you need any more info, I can try to
> > get it to you.
> > 
> > The machine environment that produces this includes:
> > 
> > - base Ethernet device + 2 VLAN devices
> > - SRP target mode in use (kernel LIO support); the iWARP device isn't
> >   specifically configured for use, but srpt tries to set it up anyway
> > - iSER target mode in use (kernel LIO support again, single TPG with a
> >   wildcard address, so the iWARP devices are in use)
> > - NFSoRDMA in use and exporting several mount points, again with a
> >   wildcard address, so all RDMA devices are candidates
> > 
> > With this environment, I get the traceback on boot-up every time.  The
> > system then proceeds to run.  I haven't tested it under load to see how
> > it does, but it's up anyway.
> > 
> > >  I don't have the time to track this down before going on PTO, so I'm
> > > putting it out here for others to look at.
> > > 
> > > This machine has multiple RDMA devices in it:
> > > 
> > > ib0/ib1 -> dual port qib
> > > roce -> ocrdma
> > > iwarp -> cxgb4
> > > 
> > > During bootup I got this:
> > > 
> > > [   37.244753] iw_cxgb4: 0000:83:00.4: Up
> > > [   37.250168] iw_cxgb4: 0000:83:00.4: On-Chip Queues not supported on this deve
> > > 
> > > [   37.263207] ======================================================
> > > [   37.270656] WARNING: possible circular locking dependency detected
> > > [   37.278101] 4.13.0-rc7+ #130 Not tainted
> > > [   37.283019] ------------------------------------------------------
> > > [   37.290470] NetworkManager/2196 is trying to acquire lock:
> > > [   37.297143]  (device_mutex){+.+.+.}, at: [<ffffffffc08d2465>] ib_register_de]
> > > [   37.308026] 
> > >                but task is already holding lock:
> > > [   37.315694]  (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.isra.]
> > > [   37.326108] 
> > >                which lock already depends on the new lock.
> > > 
> > > [   37.337689] 
> > >                the existing dependency chain (in reverse order) is:
> > > [   37.347301] 
> > >                -> #2 (uld_mutex){+.+.+.}:
> > > [   37.354048]        lock_acquire+0xbd/0x200
> > > [   37.359083]        __mutex_lock+0x88/0x950
> > > [   37.364122]        mutex_lock_nested+0x1b/0x20
> > > [   37.369690]        cxgb_up+0x27/0x840 [cxgb4]
> > > [   37.375623]        cxgb_open+0x34/0x90 [cxgb4]
> > > [   37.381168]        __dev_open+0xc9/0x140
> > > [   37.386039]        __dev_change_flags+0x9d/0x160
> > > [   37.391686]        dev_change_flags+0x29/0x60
> > > [   37.397069]        do_setlink+0x4bf/0xc80
> > > [   37.402024]        rtnl_newlink+0x512/0x8a0
> > > [   37.407177]        rtnetlink_rcv_msg+0xac/0x240
> > > [   37.412702]        netlink_rcv_skb+0xed/0x120
> > > [   37.418023]        rtnetlink_rcv+0x2a/0x40
> > > [   37.423060]        netlink_unicast+0x182/0x220
> > > [   37.428482]        netlink_sendmsg+0x2e9/0x3e0
> > > [   37.433868]        sock_sendmsg+0x38/0x50
> > > [   37.438766]        ___sys_sendmsg+0x2b2/0x2d0
> > > [   37.444052]        __sys_sendmsg+0x54/0x90
> > > [   37.449047]        SyS_sendmsg+0x12/0x20
> > > [   37.453848]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [   37.460007] 
> > >                -> #1 (rtnl_mutex){+.+.+.}:
> > > [   37.466764]        lock_acquire+0xbd/0x200
> > > [   37.471745]        __mutex_lock+0x88/0x950
> > > [   37.476853]        mutex_lock_nested+0x1b/0x20
> > > [   37.482336]        rtnl_lock+0x17/0x20
> > > [   37.487038]        enum_all_gids_of_dev_cb+0x25/0xd0 [ib_core]
> > > [   37.494509]        ib_enum_roce_netdev+0xe7/0x100 [ib_core]
> > > [   37.501256]        roce_rescan_device+0x21/0x30 [ib_core]
> > > [   37.507680]        ib_cache_setup_one+0x1f1/0x350 [ib_core]
> > > [   37.514297]        ib_register_device+0x444/0x720 [ib_core]
> > > [   37.520900]        ocrdma_add+0x46f/0x820 [ocrdma]
> > > [   37.526622]        _be_roce_dev_add+0x17d/0x1e0 [be2net]
> > > [   37.532929]        be_roce_register_driver+0x4a/0x90 [be2net]
> > > [   37.539716]        ib_umad_poll+0x15/0x50 [ib_umad]
> > > [   37.545527]        do_one_initcall+0x51/0x1a9
> > > [   37.550881]        do_init_module+0x60/0x1ff
> > > [   37.556129]        load_module+0x257e/0x2b10
> > > [   37.561375]        SYSC_finit_module+0xa9/0x100
> > > [   37.566880]        SyS_finit_module+0xe/0x10
> > > [   37.572099]        do_syscall_64+0x6c/0x1d0
> > > [   37.577178]        return_from_SYSCALL_64+0x0/0x7a
> > > [   37.583232] 
> > >                -> #0 (device_mutex){+.+.+.}:
> > > [   37.590704]        __lock_acquire+0x153c/0x1550
> > > [   37.596442]        lock_acquire+0xbd/0x200
> > > [   37.601399]        __mutex_lock+0x88/0x950
> > > [   37.606346]        mutex_lock_nested+0x1b/0x20
> > > [   37.611669]        ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.618170]        c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
> > > [   37.625061]        c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
> > > [   37.632108]        notify_ulds.isra.28+0x3f/0x60 [cxgb4]
> > > [   37.638410]        cxgb_up+0x70b/0x840 [cxgb4]
> > > [   37.643946]        cxgb_open+0x34/0x90 [cxgb4]
> > > [   37.649265]        __dev_open+0xc9/0x140
> > > [   37.653977]        __dev_change_flags+0x9d/0x160
> > > [   37.659613]        dev_change_flags+0x29/0x60
> > > [   37.665046]        do_setlink+0x4bf/0xc80
> > > [   37.669851]        rtnl_newlink+0x512/0x8a0
> > > [   37.675090]        rtnetlink_rcv_msg+0xac/0x240
> > > [   37.680717]        netlink_rcv_skb+0xed/0x120
> > > [   37.685937]        rtnetlink_rcv+0x2a/0x40
> > > [   37.691081]        netlink_unicast+0x182/0x220
> > > [   37.696607]        netlink_sendmsg+0x2e9/0x3e0
> > > [   37.702136]        sock_sendmsg+0x38/0x50
> > > [   37.707180]        ___sys_sendmsg+0x2b2/0x2d0
> > > [   37.712639]        __sys_sendmsg+0x54/0x90
> > > [   37.717542]        SyS_sendmsg+0x12/0x20
> > > [   37.722249]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [   37.728326] 
> > >                other info that might help us debug this:
> > > 
> > > [   37.738479] Chain exists of:
> > >                  device_mutex --> rtnl_mutex --> uld_mutex
> > > 
> > > [   37.750153]  Possible unsafe locking scenario:
> > > 
> > > [   37.757412]        CPU0                    CPU1
> > > [   37.762894]        ----                    ----
> > > [   37.768381]   lock(uld_mutex);
> > > [   37.772149]                                lock(rtnl_mutex);
> > > [   37.778830]                                lock(uld_mutex);
> > > [   37.785413]   lock(device_mutex);
> > > [   37.789462] 
> > >                 *** DEADLOCK ***
> > > 
> > > [   37.797070] 2 locks held by NetworkManager/2196:
> > > [   37.802557]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff9e83457b>] rtnetlink_r0
> > > [   37.812213]  #1:  (uld_mutex){+.+.+.}, at: [<ffffffffc0574fd4>] notify_ulds.]
> > > [   37.822846] 
> > >                stack backtrace:
> > > [   37.828894] CPU: 17 PID: 2196 Comm: NetworkManager Not tainted 4.13.0-rc7+ #0
> > > [   37.837655] Hardware name: Dell Inc. PowerEdge R730xd/0599V5, BIOS 2.0.2 03/6
> > > [   37.846551] Call Trace:
> > > [   37.849630]  dump_stack+0x85/0xcc
> > > [   37.853679]  print_circular_bug+0x200/0x20e
> > > [   37.858806]  __lock_acquire+0x153c/0x1550
> > > [   37.863738]  lock_acquire+0xbd/0x200
> > > [   37.868138]  ? ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.874275]  ? ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.880403]  __mutex_lock+0x88/0x950
> > > [   37.884782]  ? ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.890914]  ? ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.897108]  ? find_held_lock+0x40/0xb0
> > > [   37.901838]  mutex_lock_nested+0x1b/0x20
> > > [   37.906669]  ib_register_device+0xb5/0x720 [ib_core]
> > > [   37.912669]  ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
> > > [   37.919261]  ? rcu_read_lock_sched_held+0x98/0xa0
> > > [   37.924973]  ? kmem_cache_alloc_trace+0x278/0x2e0
> > > [   37.930691]  ? c4iw_register_device+0x2f6/0x460 [iw_cxgb4]
> > > [   37.937293]  c4iw_register_device+0x3a0/0x460 [iw_cxgb4]
> > > [   37.943702]  c4iw_uld_state_change+0x7a4/0xcd0 [iw_cxgb4]
> > > [   37.950213]  ? notify_ulds.isra.28+0x24/0x60 [cxgb4]
> > > [   37.956244]  notify_ulds.isra.28+0x3f/0x60 [cxgb4]
> > > [   37.962083]  cxgb_up+0x70b/0x840 [cxgb4]
> > > [   37.966951]  ? cxgb4_ofld_send+0x20/0x20 [cxgb4]
> > > [   37.972594]  cxgb_open+0x34/0x90 [cxgb4]
> > > [   37.977462]  __dev_open+0xc9/0x140
> > > [   37.981741]  __dev_change_flags+0x9d/0x160
> > > [   37.986794]  dev_change_flags+0x29/0x60
> > > [   37.991557]  do_setlink+0x4bf/0xc80
> > > [   37.995931]  rtnl_newlink+0x512/0x8a0
> > > [   38.000500]  ? rtnl_newlink+0x104/0x8a0
> > > [   38.005263]  ? check_usage+0xb5/0x490
> > > [   38.009826]  ? ns_capable_common+0x7a/0x90
> > > [   38.014876]  ? ns_capable+0x13/0x20
> > > [   38.019253]  rtnetlink_rcv_msg+0xac/0x240
> > > [   38.024215]  ? rtnetlink_rcv+0x1b/0x40
> > > [   38.028879]  ? netlink_deliver_tap+0x7a/0x2c0
> > > [   38.034232]  ? rtnl_newlink+0x8a0/0x8a0
> > > [   38.038995]  netlink_rcv_skb+0xed/0x120
> > > [   38.043760]  rtnetlink_rcv+0x2a/0x40
> > > [   38.048244]  netlink_unicast+0x182/0x220
> > > [   38.053119]  netlink_sendmsg+0x2e9/0x3e0
> > > [   38.057985]  sock_sendmsg+0x38/0x50
> > > [   38.062243]  ___sys_sendmsg+0x2b2/0x2d0
> > > [   38.066877]  ? find_held_lock+0x40/0xb0
> > > [   38.071499]  ? __fget+0x102/0x210
> > > [   38.075647]  ? __fget+0x121/0x210
> > > [   38.079780]  ? __fget+0x5/0x210
> > > [   38.083706]  ? __fget_light+0x25/0x70
> > > [   38.088208]  __sys_sendmsg+0x54/0x90
> > > [   38.092606]  SyS_sendmsg+0x12/0x20
> > > [   38.096810]  entry_SYSCALL_64_fastpath+0x1f/0xbe
> > > [   38.102379] RIP: 0033:0x7f146e486974
> > > [   38.106778] RSP: 002b:00007ffd0cd3ee00 EFLAGS: 00000293 ORIG_RAX: 0000000000e
> > > [   38.115654] RAX: ffffffffffffffda RBX: 000055698f9641f9 RCX: 00007f146e486974
> > > [   38.124058] RDX: 0000000000000000 RSI: 00007ffd0cd3ee50 RDI: 0000000000000007
> > > [   38.132474] RBP: 00007ffd0cd3f2e0 R08: 0000000000000000 R09: 000055699118c300
> > > [   38.140884] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
> > > [   38.149306] R13: 0000000000000001 R14: 00007ffd0cd3f010 R15: 000055698fbda5c0
> > > [   38.160359] ib_srpt srpt_add_one(cxgb4_0) failed.
> > > 
> > -- 
> > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >     GPG KeyID: B826A3330E572FDD
> >     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
