public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
To: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Erez Shitrit
	<erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	SiteGround Operations
	<operations-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org>
Subject: Re: Hang in ipoib_mcast_stop_thread
Date: Mon, 6 Jun 2016 15:36:55 +0300	[thread overview]
Message-ID: <57556E67.4050904@kyup.com> (raw)
In-Reply-To: <20160606122558.GB10894-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>



On 06/06/2016 03:25 PM, Yuval Shaia wrote:
> On Mon, Jun 06, 2016 at 03:11:18PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
>>> Hi Nikolay
>>>
>>> What was the scenario? (Do you have a way to reproduce that?)
>>
>> I have no way of reliably reproducing that, but I have experienced
>> misteryous hangs in the ipoib stack e.g.
>>
>> https://marc.info/?l=linux-rdma&m=145915284709901
>>
>> when the connection to the infiniband network is lost. Unfortunately I
>> cannot even figure out where to being debugging this ;(
>>
>> As a matter of fact I currently have one server which complains:
>>
>> ib0: transmit timeout: latency 139718550 msecs
>> ib0: queue stopped 1, tx_head 242, tx_tail 114
> 
> What i can tell from our experience is that this issue is not new, we have
> it with older kernels (<3).
> Also, not sure it is an issue with HCA vendor as we see it with CX3.

Are you using ipoib or pure RDMA?

> 
> This is a very slippery bug which we find it hard to reproduce.
> 
>>
>> yet, "iblinkinfo" can show all the nodes in the infiniband network.
>>
>>> Which IB card are you using?
>>
>> ibstat
>> CA 'qib0'
>> 	CA type: InfiniPath_QLE7342
>> 	Number of ports: 2
>> 	Firmware version:
>> 	Hardware version: 2
>> 	Node GUID: 0x001175000077b918
>> 	System image GUID: 0x001175000077b918
>> 	Port 1:
>> 		State: Active
>> 		Physical state: LinkUp
>> 		Rate: 40
>> 		Base lid: 68
>> 		LMC: 0
>> 		SM lid: 61
>> 		Capability mask: 0x07610868
>> 		Port GUID: 0x001175000077b918
>> 		Link layer: InfiniBand
>>
>>
>>
>>>
>>> Thanks, Erez
>>>
>>> On Mon, Jun 6, 2016 at 12:09 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>>>> Hello,
>>>>
>>>> [Tejun, I have cc'ed you since I'm not familiar with the internals of
>>>> the workqueue code and cannot comment whether the the mcast_task was
>>>> queued or and whether there were any tasks in the workqueue based on the
>>>> structures I've presented below, could you please comment on that?]
>>>>
>>>> I've been running an infiniband network on 4.4.10 and recently due to
>>>> the infiniband failing (I presume) and having the following in my logs:
>>>>
>>>> [128340.462147] ib0: transmit timeout: latency 45014865 msecs
>>>> [128340.462151] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>>> [128341.461122] ib0: transmit timeout: latency 45015864 msecs
>>>> [128341.461126] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>>> [128342.461053] ib0: transmit timeout: latency 45016864 msecs
>>>> [128342.461056] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>>>
>>>> I had various commands hang due to rtnl_lock being held by an ip
>>>> command, trying to shutdown the infiniband interface:
>>>>
>>>> PID: 30572 TASK: ffff881bb6bf6e00 CPU: 15 COMMAND: "ip"
>>>> #0 [ffff88027f50b390] __schedule at ffffffff81631421
>>>> #1 [ffff88027f50b430] schedule at ffffffff81631cc7
>>>> #2 [ffff88027f50b450] schedule_timeout at ffffffff81634c89
>>>> #3 [ffff88027f50b4e0] wait_for_completion at ffffffff81632d73
>>>> #4 [ffff88027f50b550] flush_workqueue at ffffffff8106c761
>>>> #5 [ffff88027f50b620] ipoib_mcast_stop_thread at ffffffffa02e0bf2 [ib_ipoib]
>>>> #6 [ffff88027f50b650] ipoib_ib_dev_down at ffffffffa02de7d6 [ib_ipoib]
>>>> #7 [ffff88027f50b670] ipoib_stop at ffffffffa02dc208 [ib_ipoib]
>>>> #8 [ffff88027f50b6a0] __dev_close_many at ffffffff81562692
>>>> #9 [ffff88027f50b6c0] __dev_close at ffffffff81562716
>>>> #10 [ffff88027f50b6f0] __dev_change_flags at ffffffff815630fc
>>>> #11 [ffff88027f50b730] dev_change_flags at ffffffff81563207
>>>> #12 [ffff88027f50b760] do_setlink at ffffffff81576f5f
>>>> #13 [ffff88027f50b860] rtnl_newlink at ffffffff81578afb
>>>> #14 [ffff88027f50bb10] rtnetlink_rcv_msg at ffffffff81577ae5
>>>> #15 [ffff88027f50bb90] netlink_rcv_skb at ffffffff8159a373
>>>> #16 [ffff88027f50bbc0] rtnetlink_rcv at ffffffff81577bb5
>>>> #17 [ffff88027f50bbe0] netlink_unicast at ffffffff81599e98
>>>> #18 [ffff88027f50bc40] netlink_sendmsg at ffffffff8159aabe
>>>> #19 [ffff88027f50bd00] sock_sendmsg at ffffffff81548ce9
>>>> #20 [ffff88027f50bd20] ___sys_sendmsg at ffffffff8154ae78
>>>> #21 [ffff88027f50beb0] __sys_sendmsg at ffffffff8154b059
>>>> #22 [ffff88027f50bf40] sys_sendmsg at ffffffff8154b0a9
>>>> #23 [ffff88027f50bf50] entry_SYSCALL_64_fastpath at ffffffff81635c57
>>>>
>>>>
>>>> So clearly ipoib_mcast_stop_thread has hung on trying to stop the
>>>> multicast thread. Here is the state of the ipoib_wq:
>>>>
>>>> struct workqueue_struct {
>>>>   pwqs = {
>>>>     next = 0xffff883ff1196770,
>>>>     prev = 0xffff883ff1196770
>>>>   },
>>>>   list = {
>>>>     next = 0xffff883fef5bea10,
>>>>     prev = 0xffff883fef201010
>>>>   },
>>>>   mutex = {
>>>>     count = {
>>>>       counter = 1
>>>>     },
>>>>     wait_lock = {
>>>>       {
>>>>         rlock = {
>>>>           raw_lock = {
>>>>             val = {
>>>>               counter = 0
>>>>             }
>>>>           }
>>>>         }
>>>>       }
>>>>     },
>>>>     wait_list = {
>>>>       next = 0xffff883fef200c28,
>>>>       prev = 0xffff883fef200c28
>>>>     },
>>>>     owner = 0x0,
>>>>     osq = {
>>>>       tail = {
>>>>         counter = 0
>>>>       }
>>>>     }
>>>>   },
>>>>   work_color = 1,
>>>>   flush_color = 0,
>>>>   nr_pwqs_to_flush = {
>>>>     counter = 1
>>>>   },
>>>>   first_flusher = 0xffff88027f50b568,
>>>>   flusher_queue = {
>>>>     next = 0xffff883fef200c60,
>>>>     prev = 0xffff883fef200c60
>>>>   },
>>>>   flusher_overflow = {
>>>>     next = 0xffff883fef200c70,
>>>>     prev = 0xffff883fef200c70
>>>>   },
>>>>   maydays = {
>>>>     next = 0xffff883fef200c80,
>>>>     prev = 0xffff883fef200c80
>>>>   },
>>>>   rescuer = 0xffff883feb073680,
>>>>   nr_drainers = 0,
>>>>   saved_max_active = 1,
>>>>   unbound_attrs = 0xffff883fef2b5c20,
>>>>   dfl_pwq = 0xffff883ff1196700,
>>>>   wq_dev = 0x0,
>>>>   name =
>>>> "ipoib_wq\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>   rcu = {
>>>>     next = 0x0,
>>>>     func = 0x0
>>>>   },
>>>>   flags = 131082,
>>>>   cpu_pwqs = 0x0,
>>>>
>>>> Also the state of the mcast_task member:
>>>>
>>>> crash> struct ipoib_dev_priv.mcast_task ffff883fed6ec700
>>>>   mcast_task = {
>>>>     work = {
>>>>       data = {
>>>>         counter = 3072
>>>>       },
>>>>       entry = {
>>>>         next = 0xffff883fed6ec888,
>>>>         prev = 0xffff883fed6ec888
>>>>       },
>>>>       func = 0xffffffffa02e1220 <ipoib_mcast_join_task>
>>>>     },
>>>>     timer = {
>>>>       entry = {
>>>>         next = 0x0,
>>>>         pprev = 0x0
>>>>       },
>>>>       expires = 0,
>>>>       function = 0xffffffff8106dd20 <delayed_work_timer_fn>,
>>>>       data = 18446612406880618624,
>>>>       flags = 2097164,
>>>>       slack = -1
>>>>     },
>>>>     wq = 0x0,
>>>>     cpu = 0
>>>>   }
>>>>
>>>>
>>>> flush_workqueue is essentially waiting on
>>>> wait_for_completion(&this_flusher.done) in flush_workqueue, which
>>>> apparently never returned. I'm assuming this is due to the general
>>>> unavailability of the infiniband network. However, I think it's wrong,
>>>> in case of infiniband being down, the whole server to be rendered
>>>> unresponsive, due to rtnl_lock being held. Do you think it is possible
>>>> to rework this code to prevent it hanging in case the workqueue cannot
>>>> be flushed? Furthermore, do you think it's feasible to put code in
>>>> ipoib_mcast_join_task to detect such situations and not re-arm itself
>>>> and then use cancel_delayed_work_sync ?
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2016-06-06 12:36 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06  9:09 Hang in ipoib_mcast_stop_thread Nikolay Borisov
     [not found] ` <57553DE7.2060009-6AxghH7DbtA@public.gmane.org>
2016-06-06 11:51   ` Erez Shitrit
     [not found]     ` <CAAk-MO_6GMm1AHay4uQ4yZ8HHcH_Dk=Ls5gKSVZUCejexDahLQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-06 12:11       ` Nikolay Borisov
     [not found]         ` <57556866.8040703-6AxghH7DbtA@public.gmane.org>
2016-06-06 12:25           ` Yuval Shaia
     [not found]             ` <20160606122558.GB10894-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-06-06 12:36               ` Nikolay Borisov [this message]
     [not found]                 ` <57556E67.4050904-6AxghH7DbtA@public.gmane.org>
2016-06-06 13:06                   ` Yuval Shaia
2016-06-06 12:57           ` Erez Shitrit
     [not found]             ` <CAAk-MO-T_9J2jtX+gtJQXeMVTW6bxbRUrq3mHRepVPatbHFO9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-06 13:08               ` Nikolay Borisov
     [not found]                 ` <575575B5.9010504-6AxghH7DbtA@public.gmane.org>
2016-06-06 16:57                   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57556E67.4050904@kyup.com \
    --to=kernel-6axghh7dbta@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=operations-/eCPMmvKun9pLGFMi4vTTA@public.gmane.org \
    --cc=yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox