From: Kleber Sacilotto de Souza <klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Hal Rosenstock
<hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH] IB/mlx4: Fail post send command on error recovery
Date: Mon, 08 Apr 2013 10:51:13 -0300 [thread overview]
Message-ID: <5162CB51.3080600@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 04/04/2013 06:45 PM, Or Gerlitz wrote:
>
> Kleber , as for the 1st problem, which kernel consumers are hanging
> for ever on their CQs? IPoIB is giving up after sometime e.g see in
> ipoib_ib.c "assume the HW is wedged and just free up all our pending
> work requests"
>
Or, I don't have a very comprehensive testcase to stress most part of
the IB stack during error recovery, but during my tests the kernel
consumer that are still hanging is the ib_sa module, mcast_remove_one()
is waiting for the port completion queue:
INFO: task eehd:4689 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd D 0000000000000000 0 4689 2 0x00010000
Call Trace:
[c0000000fba83190] [0000000000000001] 0x1 (unreliable)
[c0000000fba83360] [c000000000016188] .__switch_to+0x140/0x268
[c0000000fba83410] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000fba836b0] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000fba837c0] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000fba838a0] [d000000002ca230c] .mcast_remove_one+0xfc/0x168 [ib_sa]
[c0000000fba83940] [d000000002bc4f60] .ib_unregister_device+0x78/0x170
[ib_core]
...
Or rdma_cm waiting for the cma_dev completion:
Call Trace:
[c0000000f8fc70f0] [0000000000000001] 0x1 (unreliable)
[c0000000f8fc72c0] [c000000000016188] .__switch_to+0x140/0x268
[c0000000f8fc7370] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000f8fc7610] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000f8fc7720] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000f8fc7800] [d000000002f835b0] .cma_process_remove+0x170/0x1a8
[rdma_cm]
[c0000000f8fc78b0] [d000000002f8366c] .cma_remove_one+0x84/0xb0 [rdma_cm]
[c0000000f8fc7940] [d000000002c34f60] .ib_unregister_device+0x78/0x170
[ib_core]
...
Thanks,
kleber
--
Kleber Sacilotto de Souza
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-04-08 13:51 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-28 18:45 [PATCH] IB/mlx4: Fail post send command on error recovery Kleber Sacilotto de Souza
[not found] ` <1364496315-7588-1-git-send-email-klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-02 9:15 ` Or Gerlitz
[not found] ` <515AA1C6.7070804-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-04-02 11:24 ` Jack Morgenstein
2013-04-02 17:00 ` Roland Dreier
[not found] ` <CAL1RGDW7wMVmyFhCv-Ei8Mbca-Y9yv+nygzfREU2_TozNSZ60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-04 13:01 ` Kleber Sacilotto de Souza
[not found] ` <515D79B3.4090808-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-04 14:00 ` Jack Morgenstein
[not found] ` <201304041700.40349.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-04-04 21:45 ` Or Gerlitz
[not found] ` <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-08 13:51 ` Kleber Sacilotto de Souza [this message]
[not found] ` <5162CB51.3080600-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-08 15:47 ` Or Gerlitz
2013-04-04 21:45 ` Or Gerlitz
[not found] ` <CAJZOPZLgCMDmTO-qqZXm9Y9xv+xCh5bezz3A_nBn6BEtB-G+0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-04 21:49 ` Roland Dreier
[not found] ` <CAL1RGDU=hn7hMy0ECQ7AOQqmuB8R6+BT6JUudNS_6rPBKr2UtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-08 14:07 ` Kleber Sacilotto de Souza
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5162CB51.3080600@linux.vnet.ibm.com \
--to=klebers-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
--cc=hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox