From: Kleber Sacilotto de Souza <klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Hal Rosenstock
<hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH] IB/mlx4: Fail post send command on error recovery
Date: Mon, 08 Apr 2013 10:51:13 -0300
Message-ID: <5162CB51.3080600@linux.vnet.ibm.com>
In-Reply-To: <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 04/04/2013 06:45 PM, Or Gerlitz wrote:
>
> Kleber, as for the 1st problem, which kernel consumers are hanging
> forever on their CQs? IPoIB gives up after some time, e.g. see in
> ipoib_ib.c "assume the HW is wedged and just free up all our pending
> work requests"
>
Or, I don't have a very comprehensive test case that stresses most of
the IB stack during error recovery, but in my tests the kernel
consumers that are still hanging are the ib_sa module, where
mcast_remove_one() is waiting for the port completion queue:
INFO: task eehd:4689 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd D 0000000000000000 0 4689 2 0x00010000
Call Trace:
[c0000000fba83190] [0000000000000001] 0x1 (unreliable)
[c0000000fba83360] [c000000000016188] .__switch_to+0x140/0x268
[c0000000fba83410] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000fba836b0] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000fba837c0] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000fba838a0] [d000000002ca230c] .mcast_remove_one+0xfc/0x168 [ib_sa]
[c0000000fba83940] [d000000002bc4f60] .ib_unregister_device+0x78/0x170 [ib_core]
...
Or rdma_cm waiting for the cma_dev completion:
Call Trace:
[c0000000f8fc70f0] [0000000000000001] 0x1 (unreliable)
[c0000000f8fc72c0] [c000000000016188] .__switch_to+0x140/0x268
[c0000000f8fc7370] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000f8fc7610] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000f8fc7720] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000f8fc7800] [d000000002f835b0] .cma_process_remove+0x170/0x1a8 [rdma_cm]
[c0000000f8fc78b0] [d000000002f8366c] .cma_remove_one+0x84/0xb0 [rdma_cm]
[c0000000f8fc7940] [d000000002c34f60] .ib_unregister_device+0x78/0x170 [ib_core]
...
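In both traces, wait_for_common is the point where the kernel sleeps on a
completion. The sketch below is a user-space Python model, not kernel code,
of the reference-count-plus-completion pattern assumed here to sit behind
both waits. `Port`, `remove_one`, and `demo` are hypothetical names. The
timeout is there only so the demo terminates; the hung-task report suggests
the real wait has no effective deadline.

```python
import threading


class Port:
    """Hypothetical stand-in for a per-port object guarded by a refcount."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refcount = 1             # initial reference held by the owner
        self.comp = threading.Event()  # plays the role of a kernel completion

    def get(self):
        """Take a reference, e.g. for an in-flight request."""
        with self._lock:
            self._refcount += 1

    def put(self):
        """Drop a reference; the last one signals the completion."""
        with self._lock:
            self._refcount -= 1
            last = self._refcount == 0
        if last:
            self.comp.set()


def remove_one(port, timeout):
    """Drop the owner's reference, then wait for every other holder.

    Returns True if teardown finished, False if the wait timed out.
    """
    port.put()
    return port.comp.wait(timeout)


def demo(request_completes):
    """Model one in-flight request during device removal."""
    port = Port()
    port.get()  # the in-flight request holds a reference
    if request_completes:
        # The request finishes shortly afterwards and drops its reference.
        threading.Timer(0.05, port.put).start()
    # If the request never completes, nobody drops the reference and the
    # waiter can only give up on the timeout.
    return remove_one(port, timeout=0.5)
```

In this model, `demo(True)` returns once the request drops its reference,
while `demo(False)` only returns because of the artificial timeout. The
second case is the analogue of the blocked eehd task.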
Thanks,
kleber
--
Kleber Sacilotto de Souza
IBM Linux Technology Center