public inbox for linux-rdma@vger.kernel.org
From: Kleber Sacilotto de Souza <klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Jack Morgenstein
	<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Hal Rosenstock
	<hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH] IB/mlx4: Fail post send command on error recovery
Date: Mon, 08 Apr 2013 10:51:13 -0300
Message-ID: <5162CB51.3080600@linux.vnet.ibm.com>
In-Reply-To: <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 04/04/2013 06:45 PM, Or Gerlitz wrote:
> 
> Kleber, as for the 1st problem, which kernel consumers are hanging
> forever on their CQs? IPoIB gives up after some time, e.g. see in
> ipoib_ib.c "assume the HW is wedged and just free up all our pending
> work requests"
> 

Or, I don't have a very comprehensive testcase to stress most parts of
the IB stack during error recovery, but in my tests the kernel
consumers that still hang are the ib_sa module, where mcast_remove_one()
waits for the port completion:

INFO: task eehd:4689 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd            D 0000000000000000     0  4689      2 0x00010000
Call Trace:
[c0000000fba83190] [0000000000000001] 0x1 (unreliable)
[c0000000fba83360] [c000000000016188] .__switch_to+0x140/0x268
[c0000000fba83410] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000fba836b0] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000fba837c0] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000fba838a0] [d000000002ca230c] .mcast_remove_one+0xfc/0x168 [ib_sa]
[c0000000fba83940] [d000000002bc4f60] .ib_unregister_device+0x78/0x170 [ib_core]
...

Or rdma_cm, where cma_process_remove() waits for the cma_dev completion:

Call Trace:
[c0000000f8fc70f0] [0000000000000001] 0x1 (unreliable)
[c0000000f8fc72c0] [c000000000016188] .__switch_to+0x140/0x268
[c0000000f8fc7370] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000f8fc7610] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000f8fc7720] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000f8fc7800] [d000000002f835b0] .cma_process_remove+0x170/0x1a8 [rdma_cm]
[c0000000f8fc78b0] [d000000002f8366c] .cma_remove_one+0x84/0xb0 [rdma_cm]
[c0000000f8fc7940] [d000000002c34f60] .ib_unregister_device+0x78/0x170 [ib_core]
...
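Both traces end in wait_for_common(), i.e. an untimed wait on a struct completion that is only signalled once the last reference to the port (ib_sa) or cma_dev (rdma_cm) is dropped. The sketch below is a userspace analogue of that pattern, assuming pthreads stand in for the kernel primitives; the names (struct dev_ref, dev_get(), dev_put(), dev_remove(), wait_completion_ms()) are hypothetical and not the actual ib_sa or rdma_cm code. It contrasts a remove path that waits with a timeout (the IPoIB-style "give up" Or mentioned) against the reference that is never put:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Wait up to timeout_ms; returns true if completed, false on timeout.
 * An untimed wait (as in the traces) would loop on pthread_cond_wait()
 * and never return if the last reference is never dropped.
 */
static bool wait_completion_ms(struct completion *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Hypothetical per-device object: users hold references, the last put completes. */
struct dev_ref {
	pthread_mutex_t lock;
	int refcount;
	struct completion comp;
};

static void dev_init(struct dev_ref *d)
{
	pthread_mutex_init(&d->lock, NULL);
	d->refcount = 1;	/* reference owned by the remove path */
	init_completion(&d->comp);
}

static void dev_get(struct dev_ref *d)
{
	pthread_mutex_lock(&d->lock);
	d->refcount++;
	pthread_mutex_unlock(&d->lock);
}

static void dev_put(struct dev_ref *d)
{
	bool last;

	pthread_mutex_lock(&d->lock);
	assert(d->refcount > 0);	/* catch unbalanced puts */
	last = (--d->refcount == 0);
	pthread_mutex_unlock(&d->lock);
	if (last)
		complete(&d->comp);
}

/* Remove path: drop our own reference, then wait for all other users. */
static bool dev_remove(struct dev_ref *d, long timeout_ms)
{
	dev_put(d);
	return wait_completion_ms(&d->comp, timeout_ms);
}
```

With one extra dev_get() outstanding, dev_remove() returns false once the timeout expires instead of blocking. The ib_sa and rdma_cm paths in the traces wait without a timeout, so presumably any reference still held during error recovery keeps eehd blocked.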


Thanks,
kleber

-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

