From: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: EWG <Openfabrics-ewg-0P3JtQMG0aQdnm+yROfE0A@public.gmane.org>
Cc: linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: IB errors with openMPI
Date: Sun, 21 Feb 2010 21:46:53 -0800
Message-ID: <4B821A4D.2000409@linux.vnet.ibm.com>
We are trying to run Open MPI with OFED-1.5 on the 2.6.31-rt11-preempt-rt kernel and see the following errors:
[[45393,1],8][../../../../../ompi/mca/btl/openib/btl_openib_component.c:2951:handle_wc]
from elm3b107 to: elm3b17 error polling HP CQ with status WORK REQUEST FLUSHED
ERROR status number 5 for wr_id 1289846528 opcode -1782678528 vendor error 244
qp_idx 0
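For reference, a minimal sketch of how the numeric status above could be decoded, assuming the number printed by the openib BTL is the raw value of libibverbs' enum ibv_wc_status (the table below is a partial copy of that enum; the dictionary and function names are hypothetical). Under that assumption, status 5 is IBV_WC_WR_FLUSH_ERR, which matches the "WORK REQUEST FLUSHED ERROR" text. The ibv_poll_cq documentation also notes that for a failed completion only wr_id, status, qp_num and vendor_err are valid, which may explain the odd opcode value.

```python
# Partial copy of enum ibv_wc_status from libibverbs (values 0-13 only).
# The names WC_STATUS_NAMES and describe_wc_status are illustrative.
WC_STATUS_NAMES = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}


def describe_wc_status(code):
    """Return the enum name for a work-completion status code."""
    return WC_STATUS_NAMES.get(code, "unknown status %d" % code)
```

A flush error is normally a consequence of the QP having already entered the error state, so the first non-flush error reported on either node is likely the more interesting one.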
At this point I looked at the mlx4 diag counters and saw some non-zero values. Because we had been
doing a series of runs, we do not know when the counters first moved away from 0. Do these counters
have any correlation to the above MPI error?
[root@elm3b17 diag_counters]# pwd
/sys/class/infiniband/mlx4_0/diag_counters
[root@elm3b17 diag_counters]#
[root@elm3b17 diag_counters]# cat rq_num_rnr
19
[root@elm3b17 diag_counters]# cat rq_num_wrfe
2009
[root@elm3b17 diag_counters]# cat sq_num_tree
12
[root@elm3b17 diag_counters]# cat sq_num_wrfe
12
[root@elm3b17 diag_counters]#
Similarly, here are the counters on elm3b107:
[root@elm3b107 diag_counters]# cat rq_num_wrfe
5156
[root@elm3b107 diag_counters]# cat sq_num_rnr
18
[root@elm3b107 diag_counters]# cat sq_num_tree
20
[root@elm3b107 diag_counters]# cat sq_num_wrfe
20
[root@elm3b107 diag_counters]#
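Since it is not known when these counters changed, one option is to snapshot the directory before and after each run and compare. The following is only a sketch: it assumes every file under diag_counters holds a single integer, and the function names are hypothetical.

```python
import os


def read_counters(directory):
    """Read every integer-valued file in `directory` into a dict."""
    counters = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip files that are unreadable or not a plain integer.
            continue
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters whose value differs."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Example use around a single MPI run (path taken from the session above):
#   before = read_counters("/sys/class/infiniband/mlx4_0/diag_counters")
#   ... launch the MPI job and wait for it to finish ...
#   after = read_counters("/sys/class/infiniband/mlx4_0/diag_counters")
#   print(changed_counters(before, after))
```

Running this on both nodes around each job would show which run first moved rq_num_wrfe, sq_num_wrfe or the rnr counters.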
We are using ConnectX dual-port DDR HCAs (FW version 2.6). What does the vendor error 244 mean? Any suggestions
for debugging this further?
Thanks
Pradeep