From: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: EWG <Openfabrics-ewg-0P3JtQMG0aQdnm+yROfE0A@public.gmane.org>
Cc: linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: IB errors with openMPI
Date: Sun, 21 Feb 2010 21:46:53 -0800
Message-ID: <4B821A4D.2000409@linux.vnet.ibm.com>

We are trying to run openMPI with OFED-1.5 on the 2.6.31-rt11-preempt-rt kernel and see the following errors:

[[45393,1],8][../../../../../ompi/mca/btl/openib/btl_openib_component.c:2951:handle_wc]
from elm3b107 to: elm3b17 error polling HP CQ with status WORK REQUEST FLUSHED
ERROR status number 5 for wr_id 1289846528 opcode -1782678528  vendor error 244
qp_idx 0

At this point I looked at the mlx4 diag counters and saw some non-zero values. Because we were in the
middle of a series of runs and had not checked the counters between them, we don't know when they first
increased from 0. Do these counters have any correlation to the above MPI error?

[root@elm3b17 diag_counters]# pwd
/sys/class/infiniband/mlx4_0/diag_counters
[root@elm3b17 diag_counters]#

[root@elm3b17 diag_counters]# cat rq_num_rnr
19
[root@elm3b17 diag_counters]# cat rq_num_wrfe 
2009
[root@elm3b17 diag_counters]# cat sq_num_tree 
12
[root@elm3b17 diag_counters]# cat sq_num_wrfe
12
[root@elm3b17 diag_counters]#

Similarly, here are the counters on elm3b107:

[root@elm3b107 diag_counters]# cat rq_num_wrfe
5156
[root@elm3b107 diag_counters]# cat sq_num_rnr
18
[root@elm3b107 diag_counters]# cat sq_num_tree
20
[root@elm3b107 diag_counters]# cat sq_num_wrfe
20
[root@elm3b107 diag_counters]#
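
To pin down which run makes the counters move, one option is to snapshot the diag_counters
directory before and after each run and print only the counters that changed. The following is a
minimal sketch, not something we have run on these nodes; the function names read_counters and
diff_counters are hypothetical, and it assumes every file in the directory holds a single integer,
as the cat output above suggests.

```python
import os


def read_counters(counter_dir):
    """Return {counter_name: value} for each file in counter_dir holding one integer."""
    counters = {}
    for name in sorted(os.listdir(counter_dir)):
        path = os.path.join(counter_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip files that are unreadable or do not contain a plain integer.
            continue
    return counters


def diff_counters(before, after):
    """Return {counter_name: increase} for counters whose value changed between snapshots."""
    changed = {}
    for name, value in after.items():
        delta = value - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    return changed


# Intended use around a single MPI run (path taken from the transcript above):
#   before = read_counters("/sys/class/infiniband/mlx4_0/diag_counters")
#   ... launch the MPI job and wait for it to finish ...
#   after = read_counters("/sys/class/infiniband/mlx4_0/diag_counters")
#   print(diff_counters(before, after))
```

Taking the snapshot on both nodes around every run would show whether the counters increase only
during the runs that report the flushed work request.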


We are using ConnectX dual port DDR HCAs (FW version 2.6). What does vendor error 244 mean? Any suggestions
on how to debug this further?

Thanks
Pradeep
