* OpenMPI over RoCEE
@ 2010-07-12 20:21 Steve Wise
[not found] ` <4C3B794E.7010701-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Steve Wise @ 2010-07-12 20:21 UTC (permalink / raw)
To: linux-rdma; +Cc: Jeff Squyres
I'm running OFED-1.5.1 with the RoCEE mlx4 drivers. I can run low-level
verbs programs OK, but when running Open MPI, I'm getting this error.
Has anybody seen this?
-----
[ompi@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 --mca
btl openib,sm,self /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1
-msglen msglen.txt -iter 1000000 pingpong
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
error modifing QP to RTR errno says Invalid argument
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb]
error in endpoint reply start connect
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 4894 on
node escher exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: OpenMPI over RoCEE
[not found] ` <4C3B794E.7010701-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-07-13 23:56 ` Jeff Squyres
[not found] ` <48CED3A4-25F4-4D43-9948-881C0856225B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Jeff Squyres @ 2010-07-13 23:56 UTC (permalink / raw)
To: Steve Wise; +Cc: linux-rdma
Does it work with Open MPI v1.4.2?
On Jul 12, 2010, at 4:21 PM, Steve Wise wrote:
> I'm running OFED-1.5.1 with the RoCEE mlx4 drivers. I can run low level
> verbs programs ok, but when running open mpi, I'm getting this error.
> Anybody seen this?
>
> -----
>
> [ompi@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 --mca
> btl openib,sm,self /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1
> -msglen msglen.txt -iter 1000000 pingpong
> [escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
> error modifing QP to RTR errno says Invalid argument
> [escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb]
> error in endpoint reply start connect
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 4894 on
> node escher exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
* Re: OpenMPI over RoCEE
[not found] ` <48CED3A4-25F4-4D43-9948-881C0856225B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-07-14 0:10 ` Steve Wise
0 siblings, 0 replies; 3+ messages in thread
From: Steve Wise @ 2010-07-14 0:10 UTC (permalink / raw)
To: Jeff Squyres; +Cc: linux-rdma
You know, I got it running by adding this:
--mca btl_openib_cpc_include rdmacm
Which basically says to use only rdmacm to set up the connection.
Thanks,
Steve.
Jeff Squyres wrote:
> Does it work with Open MPI v1.4.2?
>
>
> On Jul 12, 2010, at 4:21 PM, Steve Wise wrote:
>
>
>> I'm running OFED-1.5.1 with the RoCEE mlx4 drivers. I can run low level
>> verbs programs ok, but when running open mpi, I'm getting this error.
>> Anybody seen this?
>>
>> -----
>>
>> [ompi@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 --mca
>> btl openib,sm,self /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1
>> -msglen msglen.txt -iter 1000000 pingpong
>> [escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
>> error modifing QP to RTR errno says Invalid argument
>> [escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb]
>> error in endpoint reply start connect
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 1 with PID 4894 on
>> node escher exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>>
>>
>>
>
>
>
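For anyone landing on this thread with the same "error modifing QP to RTR" failure, here is a minimal sketch of the full invocation with Steve's extra parameter folded in. The hosts, the IMB path and the benchmark arguments are copied from the original command and will differ on other clusters. The `RUN` switch is hypothetical, added only so the sketch prints the command by default instead of launching it.

```shell
#!/bin/sh
# Sketch only: the original command from this thread plus the rdmacm override.
# Hosts and paths are taken from the first message; adjust for your cluster.
HOSTS="10.192.176.111,10.192.176.112"
IMB="/usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1"

# The second --mca pair is the only change relative to the failing run.
CMD="mpirun -np 2 -host $HOSTS \
--mca btl openib,sm,self \
--mca btl_openib_cpc_include rdmacm \
$IMB -msglen msglen.txt -iter 1000000 pingpong"

# RUN is a hypothetical switch for this sketch: print by default, launch on RUN=1.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Both errors in the failing log come from btl_openib_connect_oob.c, so restricting the openib BTL to rdmacm simply avoids that connection path. Open MPI also reads MCA parameters from the environment, so exporting `OMPI_MCA_btl_openib_cpc_include=rdmacm` before calling mpirun should have the same effect as the command-line flag.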