* OpenMPI over RoCEE
@ 2010-07-12 20:21 Steve Wise
From: Steve Wise @ 2010-07-12 20:21 UTC
To: linux-rdma; +Cc: Jeff Squyres
I'm running OFED-1.5.1 with the RoCEE mlx4 drivers. I can run low-level
verbs programs OK, but when I run Open MPI I get this error. Has anybody
seen this?
-----
[ompi@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 \
    --mca btl openib,sm,self \
    /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1 \
    -msglen msglen.txt -iter 1000000 pingpong
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
error modifing QP to RTR errno says Invalid argument
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb]
error in endpoint reply start connect
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 4894 on
node escher exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: OpenMPI over RoCEE
@ 2010-07-13 23:56 Jeff Squyres
From: Jeff Squyres @ 2010-07-13 23:56 UTC
To: Steve Wise; +Cc: linux-rdma
Does it work with Open MPI v1.4.2?
--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
* Re: OpenMPI over RoCEE
@ 2010-07-14 0:10 Steve Wise
From: Steve Wise @ 2010-07-14 0:10 UTC
To: Jeff Squyres; +Cc: linux-rdma
You know, I got it running by adding this:

    --mca btl_openib_cpc_include rdmacm

That basically says to use only the RDMA CM (rdmacm) to set up the connection.
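A minimal sketch, assuming Open MPI's usual MCA parameter-file mechanism (not something tested in this thread): instead of passing the flag to every mpirun, the same settings can go in a per-user parameter file, conventionally $HOME/.openmpi/mca-params.conf, as "name = value" lines. The btl list simply mirrors the original command line. If home directories are not shared across the nodes, the file probably needs to exist on each of them.

```
# $HOME/.openmpi/mca-params.conf (sketch; mirrors the command-line flags above)
# Restrict Open MPI to the openib, shared-memory and self transports
btl = openib,sm,self
# Use only the RDMA CM to set up openib connections
btl_openib_cpc_include = rdmacm
```

With this file in place, the original mpirun command should pick up both settings without any --mca arguments.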
Thanks,
Steve.
Jeff Squyres wrote:
> Does it work with Open MPI v1.4.2?