public inbox for linux-rdma@vger.kernel.org
* OpenMPI over RoCEE
@ 2010-07-12 20:21 Steve Wise
From: Steve Wise @ 2010-07-12 20:21 UTC
  To: linux-rdma; +Cc: Jeff Squyres

I'm running OFED-1.5.1 with the RoCEE mlx4 drivers.  I can run low-level
verbs programs OK, but when running Open MPI, I get this error.
Has anybody seen this?

-----

[ompi@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 --mca 
btl openib,sm,self /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1 
-msglen msglen.txt -iter 1000000 pingpong
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] 
error modifing QP to RTR errno says Invalid argument
[escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb] 
error in endpoint reply start connect
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 4894 on
node escher exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: OpenMPI over RoCEE
@ 2010-07-13 23:56   ` Jeff Squyres
From: Jeff Squyres @ 2010-07-13 23:56 UTC
  To: Steve Wise; +Cc: linux-rdma

Does it work with Open MPI v1.4.2?


-- 
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


* Re: OpenMPI over RoCEE
@ 2010-07-14  0:10       ` Steve Wise
From: Steve Wise @ 2010-07-14  0:10 UTC
  To: Jeff Squyres; +Cc: linux-rdma

You know, I got it running by adding this:  --mca btl_openib_cpc_include
rdmacm

Which basically says to use only rdmacm to set up the connection.
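
For reference, here is a rough sketch of the full command line with that option added. It is assembled from the mpirun invocation in the first message. The variable names (HOSTS, IMB, BTL_ARGS, CPC_ARGS, CMD) are only illustrative, and the script prints the command rather than launching anything, so adjust it before running on a real cluster.

```shell
#!/bin/sh
# Sketch: rebuild the mpirun command line from the first message and add the
# workaround option. It only prints the command; it does not launch anything.

# Hosts and benchmark path as given in the original report.
HOSTS="10.192.176.111,10.192.176.112"
IMB="/usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1"

# Original BTL selection from the report.
BTL_ARGS="--mca btl openib,sm,self"

# Workaround: restrict openib connection setup to rdmacm.
CPC_ARGS="--mca btl_openib_cpc_include rdmacm"

CMD="mpirun -np 2 -host $HOSTS $BTL_ARGS $CPC_ARGS $IMB -msglen msglen.txt -iter 1000000 pingpong"
echo "$CMD"
```

Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so exporting OMPI_MCA_btl_openib_cpc_include=rdmacm before calling mpirun should have the same effect as the extra --mca argument.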

Thanks,

Steve.

Jeff Squyres wrote:
> Does it work with Open MPI v1.4.2?
