public inbox for linux-rdma@vger.kernel.org
* p
@ 2009-11-01 19:58 Sasha Khapyorsky
  2009-11-01 20:01 ` [PATCH] management: bump package versions Sasha Khapyorsky
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Khapyorsky @ 2009-11-01 19:58 UTC (permalink / raw)
  To: linux-rdma

From bf49c02f6eb474fcc25af40de68991de23ee629f Mon Sep 17 00:00:00 2001
From: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date: Sun, 1 Nov 2009 21:55:47 +0200
Subject: [PATCH] management: bump package versions

Bump IB management package versions:

	libibumad-1.3.3
	libibmad-1.3.3
	opensm-3.3.3
	infiniband-diags-1.5.3

Update mailing list name in configure.in files.

Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
---
 infiniband-diags/configure.in |    2 +-
 libibmad/configure.in         |    2 +-
 libibumad/configure.in        |    2 +-
 opensm/configure.in           |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in
index 3ef35cc..5865727 100644
--- a/infiniband-diags/configure.in
+++ b/infiniband-diags/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(infiniband-diags, 1.5.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(infiniband-diags, 1.5.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
 AM_INIT_AUTOMAKE
diff --git a/libibmad/configure.in b/libibmad/configure.in
index b4f5c41..ce31729 100644
--- a/libibmad/configure.in
+++ b/libibmad/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(libibmad, 1.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(libibmad, 1.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([src/sa.c])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
diff --git a/libibumad/configure.in b/libibumad/configure.in
index 6dbfeaf..3152491 100644
--- a/libibumad/configure.in
+++ b/libibumad/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(libibumad, 1.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(libibumad, 1.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([src/umad.c])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
diff --git a/opensm/configure.in b/opensm/configure.in
index 8a6b4c0..2c68cbd 100644
--- a/opensm/configure.in
+++ b/opensm/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(opensm, 3.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(opensm, 3.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([opensm/osm_opensm.c])
 AC_CONFIG_AUX_DIR(config)
 AC_CONFIG_HEADERS(include/config.h include/opensm/osm_config.h)
-- 
1.6.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH] management: bump package versions
  2009-11-01 19:58 p Sasha Khapyorsky
@ 2009-11-01 20:01 ` Sasha Khapyorsky
  0 siblings, 0 replies; 9+ messages in thread
From: Sasha Khapyorsky @ 2009-11-01 20:01 UTC (permalink / raw)
  To: linux-rdma


Bump IB management package versions:

	libibumad-1.3.3
	libibmad-1.3.3
	opensm-3.3.3
	infiniband-diags-1.5.3

Update mailing list name in configure.in files.

Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
---

On 21:58 Sun 01 Nov     , Sasha Khapyorsky wrote:
> From bf49c02f6eb474fcc25af40de68991de23ee629f Mon Sep 17 00:00:00 2001
> From: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
> Date: Sun, 1 Nov 2009 21:55:47 +0200
> Subject: [PATCH] management: bump package versions

Resending. I used a bad subject and format the first time, sorry: I hit
'send' too quickly.

 infiniband-diags/configure.in |    2 +-
 libibmad/configure.in         |    2 +-
 libibumad/configure.in        |    2 +-
 opensm/configure.in           |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in
index 3ef35cc..5865727 100644
--- a/infiniband-diags/configure.in
+++ b/infiniband-diags/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(infiniband-diags, 1.5.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(infiniband-diags, 1.5.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
 AM_INIT_AUTOMAKE
diff --git a/libibmad/configure.in b/libibmad/configure.in
index b4f5c41..ce31729 100644
--- a/libibmad/configure.in
+++ b/libibmad/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(libibmad, 1.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(libibmad, 1.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([src/sa.c])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
diff --git a/libibumad/configure.in b/libibumad/configure.in
index 6dbfeaf..3152491 100644
--- a/libibumad/configure.in
+++ b/libibumad/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(libibumad, 1.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(libibumad, 1.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([src/umad.c])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
diff --git a/opensm/configure.in b/opensm/configure.in
index 8a6b4c0..2c68cbd 100644
--- a/opensm/configure.in
+++ b/opensm/configure.in
@@ -1,7 +1,7 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(opensm, 3.3.2, general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org)
+AC_INIT(opensm, 3.3.3, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
 AC_CONFIG_SRCDIR([opensm/osm_opensm.c])
 AC_CONFIG_AUX_DIR(config)
 AC_CONFIG_HEADERS(include/config.h include/opensm/osm_config.h)
-- 
1.6.5.1


* P
       [not found]     ` <AANLkTilhV8JJTKxc4OpudTUgKqMyJ5mzcxt6XdSMurDS-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-06-12  9:10       ` Dotan Barak
       [not found]         ` <4C134EFC.5010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Dotan Barak @ 2010-06-12  9:10 UTC (permalink / raw)
  To: Ding Dinghua; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 12/06/2010 03:22, Ding Dinghua wrote:
> 2010/6/11 Dotan Barak<dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>    
>> Hi.
>>
>> On 11/06/2010 10:51, Ding Dinghua wrote:
>>      
>>> Hi all:
>>>           I'm using RDMA to do fs-metadata mirror between nodes. I
>>> encountered a strange problem when the program was running:
>>> Complete queue handler reported that the  RDMA-Write operation failed,
>>>   the status of  corresponding "struct ib_wc" is "IB_WC_RETRY_EXC_ERR".
>>> The problem is encountered randomly. I don't know the meaning of this
>>> error code as well as what to do next. Would anyone give me some tips?
>>> thanks a lot.
>>>
>>>        
>> Do you sync between the sides before closing the QPs?
>>      
> Can you say it more detail? thanks.
>    
If you try to send a message from a local QP to a remote QP before the
remote QP is in the RTR state (or after it was closed or moved to the
ERROR state), you may get RETRY EXCEEDED, because there is no QP on the
remote side that can accept your message (and send a response).
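
The condition described above can be sketched as a small check. This is an
illustration only: the enum is a local stand-in that follows the ordering of
the kernel's enum ib_qp_state, the function names are hypothetical, and the
rule is simplified (it ignores timing, path problems, and anything else that
can also exhaust the retries).

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in following the ordering of the kernel's enum ib_qp_state,
 * defined here only so the sketch compiles outside the kernel tree. */
enum qp_state_sketch {
    QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_SQD, QPS_SQE, QPS_ERR
};

/* Simplified rule: the responder side can accept and acknowledge a request
 * once the QP has reached RTR and has not been torn down or moved to ERR. */
bool remote_can_ack(enum qp_state_sketch s)
{
    assert(s >= QPS_RESET && s <= QPS_ERR);
    return s == QPS_RTR || s == QPS_RTS || s == QPS_SQD;
}

/* Hypothetical helper: a request sent to a QP that cannot acknowledge it is
 * retried until the retry counter runs out, and the sender then sees a
 * "retry exceeded" completion. */
bool request_would_exhaust_retries(enum qp_state_sketch remote)
{
    return !remote_can_ack(remote);
}
```

For example, request_would_exhaust_retries(QPS_INIT) is true, while the same
call with QPS_RTS is false.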

How do you connect the QPs? (And how do you close the connection between
them?)

Dotan

* Re: P
       [not found]         ` <4C134EFC.5010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-06-17  7:07           ` Ding Dinghua
       [not found]             ` <AANLkTimdeZwZI3FlTncXY_d3QY8jFfNhHERTxl3BD3Bd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-06-17  7:27           ` P Ding Dinghua
  1 sibling, 1 reply; 9+ messages in thread
From: Ding Dinghua @ 2010-06-17  7:07 UTC (permalink / raw)
  To: Dotan Barak; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Sorry for the late reply.

2010/6/12 Dotan Barak <dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On 12/06/2010 03:22, Ding Dinghua wrote:
>>
>> 2010/6/11 Dotan Barak<dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>>
>>>
>>> Hi.
>>>
>>> On 11/06/2010 10:51, Ding Dinghua wrote:
>>>
>>>>
>>>> Hi all:
>>>>          I'm using RDMA to do fs-metadata mirror between nodes. I
>>>> encountered a strange problem when the program was running:
>>>> Complete queue handler reported that the  RDMA-Write operation failed,
>>>>  the status of  corresponding "struct ib_wc" is "IB_WC_RETRY_EXC_ERR".
>>>> The problem is encountered randomly. I don't know the meaning of this
>>>> error code as well as what to do next. Would anyone give me some tips?
>>>> thanks a lot.
>>>>
>>>>
>>>
>>> Do you sync between the sides before closing the QPs?
>>>
>>
>> Can you say it more detail? thanks.
>>
>
> If you try to send a message from local QP to a remote QP before the remote
> QP is in RTR state (or after it was closed/transferred to the ERROR state),
> you may get RETRY EXCEEDED, because there isn't any QP in the remote side
> that can accept your message (and send a response).
>
> How do you connect the QPs? (And how do you close the connection between
> them)
>
I call rdma_create_id to create an IB ID, resolve the remote address and
the route, then set up the QP and call rdma_connect to establish the
connection. Until an ACK or an error reply arrives, the thread waits on a
wait queue. The listening IB ID on the remote node catches the connect
request, sets up its QP, allocates and maps pages to construct the
RDMA-WRITE space, and calls rdma_accept to reply to the request.

Some other information which may be useful:
1. All the "RETRY EXCEEDED" problems happened when there were two
connections using RDMA-WRITE to transfer data, and the later of the two
connections was much more likely to hit the problem.
2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE space
was 256MB per connection (that is, 512MB of memory for two connections).
With a 64MB RDMA-WRITE space the problem never happened in our tests.
The remote node's total memory is 2GB.

Thanks a lot.


> Dotan
>



-- 
Ding Dinghua

* Re: P
       [not found]         ` <4C134EFC.5010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2010-06-17  7:07           ` P Ding Dinghua
@ 2010-06-17  7:27           ` Ding Dinghua
  1 sibling, 0 replies; 9+ messages in thread
From: Ding Dinghua @ 2010-06-17  7:27 UTC (permalink / raw)
  To: Dotan Barak; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

2010/6/12 Dotan Barak <dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On 12/06/2010 03:22, Ding Dinghua wrote:
>>
>> 2010/6/11 Dotan Barak<dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>>
>>>
>>> Hi.
>>>
>>> On 11/06/2010 10:51, Ding Dinghua wrote:
>>>
>>>>
>>>> Hi all:
>>>>          I'm using RDMA to do fs-metadata mirror between nodes. I
>>>> encountered a strange problem when the program was running:
>>>> Complete queue handler reported that the  RDMA-Write operation failed,
>>>>  the status of  corresponding "struct ib_wc" is "IB_WC_RETRY_EXC_ERR".
>>>> The problem is encountered randomly. I don't know the meaning of this
>>>> error code as well as what to do next. Would anyone give me some tips?
>>>> thanks a lot.
>>>>
>>>>
>>>
>>> Do you sync between the sides before closing the QPs?
>>>
>>
>> Can you say it more detail? thanks.
>>
>
> If you try to send a message from local QP to a remote QP before the remote
> QP is in RTR state (or after it was closed/transferred to the ERROR state),
> you may get RETRY EXCEEDED, because there isn't any QP in the remote side
> that can accept your message (and send a response).
>
> How do you connect the QPs? (And how do you close the connection between
> them)
>
Sorry, I forgot the close issue.

1. The local node calls ib_poll_cq to process the remaining completion
queue entries.
2. The local node calls rdma_disconnect to tear down the connection; until
the remote side acks, the thread waits on a wait queue.
3. After catching this request, the remote node also calls ib_poll_cq to
process the remaining completion queue entries, does some
resource-release work, and then sends a reply.
4. The local node is woken up and does its resource-release work.


> Dotan
>



-- 
Ding Dinghua

* Re: P
       [not found]             ` <AANLkTimdeZwZI3FlTncXY_d3QY8jFfNhHERTxl3BD3Bd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-06-19 10:42               ` Dotan Barak
       [not found]                 ` <4C1C9EFC.4020304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Dotan Barak @ 2010-06-19 10:42 UTC (permalink / raw)
  To: Ding Dinghua; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


> I call rdma_create_id to create an ib id, then do resolve remote addr,
> resolve route work, then
> setup qp and call rdma_connect to setup connection, before ack or
> error replies, the thread will
> wait on a wait queue. The listening ib id of remote node will catch
> the connect request,
> setup qp, allocate and map pages to construct the RDMA-WRITE space,
> and call rdma_accept to reply
> the request.
>
> Some other information which may be useful:
> 1.All the "RETRY EXCEEDED" problems happened when there were two
> connections which use RDMA-WRITE to transfer things.
> And the latter connection had a high possibility to get into this problem.
> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
> space is 256MB each(that is, for two connections, consumes 512MB mem),
> when the RDMA-WRITE  space is 64MB, this problem never happened in our
> test. Remote node's total memory is 2GB.
>
> Thanks a lot.
>    
Some more questions:
* Is the WR that "produces" the RETRY EXCEEDED the first one, the last
one, or one in the middle?
* Which values are you using in the QP context for the retry counter and
the retry timeout?
* Did you try to increase those values?
* How many more QPs do you have between those nodes, and which operations
do they use (only RDMA-WRITEs)?

Thanks
Dotan

* Re: P
       [not found]                 ` <4C1C9EFC.4020304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-06-20  5:51                   ` Ding Dinghua
       [not found]                     ` <AANLkTinah43AD5N0ZryDsrGprkeVf9-BdLCyr125PQ3p-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Ding Dinghua @ 2010-06-20  5:51 UTC (permalink / raw)
  To: Dotan Barak; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

hello,

2010/6/19 Dotan Barak <dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>
>> I call rdma_create_id to create an ib id, then do resolve remote addr,
>> resolve route work, then
>> setup qp and call rdma_connect to setup connection, before ack or
>> error replies, the thread will
>> wait on a wait queue. The listening ib id of remote node will catch
>> the connect request,
>> setup qp, allocate and map pages to construct the RDMA-WRITE space,
>> and call rdma_accept to reply
>> the request.
>>
>> Some other information which may be useful:
>> 1.All the "RETRY EXCEEDED" problems happened when there were two
>> connections which use RDMA-WRITE to transfer things.
>> And the latter connection had a high possibility to get into this problem.
>> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
>> space is 256MB each(that is, for two connections, consumes 512MB mem),
>> when the RDMA-WRITE  space is 64MB, this problem never happened in our
>> test. Remote node's total memory is 2GB.
>>
>> Thanks a lot.
>>
>
> Some more questions:
> * Is the WR that "produces" the RETRY EXCEEDED is the first one/last one/in
> the middle?

it's the first one

> * Which values are you using in the QP context for retry exceeded counter +
> retry timeout?
> * Did you try to increase those values?

I haven't set these values (actually, I don't know where to set them).
I just set the max_send_wr and max_send_sge fields of struct ib_qp_cap
when creating the QP.
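
For a connection made through the RDMA CM, the transport retry count and the
RNR retry count are normally supplied in the retry_count and rnr_retry_count
fields of the struct rdma_conn_param passed to rdma_connect. The sketch below
uses a hypothetical stand-in struct with only those two fields so that it
compiles outside the kernel tree. The clamp helper and the limit macro are
assumptions for illustration, based on both values being 3-bit quantities
(0 to 7).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for two members of struct rdma_conn_param;
 * the real structure has more members than shown here. */
struct conn_param_sketch {
    uint8_t retry_count;     /* retries after a missing ACK */
    uint8_t rnr_retry_count; /* retries after a receiver-not-ready NAK */
};

/* Both counters are carried as 3-bit fields, so 7 is the largest value. */
#define RETRY_FIELD_MAX 7u

/* Clamp a requested retry value into the range the field can carry. */
uint8_t clamp_retry(unsigned requested)
{
    return (uint8_t)(requested > RETRY_FIELD_MAX ? RETRY_FIELD_MAX
                                                 : requested);
}

/* Build the stand-in parameters from the requested retry values. */
struct conn_param_sketch make_conn_param(unsigned retry, unsigned rnr_retry)
{
    struct conn_param_sketch p;

    p.retry_count = clamp_retry(retry);
    p.rnr_retry_count = clamp_retry(rnr_retry);
    assert(p.retry_count <= RETRY_FIELD_MAX);
    assert(p.rnr_retry_count <= RETRY_FIELD_MAX);
    return p;
}
```

In real code the two values would be assigned to the corresponding members of
struct rdma_conn_param before calling rdma_connect; an rnr_retry_count of 7 is
conventionally treated as "retry indefinitely".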

> * How many more QPs do you have between those nodes and which operations do
> they use
>   (only RDMA-WRITEs?)
>

4096 QPs for each connection; they only do RDMA-WRITEs.

> Thanks
> Dotan
>



-- 
Ding Dinghua

* Re: P
       [not found]                     ` <AANLkTinah43AD5N0ZryDsrGprkeVf9-BdLCyr125PQ3p-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-06-20 18:43                       ` Dotan Barak
       [not found]                         ` <4C1E6134.6070304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Dotan Barak @ 2010-06-20 18:43 UTC (permalink / raw)
  To: Ding Dinghua; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 20/06/2010 07:51, Ding Dinghua wrote:
> hello,
>
> 2010/6/19 Dotan Barak<dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>    
>>      
>>> I call rdma_create_id to create an ib id, then do resolve remote addr,
>>> resolve route work, then
>>> setup qp and call rdma_connect to setup connection, before ack or
>>> error replies, the thread will
>>> wait on a wait queue. The listening ib id of remote node will catch
>>> the connect request,
>>> setup qp, allocate and map pages to construct the RDMA-WRITE space,
>>> and call rdma_accept to reply
>>> the request.
>>>
>>> Some other information which may be useful:
>>> 1.All the "RETRY EXCEEDED" problems happened when there were two
>>> connections which use RDMA-WRITE to transfer things.
>>> And the latter connection had a high possibility to get into this problem.
>>> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
>>> space is 256MB each(that is, for two connections, consumes 512MB mem),
>>> when the RDMA-WRITE  space is 64MB, this problem never happened in our
>>> test. Remote node's total memory is 2GB.
>>>
>>> Thanks a lot.
>>>
>>>        
>> Some more questions:
>> * Is the WR that "produces" the RETRY EXCEEDED is the first one/last one/in
>> the middle?
>>      
> it's the first one
>
>    
>> * Which values are you using in the QP context for retry exceeded counter +
>> retry timeout?
>> * Did you try to increase those values?
>>      
> I haven't set these values(actually  I don't know where to set these
> values), i just set max_send_wr and max_send_sge
> fields of struct ib_qp_cap when creating qp.
>
>    
Can you query the QPs after establishing a connection between them
and check those values?

>> * How many more QPs do you have between those nodes and which operations do
>> they use
>>    (only RDMA-WRITEs?)
>>
>>      
> 4096 QPs for each connection,  only do RDMA-WRITES.
>    
So, you send in parallel a total of 4K (QPs) * 64M (Bytes) = 256 GB?
(Am I missing something, or is this the amount of data that will be sent
between the two nodes?)
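
The estimate above can be checked with a few lines of C. This is only the
arithmetic, under the premise the question asks about (every QP writing its
own full-size region); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes if each of qp_count QPs wrote its own region_bytes region. */
uint64_t total_bytes(uint64_t qp_count, uint64_t region_bytes)
{
    /* Refuse inputs whose product would overflow 64 bits. */
    assert(qp_count == 0 || region_bytes <= UINT64_MAX / qp_count);
    return qp_count * region_bytes;
}
```

With 4096 QPs and a 64 MiB region, total_bytes(4096, 64ULL << 20) equals
256ULL << 30, that is 256 GiB.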

Dotan

* Re: P
       [not found]                         ` <4C1E6134.6070304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-06-21  2:30                           ` Ding Dinghua
  0 siblings, 0 replies; 9+ messages in thread
From: Ding Dinghua @ 2010-06-21  2:30 UTC (permalink / raw)
  To: Dotan Barak; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

2010/6/21 Dotan Barak <dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> On 20/06/2010 07:51, Ding Dinghua wrote:
>>
>> hello,
>>
>> 2010/6/19 Dotan Barak<dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>>
>>>
>>>
>>>>
>>>> I call rdma_create_id to create an ib id, then do resolve remote addr,
>>>> resolve route work, then
>>>> setup qp and call rdma_connect to setup connection, before ack or
>>>> error replies, the thread will
>>>> wait on a wait queue. The listening ib id of remote node will catch
>>>> the connect request,
>>>> setup qp, allocate and map pages to construct the RDMA-WRITE space,
>>>> and call rdma_accept to reply
>>>> the request.
>>>>
>>>> Some other information which may be useful:
>>>> 1.All the "RETRY EXCEEDED" problems happened when there were two
>>>> connections which use RDMA-WRITE to transfer things.
>>>> And the latter connection had a high possibility to get into this
>>>> problem.
>>>> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
>>>> space is 256MB each(that is, for two connections, consumes 512MB mem),
>>>> when the RDMA-WRITE  space is 64MB, this problem never happened in our
>>>> test. Remote node's total memory is 2GB.
>>>>
>>>> Thanks a lot.
>>>>
>>>>
>>>
>>> Some more questions:
>>> * Is the WR that "produces" the RETRY EXCEEDED is the first one/last
>>> one/in
>>> the middle?
>>>
>>
>> it's the first one
>>
>>
>>>
>>> * Which values are you using in the QP context for retry exceeded counter
>>> +
>>> retry timeout?
>>> * Did you try to increase those values?
>>>
>>
>> I haven't set these values(actually  I don't know where to set these
>> values), i just set max_send_wr and max_send_sge
>> fields of struct ib_qp_cap when creating qp.
>>
>>
>
> Can you perform query QP after establishing a connection between the QPs and
> check those values?
>

All the QPs (local and remote, 2 connections) report:
3 qp_state, 19 retry_cnt, 7 timeout. That is, every QP's qp_state is
IB_QPS_RTS. (Should the remote QP's state be this, or IB_QPS_RTR? The QPs
of the first connection are in the same state, and that connection works...)
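
To put the reported numbers in context: the QP timeout attribute is an
exponent, with the local ACK timeout equal to 4.096 us * 2^timeout (0 means
no timeout), and the retry count is a 3-bit value, so it cannot exceed 7.
A retry_cnt of 19 therefore does not fit the field. One possible reading is
that the two labels were swapped in the report, but the thread does not
confirm this. The helper names below are hypothetical; the sketch only does
the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Local ACK timeout in nanoseconds for a 5-bit timeout exponent:
 * 4.096 us * 2^timeout. Returns 0 for exponent 0 (timeout disabled). */
uint64_t ack_timeout_ns(unsigned timeout)
{
    assert(timeout <= 31);
    if (timeout == 0)
        return 0;
    return 4096ULL << timeout;
}

/* The retry count is a 3-bit quantity: valid values are 0 through 7. */
int retry_cnt_in_range(unsigned retry_cnt)
{
    return retry_cnt <= 7;
}
```

An exponent of 7 gives 524288 ns (about 0.5 ms) per attempt, while an exponent
of 19 gives 2147483648 ns (about 2.1 s).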

>>> * How many more QPs do you have between those nodes and which operations
>>> do
>>> they use
>>>   (only RDMA-WRITEs?)
>>>
>>>
>>
>> 4096 QPs for each connection,  only do RDMA-WRITES.
>>
>
> So, you send in parallel total of 4K (QPs) * 64M (Bytes)  = 256 GB
> (am i missing something, or this is the amount of data that will be sent
> between two nodes?)
>
If the RDMA-WRITE space is 64M (Bytes), the upper-level application sends
at most 64M (Bytes) to the remote node at one time. These QPs may write to
different pieces of the 64M space in parallel.

> Dotan
>



-- 
Ding Dinghua

end of thread, other threads:[~2010-06-21  2:30 UTC | newest]

Thread overview: 9+ messages
2009-11-01 19:58 p Sasha Khapyorsky
2009-11-01 20:01 ` [PATCH] management: bump package versions Sasha Khapyorsky
  -- strict thread matches above, loose matches on Subject: below --
2010-06-11  8:51 A strange problem when using IB to transfer things Ding Dinghua
2010-06-11 15:30 ` Dotan Barak
2010-06-12  1:22   ` Ding Dinghua
     [not found]     ` <AANLkTilhV8JJTKxc4OpudTUgKqMyJ5mzcxt6XdSMurDS-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-06-12  9:10       ` P Dotan Barak
     [not found]         ` <4C134EFC.5010207-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-06-17  7:07           ` P Ding Dinghua
     [not found]             ` <AANLkTimdeZwZI3FlTncXY_d3QY8jFfNhHERTxl3BD3Bd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-06-19 10:42               ` P Dotan Barak
     [not found]                 ` <4C1C9EFC.4020304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-06-20  5:51                   ` P Ding Dinghua
     [not found]                     ` <AANLkTinah43AD5N0ZryDsrGprkeVf9-BdLCyr125PQ3p-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-06-20 18:43                       ` P Dotan Barak
     [not found]                         ` <4C1E6134.6070304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-06-21  2:30                           ` P Ding Dinghua
2010-06-17  7:27           ` P Ding Dinghua
