From: Dotan Barak <dotanba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: neutron <neutronsharc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Paul Grun
<pgrun-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: back to back RDMA read fail?
Date: Wed, 11 Nov 2009 11:52:41 +0200
Message-ID: <4AFA8969.501@gmail.com>
In-Reply-To: <7d5928b30911100731o24941445wfb8be19e2b0cc1fb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi.
How do you connect the QPs:
via CM/CMA, or over sockets (in which case you call ibv_modify_qp() yourself)?
Dotan
neutron wrote:
> Hi Paul, thanks a lot for your quick reply!
>
> In my test, the client informs the server of its local memory (rkey,
> addr, size) by sending 4 back-to-back messages; each message elicits
> an RDMA read request (RR) from the server.
>
> In other words, the client exposes its memory to the server, and the
> server RDMA-reads it.
>
> As far as the RDMA read is concerned, the server is the requester and
> the client is the responder, right?
>
> The error I encountered happens in the initial phase, when the client
> sends 4 back-to-back messages to the server (using ibv_post_send()),
> each containing the (rkey, addr, size) of the client's local memory.
>
> Of these 4 ibv_post_send() calls, the client sees one fail. On the
> server side, the server has already posted enough WRs to the RQ. The
> failures are included in my first email.
>
> Looking at the program output, it appears that the server gets message
> 1, issues RR 1, gets message 2, and issues RR 2. But somehow the client
> reports that "send message 2" fails.
>
> The server, on the other hand, reports that "receive message 3" fails.
>
> As a result, the server gets messages 1, 2, 4 and succeeds with RR 1, 2, 4,
> while the client sees message 2 fail and messages 1, 3, 4 succeed.
> This inconsistency is what puzzles me.
>
> ------------
> By the way, how should the RDMA parameters be interpreted, and which
> parameters control RDMA read behavior? Below are the ones I could
> find; there must be more...
>
> max_qp_rd_atom: 4
> max_res_rd_atom: 258048
> max_qp_init_rd_atom: 128
>
> qp_attr.max_dest_rd_atomic
> qp_attr.max_rd_atomic
>
>
>
> -neutron
>
>
>
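On the parameter question: as far as I understand the verbs
documentation, max_qp_rd_atom is the per-QP limit when the HCA acts as
responder (it bounds qp_attr.max_dest_rd_atomic), max_qp_init_rd_atom is
the per-QP limit when it acts as requester (it bounds
qp_attr.max_rd_atomic), and max_res_rd_atom is the HCA-wide total of
responder resources. A minimal sketch of how the two QP values could be
chosen follows. The structs are plain stand-ins for the relevant
ibv_device_attr and ibv_qp_attr fields, and the names dev_caps,
rd_atomic_cfg and pick_rd_atomic are hypothetical, not part of
libibverbs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ibv_device_attr fields discussed here. */
struct dev_caps {
    int max_qp_rd_atom;      /* per-QP limit when acting as responder */
    int max_qp_init_rd_atom; /* per-QP limit when acting as requester */
};

/* Hypothetical stand-in for the two ibv_qp_attr fields discussed here. */
struct rd_atomic_cfg {
    uint8_t max_rd_atomic;      /* reads/atomics this QP may have in flight */
    uint8_t max_dest_rd_atomic; /* incoming reads/atomics this QP can service */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Pick values for the local QP.
 *   wanted               depth the application would like to use
 *   peer_dest_rd_atomic  the max_dest_rd_atomic the peer says it configured
 *                        (assumed to be exchanged out of band, e.g. over
 *                        the same socket as the QP number)
 */
static struct rd_atomic_cfg pick_rd_atomic(struct dev_caps local,
                                           int peer_dest_rd_atomic,
                                           int wanted)
{
    struct rd_atomic_cfg cfg;
    int as_responder, as_requester;

    assert(wanted >= 0 && peer_dest_rd_atomic >= 0);
    wanted = min_int(wanted, UINT8_MAX); /* both QP fields are 8 bits wide */

    /* Incoming side: bounded only by the local responder limit. */
    as_responder = min_int(wanted, local.max_qp_rd_atom);

    /* Outgoing side: bounded by the local initiator limit and by
     * what the peer can service as responder. */
    as_requester = min_int(wanted, local.max_qp_init_rd_atom);
    as_requester = min_int(as_requester, peer_dest_rd_atomic);

    cfg.max_rd_atomic = (uint8_t)as_requester;
    cfg.max_dest_rd_atomic = (uint8_t)as_responder;
    return cfg;
}
```

With the values listed above (max_qp_rd_atom 4, max_qp_init_rd_atom 128)
and a peer that configured max_dest_rd_atomic to 4, this would cap both
fields at 4 regardless of how large "wanted" is.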
> On Tue, Nov 10, 2009 at 2:04 AM, Paul Grun <pgrun-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>
>> Is it possible that you exceeded the number of RDMA Read Resources
>> available on the server? There is an expectation that the client knows how
>> many outstanding RDMA Read Requests the responder (server) is capable of
>> handling; if the requester (client) exceeds that number, the responder will
>> indeed return a NAK-Invalid Request. Sounds like your server is configured
>> to accept three outstanding RDMA Read Requests.
>> This also explains why it works when you pause the program periodically...it
>> gives the responder time to generate the RDMA Read Responses and therefore
>> free up some resources to be used in receiving the next incoming RDMA Read
>> Request.
>>
>> -Paul
>>
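If responder resources do turn out to be the limit, one way to avoid
relying on timing would be for the side that issues the RDMA reads to
count them explicitly and hold back further reads until a completion
arrives. This is only a sketch of that bookkeeping: the struct and
function names are hypothetical, and the real program would call
ibv_post_send() with an RDMA read WR after a successful acquire and
release one credit per read completion returned by ibv_poll_cq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping kept by the side that issues RDMA reads. */
struct read_credits {
    int limit;     /* depth the responder can service (its max_dest_rd_atomic) */
    int in_flight; /* reads posted but not yet completed */
};

static void credits_init(struct read_credits *c, int limit)
{
    assert(limit > 0);
    c->limit = limit;
    c->in_flight = 0;
}

/* Call before posting a read; returns false if the caller must wait
 * for a completion first. */
static bool credits_try_acquire(struct read_credits *c)
{
    if (c->in_flight >= c->limit)
        return false;
    c->in_flight++;
    return true;
}

/* Call once for every read completion polled from the CQ. */
static void credits_release(struct read_credits *c)
{
    assert(c->in_flight > 0);
    c->in_flight--;
}
```

When credits_try_acquire() returns false, the caller would queue the
pending (rkey, addr, size) tuple and retry after the next release.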
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of neutron
>> Sent: Monday, November 09, 2009 9:04 PM
>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: back to back RDMA read fail?
>>
>> Hi all,
>>
>> I have a simple program that tests back-to-back RDMA read performance.
>> However, I encountered errors for unknown reasons.
>>
>> The basic flow of my program is:
>>
>> client:
>> ibv_post_send() is called to send 4 back-to-back messages to the server
>> (no delay in between). Each message contains the (rkey, addr, size) of a
>> local buffer. The buffer is registered with remote read/write permissions.
>> After that, ibv_poll_cq() is called to wait for completion.
>>
>> server:
>> First, enough receive WRs are posted to the RQ. Upon receipt of each
>> message, the server immediately posts an RDMA read request, using the
>> (rkey, addr, size) information contained in the originating message.
>>
>> --------------
>> Both client and server use RC QPs. Some errors are observed.
>>
>> On the client side, ibv_poll_cq() gets 4 CQEs, one of which is an error:
>> CQ:: wr_id=0x0, wc_opcode=IBV_WC_SEND, wc_status=remote invalid RD
>> request, wc_flag=0x3b
>> byte_len=11338758, immdata=1110104528, qp_num=0x0, src_qp=2290530758
>>
>> The other 3 CQEs are successful.
>>
>> On the server side, 3 of the 4 messages are successfully received. One
>> message produces an error CQE:
>> CQ:: wr_id=0x8000000000, wc_opcode=Unknow-wc-opcode,
>> wc_status=unknown, wc_flag=0x0
>> byte_len=9569287, immdata=0, qp_num=0x0, src_qp=265551872
>>
>> The 3 RDMA reads corresponding to the successful receives all succeed.
>>
>> But if I pause the client program for a short while (usleep(100), for
>> example) after calling ibv_post_send(), then no error occurs.
>> Can anyone point out the pitfall here? Thanks!
>>
>>
>> -----------
>> On both client and server, I'm using 'mthca0', type MT25208. The QPs
>> are initialized with "qp_attr.max_dest_rd_atomic = 4,
>> qp_attr.max_rd_atomic = 4". "ibv_devinfo -v" gives the following
>> information:
>>
>> hca_id: mthca0
>>     fw_ver:                     5.1.400
>>     node_guid:                  0002:c902:0023:c04c
>>     sys_image_guid:             0002:c902:0023:c04f
>>     vendor_id:                  0x02c9
>>     vendor_part_id:             25218
>>     hw_ver:                     0xA0
>>     board_id:                   MT_0370130002
>>     phys_port_cnt:              2
>>     max_mr_size:                0xffffffffffffffff
>>     page_size_cap:              0xfffff000
>>     max_qp:                     64512
>>     max_qp_wr:                  16384
>>     device_cap_flags:           0x00001c76
>>     max_sge:                    27
>>     max_sge_rd:                 0
>>     max_cq:                     65408
>>     max_cqe:                    131071
>>     max_mr:                     131056
>>     max_pd:                     32764
>>     max_qp_rd_atom:             4
>>     max_ee_rd_atom:             0
>>     max_res_rd_atom:            258048
>>     max_qp_init_rd_atom:        128
>>     max_ee_init_rd_atom:        0
>>     atomic_cap:                 ATOMIC_HCA (1)
>>     max_ee:                     0
>>     max_rdd:                    0
>>     max_mw:                     0
>>     max_raw_ipv6_qp:            0
>>     max_raw_ethy_qp:            0
>>     max_mcast_grp:              8192
>>     max_mcast_qp_attach:        56
>>     max_total_mcast_qp_attach:  458752
>>     max_ah:                     0
>>     max_fmr:                    0
>>     max_srq:                    960
>>     max_srq_wr:                 16384
>>     max_srq_sge:                27
>>     max_pkeys:                  64
>>     local_ca_ack_delay:         15
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>>