* Performance issue for a simple RDMA PingPong
@ 2024-10-03 7:44 IVANE ADAM
2024-10-06 13:31 ` Leon Romanovsky
2024-10-07 9:20 ` Konstantin Taranov
0 siblings, 2 replies; 3+ messages in thread
From: IVANE ADAM @ 2024-10-03 7:44 UTC (permalink / raw)
To: linux-rdma
Hello everyone,
I'm currently having a performance issue when synchronizing two nodes with a simple ping/pong algorithm.
I have two simple programs that summarize my issue:
The first one works as intended, and loops as follows on both the client and server sides:
- post a send work request
- post a receive work request
- wait for both completions, acknowledge them and continue.
This small program works as intended, and I'm able to complete 100k requests in 2–3 seconds.
However, the second program is as follows:
The client is identical to the one in the first program.
The server does:
- post a receive work request
- wait for its completion and acknowledge it
- post a send work request
- wait for its completion and acknowledge it
When I do this, a single request can take up to 2 seconds to complete (most of it spent inside "ibv_get_cq_event()").
Furthermore, we observed that this happens more often when multiple threads try to do this in sync (unlike with the first program).
Note: I was able to reproduce this issue only with send/recv, and never with read/write operations.
I tried searching for this issue, but found nothing related.
I was able to test this on multiple configurations:
Linux version: 5.10.0-20-amd64, Linux distribution: Debian 11 and Debian 12
We have an Omni-Path network, configured at 100 Gb/s (Intel Omni-Path HFI Silicon 100 Series [discrete] with the hfi1 driver). Firmware version: 1.27.0
Or an InfiniBand network, configured at 100 Gb/s (Mellanox Technologies MT28908 Family [ConnectX-6] with the mlx5_core driver). Firmware version: 20.29.2002
I tried the latest version of rdma-core for Debian 11 and Debian 12, with the same issue.
The programs were all compiled with gcc, with either -O3 or -O0 optimisation, without any change in the communication time.
The full code for the example described above can be found on GitHub under: IAdamUGA/RDMAPerfIssue
I don't know whether this is expected behavior or a known issue.
I'm relatively new to this domain and I might not know about the different tools that could help me debug this.
Thank you in advance for your help.
Best regards.
__________________________
Ivane ADAM
PhD student, LIG, Erods team
ivane.adam@univ-grenoble-alpes.fr
* Re: Performance issue for a simple RDMA PingPong
From: Leon Romanovsky @ 2024-10-06 13:31 UTC (permalink / raw)
To: IVANE ADAM; +Cc: linux-rdma
On Thu, Oct 03, 2024 at 09:44:11AM +0200, IVANE ADAM wrote:
> Hello everyone,
>
> I'm currently having a performance issue to synchronize two different nodes with a simple ping/pong algorithm.
<...>
> I was able to test this on multiple configuration:
> Linux version : 5.10.0-20-amd64, linux distribution : Debian11 & Debian12
> We have a Omni-Path network, configured to 100Gb/s ( Intel Omni-Path HFI Silicon 100 Series [discrete] with the hfi1 driver) Firmware version: 1.27.0
> Or a Infiniband network, configured to 100Gb/s (Mellanox Technologies MT28908 Family [ConnectX-6] with the mlx5_core driver) Firmware version: 20.29.2002
Please work with your FAEs, they will help you to resolve the issue.
Thanks
* RE: Performance issue for a simple RDMA PingPong
From: Konstantin Taranov @ 2024-10-07 9:20 UTC (permalink / raw)
To: IVANE ADAM, linux-rdma
> Hello everyone,
>
> I'm currently having a performance issue to synchronize two different nodes
> with a simple ping/pong algorithm.
> I currently have two different simple code to resume my issue :
>
> The first one work as intended, and loop as follow on both client and server
> sides :
> - post a send work request
> - post a receive work request
> - wait both completion, acknowledge them and continue.
> This little piece of program work as intended, and I'm able to complete 100k
> request in 2–3 seconds.
>
> However, the second code is as follows :
> The client is identical as the first code.
> The server do :
> - post a receive work request
> - wait its completion and acknowledge it
> - post a send work request
> - wait its completion and acknowledge it
>
> When I do this, it happens that the time to complete a request can take up
> to 2 seconds (most of it inside the "ibv_get_cq_event()")
> Furthermore, we observed that, this happens more often when multiple threads
> try to do this in synch (unlike first code).
Hey,
The problem you are experiencing is that the responder has no receive buffer posted when an incoming Send packet arrives.
That is why you do not see it with Writes and Reads.
To check this, you can configure your RC QP with zero RNR NAK retries (rnr_retry = 0).
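To illustrate why rnr_retry = 0 makes the problem visible, here is a toy Python model (not libibverbs code). The function name and the returned strings are made up for illustration. The only verbs facts it relies on are that rnr_retry = 7 means "retry indefinitely" and that exhausting the retries completes the Send with IBV_WC_RNR_RETRY_EXC_ERR.

```python
INFINITE_RNR_RETRY = 7  # in the verbs API, rnr_retry = 7 means "retry forever"


def send_outcome(rnr_retry: int, rnr_naks: int) -> str:
    """Toy model: status of one Send that the responder answers with
    `rnr_naks` RNR NAKs before a receive buffer finally becomes available.

    Hypothetical helper for illustration only; real code would read the
    status field of the work completion returned by ibv_poll_cq().
    """
    if not 0 <= rnr_retry <= 7:
        raise ValueError("rnr_retry is a 3-bit field (0..7)")
    if rnr_naks < 0:
        raise ValueError("rnr_naks must be non-negative")
    if rnr_naks == 0:
        return "IBV_WC_SUCCESS"
    # Each RNR NAK consumes one retry, unless retries are unlimited.
    if rnr_retry == INFINITE_RNR_RETRY or rnr_naks <= rnr_retry:
        return "IBV_WC_SUCCESS (after RNR delay)"
    return "IBV_WC_RNR_RETRY_EXC_ERR"


if __name__ == "__main__":
    print(send_outcome(rnr_retry=7, rnr_naks=3))
    print(send_outcome(rnr_retry=0, rnr_naks=1))
```

With retries enabled, a missing receive buffer shows up only as extra latency. With rnr_retry = 0, the same situation surfaces as an error completion on the sender, which is much easier to spot.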
You may wonder why this happens. The key is that a Send WC is generated after an RTT (as the WC indicates reliable reception of the data), whereas a Receive WC is generated as soon as the data arrives (as the Receive WR is consumed). You have the following problem with a new client:
The new client waits for a Send WC before posting a new Receive WR, but the server posts its Send WR as soon as it sees a Receive completion.
As a result, the client has not posted a Receive WR in time (as it saw the Send WC right before the incoming Send message), which incurs an RNR NAK retransmit.
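The race can be sketched with a small timing model in Python. This is a simplified illustration, not a measurement: the Timing fields, the function name and the numbers are all assumptions, and the real ordering depends on the NIC, the driver and thread scheduling.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    one_way_us: float            # assumed one-way wire latency
    server_turnaround_us: float  # server: Receive WC seen -> reply on the wire
    client_wakeup_us: float      # client: Send WC generated -> Receive WR posted


def reply_hits_empty_queue(t: Timing, recv_posted_before_send: bool) -> bool:
    """Return True if the server's reply reaches the client before the
    client has posted a Receive WR (the case that triggers an RNR NAK).

    Time 0 is the moment the client posts its Send WR.
    """
    rtt = 2 * t.one_way_us
    # Request travels one way, server turns it around, reply travels back.
    reply_arrival = t.one_way_us + t.server_turnaround_us + t.one_way_us
    if recv_posted_before_send:
        recv_posted_at = 0.0  # the buffer is already there when the Send goes out
    else:
        # The Send WC shows up after about one RTT; only then is the receive posted.
        recv_posted_at = rtt + t.client_wakeup_us
    return reply_arrival < recv_posted_at


if __name__ == "__main__":
    t = Timing(one_way_us=1.0, server_turnaround_us=0.5, client_wakeup_us=5.0)
    print(reply_hits_empty_queue(t, recv_posted_before_send=False))
    print(reply_hits_empty_queue(t, recv_posted_before_send=True))
```

In this model the race disappears whenever the Receive WR for the reply is posted before the Send WR that triggers it, whatever the wake-up delay is.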
- Konstantin