public inbox for linux-rdma@vger.kernel.org

* RDMA reads/writes per second
@ 2013-10-29  0:28 Anuj Kalia
  2013-10-30 16:01 ` Ido Shamai
  0 siblings, 1 reply; 3+ messages in thread
From: Anuj Kalia @ 2013-10-29  0:28 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi.

I'm measuring the number of RDMA reads and writes per second. In my
experimental setup I have one server connected to several clients and
I want to extract the maximum IOs from the server. I had two questions
regarding this:

1. What is the expected number of small (16-byte) RDMA reads per
second for ConnectX-3 cards? Currently, I've seen a maximum of 9
million reads per second with my code. However, several websites
report much higher message rates. For example,
http://www.marketwatch.com/story/mellanox-fdr-56gbs-infiniband-solutions-deliver-leading-application-performance-and-scalability-2013-06-17
talks about 137 million messages per second, and
http://www.mellanox.com/pdf/products/oem/RG_HP.pdf reports 40 million
MPI messages per second. What sort of optimizations could I do to
reach similar numbers?

2. The number of IOPS drops when the size of the registered region
increases. For a 1 KB registered region, the maximum random reads per
second that the server can provide is around 9 million. It drops to 2
million when I increase the registered size to 1 GB.
What is the reason behind this? Does the HCA perform caching for
reads? That could be a possible explanation. Another possible reason
is TLB misses in the HCA.
Further, I'm seeing even greater variation with writes. I can think of
2 possible explanations for that:
a. As my writes are to random locations, there could be more TLB
misses for larger registered regions.
b. The HCA buffers writes locally and does not transfer them into the
CPU memory immediately (this can be done only for small registered
regions).

Thanks for your time!
I'm sorry if the list receives more than one copy of this email. I've
been running into an HTML rejection error.

Anuj Kalia,
Carnegie Mellon University
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RDMA reads/writes per second
  2013-10-29  0:28 RDMA reads/writes per second Anuj Kalia
@ 2013-10-30 16:01 ` Ido Shamai
  0 siblings, 0 replies; 3+ messages in thread
From: Ido Shamai @ 2013-10-30 16:01 UTC (permalink / raw)
  To: Anuj Kalia, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

With ConnectX-3, the maximum is around 35M IOs per second.
The 137M figure refers to the Connect-IB HCA (not ConnectX-3).

Anyway, if you are using a single process in this test, then 9M is
about the highest you can get.
This limitation comes from the SW layer: the post_send function takes
roughly 100 ns per IO, so a single posting process is bounded to about
10M IOs per second.
Rates above that (up to the hardware maximum) can be achieved with
multiple parallel processes, or by using a post list to issue several
IOs in a single post_send call.
Perftest has a nice demonstration of how to achieve this:
https://openfabrics.org/downloads/perftest/

As for the second issue: if you randomize each IO's target address,
then the bigger the buffer, the greater the chance of an HCA TLB miss.
You can improve the IO rate by using 64B-aligned accesses (in both
directions) for each IO transaction.
I believe you can get around 5M that way, even if every transaction
causes an HCA TLB miss.

I do not see a reason why WRITE should differ from READ in terms of
IOs, assuming you randomize both sides in either scenario.
Also, I don't see a reason for the size of the registered region to
matter here.
If you are using Sandy Bridge (Xeon E5 series), then incoming data is
written into the L3 cache, regardless of the size of the registered
area.

Ido



^ permalink raw reply	[flat|nested] 3+ messages in thread
