From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steve Wise
Subject: Re: ib_post_send execution time
Date: Fri, 24 Oct 2014 10:52:21 -0500
Message-ID: <544A75B5.1080304@opengridcomputing.com>
References: <20141024003933.GA30941@mtldesk30>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Or Gerlitz , Eli Cohen
Cc: Roland Dreier , Evgenii Smirnov , linux-rdma
List-Id: linux-rdma@vger.kernel.org

On 10/24/2014 6:30 AM, Or Gerlitz wrote:
> On Fri, Oct 24, 2014 at 3:39 AM, Eli Cohen wrote:
>> On Thu, Oct 23, 2014 at 11:45:05AM -0700, Roland Dreier wrote:
>>> On Thu, Oct 23, 2014 at 10:21 AM, Evgenii Smirnov
>>> wrote:
>>>> I am trying to achieve high packet per second throughput with 2-byte
>>>> messages over Infiniband from kernel using IB_SEND verb. The most I
>>>> can get so far is 3.5 Mpps. However, ib_send_bw utility from perftest
>>>> package is able to send 2-byte packets with rate of 9 Mpps.
>>>> After some profiling I found that execution of ib_post_send function
>>>> in kernel takes about 213 ns in average, for the user-space function
>>>> ibv_post_send takes only about 57 ns.
>>>> As I understand, these functions do almost same operations. The work
>>>> request fields and queue pair parameters are also the same. Why do
>>>> they have such big difference in execution times?
>>>
>>> Interesting. I guess it would be useful to look at perf top / and or
>>> get a perf report with "perf report -a -g" when running your high PPS
>>> workload, and see where the time is wasted.
>>>
>> I assume ib_send_bw uses inline with blueflame so it may be part of
>> the explanation to the differences you see.
> I think it should be the other way around... when we use inline we
> consume more CPU cycles and here we see notable different (213ns --
> kernel 57ns user) in favor of libmlx4
>

Inline may consume more cpu cycles but should reduce latency because the
IO is completed with only 1 DMA transaction, the WR fetch, which
includes the data. Non-inline requires 2 DMA transactions, the WR fetch
and the data fetch.
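
To keep the two quantities in this thread apart (CPU time per post versus per-message latency), here is a rough back-of-the-envelope sketch in plain C. It is not verbs code. The helper names `mpps_ceiling` and `dma_reads_per_send` are made up for illustration, and the model assumes a single thread posting one work request per call, back to back, with no other per-packet cost.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: upper bound on posts per second, in millions,
 * for ONE thread that does nothing but call the post function back to
 * back (an assumption; real workloads also poll completions, etc.).
 */
double mpps_ceiling(double ns_per_post)
{
    assert(ns_per_post > 0.0);
    /* 1e9 ns per second / ns per post / 1e6 = 1000 / ns per post */
    return 1000.0 / ns_per_post;
}

/*
 * Hypothetical helper: DMA reads the adapter issues per send, as
 * described in the reply above. Inline carries the payload inside the
 * WR, so only the WR fetch is needed; non-inline needs the WR fetch
 * plus a separate data fetch.
 */
int dma_reads_per_send(bool inline_data)
{
    return inline_data ? 1 : 2;
}
```

Under those assumptions, `mpps_ceiling(213.0)` comes out at roughly 4.7 Mpps and `mpps_ceiling(57.0)` at roughly 17.5 Mpps. Both are above the 3.5 Mpps and 9 Mpps reported earlier in the thread, so the post call alone would not account for the whole per-packet cost in either case.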