* ib_post_send execution time
@ 2014-10-23 17:21 Evgenii Smirnov
From: Evgenii Smirnov @ 2014-10-23 17:21 UTC (permalink / raw)
To: linux-rdma
Hello,
I am trying to achieve a high packet-per-second rate with 2-byte
messages over InfiniBand from the kernel, using the IB_SEND verb. The
most I can get so far is 3.5 Mpps. However, the ib_send_bw utility from
the perftest package is able to send 2-byte packets at a rate of 9 Mpps.

After some profiling I found that the kernel function ib_post_send
takes about 213 ns on average, while the user-space function
ibv_post_send takes only about 57 ns.

As I understand it, these functions perform almost the same operations.
The work request fields and queue pair parameters are also the same.
Why is there such a big difference in execution time?
I'm using:
Debian Jessie
kernel 3.16-2-amd64
libibverbs1 (1.1.8-1)
libmlx4-1 (1.0.6-1)
perftest (2.3+0.12.gcb5b746-1)
ConnectX-3 VPI adapter MT_1090110018 fw_ver: 2.32.5100
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ib_post_send execution time
@ 2014-10-23 18:45 Roland Dreier
From: Roland Dreier @ 2014-10-23 18:45 UTC (permalink / raw)
To: Evgenii Smirnov; +Cc: linux-rdma

On Thu, Oct 23, 2014 at 10:21 AM, Evgenii Smirnov
<evgenii.smirnov-EIkl63zCoXaH+58JC4qpiA@public.gmane.org> wrote:
> I am trying to achieve high packet per second throughput with 2-byte
> messages over Infiniband from kernel using IB_SEND verb. The most I
> can get so far is 3.5 Mpps. However, ib_send_bw utility from perftest
> package is able to send 2-byte packets with rate of 9 Mpps.
> After some profiling I found that execution of ib_post_send function
> in kernel takes about 213 ns in average, for the user-space function
> ibv_post_send takes only about 57 ns.
> As I understand, these functions do almost same operations. The work
> request fields and queue pair parameters are also the same. Why do
> they have such big difference in execution times?

Interesting. I guess it would be useful to look at perf top and/or
get a perf report with "perf report -a -g" when running your high-PPS
workload, and see where the time is wasted.

 - R.
* Re: ib_post_send execution time
@ 2014-10-24 0:39 Eli Cohen
From: Eli Cohen @ 2014-10-24 0:39 UTC (permalink / raw)
To: Roland Dreier; +Cc: Evgenii Smirnov, linux-rdma

On Thu, Oct 23, 2014 at 11:45:05AM -0700, Roland Dreier wrote:
> Interesting. I guess it would be useful to look at perf top / and or
> get a perf report with "perf report -a -g" when running your high PPS
> workload, and see where the time is wasted.

I assume ib_send_bw uses inline with blueflame, so that may be part of
the explanation for the differences you see.
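[Editor's note: for reference, this is roughly how a user-space tool
such as ib_send_bw can request an inline send through libibverbs. This
is a hedged sketch added here, not code from the thread: `qp`, `buf`,
and `lkey` are assumed to come from the usual QP/MR setup, and whether
perftest actually sets the flag for a given run depends on its inline
size threshold.]

```c
#include <infiniband/verbs.h>
#include <stdint.h>

/* Sketch: post a tiny message inline. With IBV_SEND_INLINE the HCA
 * takes the payload from the WQE itself, so no separate data-fetch
 * DMA is needed; with blueflame the WQE is written to the doorbell
 * page directly. */
static int post_small_inline(struct ibv_qp *qp, void *buf, uint32_t lkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = 2,          /* the 2-byte message from the thread   */
        .lkey   = lkey,       /* ignored by the HCA for inline data   */
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_INLINE | IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```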
* Re: ib_post_send execution time
@ 2014-10-24 11:30 Or Gerlitz
From: Or Gerlitz @ 2014-10-24 11:30 UTC (permalink / raw)
To: Eli Cohen; +Cc: Roland Dreier, Evgenii Smirnov, linux-rdma

On Fri, Oct 24, 2014 at 3:39 AM, Eli Cohen
<eli-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> I assume ib_send_bw uses inline with blueflame so it may be part of
> the explanation to the differences you see.

I think it should be the other way around... when we use inline we
consume more CPU cycles, yet here we see a notable difference (213 ns
kernel vs. 57 ns user) in favor of libmlx4.
* Re: ib_post_send execution time
@ 2014-10-24 15:52 Steve Wise
From: Steve Wise @ 2014-10-24 15:52 UTC (permalink / raw)
To: Or Gerlitz, Eli Cohen; +Cc: Roland Dreier, Evgenii Smirnov, linux-rdma

On 10/24/2014 6:30 AM, Or Gerlitz wrote:
> I think it should be the other way around... when we use inline we
> consume more CPU cycles and here we see notable different (213ns --
> kernel 57ns user) in favor of libmlx4

Inline may consume more CPU cycles, but it should reduce latency,
because the I/O is completed with only one DMA transaction: the WR
fetch, which includes the data. Non-inline requires two DMA
transactions, the WR fetch and the data fetch.
* Re: ib_post_send execution time
@ 2014-10-28 16:49 Evgenii Smirnov
From: Evgenii Smirnov @ 2014-10-28 16:49 UTC (permalink / raw)
To: Steve Wise; +Cc: Or Gerlitz, Eli Cohen, Roland Dreier, linux-rdma

I forgot to mention that in both cases the IB_SEND_INLINE (or
IBV_SEND_INLINE) flag is cleared.

Below are the results from perf top; they are essentially the same as
the results of perf record. The function that consumes almost 50% of
the cycles is my own function. Essentially, all it does is call
ib_post_send with already predefined ib_send_wr and ib_sge structs. I
still have no idea why mlx4_ib_post_send consumes such a large share of
the cycles.

48.93%  0x7fffa078774c  [k] test_send
35.20%  0x7fffa0317a99  [k] mlx4_ib_post_send
 6.39%  0x7fff8150cd3f  [k] _raw_spin_lock_irqsave
 3.85%  0x7fffa0313e0f  [k] stamp_send_wqe
 2.03%  0x7fff8150c9de  [k] _raw_spin_unlock_irqrestore
 1.80%  0x7fffa05fc7a9  [k] client_send
 1.13%  0x7fff81086a75  [k] kthread_should_stop
 0.18%  0x7fffa03080d8  [k] mlx4_ib_poll_cq
 0.17%  0x7fffa0787b80  [k] process_wc
 0.14%  0x7fffa0307403  [k] get_sw_cqe
 0.02%  0x7fff8150dcd0  [k] irq_entries_start
 0.02%  0x7fffa017bf37  [k] eq_set_ci.isra.14
 0.02%  0x7fffa017c4d6  [k] mlx4_eq_int
 0.01%  0x7fff8150cd7e  [k] _raw_spin_lock
 0.01%  0x7fffa017b106  [k] mlx4_cq_completion
 0.01%  0x7fff8150e1f0  [k] apic_timer_interrupt
 0.01%  0x7fffa0308a5d  [k] mlx4_ib_arm_cq
 0.01%  0x7fff81051a86  [k] native_read_msr_safe
 0.01%  0x7fffa017d07b  [k] mlx4_msi_x_interrupt
 0.01%  0x7fff81051aa6  [k] native_write_msr_safe
 0.01%  0x7fff8138c83e  [k] add_interrupt_randomness
 0.00%  0x7fff8101c312  [k] native_read_tsc
 0.00%  0x7fff810bc6d0  [k] handle_edge_irq
 0.00%  0x7fff8150de92  [k] common_interrupt
 0.00%  0x7fff8106ba3c  [k] raise_softirq
 0.00%  0x7fff812accd0  [k] __radix_tree_lookup
 0.00%  0x7fff81072ebc  [k] run_timer_softirq
 0.00%  0x7fff81094721  [k] idle_cpu

On Fri, Oct 24, 2014 at 5:52 PM, Steve Wise
<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> Inline may consume more cpu cycles but should reduce latency because
> the IO is completed with only 1 DMA transaction, the WR fetch, which
> includes the data. Non-inline requires 2 DMA transactions, the WR
> fetch and the data fetch.
Thread overview: 6+ messages
2014-10-23 17:21 ib_post_send execution time Evgenii Smirnov
2014-10-23 18:45 ` Roland Dreier
2014-10-24 0:39 ` Eli Cohen
2014-10-24 11:30 ` Or Gerlitz
2014-10-24 15:52 ` Steve Wise
2014-10-28 16:49 ` Evgenii Smirnov