netdev.vger.kernel.org archive mirror
* Intel 82599 ixgbe driver performance
@ 2011-08-10  6:19 J.Hwan Kim
  2011-08-10 19:18 ` Martin Josefsson
  2011-08-10 20:58 ` Rick Jones
  0 siblings, 2 replies; 4+ messages in thread
From: J.Hwan Kim @ 2011-08-10  6:19 UTC (permalink / raw)
  To: netdev

Hi, everyone

I'm testing our network card, which is based on the Intel 82599 and uses the
ixgbe driver.
I wonder what the Rx performance of the 82599 is with 64-byte frames only,
without the network stack.
Our driver reads packets directly from the DMA packet buffer and pushes them
to the application without passing through the Linux kernel stack.
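
For reference, the receive path looks roughly like the sketch below (a
simplified illustration only; the structure layout and names are hypothetical,
not our real driver code and not the 82599 descriptor format):

#include <stdint.h>

#define RX_RING_SIZE 512
#define PKT_BUF_SIZE 2048
#define DESC_DONE    0x1u

/* One receive descriptor as seen by the application.  The real layout
 * on the card is different; this only illustrates the flow. */
struct rx_desc {
    volatile uint32_t status;  /* DESC_DONE set once the buffer holds a frame */
    uint32_t len;              /* frame length in bytes */
};

/* Descriptor ring and packet buffers mapped into the application's
 * address space and shared with the driver/DMA engine. */
struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    uint8_t buf[RX_RING_SIZE][PKT_BUF_SIZE];
};

/* Consume completed frames in order, hand each one to the application's
 * handler, then return the buffer to the hardware by clearing the
 * status word. */
static void rx_poll(struct rx_ring *ring,
                    void (*handle)(const uint8_t *frame, uint32_t len))
{
    static unsigned int next;

    while (ring->desc[next].status & DESC_DONE) {
        handle(ring->buf[next], ring->desc[next].len);
        ring->desc[next].status = 0;
        next = (next + 1) % RX_RING_SIZE;
    }
}
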
It seems that the Intel 82599 cannot push 64-byte frames to the DMA area at
the full 10G rate.
Is that right?

If that is the case, what is the bottleneck of the 82599?

Thanks in advance.

Best Regards,
J.Hwan Kim

* Re: Intel 82599 ixgbe driver performance
  2011-08-10  6:19 Intel 82599 ixgbe driver performance J.Hwan Kim
@ 2011-08-10 19:18 ` Martin Josefsson
  2011-08-10 20:58 ` Rick Jones
  1 sibling, 0 replies; 4+ messages in thread
From: Martin Josefsson @ 2011-08-10 19:18 UTC (permalink / raw)
  To: J.Hwan Kim; +Cc: netdev

On Wed, Aug 10, 2011 at 8:19 AM, J.Hwan Kim <frog1120@gmail.com> wrote:

> I'm testing our network card, which is based on the Intel 82599 and uses the ixgbe driver.
> I wonder what the Rx performance of the 82599 is with 64-byte frames only, without the network stack.
> Our driver reads packets directly from the DMA packet buffer and pushes them to the application
> without passing through the Linux kernel stack.
> It seems that the Intel 82599 cannot push 64-byte frames to the DMA area at the full 10G rate.
> Is that right?
>
> If that is the case, what is the bottleneck of the 82599?

My experience with the 82599 is that it can receive 13.4 Mpps using 64-byte
frames and a single port.
When using both ports, the rate drops to 10.7 Mpps per port.

Note that my experience does not involve the ixgbe driver; these numbers
were obtained using a custom driver and OS, but they should give some
indication of what the hardware is capable of.

(If you want to see something "fun", try 65-byte packets with the 82599 and
the Intel X58 IOH, or the 5500/5520 server versions.) :)

--
/Martin


* Re: Intel 82599 ixgbe driver performance
  2011-08-10  6:19 Intel 82599 ixgbe driver performance J.Hwan Kim
  2011-08-10 19:18 ` Martin Josefsson
@ 2011-08-10 20:58 ` Rick Jones
       [not found]   ` <4E433706.2020302@gmail.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Rick Jones @ 2011-08-10 20:58 UTC (permalink / raw)
  To: J.Hwan Kim; +Cc: netdev, rizzo

On 08/09/2011 11:19 PM, J.Hwan Kim wrote:
> Hi, everyone
>
> I'm testing our network card, which is based on the Intel 82599 and
> uses the ixgbe driver. I wonder what the Rx performance of the 82599
> is with 64-byte frames only, without the network stack. Our driver
> reads packets directly from the DMA packet buffer and pushes them to
> the application without passing through the Linux kernel stack. It
> seems that the Intel 82599 cannot push 64-byte frames to the DMA area
> at the full 10G rate. Is that right?

Does your driver perform a copy of that 64B frame to user space?

Is this test running single-threaded?

What does an lat_mem_rd -t (-t for random stride) test from lmbench give 
for your system's memory latency?  (Perhaps using numactl to ensure 
local, or remote memory access, as you desire)
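
For example, something along these lines (node numbers are arbitrary; adjust
to your machine's topology):

  numactl --cpunodebind=0 --membind=0 ./lat_mem_rd -t 512 256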

At line rate for minimum-sized frames over 10 GbE, you have a frame
arriving every 60-odd nanoseconds. At that speed, you cannot take even
one cache miss per frame (*) in a single-threaded path and still achieve
line-rate PPS.
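
For reference, the arithmetic behind that figure: a minimum-sized frame
occupies 64 bytes of frame plus 8 bytes of preamble/SFD plus 12 bytes of
inter-frame gap on the wire, so

  (64 + 8 + 12) bytes * 8 bits/byte = 672 bits per frame slot
  10,000,000,000 bits/s / 672 bits  = ~14.88 million frames per second
  1 / 14,880,000 per second         = ~67.2 ns per frame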

As it happens, there was a presentation at HP Labs recently, given by
Luigi Rizzo on his netmap work.  The slides can be found at
http://info.iet.unipi.it/~luigi/netmap/talk-hp.html .  In it, Luigi
presented some performance figures using an Intel 82599.

happy benchmarking,

rick jones
(*) Much of my time has been spent in a world where a cache miss is
three digits' worth of nanoseconds (to the left of the decimal),
sometimes high two digits.


* Re: Intel 82599 ixgbe driver performance
       [not found]   ` <4E433706.2020302@gmail.com>
@ 2011-08-11 18:43     ` Rick Jones
  0 siblings, 0 replies; 4+ messages in thread
From: Rick Jones @ 2011-08-11 18:43 UTC (permalink / raw)
  To: J.Hwan Kim; +Cc: netdev

On 08/10/2011 06:57 PM, J.Hwan Kim wrote:
> On 08/11/2011 05:58, Rick Jones wrote:
>> On 08/09/2011 11:19 PM, J.Hwan Kim wrote:
>>> Hi, everyone
>>>
>>> I'm testing our network card, which is based on the Intel 82599 and
>>> uses the ixgbe driver. I wonder what the Rx performance of the 82599
>>> is with 64-byte frames only, without the network stack. Our driver
>>> reads packets directly from the DMA packet buffer and pushes them to
>>> the application without passing through the Linux kernel stack. It
>>> seems that the Intel 82599 cannot push 64-byte frames to the DMA area
>>> at the full 10G rate. Is that right?
>>
>> Does your driver perform a copy of that 64B frame to user space?
> Our driver and the user application share the packet memory.
>
>> Is this test running single-threaded?
> Now, 4 cores are running and 4 RX queues are used, with their interrupt
> affinity set, but the result is worse than with a single queue.
>> What does an lat_mem_rd -t (-t for random stride) test from lmbench
>> give for your system's memory latency? (Perhaps using numactl to
>> ensure local, or remote memory access, as you desire)
> ./lat_mem_rd -t 128
> "stride=64
>
> 0.00049 1.003
> 0.00098 1.003
> 0.00195 1.003
> 0.00293 1.003
> 0.00391 1.003
> 0.00586 1.003
> 0.00781 1.003
> 0.01172 1.003
> 0.01562 1.003
> 0.02344 1.003
> 0.03125 1.003
> 0.04688 5.293
> 0.06250 5.307
> 0.09375 5.571
> 0.12500 5.683
> 0.18750 5.683
> 0.25000 5.683
> 0.37500 16.394
> 0.50000 42.394

Unless the chip you are using has a rather tiny (by today's standards)
data cache, you need to go much farther there - I suspect that at 0.5 MB
you have not yet gotten beyond the size of the last level of data cache
on the chip.

I would suggest:

(from a system that is not otherwise idle...)

./lat_mem_rd -t 512 256
"stride=256
0.00049 1.237
0.00098 1.239
0.00195 1.228
0.00293 1.238
0.00391 1.243
0.00586 1.238
0.00781 1.250
0.01172 1.249
0.01562 1.251
0.02344 1.247
0.03125 1.247
0.04688 3.125
0.06250 3.153
0.09375 3.158
0.12500 3.177
0.18750 6.636
0.25000 8.729
0.37500 16.167
0.50000 16.901
0.75000 16.953
1.00000 17.362
1.50000 18.781
2.00000 20.243
3.00000 23.434
4.00000 24.965
6.00000 35.951
8.00000 56.026
12.00000 76.169
16.00000 80.741
24.00000 83.237
32.00000 84.043
48.00000 84.132
64.00000 83.775
96.00000 83.298
128.00000 83.039
192.00000 82.659
256.00000 82.464
384.00000 82.280
512.00000 82.092

You can see the large jump starting at 8 MB; that is where the last-level
cache runs out on the chip I'm using, an Intel W3550.

Now, as run, that will include TLB miss overhead once the area of memory 
being accessed is larger than can be mapped by the chip's TLB at the 
page size being used.  You can use libhugetlbfs to mitigate that through 
the use of hugepages.
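
For example, assuming libhugetlbfs is installed and hugepages have been
reserved, something like

  LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes ./lat_mem_rd -t 512 256

should back the test's heap allocation with hugepages (assuming the test
buffer comes from malloc) and largely take TLB misses out of the picture.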

rick jones



Thread overview: 4+ messages
2011-08-10  6:19 Intel 82599 ixgbe driver performance J.Hwan Kim
2011-08-10 19:18 ` Martin Josefsson
2011-08-10 20:58 ` Rick Jones
     [not found]   ` <4E433706.2020302@gmail.com>
2011-08-11 18:43     ` Rick Jones
