* intel 82599 multi-port performance
From: J.Hwan Kim
Date: 2011-09-26 10:26 UTC
To: netdev

Hi, everyone

Now, I'm testing a network card based on the Intel 82599.
In our experiment, with the driver modified with ixgbe and multi-port
enabled, the rx performance of each port with 10Gbps of 64-byte frames
is half of what it is when only 1 port is used.
The PCIe slot of our server is Gen2 (5 GT/s, x8).

Is the result reasonable? When multiple ports are enabled and a 10G
stream is fed to each port, the maximum performance of each port is
halved in our experiment. Do you think it is a problem with our
modified driver, or a performance bottleneck of the 82599? I cannot
explain our experimental result. Please give me some advice.

Thanks in advance.

Best Regards,
J.Hwan Kim

* Re: intel 82599 multi-port performance
From: Chris Friesen
Date: 2011-09-26 14:20 UTC
To: J.Hwan Kim
Cc: netdev

On 09/26/2011 04:26 AM, J.Hwan Kim wrote:
> Hi, everyone
>
> Now, I'm testing a network card based on the Intel 82599.
> In our experiment, with the driver modified with ixgbe and multi-port
> enabled,

What do you mean by "modified with ixgbe and multi-port enabled"? You
shouldn't need to do anything special to use both ports.

> the rx performance of each port with 10Gbps of 64-byte frames
> is half of what it is when only 1 port is used.

Sounds like a cpu limitation. What is your cpu usage? How are your
interrupts routed? Are you using multiple rx queues?

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

* Re: intel 82599 multi-port performance
From: J.Hwan.Kim
Date: 2011-09-26 15:42 UTC
Cc: netdev

On 2011-09-26 23:20, Chris Friesen wrote:
> On 09/26/2011 04:26 AM, J.Hwan Kim wrote:
>> Hi, everyone
>>
>> Now, I'm testing a network card based on the Intel 82599.
>> In our experiment, with the driver modified with ixgbe and multi-port
>> enabled,
>
> What do you mean by "modified with ixgbe and multi-port enabled"? You
> shouldn't need to do anything special to use both ports.
>
>> the rx performance of each port with 10Gbps of 64-byte frames
>> is half of what it is when only 1 port is used.
>
> Sounds like a cpu limitation. What is your cpu usage? How are your
> interrupts routed? Are you using multiple rx queues?

Our server is a Xeon 2.4GHz with 8 cores.
I'm using 4 RSS queues for each port and distributed their interrupts
to different cores respectively.
I checked the CPU utilization with top; I guess it is not a CPU
limitation problem.

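For reference, a minimal sketch of how such an interrupt distribution
can be verified from /proc/interrupts. The interface name and the
"eth2-TxRx-0" vector naming are assumptions; the actual names depend on
the driver version:

  #!/usr/bin/env python
  # Sketch: check that each ixgbe queue interrupt lands on its own core
  # by comparing per-CPU counts against the vector's affinity mask.
  IFACE = "eth2"  # hypothetical interface name

  with open("/proc/interrupts") as f:
      cpus = f.readline().split()              # header row: CPU0 CPU1 ...
      for line in f:
          if IFACE not in line:
              continue
          fields = line.split()
          irq = fields[0].rstrip(":")
          counts = [int(x) for x in fields[1:1 + len(cpus)]]
          with open("/proc/irq/%s/smp_affinity" % irq) as aff:
              mask = aff.read().strip()
          busiest = cpus[counts.index(max(counts))]
          print("irq %s (%s): affinity=%s, most interrupts on %s"
                % (irq, fields[-1], mask, busiest))
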
* Re: intel 82599 multi-port performance
From: Alexander Duyck
Date: 2011-09-26 16:04 UTC
To: frog1120
Cc: J.Hwan.Kim, netdev

On 09/26/2011 08:42 AM, J.Hwan.Kim wrote:
> On 2011-09-26 23:20, Chris Friesen wrote:
>> Sounds like a cpu limitation. What is your cpu usage? How are your
>> interrupts routed? Are you using multiple rx queues?
>
> Our server is a Xeon 2.4GHz with 8 cores.
> I'm using 4 RSS queues for each port and distributed their interrupts
> to different cores respectively.
> I checked the CPU utilization with top; I guess it is not a CPU
> limitation problem.

What kind of rates are you seeing on a single port versus multiple
ports? There are multiple possibilities in terms of what could be
limiting your performance.

It sounds like you are using a single card, would that be correct? If
you are running close to line rate on both ports this could be causing
you to saturate the PCIe x8 link. If you have a second card available
you may want to try installing it in a secondary Gen2 PCIe slot and
seeing if you can improve the performance by using 2 PCIe slots
instead of one.

Also, could you include your kernel config? Certain features such as
netfilter and the IOMMU can have a significant impact on performance.

Thanks,

Alex

* Re: intel 82599 multi-port performance
From: Chris Friesen
Date: 2011-09-26 16:40 UTC
To: Alexander Duyck
Cc: frog1120, J.Hwan.Kim, netdev, Kirsher, Jeffrey T, Brandeburg,
 Jesse, e1000-devel@lists.sourceforge.net

On 09/26/2011 10:04 AM, Alexander Duyck wrote:
> It sounds like you are using a single card, would that be correct? If
> you are running close to line rate on both ports this could be causing
> you to saturate the PCIe x8 link.

According to
http://communities.intel.com/community/wired/blog/2009/06/08/understanding-pci-express-bandwidth
x8 PCIe should have a bandwidth of 4GB/s, and two 10-gigabit ports
amount to 2.5GB/s.

The 82599 only goes up to x8, so I'd expect that to be sufficient to
handle the full traffic.

To any of the Intel guys out there... any ideas? Can an 82599 on an x8
bus handle max line rate with minimum-size packets?

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

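For concreteness, "max line rate with minimum-size packets" works out
as follows; this is plain arithmetic from standard Ethernet framing,
nothing card-specific assumed:

  # Line-rate packet rate for minimum-size frames on 10GbE.
  # 64B frame (incl. CRC) + 8B preamble + 12B inter-frame gap = 84B on the wire.
  pps = 10e9 / ((64 + 8 + 12) * 8)     # bits per second / bits per packet
  print("%.2f Mpps per port, %.2f Mpps for both" % (pps / 1e6, 2 * pps / 1e6))
  # -> 14.88 Mpps per port, 29.76 Mpps for both
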
* Re: [E1000-devel] intel 82599 multi-port performance
From: Ben Greear
Date: 2011-09-26 17:24 UTC
To: Chris Friesen
Cc: Alexander Duyck, e1000-devel@lists.sourceforge.net, netdev,
 Brandeburg, Jesse, J.Hwan.Kim, frog1120

On 09/26/2011 09:40 AM, Chris Friesen wrote:
> On 09/26/2011 10:04 AM, Alexander Duyck wrote:
>
>> It sounds like you are using a single card, would that be correct? If
>> you are running close to line rate on both ports this could be causing
>> you to saturate the PCIe x8 link.
>
> According to
> http://communities.intel.com/community/wired/blog/2009/06/08/understanding-pci-express-bandwidth
> x8 PCIe should have a bandwidth of 4GB/s, and two 10-gigabit ports
> amount to 2.5GB/s.
>
> The 82599 only goes up to x8, so I'd expect that to be sufficient to
> handle the full traffic.
>
> To any of the Intel guys out there... any ideas? Can an 82599 on an x8
> bus handle max line rate with minimum-size packets?

Rick Jones sent me an interesting link related to this. Short answer
seems to be 'yes', but apparently not with any normal off-the-shelf
software stack.

This: http://comments.gmane.org/gmane.linux.network/203602 should lead
you to some slides.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [E1000-devel] intel 82599 multi-port performance
From: Chris Friesen
Date: 2011-09-26 17:46 UTC
To: Ben Greear
Cc: Alexander Duyck, e1000-devel@lists.sourceforge.net, netdev,
 Brandeburg, Jesse, J.Hwan.Kim, frog1120

On 09/26/2011 11:24 AM, Ben Greear wrote:
> On 09/26/2011 09:40 AM, Chris Friesen wrote:
>> To any of the Intel guys out there... any ideas? Can an 82599 on an x8
>> bus handle max line rate with minimum-size packets?
>
> Rick Jones sent me an interesting link related to this. Short answer
> seems to be 'yes', but apparently not with any normal off-the-shelf
> software stack.
>
> This: http://comments.gmane.org/gmane.linux.network/203602 should lead
> you to some slides.

Interesting. I wonder if Intel's DPDK will be the only way to handle
those sorts of packet rates.

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

* Re: intel 82599 multi-port performance
From: Ben Greear
Date: 2011-09-26 17:57 UTC
To: Chris Friesen
Cc: e1000-devel@lists.sourceforge.net, netdev, Brandeburg, Jesse,
 J.Hwan.Kim, frog1120

On 09/26/2011 10:46 AM, Chris Friesen wrote:
> On 09/26/2011 11:24 AM, Ben Greear wrote:
>> Rick Jones sent me an interesting link related to this. Short answer
>> seems to be 'yes', but apparently not with any normal off-the-shelf
>> software stack.
>>
>> This: http://comments.gmane.org/gmane.linux.network/203602 should lead
>> you to some slides.
>
> Interesting. I wonder if Intel's DPDK will be the only way to handle
> those sorts of packet rates.

Pktgen is probably still the fastest general code that I know of, but
we had some interesting results setting TCP_MAXSEG to 88, which
creates around 150-byte packets, and letting the NIC's offload chop
large TCP writes into small packets on the wire.

Using a Core i7 980X CPU and a dual-port 82599, we could send around
4Mpps and receive around 2Mpps between two machines. We were using a
single port on each NIC/machine for this test. The connection was a
bit asymmetric; it seems one side would over-power the other, so if we
twiddled a bit, we could get around 3Mpps in each direction.

Our user-space app has some overhead as well, but we can send at least
5Gbps full duplex on two ports using normal-sized frames, so I think
the bottleneck in this case is the TCP offload in the NIC. Still,
pretty impressive for stateful TCP packets per second :)

Top-of-tree netperf just learned to do the TCP_MAXSEG trick as well,
so it might be fun to play with that. It probably has less overhead
than our tool, so it might run even faster.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

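For reference, the TCP_MAXSEG trick Ben describes needs nothing beyond
an ordinary Linux socket; a minimal sketch follows (the peer address is
hypothetical, and 88 is simply the MSS value quoted above):

  import socket

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # Clamp the MSS before connect() so segmentation offload chops large
  # writes into ~150-byte wire packets (88B payload + TCP/IP/Ethernet
  # headers).
  s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 88)
  s.connect(("192.0.2.10", 5001))
  s.sendall(b"x" * 65536)   # one 64KB write -> ~745 tiny segments on the wire
  s.close()
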
* Re: intel 82599 multi-port performance
From: J.Hwan Kim
Date: 2011-09-27 0:45 UTC
To: Alexander Duyck
Cc: netdev

On 2011-09-27 01:04, Alexander Duyck wrote:
> What kind of rates are you seeing on a single port versus multiple
> ports? There are multiple possibilities in terms of what could be
> limiting your performance.

I tested 10G with 64-byte frames.
With the modified ixgbe driver, on a single port 92% of packets were
received at the driver level, and with 2 ports we received around 42%
of packets on each.

> It sounds like you are using a single card, would that be correct?

Yes, I tested a single card with 2 ports.

> If you are running close to line rate on both ports this could be
> causing you to saturate the PCIe x8 link. If you have a second card
> available you may want to try installing it in a secondary Gen2 PCIe
> slot and seeing if you can improve the performance by using 2 PCIe
> slots instead of one.

I tested that as well: with 2 cards, the performance of each port is
almost the same as with a single port (maximum performance).

* Re: intel 82599 multi-port performance
From: Martin Millnert
Date: 2011-09-27 15:30 UTC
To: J.Hwan Kim
Cc: Alexander Duyck, netdev

Hi J.Hwan,

On Tue, Sep 27, 2011 at 2:45 AM, J.Hwan Kim <frog1120@gmail.com> wrote:
> I tested 10G with 64-byte frames.
> With the modified ixgbe driver, on a single port 92% of packets were
> received at the driver level, and with 2 ports we received around 42%
> of packets on each.

Are you reading packet drops from the 82599's own registers (i.e. via
ethtool)?

Regards,
Martin

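A rough sketch of the check Martin is suggesting (the interface name is
hypothetical, and counter names such as rx_missed_errors and
rx_no_dma_resources vary by driver version; match whatever your
ethtool -S actually prints):

  import subprocess

  # Dump the NIC's own drop/missed counters via ethtool -S.
  out = subprocess.check_output(["ethtool", "-S", "eth2"]).decode()
  for line in out.splitlines():
      name, _, value = line.strip().partition(": ")
      # Heuristic filter for the drop counters the hardware keeps itself.
      if any(k in name for k in ("miss", "drop", "no_dma", "no_buff")):
          print("%s = %s" % (name, value))
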
* Re: intel 82599 multi-port performance
From: Alexander Duyck
Date: 2011-09-27 17:14 UTC
To: J.Hwan Kim
Cc: netdev

On 09/26/2011 05:45 PM, J.Hwan Kim wrote:
> On 2011-09-27 01:04, Alexander Duyck wrote:
>> What kind of rates are you seeing on a single port versus multiple
>> ports? There are multiple possibilities in terms of what could be
>> limiting your performance.
>
> I tested 10G with 64-byte frames.
> With the modified ixgbe driver, on a single port 92% of packets were
> received at the driver level, and with 2 ports we received around 42%
> of packets on each.

When you say 92% of packets are received, are you talking about 92% of
line rate, which would be somewhere around 14.8Mpps?

>> It sounds like you are using a single card, would that be correct?
>
> Yes, I tested a single card with 2 ports.
>
>> If you are running close to line rate on both ports this could be
>> causing you to saturate the PCIe x8 link. If you have a second card
>> available you may want to try installing it in a secondary Gen2 PCIe
>> slot and seeing if you can improve the performance by using 2 PCIe
>> slots instead of one.
>
> I tested that as well: with 2 cards, the performance of each port is
> almost the same as with a single port (maximum performance).

This more or less confirms what I was thinking. You are likely hitting
the PCIe limits of the adapter. The overhead for 64-byte packets is
too great, and as a result you are exceeding the PCIe bandwidth
available to the adapter. In order to achieve line rate on both ports
you would likely need to increase your packet size to something along
the lines of 256 bytes, so that the additional PCIe overhead
contributes 50% or less of the total PCIe traffic across the bus. Then
the 2.5GB/s of network traffic should consume less than the ~4GB/s of
PCIe bandwidth available.

Thanks,

Alex

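A rough back-of-the-envelope version of Alex's argument, assuming ~24
bytes of TLP/DLLP overhead per PCIe transaction and one simplified
descriptor write per packet (both figures are illustrative assumptions,
not measured values):

  # Approximate PCIe load of two 10G ports at line rate, per frame size.
  USABLE_GBS = 4.0                        # ~usable GB/s of a Gen2 x8 link

  def bus_bytes(frame, tlp_overhead=24):
      payload_tlp = frame + tlp_overhead  # one DMA write per received frame
      descriptor = 16 + tlp_overhead      # simplified descriptor write-back
      return payload_tlp + descriptor

  for frame in (64, 256):
      pps = 10e9 / ((frame + 20) * 8)     # +20B preamble/IFG on the wire
      total_gbs = 2 * pps * bus_bytes(frame) / 1e9
      print("%3dB frames: %5.2f Mpps/port -> ~%.2f GB/s PCIe (budget ~%.1f)"
            % (frame, pps / 1e6, total_gbs, USABLE_GBS))
  # 64B frames land right at the link's practical limit once flow control
  # and read completions are added; 256B frames leave comfortable headroom.
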
* Re: intel 82599 multi-port performance
From: Chris Friesen
Date: 2011-09-27 22:57 UTC
To: Alexander Duyck
Cc: J.Hwan Kim, netdev

On 09/27/2011 11:14 AM, Alexander Duyck wrote:
> This more or less confirms what I was thinking. You are likely hitting
> the PCIe limits of the adapter. The overhead for 64-byte packets is
> too great, and as a result you are exceeding the PCIe bandwidth
> available to the adapter. In order to achieve line rate on both ports
> you would likely need to increase your packet size to something along
> the lines of 256 bytes, so that the additional PCIe overhead
> contributes 50% or less of the total PCIe traffic across the bus. Then
> the 2.5GB/s of network traffic should consume less than the ~4GB/s of
> PCIe bandwidth available.

For some further information: according to
http://shader.kaist.edu/packetshader/io_engine/benchmark/i3.html
a dual-port 82599 controller with an i3 CPU can in fact handle sending
*or* receiving (and then dropping) full line rate on both ports with
minimum-size packets. It can't do both at once, though.

The CPU used in those tests isn't the greatest, however, so it's tough
to say where the bottleneck is.

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

* Re: intel 82599 multi-port performance
From: Rick Jones
Date: 2011-09-26 18:16 UTC
To: frog1120
Cc: J.Hwan.Kim, netdev

On 09/26/2011 08:42 AM, J.Hwan.Kim wrote:
> On 2011-09-26 23:20, Chris Friesen wrote:
>> Sounds like a cpu limitation. What is your cpu usage? How are your
>> interrupts routed? Are you using multiple rx queues?
>
> Our server is a Xeon 2.4GHz with 8 cores.
> I'm using 4 RSS queues for each port and distributed their interrupts
> to different cores respectively.
> I checked the CPU utilization with top; I guess it is not a CPU
> limitation problem.

99 times out of 10, by default top will show the average CPU
utilization across all the "CPUs" of the system. So I will ask the
pedantic question: did you check per-CPU utilization or just overall?

rick jones

* Re: intel 82599 multi-port performance
From: J.Hwan Kim
Date: 2011-09-27 0:39 UTC
To: Rick Jones
Cc: netdev

On 2011-09-27 03:16, Rick Jones wrote:
> 99 times out of 10, by default top will show the average CPU
> utilization across all the "CPUs" of the system. So I will ask the
> pedantic question: did you check per-CPU utilization or just overall?

I checked CPU utilization per CPU with top. I pressed "1" after
starting top so that I could view the per-CPU utilization.

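The same per-CPU view can also be sampled without top by diffing
/proc/stat over an interval; a minimal sketch (treating idle + iowait
as "not busy" is a simplification):

  import time

  def snapshot():
      stats = {}
      with open("/proc/stat") as f:
          for line in f:
              # Per-CPU rows look like "cpu0 user nice system idle iowait ..."
              if line.startswith("cpu") and line[3].isdigit():
                  fields = line.split()
                  ticks = [int(x) for x in fields[1:]]
                  stats[fields[0]] = (sum(ticks), ticks[3] + ticks[4])
      return stats

  before = snapshot()
  time.sleep(1.0)
  after = snapshot()
  for cpu in sorted(before):
      total = after[cpu][0] - before[cpu][0]
      idle = after[cpu][1] - before[cpu][1]
      print("%s: %5.1f%% busy" % (cpu, 100.0 * (total - idle) / max(total, 1)))
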
Thread overview: 14+ messages

2011-09-26 10:26 intel 82599 multi-port performance  J.Hwan Kim
2011-09-26 14:20 ` Chris Friesen
2011-09-26 15:42   ` J.Hwan.Kim
2011-09-26 16:04     ` Alexander Duyck
2011-09-26 16:40       ` Chris Friesen
2011-09-26 17:24         ` [E1000-devel] " Ben Greear
2011-09-26 17:46           ` Chris Friesen
2011-09-26 17:57             ` Ben Greear
2011-09-27  0:45       ` J.Hwan Kim
2011-09-27 15:30         ` Martin Millnert
2011-09-27 17:14         ` Alexander Duyck
2011-09-27 22:57           ` Chris Friesen
2011-09-26 18:16     ` Rick Jones
2011-09-27  0:39       ` J.Hwan Kim