From mboxrd@z Thu Jan 1 00:00:00 1970
From: Qian Xu <qian.q.xu@intel.com>
Subject: [PATCH v2] doc: Add performance test guide about how to get DPDK high perf on Intel platform
Date: Thu, 13 Aug 2015 11:19:39 +0800
Message-ID: <1439435979-25869-1-git-send-email-qian.q.xu@intel.com>
To: dev@dpdk.org

v2 changes:
1. Create an SVG picture.
2. Add a part about how to check the memory channels with dmidecode -t memory.
3. Add the command to check the PCIe slot's speed.
4. Some doc updates according to the comments.

Add a new guide doc under the guides folder. This document is a step-by-step
guide on how to get high performance with DPDK on Intel's platform and NICs.
It is designed for users who are not familiar with DPDK but would like to
measure the best performance. It contains step-by-step instructions to set
the platform and NICs to their best performance. More sections will be added
to the document as DPDK features grow.

Signed-off-by: Qian Xu <qian.q.xu@intel.com>

diff --git a/doc/guides/perf_test_guide/img/intel_perf_test_setup.svg b/doc/guides/perf_test_guide/img/intel_perf_test_setup.svg
new file mode 100644
index 0000000..40bb189
--- /dev/null
+++ b/doc/guides/perf_test_guide/img/intel_perf_test_setup.svg
@@ -0,0 +1,467 @@
+[SVG markup omitted. The figure shows the performance test setup: an Ixia traffic
+ generator (ports A and B) connected over 40G Ethernet to two XL710 ports (Port0 and
+ Port1) on an IA platform, socket 1, forwarding Port0 --> Port1 and Port1 --> Port0.
+ Flow1: DEST MAC = Port0's MAC, DEST IP = 2.1.1.1, SRC IP: Random.
+ Flow2: DEST MAC = Port1's MAC, DEST IP = 1.1.1.1, SRC IP: Random.]
diff --git a/doc/guides/perf_test_guide/index.rst b/doc/guides/perf_test_guide/index.rst
new file mode 100644
index 0000000..25c8ee9
--- /dev/null
+++ b/doc/guides/perf_test_guide/index.rst
@@ -0,0 +1,47 @@
+..  BSD LICENSE
+    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of Intel Corporation nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED.
+    IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+    CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+    SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+    BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+    WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+    OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+    EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Performance test guide on Intel's Platform
+===========================================
+
+|today|
+
+Contents
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    intro
+    perf_test_intel_platform_nic
diff --git a/doc/guides/perf_test_guide/intro.rst b/doc/guides/perf_test_guide/intro.rst
new file mode 100644
index 0000000..471d15e
--- /dev/null
+++ b/doc/guides/perf_test_guide/intro.rst
@@ -0,0 +1,40 @@
+..  BSD LICENSE
+    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of Intel Corporation nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Introduction
+============
+
+This document is a step-by-step guide about how to get high performance with DPDK on Intel's
+platform and NICs. It is designed for users who are not familiar with DPDK but would like to
+measure the best performance.
+It contains step-by-step instructions to set the platform and NICs to their best performance.
+More sections will be added to the document as DPDK features grow.
+Currently, the document has only one section, about the PF performance test setup; other
+performance cases will be added in the future.
+
diff --git a/doc/guides/perf_test_guide/perf_test_intel_platform_nic.rst b/doc/guides/perf_test_guide/perf_test_intel_platform_nic.rst
new file mode 100644
index 0000000..4320f13
--- /dev/null
+++ b/doc/guides/perf_test_guide/perf_test_intel_platform_nic.rst
@@ -0,0 +1,220 @@
+How to get best performance with Intel's Platform and NICs
+============================================================
+
+This document is a step-by-step guide for getting high DPDK performance with Intel's platform
+and NICs. For other NICs, e.g. Chelsio, Cisco or Mellanox, the Intel platform/CPU settings can
+be very similar, but the NIC-specific configuration may differ for each vendor.
+
+Prerequisites
+-------------
+
+Hardware platform essential requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Use a standard Intel® Xeon® server system (e.g. Ivy Bridge, Haswell or newer).
+
+2. Ensure that each memory channel has at least one memory DIMM inserted, and that each DIMM is
+   4GB or larger (e.g. 8GB or 16GB). **Note**: This has a significant impact on performance.
+   You can use ``dmidecode -t memory`` to check the memory status::
+
+      dmidecode -t memory | grep Locator
+
+      # Sample output below: there are memory channels A to H, 8 channels in total,
+      # and each channel has two DIMM slots.
+
+      Locator: DIMM_A1
+      Bank Locator: NODE 1
+      Locator: DIMM_A2
+      Bank Locator: NODE 1
+      Locator: DIMM_B1
+      Bank Locator: NODE 1
+      Locator: DIMM_B2
+      Bank Locator: NODE 1
+      Locator: DIMM_C1
+      Bank Locator: NODE 1
+      Locator: DIMM_C2
+      Bank Locator: NODE 1
+      Locator: DIMM_D1
+      Bank Locator: NODE 1
+      Locator: DIMM_D2
+      Bank Locator: NODE 1
+      Locator: DIMM_E1
+      Bank Locator: NODE 2
+      Locator: DIMM_E2
+      Bank Locator: NODE 2
+      Locator: DIMM_F1
+      Bank Locator: NODE 2
+      Locator: DIMM_F2
+      Bank Locator: NODE 2
+      Locator: DIMM_G1
+      Bank Locator: NODE 2
+      Locator: DIMM_G2
+      Bank Locator: NODE 2
+      Locator: DIMM_H1
+      Bank Locator: NODE 2
+      Locator: DIMM_H2
+      Bank Locator: NODE 2
+
+      dmidecode -t memory | grep Speed
+
+      # Sample output below: "Speed: 2133 MHz" (DDR4) and "Speed: Unknown" (no DIMM present)
+      # alternate. Combined with the locator information above, this shows that each channel
+      # is populated with one DIMM.
+
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+      Speed: 2133 MHz
+      Configured Clock Speed: 2134 MHz
+      Speed: Unknown
+      Configured Clock Speed: Unknown
+
+
+Hardware platform Network Interface Card essential requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Use a high end Intel® NIC, e.g. Intel® XL710. **Note**: a high end NIC is chosen here
+   because it implies a high packet rate; currently 1G and 10G NICs can achieve line rate
+   easily, while a 40G NIC is more challenging, so it is taken as the example.
+
+2. Make sure each NIC has the latest version of NVM/firmware flashed, if one is available.
+
+3. Use PCIe Gen3 slots, such as Gen3 x8 or Gen3 x16, because PCIe Gen2 slots can't provide
+   enough bandwidth for 2 x 10G and above. The PCI slot's speed can be checked as below::
+
+      lspci -s 03:00.1 -vv | grep LnkSta
+
+      # Sample output:
+      # LnkSta:  Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
+      # LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
+
+4. When inserting NICs into the PCI slots, check the caption on the slot: there will usually
+   be a CPU0 or CPU1 marking to tell you which socket it belongs to. Alternatively, you can
+   look up the Intel platform board layout on ark.intel.com. Be careful with NUMA: if you
+   will use 2 or more ports from different NICs, make sure these NICs are on the same CPU
+   socket. The section below shows how to check which socket a PCI device is located on from
+   the command line.
+
+BIOS settings:
+~~~~~~~~~~~~~~
+
+1. To start from a known state, reset all BIOS settings to their defaults.
+
+2. Disable all power saving options, and set all options for best performance.
+
+3. Disable Turbo Boost to ensure performance scales with the number of cores.
+
+4. Set the memory frequency to the highest available number, NOT auto.
+
+5. Disable all virtualization options when you test the physical function of the NIC, and
+   turn on VT-d if you want to use VFIO.
+
+
+Grub parameters essential requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Use the default grub file as a good starting point.
+
+2. Reserve 1G huge pages via the grub configuration, e.g. add
+   ``default_hugepagesz=1G hugepagesz=1G hugepages=8`` to reserve 8 huge pages of 1G size
+   (see the sketch after this list).
+
+3. Isolate the CPU cores which will be used for DPDK from the scheduler, e.g.
+   ``isolcpus=2,3,4,5,6,7,8``.
+
+4. If VFIO is to be used, additional grub parameters are needed, e.g. ``iommu=pt intel_iommu=on``.
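+
+For reference, on a typical grub2-based distribution the parameters above could be applied
+as follows. This is only a minimal sketch: the exact file and update command depend on the
+distribution, and the hugepage count and ``isolcpus`` list are examples to adapt to your
+platform::
+
+      # Append the parameters to the kernel command line (path assumed for grub2 systems)
+      vi /etc/default/grub
+      # e.g. GRUB_CMDLINE_LINUX="... default_hugepagesz=1G hugepagesz=1G hugepages=8 isolcpus=2,3,4,5,6,7,8"
+
+      # Regenerate the grub configuration, then reboot
+      grub2-mkconfig -o /boot/grub2/grub.cfg      # RHEL/Fedora style
+      # or: update-grub                           # Debian/Ubuntu style
+
+      # After the reboot, verify that the parameters took effect
+      cat /proc/cmdline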
+
+Configurations before running DPDK
+----------------------------------
+
+1. Build the DPDK target and reserve huge pages; refer to the DPDK Getting Started Guide for
+   more details. The scripts below are for reference (a quick sanity check of the result is
+   sketched after this list)::
+
+      cd <dpdk_folder>
+      make install T=x86_64-native-linuxapp-gcc -j         # Build the DPDK target
+      awk '/Hugepagesize/ {print $2}' /proc/meminfo         # Get the hugepage size
+      awk '/HugePages_Total/ {print $2}' /proc/meminfo      # Get the total huge page number
+      umount `awk '/hugetlbfs/ {print $2}' /proc/mounts`    # Unmount existing hugepage mounts
+      mkdir -p /mnt/huge                                    # Create the hugepage mount folder
+      mount -t hugetlbfs nodev /mnt/huge                    # Mount to the specific folder
+
+2. Check the CPU layout using the DPDK cpu_layout utility or the ``lscpu`` command::
+
+      cd <dpdk_folder>/tools
+      ./cpu_layout.py                  # Run the script to check your system's CPU layout
+
+   Or run ``lscpu`` to check the cores on each socket.
+
+3. Check your NIC IDs and the related socket IDs::
+
+      lspci -nn | grep Eth             # List all NICs with their PCI addresses and device IDs
+
+   For example, suppose your output is as below::
+
+      82:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      82:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+
+   Check which NUMA node the PCI device is attached to::
+
+      cat /sys/bus/pci/devices/0000\:xx\:00.x/numa_node
+
+   Usually ``8x:00.x`` is on socket 1 and ``0x:00.x`` is on socket 0. **Note**: To get the
+   best performance, make sure the cores and NICs are on the same socket. For example,
+   ``85:00.0`` is on socket 1, so use cores on socket 1 for the best performance.
+
+4. Bind the test ports to igb_uio. For example, bind two ports to a DPDK compatible driver
+   and check the status::
+
+      # Bind ports 82:00.0 and 85:00.0 to the DPDK driver
+      ./<dpdk_folder>/tools/dpdk_nic_bind.py -b igb_uio 82:00.0 85:00.0
+
+      # Check the port driver status
+      ./<dpdk_folder>/tools/dpdk_nic_bind.py --status
+
+5. More details about the setup and Linux kernel requirements can be found in the DPDK
+   Getting Started Guide.
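+
+As a quick sanity check of the steps above, the reserved huge pages and the cores on each
+socket can be verified as follows. This is a minimal sketch assuming 1G pages and a
+two-socket system; adjust the node number to match your NICs' socket::
+
+      # Overall hugepage counters seen by the kernel
+      grep Huge /proc/meminfo
+
+      # Per-socket view: how many 1G pages ended up on each NUMA node
+      cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
+      cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
+
+      # CPU cores that belong to the NICs' socket (socket 1 in this example)
+      cat /sys/devices/system/node/node1/cpulist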
+
+Example
+-------
+
+Below is an example of running the DPDK l3fwd sample application to get high performance
+with an Intel platform and Intel® XL710 NICs. For any 40G NIC specific configuration,
+please refer to the i40e NIC guide.
+
+**Note**: The scenario is to get the best performance with two Intel® XL710 40G ports.
+See Figure 1 below for the performance test setup.
+
+.. figure:: img/intel_perf_test_setup.*
+
+**Figure 1. PF_Performance_Test_setup**
+
+
+1. Insert two NICs (Intel® XL710) into the platform, and use one port per card to get the
+   best performance. The reason for using two NICs is the PCIe Gen3 bandwidth limitation.
+   **Note**: A single PCIe Gen3 x8 slot (roughly 63 Gbit/s of usable bandwidth) can't provide
+   80G bandwidth for two 40G ports, but two separate PCIe Gen3 x8 slots can. Referring to the
+   sample NIC output above, we select 82:00.0 and 85:00.0 as the test ports::
+
+      82:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+
+2. Connect the ports to the traffic generator, such as IXIA or Spirent.
+
+3. Check the NUMA node (socket ID) of the PCI devices and get the core numbers on that
+   socket. In this case, 82:00.0 and 85:00.0 are both on socket 1, and the cores on socket 1
+   of the referenced platform are 18-35 and 54-71. **Note**: Don't use the two hyper-threads
+   of one physical core (e.g. core 18 has two lcores, lcore 18 and lcore 54); instead, use
+   lcores from two different physical cores (e.g. core 18 and core 19).
+
+4. Bind these two ports to igb_uio.
+
+5. The XL710 40G port needs at least two queue pairs to achieve the best performance, so two
+   queues per port are required, and each queue pair needs a dedicated CPU core for
+   receiving/transmitting packets.
+
+6. The l3fwd sample will be used for performance testing, with the two ports doing
+   bi-directional forwarding. Compile the l3fwd sample with the default LPM mode.
+
+7. The final command line for running l3fwd could be as follows. It uses core 18 for port 0
+   queue pair 0 forwarding, core 19 for port 0 queue pair 1, core 20 for port 1 queue pair 0,
+   and core 21 for port 1 queue pair 1::
+
+      ./l3fwd -c 0x3c0000 -n 4 -w 82:00.0 -w 85:00.0 -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
+
+8. Configure the traffic on the traffic generator, such as IXIA or Spirent.
+
+* Start creating a stream on the packet generator, e.g. IXIA.
+* Set the Ethernet II type to 0x0800.
+* Set the protocol to IPv4.
+* Do not set any L4 protocol, just keep it as none. **Note**: this is very important; if you
+  set the UDP or TCP protocol you may get relatively low performance, since by default the
+  l3fwd example enables RSS on the IP header only (no L4 protocol).
+* The flow's DEST MAC, DEST IP and SRC IP settings can be seen in the figure above, for
+  reference. Set the correct destination IP address according to ``ipv4_l3fwd_route_array``
+  in the l3fwd example code, e.g. 2.1.1.1 for port 0, so that the packets are forwarded to
+  port 1. Set the source IP to random. **Note**: this is very important to make sure the
+  packets are received in multiple queues.

-- 
2.1.0