From: Qian Xu
Subject: [PATCH] doc: Add performance tuning guide about how to get DPDK high perf on Intel platform.
Date: Mon, 10 Aug 2015 14:34:47 +0800
Message-ID: <1439188487-7302-1-git-send-email-qian.q.xu@intel.com>
To: dev@dpdk.org
List-Id: patches and discussions about DPDK

Signed-off-by: Qian Xu

Add a new guide under the guides folder. This document is a step-by-step
guide to getting high performance with DPDK on Intel platforms and NICs.
It is designed for users who are not familiar with DPDK but would like to
measure the best achievable performance. It contains step-by-step
instructions for configuring the platform and NICs for best performance.

More sections will be added as DPDK gains new features. Currently, the
document has only one section, on PF performance test setup; the following
cases will be added in the near future:

* VF performance tuning.
* Vhost/virtio performance tuning.
* New features.

diff --git a/doc/guides/perf_tuning_guide/img/pf_performance_test_setup.svg b/doc/guides/perf_tuning_guide/img/pf_performance_test_setup.svg
new file mode 100644
index 0000000..50ce92d
--- /dev/null
+++ b/doc/guides/perf_tuning_guide/img/pf_performance_test_setup.svg
@@ -0,0 +1,375 @@
[Figure: pf_performance_test_setup.svg (image/svg+xml)]
diff --git a/doc/guides/perf_tuning_guide/index.rst b/doc/guides/perf_tuning_guide/index.rst
new file mode 100644
index 0000000..ff325e9
--- /dev/null
+++ b/doc/guides/perf_tuning_guide/index.rst
@@ -0,0 +1,47 @@
+..  BSD LICENSE
+    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Performance Tuning Guide for Intel's platform
+=============================================
+
+|today|
+
+Contents
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    intro
+    performance_tuning
+
+
+
+
diff --git a/doc/guides/perf_tuning_guide/intro.rst b/doc/guides/perf_tuning_guide/intro.rst
new file mode 100644
index 0000000..5672549
--- /dev/null
+++ b/doc/guides/perf_tuning_guide/intro.rst
@@ -0,0 +1,44 @@
+..  BSD LICENSE
+    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Introduction
+============
+
+This document is a step-by-step guide to getting high performance with DPDK on Intel platforms and NICs.
+It is designed for users who are not familiar with DPDK but would like to measure the best achievable performance. It contains
+step-by-step instructions for configuring the platform and NICs for best performance.
+More sections will be added as DPDK gains new features.
+Currently, the document has only one section, on PF performance test setup; the following cases will be added in the near future:
+
+* VF performance tuning.
+* Vhost/virtio performance tuning.
+* New features.
+
+
diff --git a/doc/guides/perf_tuning_guide/performance_tuning.rst b/doc/guides/perf_tuning_guide/performance_tuning.rst
new file mode 100644
index 0000000..e701d48
--- /dev/null
+++ b/doc/guides/perf_tuning_guide/performance_tuning.rst
@@ -0,0 +1,157 @@
+Performance Tuning DPDK with Intel's Platform and NICs
+======================================================
+
+This document is a step-by-step guide to getting high DPDK performance with Intel platforms and NICs.
+
+Prerequisites
+-------------
+
+Hardware platform essential requirements:
+
+1. Use a standard Intel® Xeon® server system (e.g. Ivy Bridge, Haswell or newer).
+
+2. Ensure that each memory channel has at least one memory DIMM inserted, and that each DIMM is 4GB or larger (e.g. 8GB or 16GB). You can use ``dmidecode -t memory`` to check the memory configuration. **Note**: Memory population has an important impact on performance.
+
+Hardware platform Network Interface Card Essential requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Get a high-end Intel® NIC, e.g. Intel® XL710.
+
+2. Make sure each NIC has been flashed with the latest version of NVM/firmware, if one is available.
+
+3. Use PCIe Gen3 slots, such as Gen3 x8 or Gen3 x16, because PCIe Gen2 slots cannot provide enough bandwidth for 2x10G and above.
+
+4. When inserting NICs into the PCI slots, pay attention to NUMA. If you will use two or more ports from different NICs, make sure these NICs are on the same CPU socket.
+
+BIOS settings:
+~~~~~~~~~~~~~~
+
+1. To start from a known state, reset all BIOS settings to their defaults.
+
+2. Disable all power-saving options, and set all options for best performance.
+
+3. Disable Turbo Boost so that performance scales consistently as the number of cores increases.
+
+4. Set the memory frequency to the highest available value, NOT auto.
+
+5. Disable all virtualization options when you test the physical function of the NIC, and turn on VT-d if you want to use VFIO.
+
+
+Linux System Essential Requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Install a widely used 64-bit Linux distribution, e.g. Fedora 20 (64-bit).
+
+2. Select as many components as possible during system installation, to avoid having to install required components later.
+
+3. Make sure a widely used and fully validated kernel version is installed, e.g. 3.18.
+
+4. For some older kernel versions, make sure the required components are enabled; the kernel may need to be rebuilt if anything DPDK requires is missing. Refer to the Getting Started Guide on www.dpdk.org for more details.
+
+
+Grub Parameters Essential Requirements:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Use the default grub file as a starting point.
+
+2. Reserve 1G huge pages via the grub configuration, e.g. add ``default_hugepagesz=1G hugepagesz=1G hugepages=8`` to reserve eight 1G huge pages.
+
+3. Isolate the CPU cores that will be used for DPDK from the scheduler, e.g. ``isolcpus=2,3,4,5,6,7,8``.
+
+4. If you want to use VFIO, additional grub parameters are needed, e.g. ``iommu=pt intel_iommu=on``.
+
+
+Configurations before running DPDK
+----------------------------------
+
+1. For Intel® 40G NICs, the following configuration options should be set before compiling DPDK. **Note**: This is very important::
+
+      for at least DPDK release 1.8, 2.0 and 2.1, in /config/common_linuxapp
+      CONFIG_RTE_PCI_CONFIG=y
+      CONFIG_RTE_PCI_EXTENDED_TAG="on"
+
+2. Build the DPDK target and reserve huge pages; refer to the Getting Started Guide for more details. The commands below are for reference::
+
+      cd
+      make install T=x86_64-native-linuxapp-gcc -j         # Build DPDK target
+      awk '/Hugepagesize/ {print $2}' /proc/meminfo        # Get the hugepage size
+      awk '/HugePages_Total/ {print $2} ' /proc/meminfo    # Get the total number of huge pages
+      umount `awk '/hugetlbfs/ {print $2}' /proc/mounts`   # Unmount existing hugetlbfs mounts
+      mkdir -p /mnt/huge                                   # Create the hugepage mount folder
+      mount -t hugetlbfs nodev /mnt/huge                   # Mount to the specific folder
+
+3. Check the CPU layout using the DPDK tools or the system command ``lscpu``::
+
+      cd /tools
+      ./cpu_layout.py                  # Run the script to check your system's CPU layout.
+
+   Or run ``lscpu`` to check the cores on each socket.
+
+4. Check your NIC ID and the related socket ID::
+
+      lspci -nn|grep Eth               # List all the NICs with PCI address and device IDs.
+
+   For example, suppose the output is as below::
+
+      82:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      82:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+
+   Check the NUMA node ID of each PCI device::
+
+      cat /sys/bus/pci/devices/0000\:xx\:00.x/numa_node
+
+   Usually ``8x:00.x`` is on socket 1 and ``0x:00.x`` is on socket 0. **Note**: To get the best performance, make sure the cores and NICs are on the same socket. Taking ``85:00.0`` as an example, it is on socket 1, so use cores on socket 1 for best performance.
+
+5. Bind the test ports to igb_uio. For example, bind two ports to a DPDK-compatible driver and check the status::
+
+      # Bind ports 82:00.0 and 85:00.0 to dpdk driver
+
+      .//tools/dpdk_nic_bind.py -b igb_uio 82:00.0 85:00.0
+
+      # Check the port driver status
+
+      .//tools/dpdk_nic_bind.py --st
+
+
+Example
+-------
+
+Below is an example of running the DPDK l3fwd sample to get high performance with an Intel platform and NIC.
+
+**Note**: The scenario is to get the best performance with two Intel® XL710 40G ports. See Figure 1 below for the performance test setup.
+
+.. figure:: img/pf_performance_test_setup.*
+
+**Figure 1. PF_Performance_Test_setup**
+
+
+1. Insert two NICs (Intel® XL710) into the platform, and use one port per card to get the best performance. Two NICs are used because of PCIe Gen3 bandwidth limits. **Note**: A single PCIe Gen3 slot cannot provide 80G bandwidth for two 40G ports, but two separate PCIe Gen3 slots can. Referring to the sample NIC output above, we can select 82:00.0 and 85:00.0 as test ports::
+
+      82:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+      85:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583] (rev 01)
+
+2. Connect the ports to a traffic generator, such as IXIA or Spirent.
+
+3. Check the NUMA node (socket ID) of the PCI devices and get the core numbers on that socket. In this case, 82:00.0 and 85:00.0 are both on socket 1, and the cores on socket 1 in the referenced platform are 18-35,54-71. **Note**: Don't use both threads of one core (e.g. core18 has 2 lcores, lcore18 and lcore54); instead, use 2 logical cores from different cores (e.g. core18 and core19).
+
+4. Bind these two ports to igb_uio.
+
+5. An XL710 40G port needs at least two queue pairs to achieve best performance, so two queues per port are required, and each queue pair needs a dedicated CPU core for receiving/transmitting packets.
+
+6. The l3fwd sample is used for performance testing, with two ports for bi-directional forwarding. Compile the l3fwd sample with the default lpm mode.
+
+7. The final l3fwd command line could be as follows. It uses core 18 for port 0, queue pair 0 forwarding; core 19 for port 0, queue pair 1 forwarding; core 20 for port 1, queue pair 0 forwarding; and core 21 for port 1, queue pair 1 forwarding::
+
+      ./l3fwd -c 0x3c0000 -n 4 -w 82:00.0 -w 85:00.0 -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
+
+8. Configure the traffic on a traffic generator such as IXIA or Spirent.
+
+* Start creating a stream on the packet generator, e.g. IXIA.
+* Set the Ethernet II type to 0x0800.
+* Set the protocol to IPv4.
+* Do not set any L4 protocol; keep it as none. **Note**: This is very important. If you set UDP or TCP, you may get relatively low performance, since by default the l3fwd example enables RSS without any L4 protocol.
+* The flow's DEST MAC, DEST IP and SRC IP settings are shown in the figure above for reference. Set the correct destination IP address according to "ipv4_l3fwd_route_array" in the l3fwd example code, such as 2.1.1.1 for port0, so that the packets are forwarded to port1. Set the source IP to random. **Note**: This is very important to make sure the packets are received on multiple queues.
+
+
-- 
2.1.0
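
As a cross-check of the l3fwd command line in step 7 of the example, the sketch below derives the ``-c`` core mask, the ``-p`` port mask and the ``--config`` string from the (port, queue, lcore) assignments. It is only an illustration and not part of the patch; the helper names ``coremask`` and ``l3fwd_config`` are hypothetical and do not exist in DPDK.

```python
def coremask(ids):
    """Return a hex bitmask with one bit set per id (lcore or port number)."""
    mask = 0
    for i in ids:
        mask |= 1 << i
    return hex(mask)


def l3fwd_config(assignments):
    """Format (port, queue, lcore) triples as an l3fwd --config value."""
    return ",".join("({},{},{})".format(p, q, c) for p, q, c in assignments)


# Assignments from step 7: two queue pairs per port, one lcore per queue pair.
assignments = [(0, 0, 18), (0, 1, 19), (1, 0, 20), (1, 1, 21)]
lcores = sorted({c for _, _, c in assignments})
ports = sorted({p for p, _, _ in assignments})

print("-c", coremask(lcores))                # -c 0x3c0000
print("-p", coremask(ports))                 # -p 0x3
print("--config", l3fwd_config(assignments))
```

For a different platform, replace the lcore numbers with cores on the same socket as the NICs, picking one logical core per physical core as described in step 3.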