From: Saber Rezvani
Subject: Re: IXGBE throughput loss with 4+ cores
Date: Wed, 29 Aug 2018 21:49:26 +0430
To: "Wiles, Keith"
Cc: Stephen Hemminger, "dev@dpdk.org"

On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>
>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>
>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a
>>> performance problem.
>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one
>> core (10 Gb/s), but when I increase the number of cores (one core
>> per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores).
>> In my scenario Pktgen shows it is generating at line rate, but it is
>> receiving only 8.5 Gb/s.
>> Is it because of Pktgen?
> Normally Pktgen can receive at line rate up to 10G with 64-byte
> frames, which means Pktgen should not be the problem. You can verify
> that by looping the cable from one port to another on the pktgen
> machine to create an external loopback. Then send whatever traffic
> you can from one port; you should be able to receive those packets on
> the other port unless something is configured wrong.
>
> Please send me the command line for pktgen.
>
> In pktgen, if you have the config -m "[1-4:5-8].0", then you have 4
> cores sending traffic and 4 cores receiving packets.
>
> In this case the TX cores will be sending the packets on all 4 lcores
> to the same port. On the RX side you have 4 cores polling 4 RX
> queues. The RX queues are controlled by RSS, which means the 5-tuple
> hash of the inbound traffic must divide the packets across all 4
> queues to make sure each core is doing the same amount of work. If
> you are sending only a single packet (a single 5-tuple) on the TX
> cores, then only one RX queue will be used.
>
> I hope that makes sense.
I think there is a misunderstanding of the problem. Indeed the problem
is not Pktgen.

Here is my command:

./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 \
    -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 \
    --socket-mem 1000,2000,1000,1000 -- -T -P \
    -m "[18-19:20-21].0, [22:23].1"

The problem is that when I run the symmetric_mp example with
$numberOfProcesses=8 cores, I get less throughput (roughly 8.4 Gb/s),
but when I run it with $numberOfProcesses=3 cores the throughput is 10G.
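Following Keith's point about RSS: for reference, the RSS part of the
port setup in a symmetric_mp-style application looks roughly like the
sketch below. It is written from memory against the DPDK 18.x ethdev
API, not copied from the example, so treat it as an illustration only.

    #include <rte_ethdev.h>

    /* Sketch: spread inbound packets across nb_rx_queues by hashing
     * the IP/TCP/UDP tuple (RSS). Not the actual symmetric_mp code. */
    static int configure_rss_port(uint16_t port_id, uint16_t nb_rx_queues)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS, /* receive-side scaling */
            },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL,      /* driver's default hash key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                },
            },
        };

        /* one TX queue per RX queue, matching a [rx:tx] core mapping */
        return rte_eth_dev_configure(port_id, nb_rx_queues,
                                     nb_rx_queues, &port_conf);
    }

Note that if the generator sends a single 5-tuple, RSS hashes every
packet to the same queue no matter what rss_hf says, so the source
IPs/ports have to vary on the pktgen side.

Here is the loop that launches the processes: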
for i in `seq $numberOfProcesses`;
    do
        .... some calculation goes here .....
        symmetric_mp -c $coremask -n 2 --proc-type=auto \
            -w 0b:00.0 -w 0b:00.1 --file-prefix sm \
            --socket-mem 4000,1000,1000,1000 -- -p 3 \
            --num-procs=$numberOfProcesses --proc-id=$procid
        .....
    done

I am trying to find out what causes this loss!

>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>>> Saber Rezvani wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have run the multi_process/symmetric_mp example in the DPDK
>>>>>> examples directory. For one process the throughput is line rate,
>>>>>> but as I increase the number of cores I see a decrease in
>>>>>> throughput. For example, if the number of queues is set to 4 and
>>>>>> each queue is assigned to a single core, the throughput is about
>>>>>> 9.4 Gb/s; with 8 queues, the throughput drops to 8.5 Gb/s.
>>>>>>
>>>>>> I have read the following, but it was not convincing.
>>>>>>
>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>
>>>>>> I am eagerly looking forward to hearing from you, all.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Saber
>>>>>>
>>>>> Not completely surprising. If you have more cores than the line
>>>>> rate needs, then the number of packets returned for each call to
>>>>> rx_burst will be smaller. With a large number of cores, most of
>>>>> the time will be spent doing reads of PCI registers for no
>>>>> packets!
>>>> Indeed pktgen says it is generating traffic at line rate, but it
>>>> is receiving less than 10 Gb/s. So, in that case there should be
>>>> something that causes the reduction in throughput :(
>>>>
>>> Regards,
>>> Keith
>>
> Regards,
> Keith

Best regards,
Saber
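P.S. To check Stephen's empty-poll point directly, one can count how
often rte_eth_rx_burst() returns zero packets on each queue. A minimal
sketch (again against the DPDK 18.x API; BURST_SIZE, POLL_BUDGET and
the drop-everything behavior are made-up placeholders for measurement,
not symmetric_mp code):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE  32
    #define POLL_BUDGET 1000000ULL

    /* Poll one RX queue and report what fraction of polls come back
     * empty, i.e. PCI register reads that returned no packets. */
    static void rx_poll_stats(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint64_t polls, empty = 0, pkts = 0;

        for (polls = 0; polls < POLL_BUDGET; polls++) {
            uint16_t nb = rte_eth_rx_burst(port_id, queue_id,
                                           bufs, BURST_SIZE);
            if (nb == 0) {
                empty++;                    /* a poll that found nothing */
                continue;
            }
            pkts += nb;
            for (uint16_t i = 0; i < nb; i++)
                rte_pktmbuf_free(bufs[i]);  /* measurement only: drop */
        }
        printf("queue %u: %.1f%% empty polls, %" PRIu64 " packets\n",
               queue_id, 100.0 * empty / POLL_BUDGET, pkts);
    }

If the empty-poll percentage climbs sharply as cores are added, the
cores are mostly reading hardware registers for nothing, which would
match the drop from 10 to roughly 8.5 Gb/s.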