From: Saber Rezvani
Subject: Re: IXGBE throughput loss with 4+ cores
Date: Wed, 29 Aug 2018 21:49:26 +0430
To: "Wiles, Keith"
Cc: Stephen Hemminger, "dev@dpdk.org"

On 08/29/2018 01:39 AM, Wiles, Keith wrote:
>
>> On Aug 28, 2018, at 2:16 PM, Saber Rezvani wrote:
>>
>> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a
>>> performance problem.
>> I use Pktgen version 3.0.0. Indeed it is fine as long as I use one
>> core (10 Gb/s), but when I increase the number of cores (one core
>> per queue) I lose some performance (roughly 8.5 Gb/s for 8 cores).
>> In my scenario Pktgen shows it is generating at line rate, but it is
>> receiving only 8.5 Gb/s.
>> Is it because of Pktgen?
> Normally Pktgen can receive at line rate up to 10G with 64-byte
> frames, which means Pktgen should not be the problem. You can verify
> that by looping the cable from one port to another on the pktgen
> machine to create an external loopback. Then send whatever traffic
> you can from one port; you should be able to receive those packets on
> the other port unless something is configured wrong.
>
> Please send me the command line for pktgen.
>
> In pktgen, if you have the config -m "[1-4:5-8].0", then you have 4
> cores sending traffic and 4 cores receiving packets.
>
> In this case the TX cores will be sending the packets on all 4 lcores
> to the same port. On the RX side you have 4 cores polling 4 RX
> queues. The RX queues are controlled by RSS, which means the 5-tuple
> hash of the inbound traffic must divide the packets across all 4
> queues to make sure each core is doing the same amount of work. If
> you are sending only a single packet (a single 5-tuple) on the TX
> cores, then only one RX queue will be used.
>
> I hope that makes sense.
I think there is a misunderstanding of the problem. Indeed the problem
is not Pktgen.

Here is my command:

./app/app/x86_64-native-linuxapp-gcc/pktgen -c ffc0000 -n 4 \
    -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 \
    --socket-mem 1000,2000,1000,1000 -- -T -P \
    -m "[18-19:20-21].0, [22:23].1"

The problem is that when I run the symmetric_mp example with
$numberOfProcesses=8 cores, I get less throughput (roughly 8.4 Gb/s),
but when I run it with $numberOfProcesses=3 cores the throughput is 10G.
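Following Keith's point about RSS: for reference, the RSS part of the
port setup in a symmetric_mp-style application looks roughly like the
sketch below. It is written from memory against the DPDK 18.x ethdev
API, not copied from the example, so treat it as an illustration only.

    #include <rte_ethdev.h>

    /* Sketch: spread inbound packets across nb_rx_queues by hashing
     * the IP/TCP/UDP tuple (RSS). Not the actual symmetric_mp code. */
    static int configure_rss_port(uint16_t port_id, uint16_t nb_rx_queues)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS, /* receive-side scaling */
            },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL,      /* driver's default hash key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
                },
            },
        };

        /* one TX queue per RX queue, matching a [rx:tx] core mapping */
        return rte_eth_dev_configure(port_id, nb_rx_queues,
                                     nb_rx_queues, &port_conf);
    }

Note that if the generator sends a single 5-tuple, RSS hashes every
packet to the same queue no matter what rss_hf says, so the source
IPs/ports have to vary on the pktgen side.

Here is the loop that launches the processes: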
for i in `seq $numberOfProcesses`;
    do
        .... some calculation goes here .....
        symmetric_mp -c $coremask -n 2 --proc-type=auto \
            -w 0b:00.0 -w 0b:00.1 --file-prefix sm \
            --socket-mem 4000,1000,1000,1000 -- -p 3 \
            --num-procs=$numberOfProcesses --proc-id=$procid
        .....
    done

I am trying to find out what causes this loss!

>>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani wrote:
>>>>
>>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>>> Saber Rezvani wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have run the multi_process/symmetric_mp example in the DPDK
>>>>>> examples directory. For one process the throughput is line rate,
>>>>>> but as I increase the number of cores I see a decrease in
>>>>>> throughput. For example, if the number of queues is set to 4 and
>>>>>> each queue is assigned to a single core, the throughput is about
>>>>>> 9.4 Gb/s; with 8 queues, the throughput drops to 8.5 Gb/s.
>>>>>>
>>>>>> I have read the following, but it was not convincing.
>>>>>>
>>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>>>
>>>>>> I am eagerly looking forward to hearing from you, all.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Saber
>>>>>>
>>>>> Not completely surprising. If you have more cores than the line
>>>>> rate needs, then the number of packets returned for each call to
>>>>> rx_burst will be smaller. With a large number of cores, most of
>>>>> the time will be spent doing reads of PCI registers for no
>>>>> packets!
>>>> Indeed pktgen says it is generating traffic at line rate, but it
>>>> is receiving less than 10 Gb/s. So, in that case there should be
>>>> something that causes the reduction in throughput :(
>>>>
>>> Regards,
>>> Keith
>>
> Regards,
> Keith

Best regards,
Saber
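P.S. To check Stephen's empty-poll point directly, one can count how
often rte_eth_rx_burst() returns zero packets on each queue. A minimal
sketch (again against the DPDK 18.x API; BURST_SIZE, POLL_BUDGET and
the drop-everything behavior are made-up placeholders for measurement,
not symmetric_mp code):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE  32
    #define POLL_BUDGET 1000000ULL

    /* Poll one RX queue and report what fraction of polls come back
     * empty, i.e. PCI register reads that returned no packets. */
    static void rx_poll_stats(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint64_t polls, empty = 0, pkts = 0;

        for (polls = 0; polls < POLL_BUDGET; polls++) {
            uint16_t nb = rte_eth_rx_burst(port_id, queue_id,
                                           bufs, BURST_SIZE);
            if (nb == 0) {
                empty++;                    /* a poll that found nothing */
                continue;
            }
            pkts += nb;
            for (uint16_t i = 0; i < nb; i++)
                rte_pktmbuf_free(bufs[i]);  /* measurement only: drop */
        }
        printf("queue %u: %.1f%% empty polls, %" PRIu64 " packets\n",
               queue_id, 100.0 * empty / POLL_BUDGET, pkts);
    }

If the empty-poll percentage climbs sharply as cores are added, the
cores are mostly reading hardware registers for nothing, which would
match the drop from 10 to roughly 8.5 Gb/s.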