From: Yuanhan Liu
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 10 Oct 2016 10:44:28 +0800
Message-ID: <20161010024428.GT1597@yliu-dev.sh.intel.com>
References: <8F6C2BD409508844A0EFC19955BE09414E7B5581@SHSMSX103.ccr.corp.intel.com>
 <20160922022903.GJ23158@yliu-dev.sh.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B5DAE@SHSMSX103.ccr.corp.intel.com>
 <20160927102123.GL25823@yliu-dev.sh.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B7C0B@SHSMSX103.ccr.corp.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7BBE7D@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <8F6C2BD409508844A0EFC19955BE09414E7BBE7D@SHSMSX103.ccr.corp.intel.com>
To: "Wang, Zhihong"
Cc: Jianbo Liu, Maxime Coquelin, dev@dpdk.org

On Sun, Oct 09, 2016 at 12:09:07PM +0000, Wang, Zhihong wrote:
> > > > Tested with testpmd, host: txonly, guest: rxonly
> > > >
> > > > size (bytes)    improvement (%)
> > > > 64               4.12
> > > > 128              6
> > > > 256              2.65
> > > > 512             -1.12
> > > > 1024            -7.02
> > >
> > > There is a difference between Zhihong's code and the old code that I
> > > spotted the first time: Zhihong removed the avail_idx prefetch. I
> > > understand the prefetch becomes a bit tricky when the mrg-rx code path
> > > is considered; thus, I didn't comment on that.
> > >
> > > That's one of the differences that, IMO, could be the cause of the
> > > regression. I then finally got a chance to add it back.
> > >
> > > A rough test shows it greatly improves the performance for the 1400B
> > > packet size in the "txonly in host and rxonly in guest" case: +33% is
> > > the number I get with my test server (Ivybridge).
> >
> > Thanks Yuanhan! I'll validate this on x86.
>
> Hi Yuanhan,
>
> Your code doesn't seem to perform correctly. I wrote a new version
> of the avail idx prefetch but didn't see any perf benefit.
>
> To be honest, I doubt the benefit of this idea. The previous mrg_off
> code used this method but didn't give any benefit.

Good point. I had thought of that before, too. But you know I made it
in a rush, and I didn't think it through or test it further.

I looked at the code a bit more closely this time, and spotted a bug:
the prefetch actually never happens, due to the following code piece:

	if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
		prefetch_avail_idx(vq);
		...
	}

Since vq->next_avail_idx is set to 0 at the entrance of the enqueue
path, the condition above never holds, so prefetch_avail_idx() is never
called. The fix is easy though: just put prefetch_avail_idx() before
invoking enqueue_packet().
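
To make that concrete, here is a minimal sketch of the buggy and the
fixed placement. It is a sketch only, not the actual patch: the struct
layout, the NR_AVAIL_IDX_PREFETCH value, and the enqueue_packet()
signature below are assumptions for illustration.

	#include <stdint.h>

	#define NR_AVAIL_IDX_PREFETCH	32	/* actual value assumed */

	struct vhost_virtqueue {
		uint16_t next_avail_idx;
		/* other fields elided */
	};

	void prefetch_avail_idx(struct vhost_virtqueue *vq);
	void enqueue_packet(struct vhost_virtqueue *vq);

	/* Buggy placement: next_avail_idx has just been reset to 0 at
	 * the entrance of the enqueue path, so the condition never
	 * holds and the prefetch silently never fires. */
	void enqueue_buggy(struct vhost_virtqueue *vq)
	{
		vq->next_avail_idx = 0;
		if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH)
			prefetch_avail_idx(vq);
		enqueue_packet(vq);
	}

	/* Fixed placement: issue the prefetch unconditionally, right
	 * before invoking enqueue_packet(). */
	void enqueue_fixed(struct vhost_virtqueue *vq)
	{
		vq->next_avail_idx = 0;
		prefetch_avail_idx(vq);
		enqueue_packet(vq);
	}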

In summary, Zhihong is right: I see no more gains with that fix :(

However, as stated, that's about the only difference I found between
your code and the old code, so maybe it's still worthwhile to run a
test on ARM, Jianbo?

	--yliu

> Even if this is useful, the benefits should be more significant for
> small packets; it's unlikely this simple idx prefetch could bring an
> over 30% perf gain for large packets like 1400B ones.
>
> But if you really do work it out like that, I'll be very glad to see it.
>
> Thanks
> Zhihong
>
> > >
> > > I guess this might/would help your case as well. Mind giving it a
> > > test and telling me the results?
> > >
> > > BTW, I made it in a rush; I haven't tested the mrg-rx code path yet.
> > >
> > > Thanks.
> > >
> > > 	--yliu