From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Subject: Re: [PATCH] mbuf: make rearm_data address naturally
 aligned
Date: Thu, 19 May 2016 00:20:16 +0530
Message-ID: <20160518185011.GA4432@localhost.localdomain>
References: <1463579863-32053-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <20160518164300.GA12324@bricha3-MOBL3>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: <dev@dpdk.org>, <thomas.monjalon@6wind.com>,
 <konstantin.ananyev@intel.com>, <viktorin@rehivetech.com>,
 <jianbo.liu@linaro.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Content-Disposition: inline
In-Reply-To: <20160518164300.GA12324@bricha3-MOBL3>

On Wed, May 18, 2016 at 05:43:00PM +0100, Bruce Richardson wrote:
> On Wed, May 18, 2016 at 07:27:43PM +0530, Jerin Jacob wrote:
> > To avoid multiple stores on the fast path, Ethernet drivers
> > aggregate the writes to data_off, refcnt, nb_segs and port
> > into a single uint64_t value and write it in one shot
> > through a uint64_t* at the &mbuf->rearm_data address.
> > 
> > Some non-IA platforms incur a store penalty
> > if the store address is not naturally aligned. This patch
> > fixes the performance issue on those targets.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > 
> > Tested this patch on IA and non-IA (ThunderX) platforms.
> > This patch shows a 400 Kpps/core improvement in a ThunderX + ixgbe + vector environment,
> > and it does not add any overhead on the IA platform.
> > 
> > I also tried another, similar approach: replacing "pad" with "buf_len"
> > (in this patch's context).
> > Since that requires an additional read and mask to keep "buf_len" intact,
> > it did not show much improvement.
> > ref: http://dpdk.org/ml/archives/dev/2016-May/038914.html
> > 
> > ---
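
A minimal C sketch of the two variants described in the patch notes, assuming a simplified, hypothetical `struct demo_mbuf` (not the real `struct rte_mbuf`; `slot`, `rearm_template()`, `rearm_one_store()` and `rearm_masked()` are illustrative names only). With a 2-byte pad in front of data_off, the 8-byte rearm word starts at offset 16, so the fast path is a single aligned 8-byte store. If buf_len occupies that slot instead, the driver has to load, mask and then store, which is the extra overhead mentioned in the notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified mbuf header (NOT the real struct rte_mbuf).
 * A 2-byte slot sits in front of data_off so that the 8-byte rearm word
 * starts at offset 16 and is therefore naturally aligned for uint64_t. */
struct demo_mbuf {
	_Alignas(8) uint64_t buf_addr; /* stand-in for the buffer pointer */
	uint64_t buf_physaddr;
	uint16_t slot;      /* offset 16: pad (one store) or buf_len (masked) */
	uint16_t data_off;  /* offset 18 */
	uint16_t refcnt;    /* offset 20 */
	uint8_t nb_segs;    /* offset 22 */
	uint8_t port;       /* offset 23 */
};

#define REARM_OFF offsetof(struct demo_mbuf, slot)
static_assert(REARM_OFF % sizeof(uint64_t) == 0,
	      "rearm word must be naturally aligned");

/* Pack the per-queue rearm values into one 64-bit template, copying the
 * bytes in the same order the fields have in memory, so the result does
 * not depend on endianness. The slot bytes are left as zero. */
static uint64_t rearm_template(uint16_t data_off, uint16_t refcnt,
			       uint8_t nb_segs, uint8_t port)
{
	struct demo_mbuf tmp;
	uint64_t word;

	memset(&tmp, 0, sizeof(tmp));
	tmp.data_off = data_off;
	tmp.refcnt = refcnt;
	tmp.nb_segs = nb_segs;
	tmp.port = port;
	memcpy(&word, (const char *)&tmp + REARM_OFF, sizeof(word));
	return word;
}

/* Patch approach: the slot is padding, so the fast path is one aligned
 * 8-byte store (a fixed-size 8-byte memcpy typically compiles to one
 * store when optimizing). */
static void rearm_one_store(struct demo_mbuf *m, uint64_t tmpl)
{
	memcpy((char *)m + REARM_OFF, &tmpl, sizeof(tmpl));
}

/* Alternative: the slot holds buf_len, which must survive the rearm, so
 * the fast path needs an extra load and mask before the store. */
static void rearm_masked(struct demo_mbuf *m, uint64_t tmpl)
{
	uint64_t cur, keep = 0;

	memset(&keep, 0xff, sizeof(m->slot));              /* slot's bytes */
	memcpy(&cur, (const char *)m + REARM_OFF, sizeof(cur)); /* extra load */
	cur = (cur & keep) | (tmpl & ~keep);                  /* extra mask */
	memcpy((char *)m + REARM_OFF, &cur, sizeof(cur));
}
```

The template is built once per queue; only the store (or load, mask and store) runs per packet.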
> While this will work and from your tests doesn't seem to have a performance
> impact, I'm not sure I particularly like it. It's extending out the end of
> cacheline0 of the mbuf by 16 bytes, though I suppose it's not technically using
> up any more space of it.

It's extending by 2 bytes, right? And yes, I guess it doesn't really use up more
space: we are currently using only 56 of the 64 bytes in the first 64-byte cache line.

> 
> What I'm wondering about, though, is whether we have any use cases where we need a
> variable buf_len for packets for RX. These mbufs come directly from a mempool,
> which is generally understood to be a set of fixed-sized buffers. I realise that
> this change was made in the past after some discussion, but one of the key points
> there [at least to my reading] was that - even though nobody actually made a
> concrete case where they had variable-sized buffers - having support for them
> made no performance difference.
> 
> The latter part of that has now changed, and supporting variable-sized mbufs
> from an mbuf pool has a perf impact. Do we definitely need that functionality,
> because the easiest fix here is just to move the rxrearm marker back above
> mbuf_len as it was originally in releases like 1.8?

And initialize the buf_len with mp->elt_size - sizeof(struct rte_mbuf).
Right?
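
A minimal sketch of that option, again with hypothetical names (`struct demo_rx_mbuf`, and `struct demo_pool` as a stand-in for the mempool's `elt_size`; neither is a DPDK type). With the rearm marker above buf_len, the 8-byte word starts at the naturally aligned offset 16, and because every buffer in the pool has the same size, buf_len can be baked into the per-queue template using the formula above. The fast path then stays a single aligned store with no load or mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout (NOT the real struct rte_mbuf): buf_len
 * is the first field of the 8-byte rearm word, which starts at offset 16. */
struct demo_rx_mbuf {
	_Alignas(8) uint64_t buf_addr; /* stand-in for the buffer pointer */
	uint64_t buf_physaddr;
	uint16_t buf_len;   /* offset 16: start of the rearm word */
	uint16_t data_off;
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
};

#define RX_REARM_OFF offsetof(struct demo_rx_mbuf, buf_len)
static_assert(RX_REARM_OFF % sizeof(uint64_t) == 0,
	      "rearm word must be naturally aligned");

/* Hypothetical stand-in for a mempool: all elements have the same size. */
struct demo_pool {
	uint32_t elt_size;
};

/* Build the per-queue template once. buf_len is constant for the pool, so
 * it is simply rewritten with the same value on every rearm. */
static uint64_t rx_rearm_template(const struct demo_pool *mp,
				  uint16_t data_off, uint8_t port)
{
	struct demo_rx_mbuf tmp;
	uint64_t word;

	memset(&tmp, 0, sizeof(tmp));
	tmp.buf_len = (uint16_t)(mp->elt_size - sizeof(struct demo_rx_mbuf));
	tmp.data_off = data_off;
	tmp.refcnt = 1;
	tmp.nb_segs = 1;
	tmp.port = port;
	memcpy(&word, (const char *)&tmp + RX_REARM_OFF, sizeof(word));
	return word;
}

/* Fast path: one aligned 8-byte store, no load or mask needed. */
static void rx_rearm(struct demo_rx_mbuf *m, uint64_t tmpl)
{
	memcpy((char *)m + RX_REARM_OFF, &tmpl, sizeof(tmpl));
}
```

The trade-off is the one raised above: this only works if buf_len never varies within a pool.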

I don't have a strong opinion on this; I can make that change if there are no
objections. Let me know.

However, I can see "buf_len" having to move to the end of the first 64-byte
cache line in future, since "port" is currently defined as uint8_t, which IMO
is too small; we may need to widen it to uint16_t. The reason I think so:
ThunderX hardware currently has 128 VFs per socket for the built-in NIC, so a
two-node configuration plus one external PCIe network card can easily go
beyond 256 ports.
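
The arithmetic above as a small C sketch. `EXT_NIC_PORTS = 2` is an assumed port count for the external card (only "one external PCIe card" is stated), and `rearm_bytes()` is a hypothetical helper that counts field sizes only, ignoring alignment padding. Two nodes with 128 VFs each already use all 256 IDs a uint8_t can hold, so any extra port wraps around. Once port is widened to uint16_t, data_off, refcnt, nb_segs and port alone need 7 bytes, so buf_len (2 more bytes) no longer fits in the 8-byte rearm word.

```c
#include <assert.h>
#include <stdint.h>

/* 128 built-in VFs per socket on two nodes, plus one external NIC.
 * EXT_NIC_PORTS is an illustrative assumption. */
enum { VFS_PER_SOCKET = 128, NODES = 2, EXT_NIC_PORTS = 2 };

static unsigned total_ports(void)
{
	return VFS_PER_SOCKET * NODES + EXT_NIC_PORTS;
}

/* Field bytes needed in the 8-byte rearm word for a given port width. */
static unsigned rearm_bytes(unsigned port_bytes, int include_buf_len)
{
	unsigned n = sizeof(uint16_t)   /* data_off */
		   + sizeof(uint16_t)   /* refcnt */
		   + sizeof(uint8_t)    /* nb_segs */
		   + port_bytes;        /* port */

	if (include_buf_len)
		n += sizeof(uint16_t);  /* buf_len */
	return n;
}
```

With these assumptions, `total_ports()` is 258, and the last port ID (257) truncated to uint8_t aliases port 1.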

> 
> Regards,
> /Bruce
> 
> Ref: http://dpdk.org/ml/archives/dev/2014-December/009432.html
>