Re: [PATCH 4/8] net/mlx4: optimize Tx multi-segment case

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [PATCH 4/8] net/mlx4: optimize Tx multi-segment case
Date: Wed, 6 Dec 2017 12:55:31 +0100	[thread overview]
Message-ID: <20171206115531.GA4062@6wind.com> (raw)
In-Reply-To: <HE1PR0502MB3659DB14EFE589B5FAF075AED2320@HE1PR0502MB3659.eurprd05.prod.outlook.com>

On Wed, Dec 06, 2017 at 11:29:38AM +0000, Matan Azrad wrote:
> Hi Adrien
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, December 6, 2017 12:59 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 4/8] net/mlx4: optimize Tx multi-segment case
> > 
> > On Tue, Nov 28, 2017 at 12:19:26PM +0000, Matan Azrad wrote:
> > > mlx4 Tx block can handle up to 4 data segments or control segment + up
> > > to 3 data segments. The first data segment in each not first Tx block
> > > must validate Tx queue wraparound and must use IO memory barrier
> > > before writing the byte count.
> > >
> > > The previous multi-segment code used "for" loop to iterate over all
> > > packet segments and separated first Tx block data case by "if"
> > > statments.
> > 
> > statments => statements
> > 
> > >
> > > Use switch case and unconditional branches instead of "for" loop can
> > > optimize the case and prevents the unnecessary checks for each data
> > > segment; This hints to compiler to create opitimized jump table.
> > 
> > opitimized => optimized
> > 
> > >
> > > Optimize this case by switch case and unconditional branches usage.
> > >
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > ---
> > >  drivers/net/mlx4/mlx4_rxtx.c | 165
> > > +++++++++++++++++++++++++------------------
> > >  drivers/net/mlx4/mlx4_rxtx.h |  33 +++++++++
> > >  2 files changed, 128 insertions(+), 70 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx4/mlx4_rxtx.c
<snip>
> > > +	/*
> > > +	 * Fill the data segments with buffer information.
> > > +	 * First WQE TXBB head segment is always control segment,
> > > +	 * so jump to tail TXBB data segments code for the first
> > > +	 * WQE data segments filling.
> > > +	 */
> > > +	goto txbb_tail_segs;
> > > +txbb_head_seg:
> > 
> > I'm not fundamentally opposed to "goto" unlike a lot of people out there,
> > but this doesn't look good. It's OK to use goto for error cases and to extricate
> > yourself when trapped in an inner loop, also in some optimization scenarios
> > where it sometimes make sense, but not when the same can be achieved
> > through standard loop constructs and keywords.
> > 
> > In this case I'm under the impression you could have managed with a do { ... }
> > while (...) construct. You need to try harder to reorganize these changes or
> > prove it can't be done without negatively impacting performance.
> > 
> > Doing so should make this patch shorter as well.
> > 
> 
> I noticed this could be done with loop and without unconditional branches, but I checked it and found nice performance improvement using that way.
> When I used the loop I degraded performance.  

OK, you can leave it as is in the meantime then, we'll keep that in mind for
the next refactoring iteration.

<snip>
> > > a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h index
> > > 463df2b..8207232 100644
> > > --- a/drivers/net/mlx4/mlx4_rxtx.h
> > > +++ b/drivers/net/mlx4/mlx4_rxtx.h
> > > @@ -212,4 +212,37 @@ int mlx4_tx_queue_setup(struct rte_eth_dev
> > *dev, uint16_t idx,
> > >  	return mlx4_txq_add_mr(txq, mp, i);
> > >  }
> > >
> > > +/**
> > > + * Write Tx data segment to the SQ.
> > > + *
> > > + * @param dseg
> > > + *   Pointer to data segment in SQ.
> > > + * @param lkey
> > > + *   Memory region lkey.
> > > + * @param addr
> > > + *   Data address.
> > > + * @param byte_count
> > > + *   Big Endian bytes count of the data to send.
> > 
> > Big Endian => Big endian
> > 
> > How about using the dedicated type to properly document it?
> > See rte_be32_t from rte_byteorder.h.
> > 
> Learned new something, thanks, will check it.
> 
> > > + */
> > > +static inline void
> > > +mlx4_fill_tx_data_seg(volatile struct mlx4_wqe_data_seg *dseg,
> > > +		       uint32_t lkey, uintptr_t addr, uint32_t byte_count) {
> > > +	dseg->addr = rte_cpu_to_be_64(addr);
> > > +	dseg->lkey = rte_cpu_to_be_32(lkey); #if RTE_CACHE_LINE_SIZE <
> > 64
> > > +	/*
> > > +	 * Need a barrier here before writing the byte_count
> > > +	 * fields to make sure that all the data is visible
> > > +	 * before the byte_count field is set.
> > > +	 * Otherwise, if the segment begins a new cacheline,
> > > +	 * the HCA prefetcher could grab the 64-byte chunk and
> > > +	 * get a valid (!= 0xffffffff) byte count but stale
> > > +	 * data, and end up sending the wrong data.
> > > +	 */
> > > +	rte_io_wmb();
> > > +#endif /* RTE_CACHE_LINE_SIZE */
> > > +	dseg->byte_count = byte_count;
> > > +}
> > > +
> > 
> > No need to expose this function in a header file. Note that rte_cpu_*() and
> > rte_io*() require the inclusion of rte_byteorder.h and rte_atomic.h
> > respectively.
> > 
> 
> Shouldn't inline functions be in header files?

Not necessarily, like other function types actually. They have to be
declared where needed, in a header file only if several files depend on
them, otherwise they bring namespace pollution for no apparent reason.

-- 
Adrien Mazarguil
6WIND

next prev parent reply	other threads:[~2017-12-06 11:55 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-28 12:19 [PATCH 0/8] improve mlx4 Tx performance Matan Azrad
2017-11-28 12:19 ` [PATCH 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 10:57   ` Adrien Mazarguil
2017-11-28 12:19 ` [PATCH 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 10:57   ` Adrien Mazarguil
2017-11-28 12:19 ` [PATCH 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-11-28 12:19 ` [PATCH 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-12-06 11:29     ` Matan Azrad
2017-12-06 11:55       ` Adrien Mazarguil [this message]
2017-11-28 12:19 ` [PATCH 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-12-06 11:43     ` Matan Azrad
2017-12-06 12:09       ` Adrien Mazarguil
2017-11-28 12:19 ` [PATCH 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-11-28 12:19 ` [PATCH 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-12-06 11:44     ` Matan Azrad
2017-11-28 12:19 ` [PATCH 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-12-06 14:48 ` [PATCH v2 0/8] improve mlx4 Tx performance Matan Azrad
2017-12-06 14:48   ` [PATCH v2 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 14:48   ` [PATCH v2 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 14:48   ` [PATCH v2 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 14:48   ` [PATCH v2 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 14:48   ` [PATCH v2 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 14:48   ` [PATCH v2 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 14:48   ` [PATCH v2 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 17:24       ` Matan Azrad
2017-12-06 14:48   ` [PATCH v2 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 17:57   ` [PATCH v3 0/8] improve mlx4 Tx performance Matan Azrad
2017-12-06 17:57     ` [PATCH v3 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 17:57     ` [PATCH v3 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 17:57     ` [PATCH v3 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 17:57     ` [PATCH v3 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 17:57     ` [PATCH v3 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 17:57     ` [PATCH v3 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 17:57     ` [PATCH v3 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 17:57     ` [PATCH v3 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-07 10:56     ` [PATCH v3 0/8] improve mlx4 Tx performance Adrien Mazarguil
2017-12-10 10:22       ` Shahaf Shuler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171206115531.GA4062@6wind.com \
    --to=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.