From: Stephen Hemminger <stephen@networkplumber.org>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: bruce.richardson@intel.com, konstantin.v.ananyev@yandex.ru,
	mattias.ronnblom@ericsson.com, dev@dpdk.org
Subject: Re: [PATCH] eal/x86: improve rte_memcpy const size 16 performance
Date: Sat, 2 Mar 2024 21:58:07 -0800
Message-ID: <20240302215807.6d7c3cd9@hermes.local>
In-Reply-To: <20240302214003.15c37310@hermes.local>

On Sat, 2 Mar 2024 21:40:03 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Sun,  3 Mar 2024 00:48:12 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> > In the case where the size is known to be 16 at build time, omit the
> > duplicate copy.
> > 
> > Reduced the amount of effectively copy-pasted code by using #ifdef
> > inside functions instead of outside functions.
> > 
> > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---  
> 
> Looks good, let me see how it looks in godbolt vs GCC.
> 
> One other issue is that for the non-constant case, rte_memcpy has an excessively
> large inline code footprint. That is one of the reasons Gcc doesn't always
> inline.  For > 128 bytes, it really should be a function.
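
A minimal sketch of the duplicate copy the patch description refers to, assuming the
small-size path covers 16 to 32 bytes with two possibly overlapping 16-byte moves.
The names mov16() and copy_16_to_32() are hypothetical stand-ins, not the DPDK
functions, and the build-time check via __builtin_constant_p() is one way to express
"size known at build time", not necessarily how the patch does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly 16 bytes. */
static inline void mov16(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 16);
}

/*
 * Hypothetical stand-in for a 16..32 byte copy path (assumption, not the
 * DPDK source): one move from the start and one move ending at byte n.
 * For n == 16 both moves would cover the same 16 bytes.
 */
static inline void *copy_16_to_32(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(n >= 16 && n <= 32);
    mov16(d, s);
#if defined(__GNUC__)
    /* Size known to be 16 at build time: the second move is redundant. */
    if (__builtin_constant_p(n) && n == 16)
        return dst;
#endif
    mov16(d + n - 16, s + n - 16);
    return dst;
}
```

When the compiler cannot prove n is a constant 16, the check folds to false and both
moves run, so the result is the same either way; only the redundant store is saved.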

For sizes of 4, 6, 8, 16, 32 and 64, up to 128, GCC's inline memcpy and
rte_memcpy generate matching code.

For size 128, the GCC version looks simpler:

rte_copy_addr:
        vmovdqu ymm0, YMMWORD PTR [rsi]
        vextracti128    XMMWORD PTR [rdi+16], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+32]
        vextracti128    XMMWORD PTR [rdi+48], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+32], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+64]
        vextracti128    XMMWORD PTR [rdi+80], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+64], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+96]
        vextracti128    XMMWORD PTR [rdi+112], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+96], xmm0
        vzeroupper
        ret
copy_addr:
        vmovdqu ymm0, YMMWORD PTR [rsi]
        vmovdqu YMMWORD PTR [rdi], ymm0
        vmovdqu ymm1, YMMWORD PTR [rsi+32]
        vmovdqu YMMWORD PTR [rdi+32], ymm1
        vmovdqu ymm2, YMMWORD PTR [rsi+64]
        vmovdqu YMMWORD PTR [rdi+64], ymm2
        vmovdqu ymm3, YMMWORD PTR [rsi+96]
        vmovdqu YMMWORD PTR [rdi+96], ymm3
        vzeroupper
        ret
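
The C source behind the listing is not shown in this message. A minimal sketch of
what the plain-memcpy side could look like, assuming a 128-byte object; struct addr
and its layout are hypothetical. The rte_copy_addr side would presumably be the same
function with rte_memcpy() from <rte_memcpy.h> in place of memcpy(), which is left
out here so that the sketch builds without DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-byte object; the message does not show the real type. */
struct addr {
    uint8_t bytes[128];
};

static_assert(sizeof(struct addr) == 128, "expected a 128-byte object");

/*
 * Constant-size copy through memcpy(). Because the size is known at build
 * time, the compiler is free to expand the call inline instead of calling
 * the library routine.
 */
void copy_addr(struct addr *dst, const struct addr *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

The exact instructions generated depend on the compiler version and flags, so the
listing above should be read as one data point rather than a guaranteed result.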
