From: Jisheng Zhang <jszhang@kernel.org>
To: David Laight <David.Laight@aculab.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Matteo Croce <mcroce@microsoft.com>,
	kernel test robot <lkp@intel.com>
Subject: Re: [PATCH 2/3] riscv: optimized memmove
Date: Tue, 30 Jan 2024 19:30:43 +0800
Message-ID: <Zbjd43y3s6PDfQE0@xhacker>
In-Reply-To: <59bed43df37b4361a8a1cb31b8582e9b@AcuMS.aculab.com>

On Sun, Jan 28, 2024 at 12:47:00PM +0000, David Laight wrote:
> From: Jisheng Zhang
> > Sent: 28 January 2024 11:10
> > 
> > When the destination buffer is before the source one, or when the
> > buffers don't overlap, it's safe to use memcpy() instead, which is
> > optimized to use the largest data size possible.
> > 
> ...
> > + * Simply check if the buffers overlap and call memcpy() if it is safe,
> > + * otherwise do a simple one-byte-at-a-time backward copy.
> 
> I'd at least do a 64bit copy loop if the addresses are aligned.
> 
> Thinks a bit more....
> 
> Put the 64-byte copy code (the body of the memcpy() loop)
> into an inline function and call it with increasing addresses
> in memcpy() and decreasing addresses in memmove().

Hi David,

Besides the 64-byte copy, there's another optimization in __memcpy:
word-by-word copy even if s and d are not aligned.
So if we make the two optimized copies inline functions and call them
in memmove(), we almost duplicate the __memcpy code, so I think
directly calling __memcpy is a bit better.
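
To make the dispatch concrete, here is a minimal userspace sketch of a
memmove() that hands the safe cases to a forward copy and falls back to a
backward byte loop otherwise. It is not the code from the patch: the names
fwd_copy() and sketch_memmove() are made up for illustration, and fwd_copy()
is only a byte loop standing in for the optimized __memcpy, which is assumed
to copy in ascending address order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the optimized __memcpy: a plain forward
 * (ascending address) byte copy. The real routine is assumed to copy
 * forward as well, only with wider accesses.
 */
static void *fwd_copy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}

/* Hypothetical sketch of the dispatch; not a kernel symbol. */
void *sketch_memmove(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	uintptr_t da = (uintptr_t)dest;
	uintptr_t sa = (uintptr_t)src;

	/* dest at or before src, or dest past the end of src: forward is safe. */
	if (da <= sa || da - sa >= count)
		return fwd_copy(dest, src, count);

	/* dest overlaps the tail of src: copy backward, one byte at a time. */
	while (count--)
		d[count] = s[count];
	return dest;
}
```

With this shape, every optimization in the forward routine is reused without
duplicating it, and only the overlapping dest-after-src case takes the slow
path.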

Thanks
> 
> So memcpy() contains:
> 	src_lim = src + count;
> 	... alignment copy
> 	for (; src + 64 <= src_lim; src += 64, dest += 64)
> 		copy_64_bytes(dest, src);
> 	... tail copy
> 
> Then you can do something very similar for backwards copies.
> 
> 	David
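
For reference, the shared-block idea quoted above could look roughly like
the following userspace sketch. All names here (copy_64_bytes(),
sketch_copy_forward(), sketch_copy_backward()) are illustrative only, the
alignment copy is left out, and the block helper is assumed to load all 64
bytes before storing any of them so that it stays correct when the two
buffers are less than 64 bytes apart.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical shared block helper: load 64 bytes into locals first,
 * then store them, so overlap within the block is harmless.
 */
static inline void copy_64_bytes(unsigned char *dest, const unsigned char *src)
{
	uint64_t w[8];

	memcpy(w, src, sizeof(w));	/* load the whole block */
	memcpy(dest, w, sizeof(w));	/* then store it */
}

/* Forward copy: blocks at increasing addresses, then a byte tail. */
void *sketch_copy_forward(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	for (; count >= 64; count -= 64, d += 64, s += 64)
		copy_64_bytes(d, s);
	while (count--)
		*d++ = *s++;
	return dest;
}

/* Backward copy: start at the end, blocks at decreasing addresses. */
void *sketch_copy_backward(void *dest, const void *src, size_t count)
{
	unsigned char *d = (unsigned char *)dest + count;
	const unsigned char *s = (const unsigned char *)src + count;

	for (; count >= 64; count -= 64) {
		d -= 64;
		s -= 64;
		copy_64_bytes(d, s);
	}
	while (count--)
		*--d = *--s;
	return dest;
}
```

The two loops differ only in direction; whether this is worth the extra
code compared with calling __memcpy directly is the question discussed in
this thread.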
> 
