From: David Daney <ddaney.cavm@gmail.com>
To: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org, ralf@linux-mips.org
Subject: Re: [PATCH] MIPS: lib: Optimize partial checksum ops using prefetching.
Date: Tue, 21 Jan 2014 10:25:42 -0800
Message-ID: <52DEBBA6.9070701@gmail.com>
In-Reply-To: <1390321122-25634-1-git-send-email-Steven.Hill@imgtec.com>

On 01/21/2014 08:18 AM, Steven J. Hill wrote:
> From: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
>
> Use the PREF instruction to optimize partial checksum operations.
>
> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
> Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com>

NACK.  The proper latency and cacheline stride vary by CPU; you cannot
just hard-code them for a 32-byte cacheline size with some random latency.

This will make some CPUs slower.
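
To illustrate the objection, here is a minimal C sketch (not the kernel's
assembly, and not a proposed replacement for the patch) in which the prefetch
distance is derived from per-CPU parameters instead of a fixed multiple of 32.
The names prefetch_params, prefetch_offset and sum_bytes_prefetch are
hypothetical, the loop is a plain byte sum rather than the real Internet
checksum, and __builtin_prefetch assumes a GCC or Clang compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU tuning knobs; these names do not exist in the kernel. */
struct prefetch_params {
	size_t line_size;	/* cache line size in bytes: 32, 64, 128, ... */
	size_t lines_ahead;	/* how many lines ahead of the loop to prefetch */
};

/* Byte distance between the current position and the prefetch target. */
static size_t prefetch_offset(const struct prefetch_params *p)
{
	/* The line size must be a non-zero power of two. */
	assert(p->line_size != 0 && (p->line_size & (p->line_size - 1)) == 0);
	return p->line_size * p->lines_ahead;
}

/*
 * Sum the bytes of buf, issuing one read prefetch per cache line.
 * This stands in for the checksum loop only to show where the
 * stride and distance enter; it is not a ones' complement sum.
 */
static uint32_t sum_bytes_prefetch(const uint8_t *buf, size_t len,
				   const struct prefetch_params *p)
{
	const size_t ahead = prefetch_offset(p);
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i += p->line_size) {
		size_t end = i + p->line_size < len ? i + p->line_size : len;

		/* Only prefetch addresses that lie inside the buffer. */
		if (i + ahead < len)
			__builtin_prefetch(buf + i + ahead, 0, 3);

		for (size_t j = i; j < end; j++)
			sum += buf[j];
	}
	return sum;
}
```

With line_size = 32 and lines_ahead = 3 the offset is 96 bytes, which matches
the 3*32 used in the patch.  With line_size = 128 the same three-line distance
is 384 bytes, and the patch's prefetches at offsets 0, 32, 64 and 96 would all
fall in the same line whenever the buffer is line-aligned.  How many lines
ahead is right depends on memory latency and would have to be measured per CPU.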

> ---
>   arch/mips/lib/csum_partial.S | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
> index a6adffb..272820e 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -417,13 +417,19 @@ FEXPORT(csum_partial_copy_nocheck)
>   	 *
>   	 * If len < NBYTES use byte operations.
>   	 */
> +	PREF(	0, 0(src))
> +	PREF(	1, 0(dst))
>   	sltu	t2, len, NBYTES
>   	and	t1, dst, ADDRMASK
>   	bnez	t2, .Lcopy_bytes_checklen
> +	PREF(	0, 32(src))
> +	PREF(	1, 32(dst))
>   	 and	t0, src, ADDRMASK
>   	andi	odd, dst, 0x1			/* odd buffer? */
>   	bnez	t1, .Ldst_unaligned
>   	 nop
> +	PREF(	0, 2*32(src))
> +	PREF(	1, 2*32(dst))
>   	bnez	t0, .Lsrc_unaligned_dst_aligned
>   	/*
>   	 * use delay slot for fall-through
> @@ -434,6 +440,8 @@ FEXPORT(csum_partial_copy_nocheck)
>   	beqz	t0, .Lcleanup_both_aligned # len < 8*NBYTES
>   	 nop
>   	SUB	len, 8*NBYTES		# subtract here for bgez loop
> +	PREF(	0, 3*32(src))
> +	PREF(	1, 3*32(dst))
>   	.align	4
>   1:
>   EXC(	LOAD	t0, UNIT(0)(src),	.Ll_exc)
> @@ -464,6 +472,8 @@ EXC(	STORE	t7, UNIT(7)(dst),	.Ls_exc)
>   	ADDC(sum, t7)
>   	.set	reorder				/* DADDI_WAR */
>   	ADD	dst, dst, 8*NBYTES
> +	PREF(	0, 8*32(src))
> +	PREF(	1, 8*32(dst))
>   	bgez	len, 1b
>   	.set	noreorder
>   	ADD	len, 8*NBYTES		# revert len (see above)
> @@ -569,8 +579,10 @@ EXC(	STFIRST t3, FIRST(0)(dst),	.Ls_exc)
>
>   .Lsrc_unaligned_dst_aligned:
>   	SRL	t0, len, LOG_NBYTES+2	 # +2 for 4 units/iter
> +	PREF(	0, 3*32(src))
>   	beqz	t0, .Lcleanup_src_unaligned
>   	 and	rem, len, (4*NBYTES-1)	 # rem = len % 4*NBYTES
> +	PREF(	1, 3*32(dst))
>   1:
>   /*
>    * Avoid consecutive LD*'s to the same register since some mips
>
