From: David Daney <ddaney.cavm@gmail.com>
To: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org, ralf@linux-mips.org
Subject: Re: [PATCH] MIPS: lib: Optimize partial checksum ops using prefetching.
Date: Tue, 21 Jan 2014 10:25:42 -0800
Message-ID: <52DEBBA6.9070701@gmail.com>
In-Reply-To: <1390321122-25634-1-git-send-email-Steven.Hill@imgtec.com>

On 01/21/2014 08:18 AM, Steven J. Hill wrote:
> From: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
>
> Use the PREF instruction to optimize partial checksum operations.
>
> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
> Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com>
NACK. The proper latency and cacheline stride vary by CPU; you cannot
just hard-code them for a 32-byte cacheline size with some random latency.
This will make some CPUs slower.
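
To illustrate the stride point, here is a rough sketch in plain C (not
kernel code; distinct_lines() is a hypothetical helper made up for this
example, and it assumes a line-aligned base address). It counts how many
distinct cache lines a run of fixed-stride prefetches actually touches:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Count the distinct cache lines touched by `count` prefetches issued at
 * byte offsets 0, stride, 2*stride, ... from a line-aligned base address.
 * Offsets only increase, so comparing against the previous line index is
 * enough to detect a new line.
 */
static size_t distinct_lines(size_t stride, size_t count, size_t line_size)
{
	size_t lines = 0;
	size_t last = (size_t)-1;	/* sentinel: no line seen yet */
	size_t i;

	assert(line_size > 0);

	for (i = 0; i < count; i++) {
		size_t line = (i * stride) / line_size;

		if (line != last) {
			lines++;
			last = line;
		}
	}
	return lines;
}
```

The patch issues four prefetches per buffer at 0, 32, 2*32 and 3*32 before
the main loop. With 32-byte lines, distinct_lines(32, 4, 32) gives 4, so
each prefetch does useful work. With 128-byte lines, distinct_lines(32, 4,
128) gives 1, so three of the four prefetches hit a line that was already
requested. The sketch says nothing about latency, which is a separate
per-CPU tuning problem.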
> ---
> arch/mips/lib/csum_partial.S | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
> index a6adffb..272820e 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -417,13 +417,19 @@ FEXPORT(csum_partial_copy_nocheck)
> *
> * If len < NBYTES use byte operations.
> */
> + PREF( 0, 0(src))
> + PREF( 1, 0(dst))
> sltu t2, len, NBYTES
> and t1, dst, ADDRMASK
> bnez t2, .Lcopy_bytes_checklen
> + PREF( 0, 32(src))
> + PREF( 1, 32(dst))
> and t0, src, ADDRMASK
> andi odd, dst, 0x1 /* odd buffer? */
> bnez t1, .Ldst_unaligned
> nop
> + PREF( 0, 2*32(src))
> + PREF( 1, 2*32(dst))
> bnez t0, .Lsrc_unaligned_dst_aligned
> /*
> * use delay slot for fall-through
> @@ -434,6 +440,8 @@ FEXPORT(csum_partial_copy_nocheck)
> beqz t0, .Lcleanup_both_aligned # len < 8*NBYTES
> nop
> SUB len, 8*NBYTES # subtract here for bgez loop
> + PREF( 0, 3*32(src))
> + PREF( 1, 3*32(dst))
> .align 4
> 1:
> EXC( LOAD t0, UNIT(0)(src), .Ll_exc)
> @@ -464,6 +472,8 @@ EXC( STORE t7, UNIT(7)(dst), .Ls_exc)
> ADDC(sum, t7)
> .set reorder /* DADDI_WAR */
> ADD dst, dst, 8*NBYTES
> + PREF( 0, 8*32(src))
> + PREF( 1, 8*32(dst))
> bgez len, 1b
> .set noreorder
> ADD len, 8*NBYTES # revert len (see above)
> @@ -569,8 +579,10 @@ EXC( STFIRST t3, FIRST(0)(dst), .Ls_exc)
>
> .Lsrc_unaligned_dst_aligned:
> SRL t0, len, LOG_NBYTES+2 # +2 for 4 units/iter
> + PREF( 0, 3*32(src))
> beqz t0, .Lcleanup_src_unaligned
> and rem, len, (4*NBYTES-1) # rem = len % 4*NBYTES
> + PREF( 1, 3*32(dst))
> 1:
> /*
> * Avoid consecutive LD*'s to the same register since some mips
>