Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

public inbox for linux-kernel@vger.kernel.org

From: Konrad Rzeszutek Wilk <konrad@kernel.org>
To: ling.ma@intel.com
Cc: mingo@elte.hu, hpa@zytor.com, tglx@linutronix.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register
Date: Thu, 11 Oct 2012 10:35:28 -0400
Message-ID: <20121011143527.GA2408@localhost.localdomain>
In-Reply-To: <1349958548-1868-1-git-send-email-ling.ma@intel.com>

On Thu, Oct 11, 2012 at 08:29:08PM +0800, ling.ma@intel.com wrote:
> From: Ma Ling <ling.ma@intel.com>
> 
> Load and write operations account for about 35% and 10% of instructions,
> respectively, in most industry benchmarks. A fetched 16-byte-aligned block
> of code holds about 4 instructions, implying 1.4 (0.35 * 4) loads and
> 0.4 (0.10 * 4) writes per fetch. Modern CPUs support 2 loads and 1 write
> per cycle, so write throughput is the bottleneck for memcpy or copy_page,
> and some low-end CPUs support only one memory operation per cycle. It is
> therefore enough to issue one load and one write instruction per cycle,
> which lets us use fewer registers.

So is that also true for AMD CPUs?
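
For reference, the throughput argument quoted above can be written down as a
small C sketch. It is an idealized model only: it counts issue slots and
ignores caches, latency, and every other resource. The helper names
ops_per_fetch and copy_cycles_lower_bound are hypothetical, and the 35%/10%
instruction mix and the 4-instructions-per-fetch figure come from the patch
description, not from measurements made here.

```c
#include <assert.h>

/* Expected number of memory operations of one kind in a single fetch block,
 * given that kind's share of the instruction mix. */
static double ops_per_fetch(double share, double insns_per_fetch)
{
    return share * insns_per_fetch;
}

/* Idealized lower bound on the cycles needed to copy n_words 8-byte words
 * when the core can issue at most loads_per_cycle loads and
 * stores_per_cycle stores in each cycle. A copy needs one load and one
 * store per word, so the slower of the two streams sets the bound. */
static unsigned copy_cycles_lower_bound(unsigned n_words,
                                        unsigned loads_per_cycle,
                                        unsigned stores_per_cycle)
{
    assert(loads_per_cycle > 0 && stores_per_cycle > 0);

    /* Round up: a partially used cycle still costs a full cycle. */
    unsigned load_cycles =
        (n_words + loads_per_cycle - 1) / loads_per_cycle;
    unsigned store_cycles =
        (n_words + stores_per_cycle - 1) / stores_per_cycle;

    return load_cycles > store_cycles ? load_cycles : store_cycles;
}
```

A 4096-byte page is 512 words. In this model, copy_cycles_lower_bound(512, 2, 1)
and copy_cycles_lower_bound(512, 1, 1) give the same bound, which is the
description's point that issuing one load and one write per cycle is enough
for a copy.
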
> 
> This patch also re-arranges the instruction sequence to improve performance.
> Performance on Atom improves by about 11% in the hot-cache case and by
> about 9% in the cold-cache case.
> 
> Signed-off-by: Ma Ling <ling.ma@intel.com>
> 
> ---
>  arch/x86/lib/copy_page_64.S |  103 +++++++++++++++++-------------------------
>  1 files changed, 42 insertions(+), 61 deletions(-)
> 
> diff --git a/arch/x86/lib/copy_page_64.S b/arch/x86/lib/copy_page_64.S
> index 3da5527..13c97f4 100644
> --- a/arch/x86/lib/copy_page_64.S
> +++ b/arch/x86/lib/copy_page_64.S
> @@ -20,76 +20,57 @@ ENDPROC(copy_page_rep)
>  
>  ENTRY(copy_page)
>  	CFI_STARTPROC
> -	subq	$2*8,	%rsp
> -	CFI_ADJUST_CFA_OFFSET 2*8
> -	movq	%rbx,	(%rsp)
> -	CFI_REL_OFFSET rbx, 0
> -	movq	%r12,	1*8(%rsp)
> -	CFI_REL_OFFSET r12, 1*8
> +	mov	$(4096/64)-5, %ecx
>  
> -	movl	$(4096/64)-5,	%ecx
> -	.p2align 4
>  .Loop64:
> -  	dec	%rcx
> -
> -	movq	0x8*0(%rsi), %rax
> -	movq	0x8*1(%rsi), %rbx
> -	movq	0x8*2(%rsi), %rdx
> -	movq	0x8*3(%rsi), %r8
> -	movq	0x8*4(%rsi), %r9
> -	movq	0x8*5(%rsi), %r10
> -	movq	0x8*6(%rsi), %r11
> -	movq	0x8*7(%rsi), %r12
> -
>  	prefetcht0 5*64(%rsi)
> -
> -	movq	%rax, 0x8*0(%rdi)
> -	movq	%rbx, 0x8*1(%rdi)
> -	movq	%rdx, 0x8*2(%rdi)
> -	movq	%r8,  0x8*3(%rdi)
> -	movq	%r9,  0x8*4(%rdi)
> -	movq	%r10, 0x8*5(%rdi)
> -	movq	%r11, 0x8*6(%rdi)
> -	movq	%r12, 0x8*7(%rdi)
> -
> -	leaq	64 (%rsi), %rsi
> -	leaq	64 (%rdi), %rdi
> -
> +	decb	%cl
> +
> +	movq	0x8*0(%rsi), %r10
> +	movq	0x8*1(%rsi), %rax
> +	movq	0x8*2(%rsi), %r8
> +	movq	0x8*3(%rsi), %r9
> +	movq	%r10, 0x8*0(%rdi)
> +	movq	%rax, 0x8*1(%rdi)
> +	movq	%r8, 0x8*2(%rdi)
> +	movq	%r9, 0x8*3(%rdi)
> +
> +	movq	0x8*4(%rsi), %r10
> +	movq	0x8*5(%rsi), %rax
> +	movq	0x8*6(%rsi), %r8
> +	movq	0x8*7(%rsi), %r9
> +	leaq	64(%rsi), %rsi
> +	movq	%r10, 0x8*4(%rdi)
> +	movq	%rax, 0x8*5(%rdi)
> +	movq	%r8, 0x8*6(%rdi)
> +	movq	%r9, 0x8*7(%rdi)
> +	leaq	64(%rdi), %rdi
>  	jnz	.Loop64
>  
> -	movl	$5, %ecx
> -	.p2align 4
> +	mov	$5, %dl
>  .Loop2:
> -	decl	%ecx
> -
> -	movq	0x8*0(%rsi), %rax
> -	movq	0x8*1(%rsi), %rbx
> -	movq	0x8*2(%rsi), %rdx
> -	movq	0x8*3(%rsi), %r8
> -	movq	0x8*4(%rsi), %r9
> -	movq	0x8*5(%rsi), %r10
> -	movq	0x8*6(%rsi), %r11
> -	movq	0x8*7(%rsi), %r12
> -
> -	movq	%rax, 0x8*0(%rdi)
> -	movq	%rbx, 0x8*1(%rdi)
> -	movq	%rdx, 0x8*2(%rdi)
> -	movq	%r8,  0x8*3(%rdi)
> -	movq	%r9,  0x8*4(%rdi)
> -	movq	%r10, 0x8*5(%rdi)
> -	movq	%r11, 0x8*6(%rdi)
> -	movq	%r12, 0x8*7(%rdi)
> -
> -	leaq	64(%rdi), %rdi
> +	decb	%dl
> +	movq	0x8*0(%rsi), %r10
> +	movq	0x8*1(%rsi), %rax
> +	movq	0x8*2(%rsi), %r8
> +	movq	0x8*3(%rsi), %r9
> +	movq	%r10, 0x8*0(%rdi)
> +	movq	%rax, 0x8*1(%rdi)
> +	movq	%r8, 0x8*2(%rdi)
> +	movq	%r9, 0x8*3(%rdi)
> +
> +	movq	0x8*4(%rsi), %r10
> +	movq	0x8*5(%rsi), %rax
> +	movq	0x8*6(%rsi), %r8
> +	movq	0x8*7(%rsi), %r9
>  	leaq	64(%rsi), %rsi
> +	movq	%r10, 0x8*4(%rdi)
> +	movq	%rax, 0x8*5(%rdi)
> +	movq	%r8, 0x8*6(%rdi)
> +	movq	%r9, 0x8*7(%rdi)
> +	leaq	64(%rdi), %rdi
>  	jnz	.Loop2
>  
> -	movq	(%rsp), %rbx
> -	CFI_RESTORE rbx
> -	movq	1*8(%rsp), %r12
> -	CFI_RESTORE r12
> -	addq	$2*8, %rsp
> -	CFI_ADJUST_CFA_OFFSET -2*8
>  	ret
>  .Lcopy_page_end:
>  	CFI_ENDPROC
> -- 
> 1.6.5.2
> 
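
For readers who find the diff hard to follow, the shape of the new loop can
be sketched in C. This only illustrates the ordering used in the patch: four
loads, then four stores, twice per 64-byte chunk, with a prefetch five chunks
ahead in the main loop and no prefetch in the last five iterations. It is not
the kernel's implementation, and a compiler is free to schedule it
differently. The names copy_chunk and copy_page_sketch are hypothetical, and
__builtin_prefetch assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES     4096
#define CHUNK_BYTES    64                 /* bytes copied per loop iteration */
#define CHUNK_WORDS    (CHUNK_BYTES / 8)  /* 8-byte words per chunk */
#define PREFETCH_AHEAD 5                  /* chunks of read-ahead, as in the patch */

/* Copy one 64-byte chunk as two groups of four loads followed by four
 * stores, so only four temporaries are live at any time. */
static void copy_chunk(uint64_t *dst, const uint64_t *src)
{
    for (int half = 0; half < 2; half++) {
        const uint64_t *s = src + 4 * half;
        uint64_t *d = dst + 4 * half;

        uint64_t t0 = s[0];
        uint64_t t1 = s[1];
        uint64_t t2 = s[2];
        uint64_t t3 = s[3];

        d[0] = t0;
        d[1] = t1;
        d[2] = t2;
        d[3] = t3;
    }
}

static void copy_page_sketch(void *to, const void *from)
{
    /* The 8-byte accesses below need 8-byte-aligned buffers. */
    assert((uintptr_t)to % 8 == 0 && (uintptr_t)from % 8 == 0);

    uint64_t *dst = to;
    const uint64_t *src = from;
    int i;

    /* Main loop: (4096/64)-5 iterations, prefetching five chunks ahead. */
    for (i = 0; i < PAGE_BYTES / CHUNK_BYTES - PREFETCH_AHEAD; i++) {
        __builtin_prefetch(src + PREFETCH_AHEAD * CHUNK_WORDS);
        copy_chunk(dst, src);
        src += CHUNK_WORDS;
        dst += CHUNK_WORDS;
    }

    /* Tail: the last five chunks, without prefetch, so nothing beyond the
     * end of the source page is touched. */
    for (i = 0; i < PREFETCH_AHEAD; i++) {
        copy_chunk(dst, src);
        src += CHUNK_WORDS;
        dst += CHUNK_WORDS;
    }
}
```

Calling copy_page_sketch(dst, src) on two 4096-byte, 8-byte-aligned buffers
copies the whole page. The four temporaries per group correspond to the four
scratch registers (%r10, %rax, %r8, %r9) that the patch uses in place of the
original eight, which is what removes the need to save %rbx and %r12.
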

