Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

From: Jussi Kivilinna <jussi.kivilinna@iki.fi>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Jim Kukunas <james.t.kukunas@linux.intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Erdinc Ozturk <erdinc.ozturk@intel.com>,
	Vinodh Gopal <vinodh.gopal@intel.com>,
	James Guilford <james.guilford@intel.com>,
	Wajdi Feghali <wajdi.k.feghali@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-crypto@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
Date: Wed, 17 Apr 2013 20:58:30 +0300
Message-ID: <516EE2C6.6010901@iki.fi>
In-Reply-To: <5227e0b295142e1fbb3c7e0241646eb65319b18a.1366120266.git.tim.c.chen@linux.intel.com>

On 16.04.2013 19:20, Tim Chen wrote:
> This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> instructions.  Details discussing the implementation can be found in the
> paper:
> 
> "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> URL: http://download.intel.com/design/intarch/papers/323102.pdf

This URL does not work.

> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Tested-by: Keith Busch <keith.busch@intel.com>
> ---
>  arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +++++++++++++++++++++++++++++++++
>  1 file changed, 659 insertions(+)
>  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
<snip>
> +
> +	# Allocate Stack Space
> +	mov     %rsp, %rcx
> +	sub	$16*10, %rsp
> +	and     $~(0x20 - 1), %rsp
> +
> +	# push the xmm registers into the stack to maintain
> +	movdqa %xmm10, 16*2(%rsp)
> +	movdqa %xmm11, 16*3(%rsp)
> +	movdqa %xmm8 , 16*4(%rsp)
> +	movdqa %xmm12, 16*5(%rsp)
> +	movdqa %xmm13, 16*6(%rsp)
> +	movdqa %xmm6,  16*7(%rsp)
> +	movdqa %xmm7,  16*8(%rsp)
> +	movdqa %xmm9,  16*9(%rsp)

You don't need to save (and restore) these registers, as 'crc_t10dif_pcl' is only called between kernel_fpu_begin() and kernel_fpu_end().
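
To illustrate the point, here is a minimal userspace sketch of the calling pattern. It is not the glue code from this patch set. The wrapper name crct10dif_update_sketch is hypothetical, the crc_t10dif_pcl signature is assumed, the kernel_fpu_begin()/kernel_fpu_end() bodies are stubs that only count nesting, and a portable bitwise CRC (polynomial 0x8BB7) stands in for the assembly routine so that the example compiles outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stubs for kernel_fpu_begin()/kernel_fpu_end(). In the kernel
 * this pair is what makes it safe to use SIMD registers; here the stubs
 * only track nesting so the bracketing can be checked.
 */
static int fpu_depth;
static void kernel_fpu_begin(void) { fpu_depth++; }
static void kernel_fpu_end(void)   { fpu_depth--; }

/*
 * Stand-in for the assembly routine (signature is an assumption).
 * Bitwise CRC T10 DIF: polynomial 0x8BB7, MSB first, no final XOR.
 */
static uint16_t crc_t10dif_pcl(uint16_t crc, const unsigned char *buf,
			       size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/*
 * Hypothetical glue wrapper: because the routine only ever runs inside
 * the begin/end pair, it can clobber xmm registers without saving them.
 */
static uint16_t crct10dif_update_sketch(uint16_t crc,
					const unsigned char *buf, size_t len)
{
	kernel_fpu_begin();
	crc = crc_t10dif_pcl(crc, buf, len);
	kernel_fpu_end();
	return crc;
}
```

With a caller shaped like this, the movdqa saves above and the matching restores in _cleanup have no effect that the caller depends on.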

> +
> +
> +	# check if smaller than 256
> +	cmp	$256, arg3
> +
<snip>
> +_cleanup:
> +	# scale the result back to 16 bits
> +	shr	$16, %eax
> +	movdqa	16*2(%rsp), %xmm10
> +	movdqa	16*3(%rsp), %xmm11
> +	movdqa	16*4(%rsp), %xmm8
> +	movdqa	16*5(%rsp), %xmm12
> +	movdqa	16*6(%rsp), %xmm13
> +	movdqa	16*7(%rsp), %xmm6
> +	movdqa	16*8(%rsp), %xmm7
> +	movdqa	16*9(%rsp), %xmm9

These registers are overwritten by kernel_fpu_end anyway, so restoring them here is pointless.

> +	mov     %rcx, %rsp
> +	ret
> +ENDPROC(crc_t10dif_pcl)
> +

You should move ENDPROC to the end of the full function.

> +########################################################################
> +
> +.align 16
> +_less_than_128:
> +
> +	# check if there is enough buffer to be able to fold 16B at a time
> +	cmp	$32, arg3
<snip>
> +	movdqa	(%rsp), %xmm7
> +	pshufb	%xmm11, %xmm7
> +	pxor	%xmm0 , %xmm7   # xor the initial crc value
> +
> +	psrldq	$7, %xmm7
> +
> +	jmp	_barrett

Move ENDPROC here.


 -Jussi
