From: Arnd Bergmann <arnd@arndb.de>
To: linuxppc-dev@ozlabs.org
Subject: Re: [RFC 1/3] cryptoapi: AES with AltiVec support
Date: Wed, 11 Apr 2007 20:24:50 +0200 [thread overview]
Message-ID: <200704112024.51124.arnd@arndb.de> (raw)
In-Reply-To: <20070411165702.256910000@linux.vnet.ibm.com>
Just a few things I noticed in this version:
On Wednesday 11 April 2007, Sebastian Siewior wrote:
> +#ifdef CONFIG_CRYPTO_AES_ALTIVEC_TABLE
> +
> + /* 19 operations, 1x 8 memory loads */
> + imm_04h = vec_splat_u8(0x04);
> + imm_0fh = vec_splat_u8(0x0f);
> +
> + state_high = vec_sr(state, imm_04h);
> + state_low = vec_and(state, imm_0fh);
Why do you need to and with 0x0f here? I thought vec_perm simply ignored
the high bits anyway.
> + mul_08_hi = vec_perm(gf_mul_8_high, gf_mul_8_high, state_high);
> + mul_0a_hi = vec_perm(gf_mul_a_high, gf_mul_a_high, state_high);
> + mul_0c_hi = vec_perm(gf_mul_c_high, gf_mul_c_high, state_high);
> + mul_0e_hi = vec_perm(gf_mul_e_high, gf_mul_e_high, state_high);
> +
> + mul_08_lo = vec_perm(gf_mul_8_low, gf_mul_8_low, state_low);
> + mul_0a_lo = vec_perm(gf_mul_a_low, gf_mul_a_low, state_low);
> + mul_0c_lo = vec_perm(gf_mul_c_low, gf_mul_c_low, state_low);
> + mul_0e_lo = vec_perm(gf_mul_e_low, gf_mul_e_low, state_low);
> +
> + mul_08 = vec_xor(mul_08_hi, mul_08_lo);
> + mul_0a = vec_xor(mul_0a_hi, mul_0a_lo);
> + mul_0c = vec_xor(mul_0c_hi, mul_0c_lo);
> + mul_0e = vec_xor(mul_0e_hi, mul_0e_lo);
> +
> + mul_09 = vec_xor(mul_08, state);
> + mul_0b = vec_xor(mul_0a, state);
> + mul_0d = vec_xor(mul_0c, state);
What are the last three xor used for? I'd think you can have the values
for 0x9, 0xb and 0xd in the table directly.
> +int aes_decrypt_altivec(const unsigned char *in, unsigned char *out,
> + const unsigned char *kp, unsigned char key_len)
> +{
> + unsigned char i;
> + vector unsigned char pstate;
> + const vector unsigned char *key;
> + unsigned char tmpbuf[16] __attribute__ ((aligned (16)));
My understanding is that gcc will not align the latter on the stack
now does it warn about the fact that the attribute gets ignored.
Have you checked that the variable is really put into a 16 byte
aligned stack slot? Does it even make a difference?
> + memcpy(tmpbuf, in, sizeof(tmpbuf));
> + pstate = vec_ld(0, tmpbuf);
> +
> + key = (const vector unsigned char*) kp;
> +
> + pstate = vec_xor(pstate, *key++);
> +
> + switch (key_len) {
> + case 32: /* 14 rounds */
> + pstate = InvNormalRound(pstate, *key++);
> + pstate = InvNormalRound(pstate, *key++);
> +
> + case 24: /* 12 rounds */
> + pstate = InvNormalRound(pstate, *key++);
> + pstate = InvNormalRound(pstate, *key++);
> +
> + case 16: /* 10 rounds */
> + for (i=0; i<9; i++)
> + pstate = InvNormalRound(pstate, *key++);
> +
> + break;
> +
> + default:
> + BUG();
> + }
Did this manual partial unrolling actually make a difference compared
this?
for (i=0; i<(5 + key_len / 8); i++)
pstate = InvNormalRound(pstate, *key++);
If they are the same speed, there should probably be no unrolling,
because the larger object code will be bad for the instruction
cache.
> ===================================================================
> --- ps3-linux.orig/crypto/Makefile
> +++ ps3-linux/crypto/Makefile
> @@ -48,3 +48,7 @@ obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += mich
> obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
>
> obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
> +
> +CFLAGS_aes-altivec.o += -O3 -maltivec -mcpu=cell
mcpu=cell probably breaks on most compilers. This needs some
experiments, but most systems should either leave this out
or specify the cpu they are actually compiling for.
Some time ago, I did a patch to extend the CPU selection in
Kconfig so you can choose sensible mcpu= and mtune= flags
semi-automatically.
Arnd <><
next prev parent reply other threads:[~2007-04-11 18:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-11 16:49 [RFC 0/3] Experiments with AES-AltiVec Sebastian Siewior
2007-04-11 16:49 ` [RFC 1/3] cryptoapi: AES with AltiVec support Sebastian Siewior
2007-04-11 18:24 ` Arnd Bergmann [this message]
2007-04-12 13:40 ` Sebastian Siewior
2007-04-11 22:22 ` Benjamin Herrenschmidt
2007-04-12 7:45 ` Sebastian Siewior
2007-04-12 8:39 ` Benjamin Herrenschmidt
2007-04-11 16:49 ` [RFC 2/3] PowerPC: lazy altivec enabling in kernel Sebastian Siewior
2007-04-11 16:49 ` [RFC 3/3] cryptoapi: speed test Sebastian Siewior
-- strict thread matches above, loose matches on Subject: below --
2007-04-17 11:52 [RFC 0/3] Experiments with AES-AltiVec, part 2 Sebastian Siewior
2007-04-17 11:52 ` [RFC 1/3] cryptoapi: AES with AltiVec support Sebastian Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200704112024.51124.arnd@arndb.de \
--to=arnd@arndb.de \
--cc=linuxppc-dev@ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.