linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [RFC 1/3] cryptoapi: AES with AltiVec support
Date: Thu, 12 Apr 2007 15:40:26 +0200	[thread overview]
Message-ID: <461E36CA.4030509@linux.vnet.ibm.com> (raw)
In-Reply-To: <200704112024.51124.arnd@arndb.de>

Arnd Bergmann wrote:
> Just a few things I noticed in this version:
> 
>> +	imm_04h = vec_splat_u8(0x04);
>> +	imm_0fh = vec_splat_u8(0x0f);
>> +
>> +	state_high = vec_sr(state, imm_04h);
>> +	state_low  = vec_and(state, imm_0fh);
> 
> Why do you need to and with 0x0f here? I thought vec_perm simply ignored
> the high bits anyway.

Yes it does, my bad

>> +	mul_09 = vec_xor(mul_08, state);
>> +	mul_0b = vec_xor(mul_0a, state);
>> +	mul_0d = vec_xor(mul_0c, state);
> 
> What are the last three xor used for? I'd think you can have the values
> for 0x9, 0xb and 0xd in the table directly.
Yes. With recomputing table the performance seems to be unchanged.

>> +int aes_decrypt_altivec(const unsigned char *in, unsigned char *out,
>> +		const unsigned char *kp, unsigned char key_len)
>> +{
>> +	unsigned char i;
>> +	vector unsigned char pstate;
>> +	const vector unsigned char *key;
>> +	unsigned char tmpbuf[16]  __attribute__ ((aligned (16)));
> 
> My understanding is that gcc will not align the latter on the stack
> now does it warn about the fact that the attribute gets ignored.
> Have you checked that the variable is really put into a 16 byte
> aligned stack slot? Does it even make a difference?

Quote from the PPC64 ABI [1]:
r1: The stack pointer (stored in r1) shall maintain quadword alignment.
It shall always point to the lowest allocated valid stack frame, and
grow toward low addresses.  The contents of the word at that address
always point to the previously allocated stack frame.
...

PPC32 ABI[2] page 32 contains the same sentence about r1. The compiler 
does not behave correctly on >16 byte alignment and should warn you.

>> +	switch (key_len) {
>> +		case 32: /* 14 rounds */
>> +			pstate = InvNormalRound(pstate, *key++);
>> +			pstate = InvNormalRound(pstate, *key++);
>> +
>> +		case 24: /* 12 rounds */
>> +			pstate = InvNormalRound(pstate, *key++);
>> +			pstate = InvNormalRound(pstate, *key++);
>> +
>> +		case 16: /* 10 rounds */
>> +			for (i=0; i<9; i++)
>> +				pstate = InvNormalRound(pstate, *key++);
>> +
>> +			break;
>> +
>> +		default:
>> +			BUG();
>> +	}
> 
> Did this manual partial unrolling actually make a difference compared
> this?
> 
> 	for (i=0; i<(5 + key_len / 8); i++)
> 		pstate = InvNormalRound(pstate, *key++);

The numbers for encryption change from 33mb/sec to 23mb/sec and for 
decryption from 29mb/sec to 26mb/sec. The table decryption, however, 
improves from 19mb/sec to 23mb/sec.

> If they are the same speed, there should probably be no unrolling,
> because the larger object code will be bad for the instruction
> cache.
I don't force him to unroll the code, there is no inline keyword there. 
ppu-gcc from SDK 2.1 unrolls the code, gentoo's gcc 4.1.2 doesn't.

>> +CFLAGS_aes-altivec.o += -O3  -maltivec -mcpu=cell
> 
> mcpu=cell probably breaks on most compilers. This needs some
> experiments, but most systems should either leave this out
> or specify the cpu they are actually compiling for.
> 
> Some time ago, I did a patch to extend the CPU selection in
> Kconfig so you can choose sensible mcpu= and mtune= flags
> semi-automatically.

I have no cpu selection in my current Kconfig. Will take a look on that 
patch

> 
> 	Arnd <><

Sebastian

[1] ftp://ftp.linuxppc64.org/pub/people/sjmunroe/PPC64-VMXabi.txt
[2]http://www.cloudcaptech.com/MPC555%20Resources/Programming%20Environment/SVR4abippc.pdf

  reply	other threads:[~2007-04-12 13:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-11 16:49 [RFC 0/3] Experiments with AES-AltiVec Sebastian Siewior
2007-04-11 16:49 ` [RFC 1/3] cryptoapi: AES with AltiVec support Sebastian Siewior
2007-04-11 18:24   ` Arnd Bergmann
2007-04-12 13:40     ` Sebastian Siewior [this message]
2007-04-11 22:22   ` Benjamin Herrenschmidt
2007-04-12  7:45     ` Sebastian Siewior
2007-04-12  8:39       ` Benjamin Herrenschmidt
2007-04-11 16:49 ` [RFC 2/3] PowerPC: lazy altivec enabling in kernel Sebastian Siewior
2007-04-11 16:49 ` [RFC 3/3] cryptoapi: speed test Sebastian Siewior
  -- strict thread matches above, loose matches on Subject: below --
2007-04-17 11:52 [RFC 0/3] Experiments with AES-AltiVec, part 2 Sebastian Siewior
2007-04-17 11:52 ` [RFC 1/3] cryptoapi: AES with AltiVec support Sebastian Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=461E36CA.4030509@linux.vnet.ibm.com \
    --to=bigeasy@linux.vnet.ibm.com \
    --cc=arnd@arndb.de \
    --cc=linuxppc-dev@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).