Re: [PATCH] AES x86-64-asm impl.

public inbox for linux-kernel@vger.kernel.org

* Re: [PATCH] AES x86-64-asm impl.
       [not found] <2KWl4-wq-25@gated-at.bofh.it>
@ 2004-10-02 19:41 ` Andi Kleen
  2004-10-04  2:15   ` dean gaudet
  2004-10-04 11:51   ` Jari Ruusu
  0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2004-10-02 19:41 UTC
  To: Florian Bohrer; +Cc: linux-kernel, discuss

Florian.Bohrer@t-online.de (Florian Bohrer) writes:

> hi,
>
> this is my first public kernel patch. it is an x86_64 asm optimized version of AES for the 
> crypto-framework. the patch is against 2.6.9-rc2-mm1 but should work with other 
> versions too. 
>
>
> the asm-code is from Jari Ruusu (loop-aes).
> the org. glue-code is from Fruhwirth Clemens.
>

Thanks. I will add it to the x86-64 patchkit. I have a 64bit version 
here too, but it had a bug somewhere and I didn't have time to fix it yet.

Unfortunately it's still fundamentally 32bit. Anybody interested
in doing a true 64bit AES? 

-Andi




* Re: [PATCH] AES x86-64-asm impl.
  2004-10-02 19:41 ` [PATCH] AES x86-64-asm impl Andi Kleen
@ 2004-10-04  2:15   ` dean gaudet
  2004-10-04 11:51   ` Jari Ruusu
  1 sibling, 0 replies; 17+ messages in thread
From: dean gaudet @ 2004-10-04  2:15 UTC
  To: Andi Kleen; +Cc: Florian Bohrer, linux-kernel, discuss

On Sat, 2 Oct 2004, Andi Kleen wrote:

> Unfortunately it's still fundamentally 32bit. Anybody interested
> in doing a true 64bit AES?

i doubt it helps any -- except for benchmark-only purposes.

there's a description of the 32-bit T-table approach in section 7.3 of 
<http://fp.gladman.plus.com/cryptography_technology/rijndael/aesspec.pdf>

basically the tables are 8-bit -> 32-bit maps, and there are 4 of them (2 
for each direction).  to go to 64-bit you'd need 16-bit -> 64-bit maps... 
512KiB per table.  there are some other variations on the tables which are 
smaller, but nothing as small as the 1024 bytes per table of the 32-bit 
implementation.
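To make the table sizes above concrete, here is a small C sketch (not taken from the patch) that derives the AES S-box from its definition in GF(2^8) and builds four 8-bit -> 32-bit forward round tables. It assumes a Gladman-style little-endian column layout in which table k is table 0 rotated left by 8*k bits; whether the patch's aes_ft_tab uses exactly this layout is not visible in this excerpt. The names xtime, gf_mul, aes_sbox and build_ft are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by {02} in GF(2^8) modulo x^8+x^4+x^3+x+1 (0x11b). */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply by shift-and-add. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-box: multiplicative inverse (0 maps to 0), then the affine map. */
static uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int b = 1; x != 0 && b < 256; b++) {
        if (gf_mul(x, (uint8_t)b) == 1) {
            inv = (uint8_t)b;
            break;
        }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

static uint32_t rotl32(uint32_t x, int n)
{
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Four 8-bit -> 32-bit forward tables: 4 * 256 entries * 4 bytes. */
static uint32_t ft[4][256];

static void build_ft(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = aes_sbox((uint8_t)x);
        uint8_t s2 = xtime(s);          /* {02}.s */
        uint8_t s3 = (uint8_t)(s2 ^ s); /* {03}.s */
        /* assumed little-endian column: {02}s, s, s, {03}s, low to high */
        uint32_t w = (uint32_t)s2 | (uint32_t)s << 8 |
                     (uint32_t)s << 16 | (uint32_t)s3 << 24;
        for (int k = 0; k < 4; k++)
            ft[k][x] = rotl32(w, 8 * k);
    }
}
```

Each such table is 256 * 4 = 1024 bytes. A 16-bit -> 64-bit table would need 65536 * 8 = 524288 bytes (512 KiB), which is the blow-up described above.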

there's a completely different approach using bit-slicing (basically 
consider each register as 64 1-bit registers), which yields great 
throughput but cruddy latency -- you basically need lots of non-dependent
streams to make this pay off (i.e. it might work for disk crypto 
processing multiple blocks simultaneously).
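A minimal illustration of the bit-slicing idea, assuming 64 independent byte lanes (for example the same state byte taken from 64 different blocks): the bytes are transposed into eight 64-bit bit-planes, and multiplication by {02} (the xtime step used inside MixColumns) becomes a plane rename plus three XORs that act on all 64 lanes at once. This is one building block only, not a bit-sliced AES, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Transpose 64 bytes into 8 bit-planes: bit j of plane[b] is bit b of in[j]. */
static void bs_pack(const uint8_t in[64], uint64_t plane[8])
{
    for (int b = 0; b < 8; b++) {
        plane[b] = 0;
        for (int j = 0; j < 64; j++)
            plane[b] |= (uint64_t)((in[j] >> b) & 1) << j;
    }
}

/* Inverse transpose: rebuild the 64 bytes from the 8 planes. */
static void bs_unpack(const uint64_t plane[8], uint8_t out[64])
{
    for (int j = 0; j < 64; j++) {
        out[j] = 0;
        for (int b = 0; b < 8; b++)
            out[j] |= (uint8_t)(((plane[b] >> j) & 1) << b);
    }
}

/* Multiply all 64 lanes by {02}: the left shift becomes a plane rename,
 * and the conditional XOR with 0x1b (bits 0, 1, 3, 4) becomes XORs of the
 * old top plane, with no data-dependent branches. */
static void bs_xtime(uint64_t plane[8])
{
    uint64_t top = plane[7];
    for (int b = 7; b > 0; b--)
        plane[b] = plane[b - 1];
    plane[0] = top;   /* shifted-in bit 0 is zero, so it is just bit 0 of 0x1b */
    plane[1] ^= top;  /* bit 1 of 0x1b */
    plane[3] ^= top;  /* bit 3 of 0x1b */
    plane[4] ^= top;  /* bit 4 of 0x1b */
}
```

The transposition in and out costs far more than a single xtime; bit-sliced code only pays off when the state stays in bit-plane form across all rounds, which is why it needs many independent blocks in flight.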

-dean


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-02 19:41 ` [PATCH] AES x86-64-asm impl Andi Kleen
  2004-10-04  2:15   ` dean gaudet
@ 2004-10-04 11:51   ` Jari Ruusu
  2004-10-04 12:09     ` Paolo Ciarrocchi
  2004-10-04 13:08     ` Andi Kleen
  1 sibling, 2 replies; 17+ messages in thread
From: Jari Ruusu @ 2004-10-04 11:51 UTC
  To: Andi Kleen, Linus Torvalds; +Cc: Florian Bohrer, linux-kernel, discuss

Andi Kleen wrote:
> Florian.Bohrer@t-online.de (Florian Bohrer) writes:
> > the asm-code is from Jari Ruusu (loop-aes).
> > the org. glue-code is from Fruhwirth Clemens.
> 
> Thanks. I will add it to the x86-64 patchkit.

Here we go again...

Linus promised that he will not merge my code, and I am quite happy with my
code not being anywhere near mainline linux cryptoapi.

Linus, please consider dropping this.

-- 
Jari Ruusu  1024R/3A220F51 5B 4B F9 BB D3 3F 52 E9  DB 1D EB E3 24 0E A9 DD


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 11:51   ` Jari Ruusu
@ 2004-10-04 12:09     ` Paolo Ciarrocchi
  2004-10-04 12:20       ` Jari Ruusu
  2004-10-04 13:08     ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Paolo Ciarrocchi @ 2004-10-04 12:09 UTC
  To: Jari Ruusu
  Cc: Andi Kleen, Linus Torvalds, Florian Bohrer, linux-kernel, discuss

On Mon, 04 Oct 2004 14:51:19 +0300, Jari Ruusu
<jariruusu@users.sourceforge.net> wrote:
> Andi Kleen wrote:
> > Florian.Bohrer@t-online.de (Florian Bohrer) writes:
> > > the asm-code is from Jari Ruusu (loop-aes).
> > > the org. glue-code is from Fruhwirth Clemens.
> >
> > Thanks. I will add it to the x86-64 patchkit.
> 
> Here we go again...
> 
> Linus promised that he will not merge my code, and I am quite happy with my
> code not being anywhere near mainline linux cryptoapi.
> 
> Linus, please consider dropping this.

I guess Linus will do so,
but may I ask you why don't you want to see your code merged in mainline ?

Thanks.

-- 
Paolo
Personal home page: www.ciarrocchi.tk
See my photos: http://paolociarrocchi.fotopic.net/
Buy cool stuff here: http://www.cafepress.com/paoloc


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:09     ` Paolo Ciarrocchi
@ 2004-10-04 12:20       ` Jari Ruusu
  2004-10-04 12:23         ` Paolo Ciarrocchi
  0 siblings, 1 reply; 17+ messages in thread
From: Jari Ruusu @ 2004-10-04 12:20 UTC
  To: Paolo Ciarrocchi
  Cc: Andi Kleen, Linus Torvalds, Florian Bohrer, linux-kernel, discuss

Paolo Ciarrocchi wrote:
> On Mon, 04 Oct 2004 14:51:19 +0300, Jari Ruusu
> > Linus promised that he will not merge my code, and I am quite happy with my
> > code not being anywhere near mainline linux cryptoapi.
> >
> > Linus, please consider dropping this.
> 
> I guess Linus will do so,
> but may I ask you why don't you want to see your code merged in mainline ?

I don't want my name associated with mainline linux cryptoapi or cryptoloop
or their developers.

-- 
Jari Ruusu  1024R/3A220F51 5B 4B F9 BB D3 3F 52 E9  DB 1D EB E3 24 0E A9 DD


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:20       ` Jari Ruusu
@ 2004-10-04 12:23         ` Paolo Ciarrocchi
  2004-10-04 12:32           ` Jari Ruusu
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Ciarrocchi @ 2004-10-04 12:23 UTC
  To: Jari Ruusu
  Cc: Andi Kleen, Linus Torvalds, Florian Bohrer, linux-kernel, discuss

On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
<jariruusu@users.sourceforge.net> wrote:
> Paolo Ciarrocchi wrote:
> > On Mon, 04 Oct 2004 14:51:19 +0300, Jari Ruusu
> > > Linus promised that he will not merge my code, and I am quite happy with my
> > > code not being anywhere near mainline linux cryptoapi.
> > >
> > > Linus, please consider dropping this.
> >
> > I guess Linus will do so,
> > but may I ask you why don't you want to see your code merged in mainline ?
> 
> I don't want my name associated with mainline linux cryptoapi or cryptoloop
> or their developers.

I understand that, but I still don't understand the reason.
But hey, feel free to ignore my question ;)
-- 
Paolo
Personal home page: www.ciarrocchi.tk
See my photos: http://paolociarrocchi.fotopic.net/
Buy cool stuff here: http://www.cafepress.com/paoloc


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:23         ` Paolo Ciarrocchi
@ 2004-10-04 12:32           ` Jari Ruusu
  2004-10-04 12:35             ` Paolo Ciarrocchi
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jari Ruusu @ 2004-10-04 12:32 UTC
  To: Paolo Ciarrocchi
  Cc: Andi Kleen, Linus Torvalds, Florian Bohrer, linux-kernel, discuss

Paolo Ciarrocchi wrote:
> On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
> I understand that, but I still don't understand the reason.
> But hey, feel free to ignore my question ;)

You haven't looked at cryptoloop security, have you?

No sane person wants to be associated with that kind of broken and
backdoored scam. I certainly don't.

-- 
Jari Ruusu  1024R/3A220F51 5B 4B F9 BB D3 3F 52 E9  DB 1D EB E3 24 0E A9 DD


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:32           ` Jari Ruusu
@ 2004-10-04 12:35             ` Paolo Ciarrocchi
  2004-10-04 18:58             ` [discuss] " Raul Miller
  2004-10-04 19:26             ` Bill Davidsen
  2 siblings, 0 replies; 17+ messages in thread
From: Paolo Ciarrocchi @ 2004-10-04 12:35 UTC
  To: Jari Ruusu
  Cc: Andi Kleen, Linus Torvalds, Florian Bohrer, linux-kernel, discuss

On Mon, 04 Oct 2004 15:32:29 +0300, Jari Ruusu
<jariruusu@users.sourceforge.net> wrote:
> Paolo Ciarrocchi wrote:
> > On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
> > I understand that, but I still don't understand the reason.
> > But hey, feel free to ignore my question ;)
> 
> You haven't looked at cryptoloop security, have you?

Not at all.
 
> No sane person wants to be associated with that kind of broken and
> backdoored scam. I certainly don't.

It was just curiosity ;)
Thank you for the answer.

-- 
Paolo
Personal home page: www.ciarrocchi.tk
See my photos: http://paolociarrocchi.fotopic.net/
Buy cool stuff here: http://www.cafepress.com/paoloc


* Re: [discuss] Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:32           ` Jari Ruusu
  2004-10-04 12:35             ` Paolo Ciarrocchi
@ 2004-10-04 18:58             ` Raul Miller
  2004-10-04 19:26             ` Bill Davidsen
  2 siblings, 0 replies; 17+ messages in thread
From: Raul Miller @ 2004-10-04 18:58 UTC
  To: Jari Ruusu; +Cc: linux-kernel, discuss

On Mon, Oct 04, 2004 at 03:32:29PM +0300, Jari Ruusu wrote:
> You haven't looked at cryptoloop security, have you?
> 
> No sane person wants to be associated with that kind of broken and
> backdoored scam. I certainly don't.

Most kernel software is broken, initially -- and eventually it's either
replaced with something better or tossed because no one is interested
in it.

It's not clear to me whether you're more in the "offering something
better" camp or the "not interested" camp.  But I'm curious -- what do
you see as the major issues?

Thanks,

-- 
Raul


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 12:32           ` Jari Ruusu
  2004-10-04 12:35             ` Paolo Ciarrocchi
  2004-10-04 18:58             ` [discuss] " Raul Miller
@ 2004-10-04 19:26             ` Bill Davidsen
  2004-10-04 21:20               ` Lee Revell
  2 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2004-10-04 19:26 UTC
  To: Jari Ruusu
  Cc: Paolo Ciarrocchi, Andi Kleen, Linus Torvalds, Florian Bohrer,
	linux-kernel, discuss

Jari Ruusu wrote:
> Paolo Ciarrocchi wrote:
> 
>>On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
>>I understand that, but I still don't understand the reason.
>>But hey, feel free to ignore my question ;)
> 
> 
> You haven't looked at cryptoloop security, have you?
> 
> No sane person wants to be associated with that kind of broken and
> backdoored scam. I certainly don't.
> 
Would you be happy if the code were wrapped as a general use package 
like blowfish, or have you decided that because one part of Linux 
doesn't meet your standards you don't want to allow any of your code to 
be used in it?


-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 19:26             ` Bill Davidsen
@ 2004-10-04 21:20               ` Lee Revell
  2004-10-05 15:00                 ` Bill Davidsen
  0 siblings, 1 reply; 17+ messages in thread
From: Lee Revell @ 2004-10-04 21:20 UTC
  To: Bill Davidsen
  Cc: Jari Ruusu, Paolo Ciarrocchi, Andi Kleen, Linus Torvalds,
	Florian Bohrer, linux-kernel, discuss

On Mon, 2004-10-04 at 15:26, Bill Davidsen wrote:
> Jari Ruusu wrote:
> > Paolo Ciarrocchi wrote:
> > 
> >>On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
> >>I understand that, but I still don't understand the reason.
> >>But hey, feel free to ignore my question ;)
> > 
> > 
> > You haven't looked at cryptoloop security, have you?
> > 
> > No sane person wants to be associated with that kind of broken and
> > backdoored scam. I certainly don't.
> > 
> Would you be happy if the code were wrapped as a general use package 
> like blowfish, or have you decided that because one part of Linux 
> doesn't meet your standards you don't want to allow any of your code to 
> be used in it?
> 

Please check the archives, Jari's reasons are well documented.  I cannot
summarize the technical issues here as IANA cryptographer but please,
let's not start that thread again.

Lee 



* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 21:20               ` Lee Revell
@ 2004-10-05 15:00                 ` Bill Davidsen
  0 siblings, 0 replies; 17+ messages in thread
From: Bill Davidsen @ 2004-10-05 15:00 UTC
  To: Lee Revell
  Cc: Jari Ruusu, Paolo Ciarrocchi, Andi Kleen, Linus Torvalds,
	Florian Bohrer, linux-kernel, discuss

On Mon, 4 Oct 2004, Lee Revell wrote:

> On Mon, 2004-10-04 at 15:26, Bill Davidsen wrote:
> > Jari Ruusu wrote:
> > > Paolo Ciarrocchi wrote:
> > > 
> > >>On Mon, 04 Oct 2004 15:20:43 +0300, Jari Ruusu
> > >>I understand that, but I still don't understand the reason.
> > >>But hey, feel free to ignore my question ;)
> > > 
> > > 
> > > You haven't looked at cryptoloop security, have you?
> > > 
> > > No sane person wants to be associated with that kind of broken and
> > > backdoored scam. I certainly don't.
> > > 
> > Would you be happy if the code were wrapped as a general use package 
> > like blowfish, or have you decided that because one part of Linux 
> > doesn't meet your standards you don't want to allow any of your code to 
> > be used in it?
> > 
> 
> Please check the archives, Jari's reasons are well documented.  I cannot
> summarize the technical issues here as IANA cryptographer but please,
> let's not start that thread again.

I'm not starting a thread, I read the discussion the first time and I'm
not asking about his reasons, I'm asking a yes/no question which he will
answer or not as he pleases. 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 11:51   ` Jari Ruusu
  2004-10-04 12:09     ` Paolo Ciarrocchi
@ 2004-10-04 13:08     ` Andi Kleen
  2004-10-05  0:35       ` Andy Lutomirski
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2004-10-04 13:08 UTC
  To: Jari Ruusu; +Cc: Linus Torvalds, Florian Bohrer, linux-kernel, discuss

On Mon, Oct 04, 2004 at 02:51:19PM +0300, Jari Ruusu wrote:
> Andi Kleen wrote:
> > Florian.Bohrer@t-online.de (Florian Bohrer) writes:
> > > the asm-code is from Jari Ruusu (loop-aes).
> > > the org. glue-code is from Fruhwirth Clemens.
> > 
> > Thanks. I will add it to the x86-64 patchkit.
> 
> Here we go again...
> 
> Linus promised that he will not merge my code, and I am quite happy with my
> code not being anywhere near mainline linux cryptoapi.
> 
> Linus, please consider dropping this.

Ok, I will drop that version and go back to the older version based
on the i386 code.

-Andi


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-04 13:08     ` Andi Kleen
@ 2004-10-05  0:35       ` Andy Lutomirski
  2004-10-05  5:15         ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2004-10-05  0:35 UTC
  To: Andi Kleen; +Cc: Linus Torvalds, Florian Bohrer, linux-kernel, discuss

Andi Kleen wrote:
> On Mon, Oct 04, 2004 at 02:51:19PM +0300, Jari Ruusu wrote:

>>
>>Here we go again...
>>
>>Linus promised that he will not merge my code, and I am quite happy with my
>>code not being anywhere near mainline linux cryptoapi.
>>
>>Linus, please consider dropping this.
> 
> 
> Ok, I will drop that version and go back to the older version based
> on the i386 code.
> 
> -Andi

WHAT?  We're dropping potentially better code because someone _who 
didn't submit it_ disagrees for personal political reasons?  (Jari- I'm 
not questioning your reasons for not wanting to be involved in 
cryptoapi, but IIRC you did release that code under the GPL.)

--Andy


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-05  0:35       ` Andy Lutomirski
@ 2004-10-05  5:15         ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2004-10-05  5:15 UTC
  To: Andy Lutomirski; +Cc: Andi Kleen, Florian Bohrer, linux-kernel, discuss



On Mon, 4 Oct 2004, Andy Lutomirski wrote:
> 
> WHAT?  We're dropping potentially better code because someone _who 
> didn't submit it_ disagrees for personal political reasons?  (Jari- I'm 
> not questioning your reasons for not wanting to be involved in 
> cryptoapi, but IIRC you did release that code under the GPL.)

Guys. Please remember this: don't bother with code that Jari supposedly 
"releases". It's simply not worth the bother.

		Linus


* [PATCH] AES x86-64-asm impl.
@ 2004-10-02 17:53 Florian Bohrer
  2004-10-02 19:37 ` Lee Revell
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Bohrer @ 2004-10-02 17:53 UTC
  To: linux-kernel

hi,

this is my first public kernel patch. it is an x86_64 asm optimized version of AES for the 
crypto-framework. the patch is against 2.6.9-rc2-mm1 but should work with other 
versions too. 
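The assembly in the patch below addresses its key context purely by byte offsets (nkey at 0, nrnd at 4, ekey at 8, dkey at 264, with 256 bytes per schedule). A hypothetical C struct with the same layout, checked with C11 static_assert, might look like the following; the struct name is an assumption, and the glue code's real context type is not shown in this excerpt. The 64-word schedules appear sized for the 256-bit path: AES-256 needs only 60 round-key words, but the last ksc8 step in aes_set_key writes a full group of 8 words, for 64 in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the context the assembly indexes by offset. */
struct aes_ctx_sketch {
    uint32_t nkey;      /* offset 0: key length in 32-bit words (4, 6 or 8) */
    uint32_t nrnd;      /* offset 4: number of rounds (10, 12 or 14) */
    uint32_t ekey[64];  /* offset 8: encryption key schedule, 256 bytes */
    uint32_t dkey[64];  /* offset 264: decryption key schedule, 256 bytes */
};

/* Compile-time checks against the #defines in the assembly. */
static_assert(offsetof(struct aes_ctx_sketch, nkey) == 0, "nkey offset");
static_assert(offsetof(struct aes_ctx_sketch, nrnd) == 4, "nrnd offset");
static_assert(offsetof(struct aes_ctx_sketch, ekey) == 8, "ekey offset");
static_assert(offsetof(struct aes_ctx_sketch, dkey) == 264, "dkey offset");
```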


the asm-code is from Jari Ruusu (loop-aes).
the org. glue-code is from Fruhwirth Clemens.
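The first part of aes_set_key in the patch below normalizes the key length before expanding the schedule: a value of 128 or more is treated as bits and divided by 8, anything other than 24 or 32 bytes is treated as 16, and the round count is the key length in 32-bit words plus 6. A C restatement of just that logic might look like this (the function name aes_key_params is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Restates the length handling at the top of aes_set_key (hypothetical name). */
static void aes_key_params(uint32_t key_len, uint32_t *nkey, uint32_t *nrnd)
{
    if (key_len >= 128)             /* unsigned compare, like the asm's jb */
        key_len >>= 3;              /* bits -> bytes */
    if (key_len != 32 && key_len != 24)
        key_len = 16;               /* every other value falls back to 128-bit */
    *nkey = key_len / 4;            /* 4, 6 or 8 32-bit words */
    *nrnd = *nkey + 6;              /* 10, 12 or 14 rounds */
}
```

One consequence worth noting: an invalid length such as 20 bytes is not rejected; it quietly produces a 128-bit key schedule from the first 16 key bytes.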



--- linux-2.6.9-rc2-mm1/arch/x86_64/crypto/aes-x86_64-asm.S	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc2-mm1-aes/arch/x86_64/crypto/aes-x86_64-asm.S	2004-09-26 23:57:35.936380752 +0200
@@ -0,0 +1,896 @@
+//
+// Copyright (c) 2001, Dr Brian Gladman <brg@gladman.uk.net>, Worcester, UK.
+// All rights reserved.
+//
+// TERMS
+//
+//  Redistribution and use in source and binary forms, with or without
+//  modification, are permitted subject to the following conditions:
+//
+//  1. Redistributions of source code must retain the above copyright
+//     notice, this list of conditions and the following disclaimer.
+//
+//  2. Redistributions in binary form must reproduce the above copyright
+//     notice, this list of conditions and the following disclaimer in the
+//     documentation and/or other materials provided with the distribution.
+//
+//  3. The copyright holder's name must not be used to endorse or promote
+//     any products derived from this software without his specific prior
+//     written permission.
+//
+//  This software is provided 'as is' with no express or implied warranties
+//  of correctness or fitness for purpose.
+
+// Modified by Jari Ruusu,  December 24 2001
+//  - Converted syntax to GNU CPP/assembler syntax
+//  - C programming interface converted back to "old" API
+//  - Minor portability cleanups and speed optimizations
+
+// Modified by Jari Ruusu,  April 11 2002
+//  - Added above copyright and terms to resulting object code so that
+//    binary distributions can avoid legal trouble
+
+// Modified by Jari Ruusu,  June 12 2004
+//  - Converted 32 bit x86 code to 64 bit AMD64 code
+//  - Re-wrote encrypt and decrypt code from scratch
+
+// Modified by Florian Bohrer,  September 26 2004
+//  - Swapped the order of the in and out block arguments
+
+// An AES (Rijndael) implementation for the AMD64. This version only
+// implements the standard AES block length (128 bits, 16 bytes). This code
+// does not preserve the rax, rcx, rdx, rsi, rdi or r8-r11 registers or the
+// arithmetic status flags. However, the rbx, rbp and r12-r15 registers are
+// preserved across calls.
+
+// void aes_set_key(aes_context *cx, const unsigned char key[], const int key_len, const int f)
+// void aes_encrypt(const aes_context *cx, unsigned char out_blk[], const unsigned char in_blk[])
+// void aes_decrypt(const aes_context *cx, unsigned char out_blk[], const unsigned char in_blk[])
+
+#if defined(USE_UNDERLINE)
+# define aes_set_key _aes_set_key
+# define aes_encrypt _aes_encrypt
+# define aes_decrypt _aes_decrypt
+#endif
+#if !defined(ALIGN64BYTES)
+# define ALIGN64BYTES 64
+#endif
+
+	.file	"aes-x86_64-asm.S"
+	.globl	aes_set_key
+	.globl	aes_encrypt
+	.globl	aes_decrypt
+
+	.section .rodata
+copyright:
+	.ascii "    \000"
+	.ascii "Copyright (c) 2001, Dr Brian Gladman <brg@gladman.uk.net>, Worcester, UK.\000"
+	.ascii "All rights reserved.\000"
+	.ascii "    \000"
+	.ascii "TERMS\000"
+	.ascii "    \000"
+	.ascii " Redistribution and use in source and binary forms, with or without\000"
+	.ascii " modification, are permitted subject to the following conditions:\000"
+	.ascii "    \000"
+	.ascii " 1. Redistributions of source code must retain the above copyright\000"
+	.ascii "    notice, this list of conditions and the following disclaimer.\000"
+	.ascii "    \000"
+	.ascii " 2. Redistributions in binary form must reproduce the above copyright\000"
+	.ascii "    notice, this list of conditions and the following disclaimer in the\000"
+	.ascii "    documentation and/or other materials provided with the distribution.\000"
+	.ascii "    \000"
+	.ascii " 3. The copyright holder's name must not be used to endorse or promote\000"
+	.ascii "    any products derived from this software without his specific prior\000"
+	.ascii "    written permission.\000"
+	.ascii "    \000"
+	.ascii " This software is provided 'as is' with no express or implied warranties\000"
+	.ascii " of correctness or fitness for purpose.\000"
+	.ascii "    \000"
+
+#define tlen	1024	// length of each of 4 'xor' arrays (256 32-bit words)
+
+// offsets in context structure
+
+#define nkey	0	// key length, size 4
+#define nrnd	4	// number of rounds, size 4
+#define ekey	8	// encryption key schedule base address, size 256
+#define dkey	264	// decryption key schedule base address, size 256
+
+// This macro performs a forward encryption cycle. It is entered with
+// the first previous round column values in I1E, I2E, I3E and I4E and
+// exits with the final values in the OU1, OU2, OU3 and OU4 registers.
+
+#define fwd_rnd(p1,p2,I1E,I1B,I1H,I2E,I2B,I2H,I3E,I3B,I3R,I4E,I4B,I4R,OU1,OU2,OU3,OU4) \
+	movl	p2(%rbp),OU1		;\
+	movl	p2+4(%rbp),OU2		;\
+	movl	p2+8(%rbp),OU3		;\
+	movl	p2+12(%rbp),OU4		;\
+	movzbl	I1B,%edi		;\
+	movzbl	I2B,%esi		;\
+	movzbl	I3B,%r8d		;\
+	movzbl	I4B,%r13d		;\
+	shrl	$8,I3E			;\
+	shrl	$8,I4E			;\
+	xorl	p1(,%rdi,4),OU1		;\
+	xorl	p1(,%rsi,4),OU2		;\
+	xorl	p1(,%r8,4),OU3		;\
+	xorl	p1(,%r13,4),OU4		;\
+	movzbl	I2H,%esi		;\
+	movzbl	I3B,%r8d		;\
+	movzbl	I4B,%r13d		;\
+	movzbl	I1H,%edi		;\
+	shrl	$8,I3E			;\
+	shrl	$8,I4E			;\
+	xorl	p1+tlen(,%rsi,4),OU1	;\
+	xorl	p1+tlen(,%r8,4),OU2	;\
+	xorl	p1+tlen(,%r13,4),OU3	;\
+	xorl	p1+tlen(,%rdi,4),OU4	;\
+	shrl	$16,I1E			;\
+	shrl	$16,I2E			;\
+	movzbl	I3B,%r8d		;\
+	movzbl	I4B,%r13d		;\
+	movzbl	I1B,%edi		;\
+	movzbl	I2B,%esi		;\
+	xorl	p1+2*tlen(,%r8,4),OU1	;\
+	xorl	p1+2*tlen(,%r13,4),OU2	;\
+	xorl	p1+2*tlen(,%rdi,4),OU3	;\
+	xorl	p1+2*tlen(,%rsi,4),OU4	;\
+	shrl	$8,I4E			;\
+	movzbl	I1H,%edi		;\
+	movzbl	I2H,%esi		;\
+	shrl	$8,I3E			;\
+	xorl	p1+3*tlen(,I4R,4),OU1	;\
+	xorl	p1+3*tlen(,%rdi,4),OU2	;\
+	xorl	p1+3*tlen(,%rsi,4),OU3	;\
+	xorl	p1+3*tlen(,I3R,4),OU4
+
+// This macro performs an inverse encryption cycle. It is entered with
+// the first previous round column values in I1E, I2E, I3E and I4E and
+// exits with the final values in the OU1, OU2, OU3 and OU4 registers.
+
+#define inv_rnd(p1,p2,I1E,I1B,I1R,I2E,I2B,I2R,I3E,I3B,I3H,I4E,I4B,I4H,OU1,OU2,OU3,OU4) \
+	movl	p2+12(%rbp),OU4		;\
+	movl	p2+8(%rbp),OU3		;\
+	movl	p2+4(%rbp),OU2		;\
+	movl	p2(%rbp),OU1		;\
+	movzbl	I4B,%edi		;\
+	movzbl	I3B,%esi		;\
+	movzbl	I2B,%r8d		;\
+	movzbl	I1B,%r13d		;\
+	shrl	$8,I2E			;\
+	shrl	$8,I1E			;\
+	xorl	p1(,%rdi,4),OU4		;\
+	xorl	p1(,%rsi,4),OU3		;\
+	xorl	p1(,%r8,4),OU2		;\
+	xorl	p1(,%r13,4),OU1		;\
+	movzbl	I3H,%esi		;\
+	movzbl	I2B,%r8d		;\
+	movzbl	I1B,%r13d		;\
+	movzbl	I4H,%edi		;\
+	shrl	$8,I2E			;\
+	shrl	$8,I1E			;\
+	xorl	p1+tlen(,%rsi,4),OU4	;\
+	xorl	p1+tlen(,%r8,4),OU3	;\
+	xorl	p1+tlen(,%r13,4),OU2	;\
+	xorl	p1+tlen(,%rdi,4),OU1	;\
+	shrl	$16,I4E			;\
+	shrl	$16,I3E			;\
+	movzbl	I2B,%r8d		;\
+	movzbl	I1B,%r13d		;\
+	movzbl	I4B,%edi		;\
+	movzbl	I3B,%esi		;\
+	xorl	p1+2*tlen(,%r8,4),OU4	;\
+	xorl	p1+2*tlen(,%r13,4),OU3	;\
+	xorl	p1+2*tlen(,%rdi,4),OU2	;\
+	xorl	p1+2*tlen(,%rsi,4),OU1	;\
+	shrl	$8,I1E			;\
+	movzbl	I4H,%edi		;\
+	movzbl	I3H,%esi		;\
+	shrl	$8,I2E			;\
+	xorl	p1+3*tlen(,I1R,4),OU4	;\
+	xorl	p1+3*tlen(,%rdi,4),OU3	;\
+	xorl	p1+3*tlen(,%rsi,4),OU2	;\
+	xorl	p1+3*tlen(,I2R,4),OU1
+
+// AES (Rijndael) Encryption Subroutine
+
+// rdi = pointer to AES context
+// rsi = pointer to output ciphertext bytes
+// rdx = pointer to input plaintext bytes
+
+	.text
+	.align	ALIGN64BYTES
+aes_encrypt:
+	movl	(%rdx),%eax		// read in plaintext
+	movl	4(%rdx),%ecx
+	movl	8(%rdx),%r10d
+	movl	12(%rdx),%r11d
+
+	pushq	%rbp
+	leaq	ekey+16(%rdi),%rbp	// encryption key pointer
+	movq	%rsi,%r9		// pointer to out block
+	movl	nrnd(%rdi),%edx		// number of rounds
+	pushq	%rbx
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+
+	xorl	-16(%rbp),%eax		// xor in first round key
+	xorl	-12(%rbp),%ecx
+	xorl	-8(%rbp),%r10d
+	xorl	-4(%rbp),%r11d
+
+	subl	$10,%edx
+	je	aes_15
+	addq	$32,%rbp
+	subl	$2,%edx
+	je	aes_13
+	addq	$32,%rbp
+
+	fwd_rnd(aes_ft_tab,-64,%eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,-48,%ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	jmp	aes_13
+	.align	ALIGN64BYTES
+aes_13:	fwd_rnd(aes_ft_tab,-32,%eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,-16,%ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	jmp	aes_15
+	.align	ALIGN64BYTES
+aes_15:	fwd_rnd(aes_ft_tab,0,  %eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,16, %ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	fwd_rnd(aes_ft_tab,32, %eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,48, %ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	fwd_rnd(aes_ft_tab,64, %eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,80, %ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	fwd_rnd(aes_ft_tab,96, %eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_ft_tab,112,%ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+	fwd_rnd(aes_ft_tab,128,%eax,%al,%ah,%ecx,%cl,%ch,%r10d,%r10b,%r10,%r11d,%r11b,%r11,%ebx,%edx,%r14d,%r15d)
+	fwd_rnd(aes_fl_tab,144,%ebx,%bl,%bh,%edx,%dl,%dh,%r14d,%r14b,%r14,%r15d,%r15b,%r15,%eax,%ecx,%r10d,%r11d)
+
+	popq	%r15
+	popq	%r14
+	popq	%r13
+	popq	%rbx
+	popq	%rbp
+
+	movl	%eax,(%r9)		// move final values to the output array.
+	movl	%ecx,4(%r9)
+	movl	%r10d,8(%r9)
+	movl	%r11d,12(%r9)
+	ret
+
+// AES (Rijndael) Decryption Subroutine
+
+// rdi = pointer to AES context
+// rsi = pointer to output plaintext bytes
+// rdx = pointer to input ciphertext bytes
+
+	.align	ALIGN64BYTES
+aes_decrypt:
+	movl	12(%rdx),%eax		// read in ciphertext
+	movl	8(%rdx),%ecx
+	movl	4(%rdx),%r10d
+	movl	(%rdx),%r11d
+
+	pushq	%rbp
+	leaq	dkey+16(%rdi),%rbp	// decryption key pointer
+	movq	%rsi,%r9		// pointer to out block
+	movl	nrnd(%rdi),%edx		// number of rounds
+	pushq	%rbx
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+
+	xorl	-4(%rbp),%eax		// xor in first round key
+	xorl	-8(%rbp),%ecx
+	xorl	-12(%rbp),%r10d
+	xorl	-16(%rbp),%r11d
+
+	subl	$10,%edx
+	je	aes_25
+	addq	$32,%rbp
+	subl	$2,%edx
+	je	aes_23
+	addq	$32,%rbp
+
+	inv_rnd(aes_it_tab,-64,%r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,-48,%r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	jmp	aes_23
+	.align	ALIGN64BYTES
+aes_23:	inv_rnd(aes_it_tab,-32,%r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,-16,%r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	jmp	aes_25
+	.align	ALIGN64BYTES
+aes_25:	inv_rnd(aes_it_tab,0,  %r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,16, %r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	inv_rnd(aes_it_tab,32, %r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,48, %r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	inv_rnd(aes_it_tab,64, %r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,80, %r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	inv_rnd(aes_it_tab,96, %r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_it_tab,112,%r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+	inv_rnd(aes_it_tab,128,%r11d,%r11b,%r11,%r10d,%r10b,%r10,%ecx,%cl,%ch,%eax,%al,%ah,%r15d,%r14d,%edx,%ebx)
+	inv_rnd(aes_il_tab,144,%r15d,%r15b,%r15,%r14d,%r14b,%r14,%edx,%dl,%dh,%ebx,%bl,%bh,%r11d,%r10d,%ecx,%eax)
+
+	popq	%r15
+	popq	%r14
+	popq	%r13
+	popq	%rbx
+	popq	%rbp
+
+	movl	%eax,12(%r9)		// move final values to the output array.
+	movl	%ecx,8(%r9)
+	movl	%r10d,4(%r9)
+	movl	%r11d,(%r9)
+	ret
+
+// AES (Rijndael) Key Schedule Subroutine
+
+// This macro performs a column mixing operation on an input 32-bit
+// word to give a 32-bit result. It uses each of the 4 bytes in
+// the input column to index 4 different tables of 256 32-bit words
+// that are xored together to form the output value.
+
+#define mix_col(p1)			 \
+	movzbl	%bl,%ecx		;\
+	movl	p1(,%rcx,4),%eax	;\
+	movzbl	%bh,%ecx		;\
+	ror	$16,%ebx		;\
+	xorl	p1+tlen(,%rcx,4),%eax	;\
+	movzbl	%bl,%ecx		;\
+	xorl	p1+2*tlen(,%rcx,4),%eax	;\
+	movzbl	%bh,%ecx		;\
+	xorl	p1+3*tlen(,%rcx,4),%eax
+
+// Key Schedule Macros
+
+#define ksc4(p1)			 \
+	rol	$24,%ebx		;\
+	mix_col(aes_fl_tab)		;\
+	ror	$8,%ebx			;\
+	xorl	4*p1+aes_rcon_tab,%eax	;\
+	xorl	%eax,%esi		;\
+	xorl	%esi,%ebp		;\
+	movl	%esi,16*p1(%rdi)	;\
+	movl	%ebp,16*p1+4(%rdi)	;\
+	xorl	%ebp,%edx		;\
+	xorl	%edx,%ebx		;\
+	movl	%edx,16*p1+8(%rdi)	;\
+	movl	%ebx,16*p1+12(%rdi)
+
+#define ksc6(p1)			 \
+	rol	$24,%ebx		;\
+	mix_col(aes_fl_tab)		;\
+	ror	$8,%ebx			;\
+	xorl	4*p1+aes_rcon_tab,%eax	;\
+	xorl	24*p1-24(%rdi),%eax	;\
+	movl	%eax,24*p1(%rdi)	;\
+	xorl	24*p1-20(%rdi),%eax	;\
+	movl	%eax,24*p1+4(%rdi)	;\
+	xorl	%eax,%esi		;\
+	xorl	%esi,%ebp		;\
+	movl	%esi,24*p1+8(%rdi)	;\
+	movl	%ebp,24*p1+12(%rdi)	;\
+	xorl	%ebp,%edx		;\
+	xorl	%edx,%ebx		;\
+	movl	%edx,24*p1+16(%rdi)	;\
+	movl	%ebx,24*p1+20(%rdi)
+
+#define ksc8(p1)			 \
+	rol	$24,%ebx		;\
+	mix_col(aes_fl_tab)		;\
+	ror	$8,%ebx			;\
+	xorl	4*p1+aes_rcon_tab,%eax	;\
+	xorl	32*p1-32(%rdi),%eax	;\
+	movl	%eax,32*p1(%rdi)	;\
+	xorl	32*p1-28(%rdi),%eax	;\
+	movl	%eax,32*p1+4(%rdi)	;\
+	xorl	32*p1-24(%rdi),%eax	;\
+	movl	%eax,32*p1+8(%rdi)	;\
+	xorl	32*p1-20(%rdi),%eax	;\
+	movl	%eax,32*p1+12(%rdi)	;\
+	pushq	%rbx			;\
+	movl	%eax,%ebx		;\
+	mix_col(aes_fl_tab)		;\
+	popq	%rbx			;\
+	xorl	%eax,%esi		;\
+	xorl	%esi,%ebp		;\
+	movl	%esi,32*p1+16(%rdi)	;\
+	movl	%ebp,32*p1+20(%rdi)	;\
+	xorl	%ebp,%edx		;\
+	xorl	%edx,%ebx		;\
+	movl	%edx,32*p1+24(%rdi)	;\
+	movl	%ebx,32*p1+28(%rdi)
+
+// rdi = pointer to AES context
+// rsi = pointer to key bytes
+// rdx = key length, bytes or bits
+// rcx = ed_flag, 1=encrypt only, 0=both encrypt and decrypt
+
+	.align	ALIGN64BYTES
+aes_set_key:
+	pushfq
+	pushq	%rbp
+	pushq	%rbx
+
+	movq	%rcx,%r11		// ed_flg
+	movq	%rdx,%rcx		// key length
+	movq	%rdi,%r10		// AES context
+
+	cmpl	$128,%ecx
+	jb	aes_30
+	shrl	$3,%ecx
+aes_30:	cmpl	$32,%ecx
+	je	aes_32
+	cmpl	$24,%ecx
+	je	aes_32
+	movl	$16,%ecx		// any other length: use a 128-bit key
+aes_32:	shrl	$2,%ecx
+	movl	%ecx,nkey(%r10)
+	leaq	6(%rcx),%rax		// 10/12/14 for 4/6/8 32-bit key length
+	movl	%eax,nrnd(%r10)
+	leaq	ekey(%r10),%rdi		// key position in AES context
+	cld
+	movl	%ecx,%eax		// save key length in eax
+	rep ;	movsl			// copy key words into start of key schedule
+	movl	-4(%rsi),%ebx		// keep the last four key words
+	movl	-8(%rsi),%edx		// in registers for the ksc macros
+	movl	-12(%rsi),%ebp
+	movl	-16(%rsi),%esi
+
+	cmpl	$4,%eax			// jump on key size
+	je	aes_36
+	cmpl	$6,%eax
+	je	aes_35
+
+	ksc8(0)
+	ksc8(1)
+	ksc8(2)
+	ksc8(3)
+	ksc8(4)
+	ksc8(5)
+	ksc8(6)
+	jmp	aes_37
+aes_35:	ksc6(0)
+	ksc6(1)
+	ksc6(2)
+	ksc6(3)
+	ksc6(4)
+	ksc6(5)
+	ksc6(6)
+	ksc6(7)
+	jmp	aes_37
+aes_36:	ksc4(0)
+	ksc4(1)
+	ksc4(2)
+	ksc4(3)
+	ksc4(4)
+	ksc4(5)
+	ksc4(6)
+	ksc4(7)
+	ksc4(8)
+	ksc4(9)
+aes_37:	cmpl	$0,%r11d		// ed_flg
+	jne	aes_39
+
+// build decryption key schedule from encryption schedule: reverse the
+// round key order and apply inverse mix column to all but first and last
+
+	movl	nrnd(%r10),%eax		// kt = cx->d_key + nc * cx->Nrnd
+	shl	$2,%rax
+	leaq	dkey(%r10,%rax,4),%rdi
+	leaq	ekey(%r10),%rsi		// kf = cx->e_key
+
+	movsq				// copy first round key (unmodified)
+	movsq
+	subq	$32,%rdi
+	movl	$1,%r9d
+aes_38:					// inverse mix column on each column
+	lodsl				// of each middle round key
+	movl	%eax,%ebx
+	mix_col(aes_im_tab)
+	stosl
+	lodsl
+	movl	%eax,%ebx
+	mix_col(aes_im_tab)
+	stosl
+	lodsl
+	movl	%eax,%ebx
+	mix_col(aes_im_tab)
+	stosl
+	lodsl
+	movl	%eax,%ebx
+	mix_col(aes_im_tab)
+	stosl
+	subq	$32,%rdi
+
+	incl	%r9d
+	cmpl	nrnd(%r10),%r9d
+	jb	aes_38
+
+	movsq				// copy last round key (unmodified)
+	movsq
+aes_39:	popq	%rbx
+	popq	%rbp
+	popfq
+	ret
+
+
+// finite field multiplies by {02}, {04} and {08}
+
+#define f2(x)	((x<<1)^(((x>>7)&1)*0x11b))
+#define f4(x)	((x<<2)^(((x>>6)&1)*0x11b)^(((x>>6)&2)*0x11b))
+#define f8(x)	((x<<3)^(((x>>5)&1)*0x11b)^(((x>>5)&2)*0x11b)^(((x>>5)&4)*0x11b))
+
+// finite field multiplies required in table generation
+
+#define f3(x)	(f2(x) ^ x)
+#define f9(x)	(f8(x) ^ x)
+#define fb(x)	(f8(x) ^ f2(x) ^ x)
+#define fd(x)	(f8(x) ^ f4(x) ^ x)
+#define fe(x)	(f8(x) ^ f4(x) ^ f2(x))
+
+// These defines generate the forward table entries
+
+#define u0(x)	((f3(x) << 24) | (x << 16) | (x << 8) | f2(x))
+#define u1(x)	((x << 24) | (x << 16) | (f2(x) << 8) | f3(x))
+#define u2(x)	((x << 24) | (f2(x) << 16) | (f3(x) << 8) | x)
+#define u3(x)	((f2(x) << 24) | (f3(x) << 16) | (x << 8) | x)
+
+// These defines generate the inverse table entries
+
+#define v0(x)	((fb(x) << 24) | (fd(x) << 16) | (f9(x) << 8) | fe(x))
+#define v1(x)	((fd(x) << 24) | (f9(x) << 16) | (fe(x) << 8) | fb(x))
+#define v2(x)	((f9(x) << 24) | (fe(x) << 16) | (fb(x) << 8) | fd(x))
+#define v3(x)	((fe(x) << 24) | (fb(x) << 16) | (fd(x) << 8) | f9(x))
+
+// These defines generate entries for the last round tables
+
+#define w0(x)	(x)
+#define w1(x)	(x <<  8)
+#define w2(x)	(x << 16)
+#define w3(x)	(x << 24)
+
+// macros enumerating bytes 0x00-0xff; passed v0..v3 they generate the inverse mix column tables (needed for the decryption key schedule)
+
+#define im_data0(p1) \
+	.long	p1(0x00),p1(0x01),p1(0x02),p1(0x03),p1(0x04),p1(0x05),p1(0x06),p1(0x07) ;\
+	.long	p1(0x08),p1(0x09),p1(0x0a),p1(0x0b),p1(0x0c),p1(0x0d),p1(0x0e),p1(0x0f) ;\
+	.long	p1(0x10),p1(0x11),p1(0x12),p1(0x13),p1(0x14),p1(0x15),p1(0x16),p1(0x17) ;\
+	.long	p1(0x18),p1(0x19),p1(0x1a),p1(0x1b),p1(0x1c),p1(0x1d),p1(0x1e),p1(0x1f)
+#define im_data1(p1) \
+	.long	p1(0x20),p1(0x21),p1(0x22),p1(0x23),p1(0x24),p1(0x25),p1(0x26),p1(0x27) ;\
+	.long	p1(0x28),p1(0x29),p1(0x2a),p1(0x2b),p1(0x2c),p1(0x2d),p1(0x2e),p1(0x2f) ;\
+	.long	p1(0x30),p1(0x31),p1(0x32),p1(0x33),p1(0x34),p1(0x35),p1(0x36),p1(0x37) ;\
+	.long	p1(0x38),p1(0x39),p1(0x3a),p1(0x3b),p1(0x3c),p1(0x3d),p1(0x3e),p1(0x3f)
+#define im_data2(p1) \
+	.long	p1(0x40),p1(0x41),p1(0x42),p1(0x43),p1(0x44),p1(0x45),p1(0x46),p1(0x47) ;\
+	.long	p1(0x48),p1(0x49),p1(0x4a),p1(0x4b),p1(0x4c),p1(0x4d),p1(0x4e),p1(0x4f) ;\
+	.long	p1(0x50),p1(0x51),p1(0x52),p1(0x53),p1(0x54),p1(0x55),p1(0x56),p1(0x57) ;\
+	.long	p1(0x58),p1(0x59),p1(0x5a),p1(0x5b),p1(0x5c),p1(0x5d),p1(0x5e),p1(0x5f)
+#define im_data3(p1) \
+	.long	p1(0x60),p1(0x61),p1(0x62),p1(0x63),p1(0x64),p1(0x65),p1(0x66),p1(0x67) ;\
+	.long	p1(0x68),p1(0x69),p1(0x6a),p1(0x6b),p1(0x6c),p1(0x6d),p1(0x6e),p1(0x6f) ;\
+	.long	p1(0x70),p1(0x71),p1(0x72),p1(0x73),p1(0x74),p1(0x75),p1(0x76),p1(0x77) ;\
+	.long	p1(0x78),p1(0x79),p1(0x7a),p1(0x7b),p1(0x7c),p1(0x7d),p1(0x7e),p1(0x7f)
+#define im_data4(p1) \
+	.long	p1(0x80),p1(0x81),p1(0x82),p1(0x83),p1(0x84),p1(0x85),p1(0x86),p1(0x87) ;\
+	.long	p1(0x88),p1(0x89),p1(0x8a),p1(0x8b),p1(0x8c),p1(0x8d),p1(0x8e),p1(0x8f) ;\
+	.long	p1(0x90),p1(0x91),p1(0x92),p1(0x93),p1(0x94),p1(0x95),p1(0x96),p1(0x97) ;\
+	.long	p1(0x98),p1(0x99),p1(0x9a),p1(0x9b),p1(0x9c),p1(0x9d),p1(0x9e),p1(0x9f)
+#define im_data5(p1) \
+	.long	p1(0xa0),p1(0xa1),p1(0xa2),p1(0xa3),p1(0xa4),p1(0xa5),p1(0xa6),p1(0xa7) ;\
+	.long	p1(0xa8),p1(0xa9),p1(0xaa),p1(0xab),p1(0xac),p1(0xad),p1(0xae),p1(0xaf) ;\
+	.long	p1(0xb0),p1(0xb1),p1(0xb2),p1(0xb3),p1(0xb4),p1(0xb5),p1(0xb6),p1(0xb7) ;\
+	.long	p1(0xb8),p1(0xb9),p1(0xba),p1(0xbb),p1(0xbc),p1(0xbd),p1(0xbe),p1(0xbf)
+#define im_data6(p1) \
+	.long	p1(0xc0),p1(0xc1),p1(0xc2),p1(0xc3),p1(0xc4),p1(0xc5),p1(0xc6),p1(0xc7) ;\
+	.long	p1(0xc8),p1(0xc9),p1(0xca),p1(0xcb),p1(0xcc),p1(0xcd),p1(0xce),p1(0xcf) ;\
+	.long	p1(0xd0),p1(0xd1),p1(0xd2),p1(0xd3),p1(0xd4),p1(0xd5),p1(0xd6),p1(0xd7) ;\
+	.long	p1(0xd8),p1(0xd9),p1(0xda),p1(0xdb),p1(0xdc),p1(0xdd),p1(0xde),p1(0xdf)
+#define im_data7(p1) \
+	.long	p1(0xe0),p1(0xe1),p1(0xe2),p1(0xe3),p1(0xe4),p1(0xe5),p1(0xe6),p1(0xe7) ;\
+	.long	p1(0xe8),p1(0xe9),p1(0xea),p1(0xeb),p1(0xec),p1(0xed),p1(0xee),p1(0xef) ;\
+	.long	p1(0xf0),p1(0xf1),p1(0xf2),p1(0xf3),p1(0xf4),p1(0xf5),p1(0xf6),p1(0xf7) ;\
+	.long	p1(0xf8),p1(0xf9),p1(0xfa),p1(0xfb),p1(0xfc),p1(0xfd),p1(0xfe),p1(0xff)
+
+// S-box data - 256 entries
+
+#define sb_data0(p1) \
+	.long	p1(0x63),p1(0x7c),p1(0x77),p1(0x7b),p1(0xf2),p1(0x6b),p1(0x6f),p1(0xc5) ;\
+	.long	p1(0x30),p1(0x01),p1(0x67),p1(0x2b),p1(0xfe),p1(0xd7),p1(0xab),p1(0x76) ;\
+	.long	p1(0xca),p1(0x82),p1(0xc9),p1(0x7d),p1(0xfa),p1(0x59),p1(0x47),p1(0xf0) ;\
+	.long	p1(0xad),p1(0xd4),p1(0xa2),p1(0xaf),p1(0x9c),p1(0xa4),p1(0x72),p1(0xc0)
+#define sb_data1(p1) \
+	.long	p1(0xb7),p1(0xfd),p1(0x93),p1(0x26),p1(0x36),p1(0x3f),p1(0xf7),p1(0xcc) ;\
+	.long	p1(0x34),p1(0xa5),p1(0xe5),p1(0xf1),p1(0x71),p1(0xd8),p1(0x31),p1(0x15) ;\
+	.long	p1(0x04),p1(0xc7),p1(0x23),p1(0xc3),p1(0x18),p1(0x96),p1(0x05),p1(0x9a) ;\
+	.long	p1(0x07),p1(0x12),p1(0x80),p1(0xe2),p1(0xeb),p1(0x27),p1(0xb2),p1(0x75)
+#define sb_data2(p1) \
+	.long	p1(0x09),p1(0x83),p1(0x2c),p1(0x1a),p1(0x1b),p1(0x6e),p1(0x5a),p1(0xa0) ;\
+	.long	p1(0x52),p1(0x3b),p1(0xd6),p1(0xb3),p1(0x29),p1(0xe3),p1(0x2f),p1(0x84) ;\
+	.long	p1(0x53),p1(0xd1),p1(0x00),p1(0xed),p1(0x20),p1(0xfc),p1(0xb1),p1(0x5b) ;\
+	.long	p1(0x6a),p1(0xcb),p1(0xbe),p1(0x39),p1(0x4a),p1(0x4c),p1(0x58),p1(0xcf)
+#define sb_data3(p1) \
+	.long	p1(0xd0),p1(0xef),p1(0xaa),p1(0xfb),p1(0x43),p1(0x4d),p1(0x33),p1(0x85) ;\
+	.long	p1(0x45),p1(0xf9),p1(0x02),p1(0x7f),p1(0x50),p1(0x3c),p1(0x9f),p1(0xa8) ;\
+	.long	p1(0x51),p1(0xa3),p1(0x40),p1(0x8f),p1(0x92),p1(0x9d),p1(0x38),p1(0xf5) ;\
+	.long	p1(0xbc),p1(0xb6),p1(0xda),p1(0x21),p1(0x10),p1(0xff),p1(0xf3),p1(0xd2)
+#define sb_data4(p1) \
+	.long	p1(0xcd),p1(0x0c),p1(0x13),p1(0xec),p1(0x5f),p1(0x97),p1(0x44),p1(0x17) ;\
+	.long	p1(0xc4),p1(0xa7),p1(0x7e),p1(0x3d),p1(0x64),p1(0x5d),p1(0x19),p1(0x73) ;\
+	.long	p1(0x60),p1(0x81),p1(0x4f),p1(0xdc),p1(0x22),p1(0x2a),p1(0x90),p1(0x88) ;\
+	.long	p1(0x46),p1(0xee),p1(0xb8),p1(0x14),p1(0xde),p1(0x5e),p1(0x0b),p1(0xdb)
+#define sb_data5(p1) \
+	.long	p1(0xe0),p1(0x32),p1(0x3a),p1(0x0a),p1(0x49),p1(0x06),p1(0x24),p1(0x5c) ;\
+	.long	p1(0xc2),p1(0xd3),p1(0xac),p1(0x62),p1(0x91),p1(0x95),p1(0xe4),p1(0x79) ;\
+	.long	p1(0xe7),p1(0xc8),p1(0x37),p1(0x6d),p1(0x8d),p1(0xd5),p1(0x4e),p1(0xa9) ;\
+	.long	p1(0x6c),p1(0x56),p1(0xf4),p1(0xea),p1(0x65),p1(0x7a),p1(0xae),p1(0x08)
+#define sb_data6(p1) \
+	.long	p1(0xba),p1(0x78),p1(0x25),p1(0x2e),p1(0x1c),p1(0xa6),p1(0xb4),p1(0xc6) ;\
+	.long	p1(0xe8),p1(0xdd),p1(0x74),p1(0x1f),p1(0x4b),p1(0xbd),p1(0x8b),p1(0x8a) ;\
+	.long	p1(0x70),p1(0x3e),p1(0xb5),p1(0x66),p1(0x48),p1(0x03),p1(0xf6),p1(0x0e) ;\
+	.long	p1(0x61),p1(0x35),p1(0x57),p1(0xb9),p1(0x86),p1(0xc1),p1(0x1d),p1(0x9e)
+#define sb_data7(p1) \
+	.long	p1(0xe1),p1(0xf8),p1(0x98),p1(0x11),p1(0x69),p1(0xd9),p1(0x8e),p1(0x94) ;\
+	.long	p1(0x9b),p1(0x1e),p1(0x87),p1(0xe9),p1(0xce),p1(0x55),p1(0x28),p1(0xdf) ;\
+	.long	p1(0x8c),p1(0xa1),p1(0x89),p1(0x0d),p1(0xbf),p1(0xe6),p1(0x42),p1(0x68) ;\
+	.long	p1(0x41),p1(0x99),p1(0x2d),p1(0x0f),p1(0xb0),p1(0x54),p1(0xbb),p1(0x16)
+
+// Inverse S-box data - 256 entries
+
+#define ib_data0(p1) \
+	.long	p1(0x52),p1(0x09),p1(0x6a),p1(0xd5),p1(0x30),p1(0x36),p1(0xa5),p1(0x38) ;\
+	.long	p1(0xbf),p1(0x40),p1(0xa3),p1(0x9e),p1(0x81),p1(0xf3),p1(0xd7),p1(0xfb) ;\
+	.long	p1(0x7c),p1(0xe3),p1(0x39),p1(0x82),p1(0x9b),p1(0x2f),p1(0xff),p1(0x87) ;\
+	.long	p1(0x34),p1(0x8e),p1(0x43),p1(0x44),p1(0xc4),p1(0xde),p1(0xe9),p1(0xcb)
+#define ib_data1(p1) \
+	.long	p1(0x54),p1(0x7b),p1(0x94),p1(0x32),p1(0xa6),p1(0xc2),p1(0x23),p1(0x3d) ;\
+	.long	p1(0xee),p1(0x4c),p1(0x95),p1(0x0b),p1(0x42),p1(0xfa),p1(0xc3),p1(0x4e) ;\
+	.long	p1(0x08),p1(0x2e),p1(0xa1),p1(0x66),p1(0x28),p1(0xd9),p1(0x24),p1(0xb2) ;\
+	.long	p1(0x76),p1(0x5b),p1(0xa2),p1(0x49),p1(0x6d),p1(0x8b),p1(0xd1),p1(0x25)
+#define ib_data2(p1) \
+	.long	p1(0x72),p1(0xf8),p1(0xf6),p1(0x64),p1(0x86),p1(0x68),p1(0x98),p1(0x16) ;\
+	.long	p1(0xd4),p1(0xa4),p1(0x5c),p1(0xcc),p1(0x5d),p1(0x65),p1(0xb6),p1(0x92) ;\
+	.long	p1(0x6c),p1(0x70),p1(0x48),p1(0x50),p1(0xfd),p1(0xed),p1(0xb9),p1(0xda) ;\
+	.long	p1(0x5e),p1(0x15),p1(0x46),p1(0x57),p1(0xa7),p1(0x8d),p1(0x9d),p1(0x84)
+#define ib_data3(p1) \
+	.long	p1(0x90),p1(0xd8),p1(0xab),p1(0x00),p1(0x8c),p1(0xbc),p1(0xd3),p1(0x0a) ;\
+	.long	p1(0xf7),p1(0xe4),p1(0x58),p1(0x05),p1(0xb8),p1(0xb3),p1(0x45),p1(0x06) ;\
+	.long	p1(0xd0),p1(0x2c),p1(0x1e),p1(0x8f),p1(0xca),p1(0x3f),p1(0x0f),p1(0x02) ;\
+	.long	p1(0xc1),p1(0xaf),p1(0xbd),p1(0x03),p1(0x01),p1(0x13),p1(0x8a),p1(0x6b)
+#define ib_data4(p1) \
+	.long	p1(0x3a),p1(0x91),p1(0x11),p1(0x41),p1(0x4f),p1(0x67),p1(0xdc),p1(0xea) ;\
+	.long	p1(0x97),p1(0xf2),p1(0xcf),p1(0xce),p1(0xf0),p1(0xb4),p1(0xe6),p1(0x73) ;\
+	.long	p1(0x96),p1(0xac),p1(0x74),p1(0x22),p1(0xe7),p1(0xad),p1(0x35),p1(0x85) ;\
+	.long	p1(0xe2),p1(0xf9),p1(0x37),p1(0xe8),p1(0x1c),p1(0x75),p1(0xdf),p1(0x6e)
+#define ib_data5(p1) \
+	.long	p1(0x47),p1(0xf1),p1(0x1a),p1(0x71),p1(0x1d),p1(0x29),p1(0xc5),p1(0x89) ;\
+	.long	p1(0x6f),p1(0xb7),p1(0x62),p1(0x0e),p1(0xaa),p1(0x18),p1(0xbe),p1(0x1b) ;\
+	.long	p1(0xfc),p1(0x56),p1(0x3e),p1(0x4b),p1(0xc6),p1(0xd2),p1(0x79),p1(0x20) ;\
+	.long	p1(0x9a),p1(0xdb),p1(0xc0),p1(0xfe),p1(0x78),p1(0xcd),p1(0x5a),p1(0xf4)
+#define ib_data6(p1) \
+	.long	p1(0x1f),p1(0xdd),p1(0xa8),p1(0x33),p1(0x88),p1(0x07),p1(0xc7),p1(0x31) ;\
+	.long	p1(0xb1),p1(0x12),p1(0x10),p1(0x59),p1(0x27),p1(0x80),p1(0xec),p1(0x5f) ;\
+	.long	p1(0x60),p1(0x51),p1(0x7f),p1(0xa9),p1(0x19),p1(0xb5),p1(0x4a),p1(0x0d) ;\
+	.long	p1(0x2d),p1(0xe5),p1(0x7a),p1(0x9f),p1(0x93),p1(0xc9),p1(0x9c),p1(0xef)
+#define ib_data7(p1) \
+	.long	p1(0xa0),p1(0xe0),p1(0x3b),p1(0x4d),p1(0xae),p1(0x2a),p1(0xf5),p1(0xb0) ;\
+	.long	p1(0xc8),p1(0xeb),p1(0xbb),p1(0x3c),p1(0x83),p1(0x53),p1(0x99),p1(0x61) ;\
+	.long	p1(0x17),p1(0x2b),p1(0x04),p1(0x7e),p1(0xba),p1(0x77),p1(0xd6),p1(0x26) ;\
+	.long	p1(0xe1),p1(0x69),p1(0x14),p1(0x63),p1(0x55),p1(0x21),p1(0x0c),p1(0x7d)
+
+// The rcon_table (needed for the key schedule)
+//
+// Here is Dr Brian Gladman's original source code:
+//	_rcon_tab:
+//	%assign x   1
+//	%rep 29
+//	    dd  x
+//	%assign x f2(x)
+//	%endrep
+//
+// Here is the precomputed output (more portable than relying on NASM's %rep/%assign):
+
+	.section .rodata
+	.align	ALIGN64BYTES
+aes_rcon_tab:
+	.long	0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80
+	.long	0x1b,0x36,0x6c,0xd8,0xab,0x4d,0x9a,0x2f
+	.long	0x5e,0xbc,0x63,0xc6,0x97,0x35,0x6a,0xd4
+	.long	0xb3,0x7d,0xfa,0xef,0xc5
+
+// The forward xor tables
+
+	.align	ALIGN64BYTES
+aes_ft_tab:
+	sb_data0(u0)
+	sb_data1(u0)
+	sb_data2(u0)
+	sb_data3(u0)
+	sb_data4(u0)
+	sb_data5(u0)
+	sb_data6(u0)
+	sb_data7(u0)
+
+	sb_data0(u1)
+	sb_data1(u1)
+	sb_data2(u1)
+	sb_data3(u1)
+	sb_data4(u1)
+	sb_data5(u1)
+	sb_data6(u1)
+	sb_data7(u1)
+
+	sb_data0(u2)
+	sb_data1(u2)
+	sb_data2(u2)
+	sb_data3(u2)
+	sb_data4(u2)
+	sb_data5(u2)
+	sb_data6(u2)
+	sb_data7(u2)
+
+	sb_data0(u3)
+	sb_data1(u3)
+	sb_data2(u3)
+	sb_data3(u3)
+	sb_data4(u3)
+	sb_data5(u3)
+	sb_data6(u3)
+	sb_data7(u3)
+
+	.align	ALIGN64BYTES
+aes_fl_tab:
+	sb_data0(w0)
+	sb_data1(w0)
+	sb_data2(w0)
+	sb_data3(w0)
+	sb_data4(w0)
+	sb_data5(w0)
+	sb_data6(w0)
+	sb_data7(w0)
+
+	sb_data0(w1)
+	sb_data1(w1)
+	sb_data2(w1)
+	sb_data3(w1)
+	sb_data4(w1)
+	sb_data5(w1)
+	sb_data6(w1)
+	sb_data7(w1)
+
+	sb_data0(w2)
+	sb_data1(w2)
+	sb_data2(w2)
+	sb_data3(w2)
+	sb_data4(w2)
+	sb_data5(w2)
+	sb_data6(w2)
+	sb_data7(w2)
+
+	sb_data0(w3)
+	sb_data1(w3)
+	sb_data2(w3)
+	sb_data3(w3)
+	sb_data4(w3)
+	sb_data5(w3)
+	sb_data6(w3)
+	sb_data7(w3)
+
+// The inverse xor tables
+
+	.align	ALIGN64BYTES
+aes_it_tab:
+	ib_data0(v0)
+	ib_data1(v0)
+	ib_data2(v0)
+	ib_data3(v0)
+	ib_data4(v0)
+	ib_data5(v0)
+	ib_data6(v0)
+	ib_data7(v0)
+
+	ib_data0(v1)
+	ib_data1(v1)
+	ib_data2(v1)
+	ib_data3(v1)
+	ib_data4(v1)
+	ib_data5(v1)
+	ib_data6(v1)
+	ib_data7(v1)
+
+	ib_data0(v2)
+	ib_data1(v2)
+	ib_data2(v2)
+	ib_data3(v2)
+	ib_data4(v2)
+	ib_data5(v2)
+	ib_data6(v2)
+	ib_data7(v2)
+
+	ib_data0(v3)
+	ib_data1(v3)
+	ib_data2(v3)
+	ib_data3(v3)
+	ib_data4(v3)
+	ib_data5(v3)
+	ib_data6(v3)
+	ib_data7(v3)
+
+	.align	ALIGN64BYTES
+aes_il_tab:
+	ib_data0(w0)
+	ib_data1(w0)
+	ib_data2(w0)
+	ib_data3(w0)
+	ib_data4(w0)
+	ib_data5(w0)
+	ib_data6(w0)
+	ib_data7(w0)
+
+	ib_data0(w1)
+	ib_data1(w1)
+	ib_data2(w1)
+	ib_data3(w1)
+	ib_data4(w1)
+	ib_data5(w1)
+	ib_data6(w1)
+	ib_data7(w1)
+
+	ib_data0(w2)
+	ib_data1(w2)
+	ib_data2(w2)
+	ib_data3(w2)
+	ib_data4(w2)
+	ib_data5(w2)
+	ib_data6(w2)
+	ib_data7(w2)
+
+	ib_data0(w3)
+	ib_data1(w3)
+	ib_data2(w3)
+	ib_data3(w3)
+	ib_data4(w3)
+	ib_data5(w3)
+	ib_data6(w3)
+	ib_data7(w3)
+
+// The inverse mix column tables
+
+	.align	ALIGN64BYTES
+aes_im_tab:
+	im_data0(v0)
+	im_data1(v0)
+	im_data2(v0)
+	im_data3(v0)
+	im_data4(v0)
+	im_data5(v0)
+	im_data6(v0)
+	im_data7(v0)
+
+	im_data0(v1)
+	im_data1(v1)
+	im_data2(v1)
+	im_data3(v1)
+	im_data4(v1)
+	im_data5(v1)
+	im_data6(v1)
+	im_data7(v1)
+
+	im_data0(v2)
+	im_data1(v2)
+	im_data2(v2)
+	im_data3(v2)
+	im_data4(v2)
+	im_data5(v2)
+	im_data6(v2)
+	im_data7(v2)
+
+	im_data0(v3)
+	im_data1(v3)
+	im_data2(v3)
+	im_data3(v3)
+	im_data4(v3)
+	im_data5(v3)
+	im_data6(v3)
+	im_data7(v3)
--- linux-2.6.9-rc2-mm1/arch/x86_64/crypto/aes-x86_64-glue.c	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc2-mm1-aes/arch/x86_64/crypto/aes-x86_64-glue.c	2004-09-26 23:50:32.296783760 +0200
@@ -0,0 +1,91 @@
+/* 
+ * 
+ * Glue Code for optimized x86_64 assembler version of AES
+ *
+ * Copyright (c) 2001, Dr Brian Gladman <brg@gladman.uk.net>, Worcester, UK.
+ * Copyright (c) 2003, Adam J. Richter <adam@yggdrasil.com> (conversion to
+ * 2.5 API).
+ * Copyright (c) 2003, 2004 Fruhwirth Clemens <clemens@endorphin.org>
+ * Copyright (c) 2004, Florian Bohrer <Florian.Bohrer@t-online.de>
+*/
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+
+#define AES_MIN_KEY_SIZE	16
+#define AES_MAX_KEY_SIZE	32
+#define AES_BLOCK_SIZE		16
+#define AES_KS_LENGTH   (4 * AES_BLOCK_SIZE)
+#define AES_RC_LENGTH   ((9 * AES_BLOCK_SIZE) / 8 - 8)
+
+typedef struct
+{
+    u_int32_t	 aes_Nkey;	// the number of words in the key input block
+    u_int32_t	 aes_Nrnd;	// the number of cipher rounds
+    u_int32_t	 aes_e_key[AES_KS_LENGTH];   // the encryption key schedule
+    u_int32_t	 aes_d_key[AES_KS_LENGTH];   // the decryption key schedule
+    u_int32_t	 aes_Ncol;	// the number of columns in the cipher state
+} aes_context;
+
+ 
+asmlinkage void aes_set_key(void *, const unsigned char [], const int, const int);
+asmlinkage void aes_encrypt(void*, unsigned char [], const unsigned char []);
+asmlinkage void aes_decrypt(void*, unsigned char [], const unsigned char []);
+
+
+static int aes_set_key_glue(void *cx, const u8 *key, unsigned int key_length, u32 *flags)
+{
+	if (key_length != 16 && key_length != 24 && key_length != 32)
+	{
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+	aes_set_key(cx, key, key_length, 0);
+	return 0;
+}
+
+static void aes_encrypt_glue(void* a, unsigned char b[], const unsigned char c[]) {
+	aes_encrypt(a,b,c);
+}
+static void aes_decrypt_glue(void* a, unsigned char b[], const unsigned char c[]) {
+	aes_decrypt(a,b,c);
+}
+
+static struct crypto_alg aes_alg = {
+	.cra_name		=	"aes",
+	.cra_flags		=	CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		=	AES_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(aes_context),
+	.cra_module		=	THIS_MODULE,
+	.cra_list		=	LIST_HEAD_INIT(aes_alg.cra_list),
+	.cra_u			=	{
+		.cipher = {
+			.cia_min_keysize	=	AES_MIN_KEY_SIZE,
+			.cia_max_keysize	=	AES_MAX_KEY_SIZE,
+			.cia_setkey	   	= 	aes_set_key_glue,
+			.cia_encrypt	 	=	aes_encrypt_glue,
+			.cia_decrypt	  	=	aes_decrypt_glue
+		}
+	}
+};
+
+static int __init aes_init(void)
+{
+	return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_fini(void)
+{
+	crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_init);
+module_exit(aes_fini);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, x86_64 asm optimized");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Florian Bohrer");
+MODULE_ALIAS("aes");
--- linux-2.6.9-rc2-mm1/crypto/Kconfig	2004-09-26 11:50:39.692188448 +0200
+++ linux-2.6.9-rc2-mm1-aes/crypto/Kconfig	2004-09-26 10:24:16.219233840 +0200
@@ -173,6 +173,26 @@
 
 	  See http://csrc.nist.gov/encryption/aes/ for more information.
 
+config CRYPTO_AES_X86_64
+	tristate "AES cipher algorithms (x86_64)"
+	depends on CRYPTO && (X86 && X86_64)
+	help
+	  AES cipher algorithms (FIPS-197). AES uses the Rijndael
+	  algorithm.
+
+	  Rijndael appears to be consistently a very good performer in
+	  both hardware and software across a wide range of computing
+	  environments regardless of its use in feedback or non-feedback
+	  modes. Its key setup time is excellent, and its key agility is
+	  good. Rijndael's very low memory requirements make it very well
+	  suited for restricted-space environments, in which it also
+	  demonstrates excellent performance. Rijndael's operations are
+	  among the easiest to defend against power and timing attacks.
+
+	  The AES specifies three key sizes: 128, 192 and 256 bits.
+
+	  See http://csrc.nist.gov/encryption/aes/ for more information.
+
 config CRYPTO_CAST5
 	tristate "CAST5 (CAST-128) cipher algorithm"
 	depends on CRYPTO
--- linux-2.6.9-rc2-mm1/arch/x86_64/Makefile	2004-09-26 11:50:39.654194224 +0200
+++ linux-2.6.9-rc2-mm1-aes/arch/x86_64/Makefile	2004-09-26 10:25:40.214464624 +0200
@@ -63,7 +63,9 @@
 head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o
 
 libs-y 					+= arch/x86_64/lib/
-core-y					+= arch/x86_64/kernel/ arch/x86_64/mm/
+core-y					+= arch/x86_64/kernel/ \
+					   arch/x86_64/mm/ \
+					   arch/x86_64/crypto/
 core-$(CONFIG_IA32_EMULATION)		+= arch/x86_64/ia32/
 drivers-$(CONFIG_PCI)			+= arch/x86_64/pci/
 drivers-$(CONFIG_OPROFILE)		+= arch/x86_64/oprofile/
--- linux-2.6.9-rc2-mm1/arch/x86_64/crypto/Makefile	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc2-mm1-aes/arch/x86_64/crypto/Makefile	2004-09-26 10:22:51.074177856 +0200
@@ -0,0 +1,9 @@
+# 
+# x86_64/crypto/Makefile 
+# 
+# Arch-specific CryptoAPI modules.
+# 
+
+obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
+
+aes-x86_64-y := aes-x86_64-asm.o aes-x86_64-glue.o

-- 


-----------------------------------------------------------------------------
"Real Programmers consider "what you see is what you get" to 
be just as bad a concept in Text Editors as it is in women. 
No, the Real Programmer wants a "you asked for it, you got 
it" text editor -- complicated, cryptic, powerful, 
unforgiving, dangerous."
-----------------------------------------------------------------------------


* Re: [PATCH] AES x86-64-asm impl.
  2004-10-02 17:53 Florian Bohrer
@ 2004-10-02 19:37 ` Lee Revell
  0 siblings, 0 replies; 17+ messages in thread
From: Lee Revell @ 2004-10-02 19:37 UTC (permalink / raw)
  To: Florian Bohrer; +Cc: linux-kernel

On Sat, 2004-10-02 at 13:53, Florian Bohrer wrote:
> hi,
> 
> this is my first public kernel patch. it is an x86_64 asm optimized version of AES for the 
> crypto-framework. the patch is against 2.6.9-rc2-mm1 but should work with other 
> versions too. 
> 
> 
> the asm-code is from Jari Ruusu (loop-aes).
> the org. glue-code is from Fruhwirth Clemens.

You should have cc'ed Jari and Fruhwirth; you'd probably get an amusing
flame fest.

Lee


Thread overview: 17+ messages
     [not found] <2KWl4-wq-25@gated-at.bofh.it>
2004-10-02 19:41 ` [PATCH] AES x86-64-asm impl Andi Kleen
2004-10-04  2:15   ` dean gaudet
2004-10-04 11:51   ` Jari Ruusu
2004-10-04 12:09     ` Paolo Ciarrocchi
2004-10-04 12:20       ` Jari Ruusu
2004-10-04 12:23         ` Paolo Ciarrocchi
2004-10-04 12:32           ` Jari Ruusu
2004-10-04 12:35             ` Paolo Ciarrocchi
2004-10-04 18:58             ` [discuss] " Raul Miller
2004-10-04 19:26             ` Bill Davidsen
2004-10-04 21:20               ` Lee Revell
2004-10-05 15:00                 ` Bill Davidsen
2004-10-04 13:08     ` Andi Kleen
2004-10-05  0:35       ` Andy Lutomirski
2004-10-05  5:15         ` Linus Torvalds
2004-10-02 17:53 Florian Bohrer
2004-10-02 19:37 ` Lee Revell